Source author record

Amit Deshpande

Amit Deshpande appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning, Data Structures and Algorithms, Computational Geometry, Computer Vision, cs.CY, Computational Complexity, Cryptography and Security, Information Retrieval, math.PR, math.ST, Statistics Theory

Catalog footprint

13 works, 11 topics, 4 close collaborators


preprint, 2022, arXiv

One-pass additive-error subset selection for $\ell_{p}$ subspace approximation

We consider the problem of subset selection for $\ell_{p}$ subspace approximation, that is, to efficiently find a small subset of data points such that solving the problem optimally for this subset gives a good approximation to solving the problem optimally for the original input. Previously known subset selection algorithms based on volume sampling and adaptive sampling \cite{DeshpandeV07}, for the general case of $p \in [1, \infty)$, require multiple passes over the data. In this paper, we give a one-pass subset selection with an additive approximation guarantee for $\ell_{p}$ subspace approximation, for any $p \in [1, \infty)$. Earlier subset selection algorithms that give a one-pass multiplicative $(1+ε)$ approximation work only in special cases. Cohen et al. \cite{CohenMM17} give a one-pass subset selection that offers a multiplicative $(1+ε)$ approximation guarantee for the special case of $\ell_{2}$ subspace approximation. Mahabadi et al. \cite{MahabadiRWZ20} give a one-pass noisy subset selection with a $(1+ε)$ approximation guarantee for $\ell_{p}$ subspace approximation when $p \in \{1, 2\}$. Our subset selection algorithm gives a weaker, additive approximation guarantee, but it works for any $p \in [1, \infty)$.

preprint, 2022, arXiv

Socially Fair Center-based and Linear Subspace Clustering

Center-based clustering (e.g., $k$-means, $k$-medians) and clustering using linear subspaces are two of the most popular techniques to partition real-world data into smaller clusters. However, when the data consists of sensitive demographic groups, significantly different clustering cost per point for different sensitive groups can lead to fairness-related harms (e.g., different quality-of-service). The goal of socially fair clustering is to minimize the maximum cost of clustering per point over all groups. In this work, we propose a unified framework to solve socially fair center-based clustering and linear subspace clustering, and give practical, efficient approximation algorithms for these problems. We do extensive experiments to show that on multiple benchmark datasets our algorithms either closely match or outperform state-of-the-art baselines.
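To make the socially fair objective concrete, the following minimal sketch evaluates it for a fixed set of centers. It assumes the $k$-means cost (squared Euclidean distance to the nearest center); the function names and the toy data are hypothetical and are not taken from the paper. It only evaluates the objective and does not implement the approximation algorithms the abstract refers to.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def socially_fair_cost(points, groups, centers):
    """Maximum over groups of the average per-point clustering cost.

    Assumption: the per-point cost is the squared distance to the
    nearest center (k-means style).
    """
    totals, counts = {}, {}
    for p, g in zip(points, groups):
        cost = min(sq_dist(p, c) for c in centers)
        totals[g] = totals.get(g, 0.0) + cost
        counts[g] = counts.get(g, 0) + 1
    return max(totals[g] / counts[g] for g in totals)


# Toy data: group "a" sits near the origin, group "b" is far away.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
groups = ["a", "a", "b"]

one_center = socially_fair_cost(points, groups, [(1.0, 0.0)])
two_centers = socially_fair_cost(points, groups, [(1.0, 0.0), (10.0, 0.0)])
```

With a single center at (1, 0), group "b" pays far more per point than group "a", so the fair cost is dominated by "b"; adding a center at (10, 0) brings the maximum down to the cost of group "a".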

preprint, 2021, arXiv

On the Problem of Underranking in Group-Fair Ranking

Search and recommendation systems, such as search engines, recruiting tools, online marketplaces, news, and social media, output ranked lists of content, products, and sometimes, people. Credit ratings, standardized tests, and risk assessments output only a score, but are also used implicitly for ranking. Bias in such ranking systems, especially among the top ranks, can worsen social and economic inequalities, polarize opinions, and reinforce stereotypes. On the other hand, a bias correction for minority groups can cause more harm if perceived as favoring group-fair outcomes over meritocracy. In this paper, we formulate the problem of underranking in group-fair rankings, which was not addressed in previous work. Most group-fair ranking algorithms post-process a given ranking and output a group-fair ranking. We define underranking based on how close the group-fair rank of each item is to its original rank, and prove a lower bound on the trade-off achievable for simultaneous underranking and group fairness in ranking. We give a fair ranking algorithm that takes any given ranking and outputs another ranking with simultaneous underranking and group fairness guarantees comparable to the lower bound we prove. Our algorithm works with group fairness constraints for any number of groups. Our experimental results confirm the theoretical trade-off between underranking and group fairness, and also show that our algorithm achieves the best of both when compared to the state-of-the-art baselines.

preprint, 2020, arXiv

How do SGD hyperparameters in natural training affect adversarial robustness?

Learning rate, batch size and momentum are three important hyperparameters in the SGD algorithm. It is known from the work of Jastrzebski et al. arXiv:1711.04623 that large batch size training of neural networks yields models which do not generalize well. Yao et al. arXiv:1802.08241 observe that large batch training yields models that have poor adversarial robustness. In the same paper, the authors train models with different batch sizes and compute the eigenvalues of the Hessian of the loss function. They observe that as the batch size increases, the dominant eigenvalues of the Hessian become larger. They also show that both adversarial training and small-batch training lead to a drop in the dominant eigenvalues of the Hessian, that is, they lower its spectrum. They combine adversarial training and second order information to come up with a new large-batch training algorithm and obtain robust models with good generalization. In this paper, we empirically observe the effect of the SGD hyperparameters on the accuracy and adversarial robustness of networks trained with unperturbed samples. Jastrzebski et al. considered training models with a fixed learning rate to batch size ratio. They observed that the higher the ratio, the better the generalization. We observe that networks trained with a constant learning rate to batch size ratio, as proposed in Jastrzebski et al., yield models which generalize well and also have almost constant adversarial robustness, independent of the batch size. We observe that momentum is more effective with varying batch sizes and a fixed learning rate than with constant learning rate to batch size ratio based SGD training.

preprint, 2020, arXiv

On Universalized Adversarial and Invariant Perturbations

Convolutional neural networks or standard CNNs (StdCNNs) are translation-equivariant models that achieve translation invariance when trained on data augmented with sufficient translations. Recent work on equivariant models for a given group of transformations (e.g., rotations) has led to group-equivariant convolutional neural networks (GCNNs). GCNNs trained on data augmented with sufficient rotations achieve rotation invariance. Recent work by the authors arXiv:2002.11318 studies a trade-off between invariance and robustness to adversarial attacks. In another related work arXiv:2005.08632, given any model and any input-dependent attack that satisfies a certain spectral property, the authors propose a universalization technique called SVD-Universal to produce a universal adversarial perturbation by looking at very few test examples. In this paper, we study the effectiveness of SVD-Universal on GCNNs as they gain rotation invariance through a higher degree of training augmentation. We empirically observe that as GCNNs gain rotation invariance through training augmented with larger rotations, the fooling rate of SVD-Universal gets better. To understand this phenomenon, we introduce universal invariant directions and study their relation to the universal adversarial direction produced by SVD-Universal.

preprint, 2020, arXiv

Subspace approximation with outliers

The subspace approximation problem with outliers, for given $n$ points in $d$ dimensions $x_{1}, \ldots, x_{n} \in \mathbb{R}^{d}$, an integer $1 \leq k \leq d$, and an outlier parameter $0 \leq α \leq 1$, is to find a $k$-dimensional linear subspace of $\mathbb{R}^{d}$ that minimizes the sum of squared distances to its nearest $(1-α)n$ points. More generally, the $\ell_{p}$ subspace approximation problem with outliers minimizes the sum of $p$-th powers of distances instead of the sum of squared distances. Even the case of robust PCA is non-trivial, and previous work requires additional assumptions on the input. Any multiplicative approximation algorithm for the subspace approximation problem with outliers must solve the robust subspace recovery problem, a special case in which the $(1-α)n$ inliers in the optimal solution are promised to lie exactly on a $k$-dimensional linear subspace. However, robust subspace recovery is Small Set Expansion (SSE)-hard. We show how to extend dimension reduction techniques and bi-criteria approximations based on sampling to the problem of subspace approximation with outliers. To get around the SSE-hardness of robust subspace recovery, we assume that the squared distance error of the optimal $k$-dimensional subspace summed over the optimal $(1-α)n$ inliers is at least $δ$ times its squared error summed over all $n$ points, for some $0 < δ \leq 1 - α$. With this assumption, we give an efficient algorithm to find a subset of $\mathrm{poly}(k/ε) \log(1/δ) \log\log(1/δ)$ points whose span contains a $k$-dimensional subspace that gives a multiplicative $(1+ε)$-approximation to the optimal solution. The running time of our algorithm is linear in $n$ and $d$. Interestingly, our results hold even when the fraction of outliers $α$ is large, as long as the obvious condition $0 < δ \leq 1 - α$ is satisfied.
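The objective defined above can be evaluated for a fixed candidate subspace as in the sketch below. This is only an illustration of the cost function, not the sampling-based algorithm from the paper. It assumes the subspace is given by an orthonormal basis, and it assumes that exactly $\lfloor αn \rfloor$ points are discarded as outliers; the function name and the rounding convention are hypothetical choices made here.

```python
import math


def outlier_subspace_cost(points, basis, alpha, p=2.0):
    """Sum of p-th powers of distances to the subspace over the nearest points.

    Assumptions: `basis` is an orthonormal basis of the subspace, and the
    floor(alpha * n) farthest points are treated as outliers and dropped.
    """
    dists = []
    for x in points:
        norm_sq = sum(v * v for v in x)
        # Squared length of the projection onto the subspace.
        proj_sq = sum(sum(xi * bi for xi, bi in zip(x, b)) ** 2 for b in basis)
        # Clamp at zero to absorb floating-point round-off.
        dists.append(math.sqrt(max(norm_sq - proj_sq, 0.0)))
    keep = len(points) - math.floor(alpha * len(points))
    return sum(d ** p for d in sorted(dists)[:keep])


# Three points on the x-axis and one far-away outlier.
points = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 5.0)]
x_axis = [(1.0, 0.0)]

cost_all = outlier_subspace_cost(points, x_axis, alpha=0.0)       # outlier counted
cost_robust = outlier_subspace_cost(points, x_axis, alpha=0.25)   # outlier dropped
cost_l1 = outlier_subspace_cost(points, x_axis, alpha=0.0, p=1.0)
```

Setting `alpha=0.25` drops the single far point, so the x-axis fits the remaining inliers exactly; this is the robust subspace recovery situation described in the abstract.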

preprint, 2016, arXiv

Batched Gaussian Process Bandit Optimization via Determinantal Point Processes

Gaussian Process bandit optimization has emerged as a powerful tool for optimizing noisy black box functions. One example in machine learning is hyper-parameter optimization where each evaluation of the target function requires training a model which may involve days or even weeks of computation. Most methods for this so-called "Bayesian optimization" only allow sequential exploration of the parameter space. However, it is often desirable to propose batches or sets of parameter values to explore simultaneously, especially when there are large parallel processing facilities at our disposal. Batch methods require modeling the interaction between the different evaluations in the batch, which can be expensive in complex scenarios. In this paper, we propose a new approach for parallelizing Bayesian optimization by modeling the diversity of a batch via Determinantal point processes (DPPs) whose kernels are learned automatically. This allows us to generalize a previous result as well as prove better regret bounds based on DPP sampling. Our experiments on a variety of synthetic and real-world robotics and hyper-parameter optimization tasks indicate that our DPP-based methods, especially those based on DPP sampling, outperform state-of-the-art methods.

preprint, 2016, arXiv

How to be Fair and Diverse?

Due to the recent cases of algorithmic bias in data-driven decision-making, machine learning methods are being put under the microscope in order to understand the root cause of these biases and how to correct them. Here, we consider a basic algorithmic task that is central in machine learning: subsampling from a large data set. Subsamples are used both as an end-goal in data summarization (where fairness could either be a legal, political or moral requirement) and to train algorithms (where biases in the samples are often a source of bias in the resulting model). Consequently, there is a growing effort to modify either the subsampling methods or the algorithms themselves in order to ensure fairness. However, in doing so, a question that seems to be overlooked is whether it is possible to produce fair subsamples that are also adequately representative of the feature space of the data set, an important and classic requirement in machine learning. Can diversity and fairness be simultaneously ensured? We start by noting that, in some applications, guaranteeing one does not necessarily guarantee the other, and a new approach is required. Subsequently, we present an algorithmic framework which allows us to produce both fair and diverse samples. Our experimental results on an image summarization task show marked improvements in fairness without compromising feature diversity by much, giving us the best of both worlds.

preprint, 2016, arXiv

On Sampling and Greedy MAP Inference of Constrained Determinantal Point Processes

Subset selection problems ask for a small, diverse yet representative subset of the given data. When pairwise similarities are captured by a kernel, the determinants of submatrices provide a measure of diversity or independence of items within a subset. Matroid theory gives another notion of independence, thus giving rise to optimization and sampling questions about Determinantal Point Processes (DPPs) under matroid constraints. Partition constraints, as a special case, arise naturally when incorporating additional labeling or clustering information, besides the kernel, in DPPs. Finding the maximum determinant submatrix under matroid constraints on its row/column indices has been previously studied. However, the corresponding question of sampling from DPPs under matroid constraints has been unresolved, beyond the simple cardinality constrained k-DPPs. We give the first polynomial time algorithm to sample exactly from DPPs under partition constraints, for any constant number of partitions. We complement this by a complexity theoretic barrier that rules out such a result under general matroid constraints. Our experiments indicate that partition-constrained DPPs offer more flexibility and more diversity than k-DPPs and their naive extensions, while being reasonably efficient in running time. We also show that a simple greedy initialization followed by local search gives improved approximation guarantees for the problem of MAP inference from k-DPPs on well-conditioned kernels. Our experiments show that this improvement is significant for larger values of k, supporting our theoretical result.
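As an illustration of determinant-based diversity, the sketch below implements a plain greedy heuristic for MAP inference in a k-DPP: at each step it adds the item that maximizes the determinant of the selected kernel submatrix. This is a generic textbook-style greedy, written here under the assumption of a small symmetric positive semidefinite kernel; it omits the local search step mentioned in the abstract and is not the paper's algorithm. All function names are hypothetical.

```python
def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0  # determinant of the empty matrix is 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def submatrix(kernel, idx):
    """Principal submatrix of the kernel indexed by idx."""
    return [[kernel[i][j] for j in idx] for i in idx]


def greedy_map(kernel, k):
    """Greedily pick k items, each time maximizing det of the chosen submatrix."""
    chosen = []
    for _ in range(k):
        candidates = [i for i in range(len(kernel)) if i not in chosen]
        best = max(candidates, key=lambda i: det(submatrix(kernel, chosen + [i])))
        chosen.append(best)
    return chosen


# Items 0 and 1 are nearly identical; item 2 is unrelated to both.
kernel = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 0.8],
]
selected = greedy_map(kernel, 2)
```

On this toy kernel the second pick avoids item 1, because pairing it with the near-duplicate item 0 gives a much smaller determinant than pairing item 0 with the unrelated item 2.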

preprint, 2015, arXiv

Embedding approximately low-dimensional $\ell_2^2$ metrics into $\ell_1$

Goemans showed that any $n$ points $x_1, \dotsc, x_n$ in $d$ dimensions satisfying $\ell_2^2$ triangle inequalities can be embedded into $\ell_{1}$, with worst-case distortion at most $\sqrt{d}$. We extend this to the case when the points are approximately low-dimensional, albeit with average distortion guarantees. More precisely, we give an $\ell_{2}^{2}$-to-$\ell_{1}$ embedding with average distortion at most the stable rank, $\mathrm{sr}(M)$, of the matrix $M$ consisting of columns $\{x_i-x_j\}_{i<j}$. Average distortion embedding suffices for applications such as the Sparsest Cut problem. Our embedding gives an approximation algorithm for the Sparsest Cut problem on low threshold-rank graphs, where earlier work was inspired by Lasserre SDP hierarchy, and improves on a previous result of the first and third author [Deshpande and Venkat, In Proc. 17th APPROX, 2014]. Our ideas give a new perspective on $\ell_{2}^{2}$ metric, an alternate proof of Goemans' theorem, and a simpler proof for average distortion $\sqrt{d}$. Furthermore, while the seminal result of Arora, Rao and Vazirani giving an $O(\sqrt{\log n})$ guarantee for Uniform Sparsest Cut can be seen to imply Goemans' theorem with average distortion, our work opens up the possibility of proving such a result directly via a Goemans-like theorem.
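For readers unfamiliar with the stable rank, the sketch below computes it using the standard definition $\mathrm{sr}(M) = \|M\|_F^2 / \|M\|_2^2$ for the matrix $M$ whose columns are the differences $x_i - x_j$, $i < j$. The abstract does not spell out the definition, so using the standard one is an assumption here. The spectral norm is estimated by power iteration from a fixed start vector, which is a simplification; the function names are hypothetical.

```python
import math


def difference_matrix(points):
    """Matrix (list of rows) whose columns are x_i - x_j for all i < j."""
    d = len(points[0])
    cols = [
        [points[i][t] - points[j][t] for t in range(d)]
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ]
    return [[col[t] for col in cols] for t in range(d)]


def stable_rank(M, iters=200):
    """Squared Frobenius norm divided by squared spectral norm.

    The spectral norm is approximated by power iteration on M^T M from a
    fixed start vector (assumed not orthogonal to the top singular vector).
    """
    rows, cols = len(M), len(M[0])
    fro_sq = sum(x * x for row in M for x in row)
    if fro_sq == 0.0:
        return 0.0
    v = [1.0 + 0.01 * j for j in range(cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        Mv = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(M[i][j] * Mv[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    spec_sq = sum(x * x for x in Mv)
    if spec_sq == 0.0:
        raise ValueError("start vector lies in the null space of M")
    return fro_sq / spec_sq


# Collinear points: every difference is a multiple of (1, 0).
collinear = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
sr_collinear = stable_rank(difference_matrix(collinear))
sr_identity = stable_rank([[1.0, 0.0], [0.0, 1.0]])
```

Collinear points give a rank-one difference matrix, so the stable rank is 1, while the 2-by-2 identity has stable rank 2. The stable rank never exceeds the rank, which is why it serves as a measure of "approximately low-dimensional".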

preprint, 2014, arXiv

Guruswami-Sinop Rounding without Higher Level Lasserre

Guruswami and Sinop give an $O(1/δ)$ approximation guarantee for the non-uniform Sparsest Cut problem by solving $O(r)$-level Lasserre semidefinite constraints, provided that the generalized eigenvalues of the Laplacians of the cost and demand graphs satisfy a certain spectral condition, namely, $λ_{r+1} \geq Φ^{*}/(1-δ)$. Their key idea is a rounding technique that first maps a vector-valued solution to $[0, 1]$ using appropriately scaled projections onto Lasserre vectors. In this paper, we show that similar projections and analysis can be obtained using only $\ell_{2}^{2}$ triangle inequality constraints. This results in an $O(r/δ^{2})$ approximation guarantee for the non-uniform Sparsest Cut problem by adding only $\ell_{2}^{2}$ triangle inequality constraints to the usual semidefinite program, provided that the same spectral condition, $λ_{r+1} \geq Φ^{*}/(1-δ)$, holds.

preprint, 2010, arXiv

Algorithms and Hardness for Subspace Approximation

The subspace approximation problem Subspace($k$,$p$) asks for a $k$-dimensional linear subspace that fits a given set of points optimally, where the error for fitting is a generalization of the least squares fit and uses the $\ell_{p}$ norm instead. Most of the previous work on subspace approximation has focused on small or constant $k$ and $p$, using coresets and sampling techniques from computational geometry. In this paper, extending another line of work based on convex relaxation and rounding, we give a polynomial time algorithm, for any $k$ and any $p \geq 2$, with the approximation guarantee roughly $γ_{p} \sqrt{2 - \frac{1}{n-k}}$, where $γ_{p}$ is the $p$-th moment of a standard normal random variable $N(0,1)$. We show that the convex relaxation we use has an integrality gap (or "rank gap") of $γ_{p} (1 - ε)$, for any constant $ε > 0$. Finally, we show that assuming the Unique Games Conjecture, the subspace approximation problem is hard to approximate within a factor better than $γ_{p} (1 - ε)$, for any constant $ε > 0$.

preprint, 2010, arXiv

Efficient volume sampling for row/column subset selection

We give efficient algorithms for volume sampling, i.e., for picking $k$-subsets of the rows of any given matrix with probabilities proportional to the squared volumes of the simplices defined by them and the origin (or the squared volumes of the parallelepipeds defined by these subsets of rows). This solves an open problem from the monograph on spectral algorithms by Kannan and Vempala. Our first algorithm for volume sampling $k$-subsets of rows from an $m$-by-$n$ matrix runs in $O(kmn^ω \log n)$ arithmetic operations and a second variant of it for $(1+ε)$-approximate volume sampling runs in $O(mn \log m \cdot k^{2}/ε^{2} + m \log^ω m \cdot k^{2ω+1}/ε^{2ω} \cdot \log(k ε^{-1} \log m))$ arithmetic operations, which is almost linear in the size of the input (i.e., the number of entries) for small $k$. Our efficient volume sampling algorithms imply several interesting results for low-rank matrix approximation.
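The definition of volume sampling can be illustrated with a brute-force sketch that enumerates every $k$-subset of rows and weights it by the squared volume of the parallelepiped it spans, computed as the determinant of the Gram matrix of the chosen rows. This enumeration takes time exponential in $k$ and is emphatically not the efficient algorithm the abstract describes; it only shows the target distribution on a tiny input. The function names are hypothetical.

```python
import itertools
import random


def det(m):
    """Determinant by Laplace expansion; adequate only for very small k."""
    if not m:
        return 1.0
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def gram(rows):
    """Gram matrix of the given rows; its determinant is the squared volume."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def volume_sampling_distribution(A, k):
    """Exact probabilities of all k-subsets of rows under volume sampling."""
    weights = {
        S: det(gram([A[i] for i in S]))
        for S in itertools.combinations(range(len(A)), k)
    }
    total = sum(weights.values())
    if total <= 0.0:
        raise ValueError("matrix has rank less than k")
    return {S: w / total for S, w in weights.items()}


def volume_sample(A, k, rng=random):
    """Draw one k-subset of row indices from the brute-force distribution."""
    dist = volume_sampling_distribution(A, k)
    subsets = list(dist)
    return rng.choices(subsets, weights=[dist[S] for S in subsets])[0]


# Rows 0 and 2 are identical, so the pair (0, 2) spans zero volume.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
dist = volume_sampling_distribution(A, 2)
sample = volume_sample(A, 2, random.Random(0))
```

On this toy matrix the degenerate pair of identical rows gets probability zero, and the two pairs that span the unit square split the probability mass equally.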
