Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
20works
0followers
19topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

20 published item(s)

preprint2022arXiv

A Convergent and Dimension-Independent Min-Max Optimization Algorithm

We study a variant of a recently introduced min-max optimization framework where the max-player is constrained to update its parameters in a greedy manner until it reaches a first-order stationary point. Our equilibrium definition for this framework depends on a proposal distribution which the min-player uses to choose directions in which to update its parameters. We show that, given a smooth and bounded nonconvex-nonconcave objective function, access to any proposal distribution for the min-player's updates, and stochastic gradient oracle for the max-player, our algorithm converges to the aforementioned approximate local equilibrium in a number of iterations that does not depend on the dimension. The equilibrium point found by our algorithm depends on the proposal distribution, and when applying our algorithm to train GANs we choose the proposal distribution to be a distribution of stochastic gradients. We empirically evaluate our algorithm on challenging nonconvex-nonconcave test-functions and loss functions arising in GAN training. Our algorithm converges on these test functions and, when used to train GANs, trains stably on synthetic and real-world datasets and avoids mode collapse

preprint2022arXiv

Fairness for AUC via Feature Augmentation

We study fairness in the context of classification where the performance is measured by the area under the curve (AUC) of the receiver operating characteristic. AUC is commonly used to measure the performance of prediction models. The same classifier can have significantly varying AUCs for different protected groups and, in real-world applications, it is often desirable to reduce such cross-group differences. We address the problem of how to acquire additional features to most greatly improve AUC for the disadvantaged group. We develop a novel approach, fairAUC, based on feature augmentation (adding features) to mitigate bias between identifiable groups. The approach requires only a few summary statistics to offer provable guarantees on AUC improvement, and allows managers flexibility in determining where in the fairness-accuracy tradeoff they would like to be. We evaluate fairAUC on synthetic and real-world datasets and find that it significantly improves AUC for the disadvantaged group relative to benchmarks maximizing overall AUC and minimizing bias between groups.

preprint2022arXiv

Private Matrix Approximation and Geometry of Unitary Orbits

Consider the following optimization problem: Given $n \times n$ matrices $A$ and $Λ$, maximize $\langle A, UΛU^*\rangle$ where $U$ varies over the unitary group $\mathrm{U}(n)$. This problem seeks to approximate $A$ by a matrix whose spectrum is the same as $Λ$ and, by setting $Λ$ to be appropriate diagonal matrices, one can recover matrix approximation problems such as PCA and rank-$k$ approximation. We study the problem of designing differentially private algorithms for this optimization problem in settings where the matrix $A$ is constructed using users' private data. We give efficient and private algorithms that come with upper and lower bounds on the approximation error. Our results unify and improve upon several prior works on private matrix approximation problems. They rely on extensions of packing/covering number bounds for Grassmannians to unitary orbits which should be of independent interest.

preprint2022arXiv

Selection in the Presence of Implicit Bias: The Advantage of Intersectional Constraints

In selection processes such as hiring, promotion, and college admissions, implicit bias toward socially-salient attributes such as race, gender, or sexual orientation of candidates is known to produce persistent inequality and reduce aggregate utility for the decision maker. Interventions such as the Rooney Rule and its generalizations, which require the decision maker to select at least a specified number of individuals from each affected group, have been proposed to mitigate the adverse effects of implicit bias in selection. Recent works have established that such lower-bound constraints can be very effective in improving aggregate utility in the case when each individual belongs to at most one affected group. However, in several settings, individuals may belong to multiple affected groups and, consequently, face more extreme implicit bias due to this intersectionality. We consider independently drawn utilities and show that, in the intersectional case, the aforementioned non-intersectional constraints can only recover part of the total utility achievable in the absence of implicit bias. On the other hand, we show that if one includes appropriate lower-bound constraints on the intersections, almost all the utility achievable in the absence of implicit bias can be recovered. Thus, intersectional constraints can offer a significant advantage over a reductionist dimension-by-dimension non-intersectional approach to reducing inequality.

preprint2021arXiv

Fair Classification with Noisy Protected Attributes: A Framework with Provable Guarantees

We present an optimization framework for learning a fair classifier in the presence of noisy perturbations in the protected attributes. Compared to prior work, our framework can be employed with a very general class of linear and linear-fractional fairness constraints, can handle multiple, non-binary protected attributes, and outputs a classifier that comes with provable guarantees on both accuracy and fairness. Empirically, we show that our framework can be used to attain either statistical rate or false positive rate fairness guarantees with a minimal loss in accuracy, even when the noise is large, in two real-world datasets.

preprint2020arXiv

Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees

Developing classification algorithms that are fair with respect to sensitive attributes of the data has become an important problem due to the growing deployment of classification algorithms in various social contexts. Several recent works have focused on fairness with respect to a specific metric, modeled the corresponding fair classification problem as a constrained optimization problem, and developed tailored algorithms to solve them. Despite this, there still remain important metrics for which we do not have fair classifiers and many of the aforementioned algorithms do not come with theoretical guarantees; perhaps because the resulting optimization problem is non-convex. The main contribution of this paper is a new meta-algorithm for classification that takes as input a large class of fairness constraints, with respect to multiple non-disjoint sensitive attributes, and which comes with provable guarantees. This is achieved by first developing a meta-algorithm for a large family of classification problems with convex constraints, and then showing that classification problems with general types of fairness constraints can be reduced to those in this family. We present empirical results that show that our algorithm can achieve near-perfect fairness with respect to various fairness metrics, and that the loss in accuracy due to the imposed fairness constraints is often small. Overall, this work unifies several prior works on fair classification, presents a practical algorithm with theoretical guarantees, and can handle fairness metrics that were previously not possible.

preprint2020arXiv

Coresets for Clustering in Euclidean Spaces: Importance Sampling is Nearly Optimal

Given a collection of $n$ points in $\mathbb{R}^d$, the goal of the $(k,z)$-clustering problem is to find a subset of $k$ "centers" that minimizes the sum of the $z$-th powers of the Euclidean distance of each point to the closest center. Special cases of the $(k,z)$-clustering problem include the $k$-median and $k$-means problems. Our main result is a unified two-stage importance sampling framework that constructs an $\varepsilon$-coreset for the $(k,z)$-clustering problem. Compared to the results for $(k,z)$-clustering in [Feldman and Langberg, STOC 2011], our framework saves a $\varepsilon^2 d$ factor in the coreset size. Compared to the results for $(k,z)$-clustering in [Sohler and Woodruff, FOCS 2018], our framework saves a $\operatorname{poly}(k)$ factor in the coreset size and avoids the $\exp(k/\varepsilon)$ term in the construction time. Specifically, our coreset for $k$-median ($z=1$) has size $\tilde{O}(\varepsilon^{-4} k)$ which, when compared to the result in [Sohler and Woodruff, STOC 2018], saves a $k$ factor in the coreset size. Our algorithmic results rely on a new dimensionality reduction technique that connects two well-known shape fitting problems: subspace approximation and clustering, and may be of independent interest. We also provide a size lower bound of $Ω\left(k\cdot \min \left\{2^{z/20},d \right\}\right)$ for a $0.01$-coreset for $(k,z)$-clustering, which has a linear dependence of size on $k$ and an exponential dependence on $z$ that matches our algorithmic results.

preprint2020arXiv

Data preprocessing to mitigate bias: A maximum entropy based approach

Data containing human or social attributes may over- or under-represent groups with respect to salient social attributes such as gender or race, which can lead to biases in downstream applications. This paper presents an algorithmic framework that can be used as a data preprocessing method towards mitigating such bias. Unlike prior work, it can efficiently learn distributions over large domains, controllably adjust the representation rates of protected groups and achieve target fairness metrics such as statistical parity, yet remains close to the empirical distribution induced by the given dataset. Our approach leverages the principle of maximum entropy - amongst all distributions satisfying a given set of constraints, we should choose the one closest in KL-divergence to a given prior. While maximum entropy distributions can succinctly encode distributions over large domains, they can be difficult to compute. Our main contribution is an instantiation of this framework for our set of constraints and priors, which encode our bias mitigation goals, and that runs in time polynomial in the dimension of the data. Empirically, we observe that samples from the learned distribution have desired representation rates and statistical rates, and when used for training a classifier incurs only a slight loss in accuracy while maintaining fairness properties.

preprint2020arXiv

Interventions for Ranking in the Presence of Implicit Bias

Implicit bias is the unconscious attribution of particular qualities (or lack thereof) to a member from a particular social group (e.g., defined by gender or race). Studies on implicit bias have shown that these unconscious stereotypes can have adverse outcomes in various social contexts, such as job screening, teaching, or policing. Recently, (Kleinberg and Raghavan, 2018) considered a mathematical model for implicit bias and showed the effectiveness of the Rooney Rule as a constraint to improve the utility of the outcome for certain cases of the subset selection problem. Here we study the problem of designing interventions for the generalization of subset selection -- ranking -- that requires to output an ordered set and is a central primitive in various social and computational contexts. We present a family of simple and interpretable constraints and show that they can optimally mitigate implicit bias for a generalization of the model studied in (Kleinberg and Raghavan, 2018). Subsequently, we prove that under natural distributional assumptions on the utilities of items, simple, Rooney Rule-like, constraints can also surprisingly recover almost all the utility lost due to implicit biases. Finally, we augment our theoretical results with empirical findings on real-world distributions from the IIT-JEE (2009) dataset and the Semantic Scholar Research corpus.

preprint2020arXiv

On the computability of continuous maximum entropy distributions with applications

We initiate a study of the following problem: Given a continuous domain $Ω$ along with its convex hull $\mathcal{K}$, a point $A \in \mathcal{K}$ and a prior measure $μ$ on $Ω$, find the probability density over $Ω$ whose marginal is $A$ and that minimizes the KL-divergence to $μ$. This framework gives rise to several extremal distributions that arise in mathematics, quantum mechanics, statistics, and theoretical computer science. Our technical contributions include a polynomial bound on the norm of the optimizer of the dual problem that holds in a very general setting and relies on a "balance" property of the measure $μ$ on $Ω$, and exact algorithms for evaluating the dual and its gradient for several interesting settings of $Ω$ and $μ$. Together, along with the ellipsoid method, these results imply polynomial-time algorithms to compute such KL-divergence minimizing distributions in several cases. Applications of our results include: 1) an optimization characterization of the Goemans-Williamson measure that is used to round a positive semidefinite matrix to a vector, 2) the computability of the entropic barrier for polytopes studied by Bubeck and Eldan, and 3) a polynomial-time algorithm to compute the barycentric quantum entropy of a density matrix that was proposed as an alternative to von Neumann entropy in the 1970s: this corresponds to the case when $Ω$ is the set of rank one projections matrices and $μ$ corresponds to the Haar measure on the unit sphere. Our techniques generalize to the setting of Hermitian rank $k$ projections using the Harish-Chandra-Itzykson-Zuber formula, and are applicable even beyond, to adjoint orbits of compact Lie groups.

preprint2020arXiv

Stable and Fair Classification

Fair classification has been a topic of intense study in machine learning, and several algorithms have been proposed towards this important task. However, in a recent study, Friedler et al. observed that fair classification algorithms may not be stable with respect to variations in the training dataset -- a crucial consideration in several real-world applications. Motivated by their work, we study the problem of designing classification algorithms that are both fair and stable. We propose an extended framework based on fair classification algorithms that are formulated as optimization problems, by introducing a stability-focused regularization term. Theoretically, we prove a stability guarantee, that was lacking in fair classification algorithms, and also provide an accuracy guarantee for our extended framework. Our accuracy guarantee can be used to inform the selection of the regularization parameter in our framework. To the best of our knowledge, this is the first work that combines stability and fairness in automated decision-making tasks. We assess the benefits of our approach empirically by extending several fair classification algorithms that are shown to achieve the best balance between fairness and accuracy over the Adult dataset. Our empirical results show that our framework indeed improves the stability at only a slight sacrifice in accuracy.

preprint2014arXiv

A quadratically tight partition bound for classical communication complexity and query complexity

In this work we introduce, both for classical communication complexity and query complexity, a modification of the 'partition bound' introduced by Jain and Klauck [2010]. We call it the 'public-coin partition bound'. We show that (the logarithm to the base two of) its communication complexity and query complexity versions form, for all relations, a quadratically tight lower bound on the public-coin randomized communication complexity and randomized query complexity respectively.

preprint2013arXiv

Entropy, Optimization and Counting

In this paper we study the problem of computing max-entropy distributions over a discrete set of objects subject to observed marginals. Interest in such distributions arises due to their applicability in areas such as statistical physics, economics, biology, information theory, machine learning, combinatorics and, more recently, approximation algorithms. A key difficulty in computing max-entropy distributions has been to show that they have polynomially-sized descriptions. We show that such descriptions exist under general conditions. Subsequently, we show how algorithms for (approximately) counting the underlying discrete set can be translated into efficient algorithms to (approximately) compute max-entropy distributions. In the reverse direction, we show how access to algorithms that compute max-entropy distributions can be used to count, which establishes an equivalence between counting and computing max-entropy distributions.

preprint2013arXiv

The Unique Games Conjecture, Integrality Gap for Cut Problems and Embeddability of Negative Type Metrics into $\ell_1$

In this paper, we disprove a conjecture of Goemans and Linial; namely, that every negative type metric embeds into $\ell_1$ with constant distortion. We show that for an arbitrarily small constant $δ> 0$, for all large enough $n$, there is an $n$-point negative type metric which requires distortion at least $(\log\log n)^{1/6-δ}$ to embed into $\ell_1.$ Surprisingly, our construction is inspired by the Unique Games Conjecture (UGC) of Khot, establishing a previously unsuspected connection between probabilistically checkable proof systems (PCPs) and the theory of metric embeddings. We first prove that the UGC implies a super-constant hardness result for the (non-uniform) Sparsest Cut problem. Though this hardness result relies on the UGC, we demonstrate, nevertheless, that the corresponding PCP reduction can be used to construct an "integrality gap instance" for Sparsest Cut. Towards this, we first construct an integrality gap instance for a natural SDP relaxation of Unique Games. Then we "simulate" the PCP reduction and "translate" the integrality gap instance of Unique Games to an integrality gap instance of Sparsest Cut. This enables us to prove a $(\log \log n)^{1/6-δ}$ integrality gap for Sparsest Cut, which is known to be equivalent to the metric embedding lower bound.

preprint2012arXiv

A Finite Population Model of Molecular Evolution: Theory and Computation

This paper is concerned with the evolution of haploid organisms that reproduce asexually. In a seminal piece of work, Eigen and coauthors proposed the quasispecies model in an attempt to understand such an evolutionary process. Their work has impacted antiviral treatment and vaccine design strategies. Yet, predictions of the quasispecies model are at best viewed as a guideline, primarily because it assumes an infinite population size, whereas realistic population sizes can be quite small. In this paper we consider a population genetics-based model aimed at understanding the evolution of such organisms with finite population sizes and present a rigorous study of the convergence and computational issues that arise therein. Our first result is structural and shows that, at any time during the evolution, as the population size tends to infinity, the distribution of genomes predicted by our model converges to that predicted by the quasispecies model. This justifies the continued use of the quasispecies model to derive guidelines for intervention. While the stationary state in the quasispecies model is readily obtained, due to the explosion of the state space in our model, exact computations are prohibitive. Our second set of results are computational in nature and address this issue. We derive conditions on the parameters of evolution under which our stochastic model mixes rapidly. Further, for a class of widely used fitness landscapes we give a fast deterministic algorithm which computes the stationary distribution of our model. These computational tools are expected to serve as a framework for the modeling of strategies for the deployment of mutagenic drugs.

preprint2011arXiv

$2^{\log^{1-\eps} n}$ Hardness for Closest Vector Problem with Preprocessing

We prove that for an arbitrarily small constant $\eps>0,$ assuming NP$\not \subseteq$DTIME$(2^{{\log^{O(1/\eps)} n}})$, the preprocessing versions of the closest vector problem and the nearest codeword problem are hard to approximate within a factor better than $2^{\log ^{1-\eps}n}.$ This improves upon the previous hardness factor of $(\log n)^δ$ for some $δ> 0$ due to \cite{AKKV05}.

preprint2011arXiv

A Local Spectral Method for Graphs: with Applications to Improving Graph Partitions and Exploring Data Graphs Locally

The second eigenvalue of the Laplacian matrix and its associated eigenvector are fundamental features of an undirected graph, and as such they have found widespread use in scientific computing, machine learning, and data analysis. In many applications, however, graphs that arise have several \emph{local} regions of interest, and the second eigenvector will typically fail to provide information fine-tuned to each local region. In this paper, we introduce a locally-biased analogue of the second eigenvector, and we demonstrate its usefulness at highlighting local properties of data graphs in a semi-supervised manner. To do so, we first view the second eigenvector as the solution to a constrained optimization problem, and we incorporate the local information as an additional constraint; we then characterize the optimal solution to this new problem and show that it can be interpreted as a generalization of a Personalized PageRank vector; and finally, as a consequence, we show that the solution can be computed in nearly-linear time. In addition, we show that this locally-biased vector can be used to compute an approximation to the best partition \emph{near} an input seed set in a manner analogous to the way in which the second eigenvector of the Laplacian can be used to obtain an approximation to the best partition in the entire input graph. Such a primitive is useful for identifying and refining clusters locally, as it allows us to focus on a local region of interest in a semi-supervised manner. Finally, we provide a detailed empirical evaluation of our method by showing how it can applied to finding locally-biased sparse cuts around an input vertex seed set in social and information networks.

preprint2011arXiv

Approximating the Exponential, the Lanczos Method and an \tilde{O}(m)-Time Spectral Algorithm for Balanced Separator

We give a novel spectral approximation algorithm for the balanced separator problem that, given a graph G, a constant balance b \in (0,1/2], and a parameter γ, either finds an Ω(b)-balanced cut of conductance O(\sqrt(γ)) in G, or outputs a certificate that all b-balanced cuts in G have conductance at least γ, and runs in time \tilde{O}(m). This settles the question of designing asymptotically optimal spectral algorithms for balanced separator. Our algorithm relies on a variant of the heat kernel random walk and requires, as a subroutine, an algorithm to compute \exp(-L)v where L is the Laplacian of a graph related to G and v is a vector. Algorithms for computing the matrix-exponential-vector product efficiently comprise our next set of results. Our main result here is a new algorithm which computes a good approximation to \exp(-A)v for a class of PSD matrices A and a given vector u, in time roughly \tilde{O}(m_A), where m_A is the number of non-zero entries of A. This uses, in a non-trivial way, the result of Spielman and Teng on inverting SDD matrices in \tilde{O}(m_A) time. Finally, we prove e^{-x} can be uniformly approximated up to a small additive error, in a non-negative interval [a,b] with a polynomial of degree roughly \sqrt{b-a}. While this result is of independent interest in approximation theory, we show that, via the Lanczos method from numerical analysis, it yields a simple algorithm to compute \exp(-A)v for PSD matrices that runs in time roughly O(t_A \sqrt{||A||}), where t_A is the time required for computation of the vector Aw for given vector w. As an application, we obtain a simple and practical algorithm, with output conductance O(\sqrt(γ)), for balanced separator that runs in time \tilde{O}(m/\sqrt(γ)). This latter algorithm matches the running time, but improves on the approximation guarantee of the algorithm by Andersen and Peres.

preprint2010arXiv

Algorithms and Hardness for Subspace Approximation

The subspace approximation problem Subspace($k$,$p$) asks for a $k$-dimensional linear subspace that fits a given set of points optimally, where the error for fitting is a generalization of the least squares fit and uses the $\ell_{p}$ norm instead. Most of the previous work on subspace approximation has focused on small or constant $k$ and $p$, using coresets and sampling techniques from computational geometry. In this paper, extending another line of work based on convex relaxation and rounding, we give a polynomial time algorithm, \emph{for any $k$ and any $p \geq 2$}, with the approximation guarantee roughly $γ_{p} \sqrt{2 - \frac{1}{n-k}}$, where $γ_{p}$ is the $p$-th moment of a standard normal random variable N(0,1). We show that the convex relaxation we use has an integrality gap (or "rank gap") of $γ_{p} (1 - ε)$, for any constant $ε> 0$. Finally, we show that assuming the Unique Games Conjecture, the subspace approximation problem is hard to approximate within a factor better than $γ_{p} (1 - ε)$, for any constant $ε> 0$.

preprint2010arXiv

Towards an SDP-based Approach to Spectral Methods: A Nearly-Linear-Time Algorithm for Graph Partitioning and Decomposition

In this paper, we consider the following graph partitioning problem: The input is an undirected graph $G=(V,E),$ a balance parameter $b \in (0,1/2]$ and a target conductance value $γ\in (0,1).$ The output is a cut which, if non-empty, is of conductance at most $O(f),$ for some function $f(G, γ),$ and which is either balanced or well correlated with all cuts of conductance at most $γ.$ Spielman and Teng gave an $\tilde{O}(|E|/γ^{2})$-time algorithm for $f= \sqrt{γ\log^{3}|V|}$ and used it to decompose graphs into a collection of near-expanders. We present a new spectral algorithm for this problem which runs in time $\tilde{O}(|E|/γ)$ for $f=\sqrtγ.$ Our result yields the first nearly-linear time algorithm for the classic Balanced Separator problem that achieves the asymptotically optimal approximation guarantee for spectral methods. Our method has the advantage of being conceptually simple and relies on a primal-dual semidefinite-programming SDP approach. We first consider a natural SDP relaxation for the Balanced Separator problem. While it is easy to obtain from this SDP a certificate of the fact that the graph has no balanced cut of conductance less than $γ,$ somewhat surprisingly, we can obtain a certificate for the stronger correlation condition. This is achieved via a novel separation oracle for our SDP and by appealing to Arora and Kale's framework to bound the running time. Our result contains technical ingredients that may be of independent interest.