Source author record

Anand Louis

Anand Louis appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Data Structures and Algorithms, Machine Learning, Computational Complexity, cs.CY, Discrete Mathematics, Artificial Intelligence, Information Retrieval

Catalog footprint

What is connected

18 works

7 topics

4 close collaborators

Preprint, 2022, arXiv

$λ_\infty$ & Maximum Variance Embedding: Measuring and Optimizing Connectivity of A Graph Metric

Bobkov, Houdré, and the last author [2000] introduced a Poincaré-type functional parameter, $λ_\infty$, of a graph and related it to the connectivity of the graph via Cheeger-type inequalities. A work by the second author, Raghavendra, and Vempala [2013] related the complexity of $λ_\infty$ to the so-called small-set expansion (SSE) problem and further set forth the desiderata for NP-hardness of this optimization problem. We confirm the conjecture that computing $λ_\infty$ is NP-hard for weighted trees. Beyond measuring connectivity, in many applications we want to optimize it. Via convex duality, this leads to a problem in machine learning known as Maximum Variance Embedding (MVE). The output is a function from the vertices to a low-dimensional Euclidean space, subject to bounds on the Euclidean distances between neighbors, and the objective is to maximize the variance of the output. The special cases of MVE into $n$ dimensions and into $1$ dimension yield the absolute algebraic connectivity [1990] and the spread constant [1998], which measure the connectivity of the graph and of its Cartesian $n$-th power, respectively. MVE has further applications in measuring the diffusion speed and robustness of networks, in clustering, and in dimension reduction. We show that computing MVE in tree-width dimensions is NP-hard, while adding just one dimension beyond the width of a given tree decomposition places the problem in P. We show that MVE of a tree in 2 dimensions has a non-convex yet benign optimization landscape, i.e., every local optimum is a global optimum, and we develop a linear-time combinatorial algorithm for this case. Finally, we show that approximate MVE is tractable in significantly fewer dimensions: for trees and general graphs, for which MVE cannot be solved in fewer than $2$ and $Ω(n)$ dimensions respectively, we provide $1+\varepsilon$ approximation algorithms for embedding into $1$ and $O(\log n /\varepsilon^2)$ dimensions, respectively.
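
To make the MVE objective concrete, the sketch below only evaluates a candidate embedding; it does not solve MVE. It assumes that "variance" means the mean squared distance of the embedded points to their centroid and that each edge carries its own distance bound; the helper names `is_feasible` and `embedding_variance` are hypothetical.

```python
import math

def is_feasible(embedding, edges, tol=1e-9):
    """Check ||f(u) - f(v)|| <= bound for every edge (u, v, bound)."""
    return all(math.dist(embedding[u], embedding[v]) <= bound + tol
               for u, v, bound in edges)

def embedding_variance(embedding):
    """Mean squared Euclidean distance of the embedded points to their centroid."""
    points = list(embedding.values())
    n, dim = len(points), len(points[0])
    centroid = [sum(p[i] for p in points) / n for i in range(dim)]
    return sum(math.dist(p, centroid) ** 2 for p in points) / n

# Path a-b-c with unit edge-length bounds, embedded into 1 dimension.
edges = [("a", "b", 1.0), ("b", "c", 1.0)]
straight = {"a": (0.0,), "b": (1.0,), "c": (2.0,)}
folded = {"a": (0.0,), "b": (1.0,), "c": (0.0,)}
for name, emb in [("straight", straight), ("folded", folded)]:
    print(name, is_feasible(emb, edges), round(embedding_variance(emb), 3))
```

Both embeddings respect the edge bounds, but the straight one has variance 2/3 against 2/9 for the folded one: MVE rewards spreading the graph out as far as the bounds allow.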

Preprint, 2022, arXiv

Approximating CSPs with Outliers

Constraint satisfaction problems (CSPs) are ubiquitous in theoretical computer science. We study the problem of StrongCSPs, i.e., instances in which a large induced sub-instance has a satisfying assignment. More formally, given a CSP instance $Ψ(V, E, [k], \{Π_{ij}\}_{(i,j) \in E})$ consisting of a set of vertices $V$, a set of edges $E$, an alphabet $[k]$, and a constraint $Π_{ij} \subset [k] \times [k]$ for each $(i,j) \in E$, the goal is to compute the largest subset $S \subseteq V$ such that the instance induced on $S$ has an assignment satisfying all the constraints. In this paper, we study approximation algorithms for Unique Games and related problems under the StrongCSP framework when the underlying constraint graph satisfies mild expansion properties. In particular, we show that given a Strong Unique Games instance whose optimal solution $S^*$ is supported on a regular low-threshold-rank graph, there exists an algorithm that runs in time exponential in the threshold rank and recovers a large satisfiable sub-instance whose size is independent of the label set size and the maximum degree of the graph. Our algorithm combines the techniques of Barak-Raghavendra-Steurer (FOCS'11) and Guruswami-Sinop (FOCS'11) with several new ideas, and its running time is exponential in the threshold rank of the optimal set. A key component of our algorithm is a new threshold-rank-based spectral decomposition, which is used to compute a "large" induced subgraph of "small" threshold rank; our techniques build on the work of Oveis Gharan and Rezaei (SODA'17) and could be of independent interest.
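
The StrongCSP objective can be illustrated by exhaustive search on a tiny instance. This is not the paper's algorithm (which runs in time exponential in the threshold rank); it is a brute-force sketch, assuming constraints are given as a dict mapping each edge `(i, j)` to its set of allowed label pairs. The function names are hypothetical.

```python
from itertools import combinations, product

def induced_satisfiable(S, k, constraints):
    """True if some labeling of S with labels 0..k-1 satisfies every constraint inside S."""
    S = list(S)
    members = set(S)
    inside = [(i, j, allowed) for (i, j), allowed in constraints.items()
              if i in members and j in members]
    for labels in product(range(k), repeat=len(S)):
        assign = dict(zip(S, labels))
        if all((assign[i], assign[j]) in allowed for i, j, allowed in inside):
            return True
    return False

def largest_satisfiable_subset(vertices, k, constraints):
    """Exhaustive search (exponential time): the largest S whose induced sub-instance is satisfiable."""
    for size in range(len(vertices), 0, -1):
        for S in combinations(vertices, size):
            if induced_satisfiable(S, k, constraints):
                return set(S)
    return set()

# Triangle with "labels must differ" constraints over k = 2 labels (a Unique Games
# instance whose permutations are swaps): the full instance is an odd cycle, hence unsatisfiable.
NEQ = {(0, 1), (1, 0)}
constraints = {(0, 1): NEQ, (1, 2): NEQ, (0, 2): NEQ}
print(largest_satisfiable_subset([0, 1, 2], 2, constraints))
```

Dropping any one vertex breaks the odd cycle, so the largest satisfiable induced sub-instance has two vertices.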

Preprint, 2022, arXiv

Exact recovery algorithm for Planted Bipartite Graph in Semi-random Graphs

The problem of finding the largest induced balanced bipartite subgraph in a given graph is NP-hard. This problem is closely related to the problem of finding the smallest Odd Cycle Transversal. In this work, we consider the following model of instances: starting with a set of vertices $V$, a set $S \subseteq V$ of $k$ vertices is chosen and an arbitrary $d$-regular bipartite graph is added on it; edges between pairs of vertices in $S \times (V \setminus S)$ and $(V \setminus S) \times (V \setminus S)$ are added with probability $p$. Since for $d=0$ the problem reduces to recovering a planted independent set, we do not expect efficient algorithms for $k=o(\sqrt{n})$. This problem generalizes the planted balanced biclique problem, in which the bipartite graph induced on $S$ is a complete bipartite graph; [Lev18] gave an algorithm for recovering $S$ in that problem when $k=Ω(\sqrt{n})$. Our main result is an efficient algorithm that recovers (w.h.p.) the planted bipartite graph when $k=Ω_p(\sqrt{n \log n})$ for a large range of parameters. Our results also hold for a natural semi-random model of instances, which involves the presence of a monotone adversary. Our proof shows that a natural SDP relaxation for the problem is integral by constructing an appropriate solution to its dual formulation. Our main technical contribution is a new approach for constructing the dual solution, in which we calibrate the eigenvectors of the adjacency matrix to be the eigenvectors of the dual matrix. We believe that this approach may have applications to other recovery problems in semi-random models as well. When $k=Ω(\sqrt{n})$, we give an algorithm for recovering $S$ whose running time is exponential in the number of small eigenvalues of the graph induced on $S$; this algorithm is based on the subspace enumeration techniques of [KT07, ABS10, Kol11].
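
A generator for the random part of this model makes the setup concrete. This is a minimal sketch under stated assumptions: $k$ is even, $S$ is split into two equal halves, and the planted $d$-regular bipartite graph is the circulant one below (the model allows any $d$-regular bipartite graph). The monotone adversary of the semi-random variant is not modeled, and `planted_bipartite_instance` is a hypothetical name.

```python
import random

def planted_bipartite_instance(n, k, d, p, seed=None):
    """Sample the random (non-adversarial) part of the planted bipartite model."""
    assert k % 2 == 0 and 0 <= d <= k // 2, "need even k and d <= k/2 for this construction"
    rng = random.Random(seed)
    S = rng.sample(range(n), k)
    half = k // 2
    left, right = S[:half], S[half:]
    edges = set()
    # One arbitrary d-regular bipartite graph on S: left[i] -- right[(i + j) % half], j < d.
    for i in range(half):
        for j in range(d):
            u, v = left[i], right[(i + j) % half]
            edges.add((min(u, v), max(u, v)))
    members = set(S)
    # Every pair with at least one endpoint outside S gets an edge with probability p.
    for u in range(n):
        for v in range(u + 1, n):
            if (u not in members or v not in members) and rng.random() < p:
                edges.add((u, v))
    return members, left, right, edges

S, left, right, edges = planted_bipartite_instance(n=40, k=10, d=3, p=0.1, seed=1)
```

No random edges fall inside $S$, so the graph induced on $S$ is exactly the planted bipartite graph, with every vertex of $S$ having degree $d$ there.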

Preprint, 2022, arXiv

Socially Fair Center-based and Linear Subspace Clustering

Center-based clustering (e.g., $k$-means, $k$-medians) and clustering using linear subspaces are two of the most popular techniques for partitioning real-world data into smaller clusters. However, when the data consists of sensitive demographic groups, significantly different clustering costs per point for different sensitive groups can lead to fairness-related harms (e.g., different quality of service). The goal of socially fair clustering is to minimize the maximum, over all groups, of the clustering cost per point. In this work, we propose a unified framework for solving socially fair center-based clustering and linear subspace clustering, and we give practical, efficient approximation algorithms for these problems. Extensive experiments on multiple benchmark datasets show that our algorithms either closely match or outperform state-of-the-art baselines.
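
The socially fair objective can be evaluated directly for a fixed set of centers. A minimal sketch, assuming the $k$-means cost (squared Euclidean distance to the nearest center) averaged within each group; it only scores a solution and is not the paper's approximation algorithm. `socially_fair_cost` is a hypothetical name.

```python
import math

def socially_fair_cost(points, groups, centers):
    """Max over groups of the average k-means cost (squared distance to nearest center) per point."""
    totals, counts = {}, {}
    for x, g in zip(points, groups):
        cost = min(math.dist(x, c) ** 2 for c in centers)
        totals[g] = totals.get(g, 0.0) + cost
        counts[g] = counts.get(g, 0) + 1
    return max(totals[g] / counts[g] for g in totals)

points = [(0, 0), (0, 2), (10, 0), (10, 4)]
groups = ["A", "A", "B", "B"]
print(socially_fair_cost(points, groups, [(0, 1), (10, 2)]))
print(socially_fair_cost(points, groups, [(0, 1), (10, 0)]))
```

The second set of centers serves group B much worse (average cost 8 against 4), which the max-over-groups objective penalizes even though group A is unaffected. Swapping the squared distance for the plain distance would give the $k$-medians variant.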

Preprint, 2021, arXiv

Group Fairness for Knapsack Problems

We study the knapsack problem with group fairness constraints. The input consists of a knapsack of bounded capacity and a set of items, each of which belongs to a particular category and has an associated weight and value. The goal is to select a subset of items such that all categories are fairly represented, the total weight of the selected items does not exceed the capacity of the knapsack, and the total value is maximized. We study fairness parameters such as bounds on the total value of items from each category, the total weight of items from each category, and the total number of items from each category, and we give approximation algorithms for these problems. These fairness notions can also be extended to the min-knapsack problem. Fair knapsack problems encompass various important problems, such as participatory budgeting, fair budget allocation, and advertising.
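
For the count-based fairness notion, an exact brute force is a useful reference point on tiny inputs. This sketch is not the paper's approximation algorithm; it assumes fairness is expressed as lower and upper bounds on the number of chosen items per category, and `fair_knapsack_bruteforce` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

def fair_knapsack_bruteforce(items, capacity, count_bounds):
    """Exhaustive search over all subsets (exponential; tiny inputs only).

    items: list of (category, weight, value) tuples.
    count_bounds: category -> (lo, hi), bounds on how many chosen items
    may come from that category.
    """
    best_value, best_set = None, None
    for r in range(len(items) + 1):
        for chosen in combinations(range(len(items)), r):
            if sum(items[i][1] for i in chosen) > capacity:
                continue
            counts = Counter(items[i][0] for i in chosen)
            if any(not lo <= counts[c] <= hi for c, (lo, hi) in count_bounds.items()):
                continue
            value = sum(items[i][2] for i in chosen)
            if best_value is None or value > best_value:
                best_value, best_set = value, set(chosen)
    return best_value, best_set

items = [("A", 4, 10), ("A", 3, 9), ("B", 3, 4), ("B", 2, 3)]
bounds = {"A": (1, 1), "B": (1, 2)}
print(fair_knapsack_bruteforce(items, 7, bounds))
```

Without the bounds, the best choice would be both category-A items (value 19); the count bounds force one item from each category, and the best fair solution has value 14.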

Preprint, 2021, arXiv

On the Problem of Underranking in Group-Fair Ranking

Search and recommendation systems, such as search engines, recruiting tools, online marketplaces, news, and social media, output ranked lists of content, products, and sometimes people. Credit ratings, standardized tests, and risk assessments output only a score, but are also used implicitly for ranking. Bias in such ranking systems, especially among the top ranks, can worsen social and economic inequalities, polarize opinions, and reinforce stereotypes. On the other hand, a bias correction for minority groups can cause more harm if it is perceived as favoring group-fair outcomes over meritocracy. In this paper, we formulate the problem of underranking in group-fair rankings, which has not been addressed in previous work. Most group-fair ranking algorithms post-process a given ranking and output a group-fair ranking. We define underranking based on how close the group-fair rank of each item is to its original rank, and we prove a lower bound on the trade-off achievable between underranking and group fairness simultaneously. We give a fair ranking algorithm that takes any given ranking and outputs another ranking with simultaneous underranking and group fairness guarantees comparable to the lower bound we prove. Our algorithm works with group fairness constraints for any number of groups. Our experimental results confirm the theoretical trade-off between underranking and group fairness, and also show that our algorithm achieves the best of both when compared to state-of-the-art baselines.

Preprint, 2020, arXiv

Approximation Algorithms and Hardness for Strong Unique Games

The UNIQUE GAMES problem is a central problem in algorithms and complexity theory. Given an instance of UNIQUE GAMES, the STRONG UNIQUE GAMES problem asks for the largest subset of vertices such that the UNIQUE GAMES instance induced on them is completely satisfiable. In this paper, we give new algorithmic and hardness results for STRONG UNIQUE GAMES. Given an instance with label set size $k$ in which a $(1 - ε)$ fraction of the vertices induces a completely satisfiable instance, our first algorithm produces a set containing a $1 - \widetilde{O}(k^2)\, ε\sqrt{\log n}$ fraction of the vertices such that the UNIQUE GAMES instance induced on them is completely satisfiable. In the same setting, our second algorithm produces such a set containing a $1 - \widetilde{O}(k^2) \sqrt{ε\log d}$ fraction of the vertices, where $d$ is the largest vertex degree of the graph. The technical core of our results is a new connection between STRONG UNIQUE GAMES and Small-Set-Vertex-Expansion in graphs. Complementing this, assuming the Unique Games Conjecture, we prove that it is NP-hard to compute a set of size larger than $1 - Ω( \sqrt{ε\log k \log d})$ for which all the constraints induced on this set are satisfied. Given an undirected graph $G(V,E)$, the ODD CYCLE TRANSVERSAL problem asks to delete the smallest fraction of vertices so that the graph induced on the remaining vertices is bipartite. As a corollary to our main algorithmic results, we obtain an algorithm that outputs a set $S$ such that the graph induced on $V \setminus S$ is bipartite and $|S|/n \leq O(\sqrt{ε\log d})$, where $d$ is the largest vertex degree and $ε$ is the optimal fraction of vertices that need to be deleted. Assuming the Unique Games Conjecture, we prove a matching (up to constant factors) hardness result.
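
The property an ODD CYCLE TRANSVERSAL output must satisfy, namely that $V \setminus S$ induces a bipartite graph, can be checked with a breadth-first 2-coloring. This sketch only verifies a candidate $S$; it is not the paper's approximation algorithm, and `is_bipartite_after_removal` is a hypothetical name. Graphs are assumed to be adjacency lists in a dict.

```python
from collections import deque

def is_bipartite_after_removal(adj, removed):
    """BFS 2-coloring of the graph induced on the vertices not in `removed`."""
    color = {}
    for source in adj:
        if source in removed or source in color:
            continue
        color[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in removed:
                    continue
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # an odd cycle survives among the kept vertices
    return True

# A 5-cycle is an odd cycle; deleting any single vertex (a 1/5 fraction) leaves a path.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(is_bipartite_after_removal(cycle5, set()), is_bipartite_after_removal(cycle5, {0}))
```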

Preprint, 2020, arXiv

Robust Identifiability in Linear Structural Equation Models of Causal Inference

In this work, we consider the problem of robust parameter estimation from observational data in the context of linear structural equation models (LSEMs). LSEMs are a popular and well-studied class of models for inferring causality in the natural and social sciences. One of the main problems related to LSEMs is recovering the model parameters from observational data. Under various conditions on LSEMs and the model parameters, prior work provides efficient algorithms to recover the parameters. However, these results are often about generic identifiability. In practice, generic identifiability is not sufficient and we need robust identifiability: small changes in the observational data should not change the recovered parameters by a large amount. Robust identifiability has received far less attention and remains poorly understood. Sankararaman et al. (2019) recently provided a set of sufficient conditions on parameters under which robust identifiability is feasible. However, their results apply only to a small sub-class of LSEMs, called "bow-free paths." In this work, we significantly extend their work along multiple dimensions. First, for a large and well-studied class of LSEMs, namely "bow-free" models, we provide a sufficient condition on model parameters under which robust identifiability holds, thereby removing the restriction to paths required by prior work. We then show that this sufficient condition holds with high probability, which implies that robust identifiability holds for a large set of parameters and that, for such parameters, existing algorithms already achieve robust identifiability. Finally, we validate our results on both simulated and real-world datasets.

Preprint, 2020, arXiv

Stability of Linear Structural Equation Models of Causal Inference

We consider the numerical stability of the parameter recovery problem in the Linear Structural Equation Model (LSEM) of causal inference. A long line of work starting from Wright (1920) has focused on understanding which sub-classes of LSEM allow for efficient parameter recovery. Despite decades of study, this question is not yet fully resolved. The goal of this paper is complementary to this line of work: we want to understand the stability of the recovery problem in the cases where efficient recovery is possible. Numerical stability of Pearl's notion of causality was first studied in Schulman and Srivastava (2016) using the concept of condition number, where they provide ill-conditioned examples. In this work, we provide a condition number analysis for the LSEM. First, we prove that under a sufficient condition, for a certain sub-class of LSEM that are bow-free (Brito and Pearl (2002)), parameter recovery is stable. We further prove that randomly chosen input parameters for this family satisfy the condition with substantial probability. Hence, for this family, recovery is numerically stable on a large subset of the parameter space. Next, we construct an example of an LSEM on four vertices with unbounded condition number. We then corroborate our theoretical findings via simulations as well as real-world experiments for a sociology application. Finally, we provide a general heuristic for estimating the condition number of any LSEM instance.
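
As a toy illustration of recovery and its sensitivity, consider a three-variable directed chain $X_1 \to X_2 \to X_3$ with independent noise, where each edge coefficient equals the covariance of child and parent divided by the parent's variance. The sketch below recovers the coefficients and estimates a relative condition number by finite differences. The finite-difference estimate is a hypothetical stand-in chosen for simplicity, not the heuristic proposed in the paper, and all function names are hypothetical.

```python
def chain_covariance(lam12, lam23, omega=(1.0, 1.0, 1.0)):
    """Covariance entries for X1 = e1, X2 = lam12*X1 + e2, X3 = lam23*X2 + e3,
    with independent zero-mean noise terms whose variances are omega."""
    w1, w2, _ = omega
    var2 = lam12 ** 2 * w1 + w2
    return {"var1": w1, "cov12": lam12 * w1, "var2": var2, "cov23": lam23 * var2}

def recover(cov):
    """Chain recovery: each coefficient is Cov(parent, child) / Var(parent)."""
    return cov["cov12"] / cov["var1"], cov["cov23"] / cov["var2"]

def sensitivity(lam12, lam23, delta=1e-6):
    """Worst relative change in the recovered coefficients per unit relative
    perturbation of a single covariance entry (finite differences)."""
    base = chain_covariance(lam12, lam23)
    true = recover(base)
    worst = 0.0
    for key in base:
        for sign in (1.0, -1.0):
            perturbed = dict(base)
            perturbed[key] *= 1.0 + sign * delta
            est = recover(perturbed)
            rel = max(abs(e - t) / abs(t) for e, t in zip(est, true))
            worst = max(worst, rel / delta)
    return worst

print(recover(chain_covariance(0.5, -2.0)))
print(sensitivity(0.5, -2.0))
```

On this simple chain the estimate stays close to 1, i.e., recovery is well conditioned; the abstract notes that other instances, such as its four-vertex example, can have unbounded condition number.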

Preprint, 2016, arXiv

Accelerated Newton Iteration: Roots of Black Box Polynomials and Matrix Eigenvalues

We study the problem of computing the largest root of a real-rooted polynomial $p(x)$ to within error $\varepsilon$ given only black-box access to it, i.e., for any $x \in {\mathbb R}$, the algorithm can query an oracle for the value of $p(x)$, but it is not allowed access to the coefficients of $p(x)$. A folklore result for this problem is that the largest root of a polynomial can be computed with $O(n \log (1/\varepsilon))$ polynomial queries using the Newton iteration. We give a simple algorithm that queries the oracle at only $O(\log n \log(1/\varepsilon))$ points, where $n$ is the degree of the polynomial. Our algorithm is based on a novel approach for accelerating the Newton method by using higher derivatives. As a special case, we consider the problem of computing the top eigenvalue of a symmetric matrix in ${\mathbb Q}^{n \times n}$ to within error $\varepsilon$ in time polynomial in the input description, i.e., the number of bits needed to describe the matrix and $\log(1/\varepsilon)$. Well-known methods such as the power iteration and the Lanczos iteration incur running time polynomial in $1/\varepsilon$, while Gaussian elimination takes $Ω(n^4)$ bit operations. As a corollary of our main result, we obtain a $\tilde{O}(n^ω \log^2 ( \|A\|_F/\varepsilon ))$ bit-complexity algorithm to compute the top eigenvalue of the matrix $A$ or to check whether it is approximately PSD ($A \succeq -\varepsilon I$).
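
The folklore baseline mentioned above is plain Newton iteration started to the right of every root: for a real-rooted polynomial with positive leading coefficient, the iterates then decrease monotonically toward the largest root. The sketch assumes direct access to both $p$ and its derivative, which the black-box setting does not provide; it is not the paper's accelerated method, and `largest_root_newton` is a hypothetical name.

```python
def largest_root_newton(p, dp, x0, eps=1e-12, max_iter=100_000):
    """Plain Newton iteration from a starting point x0 above every root of p.

    Assumes p is real-rooted with positive leading coefficient and that its
    derivative dp is available; a black-box method would need to obtain
    derivative information from evaluations of p instead.
    """
    x = x0
    for _ in range(max_iter):
        step = p(x) / dp(x)
        x -= step
        if abs(step) < eps:
            break
    return x

p = lambda x: (x - 1) * (x - 2) * (x - 3)  # = x^3 - 6x^2 + 11x - 6
dp = lambda x: 3 * x ** 2 - 12 * x + 11
print(largest_root_newton(p, dp, x0=10.0))
```

The paper's contribution is to cut the number of oracle queries from the $O(n \log(1/\varepsilon))$ of this baseline to $O(\log n \log(1/\varepsilon))$ by using higher derivatives.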

Preprint, 2016, arXiv

Spectral Properties of Hypergraph Laplacian and Approximation Algorithms

The celebrated Cheeger's Inequality establishes a bound on the edge expansion of a graph via its spectrum. This inequality is central to a rich spectral theory of graphs, based on studying the eigenvalues and eigenvectors of the adjacency matrix (and other related matrices) of graphs. It has remained open to define a suitable spectral model for hypergraphs whose spectra can be used to estimate various combinatorial properties of the hypergraph. In this paper we introduce a new hypergraph Laplacian operator generalizing the Laplacian matrix of graphs. In particular, the operator is induced by a diffusion process on the hypergraph, such that within each hyperedge, measure flows from vertices having maximum weighted measure to those having minimum. Since the operator is non-linear, we have to exploit other properties of the diffusion process to recover a spectral property concerning the "second eigenvalue" of the resulting Laplacian. Moreover, we show that higher order spectral properties cannot hold in general using the current framework. We consider a stochastic diffusion process, in which each vertex also experiences Brownian noise from outside the system. We show a relationship between the second eigenvalue and the convergence behavior of the process. We show that various hypergraph parameters like multi-way expansion and diameter can be bounded using this operator's spectral properties. Since higher order spectral properties do not hold for the Laplacian operator, we instead use the concept of procedural minimizers to consider higher order Cheeger-like inequalities. For any positive integer $k$, we give a polynomial time algorithm to compute an $O(\log r)$-approximation to the $k$-th procedural minimizer, where $r$ is the maximum cardinality of a hyperedge. We show that this approximation factor is optimal under the SSE hypothesis for constant values of $k$.
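
The abstract does not spell out the formula behind the operator. As an assumption drawn from the hypergraph spectral literature rather than from this abstract, the sketch evaluates the discrepancy ratio $\sum_e w_e \max_{u,v \in e}(f_u - f_v)^2 / \sum_u d_u f_u^2$, which reduces to the usual Rayleigh quotient of the normalized Laplacian when every hyperedge has two vertices. Check the paper for the exact definition; `discrepancy_ratio` is a hypothetical name.

```python
def discrepancy_ratio(hyperedges, f):
    """sum_e w_e * max_{u,v in e} (f_u - f_v)^2  /  sum_u d_u * f_u^2,
    where d_u is the total weight of the hyperedges containing u."""
    degree = {}
    numerator = 0.0
    for vertices, weight in hyperedges:
        values = [f[u] for u in vertices]
        numerator += weight * (max(values) - min(values)) ** 2
        for u in vertices:
            degree[u] = degree.get(u, 0.0) + weight
    denominator = sum(d * f[u] ** 2 for u, d in degree.items())
    return numerator / denominator

# A single ordinary edge: f = (1, -1) is an eigenvector of the normalized Laplacian with eigenvalue 2.
print(discrepancy_ratio([(("a", "b"), 1.0)], {"a": 1.0, "b": -1.0}))
# A single 3-vertex hyperedge: only the extreme values of f on the hyperedge matter.
print(discrepancy_ratio([(("a", "b", "c"), 1.0)], {"a": 1.0, "b": 1.0, "c": -2.0}))
```

The max inside each hyperedge is what makes the operator non-linear, which is why, as the abstract says, spectral properties must be recovered through the diffusion process rather than from a matrix.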

Preprint, 2014, arXiv

Approximation Algorithms for Hypergraph Small Set Expansion and Small Set Vertex Expansion

The expansion of a hypergraph, a natural extension of the notion of expansion in graphs, is defined as the minimum, over all cuts in the hypergraph, of the ratio of the number of hyperedges cut to the size of the smaller side of the cut. We study the Hypergraph Small Set Expansion problem, which, for a parameter $δ\in (0,1/2]$, asks to compute the cut having the least expansion while having at most a $δ$ fraction of the vertices on the smaller side of the cut. We present two algorithms. Our first algorithm gives an $\tilde O(δ^{-1} \sqrt{\log n})$ approximation. The second algorithm finds a set with expansion $\tilde O(δ^{-1}(\sqrt{d_{\text{max}}r^{-1}\log r\, ϕ^*} + ϕ^*))$ in an $r$-uniform hypergraph with maximum degree $d_{\text{max}}$, where $ϕ^*$ is the expansion of the optimal solution. Using these results, we also obtain algorithms for the Small Set Vertex Expansion problem: an $\tilde O(δ^{-1} \sqrt{\log n})$ approximation algorithm and an algorithm that finds a set with vertex expansion $O\left(δ^{-1}\sqrt{ϕ^V \log d_{\text{max}} } + δ^{-1} ϕ^V\right)$, where $ϕ^V$ is the vertex expansion of the optimal solution. For $δ=1/2$, Hypergraph Small Set Expansion is equivalent to the hypergraph expansion problem. In this case, our approximation factor of $O(\sqrt{\log n})$ for expansion in hypergraphs matches the corresponding approximation factor for expansion in graphs due to ARV.
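
The definitions above can be checked exhaustively on a tiny hypergraph. A minimal sketch, assuming unweighted hyperedges and "size" meaning the number of vertices; the search is exponential and is unrelated to the paper's approximation algorithms. The function names are hypothetical.

```python
from itertools import combinations

def hypergraph_expansion(hyperedges, vertices, S):
    """Hyperedges cut by (S, V - S), divided by the size of the smaller side."""
    S = set(S)
    rest = set(vertices) - S
    cut = sum(1 for e in hyperedges if set(e) & S and set(e) & rest)
    return cut / min(len(S), len(rest))

def small_set_expansion_bruteforce(hyperedges, vertices, delta):
    """Least expansion over sets with 1 <= |S| <= delta * n (exponential; tiny inputs only)."""
    best = (float("inf"), None)
    for size in range(1, int(delta * len(vertices)) + 1):
        for S in combinations(vertices, size):
            value = hypergraph_expansion(hyperedges, vertices, S)
            if value < best[0]:
                best = (value, set(S))
    return best

# Two 3-vertex hyperedges joined by the 2-vertex hyperedge {2, 3}.
vertices = list(range(6))
hyperedges = [(0, 1, 2), (3, 4, 5), (2, 3)]
print(small_set_expansion_bruteforce(hyperedges, vertices, 0.5))
```

With $δ = 1/2$ this is the ordinary hypergraph expansion; the best cut separates the two large hyperedges and cuts only $\{2, 3\}$, giving expansion $1/3$.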

Preprint, 2014, arXiv

Hypergraph Markov Operators, Eigenvalues and Approximation Algorithms

The celebrated Cheeger inequality [am85, a86] establishes a bound on the expansion of a graph via its spectrum. This inequality is central to a rich spectral theory of graphs, based on studying the eigenvalues and eigenvectors of the adjacency matrix (and other related matrices) of graphs. It has remained open to define a suitable spectral model for hypergraphs whose spectra can be used to estimate various combinatorial properties of the hypergraph. In this paper we introduce a new hypergraph Laplacian operator (generalizing the Laplacian matrix of graphs) and study its spectra. We prove a Cheeger-type inequality for hypergraphs, relating the second smallest eigenvalue of this operator to the expansion of the hypergraph. We bound other hypergraph expansion parameters via higher eigenvalues of this operator. We give bounds on the diameter of the hypergraph as a function of the second smallest eigenvalue of the Laplacian operator. The Markov process underlying the Laplacian operator can be viewed as a dispersion process on the vertices of the hypergraph that might be of independent interest. We bound the mixing time of this process as a function of the second smallest eigenvalue of the Laplacian operator. All these results are generalizations of the corresponding results for graphs. We show that there can be no linear operator for hypergraphs whose spectra capture hypergraph expansion in a Cheeger-like manner. For any $k$, we give a polynomial time algorithm to compute an approximation to the $k^{th}$ smallest eigenvalue of the operator. We show that this approximation factor is optimal under the SSE hypothesis (introduced by [rs10]) for constant values of $k$. Finally, using the factor-preserving reduction from vertex expansion in graphs to hypergraph expansion, we show that all our results for hypergraphs extend to vertex expansion in graphs.

Preprint, 2013, arXiv

Approximation Algorithm for Sparsest k-Partitioning

Given a graph $G$, the sparsest-cut problem asks to find the set of vertices $S$ that has the least expansion, defined as $$ϕ_G(S) := \frac{w(E(S,\bar{S}))}{\min \{w(S), w(\bar{S})\}}, $$ where $w$ is the total edge weight of a subset. Here we study the natural generalization of this problem: given an integer $k$, compute a $k$-partition $\{P_1, \ldots, P_k\}$ of the vertex set so as to minimize $$ ϕ_k(\{P_1, \ldots, P_k\}) := \max_i ϕ_G(P_i). $$ Our main result is a polynomial time bi-criteria approximation algorithm that outputs a $(1 - \varepsilon)k$-partition of the vertex set such that each piece has expansion at most $O_{\varepsilon}(\sqrt{\log n \log k})$ times $OPT$. We also study balanced versions of this problem.
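
The two formulas above can be evaluated directly for a given partition. A minimal sketch, assuming $w(S)$ is the volume of $S$ (the sum of weighted degrees of its vertices), the convention under which $ϕ_G$ is the conductance; it scores a partition and does not search for one. The function names are hypothetical.

```python
def volume(adj, S):
    """w(S), taken here as the total weighted degree of the vertices in S."""
    return sum(w for u in S for w in adj[u].values())

def expansion(adj, S):
    """phi_G(S) = w(E(S, S-bar)) / min(w(S), w(S-bar))."""
    S = set(S)
    rest = set(adj) - S
    cut = sum(w for u in S for v, w in adj[u].items() if v not in S)
    return cut / min(volume(adj, S), volume(adj, rest))

def k_partition_cost(adj, parts):
    """phi_k({P_1, ..., P_k}) = max_i phi_G(P_i)."""
    return max(expansion(adj, P) for P in parts)

# Two unit-weight triangles joined by the single edge 2-3.
adj = {u: {} for u in range(6)}
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[u][v] = adj[v][u] = 1.0
print(k_partition_cost(adj, [{0, 1, 2}, {3, 4, 5}]))
```

Splitting along the bridge cuts one edge against a volume of 7 on each side, so the 2-partition costs $1/7$, while isolating a single vertex costs $1$.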

Preprint, 2013, arXiv

The Complexity of Approximating Vertex Expansion

We study the complexity of approximating the vertex expansion of graphs $G = (V,E)$, defined as \[ Φ^V := \min_{S \subset V} n \cdot \frac{|N(S)|}{|S| |V \backslash S|}. \] We give a simple polynomial-time algorithm for finding a subset with vertex expansion $O(\sqrt{OPT \log d})$, where $d$ is the maximum degree of the graph. Our main result is an asymptotically matching lower bound: under the Small Set Expansion (SSE) hypothesis, it is hard to find a subset with expansion less than $C\sqrt{OPT \log d}$ for an absolute constant $C$. In particular, this implies that for every constant $ε> 0$, it is SSE-hard to distinguish whether the vertex expansion is less than $ε$ or at least an absolute constant. The analogous threshold for edge expansion is $\sqrt{OPT}$, with no dependence on the degree; thus our results suggest that vertex expansion is harder to approximate than edge expansion. In particular, while Cheeger's algorithm can certify constant edge expansion, it is SSE-hard to certify constant vertex expansion in graphs. Our proof is via a reduction from the Unique Games instance obtained from the SSE hypothesis to the vertex expansion problem. It involves the definition of a smoother intermediate problem we call Analytic Vertex Expansion, which is representative of both the vertex expansion and the conductance of the graph. Both reductions (from the UGC instance to this problem and from this problem to vertex expansion) use novel proof ideas.
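
The definition of $Φ^V$ can be evaluated by brute force on small graphs. A minimal sketch, assuming $N(S)$ is the outer vertex boundary (vertices outside $S$ with a neighbor in $S$) and that the minimum ranges over nonempty proper subsets; it is exponential and unrelated to the paper's algorithm. The function names are hypothetical.

```python
from itertools import combinations

def vertex_expansion(adj, S):
    """n * |N(S)| / (|S| * |V - S|), with N(S) the vertices outside S adjacent to S."""
    S = set(S)
    n = len(adj)
    boundary = {v for u in S for v in adj[u] if v not in S}
    return n * len(boundary) / (len(S) * (n - len(S)))

def min_vertex_expansion(adj):
    """Exhaustive minimum over nonempty proper subsets (exponential; tiny graphs only)."""
    vertices = list(adj)
    return min(vertex_expansion(adj, S)
               for r in range(1, len(vertices))
               for S in combinations(vertices, r))

# Path 0-1-2-3: the best set is one half of the path, whose boundary is a single vertex.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(min_vertex_expansion(path))
```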

Preprint, 2011, arXiv

Many Sparse Cuts via Higher Eigenvalues

Cheeger's fundamental inequality states that any edge-weighted graph has a vertex subset $S$ whose expansion (a.k.a. conductance) is bounded as follows: \[ ϕ(S) := \frac{w(S,\bar{S})}{\min \{w(S), w(\bar{S})\}} \leq 2\sqrt{λ_2}, \] where $w$ is the total edge weight of a subset or a cut and $λ_2$ is the second smallest eigenvalue of the normalized Laplacian of the graph. Here we prove the following natural generalization: for any integer $k \in [n]$, there exist $ck$ disjoint subsets $S_1, \ldots, S_{ck}$ such that \[ \max_i ϕ(S_i) \leq C \sqrt{λ_{k} \log k}, \] where $λ_i$ is the $i^{th}$ smallest eigenvalue of the normalized Laplacian and $c<1$, $C>0$ are suitable absolute constants. Our proof is via a polynomial-time algorithm to find such subsets, consisting of a spectral projection and a randomized rounding. As a consequence, we get the same upper bound for the small set expansion problem: for any $k$, there is a subset $S$ whose weight is at most an $O(1/k)$ fraction of the total weight and $ϕ(S) \le C \sqrt{λ_k \log k}$. Both results are the best possible up to constant factors. The underlying algorithmic problem, namely finding $k$ subsets such that the maximum expansion is minimized, besides extending sparse cuts to more than one subset, appears to be a natural clustering problem in its own right.

Preprint, 2010, arXiv

Cut-Matching Games on Directed Graphs

We give an $O(\log^2 n)$-approximation algorithm based on the cut-matching framework of [10, 13, 14] for computing the sparsest cut on directed graphs. Our algorithm uses only $O(\log^2 n)$ single-commodity max-flow computations and thus breaks the multicommodity-flow barrier for computing the sparsest cut on directed graphs.

Preprint, 2009, arXiv

Improved Algorithm for Degree Bounded Survivable Network Design Problem

We consider the Degree-Bounded Survivable Network Design Problem: the objective is to find a minimum-cost subgraph satisfying the given connectivity requirements as well as the degree bounds on the vertices. Denoting the upper bound on the degree of a vertex $v$ by $b(v)$, we present an algorithm that finds a solution whose cost is at most twice the cost of the optimal solution, while the degree of each degree-constrained vertex $v$ is at most $2b(v) + 2$. This improves upon the results of Lau and Singh and of Lau, Naor, Salavatipour and Singh.
