Source author record

Huy L. Nguyen

Huy L. Nguyen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Data Structures and Algorithms; Machine Learning; Computational Geometry; Information Theory; math.IT; Artificial Intelligence; math.OC; Computational Complexity; Distributed, Parallel, and Cluster Computing; math.PR; Discrete Mathematics; math.ST; nlin.AO; Social and Information Networks; Statistics Theory

Catalog footprint

What is connected

26 works

15 topics

4 close collaborators


preprint, 2026, arXiv

One-Sided Matrix Completion from Ultra-Sparse Samples

Matrix completion is a classical problem that has received recurring interest across a wide range of fields. In this paper, we revisit this problem in an ultra-sparse sampling regime, where each entry of an unknown, $n\times d$ matrix $M$ (with $n \ge d$) is observed independently with probability $p = C / d$, for a fixed integer $C \ge 2$. This setting is motivated by applications involving large, sparse panel datasets, where the number of rows far exceeds the number of columns. When each row contains only $C$ entries -- fewer than the rank of $M$ -- accurate imputation of $M$ is impossible. Instead, we estimate the row span of $M$ or the averaged second-moment matrix $T = M^{\top} M / n$. The empirical second-moment matrix computed from observed entries exhibits non-random and sparse missingness. We propose an unbiased estimator that normalizes each nonzero entry of the second moment by its observed frequency, followed by gradient descent to impute the missing entries of $T$. The normalization divides a weighted sum of $n$ binomial random variables by the total number of ones. We show that the estimator is unbiased for any $p$ and enjoys low variance. When the row vectors of $M$ are drawn uniformly from a rank-$r$ factor model satisfying an incoherence condition, we prove that if $n \ge O({d r^5 ε^{-2} C^{-2} \log d})$, any local minimum of the gradient-descent objective is approximately global and recovers $T$ with error at most $ε^2$. Experiments on both synthetic and real-world data validate our approach. On three MovieLens datasets, our algorithm reduces bias by $88\%$ relative to baseline estimators. We also empirically validate the linear sampling complexity of $n$ relative to $d$ on synthetic data. On an Amazon reviews dataset with sparsity $10^{-7}$, our method reduces the recovery error of $T$ by $59\%$ and $M$ by $38\%$ compared to baseline methods.
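The normalization step described above can be sketched as follows. This is only one reading of the abstract, not the authors' implementation: each entry of the second-moment matrix is the sum of products over rows where both columns were observed, divided by the number of such rows. The function name `normalized_second_moment` and the dict-per-row input format are assumptions made for illustration, and the gradient-descent imputation of never-co-observed entries is not shown.

```python
def normalized_second_moment(rows, d):
    """Estimate T = M^T M / n from sparsely observed rows.

    rows: list of dicts mapping an observed column index to its value
          (hypothetical input format chosen for this sketch).
    d:    number of columns.
    Returns a d x d list of lists; entries for column pairs that were
    never observed together are None (left for a later imputation step).
    """
    sums = [[0.0] * d for _ in range(d)]
    counts = [[0] * d for _ in range(d)]
    for row in rows:
        cols = sorted(row)
        for j in cols:
            for k in cols:
                # Accumulate the product and count the co-observation.
                sums[j][k] += row[j] * row[k]
                counts[j][k] += 1
    return [
        [sums[j][k] / counts[j][k] if counts[j][k] else None for k in range(d)]
        for j in range(d)
    ]


# Every row of the underlying matrix is (1, 2, 3); each row reveals 2 entries.
observed = [{0: 1.0, 1: 2.0}, {1: 2.0, 2: 3.0}, {0: 1.0, 2: 3.0}]
T_hat = normalized_second_moment(observed, 3)
```

Dividing by the co-observation count rather than by n is what prevents the sparsely sampled entries from being shrunk toward zero.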

preprint, 2026, arXiv

Testable and Actionable Calibration for Full Swap Regret

AI generated predictions increasingly inform decision making in critical tasks, and therefore must be trustworthy. One widely used measure of trustworthiness is calibration, which requires that the predictions match the true frequencies and can be treated like real probabilities of a given outcome. However, defining calibration is subtle, and designing good measures of calibration error has been an active topic of recent research. The first goal is to find calibration measures that are actionable, meaning they can inform decision makers about their utility loss when predictions are treated as true probabilities, which is known as swap regret. The second goal is to find calibration measures that are testable, meaning that calibration error can be measured from a small sample of predictions and outcomes. Although these are very basic requirements, there is no existing calibration measure that fully satisfies both properties, and all existing measures relax actionability by bounding a weaker notion of swap regret, or relax testability by having suboptimal estimation error. We introduce a new calibration measure, Soft-Binned Calibration Decision Loss (SCDL), which we prove is fully actionable without weakening either requirement, and testable with nearly optimal error rate. In addition, SCDL satisfies other desired properties such as continuity and consistency. We also provide a set of experiments confirming that the theoretical advantages of SCDL compared to other measures lead to better performance in practice.

preprint, 2022, arXiv

Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction

In this paper, we study the finite-sum convex optimization problem focusing on the general convex case. Recently, the study of variance reduced (VR) methods and their accelerated variants has made exciting progress. However, the step size used in the existing VR algorithms typically depends on the smoothness parameter, which is often unknown and requires tuning in practice. To address this problem, we propose two novel adaptive VR algorithms: Adaptive Variance Reduced Accelerated Extra-Gradient (AdaVRAE) and Adaptive Variance Reduced Accelerated Gradient (AdaVRAG). Our algorithms do not require knowledge of the smoothness parameter. AdaVRAE uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{nβ}ε}\right)$ gradient evaluations and AdaVRAG uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{nβ\logβ}ε}\right)$ gradient evaluations to attain an $\mathcal{O}(ε)$-suboptimal solution, where $n$ is the number of functions in the finite sum and $β$ is the smoothness parameter. This result matches the best-known convergence rate of non-adaptive VR methods and it improves upon the convergence of the state of the art adaptive VR method, AdaSVRG. We demonstrate the superior performance of our algorithms compared with previous methods in experiments on real-world datasets.

preprint, 2022, arXiv

Fair and Useful Cohort Selection

A challenge in fair algorithm design is that, while there are compelling notions of individual fairness, these notions typically do not satisfy desirable composition properties, and downstream applications based on fair classifiers might not preserve fairness. To study fairness under composition, Dwork and Ilvento introduced an archetypal problem called the fair-cohort-selection problem, where a single fair classifier is composed with itself to select a group of candidates of a given size, and proposed a solution to this problem. In this work we design algorithms for selecting cohorts that not only preserve fairness, but also maximize the utility of the selected cohort under two notions of utility that we introduce and motivate. We give optimal (or approximately optimal) polynomial-time algorithms for this problem in both an offline setting and an online setting where candidates arrive one at a time and are classified as they arrive.

preprint, 2021, arXiv

Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities

We provide new adaptive first-order methods for constrained convex optimization. Our main algorithms AdaACSA and AdaAGD+ are accelerated methods, which are universal in the sense that they achieve nearly-optimal convergence rates for both smooth and non-smooth functions, even when they only have access to stochastic gradients. In addition, they do not require any prior knowledge on how the objective function is parametrized, since they automatically adjust their per-coordinate learning rate. These can be seen as truly accelerated Adagrad methods for constrained optimization. We complement them with a simpler algorithm AdaGrad+ which enjoys the same features, and achieves the standard non-accelerated convergence rate. We also present a set of new results involving adaptive methods for unconstrained optimization and monotone operators.
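To illustrate what "automatically adjust their per-coordinate learning rate" means, here is a minimal sketch of the classic diagonal AdaGrad update with projection onto a box constraint. It is not AdaACSA, AdaAGD+ or AdaGrad+ from the paper, whose update rules are not given in the abstract; the function name `projected_adagrad` and the parameters `eta`, `lower` and `upper` are assumptions chosen for this example.

```python
import math


def projected_adagrad(grad, x0, lower, upper, steps=100, eta=1.0):
    """Classic per-coordinate AdaGrad with clipping to the box [lower, upper].

    grad: function mapping a list of floats to the gradient (list of floats).
    No smoothness or Lipschitz constant is supplied: each coordinate's step
    size is eta / sqrt(sum of its past squared gradients).
    """
    x = list(x0)
    sq_sum = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        for i, gi in enumerate(g):
            sq_sum[i] += gi * gi
            if sq_sum[i] > 0.0:
                x[i] -= eta * gi / math.sqrt(sq_sum[i])
            # Projection onto a box is coordinate-wise clipping.
            x[i] = min(max(x[i], lower[i]), upper[i])
    return x


# Minimize (x0 - 3)^2 + (x1 + 0.25)^2 over the unit box [0, 1]^2.
def example_grad(x):
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 0.25)]


solution = projected_adagrad(example_grad, [0.0, 0.5], [0.0, 0.0], [1.0, 1.0])
```

The unconstrained minimizer (3, -0.25) lies outside the box, so the iterates settle on the nearest feasible corner.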

preprint, 2020, arXiv

Efficient Private Algorithms for Learning Large-Margin Halfspaces

We present new differentially private algorithms for learning a large-margin halfspace. In contrast to previous algorithms, which are based on either differentially private simulations of the statistical query model or on private convex optimization, the sample complexity of our algorithms depends only on the margin of the data, and not on the dimension. We complement our results with a lower bound, showing that the dependence of our upper bounds on the margin is optimal.

preprint, 2020, arXiv

Optimal Streaming Algorithms for Submodular Maximization with Cardinality Constraints

We study the problem of maximizing a non-monotone submodular function subject to a cardinality constraint in the streaming model. Our main contribution is a single-pass (semi-)streaming algorithm that uses roughly $O(k / \varepsilon^2)$ memory, where $k$ is the size constraint. At the end of the stream, our algorithm post-processes its data structure using any offline algorithm for submodular maximization, and obtains a solution whose approximation guarantee is $\frac{\alpha}{1+\alpha}-\varepsilon$, where $\alpha$ is the approximation of the offline algorithm. If we use an exact (exponential time) post-processing algorithm, this leads to a $\frac{1}{2}-\varepsilon$ approximation (which is nearly optimal). If we post-process with the algorithm of Buchbinder and Feldman (Math of OR 2019), which achieves the state-of-the-art offline approximation guarantee of $\alpha=0.385$, we obtain a $0.2779$-approximation in polynomial time, improving over the previously best polynomial-time approximation of $0.1715$ due to Feldman et al. (NeurIPS 2018). It is also worth mentioning that our algorithm is combinatorial and deterministic, which is rare for an algorithm for non-monotone submodular maximization, and enjoys a fast update time of $O(\frac{\log k + \log (1/\alpha)}{\varepsilon^2})$ per element.
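The two quoted guarantees follow from plugging the offline approximation factor into the ratio of that factor to one plus that factor. A short check, with the helper name `streaming_guarantee` being an assumption introduced here:

```python
def streaming_guarantee(alpha, eps=0.0):
    """Guarantee alpha / (1 + alpha) - eps stated in the abstract."""
    return alpha / (1.0 + alpha) - eps


# Exact (exponential-time) post-processing corresponds to alpha = 1.
exact_case = streaming_guarantee(1.0)

# Post-processing with an offline 0.385-approximation.
poly_case = streaming_guarantee(0.385)
```

With the offline factor set to 1 the ratio is exactly 1/2, and with 0.385 it is 0.385/1.385, which is about 0.27798 and matches the stated 0.2779 up to the epsilon term.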

preprint, 2016, arXiv

A New Framework for Distributed Submodular Maximization

A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. A lot of recent effort has been devoted to developing distributed algorithms for these problems. However, these results suffer from a high number of rounds, suboptimal approximation ratios, or both. We develop a framework for bringing existing algorithms in the sequential setting to the distributed setting, achieving near-optimal approximation ratios for many settings in only a constant number of MapReduce rounds. Our techniques also give a fast sequential algorithm for non-monotone maximization subject to a matroid constraint.

preprint, 2016, arXiv

Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality

We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the $m$ machines receives $n$ data points from a $d$-dimensional Gaussian distribution with unknown mean $θ$ which is promised to be $k$-sparse. The machines communicate by message passing and aim to estimate the mean $θ$. We provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines. This directly leads to a lower bound for the distributed \textit{sparse linear regression} problem: to achieve the statistical minimax error, the total communication is at least $Ω(\min\{n,d\}m)$, where $n$ is the number of observations that each machine receives and $d$ is the ambient dimension. These lower bounds improve upon [Sha14,SD'14] by allowing a multi-round iterative communication model. We also give the first optimal simultaneous protocol in the dense case for mean estimation. As our main technique, we prove a \textit{distributed data processing inequality}, a generalization of the usual data processing inequalities, which might be of independent interest and useful for other problems.

preprint, 2016, arXiv

Constrained Submodular Maximization: Beyond 1/e

In this work, we present a new algorithm for maximizing a non-monotone submodular function subject to a general constraint. Our algorithm finds an approximate fractional solution for maximizing the multilinear extension of the function over a down-closed polytope. The approximation guarantee is 0.372 and it is the first improvement over the 1/e approximation achieved by the unified Continuous Greedy algorithm [Feldman et al., FOCS 2011].

preprint, 2016, arXiv

Heavy hitters via cluster-preserving clustering

In turnstile $\ell_p$ $\varepsilon$-heavy hitters, one maintains a high-dimensional $x\in\mathbb{R}^n$ subject to $\texttt{update}(i,Δ)$ causing $x_i\leftarrow x_i + Δ$, where $i\in[n]$, $Δ\in\mathbb{R}$. Upon receiving a query, the goal is to report a small list $L\subset[n]$, $|L| = O(1/\varepsilon^p)$, containing every "heavy hitter" $i\in[n]$ with $|x_i| \ge \varepsilon \|x_{\overline{1/\varepsilon^p}}\|_p$, where $x_{\overline{k}}$ denotes the vector obtained by zeroing out the largest $k$ entries of $x$ in magnitude. For any $p\in(0,2]$ the CountSketch solves $\ell_p$ heavy hitters using $O(\varepsilon^{-p}\log n)$ words of space with $O(\log n)$ update time, $O(n\log n)$ query time to output $L$, and whose output after any query is correct with high probability (whp) $1 - 1/poly(n)$. Unfortunately the query time is very slow. To remedy this, the work [CM05] proposed for $p=1$ in the strict turnstile model, a whp correct algorithm achieving suboptimal space $O(\varepsilon^{-1}\log^2 n)$, worse update time $O(\log^2 n)$, but much better query time $O(\varepsilon^{-1}poly(\log n))$. We show this tradeoff between space and update time versus query time is unnecessary. We provide a new algorithm, ExpanderSketch, which in the most general turnstile model achieves optimal $O(\varepsilon^{-p}\log n)$ space, $O(\log n)$ update time, and fast $O(\varepsilon^{-p}poly(\log n))$ query time, and whp correctness. Our main innovation is an efficient reduction from the heavy hitters to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a much bigger graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We then develop a "cluster-preserving clustering" algorithm, partitioning the graph into clusters without destroying any original cluster.
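For orientation, the baseline CountSketch mentioned above can be sketched as a small turnstile data structure with `update` and a point-query `estimate`. This is not ExpanderSketch, and it omits the slow step of scanning all indices to build the list of heavy hitters. The class layout, the simple modular hash functions and the parameter names `rows`, `width` and `seed` are assumptions made for this illustration.

```python
import random
from statistics import median


class CountSketch:
    """Minimal CountSketch supporting turnstile updates and point queries."""

    PRIME = (1 << 61) - 1  # modulus for the illustrative hash functions

    def __init__(self, rows, width, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        self.width = width
        self.bucket_params = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(rows)
        ]
        self.sign_params = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(rows)
        ]
        self.table = [[0.0] * width for _ in range(rows)]

    def _bucket(self, r, i):
        a, b = self.bucket_params[r]
        return ((a * i + b) % self.PRIME) % self.width

    def _sign(self, r, i):
        a, b = self.sign_params[r]
        return 1.0 if ((a * i + b) % self.PRIME) & 1 else -1.0

    def update(self, i, delta):
        """Apply x_i <- x_i + delta (delta may be negative)."""
        for r in range(self.rows):
            self.table[r][self._bucket(r, i)] += self._sign(r, i) * delta

    def estimate(self, i):
        """Median over rows of the signed counter that index i hashes to."""
        return median(
            self._sign(r, i) * self.table[r][self._bucket(r, i)]
            for r in range(self.rows)
        )


sketch = CountSketch(rows=5, width=16, seed=1)
sketch.update(7, 1000.0)
for light in range(1, 6):
    sketch.update(light, 1.0)
sketch.update(7, -200.0)  # turnstile model: decrements are allowed
heavy_estimate = sketch.estimate(7)
```

In every row the error of the counter for index 7 is at most the total mass of the other items, here 5, so the median estimate is within 5 of the true value 800 regardless of the hash seeds.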

preprint, 2016, arXiv

Submodular Maximization over Sliding Windows

In this paper we study the extraction of representative elements in the data stream model in the form of submodular maximization. Different from the previous work on streaming submodular maximization, we are interested only in the recent data, and study the maximization problem over sliding windows. We provide a general reduction from the sliding window model to the standard streaming model, and thus our approach works for general constraints as long as there is a corresponding streaming algorithm in the standard streaming model. As a consequence, we obtain the first algorithms in the sliding window model for maximizing a monotone/non-monotone submodular function under cardinality and matroid constraints. We also propose several heuristics and show their efficiency in real-world datasets.

preprint, 2015, arXiv

Cutting corners cheaply, or how to remove Steiner points

Our main result is that the Steiner Point Removal (SPR) problem can always be solved with polylogarithmic distortion, which answers in the affirmative a question posed by Chan, Xia, Konjevod, and Richa (2006). Specifically, we prove that for every edge-weighted graph $G = (V,E,w)$ and a subset of terminals $T \subseteq V$, there is a graph $G'=(T,E',w')$ that is isomorphic to a minor of $G$, such that for every two terminals $u,v\in T$, the shortest-path distances between them in $G$ and in $G'$ satisfy $d_{G,w}(u,v) \le d_{G',w'}(u,v) \le O(\log^5|T|) \cdot d_{G,w}(u,v)$. Our existence proof actually gives a randomized polynomial-time algorithm. Our proof features a new variant of metric decomposition. It is well-known that every $n$-point metric space $(X,d)$ admits a $β$-separating decomposition for $β=O(\log n)$, which roughly means for every desired diameter bound $Δ>0$ there is a randomized partitioning of $X$, which satisfies the following separation requirement: for every $x,y \in X$, the probability they lie in different clusters of the partition is at most $β\,d(x,y)/Δ$. We introduce an additional requirement, which is the following tail bound: for every shortest-path $P$ of length $d(P) \leq Δ/β$, the number of clusters of the partition that meet the path $P$, denoted $Z_P$, satisfies $\Pr[Z_P > t] \le 2e^{-Ω(t)}$ for all $t>0$.

preprint, 2015, arXiv

Random Coordinate Descent Methods for Minimizing Decomposable Submodular Functions

Submodular function minimization is a fundamental optimization problem that arises in several applications in machine learning and computer vision. The problem is known to be solvable in polynomial time, but general purpose algorithms have high running times and are unsuitable for large-scale problems. Recent work has used convex optimization techniques to obtain very practical algorithms for minimizing functions that are sums of "simple" functions. In this paper, we use random coordinate descent methods to obtain algorithms with faster linear convergence rates and cheaper iteration costs. Compared to alternating projection methods, our algorithms do not rely on full-dimensional vector operations and they converge in significantly fewer iterations.

preprint, 2015, arXiv

The Power of Randomization: Distributed Submodular Maximization on Massive Datasets

A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We develop a simple distributed algorithm that is embarrassingly parallel and it achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.

preprint, 2014, arXiv

Approximate k-flat Nearest Neighbor Search

Let $k$ be a nonnegative integer. In the approximate $k$-flat nearest neighbor ($k$-ANN) problem, we are given a set $P \subset \mathbb{R}^d$ of $n$ points in $d$-dimensional space and a fixed approximation factor $c > 1$. Our goal is to preprocess $P$ so that we can efficiently answer approximate $k$-flat nearest neighbor queries: given a $k$-flat $F$, find a point in $P$ whose distance to $F$ is within a factor $c$ of the distance between $F$ and the closest point in $P$. The case $k = 0$ corresponds to the well-studied approximate nearest neighbor problem, for which a plethora of results are known, both in low and high dimensions. The case $k = 1$ is called approximate line nearest neighbor. In this case, we are aware of only one provably efficient data structure, due to Andoni, Indyk, Krauthgamer, and Nguyen. For $k \geq 2$, we know of no previous results. We present the first efficient data structure that can handle approximate nearest neighbor queries for arbitrary $k$. We use a data structure for $0$-ANN-queries as a black box, and the performance depends on the parameters of the $0$-ANN solution: suppose we have an $0$-ANN structure with query time $O(n^ρ)$ and space requirement $O(n^{1+σ})$, for $ρ, σ> 0$. Then we can answer $k$-ANN queries in time $O(n^{k/(k + 1 - ρ) + t})$ and space $O(n^{1+σk/(k + 1 - ρ)} + n\log^{O(1/t)} n)$. Here, $t > 0$ is an arbitrary constant and the $O$-notation hides exponential factors in $k$, $1/t$, and $c$ and polynomials in $d$. Our new data structures also give an improvement in the space requirement over the previous result for $1$-ANN: we can achieve near-linear space and sublinear query time, a further step towards practical applications where space constitutes the bottleneck.

preprint, 2014, arXiv

On Communication Cost of Distributed Statistical Estimation and Dimensionality

We explore the connection between dimensionality and communication cost in distributed learning problems. Specifically, we study the problem of estimating the mean $\vec{\theta}$ of an unknown $d$-dimensional Gaussian distribution in the distributed setting. In this problem, the samples from the unknown distribution are distributed among $m$ different machines. The goal is to estimate the mean $\vec{\theta}$ at the optimal minimax rate while communicating as few bits as possible. We show that in this setting, the communication cost scales linearly in the number of dimensions, i.e., one needs to deal with different dimensions individually. Applying this result to previous lower bounds for one dimension in the interactive setting [ZDJW13] and to our improved bounds for the simultaneous setting, we prove new lower bounds of $\Omega(md/\log(m))$ and $\Omega(md)$ for the bits of communication needed to achieve the minimax squared loss, in the interactive and simultaneous settings respectively. To complement these bounds, we also demonstrate an interactive protocol achieving the minimax squared loss with $O(md)$ bits of communication, which improves upon the simple simultaneous protocol by a logarithmic factor. Given the strong lower bounds in the general setting, we initiate the study of distributed parameter estimation problems with structured parameters. Specifically, when the parameter is promised to be $s$-sparse, we show a simple thresholding-based protocol that achieves the same squared loss while saving a $d/s$ factor of communication. We conjecture that the tradeoff between communication and squared loss demonstrated by this protocol is essentially optimal up to a logarithmic factor.

preprint, 2014, arXiv

Online Bipartite Matching with Decomposable Weights

We study a weighted online bipartite matching problem: $G(V_1, V_2, E)$ is a weighted bipartite graph where $V_1$ is known beforehand and the vertices of $V_2$ arrive online. The goal is to match vertices of $V_2$ as they arrive to vertices in $V_1$, so as to maximize the sum of weights of edges in the matching. If assignments to $V_1$ cannot be changed, no bounded competitive ratio is achievable. We study the weighted online matching problem with free disposal, where vertices in $V_1$ can be assigned multiple times, but only get credit for the maximum weight edge assigned to them over the course of the algorithm. For this problem, the greedy algorithm is $0.5$-competitive and determining whether a better competitive ratio is achievable is a well known open problem. We identify an interesting special case where the edge weights are decomposable as the product of two factors, one corresponding to each end point of the edge. This is analogous to the well studied related machines model in the scheduling literature, although the objective functions are different. For this case of decomposable edge weights, we design a $0.5664$-competitive randomized algorithm in complete bipartite graphs. We show that such instances with decomposable weights are non-trivial by establishing upper bounds of $0.618$ for deterministic and $0.8$ for randomized algorithms. A tight competitive ratio of $1-1/e \approx 0.632$ was known previously for both the 0-1 case as well as the case where edge weights depend on the offline vertices only, but for these cases, reassignments cannot change the quality of the solution. Beating $0.5$ for weighted matching where reassignments are necessary has been a significant challenge. We thus give the first online algorithm with competitive ratio strictly better than $0.5$ for a non-trivial case of weighted matching with free disposal.
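The greedy baseline for the free-disposal setting can be sketched on a complete bipartite graph with decomposable weights. This follows the usual marginal-gain reading of "greedy" and is not the paper's 0.5664-competitive randomized algorithm. The function name `greedy_free_disposal` and the representation of weights as a product of an offline factor and an online factor are assumptions for this example.

```python
def greedy_free_disposal(offline_factors, online_factors):
    """Greedy online matching with free disposal on a complete bipartite graph.

    The weight of edge (u, v) is offline_factors[u] * online_factors[v].
    Each arriving online vertex goes to the offline vertex whose credited
    weight increases the most; an offline vertex only keeps credit for the
    heaviest edge it has ever been assigned.
    """
    credited = [0.0] * len(offline_factors)
    for b in online_factors:
        gains = [max(0.0, a * b - cur) for a, cur in zip(offline_factors, credited)]
        u = max(range(len(gains)), key=gains.__getitem__)
        if gains[u] > 0.0:
            credited[u] = offline_factors[u] * b  # earlier edge at u is disposed
    return sum(credited), credited


# Offline factors (1, 2); online vertices arrive with factors 1, then 3.
total, credited = greedy_free_disposal([1.0, 2.0], [1.0, 3.0])
```

On this instance greedy sends both arrivals to the second offline vertex for a total of 6, while the offline optimum matches the first arrival to the first vertex and obtains 7.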

preprint, 2014, arXiv

Time lower bounds for nonadaptive turnstile streaming algorithms

We say a turnstile streaming algorithm is "non-adaptive" if, during updates, the memory cells written and read depend only on the index being updated and random coins tossed at the beginning of the stream (and not on the memory contents of the algorithm). Memory cells read during queries may be decided upon adaptively. All known turnstile streaming algorithms in the literature are non-adaptive. We prove the first non-trivial update time lower bounds for both randomized and deterministic turnstile streaming algorithms, which hold when the algorithms are non-adaptive. While there has been abundant success in proving space lower bounds, there have been no non-trivial update time lower bounds in the turnstile model. Our lower bounds hold against classically studied problems such as heavy hitters, point query, entropy estimation, and moment estimation. In some cases of deterministic algorithms, our lower bounds nearly match known upper bounds.

preprint, 2013, arXiv

Approximate Nearest Neighbor Search in $\ell_p$

We present a new locality sensitive hashing (LSH) algorithm for $c$-approximate nearest neighbor search in $\ell_p$ with $1<p<2$. For a database of $n$ points in $\ell_p$, we achieve $O(dn^ρ)$ query time and $O(dn+n^{1+ρ})$ space, where $ρ\le O((\ln c)^2/c^p)$. This improves upon the previous best upper bound $ρ\le 1/c$ by Datar et al. (SOCG 2004), and is close to the lower bound $ρ\ge 1/c^p$ by O'Donnell, Wu and Zhou (ITCS 2011). The proof is a simple generalization of the LSH scheme for $\ell_2$ by Andoni and Indyk (FOCS 2006).

preprint, 2013, arXiv

Beyond Locality-Sensitive Hashing

We present a new data structure for the c-approximate near neighbor problem (ANN) in the Euclidean space. For n points in R^d, our algorithm achieves O(n^ρ + d log n) query time and O(n^{1 + ρ} + d log n) space, where ρ<= 7/(8c^2) + O(1 / c^3) + o(1). This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality-sensitive hashing lower bound proved by O'Donnell, Wu and Zhou (ICS 2011). By a standard reduction we obtain a data structure for the Hamming space and \ell_1 norm with ρ<= 7/(8c) + O(1/c^{3/2}) + o(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).

preprint, 2013, arXiv

Lower bounds for oblivious subspace embeddings

An oblivious subspace embedding (OSE) for some eps, delta in (0,1/3) and d <= m <= n is a distribution D over R^{m x n} such that for any linear subspace W of R^n of dimension d, Pr_{Pi ~ D}(for all x in W, (1-eps) |x|_2 <= |Pi x|_2 <= (1+eps)|x|_2) >= 1 - delta. We prove that any OSE with delta < 1/3 must have m = Omega((d + log(1/delta))/eps^2), which is optimal. Furthermore, if every Pi in the support of D is sparse, having at most s non-zero entries per column, then we show tradeoff lower bounds between m and s.

preprint, 2013, arXiv

Tight Lower Bound for Linear Sketches of Moments

The problem of estimating frequency moments of a data stream has attracted a lot of attention since the onset of streaming algorithms [AMS99]. While the space complexity for approximately computing the $p^{\rm th}$ moment, for $p\in(0,2]$ has been settled [KNW10], for $p>2$ the exact complexity remains open. For $p>2$ the current best algorithm uses $O(n^{1-2/p}\log n)$ words of space [AKO11,BO10], whereas the lower bound is of $Ω(n^{1-2/p})$ [BJKS04]. In this paper, we show a tight lower bound of $Ω(n^{1-2/p}\log n)$ words for the class of algorithms based on linear sketches, which store only a sketch $Ax$ of input vector $x$ and some (possibly randomized) matrix $A$. We note that all known algorithms for this problem are linear sketches.

preprint, 2012, arXiv

On the Convergence of the Hegselmann-Krause System

We study convergence of the following discrete-time non-linear dynamical system: n agents are located in R^d and at every time step, each moves synchronously to the average location of all agents within a unit distance of it. This popularly studied system was introduced by Krause to model the dynamics of opinion formation and is often referred to as the Hegselmann-Krause model. We prove the first polynomial time bound for the convergence of this system in arbitrary dimensions. This improves on the bound of n^{O(n)} resulting from a more general theorem of Chazelle. Also, we show a quadratic lower bound and improve the upper bound for one-dimensional systems to O(n^3).
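The update rule described above is easy to simulate. The sketch below treats "within a unit distance" as a closed ball (distance at most 1), which is an assumption, as are the function names `hk_step` and `run_until_fixed` and the stopping tolerance.

```python
import math


def hk_step(points, radius=1.0):
    """One synchronous Hegselmann-Krause step for agents in R^d.

    points: list of equal-length tuples of floats.
    Each agent moves to the average of all agents (itself included)
    whose distance from it is at most `radius`.
    """
    updated = []
    for p in points:
        neighbors = [q for q in points if math.dist(p, q) <= radius]
        updated.append(
            tuple(sum(q[k] for q in neighbors) / len(neighbors) for k in range(len(p)))
        )
    return updated


def run_until_fixed(points, max_steps=1000, tol=1e-12):
    """Iterate hk_step until no agent moves more than tol.

    Returns the final positions and the number of steps that moved an agent.
    """
    for step in range(max_steps):
        nxt = hk_step(points)
        if all(math.dist(p, q) <= tol for p, q in zip(points, nxt)):
            return nxt, step
        points = nxt
    return points, max_steps


# Three nearby agents merge at 0.5; the far agent at 5.0 never interacts.
final_positions, steps_taken = run_until_fixed([(0.0,), (0.5,), (1.0,), (5.0,)])
```

Agents that end up more than a unit apart stop influencing each other, so the system freezes into separate clusters.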

preprint, 2012, arXiv

OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings

An "oblivious subspace embedding (OSE)" given some parameters eps,d is a distribution D over matrices B in R^{m x n} such that for any linear subspace W in R^n with dim(W) = d it holds that Pr_{B ~ D}(forall x in W ||B x||_2 in (1 +/- eps)||x||_2) > 2/3. We show an OSE exists with m = O(d^2/eps^2) and where every B in the support of D has exactly s=1 non-zero entry per column. This improves the previously best known bound in [Clarkson-Woodruff, arXiv:1207.6365]. Our quadratic dependence on d is optimal for any OSE with s=1 [Nelson-Nguyen, 2012]. We also give two OSE's, which we call Oblivious Sparse Norm-Approximating Projections (OSNAPs), that both allow the parameter settings m = Õ(d/eps^2) and s = polylog(d)/eps, or m = O(d^{1+gamma}/eps^2) and s=O(1/eps) for any constant gamma>0. This m is nearly optimal since m >= d is required simply to ensure that no non-zero vector of W lands in the kernel of B. These are the first constructions with m=o(d^2) to have s=o(d). In fact, our OSNAPs are nothing more than the sparse Johnson-Lindenstrauss matrices of [Kane-Nelson, SODA 2012]. Our analyses all yield OSE's that are sampled using either O(1)-wise or O(log d)-wise independent hash functions, which provides some efficiency advantages over previous work for turnstile streaming applications. Our main result is essentially a Bai-Yin type theorem in random matrix theory and is likely to be of independent interest: i.e. we show that for any U in R^{n x d} with orthonormal columns and random sparse B, all singular values of BU lie in [1-eps, 1+eps] with good probability. Plugging OSNAPs into known algorithms for numerical linear algebra problems such as approximate least squares regression, low rank approximation, and approximating leverage scores implies faster algorithms for all these problems.
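A matrix with exactly one non-zero entry per column, as in the s=1 case above, can be stored as one target row and one sign per column and applied in time proportional to the number of non-zeros of the input. The sketch below draws rows and signs with Python's `random` module instead of the limited-independence hash functions analyzed in the paper; the function names and the parameters `n`, `m` and `seed` are assumptions.

```python
import math
import random


def make_sparse_embedding(n, m, seed=0):
    """Sample an m x n matrix with a single +1 or -1 entry in each column.

    Returned implicitly as (rows, signs): column j has signs[j] in row rows[j].
    """
    rng = random.Random(seed)
    rows = [rng.randrange(m) for _ in range(n)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return rows, signs


def apply_embedding(embedding, x, m):
    """Compute B x without materializing B."""
    rows, signs = embedding
    y = [0.0] * m
    for j, xj in enumerate(x):
        if xj != 0.0:
            y[rows[j]] += signs[j] * xj
    return y


def norm(v):
    return math.sqrt(sum(t * t for t in v))


n, m = 50, 20
embedding = make_sparse_embedding(n, m, seed=3)
basis_vector = [0.0] * n
basis_vector[3] = 1.0
image = apply_embedding(embedding, basis_vector, m)
```

A standard basis vector is mapped to a single signed unit entry, so its norm is preserved exactly; for general vectors the norm is only preserved approximately and with some probability, which is what the bounds on m quantify.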

preprint, 2012, arXiv

Sparsity Lower Bounds for Dimensionality Reducing Maps

We give near-tight lower bounds for the sparsity required in several dimensionality reducing linear maps. First, consider the JL lemma which states that for any set of n vectors in R^d there is a matrix A in R^{m x d} with m = O(eps^{-2}log n) such that mapping by A preserves pairwise Euclidean distances of these n vectors up to a 1 +/- eps factor. We show that there exists a set of n vectors such that any such matrix A with at most s non-zero entries per column must have s = Omega(eps^{-1}log n/log(1/eps)) as long as m < O(n/log(1/eps)). This bound improves the lower bound of Omega(min{eps^{-2}, eps^{-1}sqrt{log_m d}}) by [Dasgupta-Kumar-Sarlos, STOC 2010], which only held against the stronger property of distributional JL, and only against a certain restricted class of distributions. Meanwhile our lower bound is against the JL lemma itself, with no restrictions. Our lower bound matches the sparse Johnson-Lindenstrauss upper bound of [Kane-Nelson, SODA 2012] up to an O(log(1/eps)) factor. Next, we show that any m x n matrix with the k-restricted isometry property (RIP) with constant distortion must have at least Omega(klog(n/k)) non-zeroes per column if the number of rows is the optimal value m = O(klog (n/k)), and if k < n/polylog n. This improves the previous lower bound of Omega(min{k, n/m}) by [Chandar, 2010] and shows that for virtually all k it is impossible to have a sparse RIP matrix with an optimal number of rows. Lastly, we show that any oblivious distribution over subspace embedding matrices with 1 non-zero per column and preserving all distances in a d-dimensional subspace up to a constant factor with constant probability must have at least Omega(d^2) rows. This matches one of the upper bounds in [Nelson-Nguyen, 2012] and shows the impossibility of obtaining the best of both constructions in that work, namely 1 non-zero per column and Õ(d) rows.
