Source author record

Aaron Sidford

Aaron Sidford appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Data Structures and Algorithms · math.OC · Machine Learning · math.NA · Computation · Computational Complexity · Discrete Mathematics · Information Theory · math.IT · Numerical Analysis · Computer Science and Game Theory · Distributed, Parallel, and Cluster Computing · Neural and Evolutionary Computing · quant-ph

Catalog footprint

What is connected

39 works

14 topics

4 close collaborators

preprint · 2026 · arXiv

Convex optimization with $p$-norm oracles

In recent years, there have been significant advances in efficiently solving $\ell_s$-regression using linear system solvers and $\ell_2$-regression [Adil-Kyng-Peng-Sachdeva, J. ACM'24]. Would efficient smoothed $\ell_p$-norm solvers lead to even faster rates for solving $\ell_s$-regression when $2 \leq p < s$? In this paper, we give an affirmative answer to this question and show how to solve $\ell_s$-regression using $\tilde{O}(n^{\frac{ν}{1+ν}})$ iterations of solving smoothed $\ell_p$ regression problems, where $ν:= \frac{1}{p} - \frac{1}{s}$. To obtain this result, we provide improved accelerated rates for convex optimization problems when given access to an $\ell_p^s(λ)$-proximal oracle, which, for a point $c$, returns the solution of the regularized problem $\min_{x} f(x) + λ\|x-c\|_p^s$. Additionally, we show that these rates for the $\ell_p^s(λ)$-proximal oracle are optimal for algorithms that query in the span of the outputs of the oracle, and we further apply our techniques to settings of high-order and quasi-self-concordant optimization.

preprint · 2026 · arXiv

Solving Matrix Games with Near-Optimal Matvec Complexity

We study the problem of computing an $ε$-approximate Nash equilibrium of a two-player, bilinear game with a bounded payoff matrix $A \in \mathbb{R}^{m \times n}$, when the players' strategies are constrained to lie in simple sets. We provide algorithms which solve this problem in $\tilde{O}(ε^{-2/3})$ matrix-vector multiplies (matvecs) in two well-studied cases: $\ell_1$-$\ell_1$ (or zero-sum) games, where the players' strategies are both in the probability simplex, and $\ell_2$-$\ell_1$ games (encompassing hard-margin SVMs), where the players' strategies are in the unit Euclidean ball and probability simplex respectively. These results improve upon the previous state-of-the-art complexities of $\tilde{O}(ε^{-8/9})$ for $\ell_1$-$\ell_1$ and $\tilde{O}(ε^{-7/9})$ for $\ell_2$-$\ell_1$ due to [KOS '25]. In both settings our results are nearly-optimal as they match lower bounds of [KS '25] up to polylogarithmic factors.

preprint · 2023 · arXiv

Quantum Speedups for Zero-Sum Games via Improved Dynamic Gibbs Sampling

We give a quantum algorithm for computing an $ε$-approximate Nash equilibrium of a zero-sum game with an $m \times n$ payoff matrix with bounded entries. Given a standard quantum oracle for accessing the payoff matrix, our algorithm runs in time $\widetilde{O}(\sqrt{m + n}\cdot ε^{-2.5} + ε^{-3})$ and outputs a classical representation of the $ε$-approximate Nash equilibrium. This improves upon the best prior quantum runtime of $\widetilde{O}(\sqrt{m + n} \cdot ε^{-3})$ obtained by [vAG19] and the classic $\widetilde{O}((m + n) \cdot ε^{-2})$ runtime due to [GK95] whenever $ε= Ω((m +n)^{-1})$. We obtain this result by designing new quantum data structures for efficiently sampling from a slowly-changing Gibbs distribution.

preprint · 2022 · arXiv

High-precision Estimation of Random Walks in Small Space

We provide a deterministic $\tilde{O}(\log N)$-space algorithm for estimating random walk probabilities on undirected graphs, and more generally Eulerian directed graphs, to within inverse polynomial additive error ($ε=1/\mathrm{poly}(N)$) where $N$ is the length of the input. Previously, this problem was known to be solvable by a randomized algorithm using space $O(\log N)$ (following Aleliunas et al., FOCS 79) and by a deterministic algorithm using space $O(\log^{3/2} N)$ (Saks and Zhou, FOCS 95 and JCSS 99), both of which held for arbitrary directed graphs but had not been improved even for undirected graphs. We also give improvements on the space complexity of both of these previous algorithms for non-Eulerian directed graphs when the error is negligible ($ε=1/N^{ω(1)}$), generalizing what Hoza and Zuckerman (FOCS 18) recently showed for the special case of distinguishing whether a random walk probability is $0$ or greater than $ε$. We achieve these results by giving new reductions between powering Eulerian random-walk matrices and inverting Eulerian Laplacian matrices, providing a new notion of spectral approximation for Eulerian graphs that is preserved under powering, and giving the first deterministic $\tilde{O}(\log N)$-space algorithm for inverting Eulerian Laplacian matrices. The latter algorithm builds on the work of Murtagh et al. (FOCS 17) that gave a deterministic $\tilde{O}(\log N)$-space algorithm for inverting undirected Laplacian matrices, and the work of Cohen et al. (FOCS 19) that gave a randomized $\tilde{O}(N)$-time algorithm for inverting Eulerian Laplacian matrices. A running theme throughout these contributions is an analysis of "cycle-lifted graphs", where we take a graph and "lift" it to a new graph whose adjacency matrix is the tensor product of the original adjacency matrix and a directed cycle (or variants of one).
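
The abstract defines a cycle-lifted graph through its adjacency matrix: the tensor (Kronecker) product of the original adjacency matrix with that of a directed cycle. A minimal Python sketch of just that construction, using dense list-of-rows matrices for readability; the helper names are hypothetical and this is not the paper's space-bounded algorithm.

```python
def directed_cycle(k):
    """Adjacency matrix of the directed cycle 0 -> 1 -> ... -> k-1 -> 0."""
    return [[1 if j == (i + 1) % k else 0 for j in range(k)] for i in range(k)]


def kron(a, b):
    """Kronecker (tensor) product of two dense matrices stored as lists of rows."""
    p, q = len(b), len(b[0])
    return [
        [a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
        for i in range(len(a) * p)
    ]


def cycle_lift(adj, k):
    """Adjacency matrix of the graph lifted by a directed k-cycle (hypothetical helper).

    Vertex (u, i) is stored at index u * k + i, and each original edge u -> v
    becomes the k edges (u, i) -> (v, (i + 1) % k).
    """
    return kron(adj, directed_cycle(k))
```

For example, lifting the single undirected edge `[[0, 1], [1, 0]]` by a 3-cycle gives a 6-vertex graph with 6 edges, in which vertex 0 (that is, $(0, 0)$) points only to vertex 4 (that is, $(1, 1)$).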

preprint · 2022 · arXiv

Improved Lower Bounds for Submodular Function Minimization

We provide a generic technique for constructing families of submodular functions to obtain lower bounds for submodular function minimization (SFM). Applying this technique, we prove that any deterministic SFM algorithm on a ground set of $n$ elements requires at least $Ω(n \log n)$ queries to an evaluation oracle. This is the first super-linear query complexity lower bound for SFM and improves upon the previous best lower bound of $2n$ given by [Graur et al., ITCS 2020]. Using our construction, we also prove that any (possibly randomized) parallel SFM algorithm, which can make up to $\mathsf{poly}(n)$ queries per round, requires at least $Ω(n / \log n)$ rounds to minimize a submodular function. This improves upon the previous best lower bound of $\tilde{Ω}(n^{1/3})$ rounds due to [Chakrabarty et al., FOCS 2021], and settles the parallel complexity of query-efficient SFM up to logarithmic factors due to a recent advance in [Jiang, SODA 2021].
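
To make the evaluation-oracle query model concrete, here is a brute-force baseline in Python: it wraps a submodular function (a graph cut plus a modular term, chosen only as an example) in a query counter and minimizes it by enumerating all $2^n$ subsets. All names are hypothetical; the sketch only illustrates what a "query" is and is unrelated to the paper's lower-bound construction.

```python
from itertools import combinations


def cut_plus_modular(edges, weights):
    """f(S) = (# edges crossing S) + sum of weights in S; submodular for any weights."""
    def f(s):
        cut = sum(1 for u, v in edges if (u in s) != (v in s))
        return cut + sum(weights[i] for i in s)
    return f


class CountingOracle:
    """Evaluation oracle that records how many times it is queried."""

    def __init__(self, f):
        self.f = f
        self.queries = 0

    def __call__(self, s):
        self.queries += 1
        return self.f(frozenset(s))


def subsets(n):
    for r in range(n + 1):
        for combo in combinations(range(n), r):
            yield frozenset(combo)


def is_submodular(f, n):
    """Check f(S) + f(T) >= f(S | T) + f(S & T) for every pair of subsets."""
    all_sets = list(subsets(n))
    return all(f(s) + f(t) >= f(s | t) + f(s & t) for s in all_sets for t in all_sets)


def brute_force_sfm(oracle, n):
    """Exact minimization using exactly 2^n oracle queries."""
    best_set, best_val = None, float("inf")
    for s in subsets(n):
        val = oracle(s)
        if val < best_val:
            best_set, best_val = s, val
    return best_set, best_val
```

On the path $0 - 1 - 2$ with weights $(-2, 0.5, -2)$, the minimizer is the full ground set with value $-3.5$, found after $2^3 = 8$ queries; the lower bounds above concern how far below this exponential count any algorithm can go.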

preprint · 2022 · arXiv

RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

The accelerated proximal point algorithm (APPA), also known as "Catalyst", is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization). This reduction is conceptually elegant and yields strong convergence rate guarantees. However, these rates feature an extraneous logarithmic term arising from the need to compute each proximal point to high accuracy. In this work, we propose a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high accuracy subproblem solutions. We apply RECAPP to two canonical problems: finite-sum and max-structured minimization. For finite-sum problems, we match the best known complexity, previously obtained by carefully-designed problem-specific algorithms. For minimizing $\max_y f(x,y)$ where $f$ is convex in $x$ and strongly-concave in $y$, we improve on the best known (Catalyst-based) bound by a logarithmic factor.

preprint · 2022 · arXiv

Semi-Random Sparse Recovery in Nearly-Linear Time

Sparse recovery is one of the most fundamental and well-studied inverse problems. Standard statistical formulations of the problem are provably solved by general convex programming techniques and more practical, fast (nearly-linear time) iterative methods. However, these latter "fast algorithms" have previously been observed to be brittle in various real-world settings. We investigate the brittleness of fast sparse recovery algorithms to generative model changes through the lens of studying their robustness to a "helpful" semi-random adversary, a framework which tests whether an algorithm overfits to input assumptions. We consider the following basic model: let $\mathbf{A} \in \mathbb{R}^{n \times d}$ be a measurement matrix which contains an unknown subset of rows $\mathbf{G} \in \mathbb{R}^{m \times d}$ which are bounded and satisfy the restricted isometry property (RIP), but is otherwise arbitrary. Letting $x^\star \in \mathbb{R}^d$ be $s$-sparse, and given either exact measurements $b = \mathbf{A} x^\star$ or noisy measurements $b = \mathbf{A} x^\star + ξ$, we design algorithms recovering $x^\star$ information-theoretically optimally in nearly-linear time. We extend our algorithm to hold for weaker generative models relaxing our planted RIP assumption to a natural weighted variant, and show that our method's guarantees naturally interpolate the quality of the measurement matrix to, in some parameter regimes, run in sublinear time. Our approach differs from prior fast iterative methods with provable guarantees under semi-random generative models: natural conditions on a submatrix which make sparse recovery tractable are NP-hard to verify. We design a new iterative method tailored to the geometry of sparse recovery which is provably robust to our semi-random model. We hope our approach opens the door to new robust, efficient algorithms for natural statistical inverse problems.

preprint · 2022 · arXiv

Sharper Rates for Separable Minimax and Finite Sum Optimization via Primal-Dual Extragradient Methods

We design accelerated algorithms with improved rates for several fundamental classes of optimization problems. Our algorithms all build upon techniques related to the analysis of primal-dual extragradient methods via relative Lipschitzness proposed recently by [CST21]. (1) Separable minimax optimization. We study separable minimax optimization problems $\min_x \max_y f(x) - g(y) + h(x, y)$, where $f$ and $g$ have smoothness and strong convexity parameters $(L^x, μ^x)$, $(L^y, μ^y)$, and $h$ is convex-concave with a $(Λ^{xx}, Λ^{xy}, Λ^{yy})$-blockwise operator norm bounded Hessian. We provide an algorithm with gradient query complexity $\tilde{O}\left(\sqrt{\frac{L^{x}}{μ^{x}}} + \sqrt{\frac{L^{y}}{μ^{y}}} + \frac{Λ^{xx}}{μ^{x}} + \frac{Λ^{xy}}{\sqrt{μ^{x}μ^{y}}} + \frac{Λ^{yy}}{μ^{y}}\right)$. Notably, for convex-concave minimax problems with bilinear coupling (e.g., quadratics), where $Λ^{xx} = Λ^{yy} = 0$, our rate matches a lower bound of [ZHZ19]. (2) Finite sum optimization. We study finite sum optimization problems $\min_x \frac{1}{n}\sum_{i\in[n]} f_i(x)$, where each $f_i$ is $L_i$-smooth and the overall problem is $μ$-strongly convex. We provide an algorithm with gradient query complexity $\tilde{O}\left(n + \sum_{i\in[n]} \sqrt{\frac{L_i}{nμ}} \right)$. Notably, when the smoothness bounds $\{L_i\}_{i\in[n]}$ are non-uniform, our rate improves upon accelerated SVRG [LMH15, FGKS15] and Katyusha [All17] by up to a $\sqrt{n}$ factor. (3) Minimax finite sums. We generalize our algorithms for minimax and finite sum optimization to solve a natural family of minimax finite sum optimization problems at an accelerated rate, encapsulating both above results up to a logarithmic factor.
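
For intuition about the extragradient template these methods build on, a minimal scalar Python sketch, assuming the toy objective $L(x, y) = \frac{μ^x}{2}x^2 - \frac{μ^y}{2}y^2 + a\,xy$ (an instance of the separable form with bilinear coupling $h(x, y) = a\,xy$). It runs the plain, unaccelerated extragradient step and, for contrast, simultaneous gradient descent-ascent; the step size is an assumption, and this is not the paper's accelerated algorithm.

```python
def field(x, y, mu_x, mu_y, a):
    """Monotone operator (dL/dx, -dL/dy) for L(x, y) = mu_x/2 x^2 - mu_y/2 y^2 + a x y."""
    return mu_x * x + a * y, mu_y * y - a * x


def extragradient(x, y, eta, steps, mu_x=0.0, mu_y=0.0, a=1.0):
    for _ in range(steps):
        gx, gy = field(x, y, mu_x, mu_y, a)
        x_half, y_half = x - eta * gx, y - eta * gy  # extrapolation (look-ahead) step
        gx, gy = field(x_half, y_half, mu_x, mu_y, a)
        x, y = x - eta * gx, y - eta * gy  # update from the original point
    return x, y


def gda(x, y, eta, steps, mu_x=0.0, mu_y=0.0, a=1.0):
    """Simultaneous gradient descent on x and ascent on y, for comparison."""
    for _ in range(steps):
        gx, gy = field(x, y, mu_x, mu_y, a)
        x, y = x - eta * gx, y - eta * gy
    return x, y
```

With $μ^x = μ^y = 0$ (pure bilinear coupling), each gradient descent-ascent step multiplies the distance to the saddle point $(0, 0)$ by $\sqrt{1 + η^2a^2}$, so it spirals outward, while each extragradient step multiplies it by $\sqrt{1 - η^2a^2 + η^4a^4} < 1$ whenever $0 < ηa < 1$.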

preprint · 2021 · arXiv

Complexity of Highly Parallel Non-Smooth Convex Optimization

A landmark result of non-smooth convex optimization is that gradient descent is an optimal algorithm whenever the number of computed gradients is smaller than the dimension $d$. In this paper we study the extension of this result to the parallel optimization setting. Namely, we consider optimization algorithms interacting with a highly parallel gradient oracle, that is, one that can answer $\mathrm{poly}(d)$ gradient queries in parallel. We show that in this case gradient descent is optimal only up to $\tilde{O}(\sqrt{d})$ rounds of interactions with the oracle. The lower bound improves upon a decades-old construction by Nemirovski which proves optimality only up to $d^{1/3}$ rounds (as recently observed by Balkanski and Singer), and the suboptimality of gradient descent after $\sqrt{d}$ rounds was already observed by Duchi, Bartlett and Wainwright. In the latter regime we propose a new method with improved complexity, which we conjecture to be optimal. The analysis of this new method is based upon a generalized version of the recent results on optimal acceleration for highly smooth convex optimization.

preprint · 2020 · arXiv

A General Framework for Symmetric Property Estimation

In this paper we provide a general framework for estimating symmetric properties of distributions from i.i.d. samples. For a broad class of symmetric properties we identify the easy region where empirical estimation works and the difficult region where more complex estimators are required. We show that by approximately computing the profile maximum likelihood (PML) distribution [ADOS16] in this difficult region we obtain a symmetric property estimation framework that is sample complexity optimal for many properties in a broader parameter regime than previous universal estimation approaches based on PML. The resulting algorithms based on these pseudo PML distributions are also more practical.
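
The "easy region" above is where the empirical (plug-in) estimator suffices. Because a symmetric property depends on a distribution only through its multiset of probabilities, the plug-in estimate can be computed from the sample's profile alone. A minimal Python sketch for Shannon entropy in nats, chosen here only as an example property; the function names are hypothetical and the sketch does not compute a PML or pseudo-PML distribution.

```python
import math
from collections import Counter


def sample_profile(samples):
    """Map each multiplicity f to the number of distinct symbols seen exactly f times."""
    return dict(Counter(Counter(samples).values()))


def plug_in_entropy(samples):
    """Empirical Shannon entropy (nats), computed from the profile only."""
    n = len(samples)
    return -sum(count * (f / n) * math.log(f / n)
                for f, count in sample_profile(samples).items())
```

Relabeling the symbols (for example `"aabbc"` versus `"xxyyz"`) leaves the profile, and hence the estimate, unchanged.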

preprint · 2020 · arXiv

Acceleration with a Ball Optimization Oracle

Consider an oracle which takes a point $x$ and returns the minimizer of a convex function $f$ in an $\ell_2$ ball of radius $r$ around $x$. It is straightforward to show that roughly $r^{-1}\log\frac{1}ε$ calls to the oracle suffice to find an $ε$-approximate minimizer of $f$ in an $\ell_2$ unit ball. Perhaps surprisingly, this is not optimal: we design an accelerated algorithm which attains an $ε$-approximate minimizer with roughly $r^{-2/3} \log \frac{1}ε$ oracle queries, and give a matching lower bound. Further, we implement ball optimization oracles for functions with locally stable Hessians using a variant of Newton's method. The resulting algorithm applies to a number of problems of practical and theoretical import, improving upon previous results for logistic and $\ell_\infty$ regression and achieving guarantees comparable to the state-of-the-art for $\ell_p$ regression.

preprint · 2020 · arXiv

Constant Girth Approximation for Directed Graphs in Subquadratic Time

In this paper we provide a $\tilde{O}(m\sqrt{n})$ time algorithm that computes a $3$-multiplicative approximation of the girth of an $n$-node $m$-edge directed graph with non-negative edge lengths. This is the first algorithm which approximates the girth of a directed graph up to a constant multiplicative factor faster than All-Pairs Shortest Paths (APSP) time, i.e. $O(mn)$. Additionally, for any integer $k \ge 1$, we provide a deterministic algorithm for an $O(k\log\log n)$-multiplicative approximation to the girth in directed graphs in $\tilde{O}(m^{1+1/k})$ time. Combining the techniques from these two results gives us an algorithm for an $O(k\log k)$-multiplicative approximation to the girth in directed graphs in $\tilde{O}(m^{1+1/k})$ time. Our results naturally also provide algorithms for improved constructions of roundtrip spanners, the analog of spanners in directed graphs. The previous fastest algorithms for these problems either ran in All-Pairs Shortest Paths (APSP) time, i.e. $O(mn)$, or were due to Pachocki et al. (PRSTV18), who provided a randomized algorithm that for any integer $k \ge 1$ in time $\tilde{O}(m^{1+1/k})$ computed with high probability an $O(k\log n)$ multiplicative approximation of the girth. Our first algorithm constitutes the first sub-APSP-time algorithm for approximating the girth to constant accuracy, our second removes the need for randomness and improves the approximation factor in Pachocki et al. (PRSTV18), and our third is the first time versus quality trade-off for obtaining constant approximations.
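
For reference, the exact APSP-style baseline behind the $O(mn)$ comparison can be sketched as one Dijkstra run per source: the shortest cycle through $s$ is a shortest path from $s$ to some $u$ plus an edge $(u, s)$. A minimal Python sketch running in $O(mn \log n)$ time with a binary heap; the edge-list format is an assumption, and this is not the paper's approximation algorithm.

```python
import heapq
import math


def dijkstra(adj, s):
    """Shortest-path distances from s over non-negative edge lengths."""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def girth(n, edges):
    """Exact girth of a directed graph given as (u, v, length) triples; inf if acyclic."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
    best = math.inf
    for s in range(n):
        dist = dijkstra(adj, s)
        for u, v, w in edges:
            if v == s and u in dist:  # close a cycle s ~> u -> s
                best = min(best, dist[u] + w)
    return best
```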

preprint · 2020 · arXiv

Coordinate Methods for Accelerating $\ell_\infty$ Regression and Faster Approximate Maximum Flow

We provide faster algorithms for approximately solving $\ell_{\infty}$ regression, a fundamental problem prevalent in both combinatorial and continuous optimization. In particular, we provide accelerated coordinate descent methods capable of provably exploiting dynamic measures of coordinate smoothness, and apply them to $\ell_\infty$ regression over a box to give algorithms which converge in $k$ iterations at a $O(1/k)$ rate. Our algorithms can be viewed as an alternative approach to the recent breakthrough result of Sherman [She17] which achieves a similar runtime improvement over classic algorithmic approaches, i.e. smoothing and gradient descent, which either converge at a $O(1/\sqrt{k})$ rate or have running times with a worse dependence on problem parameters. Our runtimes match those of [She17] across a broad range of parameters and achieve improvement in certain structured cases. We demonstrate the efficacy of our result by providing faster algorithms for the well-studied maximum flow problem. Directly leveraging our accelerated $\ell_\infty$ regression algorithms implies a $\tilde{O}\left(m + \sqrt{mn}/ε\right)$ runtime to compute an $ε$-approximate maximum flow for an undirected graph with $m$ edges and $n$ vertices, generically improving upon the previous best known runtime of $\tilde{O}\left(m/ε\right)$ in [She17] whenever the graph is slightly dense. We further design an algorithm adapted to the structure of the regression problem induced by maximum flow obtaining a runtime of $\tilde{O}\left(m + \max(n, \sqrt{ns})/ε\right)$, where $s$ is the squared $\ell_2$ norm of the congestion of any optimal flow. Moreover, we show how to leverage this result to achieve improved exact algorithms for maximum flow on a variety of unit capacity graphs. We hope that our work serves as an important step towards achieving even faster maximum flow algorithms.

preprint · 2020 · arXiv

Coordinate Methods for Matrix Games

We develop primal-dual coordinate methods for solving bilinear saddle-point problems of the form $\min_{x \in \mathcal{X}} \max_{y\in\mathcal{Y}} y^\top A x$ which contain linear programming, classification, and regression as special cases. Our methods push existing fully stochastic sublinear methods and variance-reduced methods towards their limits in terms of per-iteration complexity and sample complexity. We obtain nearly-constant per-iteration complexity by designing efficient data structures leveraging Taylor approximations to the exponential and a binomial heap. We improve sample complexity via low-variance gradient estimators using dynamic sampling distributions that depend on both the iterates and the magnitude of the matrix entries. Our runtime bounds improve upon those of existing primal-dual methods by a factor depending on sparsity measures of the $m$ by $n$ matrix $A$. For example, when rows and columns have constant $\ell_1/\ell_2$ norm ratios, we offer improvements by a factor of $m+n$ in the fully stochastic setting and $\sqrt{m+n}$ in the variance-reduced setting. We apply our methods to computational geometry problems, i.e. minimum enclosing ball, maximum inscribed ball, and linear regression, and obtain improved complexity bounds. For linear regression with an elementwise nonnegative matrix, our guarantees improve on exact gradient methods by a factor of $\sqrt{\mathrm{nnz}(A)/(m+n)}$.
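
As a baseline for the bilinear game $\min_{x} \max_{y} y^\top A x$ over two probability simplices, a minimal Python sketch of simultaneous multiplicative-weights (Hedge) self-play, whose averaged iterates approach an equilibrium at roughly an $O(1/\sqrt{T})$ rate when the entries of $A$ lie in $[-1, 1]$. The step size, iteration count and function names are illustrative assumptions; this is a textbook method, not the paper's coordinate methods or data structures. The returned strategies can be checked with the duality gap.

```python
import math


def softmax(scores):
    top = max(scores)  # shift for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mat_vec(A, x):
    """Compute A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def vec_mat(y, A):
    """Compute A^T y."""
    return [sum(y_i * row[j] for y_i, row in zip(y, A)) for j in range(len(A[0]))]


def solve_matrix_game(A, T=2000, eta=0.05):
    """Hedge self-play for min_x max_y y^T A x; returns the averaged strategies."""
    m, n = len(A), len(A[0])
    cum_loss_x = [0.0] * n  # running sum of A^T y_t: losses of the minimizing player
    cum_gain_y = [0.0] * m  # running sum of A x_t: gains of the maximizing player
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(T):
        x = softmax([-eta * c for c in cum_loss_x])
        y = softmax([eta * c for c in cum_gain_y])
        avg_x = [s + xi / T for s, xi in zip(avg_x, x)]
        avg_y = [s + yi / T for s, yi in zip(avg_y, y)]
        cum_loss_x = [c + loss for c, loss in zip(cum_loss_x, vec_mat(y, A))]
        cum_gain_y = [c + gain for c, gain in zip(cum_gain_y, mat_vec(A, x))]
    return avg_x, avg_y


def duality_gap(A, x, y):
    """max_i (A x)_i - min_j (A^T y)_j; zero exactly at a Nash equilibrium."""
    return max(mat_vec(A, x)) - min(vec_mat(y, A))
```

The gap of the averaged strategies is at most the sum of the two players' average regrets, which is why a no-regret update like Hedge suffices here; the paper's contribution is making each iteration and each gradient estimate far cheaper.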

preprint · 2020 · arXiv

Efficiently Solving MDPs with Stochastic Mirror Descent

We present a unified framework based on primal-dual stochastic mirror descent for approximately solving infinite-horizon Markov decision processes (MDPs) given a generative model. When applied to an average-reward MDP with $A_{tot}$ total state-action pairs and mixing time bound $t_{mix}$ our method computes an $ε$-optimal policy with an expected $\widetilde{O}(t_{mix}^2 A_{tot} ε^{-2})$ samples from the state-transition matrix, removing the ergodicity dependence of prior art. When applied to a $γ$-discounted MDP with $A_{tot}$ total state-action pairs our method computes an $ε$-optimal policy with an expected $\widetilde{O}((1-γ)^{-4} A_{tot} ε^{-2})$ samples, matching the previous state-of-the-art up to a $(1-γ)^{-1}$ factor. Both methods are model-free, update state values and policies simultaneously, and run in time linear in the number of samples taken. We achieve these results through a more general stochastic mirror descent framework for solving bilinear saddle-point problems with simplex and box domains and we demonstrate the flexibility of this framework by providing further applications to constrained MDPs.

preprint · 2020 · arXiv

Faster Divergence Maximization for Faster Maximum Flow

In this paper we provide an algorithm which, given any $m$-edge $n$-vertex directed graph with integer capacities at most $U$, computes a maximum $s$-$t$ flow for any vertices $s$ and $t$ in $m^{4/3+o(1)}U^{1/3}$ time. This improves upon the previous best running times of $m^{11/8+o(1)}U^{1/4}$ (Liu Sidford 2019), $\tilde{O}(m \sqrt{n} \log U)$ (Lee Sidford 2014), and $O(mn)$ (Orlin 2013) when the graph is not too dense or has large capacities. To achieve the results of this paper, we build upon previous algorithmic approaches to maximum flow based on interior point methods (IPMs). In particular, we overcome a key bottleneck of previous advances in IPMs for maxflow (Mądry 2013, Mądry 2016, Liu Sidford 2019), which make progress by maximizing the energy of local $\ell_2$ norm minimizing electric flows. We generalize this approach and instead maximize the divergence of flows which minimize the Bregman divergence distance with respect to the weighted logarithmic barrier. This allows our algorithm to avoid dependencies on the $\ell_4$ norm that appear in other IPM frameworks (e.g. Cohen Mądry Sankowski Vladu 2017, Axiotis Mądry Vladu 2020). Further, we show that smoothed $\ell_2$-$\ell_p$ flows (Kyng, Peng, Sachdeva, Wang 2019), which we previously used to efficiently maximize energy (Liu Sidford 2019), can also be used to efficiently maximize divergence, thereby yielding our desired runtimes. We believe both this generalized view of energy maximization and the generalized flow solvers we develop may be of further interest.

preprint · 2020 · arXiv

Solving Linear Programs with Sqrt(rank) Linear System Solves

We present an algorithm that, given a linear program with $n$ variables, $m$ constraints, and constraint matrix $A$, computes an $ε$-approximate solution in $\tilde{O}(\sqrt{rank(A)}\log(1/ε))$ iterations with high probability. Each iteration of our method consists of solving $\tilde{O}(1)$ linear systems and additional nearly linear time computation, improving by a factor of $\tilde{Ω}((m/rank(A))^{1/2})$ over the previous fastest method with this iteration cost due to Renegar (1988). Further, we provide a deterministic polynomial time computable $\tilde{O}(rank(A))$-self-concordant barrier function for the polytope, resolving an open question of Nesterov and Nemirovski (1994) on the theory of "universal barriers" for interior point methods. Applying our techniques to the linear program formulation of maximum flow yields an $\tilde{O}(|E|\sqrt{|V|}\log(U))$ time algorithm for solving the maximum flow problem on directed graphs with $|E|$ edges, $|V|$ vertices, and integer capacities of size at most $U$. This improves upon the previous fastest polynomial running time of $O(|E|\min\{|E|^{1/2},|V|^{2/3}\}\log(|V|^{2}/|E|)\log(U))$ achieved by Goldberg and Rao (1998). In the special case of solving dense directed unit capacity graphs our algorithm improves upon the previous fastest running times of $O(|E|\min\{|E|^{1/2},|V|^{2/3}\})$ achieved by Even and Tarjan (1975) and Karzanov (1973) and of $\tilde{O}(|E|^{10/7})$ achieved more recently by Mądry (2013).

preprint · 2020 · arXiv

The Bethe and Sinkhorn Permanents of Low Rank Matrices and Implications for Profile Maximum Likelihood

In this paper we consider the problem of computing the likelihood of the profile of a discrete distribution, i.e., the probability of observing the multiset of element frequencies, and computing a profile maximum likelihood (PML) distribution, i.e., a distribution with the maximum profile likelihood. For each problem we provide polynomial time algorithms that, given $n$ i.i.d. samples from a discrete distribution, achieve an approximation factor of $\exp\left(-O(\sqrt{n} \log n) \right)$, improving upon the previous best-known bound achievable in polynomial time of $\exp(-O(n^{2/3} \log n))$ (Charikar, Shiragur and Sidford, 2019). Through the work of Acharya, Das, Orlitsky and Suresh (2016), this implies a polynomial time universal estimator for symmetric properties of discrete distributions in a broader range of error parameters. We achieve these results by providing new bounds on the quality of approximation of the Bethe and Sinkhorn permanents (Vontobel, 2012 and 2014). We show that each of these are $\exp(O(k \log(N/k)))$ approximations to the permanent of $N \times N$ matrices with non-negative rank at most $k$, improving upon the previous known bounds of $\exp(O(N))$. To obtain our results on PML, we exploit the fact that the PML objective is proportional to the permanent of a certain Vandermonde matrix with $\sqrt{n}$ distinct columns, i.e. with non-negative rank at most $\sqrt{n}$. As a by-product of our work we establish a surprising connection between the convex relaxation in prior work (CSS19) and the well-studied Bethe and Sinkhorn approximations.
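
The profile and its likelihood are easy to state in code even though maximizing the likelihood is hard. A minimal Python sketch: the profile records, for each multiplicity, how many distinct symbols occur that often, and the profile likelihood of a distribution $p$ on $k$ symbols is computed here by brute-force enumeration of all $k^n$ sequences. The function names are hypothetical, the enumeration is only feasible for tiny examples, and this is not the paper's Bethe/Sinkhorn-permanent approach.

```python
from collections import Counter
from itertools import product


def sample_profile(samples):
    """Map each multiplicity f to the number of distinct symbols seen exactly f times."""
    return dict(Counter(Counter(samples).values()))


def profile_likelihood(p, target, n):
    """P(profile of n i.i.d. draws from p equals target), by brute force over k^n sequences."""
    total = 0.0
    for seq in product(range(len(p)), repeat=n):
        if sample_profile(seq) == target:
            prob = 1.0
            for symbol in seq:
                prob *= p[symbol]
            total += prob
    return total
```

For instance, two draws from the uniform distribution on three symbols have profile `{2: 1}` (one symbol seen twice) with probability $1/3$.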

preprint · 2020 · arXiv

Towards Optimal Running Times for Optimal Transport

In this work, we provide faster algorithms for approximating the optimal transport distance, e.g. earth mover's distance, between two discrete probability distributions $μ, ν\in Δ^n$. Given a cost function $C : [n] \times [n] \to \mathbb{R}_{\geq 0}$ where $C(i,j) \leq 1$ quantifies the penalty of transporting a unit of mass from $i$ to $j$, we show how to compute a coupling $X$ between $μ$ and $ν$ in time $\widetilde{O}\left(n^2 /ε\right)$ whose expected transportation cost is within an additive $ε$ of optimal. This improves upon the previously best known running time for this problem of $\widetilde{O}\left(\min\left\{ n^{9/4}/ε, n^2/ε^2 \right\}\right)$. We achieve our results by providing reductions from optimal transport to canonical optimization problems for which recent algorithmic efforts have provided nearly-linear time algorithms. Leveraging nearly linear time algorithms for solving packing linear programs and for solving the matrix balancing problem, we obtain two separate proofs of our stated running time. Further, one of our algorithms is easily parallelized and can be implemented with depth $\widetilde{O}(1/ε)$. Moreover, we show that further algorithmic improvements to our result would be surprising in the sense that any improvement would yield an $o(n^{2.5})$ algorithm for maximum cardinality bipartite matching, for which currently the only known algorithms for achieving such a result are based on fast matrix multiplication.
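
A coupling $X$ of $μ$ and $ν$ is a nonnegative $n \times n$ matrix with row sums $μ$ and column sums $ν$, and its cost is $\langle C, X \rangle = \sum_{i,j} C(i,j) X_{ij}$. As one concrete connection to the matrix-scaling problems mentioned above, a minimal Python sketch of Sinkhorn scaling for entropically regularized transport: alternately rescale the rows and columns of $K_{ij} = \exp(-C(i,j)/η)$ until the marginals match. The regularization $η$ (`reg`), the iteration count and the function names are illustrative assumptions; this is a standard heuristic, not the paper's reduction or its $\widetilde{O}(n^2/ε)$ guarantee.

```python
import math


def sinkhorn(mu, nu, cost, reg=0.5, iters=500):
    """Entropic OT coupling X = diag(u) K diag(v) with K = exp(-cost / reg)."""
    n, m = len(mu), len(nu)
    k = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows so that the row sums equal mu.
        u = [mu[i] / sum(k[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums equal nu.
        v = [nu[j] / sum(k[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * k[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(cost, plan):
    """<C, X> = sum over i, j of C(i, j) * X[i][j]."""
    return sum(c * x for crow, xrow in zip(cost, plan) for c, x in zip(crow, xrow))
```

With $μ = (0.7, 0.3)$, $ν = (0.4, 0.6)$ and $C$ equal to $0$ on the diagonal and $1$ off it, the exact optimum is $0.3$; the regularized plan returned here costs slightly more (about $0.314$ for `reg=0.5`), and shrinking `reg` trades accuracy for slower scaling.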

preprint · 2016 · arXiv

Almost-Linear-Time Algorithms for Markov Chains and New Spectral Primitives for Directed Graphs

In this paper we introduce a notion of spectral approximation for directed graphs. While there are many potential ways one might define approximation for directed graphs, most of them are too strong to allow sparse approximations in general. In contrast, we prove that for our notion of approximation, such sparsifiers do exist, and we show how to compute them in almost linear time. Using this notion of approximation, we provide a general framework for solving asymmetric linear systems that is broadly inspired by the work of [Peng-Spielman, STOC'14]. Applying this framework in conjunction with our sparsification algorithm, we obtain an almost linear time algorithm for solving directed Laplacian systems associated with Eulerian graphs. Using this solver in the recent framework of [Cohen-Kelner-Peebles-Peng-Sidford-Vladu, FOCS'16], we obtain almost linear time algorithms for solving a directed Laplacian linear system, computing the stationary distribution of a Markov chain, computing expected commute times in a directed graph, and more. For each of these problems, our algorithms improve the previous best running times of $O((nm^{3/4} + n^{2/3} m) \log^{O(1)} (n κε^{-1}))$ to $O((m + n 2^{O(\sqrt{\log{n}\log\log{n}})}) \log^{O(1)} (n κε^{-1}))$ where $n$ is the number of vertices in the graph, $m$ is the number of edges, $κ$ is a natural condition number associated with the problem, and $ε$ is the desired accuracy. We hope these results open the door for further studies into directed spectral graph theory, and will serve as a stepping stone for designing a new generation of fast algorithms for directed graphs.

preprint · 2016 · arXiv

Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis

This paper considers the problem of canonical-correlation analysis (CCA) (Hotelling, 1936) and, more broadly, the generalized eigenvector problem for a pair of symmetric matrices. These are two fundamental problems in data analysis and scientific computing with numerous applications in machine learning and statistics (Shi and Malik, 2000; Hardoon et al., 2004; Witten et al., 2009). We provide simple iterative algorithms, with improved runtimes, for solving these problems that are globally linearly convergent with moderate dependencies on the condition numbers and eigenvalue gaps of the matrices involved. We obtain our results by reducing CCA to the top-$k$ generalized eigenvector problem. We solve this problem through a general framework that simply requires black box access to an approximate linear system solver. Instantiating this framework with accelerated gradient descent we obtain a running time of $O(\frac{z k \sqrt{κ}}{ρ} \log(1/ε) \log(kκ/ρ))$ where $z$ is the total number of nonzero entries, $κ$ is the condition number and $ρ$ is the relative eigenvalue gap of the appropriate matrices. Our algorithm is linear in the input size and the number of components $k$ up to a $\log(k)$ factor. This is essential for handling large-scale matrices that appear in practice. To the best of our knowledge this is the first such algorithm with global linear convergence. We hope that our results prompt further research and ultimately improve the practical running time for performing these important data analysis procedures on large data sets.

preprint · 2016 · arXiv

Faster Algorithms for Computing the Stationary Distribution, Simulating Random Walks, and More

In this paper, we provide faster algorithms for computing various fundamental quantities associated with random walks on a directed graph, including the stationary distribution, personalized PageRank vectors, hitting times, and escape probabilities. In particular, on a directed graph with $n$ vertices and $m$ edges, we show how to compute each quantity in time $\tilde{O}(m^{3/4}n+mn^{2/3})$, where the $\tilde{O}$ notation suppresses polylogarithmic factors in $n$, the desired accuracy, and the appropriate condition number (i.e. the mixing time or restart probability). Our result improves upon the previous fastest running times for these problems; previous results either invoke a general purpose linear system solver on a $n\times n$ matrix with $m$ non-zero entries, or depend polynomially on the desired error or natural condition number associated with the problem (i.e. the mixing time or restart probability). For sparse graphs, we obtain a running time of $\tilde{O}(n^{7/4})$, breaking the $O(n^{2})$ barrier of the best running time one could hope to achieve using fast matrix multiplication. We achieve our result by providing a similar running time improvement for solving directed Laplacian systems, a natural directed or asymmetric analog of the well studied symmetric or undirected Laplacian systems. We show how to solve such systems in time $\tilde{O}(m^{3/4}n+mn^{2/3})$, and efficiently reduce a broad range of problems to solving $\tilde{O}(1)$ directed Laplacian systems on Eulerian graphs. We hope these results and our analysis open the door for further study into directed spectral graph theory.
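
For contrast with the bounds above, which depend only polylogarithmically on the condition number, the textbook way to approximate a stationary distribution is power iteration on the lazy walk $\frac{1}{2}(I + P)$, whose iteration count grows with the mixing time. A minimal Python sketch on a dense row-stochastic matrix $P$ (an assumption made for readability; the function name is hypothetical); it is not the paper's directed Laplacian solver.

```python
def stationary_distribution(p, iters=1000):
    """Power iteration pi <- pi (I + P) / 2 on a row-stochastic matrix P."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]  # pi P
        pi = [(a + b) / 2 for a, b in zip(pi, step)]  # lazy step avoids periodic oscillation
    return pi
```

For $P = \begin{pmatrix} 0.9 & 0.1 \\ 0.5 & 0.5 \end{pmatrix}$ the balance condition $0.1\,π_1 = 0.5\,π_2$ gives $π = (5/6, 1/6)$, and the error shrinks by the lazy chain's second eigenvalue, $0.7$, per iteration.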

preprint2016arXiv

Faster Eigenvector Computation via Shift-and-Invert Preconditioning

We give faster algorithms and improved sample complexities for estimating the top eigenvector of a matrix $Σ$, i.e., computing a unit vector $x$ such that $x^T Σx \ge (1-ε)λ_1(Σ)$. Offline Eigenvector Estimation: Given an explicit $A \in \mathbb{R}^{n \times d}$ with $Σ= A^TA$, we show how to compute an $ε$-approximate top eigenvector in time $\tilde O([nnz(A) + \frac{d \cdot sr(A)}{gap^2}] \cdot \log 1/ε)$ and $\tilde O([\frac{nnz(A)^{3/4} (d \cdot sr(A))^{1/4}}{\sqrt{gap}}] \cdot \log 1/ε)$. Here $nnz(A)$ is the number of nonzeros in $A$, $sr(A)$ is the stable rank, and $gap$ is the relative eigengap. By separating the $gap$ dependence from the $nnz(A)$ term, our first runtime improves upon the classical power and Lanczos methods. It also improves upon prior work using fast subspace embeddings [AC09, CW13] and stochastic optimization [Sha15c], giving significantly better dependencies on $sr(A)$ and $ε$. Our second running time improves these further when $nnz(A) \le \frac{d \cdot sr(A)}{gap^2}$. Online Eigenvector Estimation: Given a distribution $D$ with covariance matrix $Σ$ and a vector $x_0$ which is an $O(gap)$ approximate top eigenvector for $Σ$, we show how to refine it to an $ε$ approximation using $O(\frac{var(D)}{gap \cdot ε})$ samples from $D$. Here $var(D)$ is a natural notion of variance. Combining our algorithm with previous work to initialize $x_0$, we obtain improved sample complexity and runtime results under a variety of assumptions on $D$. We achieve our results using a general framework that we believe is of independent interest. We give a robust analysis of the classic method of shift-and-invert preconditioning to reduce eigenvector computation to approximately solving a sequence of linear systems. We then apply fast stochastic variance reduced gradient (SVRG) based system solvers to achieve our claims.
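The shift-and-invert reduction can be sketched as power iteration on $(λI - Σ)^{-1}$ for a shift $λ$ slightly above $λ_1(Σ)$. Each step is one linear system solve, and the per-step convergence ratio improves from $λ_2/λ_1$ to $(λ - λ_1)/(λ - λ_2)$. This minimal version assumes the shift is already known and uses an exact dense solve in place of the SVRG-based approximate solvers. The function name is hypothetical.

```python
import numpy as np

def shift_invert_top_eigvec(Sigma, shift, iters=100, seed=0):
    """Power iteration on (shift*I - Sigma)^{-1}; assumes shift > lambda_1(Sigma)."""
    n = Sigma.shape[0]
    M = shift * np.eye(n) - Sigma  # positive definite when shift > lambda_1
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = np.linalg.solve(M, x)  # one linear system solve per iteration
        x /= np.linalg.norm(x)
    return x
```

With eigenvalues 5, 4 and 1 and a shift of 5.25, each step shrinks the component along the second eigenvector by a factor of $0.25/1.25 = 0.2$. Plain power iteration would shrink it only by a factor of $4/5 = 0.8$.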

preprint2016arXiv

Geometric Median in Nearly Linear Time

In this paper we provide faster algorithms for solving the geometric median problem: given $n$ points in $\mathbb{R}^{d}$, compute a point that minimizes the sum of Euclidean distances to the points. This is one of the oldest non-trivial problems in computational geometry, yet despite an abundance of research the previous fastest algorithms for computing a $(1+ε)$-approximate geometric median were $O(d\cdot n^{4/3}ε^{-8/3})$ by Chin et al., $\tilde{O}(d\exp(ε^{-4}\log ε^{-1}))$ by Badoiu et al., $O(nd+\mathrm{poly}(d,ε^{-1}))$ by Feldman and Langberg, and $O((nd)^{O(1)}\log\frac{1}{ε})$ by Parrilo and Sturmfels and by Xue and Ye. In this paper we show how to compute a $(1+ε)$-approximate geometric median in time $O(nd\log^{3}\frac{1}{ε})$ and $O(dε^{-2})$. While our $O(dε^{-2})$ time algorithm is a fairly straightforward application of stochastic subgradient descent, our $O(nd\log^{3}\frac{1}{ε})$ time algorithm is a novel long step interior point method. To achieve this running time we start with a simple $O((nd)^{O(1)}\log\frac{1}{ε})$ time interior point method and show how to improve it, ultimately building an algorithm that is quite non-standard from the perspective of interior point literature. Our result is one of very few cases we are aware of outperforming traditional interior point theory, and the only one we are aware of that uses interior point methods to obtain a nearly linear time algorithm for a canonical optimization problem that traditionally requires superlinear time. We hope our work leads to further improvements in this line of research.
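The $O(dε^{-2})$ bound comes from stochastic subgradient descent. A minimal sketch follows. Each step samples one input point, moves along the unit-vector subgradient of $\|x - a_i\|$ with a step size proportional to $1/\sqrt{t}$, and the method returns the averaged iterate. The step-size constant, the centroid starting point and the function name are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def geometric_median_sgd(points, iters=50000, step=0.5, seed=0):
    """Averaged stochastic subgradient descent on f(x) = mean_i ||x - a_i||."""
    pts = np.asarray(points, dtype=float)
    n, d = pts.shape
    rng = np.random.default_rng(seed)
    x = pts.mean(axis=0)  # start from the centroid
    avg = np.zeros(d)
    for t in range(iters):
        a = pts[rng.integers(n)]  # unbiased sample of one term of f
        diff = x - a
        norm = np.linalg.norm(diff)
        g = diff / norm if norm > 0 else np.zeros(d)  # subgradient of ||x - a||
        x = x - step / np.sqrt(t + 1) * g
        avg += (x - avg) / (t + 1)  # running average of the iterates
    return avg
```

For the four corners of the square $[-1, 1]^2$, the geometric median is the origin, and the averaged iterate should land close to it.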

preprint2016arXiv

Robust Shift-and-Invert Preconditioning: Faster and More Sample Efficient Algorithms for Eigenvector Computation

We provide faster algorithms and improved sample complexities for approximating the top eigenvector of a matrix. Offline Setting: Given an $n \times d$ matrix $A$, we show how to compute an $ε$-approximate top eigenvector in time $\tilde O ( [nnz(A) + \frac{d \cdot sr(A)}{gap^2}]\cdot \log 1/ε)$ and $\tilde O([\frac{nnz(A)^{3/4} (d \cdot sr(A))^{1/4}}{\sqrt{gap}}]\cdot \log1/ε)$. Here $sr(A)$ is the stable rank and $gap$ is the multiplicative eigenvalue gap. By separating the $gap$ dependence from $nnz(A)$ we improve on the classic power and Lanczos methods. We also improve prior work using fast subspace embeddings and stochastic optimization, giving significantly improved dependencies on $sr(A)$ and $ε$. Our second running time improves this further when $nnz(A) \le \frac{d\cdot sr(A)}{gap^2}$. Online Setting: Given a distribution $D$ with covariance matrix $Σ$ and a vector $x_0$ which is an $O(gap)$ approximate top eigenvector for $Σ$, we show how to refine it to an $ε$ approximation using $\tilde O(\frac{v(D)}{gap^2} + \frac{v(D)}{gap \cdot ε})$ samples from $D$. Here $v(D)$ is a natural variance measure. Combining our algorithm with previous work to initialize $x_0$, we obtain a number of improved sample complexity and runtime results. For general distributions, we achieve asymptotically optimal accuracy as a function of sample size as the number of samples grows large. Our results center around a robust analysis of the classic method of shift-and-invert preconditioning to reduce eigenvector computation to approximately solving a sequence of linear systems. We then apply fast SVRG-based approximate system solvers to achieve our claims. We believe our results suggest the general effectiveness of shift-and-invert based approaches and imply that further computational gains may be reaped in practice.
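The runtime bounds above depend on the stable rank $sr(A)$ and on $gap$. The helpers below assume the standard definitions $sr(A) = \|A\|_F^2/\|A\|_2^2$ and $gap = (λ_1 - λ_2)/λ_1$ for $Σ = A^TA$, and compute both for small examples. The helper names are hypothetical.

```python
import numpy as np

def stable_rank(A):
    """sr(A) = ||A||_F^2 / ||A||_2^2, which always lies between 1 and rank(A)."""
    return np.linalg.norm(A, "fro") ** 2 / np.linalg.norm(A, 2) ** 2

def relative_gap(A):
    """(lambda_1 - lambda_2) / lambda_1 for Sigma = A^T A (needs at least 2 columns)."""
    lam = np.linalg.eigvalsh(A.T @ A)[::-1]  # eigenvalues in descending order
    return (lam[0] - lam[1]) / lam[0]
```

For $A = \mathrm{diag}(2, 1)$ these give $sr(A) = 5/4$ and $gap = 3/4$.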

preprint2016arXiv

Routing under Balance

We introduce the notion of balance for directed graphs: a weighted directed graph is $α$-balanced if for every cut $S \subseteq V$, the total weight of edges going from $S$ to $V\setminus S$ is within a factor $α$ of the total weight of edges going from $V\setminus S$ to $S$. Several important families of graphs are nearly balanced, in particular Eulerian graphs (with $α= 1$) and residual graphs of $(1+ε)$-approximate undirected maximum flows (with $α=O(1/ε)$). We use the notion of balance to give a more fine-grained understanding of several well-studied routing questions that are considerably harder in directed graphs. We first revisit oblivious routings in directed graphs. Our main algorithmic result is an oblivious routing scheme for single-source instances that achieves an $O(α\cdot \log^3 n / \log \log n)$ competitive ratio. In the process, we make several technical contributions which may be of independent interest. In particular, we give an efficient algorithm for computing low-radius decompositions of directed graphs parameterized by balance. We also define and construct low-stretch arborescences, a generalization of low-stretch spanning trees to directed graphs. On the negative side, we present new lower bounds for oblivious routing problems on directed graphs. We show that the competitive ratio of oblivious routing algorithms for directed graphs is $Ω(n)$ in general; this result improves upon the long-standing best known lower bound of $Ω(\sqrt{n})$ given by Hajiaghayi, Kleinberg, Leighton and Räcke in 2006. We also show that our restriction to single-source instances is necessary by showing an $Ω(\sqrt{n})$ lower bound for multiple-source oblivious routing in Eulerian graphs. We also give a fast algorithm for the maximum flow problem in balanced directed graphs.
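On tiny graphs, the definition of $α$-balance can be checked directly by enumerating every cut. This brute-force sketch takes time exponential in $n$ and only makes the definition concrete. The function name and the `(u, v, w)` edge-list format are assumptions.

```python
import math
from itertools import combinations

def balance(n, edges):
    """Smallest alpha such that the directed graph is alpha-balanced.

    edges is a list of (u, v, w) for a directed edge u -> v of weight w.
    Returns math.inf if some cut carries weight in only one direction.
    """
    alpha = 1.0
    for k in range(1, n):
        for subset in combinations(range(n), k):
            S = set(subset)
            out_w = sum(w for u, v, w in edges if u in S and v not in S)
            in_w = sum(w for u, v, w in edges if v in S and u not in S)
            if out_w == 0 and in_w == 0:
                continue  # no edges cross this cut
            if out_w == 0 or in_w == 0:
                return math.inf
            alpha = max(alpha, out_w / in_w, in_w / out_w)
    return alpha
```

The directed 3-cycle, which is Eulerian, gives $α = 1$, matching the abstract.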

preprint2016arXiv

Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm

This work provides improved guarantees for streaming principal component analysis (PCA). Given $A_1, \ldots, A_n\in \mathbb{R}^{d\times d}$ sampled independently from distributions satisfying $\mathbb{E}[A_i] = Σ$ for $Σ\succeq \mathbf{0}$, this work provides an $O(d)$-space linear-time single-pass streaming algorithm for estimating the top eigenvector of $Σ$. The algorithm nearly matches (and in certain cases improves upon) the accuracy obtained by the standard batch method that computes the top eigenvector of the empirical covariance $\frac{1}{n} \sum_{i \in [n]} A_i$ as analyzed by the matrix Bernstein inequality. Moreover, to achieve constant accuracy, our algorithm improves upon the best previously known sample complexities of streaming algorithms by either a multiplicative factor of $O(d)$ or $1/\mathrm{gap}$, where $\mathrm{gap}$ is the relative distance between the top two eigenvalues of $Σ$. These results are achieved through a novel analysis of the classic Oja's algorithm, one of the oldest and most popular algorithms for streaming PCA. In particular, this work shows that simply picking a random initial point $w_0$ and applying the update rule $w_{i + 1} = w_i + η_i A_i w_i$ suffices to accurately estimate the top eigenvector, with a suitable choice of $η_i$. We believe our result sheds light on how to efficiently perform streaming PCA both in theory and in practice, and we hope that our analysis may serve as the basis for analyzing many variants and extensions of streaming PCA.
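Oja's update rule from the abstract fits in a few lines. This minimal version assumes rank-one samples $A_i = x_i x_i^T$, so that $A_i w$ costs $O(d)$ time and space. It also uses a step size $η_i = 1/(i + 100)$ chosen for illustration, not the paper's schedule. Because the update is linear in $w$, normalizing after every step changes only the scale of the iterate, not its direction.

```python
import numpy as np

def oja(stream, d, eta, seed=0):
    """Oja's algorithm: w <- w + eta(i) * A_i w with A_i = x x^T."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)  # random initial point w_0
    w /= np.linalg.norm(w)
    for i, x in enumerate(stream):
        w = w + eta(i) * x * (x @ w)  # A_i w computed without forming A_i
        w /= np.linalg.norm(w)  # rescaling only, for numerical stability
    return w
```

For Gaussian samples with covariance $\mathrm{diag}(4, 1, 0.5)$, the returned vector should align closely with the first coordinate axis.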

preprint2016arXiv

Subquadratic Submodular Function Minimization

Submodular function minimization (SFM) is a fundamental discrete optimization problem which generalizes many well-known problems, has applications in various fields, and can be solved in polynomial time. Owing to applications in computer vision and machine learning, fast SFM algorithms are highly desirable. The current fastest algorithms [Lee, Sidford, Wong, FOCS 2015] run in $O(n^{2}\log nM\cdot\textrm{EO} +n^{3}\log^{O(1)}nM)$ time and $O(n^{3}\log^{2}n\cdot \textrm{EO} +n^{4}\log^{O(1)}n)$ time, respectively, where $M$ is the largest absolute value of the function (assuming the range is integers) and $\textrm{EO}$ is the time taken to evaluate the function on any set. Although the best known lower bound on the query complexity is only $Ω(n)$, the current shortest non-deterministic proof certifying the optimum value of a function requires $Θ(n^{2})$ function evaluations. The main contributions of this paper are subquadratic SFM algorithms. For integer-valued submodular functions, we give an SFM algorithm which runs in $O(nM^{3}\log n\cdot\textrm{EO})$ time, giving the first nearly linear time algorithm in any known regime. For real-valued submodular functions with range in $[-1,1]$, we give an algorithm which in $\tilde{O}(n^{5/3}\cdot\textrm{EO}/\varepsilon^{2})$ time returns an $\varepsilon$-additive approximate solution. At the heart of it, our algorithms are projected stochastic subgradient descent methods on the Lovász extension of submodular functions, where we crucially exploit submodularity and data structures to obtain fast, i.e., sublinear time, subgradient updates. The latter is crucial for beating the $n^{2}$ bound, as we show that algorithms which access only subgradients of the Lovász extension, and these include the theoretically best algorithms mentioned above, must make $Ω(n)$ subgradient calls (even for functions whose range is $\{-1,0,1\}$).
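The Lovász extension and one of its subgradients can be computed with the classic greedy rule. Sort the coordinates of $x$ in decreasing order, then take the marginal gains of $f$ along the resulting chain of sets. The sketch below assumes $f$ is a Python callable on frozensets with $f(\emptyset) = 0$. It spends $n$ evaluations per subgradient, whereas the paper uses data structures to make these updates sublinear.

```python
def lovasz_subgradient(f, x):
    """Return (f_L(x), g) where g is a subgradient of the Lovász extension at x.

    Assumes f maps frozensets of {0, ..., n-1} to numbers with f(frozenset()) == 0.
    """
    order = sorted(range(len(x)), key=lambda i: -x[i])  # decreasing x
    g = [0.0] * len(x)
    prefix = set()
    prev = f(frozenset())
    for i in order:
        prefix.add(i)
        cur = f(frozenset(prefix))
        g[i] = cur - prev  # marginal gain of adding element i
        prev = cur
    value = sum(gi * xi for gi, xi in zip(g, x))
    return value, g
```

On indicator vectors, $f_L$ agrees with $f$. For $f(S) = \min(|S|, 1)$, the extension is $\max_i x_i$.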

preprint2015arXiv

A Faster Cutting Plane Method and its Implications for Combinatorial and Convex Optimization

We improve upon the running time for finding a point in a convex set given a separation oracle. In particular, given a separation oracle for a convex set $K\subset \mathbb{R}^n$ contained in a box of radius $R$, we show how to either find a point in $K$ or prove that $K$ does not contain a ball of radius $ε$ using an expected $O(n\log(nR/ε))$ oracle evaluations and additional time $O(n^3\log^{O(1)}(nR/ε))$. This matches the oracle complexity and improves upon the $O(n^{ω+1}\log(nR/ε))$ additional time of the previous fastest algorithm achieved over 25 years ago by Vaidya for the current matrix multiplication constant $ω<2.373$ when $R/ε=n^{O(1)}$. Using a mix of standard reductions and new techniques, our algorithm yields improved runtimes for solving classic problems in continuous and combinatorial optimization:
- Submodular Minimization: Our weakly and strongly polynomial time algorithms have runtimes of $O(n^2\log nM\cdot\text{EO}+n^3\log^{O(1)}nM)$ and $O(n^3\log^2 n\cdot\text{EO}+n^4\log^{O(1)}n)$, improving upon the previous best of $O((n^4\text{EO}+n^5)\log M)$ and $O(n^5\text{EO}+n^6)$.
- Matroid Intersection: Our runtimes are $O(nrT_{\text{rank}}\log n\log (nM) +n^3\log^{O(1)}(nM))$ and $O(n^2\log (nM) T_{\text{ind}}+n^3 \log^{O(1)} (nM))$, achieving the first quadratic bound on the query complexity for the independence and rank oracles. In the unweighted case, this is the first improvement since 1986 for the independence oracle.
- Submodular Flow: Our runtime is $O(n^2\log nCU\cdot\text{EO}+n^3\log^{O(1)}nCU)$, improving upon the previous bests from 15 years ago roughly by a factor of $O(n^4)$.
- Semidefinite Programming: Our runtime is $\tilde{O}(n(n^2+m^ω+S))$, improving upon the previous best of $\tilde{O}(n(n^ω+m^ω+S))$ for the regime where the number of nonzeros $S$ is small.
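To make the separation-oracle model concrete, here is the classic ellipsoid method, a much simpler cutting plane method than the paper's, which improves on Vaidya's algorithm. The sketch makes several assumptions of its own: $n \ge 2$, $K$ lies in the Euclidean ball of radius $R$ around the origin, and the oracle returns `None` for points in $K$ and otherwise a vector $g$ with $K \subseteq \{x : g^T(x - c) \le 0\}$. The function name is hypothetical.

```python
import numpy as np

def ellipsoid_method(oracle, n, R, max_iter=10000):
    """Find a point of a convex set K in R^n via the ellipsoid method (n >= 2)."""
    c = np.zeros(n)         # center of the current ellipsoid
    P = R ** 2 * np.eye(n)  # ellipsoid {x : (x - c)^T P^{-1} (x - c) <= 1}
    for _ in range(max_iter):
        g = oracle(c)
        if g is None:
            return c        # the center lies in K
        g = np.asarray(g, dtype=float)
        Pg = P @ g
        b = Pg / np.sqrt(g @ Pg)
        # Smallest ellipsoid containing the half-ellipsoid {x : g^T (x - c) <= 0}.
        c = c - b / (n + 1)
        P = (n * n / (n * n - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(b, b))
    return None
```

For a ball $K$ with center $m$, the vector $g = c - m$ is a valid separating direction whenever $c \notin K$.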

preprint2015arXiv

Competing with the Empirical Risk Minimizer in a Single Pass

In many estimation problems, e.g. linear and logistic regression, we wish to minimize an unknown objective given only unbiased samples of the objective function. Furthermore, we aim to achieve this using as few samples as possible. In the absence of computational constraints, the minimizer of a sample average of observed data, commonly referred to as either the empirical risk minimizer (ERM) or the $M$-estimator, is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties. Our goal in this work is to perform as well as the ERM, on every problem, while minimizing the use of computational resources such as running time and space usage. We provide a simple streaming algorithm which, under standard regularity assumptions on the underlying problem, enjoys the following properties:
* The algorithm can be implemented in linear time with a single pass of the observed data, using space linear in the size of a single sample.
* The algorithm achieves the same statistical rate of convergence as the empirical risk minimizer on every problem, even considering constant factors.
* The algorithm's performance depends on the initial error at a rate that decreases super-polynomially.
* The algorithm is easily parallelizable.

Moreover, we quantify the (finite-sample) rate at which the algorithm becomes competitive with the ERM.
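Averaged stochastic gradient descent for least squares illustrates the single-pass, linear-space streaming model. It is a standard baseline, not the paper's algorithm. The constant step size `eta` and the function name are assumptions. The sketch stores only the current iterate and its running average, so it uses $O(d)$ memory.

```python
import numpy as np

def averaged_sgd_least_squares(stream, d, eta):
    """One pass of SGD on (x^T w - y)^2 / 2 with iterate averaging."""
    w = np.zeros(d)
    w_bar = np.zeros(d)
    for t, (x, y) in enumerate(stream, start=1):
        w -= eta * (x @ w - y) * x  # stochastic gradient step on one sample
        w_bar += (w - w_bar) / t    # running (Polyak-Ruppert) average
    return w_bar
```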

preprint2015arXiv

Efficient Inverse Maintenance and Faster Algorithms for Linear Programming

In this paper, we consider the following inverse maintenance problem: given $A \in \mathbb{R}^{n\times d}$ and a number of rounds $r$, we receive an $n\times n$ diagonal matrix $D^{(k)}$ at round $k$ and we wish to maintain an efficient linear system solver for $A^{T}D^{(k)}A$ under the assumption that $D^{(k)}$ does not change too rapidly. This inverse maintenance problem is the computational bottleneck in solving multiple optimization problems. We show how to solve this problem with $\tilde{O}(nnz(A)+d^ω)$ preprocessing time and amortized $\tilde{O}(nnz(A)+d^{2})$ time per round, improving upon previous running times for solving this problem. Consequently, we obtain the fastest known running times for solving multiple problems, including linear programming and computing a rounding of a polytope. In particular, given a feasible point in a linear program with $d$ variables, $n$ constraints, and constraint matrix $A\in\mathbb{R}^{n\times d}$, we show how to solve the linear program in time $\tilde{O}((nnz(A)+d^{2})\sqrt{d}\log(ε^{-1}))$. We achieve our results through a novel combination of classic numerical techniques of low rank update, preconditioning, and fast matrix multiplication as well as recent work on subspace embeddings and spectral sparsification that we hope will be of independent interest.
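The Sherman-Morrison-Woodbury identity illustrates the low-rank update ingredient. If only $k$ diagonal entries of $D$ change, the inverse of $A^TDA$ can be updated in $O(d^2k + k^3)$ time instead of being recomputed from scratch. This sketch covers only that classical building block, not the paper's full inverse maintenance scheme. It assumes the changes `delta` are nonzero and that the updated matrix stays invertible. The function name is hypothetical.

```python
import numpy as np

def update_inverse(Minv, A, rows, delta):
    """Return (A^T D' A)^{-1} from Minv = (A^T D A)^{-1}.

    D' differs from D only on the given rows, by the (nonzero) amounts delta.
    """
    U = A[rows]     # k x d rows whose weights change
    MU = Minv @ U.T # d x k
    # Capacitance matrix Delta^{-1} + U M^{-1} U^T (k x k).
    C = np.diag(1.0 / np.asarray(delta, dtype=float)) + U @ MU
    return Minv - MU @ np.linalg.solve(C, MU.T)
```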

preprint2015arXiv

Path Finding I: Solving Linear Programs with Õ(sqrt(rank)) Linear System Solves

In this paper we present a new algorithm for solving linear programs that requires only $\tilde{O}(\sqrt{rank(A)}L)$ iterations to solve a linear program with $m$ constraints, $n$ variables, constraint matrix $A$, and bit complexity $L$. Each iteration of our method consists of solving $\tilde{O}(1)$ linear systems and additional nearly linear time computation. Our method improves upon the previous best iteration bound by a factor of $\tilde{Ω}((m/rank(A))^{1/4})$ for methods with polynomial time computable iterations and by $\tilde{Ω}((m/rank(A))^{1/2})$ for methods which solve at most $\tilde{O}(1)$ linear systems in each iteration. Our method is parallelizable and amenable to linear algebraic techniques for accelerating the linear system solver. As such, up to polylogarithmic factors we either match or improve upon the best previous running times in both depth and work for different ratios of $m$ and $rank(A)$. Moreover, our method matches up to polylogarithmic factors a theoretical limit established by Nesterov and Nemirovski in 1994 regarding the use of a "universal barrier" for interior point methods, thereby resolving a long-standing open question regarding the running time of polynomial time interior point methods for linear programming.

preprint2015arXiv

Path Finding II: An Õ(m sqrt(n)) Algorithm for the Minimum Cost Flow Problem

In this paper we present an $\tilde{O}(m\sqrt{n}\log^{O(1)}U)$ time algorithm for solving the maximum flow problem on directed graphs with $m$ edges, $n$ vertices, and capacity ratio $U$. This improves upon the previous fastest running time of $O(m\min\left(n^{2/3},m^{1/2}\right)\log\left(n^{2}/m\right)\log U)$ achieved over 15 years ago by Goldberg and Rao. In the special case of solving dense directed unit capacity graphs, our algorithm improves upon the previous fastest running times of $O(\min\{m^{3/2},mn^{2/3}\})$ achieved by Even and Tarjan and Karzanov over 35 years ago and of $\tilde{O}(m^{10/7})$ achieved recently by Mądry. We achieve these results through the development and application of a new general interior point method that we believe is of independent interest. The number of iterations required by this algorithm is better than that predicted by analyzing the best self-concordant barrier of the feasible region. By applying this method to the linear programming formulations of maximum flow, minimum cost flow, and lossy generalized minimum cost flow and applying analysis by Daitch and Spielman, we achieve a running time of $\tilde{O}(m\sqrt{n}\log^{O(1)}(U/ε))$ for these problems as well. Furthermore, our algorithm is parallelizable and, using a recent nearly-linear-work, polylogarithmic-depth Laplacian system solver of Spielman and Peng, we achieve an $\tilde{O}(\sqrt{n}\log^{O(1)}(U/ε))$ depth algorithm and an $\tilde{O}(m\sqrt{n}\log^{O(1)}(U/ε))$ work algorithm for solving these problems.

preprint2015arXiv

Single Pass Spectral Sparsification in Dynamic Streams

We present the first single pass algorithm for computing spectral sparsifiers of graphs in the dynamic semi-streaming model. Given a single pass over a stream containing insertions and deletions of edges to a graph $G$, our algorithm maintains a randomized linear sketch of the incidence matrix of $G$ into dimension $O(\frac{1}{ε^2} n \,\mathrm{polylog}(n))$. Using this sketch, at any point, the algorithm can output a $(1 \pm ε)$ spectral sparsifier for $G$ with high probability. While $O(\frac{1}{ε^2} n \,\mathrm{polylog}(n))$ space algorithms are known for computing "cut sparsifiers" in dynamic streams [AGM12b, GKP12] and spectral sparsifiers in "insertion-only" streams [KL11], prior to our work, the best known single pass algorithm for maintaining spectral sparsifiers in dynamic streams required sketches of dimension $Ω(\frac{1}{ε^2} n^{5/3})$ [AGM14]. To achieve our result, we show that, using a coarse sparsifier of $G$ and a linear sketch of $G$'s incidence matrix, it is possible to sample edges by effective resistance, obtaining a spectral sparsifier of arbitrary precision. Sampling from the sketch requires a novel application of $\ell_2/\ell_2$ sparse recovery, a natural extension of the $\ell_0$ methods used for cut sparsifiers in [AGM12b]. Recent work of [MP12] on row sampling for matrix approximation gives a recursive approach for obtaining the required coarse sparsifiers. Under certain restrictions, our approach also extends to the problem of maintaining a spectral approximation for a general matrix $A^T A$ given a stream of updates to rows in $A$.
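The effective-resistance sampling step can be sketched offline with a dense Laplacian pseudoinverse. This ignores the sketching and streaming that are the paper's actual contribution. Edge $e = (u, v)$ with weight $w_e$ is drawn with probability $p_e \propto w_e R_e$, where $R_e = (\chi_u - \chi_v)^T L^+ (\chi_u - \chi_v)$. Each of the $q$ draws is reweighted by $w_e/(q\,p_e)$, so the sampled Laplacian is unbiased. The function names and the sample count `q` are assumptions.

```python
import numpy as np

def laplacian(n, edges):
    L = np.zeros((n, n))
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    return L

def effective_resistances(n, edges):
    Lp = np.linalg.pinv(laplacian(n, edges))
    return np.array([Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for u, v, _ in edges])

def sample_by_resistance(n, edges, q, seed=0):
    """Draw q edges i.i.d. with p_e proportional to w_e * R_e and reweight them."""
    rng = np.random.default_rng(seed)
    scores = np.array([w for _, _, w in edges]) * effective_resistances(n, edges)
    p = scores / scores.sum()
    counts = rng.multinomial(q, p)
    return [(u, v, w * c / (q * pe))
            for (u, v, w), c, pe in zip(edges, counts, p) if c > 0]
```

Foster's theorem gives $\sum_e w_e R_e = n - 1$ for a connected graph. The complete graph $K_4$, where every $R_e = 1/2$, illustrates this.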

preprint2015arXiv

Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization

We develop a family of accelerated stochastic algorithms that minimize sums of convex functions. Our algorithms improve upon the fastest running time for empirical risk minimization (ERM), and in particular linear least-squares regression, across a wide range of problem settings. To achieve this, we establish a framework based on the classical proximal point algorithm. Namely, we provide several algorithms that reduce the minimization of a strongly convex function to approximate minimizations of regularizations of the function. Using these results, we accelerate recent fast stochastic algorithms in a black-box fashion. Empirically, we demonstrate that the resulting algorithms exhibit notions of stability that are advantageous in practice. Both in theory and in practice, the provided algorithms reap the computational benefits of adding a large strongly convex regularization term, without incurring a corresponding bias to the original problem.
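The proximal point reduction can be sketched for least squares, $f(x) = \frac12\|Ax - b\|^2$. Each outer step approximately minimizes the regularized problem $f(x) + \frac{λ}{2}\|x - y\|^2$, here with a few warm-started gradient steps. This is the plain (unaccelerated) approximate proximal point loop with an illustrative inner solver, not the paper's accelerated scheme. It assumes $A$ has full column rank, so $f$ is strongly convex. The function name is hypothetical.

```python
import numpy as np

def approx_proximal_point_lstsq(A, b, lam, outer=200, inner=20):
    """Minimize 0.5*||Ax - b||^2 by approximately solving proximal subproblems."""
    L = np.linalg.norm(A, 2) ** 2  # smoothness constant of f
    step = 1.0 / (L + lam)         # the regularized subproblem is (L + lam)-smooth
    y = np.zeros(A.shape[1])
    for _ in range(outer):
        x = y.copy()               # warm start at the current prox center
        for _ in range(inner):
            grad = A.T @ (A @ x - b) + lam * (x - y)
            x -= step * grad
        y = x                      # next prox center
    return y
```

A large $λ$ makes each subproblem well conditioned, at the cost of more outer steps. With $λ$ equal to the smoothness constant of $f$, the subproblem's condition number is below 2.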

preprint2014arXiv

Uniform Sampling for Matrix Approximation

Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time significantly. For theoretical performance guarantees, each row must be sampled with probability proportional to its statistical leverage score. Unfortunately, leverage scores are difficult to compute. A simple alternative is to sample rows uniformly at random. While this often works, uniform sampling will eliminate critical row information for many natural instances. We take a fresh look at uniform sampling by examining what information it does preserve. Specifically, we show that uniform sampling yields a matrix that, in some sense, well approximates a large fraction of the original. While this weak form of approximation is not enough for solving linear regression directly, it is enough to compute a better approximation. This observation leads to simple iterative row sampling algorithms for matrix approximation that run in input-sparsity time and preserve row structure and sparsity at all intermediate steps. In addition to an improved understanding of uniform sampling, our main proof introduces a structural result of independent interest: we show that every matrix can be made to have low coherence by reweighting a small subset of its rows.
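Leverage scores, and sampling by them, can be sketched with a thin QR factorization. The leverage score of row $i$ is $τ_i = a_i^T (A^TA)^{-1} a_i = \|Q_i\|^2$, and the scores sum to the rank of $A$. This dense computation is the kind of expensive step that motivates the paper's iterative use of uniform sampling. It assumes $A$ has full column rank, and the function names are hypothetical.

```python
import numpy as np

def leverage_scores(A):
    """tau_i = a_i^T (A^T A)^{-1} a_i, computed from a thin QR factorization."""
    Q, _ = np.linalg.qr(A)  # reduced QR: Q is n x d with orthonormal columns
    return np.sum(Q ** 2, axis=1)

def leverage_sample(A, m, seed=0):
    """Sample m rows with probability proportional to leverage, rescaled so E[S^T S] = A^T A."""
    rng = np.random.default_rng(seed)
    tau = leverage_scores(A)
    p = tau / tau.sum()
    rows = rng.choice(A.shape[0], size=m, replace=True, p=p)
    return A[rows] / np.sqrt(m * p[rows])[:, None]
```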

preprint2013arXiv

A Simple, Combinatorial Algorithm for Solving SDD Systems in Nearly-Linear Time

In this paper, we present a simple combinatorial algorithm that solves symmetric diagonally dominant (SDD) linear systems in nearly-linear time. It uses very little of the machinery that previously appeared to be necessary for such an algorithm. It does not require recursive preconditioning, spectral sparsification, or even the Chebyshev Method or Conjugate Gradient. After constructing a "nice" spanning tree of a graph associated with the linear system, the entire algorithm consists of the repeated application of a simple (non-recursive) update rule, which it implements using a lightweight data structure. The algorithm is numerically stable and can be implemented without the increased bit-precision required by previous solvers. As such, the algorithm has the fastest known running time under the standard unit-cost RAM model. We hope that the simplicity of the algorithm and the insights yielded by its analysis will be useful in both theory and practice.
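The tree-plus-update-rule structure can be sketched as cycle toggling on an electrical flow. Start from the flow that routes the demands along a spanning tree. Then repeatedly pick an off-tree edge and push a circulation around its tree cycle so that the resistance-weighted sum around that cycle (Ohm's law) becomes zero. The potentials are finally read off the tree.

This naive sketch departs from the paper in several assumed ways:
- it uses a BFS tree instead of a low-stretch tree;
- it picks off-tree edges uniformly at random;
- it walks each cycle explicitly instead of using the paper's data structure;
- it requires a connected graph with demands `b` summing to zero.

The function name is hypothetical.

```python
import numpy as np
from collections import deque

def solve_laplacian_cycle_toggling(n, edges, b, iters=5000, seed=0):
    """Approximately solve L phi = b; edges are (u, v, r) oriented u -> v, b = net supply."""
    rng = np.random.default_rng(seed)
    adj = [[] for _ in range(n)]
    for idx, (u, v, _) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    # BFS spanning tree rooted at vertex 0.
    parent, pedge, depth = [-1] * n, [-1] * n, [0] * n
    order, queue, seen = [0], deque([0]), [False] * n
    seen[0] = True
    while queue:
        x = queue.popleft()
        for y, idx in adj[x]:
            if not seen[y]:
                seen[y] = True
                parent[y], pedge[y], depth[y] = x, idx, depth[x] + 1
                order.append(y)
                queue.append(y)

    def step(z, upward):
        # Signed tree edge for moving z -> parent[z] (upward) or parent[z] -> z.
        idx = pedge[z]
        s = 1 if edges[idx][0] == z else -1
        return idx, s if upward else -s

    def tree_path(x, y):
        # Signed edges along the tree path from x to y (via their common ancestor).
        up_x, up_y = [], []
        while depth[x] > depth[y]:
            up_x.append(x)
            x = parent[x]
        while depth[y] > depth[x]:
            up_y.append(y)
            y = parent[y]
        while x != y:
            up_x.append(x)
            x = parent[x]
            up_y.append(y)
            y = parent[y]
        return [step(z, True) for z in up_x] + [step(z, False) for z in reversed(up_y)]

    # Initial flow: route all demands along the tree, leaves first.
    f = np.zeros(len(edges))
    excess = np.array(b, dtype=float)
    for z in reversed(order[1:]):
        idx, s = step(z, True)
        f[idx] += s * excess[z]
        excess[parent[z]] += excess[z]
    tree = set(pedge[1:])
    cycles = [[(e, 1)] + tree_path(edges[e][1], edges[e][0])
              for e in range(len(edges)) if e not in tree]
    r = np.array([e[2] for e in edges], dtype=float)
    for _ in range(iters if cycles else 0):
        cycle = cycles[rng.integers(len(cycles))]
        drop = sum(s * r[idx] * f[idx] for idx, s in cycle)  # voltage around cycle
        delta = -drop / sum(r[idx] for idx, _ in cycle)
        for idx, s in cycle:
            f[idx] += s * delta  # a circulation keeps the demands satisfied
    # Read potentials off the tree: f_e = (phi_u - phi_v) / r_e for e = (u, v).
    phi = np.zeros(n)
    for z in order[1:]:
        idx, s = step(z, True)
        phi[z] = phi[parent[z]] + s * r[idx] * f[idx]
    return phi
```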

preprint2013arXiv

An Almost-Linear-Time Algorithm for Approximate Max Flow in Undirected Graphs, and its Multicommodity Generalizations

In this paper, we introduce a new framework for approximately solving flow problems in capacitated, undirected graphs and apply it to provide asymptotically faster algorithms for the maximum $s$-$t$ flow and maximum concurrent multicommodity flow problems. For graphs with $n$ vertices and $m$ edges, it allows us to find an $ε$-approximate maximum $s$-$t$ flow in time $O(m^{1+o(1)}ε^{-2})$, improving on the previous best bound of $\tilde{O}(mn^{1/3}\,\mathrm{poly}(1/ε))$. Applying the same framework in the multicommodity setting solves a maximum concurrent multicommodity flow problem with $k$ commodities in $O(m^{1+o(1)}ε^{-2}k^2)$ time, improving on the existing bound of $\tilde{O}(m^{4/3}\,\mathrm{poly}(k,ε^{-1}))$. Our algorithms utilize several new technical tools that we believe may be of independent interest:
- We give a non-Euclidean generalization of gradient descent and provide bounds on its performance. Using this, we show how to reduce approximate maximum flow and maximum concurrent flow to the efficient construction of oblivious routings with a low competitive ratio.
- We define and provide an efficient construction of a new type of flow sparsifier. In addition to providing the standard properties of a cut sparsifier, our construction allows for flows in the sparse graph to be routed (very efficiently) in the original graph with low congestion.
- We give the first almost-linear-time construction of an $O(m^{o(1)})$-competitive oblivious routing scheme. No previous such algorithm ran in time better than $\tilde{Ω}(mn)$.

We also note that independently Jonah Sherman produced an almost linear time algorithm for maximum flow and we thank him for coordinating submissions.
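A non-Euclidean gradient step can be illustrated in the $\ell_\infty$ norm. Suppose $f$ is $L$-smooth with respect to $\|\cdot\|_\infty$ and $g = \nabla f(x)$. Minimizing the upper bound $\langle g, d\rangle + \frac{L}{2}\|d\|_\infty^2$ gives the step $d = -\frac{\|g\|_1}{L}\,\mathrm{sign}(g)$, which decreases $f$ by at least $\|g\|_1^2/(2L)$. The sketch below is this generic textbook step, not the paper's flow algorithm. The quadratic test function and the function name are assumptions.

```python
import numpy as np

def linf_steepest_descent(grad, x0, L, iters):
    """Gradient descent in the l_inf norm for a function that is L-smooth in that norm."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        # argmin_d <g, d> + (L/2)||d||_inf^2 is d = -(||g||_1 / L) * sign(g).
        x = x - (np.abs(g).sum() / L) * np.sign(g)
    return x
```

For $f(x) = \frac12\|x - c\|_2^2$ on $\mathbb{R}^n$, the smoothness constant with respect to $\|\cdot\|_\infty$ is $L = n$, since $\|d\|_2^2 \le n\|d\|_\infty^2$.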

preprint2013arXiv

Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems

In this paper we show how to accelerate randomized coordinate descent methods and achieve faster convergence rates without paying per-iteration costs in asymptotic running time. In particular, we show how to generalize and efficiently implement a method proposed by Nesterov, giving faster asymptotic running times for various algorithms that use standard coordinate descent as a black box. In addition to providing a proof of convergence for this new general method, we show that it is numerically stable, efficiently implementable, and in certain regimes, asymptotically optimal. To highlight the computational power of this algorithm, we show how it can be used to create faster linear system solvers in several regimes:
- We show how this method achieves a faster asymptotic runtime than conjugate gradient for solving a broad class of symmetric positive definite systems of equations.
- We improve the best known asymptotic convergence guarantees for Kaczmarz methods, a popular technique for image reconstruction and solving overdetermined systems of equations, by accelerating a randomized algorithm of Strohmer and Vershynin.
- We achieve the best known running time for solving Symmetric Diagonally Dominant (SDD) systems of equations in the unit-cost RAM model, obtaining an $O(m \log^{3/2} n \,(\log\log n)^{1/2} \log(\log n/ε))$ asymptotic running time by accelerating a recent solver by Kelner et al.

Beyond the independent interest of these solvers, we believe they highlight the versatility of the approach of this paper and we hope that they will open the door for further algorithmic improvements in the future.
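For reference, here is the unaccelerated randomized Kaczmarz method of Strohmer and Vershynin, which the paper accelerates. Each step picks row $i$ with probability proportional to $\|a_i\|^2$ and projects the iterate onto the hyperplane $a_i^T x = b_i$. This minimal sketch assumes a consistent system, and the function name is hypothetical.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters, seed=0):
    """Randomized Kaczmarz for a consistent system A x = b."""
    rng = np.random.default_rng(seed)
    row_norms = np.sum(A ** 2, axis=1)
    p = row_norms / row_norms.sum()  # sample rows proportional to ||a_i||^2
    x = np.zeros(A.shape[1])
    for i in rng.choice(A.shape[0], size=iters, p=p):
        # Orthogonal projection onto the hyperplane {x : a_i^T x = b_i}.
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x
```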


39 published item(s)

Convex optimization with $p$-norm oracles

Solving Matrix Games with Near-Optimal Matvec Complexity

Quantum Speedups for Zero-Sum Games via Improved Dynamic Gibbs Sampling

High-precision Estimation of Random Walks in Small Space

Improved Lower Bounds for Submodular Function Minimization

RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

Semi-Random Sparse Recovery in Nearly-Linear Time

Sharper Rates for Separable Minimax and Finite Sum Optimization via Primal-Dual Extragradient Methods

Complexity of Highly Parallel Non-Smooth Convex Optimization

A General Framework for Symmetric Property Estimation

Acceleration with a Ball Optimization Oracle

Constant Girth Approximation for Directed Graphs in Subquadratic Time

Coordinate Methods for Accelerating $\ell_\infty$ Regression and Faster Approximate Maximum Flow

Coordinate Methods for Matrix Games

Efficiently Solving MDPs with Stochastic Mirror Descent

Faster Divergence Maximization for Faster Maximum Flow

Solving Linear Programs with Sqrt(rank) Linear System Solves

The Bethe and Sinkhorn Permanents of Low Rank Matrices and Implications for Profile Maximum Likelihood

Towards Optimal Running Times for Optimal Transport

Almost-Linear-Time Algorithms for Markov Chains and New Spectral Primitives for Directed Graphs

Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis

Faster Algorithms for Computing the Stationary Distribution, Simulating Random Walks, and More

Faster Eigenvector Computation via Shift-and-Invert Preconditioning

Geometric Median in Nearly Linear Time

Robust Shift-and-Invert Preconditioning: Faster and More Sample Efficient Algorithms for Eigenvector Computation

Routing under Balance

Streaming PCA: Matching Matrix Bernstein and Near-Optimal Finite Sample Guarantees for Oja's Algorithm

Subquadratic Submodular Function Minimization

A Faster Cutting Plane Method and its Implications for Combinatorial and Convex Optimization

Competing with the Empirical Risk Minimizer in a Single Pass

Efficient Inverse Maintenance and Faster Algorithms for Linear Programming

Path Finding I: Solving Linear Programs with Õ(sqrt(rank)) Linear System Solves

Path Finding II: An Õ(m sqrt(n)) Algorithm for the Minimum Cost Flow Problem

Single Pass Spectral Sparsification in Dynamic Streams

Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization

Uniform Sampling for Matrix Approximation

A Simple, Combinatorial Algorithm for Solving SDD Systems in Nearly-Linear Time

An Almost-Linear-Time Algorithm for Approximate Max Flow in Undirected Graphs, and its Multicommodity Generalizations

Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems