Source author record

Richard Peng

Richard Peng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Data Structures and Algorithms; Numerical Analysis; Machine Learning; math.NA; Distributed, Parallel, and Cluster Computing; Social and Information Networks; Discrete Mathematics; Computation; Computational Geometry; Computer Vision; math.OC; math.PR; physics.soc-ph

Catalog footprint

What is connected

39 works

13 topics

4 close collaborators

preprint, 2022, arXiv

Maximum Flow and Minimum-Cost Flow in Almost-Linear Time

We give an algorithm that computes exact maximum flows and minimum-cost flows on directed graphs with $m$ edges and polynomially bounded integral demands, costs, and capacities in $m^{1+o(1)}$ time. Our algorithm builds the flow through a sequence of $m^{1+o(1)}$ approximate undirected minimum-ratio cycles, each of which is computed and processed in amortized $m^{o(1)}$ time using a new dynamic graph data structure. Our framework extends to algorithms running in $m^{1+o(1)}$ time for computing flows that minimize general edge-separable convex functions to high accuracy. This gives almost-linear time algorithms for several problems including entropy-regularized optimal transport, matrix scaling, $p$-norm flows, and $p$-norm isotonic regression on arbitrary directed acyclic graphs.

preprint, 2022, arXiv

Minor Sparsifiers and the Distributed Laplacian Paradigm

We study distributed algorithms built around minor-based vertex sparsifiers, and give the first algorithm in the CONGEST model for solving linear systems in graph Laplacian matrices to high accuracy. Our Laplacian solver has a round complexity of $O(n^{o(1)}(\sqrt{n}+D))$, and thus almost matches the lower bound of $\widetilde{Ω}(\sqrt{n}+D)$, where $n$ is the number of nodes in the network and $D$ is its diameter. We show that our distributed solver yields new sublinear round algorithms for several cornerstone problems in combinatorial optimization. This is achieved by leveraging the powerful algorithmic framework of Interior Point Methods (IPMs) and the Laplacian paradigm in the context of distributed graph algorithms, which entails numerically solving optimization problems on graphs via a series of Laplacian systems. Problems that benefit from our distributed algorithmic paradigm include exact mincost flow, negative weight shortest paths, maxflow, and bipartite matching on sparse directed graphs. For the maxflow problem, this is the first exact distributed algorithm that applies to directed graphs, while the previous work by [Ghaffari et al. SICOMP'18] considered the approximate setting and works only for undirected graphs. For the mincost flow and the negative weight shortest path problems, our results constitute the first exact distributed algorithms running in a sublinear number of rounds. Given that the hybrid between IPMs and the Laplacian paradigm has proven useful for tackling numerous optimization problems in the centralized setting, we believe that our distributed solver will find future applications.

preprint, 2022, arXiv

Nested Dissection Meets IPMs: Planar Min-Cost Flow in Nearly-Linear Time

We present a nearly-linear time algorithm for finding a minimum-cost flow in planar graphs with polynomially bounded integer costs and capacities. The previous fastest algorithm for this problem is based on interior point methods (IPMs) and works for general sparse graphs in $O(n^{1.5}\text{poly}(\log n))$ time [Daitch-Spielman, STOC'08]. Intuitively, $Ω(n^{1.5})$ is a natural runtime barrier for IPM-based methods, since they require $\sqrt{n}$ iterations, each routing a possibly-dense electrical flow. To break this barrier, we develop a new implicit representation for flows based on generalized nested-dissection [Lipton-Rose-Tarjan, JSTOR'79] and approximate Schur complements [Kyng-Sachdeva, FOCS'16]. This implicit representation permits us to design a data structure to route an electrical flow with sparse demands in roughly $\sqrt{n}$ update time, resulting in a total running time of $O(n\cdot\text{poly}(\log n))$. Our results immediately extend to all families of separable graphs.

preprint, 2021, arXiv

Solving Sparse Linear Systems Faster than Matrix Multiplication

Can linear systems be solved faster than matrix multiplication? While there has been remarkable progress for the special cases of graph structured linear systems, in the general setting, the bit complexity of solving an $n \times n$ linear system $Ax=b$ is $\tilde{O}(n^ω)$, where $ω < 2.372864$ is the matrix multiplication exponent. Improving on this has been an open problem even for sparse linear systems with poly$(n)$ condition number. In this paper, we present an algorithm that solves linear systems in sparse matrices asymptotically faster than matrix multiplication for any $ω > 2$. This speedup holds for any input matrix $A$ with $o(n^{ω-1}/\log(κ(A)))$ non-zeros, where $κ(A)$ is the condition number of $A$. For poly$(n)$-conditioned matrices with $\tilde{O}(n)$ nonzeros, and the current value of $ω$, the bit complexity of our algorithm to solve to within any $1/\text{poly}(n)$ error is $O(n^{2.331645})$. Our algorithm can be viewed as an efficient, randomized implementation of the block Krylov method via recursive low displacement rank factorizations. It is inspired by the algorithm of [Eberly et al. ISSAC '06, '07] for inverting matrices over finite fields. In our analysis of numerical stability, we develop matrix anti-concentration techniques to bound the smallest eigenvalue and the smallest gap in eigenvalues of semi-random matrices.

preprint, 2020, arXiv

A Deterministic Algorithm for Balanced Cut with Applications to Dynamic Connectivity, Flows, and Beyond

We consider the classical Minimum Balanced Cut problem: given a graph $G$, compute a partition of its vertices into two subsets of roughly equal volume, while minimizing the number of edges connecting the subsets. We present the first deterministic, almost-linear time approximation algorithm for this problem. Specifically, our algorithm, given an $n$-vertex $m$-edge graph $G$ and any parameter $1\leq r\leq O(\log n)$, computes a $(\log m)^{r^2}$-approximation for Minimum Balanced Cut on $G$, in time $O\left( m^{1+O(1/r)+o(1)}\cdot (\log m)^{O(r^2)}\right)$. In particular, we obtain a $(\log m)^{1/ε}$-approximation in time $m^{1+O(1/\sqrt{ε})}$ for any constant $ε$, and a $(\log m)^{f(m)}$-approximation in time $m^{1+o(1)}$, for any slowly growing function $f(m)$. We obtain deterministic algorithms with similar guarantees for the Sparsest Cut and the Lowest-Conductance Cut problems. Our algorithm for the Minimum Balanced Cut problem in fact provides a stronger guarantee: it either returns a balanced cut whose value is close to a given target value, or it certifies that such a cut does not exist by exhibiting a large subgraph of $G$ that has high conductance. We use this algorithm to obtain deterministic algorithms for dynamic connectivity and minimum spanning forest, whose worst-case update time on an $n$-vertex graph is $n^{o(1)}$, thus resolving a major open problem in the area of dynamic graph algorithms. Our work also implies deterministic algorithms for a host of additional problems, whose time complexities match, up to subpolynomial in $n$ factors, those of known randomized algorithms. The implications include almost-linear time deterministic algorithms for solving Laplacian systems and for approximating maximum flows in undirected graphs.

preprint, 2020, arXiv

A Study of Performance of Optimal Transport

We investigate the problem of efficiently computing optimal transport (OT) distances, which is equivalent to the node-capacitated minimum cost maximum flow problem in a bipartite graph. We compare runtimes in computing OT distances on data from several domains, such as synthetic data of geometric shapes, embeddings of tokens in documents, and pixels in images. We show that in practice, combinatorial methods such as network simplex and augmenting path based algorithms can consistently outperform numerical matrix-scaling based methods such as Sinkhorn [Cuturi'13] and Greenkhorn [Altschuler et al.'17], even in low accuracy regimes, with up to orders of magnitude speedups. Lastly, we present a new combinatorial algorithm that improves upon the classical Kuhn-Munkres algorithm.
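
For orientation, the matrix-scaling baseline named in this abstract (Sinkhorn) can be sketched in a few lines. This is a minimal pure-Python illustration, not the implementation benchmarked in the paper; the function name `sinkhorn`, the regularization value `reg`, and the toy cost matrix are all assumptions made here.

```python
import math

def sinkhorn(cost, a, b, reg=0.5, iters=500):
    """Entropy-regularized OT by alternating row/column scaling (illustrative)."""
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / reg).
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows to match marginal a, then columns to match marginal b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy instance (hypothetical data): two sources, two sinks.
cost = [[0.0, 1.0], [1.0, 0.0]]
a, b = [0.3, 0.7], [0.6, 0.4]
plan = sinkhorn(cost, a, b)
```

The returned plan only approximates the unregularized optimum; a smaller `reg` sharpens it at the cost of slower convergence.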

preprint, 2020, arXiv

Fast Dynamic Cuts, Distances and Effective Resistances via Vertex Sparsifiers

We present a general framework for designing efficient dynamic approximate algorithms for optimization on undirected graphs. In particular, we develop a technique that, given any problem that admits a certain notion of vertex sparsifiers, gives data structures that maintain approximate solutions in sub-linear update and query time. We illustrate the applicability of our paradigm to the following problems. (1) A fully-dynamic algorithm that approximates all-pair maximum-flows/minimum-cuts up to a nearly logarithmic factor in $\tilde{O}(n^{2/3})$ amortized time against an oblivious adversary, and $\tilde{O}(m^{3/4})$ time against an adaptive adversary. (2) An incremental data structure that maintains $O(1)$-approximate shortest path in $n^{o(1)}$ time per operation, as well as fully dynamic approximate all-pair shortest path and transshipment in $\tilde{O}(n^{2/3+o(1)})$ amortized time per operation. (3) A fully-dynamic algorithm that approximates all-pair effective resistance up to a $(1+ε)$ factor in $\tilde{O}(n^{2/3+o(1)} ε^{-O(1)})$ amortized update time per operation. The key tool behind result (1) is the dynamic maintenance of an algorithmic construction due to Madry [FOCS'10], which partitions a graph into a collection of simpler graph structures (known as j-trees) and approximately captures the cut-flow and metric structure of the graph. The $O(1)$-approximation guarantee of (2) is by adapting the distance oracles by [Thorup-Zwick JACM'05]. Result (3) is obtained by invoking the random-walk based spectral vertex sparsifier by [Durfee et al. STOC'19] in a hierarchical manner, while carefully keeping track of the recourse among levels in the hierarchy.

preprint, 2020, arXiv

Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression

Linear regression in $\ell_p$-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving $\ell_p$-regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy-to-implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for $p > 3$, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that is guaranteed to converge rapidly for $p > 3$. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any $p \in [2,\infty)$. Our algorithm is simple to implement and is guaranteed to find a $(1+\varepsilon)$-approximate solution in $O(p^{3.5} m^{\frac{p-2}{2(p-1)}} \log \frac{m}{\varepsilon}) \le O_p(\sqrt{m} \log \frac{m}{\varepsilon} )$ iterations. Our experiments demonstrate that it performs even better than our theoretical bounds, beats the standard Matlab/CVX implementation for solving these problems by 10-50x, and is the fastest among available implementations in the high-accuracy regime.
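
To make the reweighting idea concrete, here is a minimal sketch of the classical, undamped IRLS iteration for a single unknown. It is not the paper's p-IRLS; the function name `irls_1d` and the toy data are assumptions. In this one-variable setting the update has local multiplier $2 - p$ at the optimum, so it contracts only for $p < 3$, which is consistent with the divergence for $p > 3$ described in the abstract.

```python
def irls_1d(a, b, p, iters=200):
    """Plain IRLS for min_x sum_i |a_i*x - b_i|**p with one unknown x (illustrative)."""
    # Start from the ordinary least-squares solution (the p = 2 case).
    x = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
    for _ in range(iters):
        # Reweight each row by |residual|**(p-2), then solve weighted least squares.
        w = [abs(ai * x - bi) ** (p - 2) for ai, bi in zip(a, b)]
        denom = sum(wi * ai * ai for wi, ai in zip(w, a))
        if denom == 0.0:  # every residual vanished, so x is already optimal
            break
        x = sum(wi * ai * bi for wi, ai, bi in zip(w, a, b)) / denom
    return x

# Toy data (hypothetical): fit a constant to the observations 0, 0, 3.
x_p25 = irls_1d([1.0, 1.0, 1.0], [0.0, 0.0, 3.0], p=2.5)
x_p2 = irls_1d([1.0, 1.0, 1.0], [0.0, 0.0, 3.0], p=2.0)
```

For this data the optimality condition $2x^{p-1} = (3-x)^{p-1}$ gives $x = 3/(1 + 2^{1/(p-1)})$, which the iteration approaches for $p = 2.5$.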

preprint, 2020, arXiv

Vertex Sparsification for Edge Connectivity

Graph compression or sparsification is a basic information-theoretic and computational question. A major open problem in this research area is whether $(1+ε)$-approximate cut-preserving vertex sparsifiers with size close to the number of terminals exist. As a step towards this goal, we study a thresholded version of the problem: for a given parameter $c$, find a smaller graph, which we call connectivity-$c$ mimicking network, which preserves connectivity among $k$ terminals exactly up to the value of $c$. We show that connectivity-$c$ mimicking networks with $O(kc^4)$ edges exist and can be found in time $m(c\log n)^{O(c)}$. We also give a separate algorithm that constructs such graphs with $k \cdot O(c)^{2c}$ edges in time $mc^{O(c)}\log^{O(1)}n$. These results lead to the first data structures for answering fully dynamic offline $c$-edge-connectivity queries for $c \ge 4$ in polylogarithmic time per query, as well as more efficient algorithms for survivable network design on bounded treewidth graphs.

preprint, 2016, arXiv

A Framework for Analyzing Resparsification Algorithms

A spectral sparsifier of a graph $G$ is a sparser graph $H$ that approximately preserves the quadratic form of $G$, i.e. for all vectors $x$, $x^T L_G x \approx x^T L_H x$, where $L_G$ and $L_H$ denote the respective graph Laplacians. Spectral sparsifiers generalize cut sparsifiers, and have found many applications in designing graph algorithms. In recent years, there has been interest in computing spectral sparsifiers in semi-streaming and dynamic settings. Natural algorithms in these settings often involve repeated sparsification of a graph, and accumulation of errors across these steps. We present a framework for analyzing algorithms that perform repeated sparsifications that only incur error corresponding to a single sparsification step, leading to better results for many resparsification-based algorithms. As an application, we show how to maintain a spectral sparsifier in the semi-streaming setting: We present a simple algorithm that, for a graph $G$ on $n$ vertices and $m$ edges, computes a spectral sparsifier of $G$ with $O(n \log n)$ edges in a single pass over $G$, using only $O(n \log n)$ space, and $O(m \log^2 n)$ total time. This improves on previous best semi-streaming algorithms for both spectral and cut sparsifiers by a factor of $\log{n}$ in both space and runtime. The algorithm extends to semi-streaming row sampling for general PSD matrices. We also use our framework to combine a spectral sparsification algorithm by Koutis with improved spanner constructions to give a parallel algorithm for constructing $O(n\log^2{n}\log\log{n})$ sized spectral sparsifiers in $O(m\log^2{n}\log\log{n})$ time. This is the best known combinatorial graph sparsification algorithm. The size of the sparsifiers is only a factor $\log{n}\log\log{n}$ more than ones produced by numerical routines.

preprint, 2016, arXiv

Almost-Linear-Time Algorithms for Markov Chains and New Spectral Primitives for Directed Graphs

In this paper we introduce a notion of spectral approximation for directed graphs. While there are many potential ways one might define approximation for directed graphs, most of them are too strong to allow sparse approximations in general. In contrast, we prove that for our notion of approximation, such sparsifiers do exist, and we show how to compute them in almost linear time. Using this notion of approximation, we provide a general framework for solving asymmetric linear systems that is broadly inspired by the work of [Peng-Spielman, STOC'14]. Applying this framework in conjunction with our sparsification algorithm, we obtain an almost linear time algorithm for solving directed Laplacian systems associated with Eulerian Graphs. Using this solver in the recent framework of [Cohen-Kelner-Peebles-Peng-Sidford-Vladu, FOCS'16], we obtain almost linear time algorithms for solving a directed Laplacian linear system, computing the stationary distribution of a Markov chain, computing expected commute times in a directed graph, and more. For each of these problems, our algorithms improve the previous best running time from $O((nm^{3/4} + n^{2/3} m) \log^{O(1)} (n κε^{-1}))$ to $O((m + n2^{O(\sqrt{\log{n}\log\log{n}})}) \log^{O(1)} (n κε^{-1}))$, where $n$ is the number of vertices in the graph, $m$ is the number of edges, $κ$ is a natural condition number associated with the problem, and $ε$ is the desired accuracy. We hope these results open the door for further studies into directed spectral graph theory, and will serve as a stepping stone for designing a new generation of fast algorithms for directed graphs.

preprint, 2016, arXiv

An Empirical Study of Cycle Toggling Based Laplacian Solvers

We study the performance of linear solvers for graph Laplacians based on the combinatorial cycle adjustment methodology proposed by [Kelner-Orecchia-Sidford-Zhu STOC'13]. The approach finds a dual flow solution to this linear system through a sequence of flow adjustments along cycles. We study both data-structure-oriented and recursive methods for handling these adjustments. The primary difficulty faced by this approach, updating and querying long cycles, motivated us to study an important special case: instances where all cycles are formed by fundamental cycles on a length $n$ path. Our methods demonstrate significant speedups over previous implementations, and are competitive with standard numerical routines.

preprint, 2016, arXiv

Faster Algorithms for Computing the Stationary Distribution, Simulating Random Walks, and More

In this paper, we provide faster algorithms for computing various fundamental quantities associated with random walks on a directed graph, including the stationary distribution, personalized PageRank vectors, hitting times, and escape probabilities. In particular, on a directed graph with $n$ vertices and $m$ edges, we show how to compute each quantity in time $\tilde{O}(m^{3/4}n+mn^{2/3})$, where the $\tilde{O}$ notation suppresses polylogarithmic factors in $n$, the desired accuracy, and the appropriate condition number (i.e. the mixing time or restart probability). Our result improves upon the previous fastest running times for these problems; previous results either invoke a general purpose linear system solver on a $n\times n$ matrix with $m$ non-zero entries, or depend polynomially on the desired error or natural condition number associated with the problem (i.e. the mixing time or restart probability). For sparse graphs, we obtain a running time of $\tilde{O}(n^{7/4})$, breaking the $O(n^{2})$ barrier of the best running time one could hope to achieve using fast matrix multiplication. We achieve our result by providing a similar running time improvement for solving directed Laplacian systems, a natural directed or asymmetric analog of the well studied symmetric or undirected Laplacian systems. We show how to solve such systems in time $\tilde{O}(m^{3/4}n+mn^{2/3})$, and efficiently reduce a broad range of problems to solving $\tilde{O}(1)$ directed Laplacian systems on Eulerian graphs. We hope these results and our analysis open the door for further study into directed spectral graph theory.
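
As a point of comparison, the kind of earlier approach whose cost grows with the mixing time can be sketched as plain power iteration. This is a baseline illustration only, not the paper's algorithm; the function name and the two-state chain are assumptions.

```python
def stationary_power_iteration(P, iters=1000):
    """Approximate the stationary distribution pi (pi P = pi) by repeated multiplication."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of the chain: pi_next[j] = sum_i pi[i] * P[i][j].
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Toy row-stochastic transition matrix (hypothetical data).
P = [[0.9, 0.1], [0.5, 0.5]]
pi = stationary_power_iteration(P)
```

The number of iterations needed scales with the mixing time of the chain, which is the dependence the abstract says its solver-based approach reduces to polylogarithmic.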

preprint, 2016, arXiv

Faster and Simpler Width-Independent Parallel Algorithms for Positive Semidefinite Programming

This paper studies the problem of finding a $(1+ε)$-approximate solution to positive semidefinite programs. These are semidefinite programs in which all matrices in the constraints and objective are positive semidefinite and all scalars are non-negative. We present a simpler NC parallel algorithm that, on input with $n$ constraint matrices, requires $O(\frac{1}{ε^3} \log^3 n)$ iterations, each of which involves only simple matrix operations and computing the trace of the product of a matrix exponential and a positive semidefinite matrix. Further, given a positive SDP in a factorized form, the total work of our algorithm is nearly-linear in the number of non-zero entries in the factorization.

preprint, 2016, arXiv

Scalable Constrained Clustering: A Generalized Spectral Method

We present a simple spectral approach to the well-studied constrained clustering problem. It captures constrained clustering as a generalized eigenvalue problem with graph Laplacians. The algorithm works in nearly-linear time and provides concrete guarantees for the quality of the clusters, at least for the case of 2-way partitioning. In practice this translates to a very fast implementation that consistently outperforms existing spectral approaches both in speed and quality.

preprint, 2015, arXiv

Approximate Undirected Maximum Flows in O(m polylog(n)) Time

We give the first O(m polylog(n)) time algorithms for approximating maximum flows in undirected graphs and constructing polylog(n)-quality cut-approximating hierarchical tree decompositions. Our algorithm invokes existing algorithms for these two problems recursively while gradually incorporating size reductions. These size reductions are in turn obtained via ultra-sparsifiers, which are key tools in solvers for symmetric diagonally dominant (SDD) linear systems.

preprint, 2015, arXiv

Improved Parallel Algorithms for Spanners and Hopsets

We use exponential start time clustering to design faster and more work-efficient parallel graph algorithms involving distances. Previous algorithms usually rely on graph decomposition routines with strict restrictions on the diameters of the decomposed pieces. We weaken these bounds in favor of stronger local probabilistic guarantees. This allows more direct analyses of the overall process, giving:

* Linear work parallel algorithms that construct spanners with $O(k)$ stretch and size $O(n^{1+1/k})$ in unweighted graphs, and size $O(n^{1+1/k} \log k)$ in weighted graphs.
* Hopsets that lead to the first parallel algorithm for approximating shortest paths in undirected graphs with $O(m\;\mathrm{polylog}\;n)$ work.
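
The abstract does not spell out the clustering routine. As it is commonly described, each vertex draws an exponential random shift and every vertex joins the center that minimizes distance minus shift. The sketch below is a sequential, quadratic-time illustration of that rule on an unweighted graph, under that assumption; it is not the parallel algorithm, and the names `exp_start_time_clustering` and `beta` are chosen here.

```python
import random
from collections import deque

def bfs_dist(adj, s):
    """Hop distances from s in an unweighted graph given as {vertex: [neighbors]}."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def exp_start_time_clustering(adj, beta, seed=0):
    """Assign each vertex u to the center v minimizing dist(v, u) - shift[v]."""
    rng = random.Random(seed)
    # Larger shifts mean an earlier start, so that vertex tends to capture more of the graph.
    shift = {v: rng.expovariate(beta) for v in adj}
    dist = {v: bfs_dist(adj, v) for v in adj}
    center = {}
    for u in adj:
        reachable = [v for v in adj if u in dist[v]]
        center[u] = min(reachable, key=lambda v: dist[v][u] - shift[v])
    return center

# Toy graph (hypothetical data): a path on six vertices.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
center = exp_start_time_clustering(adj, beta=0.5)
```

A smaller `beta` produces larger shifts and therefore fewer, larger clusters.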

preprint, 2015, arXiv

Sparsified Cholesky and Multigrid Solvers for Connection Laplacians

We introduce the sparsified Cholesky and sparsified multigrid algorithms for solving systems of linear equations. These algorithms accelerate Gaussian elimination by sparsifying the nonzero matrix entries created by the elimination process. We use these new algorithms to derive the first nearly linear time algorithms for solving systems of equations in connection Laplacians, a generalization of Laplacian matrices that arise in many problems in image and signal processing. We also prove that every connection Laplacian has a linear sized approximate inverse. This is an LU factorization with a linear number of nonzero entries that is a strong approximation of the original matrix. Using such a factorization one can solve systems of equations in a connection Laplacian in linear time. Such a factorization was unknown even for ordinary graph Laplacians.

preprint, 2015, arXiv

Sparsified Cholesky Solvers for SDD linear systems

We show that Laplacian and symmetric diagonally dominant (SDD) matrices can be well approximated by linear-sized sparse Cholesky factorizations. We show that these matrices have constant-factor approximations of the form $L L^{T}$, where $L$ is a lower-triangular matrix with a number of nonzero entries linear in its dimension. Furthermore linear systems in $L$ and $L^{T}$ can be solved in $O (n)$ work and $O(\log{n}\log^2\log{n})$ depth, where $n$ is the dimension of the matrix. We present nearly linear time algorithms that construct solvers that are almost this efficient. In doing so, we give the first nearly-linear work routine for constructing spectral vertex sparsifiers---that is, spectral approximations of Schur complements of Laplacian matrices.
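
Once a factor $L$ with linearly many nonzeros is available, solving with $L L^{T}$ reduces to one forward and one backward substitution, each touching every stored entry once. A minimal sequential sketch follows, assuming $L$ is stored as a list of sparse rows; the storage layout and the name `solve_llt` are assumptions, and the low-depth parallel evaluation from the abstract is not shown.

```python
def solve_llt(rows, b):
    """Solve (L L^T) x = b, where rows[i] is a dict {col: value} of lower-triangular L."""
    n = len(rows)
    y = list(b)
    # Forward substitution: L y = b.
    for i in range(n):
        for j, lij in rows[i].items():
            if j < i:
                y[i] -= lij * y[j]
        y[i] /= rows[i][i]
    # Backward substitution: L^T x = y, pushing each solved x[i] down the columns of L^T.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = y[i] / rows[i][i]
        for j, lij in rows[i].items():
            if j < i:
                y[j] -= lij * x[i]
    return x

# Toy factor (hypothetical data): L = [[2, 0], [1, 3]], so L L^T = [[4, 2], [2, 10]].
L_rows = [{0: 2.0}, {0: 1.0, 1: 3.0}]
x = solve_llt(L_rows, [8.0, 22.0])
```

The total work is proportional to the number of stored entries of $L$, which is where a linear-sized factor pays off.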

preprint, 2015, arXiv

Spectral Sparsification of Random-Walk Matrix Polynomials

We consider a fundamental algorithmic question in spectral graph theory: compute a spectral sparsifier of the random-walk matrix polynomial $$L_α(G)=D-\sum_{r=1}^{d} α_r D(D^{-1}A)^r$$ where $A$ is the adjacency matrix of a weighted, undirected graph, $D$ is the diagonal matrix of weighted degrees, and $α=(α_1, \ldots, α_d)$ are nonnegative coefficients with $\sum_{r=1}^{d} α_r=1$. Recall that $D^{-1}A$ is the transition matrix of random walks on the graph. The sparsification of $L_α(G)$ appears to be algorithmically challenging as the matrix power $(D^{-1}A)^r$ is defined by all paths of length $r$, whose precise calculation would be prohibitively expensive. In this paper, we develop the first nearly linear time algorithm for this sparsification problem: For any $G$ with $n$ vertices and $m$ edges, $d$ coefficients $α$, and $ε > 0$, our algorithm runs in time $O(d^2m\log^2n/ε^{2})$ to construct a Laplacian matrix $\tilde{L}=D-\tilde{A}$ with $O(n\log n/ε^{2})$ non-zeros such that $\tilde{L}\approx_ε L_α(G)$. Matrix polynomials arise in mathematical analysis of matrix functions as well as numerical solutions of matrix equations. Our work is particularly motivated by the algorithmic problems for speeding up the classic Newton's method in applications such as computing the inverse square-root of the precision matrix of a Gaussian random field, as well as computing the $q$th-root transition (for $q\geq1$) in a time-reversible Markov model. The key algorithmic step for both applications is the construction of a spectral sparsifier of the constant-degree random-walk matrix polynomials introduced by Newton's method. Our algorithm can also be used to build efficient data structures for effective resistances for multi-step time-reversible Markov models, and we anticipate that it could be useful for other tasks in network analysis.

preprint, 2014, arXiv

$\ell_p$ Row Sampling by Lewis Weights

We give a simple algorithm to efficiently sample the rows of a matrix while preserving the p-norms of its product with vectors. Given an $n$-by-$d$ matrix $\boldsymbol{\mathit{A}}$, we find with high probability and in input sparsity time an $\boldsymbol{\mathit{A}}'$ consisting of about $d \log{d}$ rescaled rows of $\boldsymbol{\mathit{A}}$ such that $\| \boldsymbol{\mathit{A}} \boldsymbol{\mathit{x}} \|_1$ is close to $\| \boldsymbol{\mathit{A}}' \boldsymbol{\mathit{x}} \|_1$ for all vectors $\boldsymbol{\mathit{x}}$. We also show similar results for all $\ell_p$ that give nearly optimal sample bounds in input sparsity time. Our results are based on sampling by "Lewis weights", which can be viewed as statistical leverage scores of a reweighted matrix. We also give an elementary proof of the guarantees of this sampling process for $\ell_1$.

preprint, 2014, arXiv

A Generalized Cheeger Inequality

The generalized conductance $ϕ(G,H)$ between two graphs $G$ and $H$ on the same vertex set $V$ is defined as the ratio $$ ϕ(G,H) = \min_{S\subseteq V} \frac{cap_G(S,\bar{S})}{ cap_H(S,\bar{S})}, $$ where $cap_G(S,\bar{S})$ is the total weight of the edges crossing from $S$ to $\bar{S}=V-S$. We show that the minimum generalized eigenvalue $λ(L_G,L_H)$ of the pair of Laplacians $L_G$ and $L_H$ satisfies $$ λ(L_G,L_H) \geq ϕ(G,H) ϕ(G)/8, $$ where $ϕ(G)$ is the usual conductance of $G$. A generalized cut that meets this bound can be obtained from the generalized eigenvector corresponding to $λ(L_G,L_H)$. The inequality complements a recent proof that $ϕ(G)$ cannot be replaced by $Θ(ϕ(G,H))$ in the above inequality, unless the Unique Games Conjecture is false.
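
The definition of $ϕ(G,H)$ can be checked directly on tiny graphs by enumerating every cut. The sketch below does exactly that and is exponential in the number of vertices, so it serves only as an illustration of the definition; the edge-list format and function names are assumptions.

```python
from itertools import combinations

def cut_weight(edges, S):
    """Total weight of edges (u, v, w) with exactly one endpoint in S."""
    return sum(w for u, v, w in edges if (u in S) != (v in S))

def generalized_conductance(n, edges_G, edges_H):
    """min over nonempty proper S of cap_G(S, V-S) / cap_H(S, V-S), by brute force."""
    best = float("inf")
    for k in range(1, n):
        for subset in combinations(range(n), k):
            S = set(subset)
            cap_H = cut_weight(edges_H, S)
            if cap_H > 0:  # skip cuts that H does not cross
                best = min(best, cut_weight(edges_G, S) / cap_H)
    return best

# Toy pair (hypothetical data): G is the path 0-1-2, H is the triangle on the same vertices.
G = [(0, 1, 1.0), (1, 2, 1.0)]
H = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
phi = generalized_conductance(3, G, H)
```

Here the cut isolating an endpoint of the path has $cap_G = 1$ and $cap_H = 2$, so the minimum ratio is $1/2$.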

preprint, 2014, arXiv

Preconditioning in Expectation

We show that preconditioners constructed by random sampling can perform well without meeting the standard requirements of iterative methods. When applied to graph Laplacians, this leads to ultra-sparsifiers that in expectation behave as the nearly-optimal ones given by [Kolla-Makarychev-Saberi-Teng STOC'10]. Combining this with the recursive preconditioning framework by [Spielman-Teng STOC'04] and improved embedding algorithms yields algorithms that solve symmetric diagonally dominant linear systems and electrical flow problems in expected time close to $m\log^{1/2}n$.

preprint, 2014, arXiv

Scalable Parallel Factorizations of SDD Matrices and Efficient Sampling for Gaussian Graphical Models

Motivated by a sampling problem basic to computational statistical inference, we develop a nearly optimal algorithm for a fundamental problem in spectral graph theory and numerical analysis. Given an $n\times n$ SDDM matrix $\mathbf{M}$, and a constant $-1 \leq p \leq 1$, our algorithm gives efficient access to a sparse $n\times n$ linear operator $\tilde{\mathbf{C}}$ such that $$\mathbf{M}^{p} \approx \tilde{\mathbf{C}} \tilde{\mathbf{C}}^\top.$$ The solution is based on factoring $\mathbf{M}$ into a product of simple and sparse matrices using squaring and spectral sparsification. For $\mathbf{M}$ with $m$ non-zero entries, our algorithm takes work nearly-linear in $m$, and polylogarithmic depth on a parallel machine with $m$ processors. This gives the first sampling algorithm that only requires nearly linear work and $n$ i.i.d. random univariate Gaussian samples to generate i.i.d. random samples for $n$-dimensional Gaussian random fields with SDDM precision matrices. For sampling this natural subclass of Gaussian random fields, it is optimal in the randomness and nearly optimal in the work and parallel complexity. In addition, our sampling algorithm can be directly extended to Gaussian random fields with SDD precision matrices.

preprint, 2014, arXiv

Stretching Stretch

We give a generalized definition of stretch that simplifies the efficient construction of low-stretch embeddings suitable for graph algorithms. The generalization, based on discounting highly stretched edges by taking their $p$-th power for some $0 < p < 1$, is directly related to performances of existing algorithms. This discounting of high-stretch edges allows us to treat many classes of edges with coarser granularity. It leads to a two-pass approach that combines bottom-up clustering and top-down decompositions to construct these embeddings in $\mathcal{O}(m\log\log{n})$ time. Our algorithm parallelizes readily and can also produce generalizations of low-stretch subgraphs.
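
The discounting described here can be illustrated by computing the stretch of every edge with respect to a spanning tree and summing the $p$-th powers. In this sketch the edge weights are treated as lengths, and the function names and toy cycle are assumptions; the paper's embedding construction itself is not shown.

```python
def tree_distance(tree_adj, s, t):
    """Length of the unique s-t path in a tree given as {u: [(v, length), ...]}."""
    stack = [(s, None, 0.0)]
    while stack:
        u, parent, d = stack.pop()
        if u == t:
            return d
        for v, w in tree_adj[u]:
            if v != parent:
                stack.append((v, u, d + w))
    raise ValueError("t is not reachable from s in the tree")

def total_stretch(edges, tree_edges, p=1.0):
    """Sum over edges (u, v, length) of (tree distance / length) ** p."""
    tree_adj = {}
    for u, v, w in tree_edges:
        tree_adj.setdefault(u, []).append((v, w))
        tree_adj.setdefault(v, []).append((u, w))
    return sum((tree_distance(tree_adj, u, v) / w) ** p for u, v, w in edges)

# Toy instance (hypothetical data): a 4-cycle with the path 0-1-2-3 as spanning tree.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
tree = edges[:3]
plain = total_stretch(edges, tree, p=1.0)
discounted = total_stretch(edges, tree, p=0.5)
```

The off-tree edge (3, 0) has stretch 3, so it contributes 3 to the plain total but only $\sqrt{3}$ when $p = 1/2$, which is the sense in which highly stretched edges are discounted.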

preprint, 2014, arXiv

Uniform Sampling for Matrix Approximation

Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time significantly. For theoretical performance guarantees, each row must be sampled with probability proportional to its statistical leverage score. Unfortunately, leverage scores are difficult to compute. A simple alternative is to sample rows uniformly at random. While this often works, uniform sampling will eliminate critical row information for many natural instances. We take a fresh look at uniform sampling by examining what information it does preserve. Specifically, we show that uniform sampling yields a matrix that, in some sense, well approximates a large fraction of the original. While this weak form of approximation is not enough for solving linear regression directly, it is enough to compute a better approximation. This observation leads to simple iterative row sampling algorithms for matrix approximation that run in input-sparsity time and preserve row structure and sparsity at all intermediate steps. In addition to an improved understanding of uniform sampling, our main proof introduces a structural result of independent interest: we show that every matrix can be made to have low coherence by reweighting a small subset of its rows.

preprint, 2013, arXiv

An Efficient Parallel Solver for SDD Linear Systems

We present the first parallel algorithm for solving systems of linear equations in symmetric, diagonally dominant (SDD) matrices that runs in polylogarithmic time and nearly-linear work. The heart of our algorithm is a construction of a sparse approximate inverse chain for the input matrix: a sequence of sparse matrices whose product approximates its inverse. Whereas other fast algorithms for solving systems of equations in SDD matrices exploit low-stretch spanning trees, our algorithm only requires spectral graph sparsifiers.
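
The idea of a product of matrices approximating an inverse can be seen in a much simpler dense setting through the identity $\prod_{k<d}(I + A^{2^k}) = \sum_{j<2^d} A^j$, which approaches $(I-A)^{-1}$ when the spectral radius of $A$ is below 1. The sketch below illustrates only this identity and is an assumption-laden simplification: the paper's chain works with sparsified SDD matrices, which is not reproduced here.

```python
def matmul(X, Y):
    """Dense matrix product of two square lists-of-lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity_plus(X):
    """Return I + X."""
    n = len(X)
    return [[X[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

def chain_inverse(A, depth):
    """Approximate (I - A)^{-1} by the product of (I + A^(2^k)) for k < depth."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in A]
    for _ in range(depth):
        result = matmul(result, identity_plus(power))
        power = matmul(power, power)  # repeated squaring: A, A^2, A^4, ...
    return result

# Toy matrix (hypothetical data) with spectral radius 1/2.
A = [[0.0, 0.5], [0.5, 0.0]]
approx_inv = chain_inverse(A, depth=6)
```

With `depth=6` the truncation error is on the order of $A^{64}$, so a chain of logarithmic length already suffices for a well-conditioned input.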

preprint 2013 arXiv

Faster spectral sparsification and numerical algorithms for SDD matrices

We study algorithms for spectral graph sparsification. The input is a graph $G$ with $n$ vertices and $m$ edges, and the output is a sparse graph $\tilde{G}$ that approximates $G$ in an algebraic sense. Concretely, for all vectors $x$ and any $ε>0$, $\tilde{G}$ satisfies $$ (1-ε) x^T L_G x \leq x^T L_{\tilde{G}} x \leq (1+ε) x^T L_G x, $$ where $L_G$ and $L_{\tilde{G}}$ are the Laplacians of $G$ and $\tilde{G}$ respectively. We show that the fastest known algorithm for computing a sparsifier with $O(n\log n/ε^2)$ edges can actually run in $\tilde{O}(m\log^2 n)$ time, an $O(\log n)$ factor faster than before. We also present faster sparsification algorithms for slightly dense graphs. Specifically, we give an algorithm that runs in $\tilde{O}(m\log n)$ time and generates a sparsifier with $\tilde{O}(n\log^3{n}/ε^2)$ edges. This implies that a sparsifier with $O(n\log n/ε^2)$ edges can be computed in $\tilde{O}(m\log n)$ time for graphs with more than $O(n\log^4 n)$ edges. We also give an $\tilde{O}(m)$ time algorithm for graphs with more than $n\log^5 n (\log \log n)^3$ edges of polynomially bounded weights, and an $O(m)$ algorithm for unweighted graphs with more than $n\log^8 n (\log \log n)^3 $ edges and $n\log^{10} n (\log \log n)^5$ edges in the weighted case. The improved sparsification algorithms are employed to accelerate linear system solvers and algorithms for computing fundamental eigenvectors of slightly dense SDD matrices.
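The approximation condition above can be checked for a single test vector using the standard identity $x^T L_G x = \sum_{(u,v)} w_{uv}(x_u - x_v)^2$. The sketch below is illustrative only: the function names are hypothetical, and passing for one vector does not certify the bound for all vectors.

```python
def laplacian_quadratic_form(edges, x):
    """x^T L x for a weighted graph given as (u, v, weight) triples."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)


def within_epsilon(edges_g, edges_h, x, eps):
    """Check (1-eps) x^T L_G x <= x^T L_H x <= (1+eps) x^T L_G x for one x."""
    q_g = laplacian_quadratic_form(edges_g, x)
    q_h = laplacian_quadratic_form(edges_h, x)
    return (1 - eps) * q_g <= q_h <= (1 + eps) * q_g


# G is a unit-weight triangle; H drops one edge and reweights the other two.
G = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
H = [(0, 1, 1.5), (1, 2, 1.5)]
x = [0.0, 1.0, 2.0]
```

For this `x` the quadratic forms are 6 for `G` and 3 for `H`, so the check passes with `eps = 0.5` and fails with `eps = 0.25`.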

preprint 2013 arXiv

Fully Dynamic $(1+ε)$-Approximate Matchings

We present the first data structures that maintain near-optimal maximum cardinality and maximum weighted matchings on sparse graphs in sublinear time per update. Our main result is a data structure that maintains a $(1+ε)$ approximation of maximum matching under edge insertions/deletions in worst case $O(\sqrt{m}ε^{-2})$ time per update. This improves the 3/2 approximation given in [Neiman, Solomon, STOC 2013], which runs in similar time. The result is based on two ideas. The first is to re-run a static algorithm after a chosen number of updates to ensure approximation guarantees. The second is to judiciously trim the graph to a smaller equivalent one whenever possible. We also study extensions of our approach to the weighted setting, and combine it with known frameworks to obtain arbitrary approximation ratios. For a constant $ε$ and for graphs with edge weights between 1 and $N$, we design an algorithm that maintains a $(1+ε)$-approximate maximum weighted matching in $O(\sqrt{m} \log N)$ time per update. The only previous result for maintaining weighted matchings on dynamic graphs has an approximation ratio of 4.9108, and was shown in [Anand, Baswana, Gupta, Sen, FSTTCS 2012, arXiv 2012].
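The first idea, re-running a static algorithm after a fixed number of updates, can be sketched as follows. This is only an illustration of the rebuild schedule: the class name `LazyRebuildMatching` and the parameter `rebuild_every` are hypothetical, the static routine is a greedy maximal matching swapped in for simplicity (not a $(1+ε)$-approximate one), and the graph-trimming idea is not shown.

```python
class LazyRebuildMatching:
    """Keeps a valid matching under edge updates, rebuilding periodically."""

    def __init__(self, rebuild_every):
        self.rebuild_every = rebuild_every
        self.edges = set()
        self.mate = {}  # mate[u] == v iff edge (u, v) is in the matching
        self.updates_since_rebuild = 0

    def _rebuild(self):
        # Stand-in static algorithm: greedy maximal matching.
        self.mate = {}
        for u, v in sorted(self.edges):
            if u not in self.mate and v not in self.mate:
                self.mate[u] = v
                self.mate[v] = u
        self.updates_since_rebuild = 0

    def _tick(self):
        self.updates_since_rebuild += 1
        if self.updates_since_rebuild >= self.rebuild_every:
            self._rebuild()

    def insert(self, u, v):
        self.edges.add((min(u, v), max(u, v)))
        self._tick()

    def delete(self, u, v):
        self.edges.discard((min(u, v), max(u, v)))
        # Drop the edge from the matching so it stays valid between rebuilds.
        if self.mate.get(u) == v:
            del self.mate[u]
            del self.mate[v]
        self._tick()

    def size(self):
        return len(self.mate) // 2
```

Between rebuilds the matching can only lose edges through deletions, so its quality degrades slowly. Choosing how many updates to allow before a rebuild is what trades update time against approximation quality.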

preprint 2013 arXiv

Iterative Row Sampling

There has been significant recent interest and progress in algorithms that solve regression problems involving tall and thin matrices in input-sparsity time. These algorithms find a shorter equivalent of an $n \times d$ matrix where $n \gg d$, which allows one to solve a $\mathrm{poly}(d)$-sized problem instead. In practice, the best performance is often obtained by invoking these routines in an iterative fashion. We show these iterative methods can be adapted to give theoretical guarantees comparable to or better than the current state of the art. Our approaches are based on computing the importances of the rows, known as leverage scores, in an iterative manner. We show that alternating between computing a short matrix estimate and finding more accurate approximate leverage scores leads to a series of geometrically smaller instances. This gives an algorithm that runs in $O(\mathrm{nnz}(A) + d^{ω+ θ} ε^{-2})$ time for any $θ> 0$, where the $d^{ω+ θ}$ term is comparable to the cost of solving a regression problem on the small approximation. Our results are built upon the close connection between randomized matrix algorithms, iterative methods, and graph sparsification.

preprint 2013 arXiv

Parallel Graph Decompositions Using Random Shifts

We show an improved parallel algorithm for decomposing an undirected unweighted graph into small diameter pieces with a small fraction of the edges in between. These decompositions form critical subroutines in a number of graph algorithms. Our algorithm builds upon the shifted shortest path approach introduced in [Blelloch, Gupta, Koutis, Miller, Peng, Tangwongsan, SPAA 2011]. By combining various stages of the previous algorithm, we obtain a significantly simpler algorithm with the same asymptotic guarantees as the best sequential algorithm.
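One common reading of the shifted shortest path idea is sketched below as a sequential routine. It assumes each vertex draws an independent exponential shift with rate `beta` and that every vertex joins the center whose shifted distance to it is smallest. The function name and parameters are hypothetical, and this is not the paper's parallel implementation.

```python
import heapq
import random


def random_shift_decomposition(adj, beta, rng):
    """Assign each vertex of an unweighted graph to a cluster center.

    adj maps each vertex (an int) to an iterable of its neighbors.
    Vertex u starts a search at time max_shift - shift[u]; each vertex
    joins whichever search reaches it first.
    """
    shifts = {u: rng.expovariate(beta) for u in adj}
    max_shift = max(shifts.values())
    heap = [(max_shift - shifts[u], u, u) for u in adj]
    heapq.heapify(heap)
    center = {}
    while heap:
        arrival, v, c = heapq.heappop(heap)
        if v in center:
            continue  # already claimed by an earlier arrival
        center[v] = c
        for w in adj[v]:
            if w not in center:
                heapq.heappush(heap, (arrival + 1, w, c))
    return center


# A path on six vertices.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
clusters = random_shift_decomposition(path, beta=0.5, rng=random.Random(1))
```

In this sketch a larger `beta` makes the shifts smaller, which tends to give more, smaller pieces. A smaller `beta` lets a few early starters absorb larger regions.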

preprint 2012 arXiv

Approximate Maximum Flow on Separable Undirected Graphs

We present faster algorithms for approximate maximum flow in undirected graphs with good separator structures, such as bounded genus, minor free, and geometric graphs. Given such a graph with $n$ vertices and $m$ edges, along with a recursive $\sqrt{n}$-vertex separator structure, our algorithm finds a $(1-ε)$-approximate maximum flow in time $\tilde{O}(m^{6/5} \mathrm{poly}(ε^{-1}))$, ignoring poly-logarithmic terms. Similar speedups are also achieved for separable graphs with larger size separators, albeit with larger run times. These bounds also apply to image problems in two and three dimensions. Key to our algorithm is an intermediate problem that we term grouped $L_2$ flow, which exists between maximum flows and electrical flows. Our algorithm also makes use of spectral vertex sparsifiers in order to remove vertices while preserving the energy dissipation of electrical flows. We also give faster spectral vertex sparsification algorithms on well separated graphs, which may be of independent interest.

preprint 2012 arXiv

Faster Approximate Multicommodity Flow Using Quadratically Coupled Flows

The maximum multicommodity flow problem is a natural generalization of the maximum flow problem to route multiple distinct flows. Obtaining a $(1-ε)$ approximation to the multicommodity flow problem on graphs is a well-studied problem. In this paper we present an adaptation of recent advances in single-commodity flow algorithms to this problem. As the underlying linear systems in the electrical problems of multicommodity flow problems are no longer Laplacians, our approach is tailored to generate specialized systems which can be preconditioned and solved efficiently using Laplacians. Given an undirected graph with $m$ edges and $k$ commodities, we give algorithms that find $(1-ε)$-approximate solutions to the maximum concurrent flow problem and the maximum weighted multicommodity flow problem in time $\tilde{O}(m^{4/3}\mathrm{poly}(k,ε^{-1}))$.

preprint 2012 arXiv

Runtime Guarantees for Regression Problems

We study theoretical runtime guarantees for a class of optimization problems that occur in a wide variety of inference problems. These problems are motivated by the lasso framework and have applications in machine learning and computer vision. Our work shows a close connection between these problems and core questions in algorithmic graph theory. While this connection demonstrates the difficulties of obtaining runtime guarantees, it also suggests an approach of using techniques originally developed for graph algorithms. We then show that most of these problems can be formulated as a grouped least squares problem, and give efficient algorithms for this formulation. Our algorithms rely on routines for solving quadratic minimization problems, which in turn are equivalent to solving linear systems. Finally, we present some experimental results on applying our approximation algorithm to image processing problems.

preprint 2011 arXiv

A nearly-mlogn time solver for SDD linear systems

We present an improved algorithm for solving symmetric diagonally dominant linear systems. On input of an $n\times n$ symmetric diagonally dominant matrix $A$ with $m$ non-zero entries and a vector $b$ such that $A\bar{x} = b$ for some (unknown) vector $\bar{x}$, our algorithm computes a vector $x$ such that $\|x-\bar{x}\|_A < ε\|\bar{x}\|_A$ (where $\|\cdot\|_A$ denotes the $A$-norm) in time $$\tilde{O}(m\log n \log (1/ε)).$$ The solver utilizes in a standard way a 'preconditioning' chain of progressively sparser graphs. To claim the faster running time we make a two-fold improvement in the algorithm for constructing the chain. The new chain exploits previously unknown properties of the graph sparsification algorithm given in [Koutis, Miller, Peng, FOCS 2010], allowing for stronger preconditioning properties. We also present an algorithm of independent interest that constructs nearly-tight low-stretch spanning trees in time $\tilde{O}(m\log{n})$, a factor of $O(\log{n})$ faster than the algorithm in [Abraham, Bartal, Neiman, FOCS 2008]. This speedup directly reflects on the construction time of the preconditioning chain.
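The error guarantee is stated in the $A$-norm, $\|x\|_A = \sqrt{x^T A x}$. The short sketch below shows how that relative error is measured for a small dense matrix. The helper names are hypothetical, and it illustrates only the definitions of SDD and the $A$-norm, not the solver.

```python
import math


def is_sdd(a):
    """True if a is symmetric and each diagonal entry dominates its row."""
    n = len(a)
    symmetric = all(a[i][j] == a[j][i] for i in range(n) for j in range(n))
    dominant = all(
        a[i][i] >= sum(abs(a[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )
    return symmetric and dominant


def a_norm(a, x):
    """The A-norm sqrt(x^T A x) of a vector x."""
    n = len(x)
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(xi * axi for xi, axi in zip(x, ax)))


def relative_a_norm_error(a, x, x_bar):
    """||x - x_bar||_A / ||x_bar||_A, the quantity bounded by epsilon."""
    diff = [xi - bi for xi, bi in zip(x, x_bar)]
    return a_norm(a, diff) / a_norm(a, x_bar)


A = [[2.0, -1.0], [-1.0, 2.0]]
x_bar = [1.0, 1.0]   # exact solution of A x = [1, 1]
x = [1.1, 0.9]       # a candidate approximate solution
err = relative_a_norm_error(A, x, x_bar)
```

Here `err` is about 0.173, so this `x` would satisfy the guarantee for any $ε$ above that value.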

preprint 2011 arXiv

Near Linear-Work Parallel SDD Solvers, Low-Diameter Decomposition, and Low-Stretch Subgraphs

We present the design and analysis of a near linear-work parallel algorithm for solving symmetric diagonally dominant (SDD) linear systems. On input of an SDD $n$-by-$n$ matrix $A$ with $m$ non-zero entries and a vector $b$, our algorithm computes a vector $\tilde{x}$ such that $\|\tilde{x} - A^+b\|_A \leq ε \cdot \|A^+b\|_A$ in $O(m\log^{O(1)}{n}\log{\frac1ε})$ work and $O(m^{1/3+θ}\log \frac1ε)$ depth for any fixed $θ> 0$. The algorithm relies on a parallel algorithm for generating low-stretch spanning trees or spanning subgraphs. To this end, we first develop a parallel decomposition algorithm that, in polylogarithmic depth and $\tilde{O}(|E|)$ work, partitions a graph into components with polylogarithmic diameter such that only a small fraction of the original edges are between the components. This can be used to generate low-stretch spanning trees with average stretch $O(n^α)$ in $O(n^{1+α})$ work and $O(n^α)$ depth. Alternatively, it can be used to generate spanning subgraphs with polylogarithmic average stretch in $\tilde{O}(|E|)$ work and polylogarithmic depth. We apply this subgraph construction to derive a parallel linear system solver. By using this solver in known applications, our results imply improved parallel randomized algorithms for several problems, including single-source shortest paths, maximum flow, minimum-cost flow, and approximate maximum flow.

preprint 2010 arXiv

Approaching optimality for solving SDD systems

We present an algorithm that on input of an $n$-vertex $m$-edge weighted graph $G$ and a value $k$, produces an incremental sparsifier $\hat{G}$ with $n-1 + m/k$ edges, such that the condition number of $G$ with $\hat{G}$ is bounded above by $\tilde{O}(k\log^2 n)$, with probability $1-p$. The algorithm runs in time $$\tilde{O}((m \log{n} + n\log^2{n})\log(1/p)).$$ As a result, we obtain an algorithm that on input of an $n\times n$ symmetric diagonally dominant matrix $A$ with $m$ non-zero entries and a vector $b$, computes a vector $x$ satisfying $\|x-A^{+}b\|_A<ε\|A^{+}b\|_A$, in expected time $$\tilde{O}(m\log^2{n}\log(1/ε)).$$ The solver is based on repeated applications of the incremental sparsifier that produces a chain of graphs which is then used as input to a recursive preconditioned Chebyshev iteration.

preprint 2010 arXiv

Approximate Dynamic Programming using Halfspace Queries and Multiscale Monge decomposition

Let $P=(P_1, P_2, \ldots, P_n)$, $P_i \in \mathbb{R}$ for all $i$, be a signal and let $C$ be a constant. In this work our goal is to find a function $F:[n]\rightarrow \mathbb{R}$ which optimizes the following objective function: $$ \min_{F} \sum_{i=1}^n (P_i-F_i)^2 + C\times |\{i:F_i \neq F_{i+1} \} | $$ The above optimization problem reduces to solving the following recurrence, which can be done efficiently using dynamic programming in $O(n^2)$ time: $$ OPT_i = \min_{0 \leq j \leq i-1} \left[ OPT_j + \sum_{k=j+1}^i \left(P_k - \frac{\sum_{m=j+1}^i P_m}{i-j} \right)^2 \right]+ C $$ The above recurrence arises naturally in applications where we wish to approximate the original signal $P$ with another signal $F$ which consists ideally of few piecewise constant segments. Such applications include database (e.g., histogram construction), speech recognition, biology (e.g., denoising aCGH data) applications and many more. In this work we present two new techniques for optimizing dynamic programming that can handle cost functions not treated by other standard methods. The basis of our first algorithm is the definition of a constant-shifted variant of the objective function that can be efficiently approximated using state of the art methods for range searching. Our technique approximates the optimal value of our objective function within additive $ε$ error and runs in $\tilde{O}(n^{1.5} \log(U/ε))$ time, where $U = \max_i f_i$. The second algorithm we provide solves a similar recurrence within a factor of $ε$ and runs in $O(n \log^2 n / ε)$. The new technique introduced by our algorithm is the decomposition of the initial problem into a small (logarithmic) number of Monge optimization subproblems which we can speed up using existing techniques.
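The baseline $O(n^2)$ dynamic program for the recurrence above can be written directly, using prefix sums so that each segment's squared error is available in constant time. This is a sketch of the straightforward recurrence only, not of either faster technique. The function name `segment_signal` is hypothetical, and $OPT_0 = 0$ is assumed as the base case.

```python
def segment_signal(p, c):
    """Solve OPT_i = min_j [OPT_j + SSE(j+1..i)] + C by dynamic programming.

    Returns (optimal value, list of half-open segments (j, i) covering p).
    """
    n = len(p)
    s1 = [0.0] * (n + 1)  # prefix sums of p
    s2 = [0.0] * (n + 1)  # prefix sums of p squared
    for i, v in enumerate(p, 1):
        s1[i] = s1[i - 1] + v
        s2[i] = s2[i - 1] + v * v

    def sse(j, i):
        # Squared error of fitting p[j:i] by its mean.
        total = s1[i] - s1[j]
        return (s2[i] - s2[j]) - total * total / (i - j)

    opt = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            cand = opt[j] + sse(j, i) + c
            if cand < opt[i]:
                opt[i], back[i] = cand, j

    # Walk the back-pointers to recover the segments.
    segments = []
    i = n
    while i > 0:
        segments.append((back[i], i))
        i = back[i]
    segments.reverse()
    return opt[n], segments


value, segments = segment_signal([0, 0, 0, 5, 5, 5], c=1.0)
```

On this step signal the optimum uses two constant segments, `(0, 3)` and `(3, 6)`, each fitted with zero error, for a total value of `2.0`.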

preprint 2010 arXiv

Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning

The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real world applications such as spam detection, uncovering of the hidden thematic structure of the Web and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use a small amount of space, provide accurate estimates of the number of triangles and preferably are parallelizable. In this paper we present an efficient triangle counting algorithm which can be adapted to the semistreaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. with the partitioning of the set of vertices into a high degree and a low degree subset, as in the work of Alon, Yuster and Zwick, treating each set appropriately. We obtain a running time $O \left(m + \frac{m^{3/2} Δ\log{n}}{t ε^2} \right)$ and an $ε$ approximation (multiplicative error), where $n$ is the number of vertices, $m$ the number of edges and $Δ$ the maximum number of triangles an edge is contained in. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage $O\left(m^{1/2}\log{n} + \frac{m^{3/2} Δ\log{n}}{t ε^2} \right)$ and a constant number of passes (three) over the graph stream. We apply our methods in various networks with several millions of edges and we obtain excellent results. Finally, we propose a random projection based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance.
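As a rough illustration of the sampling ingredient, the sketch below assumes the sampling step keeps each edge independently with probability `p` and rescales the resulting triangle count by $1/p^3$. The exact counter orients edges by degree so that each triangle is found once. It does not implement the high/low degree partition, and all names are hypothetical.

```python
import random


def count_triangles(edges):
    """Exact triangle count; vertices are ints, edges are (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Orient each edge toward the endpoint of higher (degree, label) so
    # every triangle is counted exactly once, at its lowest-ranked edge.
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    return sum(len(out[u] & out[v]) for u in out for v in out[u])


def estimate_triangles(edges, p, rng):
    """Keep each edge with probability p, count, and rescale by 1/p^3."""
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3


# The complete graph on four vertices contains four triangles.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
exact = count_triangles(k4)
approx = estimate_triangles(k4, p=0.5, rng=random.Random(0))
```

A triangle survives sampling only if all three of its edges do, which happens with probability $p^3$. This is why the rescaled count is an unbiased estimate, and with `p = 1.0` the estimator reduces to the exact count.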

39 published item(s)

Maximum Flow and Minimum-Cost Flow in Almost-Linear Time

Minor Sparsifiers and the Distributed Laplacian Paradigm

Nested Dissection Meets IPMs: Planar Min-Cost Flow in Nearly-Linear Time

Solving Sparse Linear Systems Faster than Matrix Multiplication

A Deterministic Algorithm for Balanced Cut with Applications to Dynamic Connectivity, Flows, and Beyond

A Study of Performance of Optimal Transport

Fast Dynamic Cuts, Distances and Effective Resistances via Vertex Sparsifiers

Fast, Provably convergent IRLS Algorithm for p-norm Linear Regression

Vertex Sparsification for Edge Connectivity

A Framework for Analyzing Resparsification Algorithms

Almost-Linear-Time Algorithms for Markov Chains and New Spectral Primitives for Directed Graphs

An Empirical Study of Cycle Toggling Based Laplacian Solvers

Faster Algorithms for Computing the Stationary Distribution, Simulating Random Walks, and More

Faster and Simpler Width-Independent Parallel Algorithms for Positive Semidefinite Programming

Scalable Constrained Clustering: A Generalized Spectral Method

Approximate Undirected Maximum Flows in O(m polylog(n)) Time

Improved Parallel Algorithms for Spanners and Hopsets

Sparsified Cholesky and Multigrid Solvers for Connection Laplacians

Sparsified Cholesky Solvers for SDD linear systems

Spectral Sparsification of Random-Walk Matrix Polynomials

$\ell_p$ Row Sampling by Lewis Weights

A Generalized Cheeger Inequality

Preconditioning in Expectation

Scalable Parallel Factorizations of SDD Matrices and Efficient Sampling for Gaussian Graphical Models

Stretching Stretch

Uniform Sampling for Matrix Approximation

An Efficient Parallel Solver for SDD Linear Systems

Faster spectral sparsification and numerical algorithms for SDD matrices

Fully Dynamic $(1+ε)$-Approximate Matchings

Iterative Row Sampling

Parallel Graph Decompositions Using Random Shifts

Approximate Maximum Flow on Separable Undirected Graphs

Faster Approximate Multicommodity Flow Using Quadratically Coupled Flows

Runtime Guarantees for Regression Problems

A nearly-mlogn time solver for SDD linear systems

Near Linear-Work Parallel SDD Solvers, Low-Diameter Decomposition, and Low-Stretch Subgraphs

Approaching optimality for solving SDD systems

Approximate Dynamic Programming using Halfspace Queries and Multiscale Monge decomposition

Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning