Source author record

Yin Tat Lee

Yin Tat Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Data Structures and Algorithms; math.OC; Machine Learning; math.NA; Cryptography and Security; Discrete Mathematics; Numerical Analysis; math.PR; math.SP; Computation; Computation and Language; Distributed, Parallel, and Cluster Computing; math.CO

Catalog footprint

What is connected

36 works

13 topics

4 close collaborators


preprint, 2022, arXiv

A gradient sampling method with complexity guarantees for Lipschitz functions in high and low dimensions

Zhang et al. introduced a novel modification of Goldstein's classical subgradient method, with an efficiency guarantee of $O(\varepsilon^{-4})$ for minimizing Lipschitz functions. Their work, however, makes use of a nonstandard subgradient oracle model and requires the function to be directionally differentiable. In this paper, we show that both of these assumptions can be dropped by simply adding a small random perturbation in each step of their algorithm. The resulting method works on any Lipschitz function whose value and gradient can be evaluated at points of differentiability. We additionally present a new cutting plane algorithm that achieves better efficiency in low dimensions: $O(d\varepsilon^{-3})$ for Lipschitz functions and $O(d\varepsilon^{-2})$ for those that are weakly convex.

preprint, 2022, arXiv

Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity

Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{θ\in R^d}\ \sum_{i=1}^{n}f_{i}(θ), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $θ$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $ε$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log (1 /ε))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log (1/ε))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by Axiotis et al. (ICML 2021). Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.

preprint, 2022, arXiv

Differentially Private Fine-tuning of Language Models

We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $ε = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. When privately fine-tuned on DART, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8, respectively (privacy budget of $ε = 6.8$, $δ = 10^{-5}$), whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.

preprint, 2022, arXiv

Nested Dissection Meets IPMs: Planar Min-Cost Flow in Nearly-Linear Time

We present a nearly-linear time algorithm for finding a minimum-cost flow in planar graphs with polynomially bounded integer costs and capacities. The previous fastest algorithm for this problem is based on interior point methods (IPMs) and works for general sparse graphs in $O(n^{1.5}\text{poly}(\log n))$ time [Daitch-Spielman, STOC'08]. Intuitively, $Ω(n^{1.5})$ is a natural runtime barrier for IPM-based methods, since they require $\sqrt{n}$ iterations, each routing a possibly-dense electrical flow. To break this barrier, we develop a new implicit representation for flows based on generalized nested-dissection [Lipton-Rose-Tarjan, JSTOR'79] and approximate Schur complements [Kyng-Sachdeva, FOCS'16]. This implicit representation permits us to design a data structure to route an electrical flow with sparse demands in roughly $\sqrt{n}$ update time, resulting in a total running time of $O(n\cdot\text{poly}(\log n))$. Our results immediately extend to all families of separable graphs.

preprint, 2022, arXiv

Private Convex Optimization via Exponential Mechanism

In this paper, we study private optimization problems for non-smooth convex functions $F(x)=\mathbb{E}_i f_i(x)$ on $\mathbb{R}^d$. We show that modifying the exponential mechanism by adding an $\ell_2^2$ regularizer to $F(x)$ and sampling from $π(x)\propto \exp(-k(F(x)+μ\|x\|_2^2/2))$ recovers both the known optimal empirical risk and population loss under $(ε,δ)$-DP. Furthermore, we show how to implement this mechanism using $\widetilde{O}(n \min(d, n))$ queries to $f_i(x)$ for DP-SCO, where $n$ is the number of samples/users and $d$ is the ambient dimension. We also give a (nearly) matching lower bound $\widetilde{Ω}(n \min(d, n))$ on the number of evaluation queries. Our results utilize the following tools that are of independent interest: (1) We prove Gaussian Differential Privacy (GDP) of the exponential mechanism if the loss function is strongly convex and the perturbation is Lipschitz. Our privacy bound is \emph{optimal} as it includes the privacy of Gaussian mechanism as a special case and is proved using the isoperimetric inequality for strongly log-concave measures. (2) We show how to sample from $\exp(-F(x)-μ\|x\|^2_2/2)$ for $G$-Lipschitz $F$ with $η$ error in total variation (TV) distance using $\widetilde{O}((G^2/μ) \log^2(d/η))$ unbiased queries to $F(x)$. This is the first sampler whose query complexity has \emph{polylogarithmic dependence} on both dimension $d$ and accuracy $η$.

preprint, 2022, arXiv

Short-step Methods Are Not Strongly Polynomial-Time

Short-step methods are an important class of algorithms for solving convex constrained optimization problems. In this short paper, we show that under very mild assumptions on the self-concordant barrier and the width of the $\ell_2$-neighbourhood, any short-step interior-point method is not strongly polynomial-time.

preprint, 2022, arXiv

Universal Barrier is $n$-Self-Concordant

This paper shows that the self-concordance parameter of the universal barrier on any $n$-dimensional proper convex domain is upper bounded by $n$. This bound is tight and improves the previous $O(n)$ bound by Nesterov and Nemirovski. The key to our main result is a pair of new, sharp moment inequalities for $s$-concave distributions, which could be of independent interest.

preprint, 2021, arXiv

Complexity of Highly Parallel Non-Smooth Convex Optimization

A landmark result of non-smooth convex optimization is that gradient descent is an optimal algorithm whenever the number of computed gradients is smaller than the dimension $d$. In this paper we study the extension of this result to the parallel optimization setting. Namely we consider optimization algorithms interacting with a highly parallel gradient oracle, that is one that can answer $\mathrm{poly}(d)$ gradient queries in parallel. We show that in this case gradient descent is optimal only up to $\tilde{O}(\sqrt{d})$ rounds of interactions with the oracle. The lower bound improves upon a decades old construction by Nemirovski which proves optimality only up to $d^{1/3}$ rounds (as recently observed by Balkanski and Singer), and the suboptimality of gradient descent after $\sqrt{d}$ rounds was already observed by Duchi, Bartlett and Wainwright. In the latter regime we propose a new method with improved complexity, which we conjecture to be optimal. The analysis of this new method is based upon a generalized version of the recent results on optimal acceleration for highly smooth convex optimization.

preprint, 2021, arXiv

Fast and Memory Efficient Differentially Private-SGD via JL Projections

Differentially Private-SGD (DP-SGD) of Abadi et al. (2016) and its variations are the only known algorithms for private training of large scale neural networks. This algorithm requires computation of per-sample gradient norms, which is extremely slow and memory intensive in practice. In this paper, we present a new framework to design differentially private optimizers called DP-SGD-JL and DP-Adam-JL. Our approach uses Johnson-Lindenstrauss (JL) projections to quickly approximate the per-sample gradient norms without exactly computing them, thus making the training time and memory requirements of our optimizers closer to those of their non-DP versions. Unlike previous attempts to make DP-SGD faster, which work only on a subset of network architectures or use compiler techniques, we propose an algorithmic solution that works for any network in a black-box manner; this is the main contribution of this paper. To illustrate this, on the IMDb dataset, we train a Recurrent Neural Network (RNN) to achieve a good privacy-vs-accuracy tradeoff, while being significantly faster than DP-SGD and with a similar memory footprint as non-private SGD. The privacy analysis of our algorithms is more involved than that of DP-SGD; we use the recently proposed f-DP framework of Dong et al. (2019) to prove privacy.
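To make the norm-estimation idea concrete, here is a minimal sketch that estimates the Euclidean norm of a vector from a handful of random Gaussian projections. It is an illustration of the JL principle only, not the DP-SGD-JL optimizer: the names `jl_norm_estimate` and `num_projections` are hypothetical, and the explicit vector `g` merely stands in for a per-sample gradient that a real implementation would avoid materializing.

```python
import math
import random

def jl_norm_estimate(g, num_projections, rng):
    """Estimate ||g||_2 from inner products with random Gaussian vectors."""
    total = 0.0
    for _ in range(num_projections):
        # One projection <v, g> with v ~ N(0, I) has mean 0 and variance ||g||^2.
        proj = sum(rng.gauss(0.0, 1.0) * gi for gi in g)
        total += proj * proj
    # The average of the squared projections is an unbiased estimate of ||g||^2.
    return math.sqrt(total / num_projections)

rng = random.Random(0)
g = [rng.uniform(-1.0, 1.0) for _ in range(50)]  # stand-in for a per-sample gradient
exact = math.sqrt(sum(gi * gi for gi in g))
approx = jl_norm_estimate(g, num_projections=2000, rng=rng)
```

More projections give a tighter estimate at a higher cost; the relative error of the squared norm shrinks roughly like the inverse square root of the number of projections.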

preprint, 2020, arXiv

A Faster Interior Point Method for Semidefinite Programming

Semidefinite programs (SDPs) are a fundamental class of optimization problems with important recent applications in approximation algorithms, quantum complexity, robust learning, algorithmic rounding, and adversarial deep learning. This paper presents a faster interior point method to solve generic SDPs with variable size $n \times n$ and $m$ constraints in time \begin{align*} \widetilde{O}(\sqrt{n}( mn^2 + m^ω+ n^ω) \log(1 / ε) ), \end{align*} where $ω$ is the exponent of matrix multiplication and $ε$ is the relative accuracy. In the predominant case of $m \geq n$, our runtime outperforms that of the previous fastest SDP solver, which is based on the cutting plane method of Jiang, Lee, Song, and Wong [JLSW20]. Our algorithm's runtime can be naturally interpreted as follows: $\widetilde{O}(\sqrt{n} \log (1/ε))$ is the number of iterations needed for our interior point method, $mn^2$ is the input size, and $m^ω+ n^ω$ is the time to invert the Hessian and slack matrix in each iteration. These constitute natural barriers to further improving the runtime of interior point methods for solving generic SDPs.

preprint, 2020, arXiv

A near-optimal algorithm for approximating the John Ellipsoid

We develop a simple and efficient algorithm for approximating the John Ellipsoid of a symmetric polytope. Our algorithm is near optimal in the sense that our time complexity matches the current best verification algorithm. We also provide the MATLAB code for further research.

preprint, 2020, arXiv

Acceleration with a Ball Optimization Oracle

Consider an oracle which takes a point $x$ and returns the minimizer of a convex function $f$ in an $\ell_2$ ball of radius $r$ around $x$. It is straightforward to show that roughly $r^{-1}\log\frac{1}ε$ calls to the oracle suffice to find an $ε$-approximate minimizer of $f$ in an $\ell_2$ unit ball. Perhaps surprisingly, this is not optimal: we design an accelerated algorithm which attains an $ε$-approximate minimizer with roughly $r^{-2/3} \log \frac{1}ε$ oracle queries, and give a matching lower bound. Further, we implement ball optimization oracles for functions with locally stable Hessians using a variant of Newton's method. The resulting algorithm applies to a number of problems of practical and theoretical import, improving upon previous results for logistic and $\ell_\infty$ regression and achieving guarantees comparable to the state-of-the-art for $\ell_p$ regression.

preprint, 2020, arXiv

An Improved Cutting Plane Method for Convex Optimization, Convex-Concave Games and its Applications

Given a separation oracle for a convex set $K \subset \mathbb{R}^n$ that is contained in a box of radius $R$, the goal is to either compute a point in $K$ or prove that $K$ does not contain a ball of radius $ε$. We propose a new cutting plane algorithm that uses an optimal $O(n \log (κ))$ evaluations of the oracle and an additional $O(n^2)$ time per evaluation, where $κ= nR/ε$. $\bullet$ This improves upon Vaidya's $O( \text{SO} \cdot n \log (κ) + n^{ω+1} \log (κ))$ time algorithm [Vaidya, FOCS 1989a] in terms of polynomial dependence on $n$, where $ω< 2.373$ is the exponent of matrix multiplication and $\text{SO}$ is the time for oracle evaluation. $\bullet$ This improves upon Lee-Sidford-Wong's $O( \text{SO} \cdot n \log (κ) + n^3 \log^{O(1)} (κ))$ time algorithm [Lee, Sidford and Wong, FOCS 2015] in terms of dependence on $κ$. For many important applications in economics, $κ= Ω(\exp(n))$ and this leads to a significant difference between $\log(κ)$ and $\mathrm{poly}(\log (κ))$. We also provide evidence that the $n^2$ time per evaluation cannot be improved and thus our running time is optimal. A bottleneck of previous cutting plane methods is to compute leverage scores, a measure of the relative importance of past constraints. Our result is achieved by a novel multi-layered data structure for leverage score maintenance, which is a sophisticated combination of diverse techniques such as random projection, batched low-rank update, inverse maintenance, polynomial interpolation, and fast rectangular matrix multiplication. Interestingly, our method requires a combination of different fast rectangular matrix multiplication algorithms.
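The feasibility problem stated above (find a point in $K$ using only a separation oracle) can be illustrated with the classical ellipsoid method. This is a deliberately simpler, swapped-in technique, not the cutting plane algorithm of the paper, and it uses far more oracle calls than the $O(n \log κ)$ bound. The function names and the test set (a small ball) are assumptions made for the example.

```python
import numpy as np

def ellipsoid_feasibility(oracle, n, R, max_iters=1000):
    """Find a point in a convex set K inside the box [-R, R]^n, or return None.

    oracle(x) returns None if x is in K, otherwise a vector a such that
    a @ y <= a @ x for every y in K (a separating halfspace). Requires n >= 2.
    """
    c = np.zeros(n)                # ellipsoid center
    P = (n * R * R) * np.eye(n)    # ball of radius sqrt(n) * R contains the box
    for _ in range(max_iters):
        a = oracle(c)
        if a is None:
            return c
        # Standard central-cut update: keep the half of the ellipsoid
        # on the side of the cut that contains K, then re-enclose it.
        g = P @ a / np.sqrt(a @ P @ a)
        c = c - g / (n + 1)
        P = (n * n / (n * n - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(g, g))
    return None

# Hypothetical target set: a ball of radius 0.1 around a fixed center.
center, radius = np.array([0.5, -0.3, 0.2]), 0.1

def ball_oracle(x):
    d = x - center
    return None if np.linalg.norm(d) <= radius else d

x = ellipsoid_feasibility(ball_oracle, n=3, R=1.0)
```

Each iteration costs one oracle call plus $O(n^2)$ arithmetic for the rank-one update, but the ellipsoid method needs on the order of $n^2 \log κ$ iterations, which is the gap that cutting plane methods of this kind close.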

preprint, 2020, arXiv

Composite Logconcave Sampling with a Restricted Gaussian Oracle

We consider sampling from composite densities on $\mathbb{R}^d$ of the form $dπ(x) \propto \exp(-f(x) - g(x))dx$ for well-conditioned $f$ and convex (but possibly non-smooth) $g$, a family generalizing restrictions to a convex set, through the abstraction of a restricted Gaussian oracle. For $f$ with condition number $κ$, our algorithm runs in $O \left(κ^2 d \log^2\tfrac{κd}ε\right)$ iterations, each querying a gradient of $f$ and a restricted Gaussian oracle, to achieve total variation distance $ε$. The restricted Gaussian oracle, which draws samples from a distribution whose negative log-likelihood sums a quadratic and $g$, has been previously studied and is a natural extension of the proximal oracle used in composite optimization. Our algorithm is conceptually simple and obtains stronger provable guarantees and greater generality than existing methods for composite sampling. We conduct experiments showing our algorithm vastly improves upon the hit-and-run algorithm for sampling the restriction of a (non-diagonal) Gaussian to the positive orthant.

preprint, 2020, arXiv

Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized Hamiltonian Monte Carlo

We show that the gradient norm $\|\nabla f(x)\|$ for $x \sim \exp(-f(x))$, where $f$ is strongly convex and smooth, concentrates tightly around its mean. This removes a barrier in the prior state-of-the-art analysis for the well-studied Metropolized Hamiltonian Monte Carlo (HMC) algorithm for sampling from a strongly logconcave distribution. We correspondingly demonstrate that Metropolized HMC mixes in $\tilde{O}(κd)$ iterations, improving upon the $\tilde{O}(κ^{1.5}\sqrt{d} + κd)$ runtime of (Dwivedi et al. '18, Chen et al. '19) by a factor $(κ/d)^{1/2}$ when the condition number $κ$ is large. Our mixing time analysis introduces several techniques which to our knowledge have not appeared in the literature and may be of independent interest, including restrictions to a nonconvex set with good conductance behavior, and a new reduction technique for boosting a constant-accuracy total variation guarantee under weak warmness assumptions. This is the first high-accuracy mixing time result for logconcave distributions using only first-order function information which achieves linear dependence on $κ$; we also give evidence that this dependence is likely to be necessary for standard Metropolized first-order methods.

preprint, 2020, arXiv

Solving Linear Programs with Sqrt(rank) Linear System Solves

We present an algorithm that, given a linear program with $n$ variables, $m$ constraints, and constraint matrix $A$, computes an $ε$-approximate solution in $\tilde{O}(\sqrt{rank(A)}\log(1/ε))$ iterations with high probability. Each iteration of our method consists of solving $\tilde{O}(1)$ linear systems and additional nearly linear time computation, improving by a factor of $\tilde{Ω}((m/rank(A))^{1/2})$ over the previous fastest method with this iteration cost due to Renegar (1988). Further, we provide a deterministic polynomial time computable $\tilde{O}(rank(A))$-self-concordant barrier function for the polytope, resolving an open question of Nesterov and Nemirovski (1994) on the theory of "universal barriers" for interior point methods. Applying our techniques to the linear program formulation of maximum flow yields an $\tilde{O}(|E|\sqrt{|V|}\log(U))$ time algorithm for solving the maximum flow problem on directed graphs with $|E|$ edges, $|V|$ vertices, and integer capacities of size at most $U$. This improves upon the previous fastest polynomial running time of $O(|E|\min\{|E|^{1/2},|V|^{2/3}\}\log(|V|^{2}/|E|)\log(U))$ achieved by Goldberg and Rao (1998). In the special case of solving dense directed unit capacity graphs our algorithm improves upon the previous fastest running times of $O(|E|\min\{|E|^{1/2},|V|^{2/3}\})$ achieved by Even and Tarjan (1975) and Karzanov (1973) and of $\tilde{O}(|E|^{10/7})$ achieved more recently by Mądry (2013).

preprint, 2020, arXiv

Strong Self-Concordance and Sampling

Motivated by the Dikin walk, we develop aspects of an interior-point theory for sampling in high dimension. Specifically, we introduce a symmetric parameter and the notion of strong self-concordance. These properties imply that the corresponding Dikin walk mixes in $\tilde{O}(n\bar{ν})$ steps from a warm start in a convex body in $\mathbb{R}^{n}$ using a strongly self-concordant barrier with symmetric self-concordance parameter $\bar{ν}$. For many natural barriers, $\bar{ν}$ is roughly bounded by $ν$, the standard self-concordance parameter. We show that this property and strong self-concordance hold for the Lee-Sidford barrier. As a consequence, we obtain the first walk to mix in $\tilde{O}(n^{2})$ steps for an arbitrary polytope in $\mathbb{R}^{n}$. Strong self-concordance for other barriers leads to an interesting (and unexpected) connection -- for the universal and entropic barriers, it is implied by the KLS conjecture.

preprint, 2016, arXiv

Geometric Median in Nearly Linear Time

In this paper we provide faster algorithms for solving the geometric median problem: given $n$ points in $\mathbb{R}^{d}$, compute a point that minimizes the sum of Euclidean distances to the points. This is one of the oldest non-trivial problems in computational geometry, yet despite an abundance of research the previous fastest algorithms for computing a $(1+ε)$-approximate geometric median were $O(d\cdot n^{4/3}ε^{-8/3})$ by Chin et al., $\tilde{O}(d\exp{ε^{-4}\log ε^{-1}})$ by Badoiu et al., $O(nd+\mathrm{poly}(d,ε^{-1}))$ by Feldman and Langberg, and $O((nd)^{O(1)}\log\frac{1}ε)$ by Parrilo and Sturmfels and Xue and Ye. In this paper we show how to compute a $(1+ε)$-approximate geometric median in time $O(nd\log^{3}\frac{1}ε)$ and $O(dε^{-2})$. While our $O(dε^{-2})$ algorithm is a fairly straightforward application of stochastic subgradient descent, our $O(nd\log^{3}\frac{1}ε)$ time algorithm is a novel long step interior point method. To achieve this running time we start with a simple $O((nd)^{O(1)}\log\frac{1}ε)$ time interior point method and show how to improve it, ultimately building an algorithm that is quite non-standard from the perspective of interior point literature. Our result is one of very few cases we are aware of outperforming traditional interior point theory and the only one we are aware of using interior point methods to obtain a nearly linear time algorithm for a canonical optimization problem that traditionally requires superlinear time. We hope our work leads to further improvements in this line of research.
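The stochastic subgradient approach mentioned above can be sketched in a few lines. This is a minimal illustration under assumptions, not the tuned $O(dε^{-2})$ procedure from the paper: the step size $1/\sqrt{t+1}$, the starting point (the coordinate-wise mean), and the function names are all choices made for the example.

```python
import math
import random

def cost(x, points):
    """Sum of Euclidean distances from x to all points."""
    return sum(math.dist(x, p) for p in points)

def geometric_median_sgd(points, steps, rng):
    """Stochastic subgradient descent on the sum-of-distances objective."""
    d = len(points[0])
    # Start at the coordinate-wise mean.
    x = [sum(p[j] for p in points) / len(points) for j in range(d)]
    for t in range(steps):
        p = rng.choice(points)           # sample one term of the sum
        dist = math.dist(x, p)
        if dist == 0.0:
            continue                     # zero is a valid subgradient at a data point
        eta = 1.0 / math.sqrt(t + 1)     # assumed step-size schedule
        # A subgradient of ||x - p|| is the unit vector (x - p) / ||x - p||.
        x = [xj - eta * (xj - pj) / dist for xj, pj in zip(x, p)]
    return x

rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(99)]
points.append([1000.0, 0.0, 0.0])        # a single far outlier
mean = [sum(p[j] for p in points) / len(points) for j in range(3)]
estimate = geometric_median_sgd(points, steps=5000, rng=rng)
```

The outlier drags the mean far from the cluster, while the sum-of-distances objective is much less sensitive to it, so the estimate should end up with a noticeably lower cost than the mean. Each step touches a single point, which is why the running time of this style of method does not depend on $n$.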

preprint, 2016, arXiv

Kernel-based methods for bandit convex optimization

We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves $\tilde{O}(n^{9.5} \sqrt{T})$-regret, and we show that a simple variant of this algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step at the cost of an additional $\mathrm{poly}(n) T^{o(1)}$ factor in the regret. These results improve upon the $\tilde{O}(n^{11} \sqrt{T})$-regret and $\exp(\mathrm{poly}(T))$-time result of the first two authors, and the $\log(T)^{\mathrm{poly}(n)} \sqrt{T}$-regret and $\log(T)^{\mathrm{poly}(n)}$-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve $\tilde{O}(n^{1.5} \sqrt{T})$-regret, and moreover that this regret is unimprovable (the current best lower bound being $Ω(n \sqrt{T})$ and it is achieved with linear functions). For the simpler situation of zeroth order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order $n^3 / ε^2$.

preprint, 2016, arXiv

Subquadratic Submodular Function Minimization

Submodular function minimization (SFM) is a fundamental discrete optimization problem which generalizes many well known problems, has applications in various fields, and can be solved in polynomial time. Owing to applications in computer vision and machine learning, fast SFM algorithms are highly desirable. The current fastest algorithms [Lee, Sidford, Wong, FOCS 2015] run in $O(n^{2}\log nM\cdot\textrm{EO} +n^{3}\log^{O(1)}nM)$ time and $O(n^{3}\log^{2}n\cdot \textrm{EO} +n^{4}\log^{O(1)}n)$ time respectively, where $M$ is the largest absolute value of the function (assuming the range is integers) and $\textrm{EO}$ is the time taken to evaluate the function on any set. Although the best known lower bound on the query complexity is only $Ω(n)$, the current shortest non-deterministic proof certifying the optimum value of a function requires $Θ(n^{2})$ function evaluations. The main contributions of this paper are subquadratic SFM algorithms. For integer-valued submodular functions, we give an SFM algorithm which runs in $O(nM^{3}\log n\cdot\textrm{EO})$ time, giving the first nearly linear time algorithm in any known regime. For real-valued submodular functions with range in $[-1,1]$, we give an algorithm which in $\tilde{O}(n^{5/3}\cdot\textrm{EO}/\varepsilon^{2})$ time returns an $\varepsilon$-additive approximate solution. At the heart of it, our algorithms are projected stochastic subgradient descent methods on the Lovasz extension of submodular functions where we crucially exploit submodularity and data structures to obtain fast, i.e. sublinear time, subgradient updates. The latter is crucial for beating the $n^{2}$ bound as we show that algorithms which access only subgradients of the Lovasz extension, and these include the theoretically best algorithms mentioned above, must make $Ω(n)$ subgradient calls (even for functions whose range is $\{-1,0,1\}$).
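For readers unfamiliar with the Lovasz extension, the following sketch computes its value and one subgradient by the standard sort-and-take-marginals rule. It is the naive version that makes $n$ evaluation-oracle calls per subgradient, not the sublinear-time update of the paper; the function names and the example cut function are assumptions for illustration.

```python
def lovasz_subgradient(f, x):
    """Return (value, subgradient) of the Lovasz extension of f at x.

    f maps a frozenset of indices to a number, with f(frozenset()) == 0.
    """
    order = sorted(range(len(x)), key=lambda i: -x[i])  # indices by decreasing x
    g = [0.0] * len(x)
    prefix, prev = set(), 0.0
    for i in order:
        prefix.add(i)
        cur = f(frozenset(prefix))
        g[i] = cur - prev       # marginal gain of adding element i to the prefix
        prev = cur
    value = sum(gi * xi for gi, xi in zip(g, x))
    return value, g

# Cut function of the path graph 0-1-2-3, a standard submodular function.
edges = [(0, 1), (1, 2), (2, 3)]

def cut(S):
    return float(sum((u in S) != (v in S) for u, v in edges))

# At the indicator vector of {0, 1} the extension agrees with cut({0, 1}).
value, g = lovasz_subgradient(cut, [1.0, 1.0, 0.0, 0.0])
```

On indicator vectors the extension reproduces the set function, which is why minimizing the (convex) extension over $[0,1]^n$ solves the discrete problem.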

preprint, 2016, arXiv

Using Optimization to Obtain a Width-Independent, Parallel, Simpler, and Faster Positive SDP Solver

We study the design of polylogarithmic depth algorithms for approximately solving packing and covering semidefinite programs (or positive SDPs for short). This is a natural SDP generalization of the well-studied positive LP problem. Although positive LPs can be solved in polylogarithmic depth while using only $\tilde{O}(\log^{2} n/\varepsilon^2)$ parallelizable iterations, the best known positive SDP solvers due to Jain and Yao require $O(\log^{14} n /\varepsilon^{13})$ parallelizable iterations. Several alternative solvers have been proposed to reduce the exponents in the number of iterations. However, the correctness of the convergence analyses in these works has been called into question, as they both rely on algebraic monotonicity properties that do not generalize to matrix algebra. In this paper, we propose a very simple algorithm based on the optimization framework proposed for LP solvers. Our algorithm only needs $\tilde{O}(\log^2 n / \varepsilon^2)$ iterations, matching that of the best LP solver. To surmount the obstacles encountered by previous approaches, our analysis requires a new matrix inequality that extends Lieb-Thirring's inequality, and a sign-consistent, randomized variant of the gradient truncation technique proposed in.

preprint, 2015, arXiv

A Faster Cutting Plane Method and its Implications for Combinatorial and Convex Optimization

We improve upon the running time for finding a point in a convex set given a separation oracle. In particular, given a separation oracle for a convex set $K\subset \mathbb{R}^n$ contained in a box of radius $R$, we show how to either find a point in $K$ or prove that $K$ does not contain a ball of radius $ε$ using an expected $O(n\log(nR/ε))$ oracle evaluations and additional time $O(n^3\log^{O(1)}(nR/ε))$. This matches the oracle complexity and improves upon the $O(n^{ω+1}\log(nR/ε))$ additional time of the previous fastest algorithm achieved over 25 years ago by Vaidya for the current matrix multiplication constant $ω<2.373$ when $R/ε=n^{O(1)}$. Using a mix of standard reductions and new techniques, our algorithm yields improved runtimes for solving classic problems in continuous and combinatorial optimization: Submodular Minimization: Our weakly and strongly polynomial time algorithms have runtimes of $O(n^2\log nM\cdot\text{EO}+n^3\log^{O(1)}nM)$ and $O(n^3\log^2 n\cdot\text{EO}+n^4\log^{O(1)}n)$, improving upon the previous best of $O((n^4\text{EO}+n^5)\log M)$ and $O(n^5\text{EO}+n^6)$. Matroid Intersection: Our runtimes are $O(nrT_{\text{rank}}\log n\log (nM) +n^3\log^{O(1)}(nM))$ and $O(n^2\log (nM) T_{\text{ind}}+n^3 \log^{O(1)} (nM))$, achieving the first quadratic bound on the query complexity for the independence and rank oracles. In the unweighted case, this is the first improvement since 1986 for independence oracle. Submodular Flow: Our runtime is $O(n^2\log nCU\cdot\text{EO}+n^3\log^{O(1)}nCU)$, improving upon the previous bests from 15 years ago roughly by a factor of $O(n^4)$. Semidefinite Programming: Our runtime is $\tilde{O}(n(n^2+m^ω+S))$, improving upon the previous best of $\tilde{O}(n(n^ω+m^ω+S))$ for the regime where the number of nonzeros $S$ is small.

preprint, 2015, arXiv

A geometric alternative to Nesterov's accelerated gradient descent

We propose a new method for unconstrained optimization of a smooth and strongly convex function, which attains the optimal rate of convergence of Nesterov's accelerated gradient descent. The new algorithm has a simple geometric interpretation, loosely inspired by the ellipsoid method. We provide some numerical evidence that the new method can be superior to Nesterov's accelerated gradient descent.

preprint, 2015, arXiv

Constructing Linear-Sized Spectral Sparsification in Almost-Linear Time

We present the first almost-linear time algorithm for constructing linear-sized spectral sparsification for graphs. This improves all previous constructions of linear-sized spectral sparsification, which require $Ω(n^2)$ time. A key ingredient in our algorithm is a novel combination of two techniques used in the literature for constructing spectral sparsification: random sampling by effective resistance, and adaptive constructions based on barrier functions.
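The first of the two ingredients, random sampling by effective resistance, can be sketched as follows. This shows only that one ingredient on a tiny graph, computed with a dense pseudoinverse, and is not the almost-linear time construction of the paper; the function names and the sample count are assumptions.

```python
import numpy as np

def effective_resistances(n, edges, weights):
    """Effective resistance of every edge via the Laplacian pseudoinverse."""
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    Lp = np.linalg.pinv(L)
    return np.array([Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for u, v in edges])

def sample_sparsifier(edges, weights, resist, num_samples, rng):
    """Sample edges with probability proportional to w_e * R_e and reweight."""
    p = weights * resist
    p = p / p.sum()
    new_w = {}
    for idx in rng.choice(len(edges), size=num_samples, p=p):
        # Dividing by num_samples * p keeps the sampled Laplacian unbiased.
        new_w[edges[idx]] = new_w.get(edges[idx], 0.0) + weights[idx] / (num_samples * p[idx])
    return new_w

n = 5
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]  # complete graph K5
weights = np.ones(len(edges))
resist = effective_resistances(n, edges, weights)
sparse = sample_sparsifier(edges, weights, resist, num_samples=8,
                           rng=np.random.default_rng(0))
```

For a connected graph the products $w_e R_e$ sum to $n-1$, so an edge's sampling probability reflects how much of the graph's connectivity depends on it.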

preprint, 2015, arXiv

Efficient Inverse Maintenance and Faster Algorithms for Linear Programming

In this paper, we consider the following inverse maintenance problem: given $A \in \mathbb{R}^{n\times d}$ and a number of rounds $r$, we receive an $n\times n$ diagonal matrix $D^{(k)}$ at round $k$ and we wish to maintain an efficient linear system solver for $A^{T}D^{(k)}A$ under the assumption $D^{(k)}$ does not change too rapidly. This inverse maintenance problem is the computational bottleneck in solving multiple optimization problems. We show how to solve this problem with $\tilde{O}(nnz(A)+d^ω)$ preprocessing time and amortized $\tilde{O}(nnz(A)+d^{2})$ time per round, improving upon previous running times for solving this problem. Consequently, we obtain the fastest known running times for solving multiple problems including linear programming and computing a rounding of a polytope. In particular, given a feasible point in a linear program with $d$ variables, $n$ constraints, and constraint matrix $A\in\mathbb{R}^{n\times d}$, we show how to solve the linear program in time $\tilde{O}((nnz(A)+d^{2})\sqrt{d}\log(ε^{-1}))$. We achieve our results through a novel combination of classic numerical techniques of low rank update, preconditioning, and fast matrix multiplication as well as recent work on subspace embeddings and spectral sparsification that we hope will be of independent interest.
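The "low rank update" ingredient can be illustrated with the Woodbury identity: when only $k$ diagonal entries of $D$ change, the inverse of $A^{T}DA$ can be corrected instead of recomputed. This is a minimal sketch of that classic technique alone, under the assumption that the inverse is stored explicitly; it is not the full data structure of the paper, and `update_inverse` is a hypothetical name.

```python
import numpy as np

def update_inverse(M, A, idx, delta):
    """Update M = (A^T D A)^{-1} after D[i, i] += delta[j] for i = idx[j].

    Uses the Woodbury identity with k = len(idx) changed entries,
    all of which must have nonzero delta.
    """
    U = A[idx].T                             # d x k: the rows of A that changed
    MU = M @ U                               # d x k
    S = np.diag(1.0 / delta) + U.T @ MU      # k x k capacitance matrix
    # M is symmetric, so MU.T equals U^T M.
    return M - MU @ np.linalg.solve(S, MU.T)

rng = np.random.default_rng(0)
n, d = 30, 5
A = rng.normal(size=(n, d))
D = rng.uniform(1.0, 2.0, size=n)
M = np.linalg.inv(A.T @ (D[:, None] * A))

idx = np.array([3, 17])                      # two diagonal entries change
delta = np.array([0.5, -0.25])
D_new = D.copy()
D_new[idx] += delta
M_new = update_inverse(M, A, idx, delta)
```

The update costs $O(d^{2}k)$ arithmetic rather than the $O(nd^{2}+d^{3})$ of a dense recomputation, which is the kind of saving that pays off when $D$ changes slowly between rounds.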

preprint, 2015, arXiv

Improved Cheeger's Inequality and Analysis of Local Graph Partitioning using Vertex Expansion and Expansion Profile

We prove two generalizations of Cheeger's inequality. The first generalization relates the second eigenvalue to the edge expansion and the vertex expansion of the graph G, $λ_2 = Ω(ϕ^V(G) ϕ(G))$, where $ϕ^V(G)$ denotes the robust vertex expansion of G and $ϕ(G)$ denotes the edge expansion of G. The second generalization relates the second eigenvalue to the edge expansion and the expansion profile of G, for all $k \ge 2$, $λ_2 = Ω(ϕ_k(G) ϕ(G) / k)$, where $ϕ_k(G)$ denotes the k-way expansion of G. These show that the spectral partitioning algorithm has better performance guarantees when $ϕ^V(G)$ is large (e.g. planted random instances) or $ϕ_k(G)$ is large (instances with few disjoint non-expanding sets). Both bounds are tight up to a constant factor. Our approach is based on a method to analyze solutions of Laplacian systems, and this allows us to extend the results to local graph partitioning algorithms. In particular, we show that our approach can be used to analyze personal pagerank vectors, and to give a local graph partitioning algorithm for the small-set expansion problem with performance guarantees similar to the generalizations of Cheeger's inequality. We also present a spectral approach to prove similar results for the truncated random walk algorithm. These show that local graph partitioning algorithms almost match the performance of the spectral partitioning algorithm, with the additional advantages that they apply to the small-set expansion problem and their running time could be sublinear. Our techniques provide common approaches to analyze the spectral partitioning algorithm and local graph partitioning algorithms.
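As background for the generalizations above, the classical Cheeger inequality $λ_2/2 \le ϕ(G) \le \sqrt{2λ_2}$ can be checked numerically on a small graph. The sketch below assumes the normalized Laplacian and the conductance form of edge expansion, which may differ in normalization from the definitions used in the paper; it verifies only the classical inequality, not the new bounds, and the brute-force search is feasible only for tiny graphs.

```python
import itertools
import numpy as np

def lambda2_normalized(adj):
    """Second-smallest eigenvalue of I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)[1]       # eigenvalues are returned in ascending order

def conductance(adj):
    """min over S of cut(S) / min(vol(S), vol(V - S)), by brute force."""
    n, deg = len(adj), adj.sum(axis=1)
    best = np.inf
    for k in range(1, n):
        for S in itertools.combinations(range(n), k):
            S = list(S)
            T = [v for v in range(n) if v not in S]
            cut = adj[np.ix_(S, T)].sum()
            best = min(best, cut / min(deg[S].sum(), deg[T].sum()))
    return best

n = 8
adj = np.zeros((n, n))
for v in range(n):                          # cycle graph C_8
    adj[v, (v + 1) % n] = adj[(v + 1) % n, v] = 1.0

lam2, phi = lambda2_normalized(adj), conductance(adj)
```

On the cycle the best cut splits the vertices into two arcs of four, and the upper bound $\sqrt{2λ_2}$ is far from tight, which is the kind of slack that refined inequalities aim to explain.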

preprint2015arXiv

Path Finding I: Solving Linear Programs with Õ(sqrt(rank)) Linear System Solves

In this paper we present a new algorithm for solving linear programs that requires only $\tilde{O}(\sqrt{rank(A)}L)$ iterations to solve a linear program with $m$ constraints, $n$ variables, constraint matrix $A$, and bit complexity $L$. Each iteration of our method consists of solving $\tilde{O}(1)$ linear systems and additional nearly linear time computation. Our method improves upon the previous best iteration bound by a factor of $\tilde{Ω}((m/rank(A))^{1/4})$ for methods with polynomial time computable iterations and by $\tilde{Ω}((m/rank(A))^{1/2})$ for methods which solve at most $\tilde{O}(1)$ linear systems in each iteration. Our method is parallelizable and amenable to linear algebraic techniques for accelerating the linear system solver. As such, up to polylogarithmic factors we either match or improve upon the best previous running times in both depth and work for different ratios of $m$ and $rank(A)$. Moreover, our method matches up to polylogarithmic factors a theoretical limit established by Nesterov and Nemirovski in 1994 regarding the use of a "universal barrier" for interior point methods, thereby resolving a long-standing open question regarding the running time of polynomial time interior point methods for linear programming.

preprint2015arXiv

Path Finding II: An Õ(m sqrt(n)) Algorithm for the Minimum Cost Flow Problem

In this paper we present an $\tilde{O}(m\sqrt{n}\log^{O(1)}U)$ time algorithm for solving the maximum flow problem on directed graphs with $m$ edges, $n$ vertices, and capacity ratio $U$. This improves upon the previous fastest running time of $O(m\min\left(n^{2/3},m^{1/2}\right)\log\left(n^{2}/m\right)\log U)$ achieved over 15 years ago by Goldberg and Rao. In the special case of solving dense directed unit capacity graphs our algorithm improves upon the previous fastest running times of $O(\min\{m^{3/2},mn^{2/3}\})$ achieved by Even and Tarjan and Karzanov over 35 years ago and of $\tilde{O}(m^{10/7})$ achieved recently by Mądry. We achieve these results through the development and application of a new general interior point method that we believe is of independent interest. The number of iterations required by this algorithm is better than that predicted by analyzing the best self-concordant barrier of the feasible region. By applying this method to the linear programming formulations of maximum flow, minimum cost flow, and lossy generalized minimum cost flow and applying analysis by Daitch and Spielman we achieve a running time of $\tilde{O}(m\sqrt{n}\log^{O(1)}(U/ε))$ for these problems as well. Furthermore, our algorithm is parallelizable and using a recent nearly linear time work polylogarithmic depth Laplacian system solver of Spielman and Peng we achieve a $\tilde{O}(\sqrt{n}\log^{O(1)}(U/ε))$ depth algorithm and $\tilde{O}(m\sqrt{n}\log^{O(1)}(U/ε))$ work algorithm for solving these problems.

preprint2015arXiv

Single Pass Spectral Sparsification in Dynamic Streams

We present the first single pass algorithm for computing spectral sparsifiers of graphs in the dynamic semi-streaming model. Given a single pass over a stream containing insertions and deletions of edges to a graph G, our algorithm maintains a randomized linear sketch of the incidence matrix of G into dimension O((1/epsilon^2) n polylog(n)). Using this sketch, at any point, the algorithm can output a (1 +/- epsilon) spectral sparsifier for G with high probability. While O((1/epsilon^2) n polylog(n)) space algorithms are known for computing "cut sparsifiers" in dynamic streams [AGM12b, GKP12] and spectral sparsifiers in "insertion-only" streams [KL11], prior to our work, the best known single pass algorithm for maintaining spectral sparsifiers in dynamic streams required sketches of dimension Omega((1/epsilon^2) n^(5/3)) [AGM14]. To achieve our result, we show that, using a coarse sparsifier of G and a linear sketch of G's incidence matrix, it is possible to sample edges by effective resistance, obtaining a spectral sparsifier of arbitrary precision. Sampling from the sketch requires a novel application of ell_2/ell_2 sparse recovery, a natural extension of the ell_0 methods used for cut sparsifiers in [AGM12b]. Recent work of [MP12] on row sampling for matrix approximation gives a recursive approach for obtaining the required coarse sparsifiers. Under certain restrictions, our approach also extends to the problem of maintaining a spectral approximation for a general matrix A^T A given a stream of updates to rows in A.

preprint2015arXiv

Sparsified Cholesky and Multigrid Solvers for Connection Laplacians

We introduce the sparsified Cholesky and sparsified multigrid algorithms for solving systems of linear equations. These algorithms accelerate Gaussian elimination by sparsifying the nonzero matrix entries created by the elimination process. We use these new algorithms to derive the first nearly linear time algorithms for solving systems of equations in connection Laplacians, a generalization of Laplacian matrices that arise in many problems in image and signal processing. We also prove that every connection Laplacian has a linear sized approximate inverse. This is an LU factorization with a linear number of nonzero entries that is a strong approximation of the original matrix. Using such a factorization one can solve systems of equations in a connection Laplacian in linear time. Such a factorization was unknown even for ordinary graph Laplacians.

preprint2015arXiv

Sparsified Cholesky Solvers for SDD linear systems

We show that Laplacian and symmetric diagonally dominant (SDD) matrices can be well approximated by linear-sized sparse Cholesky factorizations. We show that these matrices have constant-factor approximations of the form $L L^{T}$, where $L$ is a lower-triangular matrix with a number of nonzero entries linear in its dimension. Furthermore, linear systems in $L$ and $L^{T}$ can be solved in $O(n)$ work and $O(\log{n}\log^2\log{n})$ depth, where $n$ is the dimension of the matrix. We present nearly linear time algorithms that construct solvers that are almost this efficient. In doing so, we give the first nearly-linear work routine for constructing spectral vertex sparsifiers, that is, spectral approximations of Schur complements of Laplacian matrices.

preprint2014arXiv

Uniform Sampling for Matrix Approximation

Random sampling has become a critical tool in solving massive matrix problems. For linear regression, a small, manageable set of data rows can be randomly selected to approximate a tall, skinny data matrix, improving processing time significantly. For theoretical performance guarantees, each row must be sampled with probability proportional to its statistical leverage score. Unfortunately, leverage scores are difficult to compute. A simple alternative is to sample rows uniformly at random. While this often works, uniform sampling will eliminate critical row information for many natural instances. We take a fresh look at uniform sampling by examining what information it does preserve. Specifically, we show that uniform sampling yields a matrix that, in some sense, well approximates a large fraction of the original. While this weak form of approximation is not enough for solving linear regression directly, it is enough to compute a better approximation. This observation leads to simple iterative row sampling algorithms for matrix approximation that run in input-sparsity time and preserve row structure and sparsity at all intermediate steps. In addition to an improved understanding of uniform sampling, our main proof introduces a structural result of independent interest: we show that every matrix can be made to have low coherence by reweighting a small subset of its rows.
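As a small illustration of why uniform sampling can lose critical rows, the sketch below computes statistical leverage scores $τ_i = a_i^{T}(A^{T}A)^{-1}a_i$ for a two-column matrix. The function name `leverage_scores_2col` and the example matrix are hypothetical, and the closed-form 2-by-2 inverse is a simplification chosen to keep the example dependency-free; it is not the iterative sampling scheme of the paper.

```python
def leverage_scores_2col(A):
    """Leverage scores a_i^T (A^T A)^{-1} a_i for an n x 2 matrix of full column rank."""
    g00 = sum(r[0] * r[0] for r in A)
    g01 = sum(r[0] * r[1] for r in A)
    g11 = sum(r[1] * r[1] for r in A)
    det = g00 * g11 - g01 * g01
    if det == 0:
        raise ValueError("A must have full column rank")
    # Entries of (A^T A)^{-1}, using the closed form for a 2x2 inverse.
    i00, i01, i11 = g11 / det, -g01 / det, g00 / det
    return [r[0] * (i00 * r[0] + i01 * r[1]) + r[1] * (i01 * r[0] + i11 * r[1])
            for r in A]

# Nine identical rows plus one row that alone spans the second coordinate.
A = [[1.0, 0.0]] * 9 + [[0.0, 1.0]]
scores = leverage_scores_2col(A)
total = sum(scores)  # leverage scores always sum to rank(A), here 2
```

In this toy matrix the last row has leverage score 1 while each of the other nine has score 1/9, yet uniform sampling selects the last row only one time in ten per draw, which is the kind of instance where critical row information is lost.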

preprint2013arXiv

An Almost-Linear-Time Algorithm for Approximate Max Flow in Undirected Graphs, and its Multicommodity Generalizations

In this paper, we introduce a new framework for approximately solving flow problems in capacitated, undirected graphs and apply it to provide asymptotically faster algorithms for the maximum $s$-$t$ flow and maximum concurrent multicommodity flow problems. For graphs with $n$ vertices and $m$ edges, it allows us to find an $ε$-approximate maximum $s$-$t$ flow in time $O(m^{1+o(1)}ε^{-2})$, improving on the previous best bound of $\tilde{O}(mn^{1/3} poly(1/ε))$. Applying the same framework in the multicommodity setting solves a maximum concurrent multicommodity flow problem with $k$ commodities in $O(m^{1+o(1)}ε^{-2}k^2)$ time, improving on the existing bound of $\tilde{O}(m^{4/3} poly(k,ε^{-1}))$. Our algorithms utilize several new technical tools that we believe may be of independent interest:
- We give a non-Euclidean generalization of gradient descent and provide bounds on its performance. Using this, we show how to reduce approximate maximum flow and maximum concurrent flow to the efficient construction of oblivious routings with a low competitive ratio.
- We define and provide an efficient construction of a new type of flow sparsifier. In addition to providing the standard properties of a cut sparsifier our construction allows for flows in the sparse graph to be routed (very efficiently) in the original graph with low congestion.
- We give the first almost-linear-time construction of an $O(m^{o(1)})$-competitive oblivious routing scheme. No previous such algorithm ran in time better than $\tilde{Ω}(mn)$.

We also note that independently Jonah Sherman produced an almost linear time algorithm for maximum flow and we thank him for coordinating submissions.

preprint2013arXiv

Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems

In this paper we show how to accelerate randomized coordinate descent methods and achieve faster convergence rates without paying per-iteration costs in asymptotic running time. In particular, we show how to generalize and efficiently implement a method proposed by Nesterov, giving faster asymptotic running times for various algorithms that use standard coordinate descent as a black box. In addition to providing a proof of convergence for this new general method, we show that it is numerically stable, efficiently implementable, and in certain regimes, asymptotically optimal. To highlight the computational power of this algorithm, we show how it can be used to create faster linear system solvers in several regimes:
- We show how this method achieves a faster asymptotic runtime than conjugate gradient for solving a broad class of symmetric positive definite systems of equations.
- We improve the best known asymptotic convergence guarantees for Kaczmarz methods, a popular technique for image reconstruction and solving overdetermined systems of equations, by accelerating a randomized algorithm of Strohmer and Vershynin.
- We achieve the best known running time for solving Symmetric Diagonally Dominant (SDD) systems of equations in the unit-cost RAM model, obtaining an O(m log^{3/2} n (log log n)^{1/2} log (log n / eps)) asymptotic running time by accelerating a recent solver by Kelner et al.

Beyond the independent interest of these solvers, we believe they highlight the versatility of the approach of this paper and we hope that they will open the door for further algorithmic improvements in the future.
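For readers unfamiliar with the Kaczmarz baseline mentioned above, the following is a minimal sketch of the unaccelerated randomized Kaczmarz iteration in the style of Strohmer and Vershynin: pick a row with probability proportional to its squared norm and project the iterate onto that row's hyperplane. It is not the accelerated method of the paper; the function name, iteration count, seed and test system are all assumptions chosen for illustration.

```python
import random

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """Approximately solve a consistent system A x = b by random row projections."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    row_norms_sq = [sum(v * v for v in row) for row in A]
    x = [0.0] * d
    for _ in range(iters):
        # Sample row i with probability proportional to ||a_i||^2.
        i = rng.choices(range(n), weights=row_norms_sq)[0]
        row = A[i]
        residual = b[i] - sum(row[j] * x[j] for j in range(d))
        step = residual / row_norms_sq[i]
        # Project x onto the hyperplane {z : a_i . z = b_i}.
        x = [x[j] + step * row[j] for j in range(d)]
    return x

# Hypothetical overdetermined but consistent system with known solution.
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * xt for a, xt in zip(row, x_true)) for row in A]

x = randomized_kaczmarz(A, b)
error = max(abs(xi - ti) for xi, ti in zip(x, x_true))
```

Each step touches a single row, so the per-iteration cost is proportional to the number of nonzeros in that row; this is the coordinate-descent-like structure that acceleration schemes aim to preserve.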

preprint2013arXiv

Improved Cheeger's Inequality: Analysis of Spectral Partitioning Algorithms through Higher Order Spectral Gap

Let ϕ(G) be the minimum conductance of an undirected graph G, and let 0=λ_1 <= λ_2 <=... <= λ_n <= 2 be the eigenvalues of the normalized Laplacian matrix of G. We prove that for any graph G and any k >= 2, ϕ(G) = O(k) λ_2 / \sqrt{λ_k}, and this performance guarantee is achieved by the spectral partitioning algorithm. This improves Cheeger's inequality, and the bound is optimal up to a constant factor for any k. Our result shows that the spectral partitioning algorithm is a constant factor approximation algorithm for finding a sparse cut if λ_k is a constant for some constant k. This provides some theoretical justification for its empirical performance in image segmentation and clustering problems. We extend the analysis to other graph partitioning problems, including multi-way partition, balanced separator, and maximum cut.
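The spectral partitioning algorithm referred to above is usually described as a sweep cut over the second eigenvector of the normalized Laplacian. The sketch below is one minimal way to write it, assuming an unweighted graph given as a dictionary of neighbour sets and using plain power iteration (with the top eigenvector deflated) in place of a proper eigensolver; the function name `sweep_cut`, the iteration count and the two-triangle test graph are illustrative assumptions.

```python
import math

def sweep_cut(adj, iters=300):
    """adj: dict vertex -> set of neighbours (undirected, no isolated vertices)."""
    verts = sorted(adj)
    deg = {v: len(adj[v]) for v in verts}
    vol_total = sum(deg.values())
    # The top eigenvector of D^{-1/2} A D^{-1/2} is proportional to sqrt(deg).
    top = {v: math.sqrt(deg[v] / vol_total) for v in verts}
    x = {v: float(i) for i, v in enumerate(verts)}  # arbitrary deterministic start
    for _ in range(iters):
        # Deflate the top eigenvector, then apply (I + D^{-1/2} A D^{-1/2}) / 2.
        dot = sum(x[v] * top[v] for v in verts)
        x = {v: x[v] - dot * top[v] for v in verts}
        y = {v: 0.5 * (x[v] + sum(x[u] / math.sqrt(deg[u] * deg[v]) for u in adj[v]))
             for v in verts}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: y[v] / norm for v in verts}
    # Sort by the embedding D^{-1/2} x and scan prefixes for the best conductance.
    order = sorted(verts, key=lambda v: x[v] / math.sqrt(deg[v]))
    best_phi, best_set = float("inf"), None
    inside, cut, vol = set(), 0, 0
    for v in order[:-1]:
        # Edges from v to vertices already inside stop being cut; the rest start.
        cut += sum(-1 if u in inside else 1 for u in adj[v])
        inside.add(v)
        vol += deg[v]
        phi = cut / min(vol, vol_total - vol)
        if phi < best_phi:
            best_phi, best_set = phi, set(inside)
    return best_phi, best_set

# Two triangles joined by the single edge (2, 3).
barbell = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
phi, side = sweep_cut(barbell)
```

On this graph the sweep separates the two triangles, cutting one edge against a volume of 7 on each side.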

preprint2013arXiv

Probabilistic Spectral Sparsification In Sublinear Time

In this paper, we introduce a variant of spectral sparsification, called probabilistic $(\varepsilon,δ)$-spectral sparsification. Roughly speaking, it preserves the cut value of any cut $(S,S^{c})$ with a $1\pm\varepsilon$ multiplicative error and a $δ\left|S\right|$ additive error. We show how to produce a probabilistic $(\varepsilon,δ)$-spectral sparsifier with $O(n\log n/\varepsilon^{2})$ edges in $\tilde{O}(n/\varepsilon^{2}δ)$ time for unweighted undirected graphs. This gives the fastest known sub-linear time algorithms for different cut problems on unweighted undirected graphs, such as:
- An $\tilde{O}(n/OPT+n^{3/2+t})$ time $O(\sqrt{\log n/t})$-approximation algorithm for the sparsest cut problem and the balanced separator problem.
- An $n^{1+o(1)}/\varepsilon^{4}$ time approximate minimum s-t cut algorithm with an $\varepsilon n$ additive error.

36 published item(s)

A gradient sampling method with complexity guarantees for Lipschitz functions in high and low dimensions

Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity

Differentially Private Fine-tuning of Language Models

Nested Dissection Meets IPMs: Planar Min-Cost Flow in Nearly-Linear Time

Private Convex Optimization via Exponential Mechanism

Short-step Methods Are Not Strongly Polynomial-Time

Universal Barrier is $n$-Self-Concordant

Complexity of Highly Parallel Non-Smooth Convex Optimization

Fast and Memory Efficient Differentially Private-SGD via JL Projections

A Faster Interior Point Method for Semidefinite Programming

A near-optimal algorithm for approximating the John Ellipsoid

Acceleration with a Ball Optimization Oracle

An Improved Cutting Plane Method for Convex Optimization, Convex-Concave Games and its Applications

Composite Logconcave Sampling with a Restricted Gaussian Oracle

Logsmooth Gradient Concentration and Tighter Runtimes for Metropolized Hamiltonian Monte Carlo

Solving Linear Programs with Sqrt(rank) Linear System Solves

Strong Self-Concordance and Sampling

Geometric Median in Nearly Linear Time

Kernel-based methods for bandit convex optimization

Subquadratic Submodular Function Minimization

Using Optimization to Obtain a Width-Independent, Parallel, Simpler, and Faster Positive SDP Solver

A Faster Cutting Plane Method and its Implications for Combinatorial and Convex Optimization

A geometric alternative to Nesterov's accelerated gradient descent

Constructing Linear-Sized Spectral Sparsification in Almost-Linear Time

Efficient Inverse Maintenance and Faster Algorithms for Linear Programming

Improved Cheeger's Inequality and Analysis of Local Graph Partitioning using Vertex Expansion and Expansion Profile

Path Finding I: Solving Linear Programs with Õ(sqrt(rank)) Linear System Solves

Path Finding II: An Õ(m sqrt(n)) Algorithm for the Minimum Cost Flow Problem

Single Pass Spectral Sparsification in Dynamic Streams

Sparsified Cholesky and Multigrid Solvers for Connection Laplacians

Sparsified Cholesky Solvers for SDD linear systems

Uniform Sampling for Matrix Approximation

An Almost-Linear-Time Algorithm for Approximate Max Flow in Undirected Graphs, and its Multicommodity Generalizations

Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems

Improved Cheeger's Inequality: Analysis of Spectral Partitioning Algorithms through Higher Order Spectral Gap

Probabilistic Spectral Sparsification In Sublinear Time