Source author record

Stephen J. Wright

Stephen J. Wright appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

math.OC; Numerical Analysis; math.NA; Machine Learning; Artificial Intelligence; Computer Vision; Computational Engineering, Finance, and Science; stat.OT

Catalog footprint

What is connected

28 works

8 topics

4 close collaborators

preprint, 2026, arXiv

Data selection: at the interface of PDE-based inverse problem and randomized linear algebra

All inverse problems rely on data to recover unknown parameters, yet not all data are equally informative. This raises the central question of data selection. A distinctive challenge in PDE-based inverse problems is their inherently infinite-dimensional nature: both the parameter space and the design space are infinite, which greatly complicates the selection process. Somewhat unexpectedly, randomized numerical linear algebra (RNLA), originally developed in very different contexts, has provided powerful tools for addressing this challenge. These methods are inherently probabilistic, with guarantees typically stating that information is preserved with probability at least 1-p when using N randomly selected, weighted samples. Here, the notion of "information" can take different mathematical forms depending on the setting. In this review, we survey the problem of data selection in PDE-based inverse problems, emphasize its unique infinite-dimensional aspects, and highlight how RNLA strategies have been adapted and applied in this context.

preprint, 2022, arXiv

Inexact Newton-CG Algorithms With Complexity Guarantees

We consider variants of a recently developed Newton-CG algorithm for nonconvex problems \citep{royer2018newton} in which inexact estimates of the gradient and the Hessian information are used for various steps. Under certain conditions on the inexactness measures, we derive iteration complexity bounds for achieving $ε$-approximate second-order optimality that match best-known lower bounds. Our inexactness condition on the gradient is adaptive, allowing for crude accuracy in regions with large gradients. We describe two variants of our approach, one in which the step size along the computed search direction is chosen adaptively and another in which the step size is predefined. To obtain second-order optimality, our algorithms make use of a negative curvature direction on some steps. These directions can be obtained, with high probability, using a certain randomized algorithm. In this sense, all of our results hold with high probability over the run of the algorithm. We evaluate the performance of our proposed algorithms empirically on several machine learning models.

preprint, 2022, arXiv

Low-rank approximation for multiscale PDEs

Historically, analysis for multiscale PDEs is largely unified while numerical schemes tend to be equation-specific. In this paper, we propose a unified framework for computing multiscale problems through random sampling. This is achieved by incorporating randomized SVD solvers and manifold learning techniques to numerically reconstruct the low-rank features of multiscale PDEs. We use the multiscale radiative transfer equation and an elliptic equation with rough media to showcase the application of this framework.

preprint, 2022, arXiv

On the Complexity of a Practical Primal-Dual Coordinate Method

We prove complexity bounds for the primal-dual algorithm with random extrapolation and coordinate descent (PURE-CD), which has been shown to obtain good practical performance for solving convex-concave min-max problems with bilinear coupling. Our complexity bounds either match or improve the best-known results in the literature for both dense and sparse (strongly)-convex-(strongly)-concave problems.

preprint, 2022, arXiv

Randomized Algorithms for Scientific Computing (RASC)

Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of the workshop "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.

preprint, 2020, arXiv

A Line-Search Descent Algorithm for Strict Saddle Functions with Complexity Guarantees

We describe a line-search algorithm which achieves the best-known worst-case complexity results for problems with a certain "strict saddle" property that has been observed to hold in low-rank matrix optimization problems. Our algorithm is adaptive, in the sense that it makes use of backtracking line searches and does not require prior knowledge of the parameters that define the strict saddle property.

preprint, 2020, arXiv

A low-rank Schwarz method for radiative transport equation with heterogeneous scattering coefficient

Random sampling has been used to find low-rank structure and to build fast direct solvers for multiscale partial differential equations of various types. In this work, we design an accelerated Schwarz method for radiative transfer equations that makes use of approximate local solution maps constructed offline via a random sampling strategy. Numerical examples demonstrate the accuracy, robustness, and efficiency of the proposed approach.

preprint, 2020, arXiv

Analyzing Random Permutations for Cyclic Coordinate Descent

We consider coordinate descent methods on convex quadratic problems, in which exact line searches are performed at each iteration. (This algorithm is identical to Gauss-Seidel on the equivalent symmetric positive definite linear system.) We describe a class of convex quadratic problems for which the random-permutations version of cyclic coordinate descent (RPCD) outperforms the standard cyclic coordinate descent (CCD) approach, yielding convergence behavior similar to the fully-random variant (RCD). A convergence analysis is developed to explain the empirical observations.
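The three orderings named in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's experiment: the 3x3 matrix `A`, the vector `b`, the seed, and the epoch count are hypothetical choices, and this toy problem is not from the class on which the abstract reports RPCD outperforming CCD. It only shows that exact line search along a coordinate is the Gauss-Seidel update and how the three variants differ in the order of updates.

```python
import random

def coordinate_descent(A, b, order_fn, epochs):
    """Minimize 0.5 * x^T A x - b^T x with exact line search along coordinates."""
    n = len(b)
    x = [0.0] * n
    for _ in range(epochs):
        for i in order_fn(n):
            # Exact minimization over x[i] alone; this is the Gauss-Seidel
            # update for row i of the linear system A x = b.
            residual_i = b[i] - sum(A[i][j] * x[j] for j in range(n))
            x[i] += residual_i / A[i][i]
    return x

def residual_norm(A, b, x):
    """Largest absolute entry of b - A x."""
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(len(x))))
               for i in range(len(b)))

rng = random.Random(0)  # fixed seed, chosen only for reproducibility

orderings = {
    "CCD": lambda n: range(n),                              # same sweep every epoch
    "RPCD": lambda n: rng.sample(range(n), n),              # fresh permutation per epoch
    "RCD": lambda n: [rng.randrange(n) for _ in range(n)],  # sampled with replacement
}

# Hypothetical symmetric positive definite test problem (not from the paper).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

solutions = {name: coordinate_descent(A, b, fn, epochs=200)
             for name, fn in orderings.items()}
for name, x in solutions.items():
    print(name, residual_norm(A, b, x))
```

Here one "epoch" means n coordinate updates for every variant, so the three orderings do the same amount of work per epoch.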

preprint, 2020, arXiv

Complexity of Proximal augmented Lagrangian for nonconvex optimization with nonlinear equality constraints

We analyze worst-case complexity of a Proximal augmented Lagrangian (Proximal AL) framework for nonconvex optimization with nonlinear equality constraints. When an approximate first-order (second-order) optimal point is obtained in the subproblem, an $ε$ first-order (second-order) optimal point for the original problem can be guaranteed within $\mathcal{O}(1/ ε^{2 - η})$ outer iterations (where $η$ is a user-defined parameter with $η\in[0,2]$ for the first-order result and $η\in [1,2]$ for the second-order result) when the proximal term coefficient $β$ and penalty parameter $ρ$ satisfy $β= \mathcal{O}(ε^η)$ and $ρ= Ω(1/ε^η)$, respectively. We also investigate the total iteration complexity and operation complexity when a Newton-conjugate-gradient algorithm is used to solve the subproblems. Finally, we discuss an adaptive scheme for determining a value of the parameter $ρ$ that satisfies the requirements of the analysis.
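For orientation, substituting the two endpoints of the first-order range $η\in[0,2]$ into the bounds quoted above gives the following. This is only arithmetic on the abstract's own expressions, not an additional result. The second-order result is stated only for $η\in[1,2]$, so the $η=0$ case applies to first-order optimality alone.

```latex
\[
\begin{aligned}
η = 0 &: \quad \mathcal{O}(1/ε^{2}) \text{ outer iterations}, \quad β = \mathcal{O}(1), \quad ρ = Ω(1), \\
η = 2 &: \quad \mathcal{O}(1) \text{ outer iterations}, \quad β = \mathcal{O}(ε^{2}), \quad ρ = Ω(1/ε^{2}).
\end{aligned}
\]
```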

preprint, 2020, arXiv

Random Sampling and Efficient Algorithms for Multiscale PDEs

We describe a numerical framework that uses random sampling to efficiently capture low-rank local solution spaces of multiscale PDE problems arising in domain decomposition. In contrast to existing techniques, our method does not rely on detailed analytical understanding of specific multiscale PDEs, in particular, their asymptotic limits. We present the application of the framework on two examples --- a linear kinetic equation and an elliptic equation with rough media. On these two examples, this framework achieves the asymptotic preserving property for the kinetic equations and numerical homogenization for the elliptic equations.

preprint, 2016, arXiv

Online Algorithms for Factorization-Based Structure from Motion

We present a family of online algorithms for real-time factorization-based structure from motion, leveraging a relationship between incremental singular value decomposition and recently proposed methods for online matrix completion. Our methods are orders of magnitude faster than previous state of the art, can handle missing data and a variable number of feature points, and are robust to noise and sparse outliers. We demonstrate our methods on both real and synthetic sequences and show that they perform well in both online and batch settings. We also provide an implementation which is able to produce 3D models in real time using a laptop with a webcam.

preprint, 2016, arXiv

Sorting Network Relaxations for Vector Permutation Problems

The Birkhoff polytope (the convex hull of the set of permutation matrices) is frequently invoked in formulating relaxations of optimization problems over permutations. The Birkhoff polytope is represented using $Θ(n^2)$ variables and constraints, significantly more than the $n$ variables one could use to represent a permutation as a vector. Using a recent construction of Goemans (2010), we show that when optimizing over the convex hull of the permutation vectors (the permutahedron), we can reduce the number of variables and constraints to $Θ(n \log n)$ in theory and $Θ(n \log^2 n)$ in practice. We modify the recent convex formulation of the 2-SUM problem introduced by Fogel et al. (2013) to use this polytope, and demonstrate how we can attain results of similar quality in significantly less computational time for large $n$. To our knowledge, this is the first usage of Goemans' compact formulation of the permutahedron in a convex optimization problem. We also introduce a simpler regularization scheme for this convex formulation of the 2-SUM problem that yields good empirical results.

preprint, 2015, arXiv

An S$\ell_1$LP-Active Set Approach for Feasibility Restoration in Power Systems

We consider power networks in which it is not possible to satisfy all loads at the demand nodes, due to some attack or disturbance to the network. We formulate a model, based on AC power flow equations, to restore the network to feasibility by shedding load at demand nodes, but doing so in a way that minimizes a weighted measure of the total load shed, and affects as few demand nodes as possible. Besides suggesting an optimal response to a given attack, our approach can be used to quantify disruption, thereby enabling "stress testing" to be performed and vulnerabilities to be identified. Optimization techniques including nonsmooth penalty functions, sequential linear programming, and active-set heuristics are used to solve this model. We describe an algorithmic framework and present convergence results, including a quadratic convergence result for the case in which the solution is fully determined by its constraints, a situation that arises frequently in the power systems application.

preprint, 2015, arXiv

Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties

We describe an asynchronous parallel stochastic proximal coordinate descent algorithm for minimizing a composite objective function, which consists of a smooth convex function plus a separable convex function. In contrast to previous analyses, our model of asynchronous computation accounts for the fact that components of the unknown vector may be written by some cores simultaneously with being read by others. Despite the complications arising from this possibility, the method achieves a linear convergence rate on functions that satisfy an optimal strong convexity property and a sublinear rate ($1/k$) on general convex functions. Near-linear speedup on a multicore system can be expected if the number of processors is $O(n^{1/4})$. We describe results from implementation on ten cores of a multicore processor.

preprint, 2015, arXiv

Vulnerability Analysis of Power Systems

Potential vulnerabilities in a power grid can be exposed by identifying those transmission lines on which attacks (in the form of interference with their transmission capabilities) cause maximum disruption to the grid. In this study, we model the grid by (nonlinear) AC power flow equations, and assume that attacks take the form of increased impedance along transmission lines. We quantify disruption in several different ways, including (a) overall deviation of the voltages at the buses from 1.0 per unit (p.u.), and (b) the minimal amount of load that must be shed in order to restore the grid to stable operation. We describe optimization formulations of the problem of finding the most disruptive attack, which are either nonlinear programming problems or nonlinear bilevel optimization problems, and describe customized algorithms for solving these problems. Experimental results on the IEEE 118-Bus system and a Polish 2383-Bus system are presented.

preprint, 2014, arXiv

An Accelerated Randomized Kaczmarz Algorithm

The randomized Kaczmarz (RK) algorithm is a simple but powerful approach for solving consistent linear systems $Ax=b$. This paper proposes an accelerated randomized Kaczmarz (ARK) algorithm with better convergence than the standard RK algorithm on ill-conditioned problems. The per-iteration costs of RK and ARK are similar if $A$ is dense, but RK is much more able to exploit sparsity in $A$ than is ARK. To deal with the sparse case, an efficient implementation for ARK, called SARK, is proposed. A comparison of convergence rates and average per-iteration complexities among RK, ARK, and SARK is given, taking into account different levels of sparseness and conditioning. Comparisons with the leading deterministic algorithm --- conjugate gradient applied to the normal equations --- are also given. Finally, the analysis is validated via computational testing.
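As background for the baseline this abstract refers to, here is a minimal sketch of a standard (non-accelerated) randomized Kaczmarz iteration. The abstract does not spell out the row-sampling rule. Sampling rows with probability proportional to their squared norms is a common textbook choice and is an assumption here. The 4x2 system, the seed, and the iteration count are hypothetical. The sketch does not implement ARK or SARK.

```python
import random

def randomized_kaczmarz(A, b, iterations, seed=0):
    """Standard randomized Kaczmarz for a consistent system A x = b.

    Assumption: row i is sampled with probability proportional to its
    squared Euclidean norm (the abstract does not state the sampling rule).
    """
    rng = random.Random(seed)
    row_norms_sq = [sum(a * a for a in row) for row in A]
    x = [0.0] * len(A[0])
    for _ in range(iterations):
        i = rng.choices(range(len(A)), weights=row_norms_sq)[0]
        row = A[i]
        # Project x onto the hyperplane {z : row . z = b[i]}.
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / row_norms_sq[i]
        x = [xj + step * a for a, xj in zip(row, x)]
    return x

# Hypothetical consistent test system: b is built from a known solution.
A = [[1.0, 2.0],
     [3.0, 1.0],
     [0.0, 1.0],
     [2.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]

x_hat = randomized_kaczmarz(A, b, iterations=500)
print(x_hat)
```

Each iteration touches a single row of `A`, which is why the per-iteration cost drops when the rows are sparse.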

preprint, 2014, arXiv

An Asynchronous Parallel Randomized Kaczmarz Algorithm

We describe an asynchronous parallel variant of the randomized Kaczmarz (RK) algorithm for solving the linear system $Ax=b$. The analysis shows linear convergence and indicates that nearly linear speedup can be expected if the number of processors is bounded by a multiple of the number of rows in $A$.

preprint, 2014, arXiv

An Asynchronous Parallel Stochastic Coordinate Descent Algorithm

We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong convexity property and a sublinear rate ($1/K$) on general convex functions. Near-linear speedup on a multicore system can be expected if the number of processors is $O(n^{1/2})$ in unconstrained optimization and $O(n^{1/4})$ in the separable-constrained case, where $n$ is the number of variables. We describe results from implementation on 40-core processors.

preprint, 2014, arXiv

Local Convergence of an Algorithm for Subspace Identification from Partial Data

GROUSE (Grassmannian Rank-One Update Subspace Estimation) is an iterative algorithm for identifying a linear subspace of $\mathbb{R}^n$ from data consisting of partial observations of random vectors from that subspace. This paper examines local convergence properties of GROUSE, under assumptions on the randomness of the observed vectors, the randomness of the subset of elements observed at each iteration, and incoherence of the subspace with the coordinate directions. Convergence at an expected linear rate is demonstrated under certain assumptions. The case in which the full random vector is revealed at each iteration allows for much simpler analysis, and is also described. GROUSE is related to incremental SVD methods and to gradient projection algorithms in optimization.

preprint, 2014, arXiv

PMU Placement for Line Outage Identification via Multiclass Logistic Regression

We consider the problem of identifying a single line outage in a power grid by using data from phasor measurement units (PMUs). When a line outage occurs, the voltage phasor of each bus node changes in response to the change in network topology. Each individual line outage has a consistent "signature," and a multiclass logistic regression (MLR) classifier can be trained to distinguish between these signatures reliably. We consider first the ideal case in which PMUs are attached to every bus, but phasor data alone is used to detect outage signatures. We then describe techniques for placing PMUs selectively on a subset of buses, with the subset being chosen to allow discrimination between as many outage events as possible. We also discuss extensions of the MLR technique that incorporate explicit information about identification of outages by PMUs measuring line current flow in or out of a bus. Experimental results with synthetic 24-hour demand profile data generated for 14, 30, 57 and 118-bus systems are presented.

preprint, 2014, arXiv

Validating Sample Average Approximation Solutions with Negatively Dependent Batches

Sample-average approximations (SAA) are a practical means of finding approximate solutions of stochastic programming problems involving an extremely large (or infinite) number of scenarios. SAA can also be used to find estimates of a lower bound on the optimal objective value of the true problem which, when coupled with an upper bound, provides confidence intervals for the true optimal objective value and valuable information about the quality of the approximate solutions. Specifically, the lower bound can be estimated by solving multiple SAA problems (each obtained using a particular sampling method) and averaging the obtained objective values. State-of-the-art methods for lower-bound estimation generate batches of scenarios for the SAA problems independently. In this paper, we describe sampling methods that produce negatively dependent batches, thus reducing the variance of the sample-averaged lower bound estimator and increasing its usefulness in defining a confidence interval for the optimal objective value. We provide conditions under which the new sampling methods can reduce the variance of the lower bound estimator, and present computational results to verify that our scheme can reduce the variance significantly, by comparison with the traditional Latin hypercube approach.

preprint, 2013, arXiv

An Approximate, Efficient Solver for LP Rounding

Many problems in machine learning can be solved by rounding the solution of an appropriate linear program (LP). This paper shows that we can recover solutions of comparable quality by rounding an approximate LP solution instead of the exact one. These approximate LP solutions can be computed efficiently by applying a parallel stochastic-coordinate-descent method to a quadratic-penalty formulation of the LP. We derive worst-case runtime and solution quality guarantees of this scheme using novel perturbation and convergence analysis. Our experiments demonstrate that on such combinatorial problems as vertex cover, independent set and multiway-cut, our approximate rounding scheme is up to an order of magnitude faster than Cplex (a commercial LP solver) while producing solutions of similar quality.

preprint, 2013, arXiv

On GROUSE and Incremental SVD

GROUSE (Grassmannian Rank-One Update Subspace Estimation) is an incremental algorithm for identifying a subspace of $\mathbb{R}^n$ from a sequence of vectors in this subspace, where only a subset of components of each vector is revealed at each iteration. Recent analysis has shown that GROUSE converges locally at an expected linear rate, under certain assumptions. GROUSE has a similar flavor to the incremental singular value decomposition algorithm, which updates the SVD of a matrix following addition of a single column. In this paper, we modify the incremental SVD approach to handle missing data, and demonstrate that this modified approach is equivalent to GROUSE, for a certain choice of an algorithmic parameter.

preprint, 2013, arXiv

Robust Dequantized Compressive Sensing

We consider the reconstruction problem in compressed sensing in which the observations are recorded in a finite number of bits. They may thus contain quantization errors (from being rounded to the nearest representable value) and saturation errors (from being outside the range of representable values). Our formulation has an objective of weighted $\ell_2$-$\ell_1$ type, along with constraints that account explicitly for quantization and saturation errors, and is solved with an augmented Lagrangian method. We prove a consistency result for the recovered solution, stronger than those that have appeared to date in the literature, showing in particular that asymptotic consistency can be obtained without oversampling. We present extensive computational comparisons with formulations proposed previously, and variants thereof.

preprint, 2012, arXiv

Packing ellipsoids with overlap

The problem of packing ellipsoids of different sizes and shapes into an ellipsoidal container so as to minimize a measure of overlap between ellipsoids is considered. A bilevel optimization formulation is given, together with an algorithm for the general case and a simpler algorithm for the special case in which all ellipsoids are in fact spheres. Convergence results are proved and computational experience is described and illustrated. The motivating application - chromosome organization in the human cell nucleus - is discussed briefly, and some illustrative results are presented.

preprint, 2011, arXiv

Approximate Stochastic Subgradient Estimation Training for Support Vector Machines

Subgradient algorithms for training support vector machines have been quite successful for solving large-scale and online learning problems. However, they have been restricted to linear kernels and strongly convex formulations. This paper describes efficient subgradient approaches without such limitations. Our approaches make use of randomized low-dimensional approximations to nonlinear kernels, and minimization of a reduced primal formulation using an algorithm based on robust stochastic approximation, which does not require strong convexity. Experiments illustrate that our approaches produce solutions of prediction accuracy comparable to that of solutions acquired from existing SVM solvers, but often in much shorter time. We also suggest efficient prediction schemes that depend only on the dimension of kernel approximation, not on the number of support vectors.

preprint, 2011, arXiv

Convex Approaches to Model Wavelet Sparsity Patterns

Statistical dependencies among wavelet coefficients are commonly represented by graphical models such as hidden Markov trees (HMTs). However, in linear inverse problems such as deconvolution, tomography, and compressed sensing, the presence of a sensing or observation matrix produces a linear mixing of the simple Markovian dependency structure. This leads to reconstruction problems that are non-convex optimizations. Past work has dealt with this issue by resorting to greedy or suboptimal iterative reconstruction methods. In this paper, we propose new modeling approaches based on group-sparsity penalties that lead to convex optimizations that can be solved exactly and efficiently. We show that the methods we develop perform significantly better in deconvolution and compressed sensing applications, while being as computationally efficient as standard coefficient-wise approaches such as lasso.

preprint, 2011, arXiv

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking. We present an update scheme called HOGWILD! which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.
