Source author record

Javad Lavaei

Javad Lavaei appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning Systems and Control eess.SY Distributed, Parallel, and Cluster Computing Information Theory math.IT

Catalog footprint

What is connected

13works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks

The massive computational costs of scaling modern deep learning architectures have driven the widespread use of parameter-efficient low-rank structures, such as LoRA and low-rank factorization. However, theoretical guarantees for their expressive power are less explored, often relying on restrictive priors like a pretrained base matrix, ReLU activations or non-verifiable singularity conditions. We first investigate the limits of neural networks constrained strictly to low-rank manifolds without pretrained dense priors. We demonstrate a theoretical paradox: while purely rank-1 layers can exactly interpolate arbitrary scalar datasets, they collapse for function approximations. To overcome this bottleneck without surrendering parameter efficiency, we introduce a unified \textit{Structural Correspondence} framework. We prove that augmenting low-rank layers with only a minimal sparse diagonal component, say a Diagonal plus Low-Rank (DLoR) structure, is sufficient to reach Universal Approximation. We show that any full-rank transformation can be exactly reconstructed using these DLoR components by trading off network width (additive decomposition) or depth (multiplicative decomposition). By tracking asymptotic Taylor remainders, we prove that DLoR neural networks fully restore the Universal Approximation Theorem for general activation functions. Finally, we establish that multiplicative depth provides superior parameter-to-expressivity scaling compared to additive width. Our results show that dense matrices and specific activation functions are not topological prerequisites for universal expressivity.

preprint2022arXiv

Gradient-based Algorithms for Convex Discrete Optimization via Simulation

We propose new sequential simulation-optimization algorithms for general convex optimization via simulation problems with high-dimensional discrete decision space. The performance of each choice of discrete decision variables is evaluated via stochastic simulation replications. If an upper bound on the overall level of uncertainties is known, our proposed simulation-optimization algorithms utilize the discrete convex structure and are guaranteed with high probability to find a solution that is close to the best within any given user-specified precision level. The proposed algorithms work for any general convex problem and the efficiency is demonstrated by proven upper bounds on simulation costs. The upper bounds demonstrate a polynomial dependence on the dimension and scale of the decision space. For some discrete optimization via simulation problems, a gradient estimator may be available at low costs along with a single simulation replication. By integrating gradient estimators, which are possibly biased, we propose simulation-optimization algorithms to achieve optimality guarantees with a reduced dependence on the dimension under moderate assumptions on the bias.

preprint2022arXiv

Local and Global Linear Convergence of General Low-rank Matrix Recovery Problems

We study the convergence rate of gradient-based local search methods for solving low-rank matrix recovery problems with general objectives in both symmetric and asymmetric cases, under the assumption of the restricted isometry property. First, we develop a new technique to verify the Polyak-Lojasiewicz inequality in a neighborhood of the global minimizers, which leads to a local linear convergence region for the gradient descent method. Second, based on the local convergence result and a sharp strict saddle property proven in this paper, we present two new conditions that guarantee the global linear convergence of the perturbed gradient descent method. The developed local and global convergence results provide much stronger theoretical guarantees than the existing results. As a by-product, this work significantly improves the existing bounds on the RIP constant required to guarantee the non-existence of spurious solutions.

preprint2022arXiv

On the Global Optimum Convergence of Momentum-based Policy Gradient

Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to their relative stability and incremental nature. In recent years, the empirical success of PG methods has led to the development of a theoretical foundation for these methods. In this work, we generalize this line of research by studying the global convergence of stochastic PG methods with momentum terms, which have been demonstrated to be efficient recipes for improving PG methods. We study both the soft-max and the Fisher-non-degenerate policy parametrizations, and show that adding a momentum improves the global optimality sample complexity of vanilla PG methods by $\tilde{\mathcal{O}}(ε^{-1.5})$ and $\tilde{\mathcal{O}}(ε^{-1})$, respectively, where $ε>0$ is the target tolerance. Our work is the first one that obtains global convergence results for the momentum-based PG methods. For the generic Fisher-non-degenerate policy parametrizations, our result is the first single-loop and finite-batch PG algorithm achieving $\tilde{O}(ε^{-3})$ global optimality sample complexity. Finally, as a by-product, our methods also provide general framework for analyzing the global convergence rates of stochastic PG methods, which can be easily applied and extended to different PG estimators.

preprint2022arXiv

Semidefinite Programming versus Burer-Monteiro Factorization for Matrix Sensing

Many fundamental low-rank optimization problems, such as matrix completion, phase synchronization/retrieval, power system state estimation, and robust PCA, can be formulated as the matrix sensing problem. Two main approaches for solving matrix sensing are based on semidefinite programming (SDP) and Burer-Monteiro (B-M) factorization. The SDP method suffers from high computational and space complexities, whereas the B-M method may return a spurious solution due to the non-convexity of the problem. The existing theoretical guarantees for the success of these methods have led to similar conservative conditions, which may wrongly imply that these methods have comparable performances. In this paper, we shed light on some major differences between these two methods. First, we present a class of structured matrix completion problems for which the B-M methods fail with an overwhelming probability, while the SDP method works correctly. Second, we identify a class of highly sparse matrix completion problems for which the B-M method works and the SDP method fails. Third, we prove that although the B-M method exhibits the same performance independent of the rank of the unknown solution, the success of the SDP method is correlated to the rank of the solution and improves as the rank increases. Unlike the existing literature that has mainly focused on those instances of matrix sensing for which both SDP and B-M work, this paper offers the first result on the unique merit of each method over the alternative approach.

preprint2022arXiv

Stochastic Localization Methods for Convex Discrete Optimization via Simulation

We develop and analyze a set of new sequential simulation-optimization algorithms for large-scale multi-dimensional discrete optimization via simulation problems with a convexity structure. The "large-scale" notion refers to that the decision variable has a large number of values to choose from on each dimension. The proposed algorithms are targeted to identify a solution that is close to the optimal solution given any precision level with any given probability. To achieve this target, utilizing the convexity structure, our algorithm design does not need to scan all the choices of the decision variable, but instead sequentially draws a subset of choices of the decision variable and uses them to "localize" potentially near-optimal solutions to an adaptively shrinking region. To show the power of the localization operation, we first consider one-dimensional large-scale problems. We propose the shrinking uniform sampling algorithm, which is proved to achieve the target with an optimal expected simulation cost under an asymptotic criterion. For multi-dimensional problems, we combine the idea of localization with subgradient information and propose a framework to design stochastic cutting-plane methods and the dimension reduction algorithm, whose expected simulation cost have a low dependence on the scale and the dimension of the problems. The proposed algorithms do not require prior information about the Lipschitz constant of the objective function and the simulation costs are upper bounded by a value that is independent of the Lipschitz constant. Finally, we propose an adaptive algorithm to deal with the unknown noise variance case under the assumption that the randomness of the system is Gaussian. We implement the proposed algorithms on both synthetic and queueing simulation optimization problems, and demonstrate better performances compared to benchmark methods.

preprint2021arXiv

Escaping spurious local minimum trajectories in online time-varying nonconvex optimization

A major limitation of online algorithms that track the optimizers of time-varying nonconvex optimization problems is that they focus on a specific local minimum trajectory, which may lead to poor spurious local solutions. In this paper, we show that the natural temporal variation may help simple online tracking methods find and track time-varying global minima. To this end, we investigate the properties of a time-varying projected gradient flow system with inertia, which can be regarded as the continuous-time limit of (1) the optimality conditions for a discretized sequential optimization problem with a proximal regularization and (2) the online tracking scheme. We introduce the notion of the dominant trajectory and show that the inherent temporal variation could re-shape the landscape of the Lagrange functional and help a proximal algorithm escape the spurious local minimum trajectories if the global minimum trajectory is dominant. For a problem with twice continuously differentiable objective function and constraints, sufficient conditions are derived to guarantee that no matter how a local search method is initialized, it will track a time-varying global solution after some time. The results are illustrated on a benchmark example with many local minima.

preprint2020arXiv

Penalized Semidefinite Programming for Quadratically-Constrained Quadratic Optimization

In this paper, we give a new penalized semidefinite programming approach for non-convex quadratically-constrained quadratic programs (QCQPs). We incorporate penalty terms into the objective of convex relaxations in order to retrieve feasible and near-optimal solutions for non-convex QCQPs. We introduce a generalized linear independence constraint qualification (GLICQ) criterion and prove that any GLICQ regular point that is sufficiently close to the feasible set can be used to construct an appropriate penalty term and recover a feasible solution. Inspired by these results, we develop a heuristic sequential procedure that preserves feasibility and aims to improve the objective value at each iteration. Numerical experiments on large-scale system identification problems as well as benchmark instances from the library of quadratic programming (QPLIB) demonstrate the ability of the proposed penalized semidefinite programs in finding near-optimal solutions for non-convex QCQP.

preprint2020arXiv

Sparse Semidefinite Programs with Guaranteed Near-Linear Time Complexity via Dualized Clique Tree Conversion

Clique tree conversion solves large-scale semidefinite programs by splitting an $n\times n$ matrix variable into up to $n$ smaller matrix variables, each representing a principal submatrix of up to $ω\timesω$. Its fundamental weakness is the need to introduce overlap constraints that enforce agreement between different matrix variables, because these can result in dense coupling. In this paper, we show that by dualizing the clique tree conversion, the coupling due to the overlap constraints is guaranteed to be sparse over dense blocks, with a block sparsity pattern that coincides with the adjacency matrix of a tree. We consider two classes of semidefinite programs with favorable sparsity patterns that encompass the MAXCUT and MAX $k$-CUT relaxations, the Lovasz Theta problem, and the AC optimal power flow relaxation. Assuming that $ω\ll n$, we prove that the per-iteration cost of an interior-point method is linear $O(n)$ time and memory, so an $ε$-accurate and $ε$-feasible iterate is obtained after $O(\sqrt{n}\log(1/ε))$ iterations in near-linear $O(n^{1.5}\log(1/ε))$ time. We confirm our theoretical insights with numerical results on semidefinite programs as large as $n=13659$. (Supporting code at https://github.com/ryz-codes/dual_ctc )

preprint2019arXiv

Large-Scale Traffic Signal Offset Optimization

The offset optimization problem seeks to coordinate and synchronize the timing of traffic signals throughout a network in order to enhance traffic flow and reduce stops and delays. Recently, offset optimization was formulated into a continuous optimization problem without integer variables by modeling traffic flow as sinusoidal. In this paper, we present a novel algorithm to solve this new formulation to near-global optimality on a large-scale. Specifically, we solve a convex relaxation of the nonconvex problem using a tree decomposition reduction, and use randomized rounding to recover a near-global solution. We prove that the algorithm always delivers solutions of expected value at least 0.785 times the globally optimal value. Moreover, assuming that the topology of the traffic network is "tree-like", we prove that the algorithm has near-linear time complexity with respect to the number of intersections. These theoretical guarantees are experimentally validated on the Berkeley, Manhattan, and Los Angeles traffic networks. In our numerical results, the empirical time complexity of the algorithm is linear, and the solutions have objectives within 0.99 times the globally optimal value.

preprint2019arXiv

Sharp Restricted Isometry Bounds for the Inexistence of Spurious Local Minima in Nonconvex Matrix Recovery

Nonconvex matrix recovery is known to contain no spurious local minima under a restricted isometry property (RIP) with a sufficiently small RIP constant $δ$. If $δ$ is too large, however, then counterexamples containing spurious local minima are known to exist. In this paper, we introduce a proof technique that is capable of establishing sharp thresholds on $δ$ to guarantee the inexistence of spurious local minima. Using the technique, we prove that in the case of a rank-1 ground truth, an RIP constant of $δ<1/2$ is both necessary and sufficient for exact recovery from any arbitrary initial point (such as a random point). We also prove a local recovery result: given an initial point $x_{0}$ satisfying $f(x_{0})\le(1-δ)^{2}f(0)$, any descent algorithm that converges to second-order optimality guarantees exact recovery.

preprint2013arXiv

Geometry of Power Flows and Optimization in Distribution Networks

We investigate the geometry of injection regions and its relationship to optimization of power flows in tree networks. The injection region is the set of all vectors of bus power injections that satisfy the network and operation constraints. The geometrical object of interest is the set of Pareto-optimal points of the injection region. If the voltage magnitudes are fixed, the injection region of a tree network can be written as a linear transformation of the product of two-bus injection regions, one for each line in the network. Using this decomposition, we show that under the practical condition that the angle difference across each line is not too large, the set of Pareto-optimal points of the injection region remains unchanged by taking the convex hull. Moreover, the resulting convexified optimal power flow problem can be efficiently solved via }{ semi-definite programming or second order cone relaxations. These results improve upon earlier works by removing the assumptions on active power lower bounds. It is also shown that our practical angle assumption guarantees two other properties: (i) the uniqueness of the solution of the power flow problem, and (ii) the non-negativity of the locational marginal prices. Partial results are presented for the case when the voltage magnitudes are not fixed but can lie within certain bounds.

preprint2012arXiv

Message Passing for Dynamic Network Energy Management

We consider a network of devices, such as generators, fixed loads, deferrable loads, and storage devices, each with its own dynamic constraints and objective, connected by lossy capacitated lines. The problem is to minimize the total network objective subject to the device and line constraints, over a given time horizon. This is a large optimization problem, with variables for consumption or generation in each time period for each device. In this paper we develop a decentralized method for solving this problem. The method is iterative: At each step, each device exchanges simple messages with its neighbors in the network and then solves its own optimization problem, minimizing its own objective function, augmented by a term determined by the messages it has received. We show that this message passing method converges to a solution when the device objective and constraints are convex. The method is completely decentralized, and needs no global coordination other than synchronizing iterations; the problems to be solved by each device can typically be solved extremely efficiently and in parallel. The method is fast enough that even a serial implementation can solve substantial problems in reasonable time frames. We report results for several numerical experiments, demonstrating the method's speed and scaling, including the solution of a problem instance with over 30 million variables in 52 minutes for a serial implementation; with decentralized computing, the solve time would be less than one second.

Javad Lavaei

What is connected

Connect this record

See the researcher in context

Building this map preview

13 published item(s)

Structural Correspondence and Universal Approximation in Diagonal plus Low-Rank Neural Networks

Gradient-based Algorithms for Convex Discrete Optimization via Simulation

Local and Global Linear Convergence of General Low-rank Matrix Recovery Problems

On the Global Optimum Convergence of Momentum-based Policy Gradient

Semidefinite Programming versus Burer-Monteiro Factorization for Matrix Sensing

Stochastic Localization Methods for Convex Discrete Optimization via Simulation

Escaping spurious local minimum trajectories in online time-varying nonconvex optimization

Penalized Semidefinite Programming for Quadratically-Constrained Quadratic Optimization

Sparse Semidefinite Programs with Guaranteed Near-Linear Time Complexity via Dualized Clique Tree Conversion

Large-Scale Traffic Signal Offset Optimization

Sharp Restricted Isometry Bounds for the Inexistence of Spurious Local Minima in Nonconvex Matrix Recovery

Geometry of Power Flows and Optimization in Distribution Networks

Message Passing for Dynamic Network Energy Management