Source author record

Ermin Wei

Ermin Wei appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

math.OC, Systems and Control, Artificial Intelligence, Computer Science and Game Theory, Computer Vision, eess.SY, Machine Learning, math.PR, math.ST, Social and Information Networks, Statistics Theory

Catalog footprint

What is connected

10 works

11 topics

4 close collaborators


preprint, 2026, arXiv

Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization

Optimizers that exploit the matrix structure of gradients are central to modern LLM pre-training, with two distinct frontiers: explicit Kronecker-factored preconditioning -- most recently KL-Shampoo, which estimates the preconditioner via KL divergence minimization -- and orthogonalization of the gradient momentum, exemplified by Muon and analyzed as steepest descent under the spectral norm. The two routes are typically developed in isolation. We make a structural observation about KL-Shampoo's Kronecker preconditioners: their eigenvalue spectra exhibit a spike-and-flat shape -- a few dominant eigenvalues followed by an approximately uniform tail -- across layers and training stages, holding exactly under a rank-$ρ$ signal-plus-noise gradient model. We exploit this structure by restricting one of KL-Shampoo's Kronecker factors to a parametric family aligned with the spike-and-flat shape: full spectral structure on a tracked $r$-dimensional subspace, single shared eigenvalue across the remaining $n-r$ directions. On these directions, we apply orthogonalization. An identity shows that this orthogonalization recovers the algebraic form of full KL-Shampoo's preconditioner. On four pre-training scales (GPT-2 124M / 350M, LLaMA 134M / 450M), Pro-KLShampoo consistently outperforms KL-Shampoo at every subspace rank we test in validation loss, peak per-GPU memory, and wallclock time to reach each loss level.
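To make "orthogonalization" concrete, here is a minimal sketch of the classical cubic Newton-Schulz iteration, which drives every singular value of a matrix toward 1 and so approximates its orthogonal polar factor. This is a generic illustration under stated assumptions, not the scheme used in the paper: the function names, the step count, and the 2x3 example matrix are all hypothetical, and the input is assumed to have full row rank.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def newton_schulz_orthogonalize(G, steps=20):
    """Approximate the orthogonal polar factor of G (assumed full row rank).

    Uses the cubic iteration X <- 1.5 X - 0.5 X X^T X, which maps each
    singular value s to 1.5 s - 0.5 s^3 and converges to 1 for s in (0, sqrt(3)).
    """
    # Scale by the Frobenius norm so every singular value lies in (0, 1].
    fro = sum(x * x for row in G for x in row) ** 0.5
    X = [[x / fro for x in row] for row in G]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X


# Hypothetical 2x3 "gradient" used only for illustration.
G = [[3.0, 0.0, 1.0], [1.0, 2.0, 0.0]]
Q = newton_schulz_orthogonalize(G)
gram = matmul(Q, transpose(Q))  # should be close to the 2x2 identity
print(gram)
```

After the iteration, the rows of `Q` are approximately orthonormal, which is the sense in which the singular-value information of `G` has been flattened.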

preprint, 2024, arXiv

Exact Community Recovery in the Geometric SBM

We study the problem of exact community recovery in the Geometric Stochastic Block Model (GSBM), where each vertex has an unknown community label as well as a known position, generated according to a Poisson point process in $\mathbb{R}^d$. Edges are formed independently conditioned on the community labels and positions, where vertices may only be connected by an edge if they are within a prescribed distance of each other. The GSBM thus favors the formation of dense local subgraphs, which commonly occur in real-world networks, a property that makes the GSBM qualitatively very different from the standard Stochastic Block Model (SBM). We propose a linear-time algorithm for exact community recovery, which succeeds down to the information-theoretic threshold, confirming a conjecture of Abbe, Baccelli, and Sankararaman. The algorithm involves two phases. The first phase exploits the density of local subgraphs to propagate estimated community labels among sufficiently occupied subregions, and produces an almost-exact vertex labeling. The second phase then refines the initial labels using a Poisson testing procedure. Thus, the GSBM enjoys local to global amplification just as the SBM, with the advantage of admitting an information-theoretically optimal, linear-time algorithm.

preprint, 2022, arXiv

Recent Developments in Security-Constrained AC Optimal Power Flow: Overview of Challenge 1 in the ARPA-E Grid Optimization Competition

The optimal power flow problem is central to many tasks in the design and operation of electric power grids. This problem seeks the minimum cost operating point for an electric power grid while satisfying both engineering requirements and physical laws describing how power flows through the electric network. By additionally considering the possibility of component failures and using an accurate AC power flow model of the electric network, the security-constrained AC optimal power flow (SC-AC-OPF) problem is of paramount practical relevance. To assess recent progress in solution algorithms for SC-AC-OPF problems and spur new innovations, the U.S. Department of Energy's Advanced Research Projects Agency-Energy (ARPA-E) organized Challenge 1 of the Grid Optimization (GO) competition. This paper describes the SC-AC-OPF problem formulation used in the competition, overviews historical developments and the state of the art in SC-AC-OPF algorithms, discusses the competition, and summarizes the algorithms used by the top three teams in Challenge 1 of the GO Competition (Teams gollnlp, GO-SNIP, and GMI-GO).

preprint, 2021, arXiv

S-NEAR-DGD: A Flexible Distributed Stochastic Gradient Method for Inexact Communication

We present and analyze a stochastic distributed method (S-NEAR-DGD) that can tolerate inexact computation and inaccurate information exchange to alleviate the problems of costly gradient evaluations and bandwidth-limited communication in large-scale systems. Our method is based on a class of flexible, distributed first order algorithms that allow for the trade-off of computation and communication to best accommodate the application setting. We assume that all the information exchange between nodes is subject to random distortion and that only stochastic approximations of the true gradients are available. Our theoretical results prove that the proposed algorithm converges linearly in expectation to a neighborhood of the optimal solution for strongly convex objective functions with Lipschitz gradients. We characterize the dependence of this neighborhood on algorithm and network parameters, the quality of the communication channel and the precision of the stochastic gradient approximations used. Finally, we provide numerical results to evaluate the empirical performance of our method.
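The phrase "converges to a neighborhood of the optimal solution" can be illustrated with a toy simulation. The sketch below is plain distributed gradient descent with noisy neighbor messages, not S-NEAR-DGD itself; the three scalar quadratic objectives, the mixing weights, the step size, and the noise level are all hypothetical choices made for illustration.

```python
import random


def noisy_dgd(targets, weights, step, noise_std, iters, seed=0):
    """Toy distributed gradient descent on f_i(x) = 0.5 * (x - targets[i])**2.

    Each node mixes its own value with distorted copies of its neighbors'
    values (rows of the doubly stochastic matrix `weights`), then takes a
    local gradient step.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    for _ in range(iters):
        new_x = []
        for i in range(n):
            mixed = 0.0
            for j in range(n):
                # A node reads its own value exactly; neighbor values arrive distorted.
                received = x[j] if i == j else x[j] + rng.gauss(0.0, noise_std)
                mixed += weights[i][j] * received
            grad = x[i] - targets[i]
            new_x.append(mixed - step * grad)
        x = new_x
    return x


targets = [1.0, 2.0, 6.0]
weights = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
optimum = sum(targets) / len(targets)  # minimizer of the summed objective

exact = noisy_dgd(targets, weights, step=0.1, noise_std=0.0, iters=500)
noisy = noisy_dgd(targets, weights, step=0.1, noise_std=0.05, iters=500)
avg_exact = sum(exact) / len(exact)
avg_noisy = sum(noisy) / len(noisy)
print(optimum, avg_exact, avg_noisy)
```

With exact communication the network average settles at the optimum, while with distorted messages it keeps fluctuating in a band around it whose width grows with the noise level.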

preprint, 2020, arXiv

A Two-Stage Decomposition Approach for AC Optimal Power Flow

The alternating current optimal power flow (AC-OPF) problem is critical to power system operations and planning, but it is generally hard to solve due to its nonconvex and large-scale nature. This paper proposes a scalable decomposition approach in which the power network is decomposed into a master network and a number of subnetworks, where each network has its own AC-OPF subproblem. This formulates a two-stage optimization problem and requires only a small amount of communication between the master and subnetworks. The key contribution is a smoothing technique that renders the response of a subnetwork differentiable with respect to the input from the master problem, utilizing properties of the barrier problem formulation that naturally arises when subproblems are solved by a primal-dual interior-point algorithm. Consequently, existing efficient nonlinear programming solvers can be used for both the master problem and the subproblems. The advantage of this framework is that speedup can be obtained by processing the subnetworks in parallel, and it has convergence guarantees under reasonable assumptions. The formulation is readily extended to instances with stochastic subnetwork loads. Numerical results show favorable performance and illustrate the scalability of the algorithm which is able to solve instances with more than 11 million buses.

preprint, 2020, arXiv

Distributed Multi-agent Video Fast-forwarding

In many intelligent systems, a network of agents collaboratively perceives the environment for better and more efficient situation awareness. As these agents often have limited resources, it could be greatly beneficial to identify the content overlapping among camera views from different agents and leverage it for reducing the processing, transmission and storage of redundant/unimportant video frames. This paper presents a consensus-based distributed multi-agent video fast-forwarding framework, named DMVF, that fast-forwards multi-view video streams collaboratively and adaptively. In our framework, each camera view is addressed by a reinforcement learning based fast-forwarding agent, which periodically chooses from multiple strategies to selectively process video frames and transmits the selected frames at adjustable paces. During every adaptation period, each agent communicates with a number of neighboring agents, evaluates the importance of the selected frames from itself and those from its neighbors, refines such evaluation together with other agents via a system-wide consensus algorithm, and uses such evaluation to decide their strategy for the next period. Compared with approaches in the literature on a real-world surveillance video dataset VideoWeb, our method significantly improves the coverage of important frames and also reduces the number of frames processed in the system.

preprint, 2020, arXiv

FlexPD: A Flexible Framework Of First-Order Primal-Dual Algorithms for Distributed Optimization

In this paper, we study the problem of minimizing a sum of convex objective functions, which are locally available to agents in a network. Distributed optimization algorithms make it possible for the agents to cooperatively solve the problem through local computations and communications with neighbors. Lagrangian-based distributed optimization algorithms have received significant attention in recent years, due to their exact convergence property. However, many of these algorithms have slow convergence or are expensive to execute. In this paper, we develop a flexible framework of first-order primal-dual algorithms (FlexPD), which allows for multiple primal steps per iteration. This framework includes three algorithms, FlexPD-F, FlexPD-G, and FlexPD-C that can be used for various applications with different computation and communication limitations. For strongly convex and Lipschitz gradient objective functions, we establish linear convergence of our proposed framework to the optimal solution. Simulation results confirm the superior performance of our framework compared to the existing methods.

preprint, 2020, arXiv

Learning to Price Vehicle Service with Unknown Demand

It can be profitable for vehicle service providers to set service prices based on users' travel demand on different origin-destination pairs. The prior studies on the spatial pricing of vehicle service rely on the assumption that providers know users' demand. In this paper, we study a monopolistic provider who initially does not know users' demand and needs to learn it over time by observing the users' responses to the service prices. We design a pricing and vehicle supply policy, considering the tradeoff between exploration (i.e., learning the demand) and exploitation (i.e., maximizing the provider's short-term payoff). Considering that the provider needs to ensure the vehicle flow balance at each location, its pricing and supply decisions for different origin-destination pairs are tightly coupled. This makes it challenging to theoretically analyze the performance of our policy. We analyze the gap between the provider's expected time-average payoffs under our policy and a clairvoyant policy, which makes decisions based on complete information of the demand. We prove that after running our policy for D days, the loss in the expected time-average payoff can be at most O((ln D)^0.5 D^(-0.25)), which decays to zero as D approaches infinity.
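To get a feel for how quickly the stated bound O((ln D)^0.5 D^(-0.25)) shrinks, the following sketch evaluates it for a few horizons. The leading constant hidden by the O-notation is unknown, so it is set to 1 here purely as an assumption; the function name and the chosen values of D are likewise hypothetical.

```python
import math


def regret_bound(days, constant=1.0):
    """Evaluate constant * (ln D)^0.5 * D^(-0.25) for D = days.

    `constant` stands in for the unspecified factor hidden by the O-notation.
    """
    return constant * math.sqrt(math.log(days)) * days ** -0.25


for days in (10, 100, 10_000, 1_000_000):
    print(days, round(regret_bound(days), 4))
```

The expression is decreasing once D exceeds e^2 (about 7.4 days), but the decay is slow: the fourth-root factor means D must grow by a factor of roughly 10,000 to cut the bound by about 10.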

preprint, 2013, arXiv

On the O(1/k) Convergence of Asynchronous Distributed Alternating Direction Method of Multipliers

We consider a network of agents that are cooperatively solving a global optimization problem, where the objective function is the sum of privately known local objective functions of the agents and the decision variables are coupled via linear constraints. Recent literature focused on special cases of this formulation and studied their distributed solution through either subgradient based methods with O(1/sqrt(k)) rate of convergence (where k is the iteration number) or Alternating Direction Method of Multipliers (ADMM) based methods, which require a synchronous implementation and a globally known order on the agents. In this paper, we present a novel asynchronous ADMM based distributed method for the general formulation and show that it converges at the rate O(1/k).
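The practical gap between the two rates can be seen from a short worked calculation. Assume, purely for illustration, that both methods satisfy an error bound with the same constant $C$ (in reality the constants differ between methods). The number of iterations $k$ needed to guarantee error at most $\varepsilon$ is then:

```latex
\[
\frac{C}{\sqrt{k}} \le \varepsilon \iff k \ge \frac{C^{2}}{\varepsilon^{2}},
\qquad
\frac{C}{k} \le \varepsilon \iff k \ge \frac{C}{\varepsilon}.
\]
% Example with the assumed values C = 1 and \varepsilon = 10^{-3}:
\[
k_{1/\sqrt{k}} \ge 10^{6},
\qquad
k_{1/k} \ge 10^{3}.
\]
```

Under these assumed values, the O(1/k) rate needs a thousand times fewer iterations to reach the same guaranteed accuracy.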

preprint, 2011, arXiv

A Distributed Newton Method for Network Utility Maximization

Most existing work uses dual decomposition and subgradient methods to solve Network Utility Maximization (NUM) problems in a distributed manner, which suffer from slow rate of convergence properties. This work develops an alternative distributed Newton-type fast converging algorithm for solving network utility maximization problems with self-concordant utility functions. By using novel matrix splitting techniques, both primal and dual updates for the Newton step can be computed using iterative schemes in a decentralized manner with limited information exchange. Similarly, the stepsize can be obtained via an iterative consensus-based averaging scheme. We show that even when the Newton direction and the stepsize in our method are computed within some error (due to finite truncation of the iterative schemes), the resulting objective function value still converges superlinearly to an explicitly characterized error neighborhood. Simulation results demonstrate significant convergence rate improvement of our algorithm relative to the existing subgradient methods based on dual decomposition.
