Source author record

Haihao Lu

Haihao Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning math.NA Numerical Analysis

Catalog footprint

What is connected

11works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

MPAX: Mathematical Programming in JAX

We present MPAX (Mathematical Programming in JAX), an open-source first-order solver for large-scale linear programming (LP) and convex quadratic programming (QP) built natively in JAX. The primary goal of MPAX is to exploit modern machine learning infrastructure for large-scale mathematical programming, while also providing advanced mathematical programming algorithms that are easy to integrate into machine learning workflows. MPAX implements two PDHG variants, r2HPDHG for LP and rAPDHG for QP, together with diagonal preconditioning, adaptive restarts, adaptive step sizes, primal-weight updates, infeasibility detection, and feasibility polishing. Leveraging JAX's compilation and parallelization ecosystem, MPAX provides across-hardware portability, batched solving, distributed optimization, and automatic differentiation. We evaluate MPAX on CPUs, NVIDIA GPUs, and Google TPUs, observing substantial GPU speedups over CPU baselines and competitive performance relative to GPU-based codebases on standard LP/QP benchmarks. Our numerical experiments further demonstrate MPAX's capabilities in high-throughput batched solving, near-linear multi-GPU scaling for dense LPs, and efficient end-to-end differentiable training. The solver is publicly available at https://github.com/MIT-Lu-Lab/MPAX.

preprint2024arXiv

cuPDLP-C: A Strengthened Implementation of cuPDLP for Linear Programming by C language

A recent GPU implementation of the Restarted Primal-Dual Hybrid Gradient Method for Linear Programming was proposed in Lu and Yang (2023). Its computational results demonstrate the significant computational advantages of the GPU-based first-order algorithm on certain large-scale problems. The average performance also achieves a level close to commercial solvers for the first time in history. However, due to limitations in experimental hardware and the disadvantage of implementing the algorithm in Julia compared to C language, neither the commercial solver nor cuPDLP reached their maximum efficiency. Therefore, in this report, we have re-implemented and optimized cuPDLP in C language. Utilizing state-of-the-art CPU and GPU hardware, we extensively compare cuPDLP with the best commercial solvers. The experiments further highlight its substantial computational advantages and potential for solving large-scale linear programming problems. We also discuss the profound impact this breakthrough may have on mathematical programming research and the entire operations research community.

preprint2023arXiv

A $J$-Symmetric Quasi-Newton Method for Minimax Problems

Minimax problems have gained tremendous attentions across the optimization and machine learning community recently. In this paper, we introduce a new quasi-Newton method for minimax problems, which we call $J$-symmetric quasi-Newton method. The method is obtained by exploiting the $J$-symmetric structure of the second-order derivative of the objective function in minimax problem. We show that the Hessian estimation (as well as its inverse) can be updated by a rank-2 operation, and it turns out that the update rule is a natural generalization of the classic Powell symmetric Broyden (PSB) method from minimization problems to minimax problems. In theory, we show that our proposed quasi-Newton algorithm enjoys local Q-superlinear convergence to a desirable solution under standard regularity conditions. Furthermore, we introduce a trust-region variant of the algorithm that enjoys global R-superlinear convergence. Finally, we present numerical experiments that verify our theory and show the effectiveness of our proposed algorithms compared to Broyden's method and the extragradient method on three classes of minimax problems.

preprint2023arXiv

Nearly Optimal Linear Convergence of Stochastic Primal-Dual Methods for Linear Programming

There is a recent interest on first-order methods for linear programming (LP). In this paper,we propose a stochastic algorithm using variance reduction and restarts for solving sharp primal-dual problems such as LP. We show that the proposed stochastic method exhibits a linear convergence rate for solving sharp instances with a high probability. In addition, we propose an efficient coordinate-based stochastic oracle for unconstrained bilinear problems, which has $\mathcal O(1)$ per iteration cost and improves the complexity of the existing deterministic and stochastic algorithms. Finally, we show that the obtained linear convergence rate is nearly optimal (upto $\log$ terms) for a wide class of stochastic primal dual methods.

preprint2022arXiv

Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient

We present PDLP, a practical first-order method for linear programming (LP) that can solve to the high levels of accuracy that are expected in traditional LP applications. In addition, it can scale to very large problems because its core operation is matrix-vector multiplications. PDLP is derived by applying the primal-dual hybrid gradient (PDHG) method, popularized by Chambolle and Pock (2011), to a saddle-point formulation of LP. PDLP enhances PDHG for LP by combining several new techniques with older tricks from the literature; the enhancements include diagonal preconditioning, presolving, adaptive step sizes, and adaptive restarting. PDLP improves the state of the art for first-order methods applied to LP. We compare PDLP with SCS, an ADMM-based solver, on a set of 383 LP instances derived from MIPLIB 2017. With a target of $10^{-8}$ relative accuracy and 1 hour time limit, PDLP achieves a 6.3x reduction in the geometric mean of solve times and a 4.6x reduction in the number of instances unsolved (from 227 to 49). Furthermore, we highlight standard benchmark instances and a large-scale application (PageRank) where our open-source prototype of PDLP, written in Julia, outperforms a commercial LP solver.

preprint2021arXiv

Infeasibility detection with primal-dual hybrid gradient for large-scale linear programming

We study the problem of detecting infeasibility of large-scale linear programming problems using the primal-dual hybrid gradient method (PDHG) of Chambolle and Pock (2011). The literature on PDHG has mostly focused on settings where the problem at hand is assumed to be feasible. When the problem is not feasible, the iterates of the algorithm do not converge. In this scenario, we show that the iterates diverge at a controlled rate towards a well-defined ray. The direction of this ray is known as the infimal displacement vector $v$. The first contribution of our work is to prove that this vector recovers certificates of primal and dual infeasibility whenever they exist. Based on this fact, we propose a simple way to extract approximate infeasibility certificates from the iterates of PDHG. We study three different sequences that converge to the infimal displacement vector: the difference of iterates, the normalized iterates, and the normalized average. All of them are easy to compute, and thus the approach is suitable for large-scale problems. Our second contribution is to establish tight convergence rates for these sequences. We demonstrate that the normalized iterates and the normalized average achieve a convergence rate of $O(1/k)$, improving over the known rate of $O(1/\sqrt{k})$. This rate is general and applies to any fixed-point iteration of a nonexpansive operator. Thus, it is a result of independent interest since it covers a broad family of algorithms, including, for example, ADMM, and can be applied settings beyond linear programming, such as quadratic and semidefinite programming. Further, in the case of linear programming we show that, under nondegeneracy assumptions, the iterates of PDHG identify the active set of an auxiliary feasible problem in finite time, which ensures that the difference of iterates exhibits eventual linear convergence to the infimal displacement vector.

preprint2021arXiv

Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems

Unlike nonconvex optimization, where gradient descent is guaranteed to converge to a local optimizer, algorithms for nonconvex-nonconcave minimax optimization can have topologically different solution paths: sometimes converging to a solution, sometimes never converging and instead following a limit cycle, and sometimes diverging. In this paper, we study the limiting behaviors of three classic minimax algorithms: gradient descent ascent (GDA), alternating gradient descent ascent (AGDA), and the extragradient method (EGM). Numerically, we observe that all of these limiting behaviors can arise in Generative Adversarial Networks (GAN) training and are easily demonstrated for a range of GAN problems. To explain these different behaviors, we study the high-order resolution continuous-time dynamics that correspond to each algorithm, which results in the sufficient (and almost necessary) conditions for the local convergence by each method. Moreover, this ODE perspective allows us to characterize the phase transition between these different limiting behaviors caused by introducing regularization as Hopf Bifurcations.

preprint2020arXiv

Accelerating Gradient Boosting Machine

Gradient Boosting Machine (GBM) is an extremely powerful supervised learning algorithm that is widely used in practice. GBM routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In this work, we propose Accelerated Gradient Boosting Machine (AGBM) by incorporating Nesterov's acceleration techniques into the design of GBM. The difficulty in accelerating GBM lies in the fact that weak (inexact) learners are commonly used, and therefore the errors can accumulate in the momentum term. To overcome it, we design a "corrected pseudo residual" and fit best weak learner to this corrected pseudo residual, in order to perform the z-update. Thus, we are able to derive novel computational guarantees for AGBM. This is the first GBM type of algorithm with theoretically-justified accelerated convergence rate. Finally we demonstrate with a number of numerical experiments the effectiveness of AGBM over conventional GBM in obtaining a model with good training and/or testing data fidelity.

preprint2020arXiv

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

We propose a new stochastic optimization framework for empirical risk minimization problems such as those that arise in machine learning. The traditional approaches, such as (mini-batch) stochastic gradient descent (SGD), utilize an unbiased gradient estimator of the empirical average loss. In contrast, we develop a computationally efficient method to construct a gradient estimator that is purposely biased toward those observations with higher current losses. On the theory side, we show that the proposed method minimizes a new ordered modification of the empirical average loss, and is guaranteed to converge at a sublinear rate to a global optimum for convex loss and to a critical point for weakly convex (non-convex) loss. Furthermore, we prove a new generalization bound for the proposed algorithm. On the empirical side, the numerical experiments show that our proposed method consistently improves the test errors compared with the standard mini-batch SGD in various models including SVM, logistic regression, and deep learning problems.

preprint2020arXiv

Randomized Gradient Boosting Machine

Gradient Boosting Machine (GBM) introduced by Friedman is a powerful supervised learning algorithm that is very widely used in practice---it routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In spite of the usefulness of GBM in practice, our current theoretical understanding of this method is rather limited. In this work, we propose Randomized Gradient Boosting Machine (RGBM) which leads to substantial computational gains compared to GBM, by using a randomization scheme to reduce search in the space of weak-learners. We derive novel computational guarantees for RGBM. We also provide a principled guideline towards better step-size selection in RGBM that does not require a line search. Our proposed framework is inspired by a special variant of coordinate descent that combines the benefits of randomized coordinate descent and greedy coordinate descent; and may be of independent interest as an optimization algorithm. As a special case, our results for RGBM lead to superior computational guarantees for GBM. Our computational guarantees depend upon a curious geometric quantity that we call Minimal Cosine Angle, which relates to the density of weak-learners in the prediction space. On a series of numerical experiments on real datasets, we demonstrate the effectiveness of RGBM over GBM in terms of obtaining a model with good training and/or testing data fidelity with a fraction of the computational cost.

preprint2016arXiv

New Computational Guarantees for Solving Convex Optimization Problems with First Order Methods, via a Function Growth Condition Measure

Motivated by recent work of Renegar, we present new computational methods and associated computational guarantees for solving convex optimization problems using first-order methods. Our problem of interest is the general convex optimization problem $f^* = \min_{x \in Q} f(x)$, where we presume knowledge of a strict lower bound $f_{\mathrm{slb}} < f^*$. [Indeed, $f_{\mathrm{slb}}$ is naturally known when optimizing many loss functions in statistics and machine learning (least-squares, logistic loss, exponential loss, total variation loss, etc.) as well as in Renegar's transformed version of the standard conic optimization problem; in all these cases one has $f_{\mathrm{slb}} = 0 < f^*$.] We introduce a new functional measure called the growth constant $G$ for $f(\cdot)$, that measures how quickly the level sets of $f(\cdot)$ grow relative to the function value, and that plays a fundamental role in the complexity analysis. When $f(\cdot)$ is non-smooth, we present new computational guarantees for the Subgradient Descent Method and for smoothing methods, that can improve existing computational guarantees in several ways, most notably when the initial iterate $x^0$ is far from the optimal solution set. When $f(\cdot)$ is smooth, we present a scheme for periodically restarting the Accelerated Gradient Method that can also improve existing computational guarantees when $x^0$ is far from the optimal solution set, and in the presence of added structure we present a scheme using parametrically increased smoothing that further improves the associated computational guarantees.

Haihao Lu

What is connected

Connect this record

See the researcher in context

Building this map preview

11 published item(s)

MPAX: Mathematical Programming in JAX

cuPDLP-C: A Strengthened Implementation of cuPDLP for Linear Programming by C language

A $J$-Symmetric Quasi-Newton Method for Minimax Problems

Nearly Optimal Linear Convergence of Stochastic Primal-Dual Methods for Linear Programming

Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient

Infeasibility detection with primal-dual hybrid gradient for large-scale linear programming

Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems

Accelerating Gradient Boosting Machine

Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization

Randomized Gradient Boosting Machine

New Computational Guarantees for Solving Convex Optimization Problems with First Order Methods, via a Function Growth Condition Measure