Source author record

Xiaoming Yuan

Xiaoming Yuan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

math.OC Machine Learning Computer Vision math.DS math.NA Distributed, Parallel, and Cluster Computing eess.SP Information Theory math.IT Mathematical Software Networking and Internet Architecture Numerical Analysis

Catalog footprint

What is connected

14works

12topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A General Descent Aggregation Framework for Gradient-based Bi-level Optimization

In recent years, a variety of gradient-based methods have been developed to solve Bi-Level Optimization (BLO) problems in machine learning and computer vision areas. However, the theoretical correctness and practical effectiveness of these existing approaches always rely on some restrictive conditions (e.g., Lower-Level Singleton, LLS), which could hardly be satisfied in real-world applications. Moreover, previous literature only proves theoretical results based on their specific iteration strategies, thus lack a general recipe to uniformly analyze the convergence behaviors of different gradient-based BLOs. In this work, we formulate BLOs from an optimistic bi-level viewpoint and establish a new gradient-based algorithmic framework, named Bi-level Descent Aggregation (BDA), to partially address the above issues. Specifically, BDA provides a modularized structure to hierarchically aggregate both the upper- and lower-level subproblems to generate our bi-level iterative dynamics. Theoretically, we establish a general convergence analysis template and derive a new proof recipe to investigate the essential theoretical properties of gradient-based BLO methods. Furthermore, this work systematically explores the convergence behavior of BDA in different optimization scenarios, i.e., considering various solution qualities (i.e., global/local/stationary solution) returned from solving approximation subproblems. Extensive experiments justify our theoretical results and demonstrate the superiority of the proposed algorithm for hyper-parameter optimization and meta-learning tasks. Source code is available at https://github.com/vis-opt-group/BDA.

preprint2022arXiv

A Globally Convergent Proximal Newton-Type Method in Nonsmooth Convex Optimization

The paper proposes and justifies a new algorithm of the proximal Newton type to solve a broad class of nonsmooth composite convex optimization problems without strong convexity assumptions. Based on advanced notions and techniques of variational analysis, we establish implementable results on the global convergence of the proposed algorithm as well as its local convergence with superlinear and quadratic rates. For certain structured problems, the obtained local convergence conditions do not require the local Lipschitz continuity of the corresponding Hessian mappings that is a crucial assumption used in the literature to ensure a superlinear convergence of other algorithms of the proximal Newton type. The conducted numerical experiments of solving the $l_1$ regularized logistic regression model illustrate the possibility of applying the proposed algorithm to deal with practically important problems.

preprint2022arXiv

Difference of convex algorithms for bilevel programs with applications in hyperparameter selection

In this paper, we present difference of convex algorithms for solving bilevel programs in which the upper level objective functions are difference of convex functions, and the lower level programs are fully convex. This nontrivial class of bilevel programs provides a powerful modelling framework for dealing with applications arising from hyperparameter selection in machine learning. Thanks to the full convexity of the lower level program, the value function of the lower level program turns out to be convex and hence the bilevel program can be reformulated as a difference of convex bilevel program. We propose two algorithms for solving the reformulated difference of convex program and show their convergence under very mild assumptions. Finally we conduct numerical experiments to a bilevel model of support vector machine classification.

preprint2022arXiv

Network Bandwidth Allocation Problem For Cloud Computing

Cloud computing enables ubiquitous, convenient, and on-demand network access to a shared pool of computing resources. Cloud computing technologies create tremendous commercial values in various areas, while many scientific challenges have arisen accordingly. The process of transmitting data through networks is characterized by some distinctive characteristics such as nonlinear, nonconvex and even noncontinuous cost functions generated by pricing schemes, periodically updated network topology, as well as replicable data within network nodes. Because of these characteristics, data transfer scheduling is a very challenging problem both engineeringly and scientifically. On the other hand, the cost for bandwidth is a major component of the operating cost for cloud providers, and thus how to save bandwidth cost is extremely important for them to supply service with minimized cost. We propose the Network Bandwidth Allocation (NBA) problem for cloud computing and formulate it as an integer programming model on a high level, with which more comprehensive and rigorous scientific studies become possible. We also show that the NBA problem captures some of the major cloud computing scenarios including the content delivery network (CDN), the live video delivery network (LVDN), the real-time communication network (RTCN), and the cloud wide area network (Cloud-WAN).

preprint2022arXiv

On construction of splitting contraction algorithms in a prediction-correction framework for separable convex optimization

In the past decade, we had developed a series of splitting contraction algorithms for separable convex optimization problems, at the root of the alternating direction method of multipliers. Convergence of these algorithms was studied under specific model-tailored conditions, while these conditions can be conceptually abstracted as two generic conditions when these algorithms are all unified as a prediction-correction framework. In this paper, in turn, we showcase a constructive way for specifying the generic convergence-guaranteeing conditions, via which new splitting contraction algorithms can be generated automatically. It becomes possible to design more application-tailored splitting contraction algorithms by specifying the prediction-correction framework, while proving their convergence is a routine.

preprint2022arXiv

The springback penalty for robust signal recovery

We propose a new penalty, the springback penalty, for constructing models to recover an unknown signal from incomplete and inaccurate measurements. Mathematically, the springback penalty is a weakly convex function. It bears various theoretical and computational advantages of both the benchmark convex $\ell_1$ penalty and many of its non-convex surrogates that have been well studied in the literature. We establish the exact and stable recovery theory for the recovery model using the springback penalty for both sparse and nearly sparse signals, respectively, and derive an easily implementable difference-of-convex algorithm. In particular, we show its theoretical superiority to some existing models with a sharper recovery bound for some scenarios where the level of measurement noise is large or the amount of measurements is limited. We also demonstrate its numerical robustness regardless of the varying coherence of the sensing matrix. The springback penalty is particularly favorable for the scenario where the incomplete and inaccurate measurements are collected by coherence-hidden or -static sensing hardware due to its theoretical guarantee of recovery with severe measurements, computational tractability, and numerical robustness for ill-conditioned sensing matrices.

preprint2022arXiv

Towards Tailored Models on Private AIoT Devices: Federated Direct Neural Architecture Search

Neural networks often encounter various stringent resource constraints while deploying on edge devices. To tackle these problems with less human efforts, automated machine learning becomes popular in finding various neural architectures that fit diverse Artificial Intelligence of Things (AIoT) scenarios. Recently, to prevent the leakage of private information while enable automated machine intelligence, there is an emerging trend to integrate federated learning and neural architecture search (NAS). Although promising as it may seem, the coupling of difficulties from both tenets makes the algorithm development quite challenging. In particular, how to efficiently search the optimal neural architecture directly from massive non-independent and identically distributed (non-IID) data among AIoT devices in a federated manner is a hard nut to crack. In this paper, to tackle this challenge, by leveraging the advances in ProxylessNAS, we propose a Federated Direct Neural Architecture Search (FDNAS) framework that allows for hardware-friendly NAS from non- IID data across devices. To further adapt to both various data distributions and different types of devices with heterogeneous embedded hardware platforms, inspired by meta-learning, a Cluster Federated Direct Neural Architecture Search (CFDNAS) framework is proposed to achieve device-aware NAS, in the sense that each device can learn a tailored deep learning model for its particular data distribution and hardware constraint. Extensive experiments on non-IID datasets have shown the state-of-the-art accuracy-efficiency trade-offs achieved by the proposed solution in the presence of both data and device heterogeneity.

preprint2021arXiv

Bilinear Optimal Control of an Advection-reaction-diffusion System

We consider the bilinear optimal control of an advection-reaction-diffusion system, where the control arises as the velocity field in the advection term. Such a problem is generally challenging from both theoretical analysis and algorithmic design perspectives mainly because the state variable depends nonlinearly on the control variable and an additional divergence-free constraint on the control is coupled together with the state equation. Mathematically, the proof of the existence of optimal solutions is delicate, and up to now, only some results are known for a few special cases where additional restrictions are imposed on the space dimension and the regularity of the control. We prove the existence of optimal controls and derive the first-order optimality conditions in general settings without any extra assumption. Computationally, the well-known conjugate gradient (CG) method can be applied conceptually. However, due to the additional divergence-free constraint on the control variable and the nonlinear relation between the state and control variables, it is challenging to compute the gradient and the optimal stepsize at each CG iteration, and thus nontrivial to implement the CG method. To address these issues, we advocate a fast inner preconditioned CG method to ensure the divergence-free constraint and an efficient inexactness strategy to determine an appropriate stepsize. An easily implementable nested CG method is thus proposed for solving such a complicated problem. For the numerical discretization, we combine finite difference methods for the time discretization and finite element methods for the space discretization. Efficiency of the proposed nested CG method is promisingly validated by the results of some preliminary numerical experiments.

preprint2020arXiv

A Generic First-Order Algorithmic Framework for Bi-Level Programming Beyond Lower-Level Singleton

In recent years, a variety of gradient-based first-order methods have been developed to solve bi-level optimization problems for learning applications. However, theoretical guarantees of these existing approaches heavily rely on the simplification that for each fixed upper-level variable, the lower-level solution must be a singleton (a.k.a., Lower-Level Singleton, LLS). In this work, we first design a counter-example to illustrate the invalidation of such LLS condition. Then by formulating BLPs from the view point of optimistic bi-level and aggregating hierarchical objective information, we establish Bi-level Descent Aggregation (BDA), a flexible and modularized algorithmic framework for generic bi-level optimization. Theoretically, we derive a new methodology to prove the convergence of BDA without the LLS condition. Our investigations also demonstrate that BDA is indeed compatible to a verify of particular first-order computation modules. Additionally, as an interesting byproduct, we also improve these conventional first-order bi-level schemes (under the LLS simplification). Particularly, we establish their convergences with weaker assumptions. Extensive experiments justify our theoretical results and demonstrate the superiority of the proposed BDA for different tasks, including hyper-parameter optimization and meta learning.

preprint2020arXiv

Implementation of the ADMM to Parabolic Optimal Control Problems with Control Constraints and Beyond

Parabolic optimal control problems with control constraints are generally challenging, from either theoretical analysis or algorithmic design perspectives. Conceptually, the well-known alternating direction method of multipliers (ADMM) can be directly applied to such a problem. An attractive advantage of this direct ADMM application is that the control constraint can be untied from the parabolic PDE constraint; these two inherently different constraints thus can be treated individually in iterations. At each iteration of the ADMM, the main computation is for solving an unconstrained parabolic optimal control problem. Because of its high dimensionality after discretization, the unconstrained parabolic optimal control problem at each iteration can be solved only inexactly by implementing certain numerical scheme internally and thus a two-layer nested iterative scheme is required. It then becomes important to find an easily implementable and efficient inexactness criterion to execute the internal iterations, and to prove the overall convergence rigorously for the resulting two-layer nested iterative scheme. To implement the ADMM efficiently, we propose an inexactness criterion that is independent of the mesh size of the involved discretization, and it can be executed automatically with no need to set empirically perceived constant accuracy a prior. The inexactness criterion turns out to allow us to solve the resulting unconstrained optimal control problems to medium or even low accuracy and thus saves computation significantly, yet convergence of the overall two-layer nested iterative scheme can be still guaranteed rigorously. Efficiency of this ADMM implementation is promisingly validated by preliminary numerical results. Our methodology can also be extended to a range of optimal control problems constrained by other linear PDEs such as elliptic equations and hyperbolic equations.

preprint2020arXiv

The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R

This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, $\ell_q$ Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME). These methods exploit different nonsmooth loss functions to gain modeling flexibility, estimation robustness, and tuning insensitiveness. The developed solver is based on the alternating direction method of multipliers (ADMM). The package flare is coded in double precision C, and called from R by a user-friendly interface. The memory usage is optimized by using the sparse matrix output. The experiments show that flare is efficient and can scale up to large problems.

preprint2016arXiv

Convergence analysis of the direct extension of ADMM for multiple-block separable convex minimization

Recently, the alternating direction method of multipliers (ADMM) has found many efficient applications in various areas; and it has been shown that the convergence is not guaranteed when it is directly extended to the multiple-block case of separable convex minimization problems where there are $m\ge 3$ functions without coupled variables in the objective. This fact has given great impetus to investigate various conditions on both the model and the algorithm's parameter that can ensure the convergence of the direct extension of ADMM (abbreviated as "e-ADMM"). Despite some results under very strong conditions (e.g., at least $(m-1)$ functions should be strongly convex) that are applicable to the generic case with a general $m$, some others concentrate on the special case of $m=3$ under the relatively milder condition that only one function is assumed to be strongly convex. We focus on extending the convergence analysis from the case of $m=3$ to the more general case of $m\ge3$. That is, we show the convergence of e-ADMM for the case of $m\ge 3$ with the assumption of only $(m-2)$ functions being strongly convex; and establish its convergence rates in different scenarios such as the worst-case convergence rates measured by iteration complexity and the asymptotically linear convergence rate under stronger assumptions. Thus the convergence of e-ADMM for the general case of $m\ge 4$ is proved; this result seems to be still unknown even though it is intuitive given the known result of the case of $m=3$. Even for the special case of $m=3$, our convergence results turn out to be more general than the exiting results that are derived specifically for the case of $m=3$.

preprint2016arXiv

On the Optimal Linear Convergence Rate of a Generalized Proximal Point Algorithm

The proximal point algorithm (PPA) has been well studied in the literature. In particular, its linear convergence rate has been studied by Rockafellar in 1976 under certain condition. We consider a generalized PPA in the generic setting of finding a zero point of a maximal monotone operator, and show that the condition proposed by Rockafellar can also sufficiently ensure the linear convergence rate for this generalized PPA. Indeed we show that these linear convergence rates are optimal. Both the exact and inexact versions of this generalized PPA are discussed. The motivation to consider this generalized PPA is that it includes as special cases the relaxed versions of some splitting methods that are originated from PPA. Thus, linear convergence results of this generalized PPA can be used to better understand the convergence of some widely used algorithms in the literature. We focus on the particular convex minimization context and specify Rockafellar's condition to see how to ensure the linear convergence rate for some efficient numerical schemes, including the classical augmented Lagrangian method proposed by Hensen and Powell in 1969 and its relaxed version, the original alternating direction method of multipliers (ADMM) by Glowinski and Marrocco in 1975 and its relaxed version (i.e., the generalized ADMM by Eckstein and Bertsekas in 1992). Some refined conditions weaker than existing ones are proposed in these particular contexts.

preprint2015arXiv

Adaptive Primal-Dual Hybrid Gradient Methods for Saddle-Point Problems

The Primal-Dual hybrid gradient (PDHG) method is a powerful optimization scheme that breaks complex problems into simple sub-steps. Unfortunately, PDHG methods require the user to choose stepsize parameters, and the speed of convergence is highly sensitive to this choice. We introduce new adaptive PDHG schemes that automatically tune the stepsize parameters for fast convergence without user inputs. We prove rigorous convergence results for our methods, and identify the conditions required for convergence. We also develop practical implementations of adaptive schemes that formally satisfy the convergence requirements. Numerical experiments show that adaptive PDHG methods have advantages over non-adaptive implementations in terms of both efficiency and simplicity for the user.

Xiaoming Yuan

What is connected

Connect this record

See the researcher in context

Building this map preview

14 published item(s)

A General Descent Aggregation Framework for Gradient-based Bi-level Optimization

A Globally Convergent Proximal Newton-Type Method in Nonsmooth Convex Optimization

Difference of convex algorithms for bilevel programs with applications in hyperparameter selection

Network Bandwidth Allocation Problem For Cloud Computing

On construction of splitting contraction algorithms in a prediction-correction framework for separable convex optimization

The springback penalty for robust signal recovery

Towards Tailored Models on Private AIoT Devices: Federated Direct Neural Architecture Search

Bilinear Optimal Control of an Advection-reaction-diffusion System

A Generic First-Order Algorithmic Framework for Bi-Level Programming Beyond Lower-Level Singleton

Implementation of the ADMM to Parabolic Optimal Control Problems with Control Constraints and Beyond

The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R

Convergence analysis of the direct extension of ADMM for multiple-block separable convex minimization

On the Optimal Linear Convergence Rate of a Generalized Proximal Point Algorithm

Adaptive Primal-Dual Hybrid Gradient Methods for Saddle-Point Problems