Researcher profile

Xiaoming Yuan

Xiaoming Yuan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2022arXiv

A General Descent Aggregation Framework for Gradient-based Bi-level Optimization

In recent years, a variety of gradient-based methods have been developed to solve Bi-Level Optimization (BLO) problems in machine learning and computer vision areas. However, the theoretical correctness and practical effectiveness of these existing approaches always rely on some restrictive conditions (e.g., Lower-Level Singleton, LLS), which could hardly be satisfied in real-world applications. Moreover, previous literature only proves theoretical results based on their specific iteration strategies, thus lack a general recipe to uniformly analyze the convergence behaviors of different gradient-based BLOs. In this work, we formulate BLOs from an optimistic bi-level viewpoint and establish a new gradient-based algorithmic framework, named Bi-level Descent Aggregation (BDA), to partially address the above issues. Specifically, BDA provides a modularized structure to hierarchically aggregate both the upper- and lower-level subproblems to generate our bi-level iterative dynamics. Theoretically, we establish a general convergence analysis template and derive a new proof recipe to investigate the essential theoretical properties of gradient-based BLO methods. Furthermore, this work systematically explores the convergence behavior of BDA in different optimization scenarios, i.e., considering various solution qualities (i.e., global/local/stationary solution) returned from solving approximation subproblems. Extensive experiments justify our theoretical results and demonstrate the superiority of the proposed algorithm for hyper-parameter optimization and meta-learning tasks. Source code is available at https://github.com/vis-opt-group/BDA.

preprint2022arXiv

A Globally Convergent Proximal Newton-Type Method in Nonsmooth Convex Optimization

The paper proposes and justifies a new algorithm of the proximal Newton type to solve a broad class of nonsmooth composite convex optimization problems without strong convexity assumptions. Based on advanced notions and techniques of variational analysis, we establish implementable results on the global convergence of the proposed algorithm as well as its local convergence with superlinear and quadratic rates. For certain structured problems, the obtained local convergence conditions do not require the local Lipschitz continuity of the corresponding Hessian mappings that is a crucial assumption used in the literature to ensure a superlinear convergence of other algorithms of the proximal Newton type. The conducted numerical experiments of solving the $l_1$ regularized logistic regression model illustrate the possibility of applying the proposed algorithm to deal with practically important problems.

preprint2022arXiv

Difference of convex algorithms for bilevel programs with applications in hyperparameter selection

In this paper, we present difference of convex algorithms for solving bilevel programs in which the upper level objective functions are difference of convex functions, and the lower level programs are fully convex. This nontrivial class of bilevel programs provides a powerful modelling framework for dealing with applications arising from hyperparameter selection in machine learning. Thanks to the full convexity of the lower level program, the value function of the lower level program turns out to be convex and hence the bilevel program can be reformulated as a difference of convex bilevel program. We propose two algorithms for solving the reformulated difference of convex program and show their convergence under very mild assumptions. Finally we conduct numerical experiments to a bilevel model of support vector machine classification.

preprint2022arXiv

Network Bandwidth Allocation Problem For Cloud Computing

Cloud computing enables ubiquitous, convenient, and on-demand network access to a shared pool of computing resources. Cloud computing technologies create tremendous commercial values in various areas, while many scientific challenges have arisen accordingly. The process of transmitting data through networks is characterized by some distinctive characteristics such as nonlinear, nonconvex and even noncontinuous cost functions generated by pricing schemes, periodically updated network topology, as well as replicable data within network nodes. Because of these characteristics, data transfer scheduling is a very challenging problem both engineeringly and scientifically. On the other hand, the cost for bandwidth is a major component of the operating cost for cloud providers, and thus how to save bandwidth cost is extremely important for them to supply service with minimized cost. We propose the Network Bandwidth Allocation (NBA) problem for cloud computing and formulate it as an integer programming model on a high level, with which more comprehensive and rigorous scientific studies become possible. We also show that the NBA problem captures some of the major cloud computing scenarios including the content delivery network (CDN), the live video delivery network (LVDN), the real-time communication network (RTCN), and the cloud wide area network (Cloud-WAN).

preprint2022arXiv

On construction of splitting contraction algorithms in a prediction-correction framework for separable convex optimization

In the past decade, we had developed a series of splitting contraction algorithms for separable convex optimization problems, at the root of the alternating direction method of multipliers. Convergence of these algorithms was studied under specific model-tailored conditions, while these conditions can be conceptually abstracted as two generic conditions when these algorithms are all unified as a prediction-correction framework. In this paper, in turn, we showcase a constructive way for specifying the generic convergence-guaranteeing conditions, via which new splitting contraction algorithms can be generated automatically. It becomes possible to design more application-tailored splitting contraction algorithms by specifying the prediction-correction framework, while proving their convergence is a routine.

preprint2022arXiv

The springback penalty for robust signal recovery

We propose a new penalty, the springback penalty, for constructing models to recover an unknown signal from incomplete and inaccurate measurements. Mathematically, the springback penalty is a weakly convex function. It bears various theoretical and computational advantages of both the benchmark convex $\ell_1$ penalty and many of its non-convex surrogates that have been well studied in the literature. We establish the exact and stable recovery theory for the recovery model using the springback penalty for both sparse and nearly sparse signals, respectively, and derive an easily implementable difference-of-convex algorithm. In particular, we show its theoretical superiority to some existing models with a sharper recovery bound for some scenarios where the level of measurement noise is large or the amount of measurements is limited. We also demonstrate its numerical robustness regardless of the varying coherence of the sensing matrix. The springback penalty is particularly favorable for the scenario where the incomplete and inaccurate measurements are collected by coherence-hidden or -static sensing hardware due to its theoretical guarantee of recovery with severe measurements, computational tractability, and numerical robustness for ill-conditioned sensing matrices.

preprint2022arXiv

Towards Tailored Models on Private AIoT Devices: Federated Direct Neural Architecture Search

Neural networks often encounter various stringent resource constraints while deploying on edge devices. To tackle these problems with less human efforts, automated machine learning becomes popular in finding various neural architectures that fit diverse Artificial Intelligence of Things (AIoT) scenarios. Recently, to prevent the leakage of private information while enable automated machine intelligence, there is an emerging trend to integrate federated learning and neural architecture search (NAS). Although promising as it may seem, the coupling of difficulties from both tenets makes the algorithm development quite challenging. In particular, how to efficiently search the optimal neural architecture directly from massive non-independent and identically distributed (non-IID) data among AIoT devices in a federated manner is a hard nut to crack. In this paper, to tackle this challenge, by leveraging the advances in ProxylessNAS, we propose a Federated Direct Neural Architecture Search (FDNAS) framework that allows for hardware-friendly NAS from non- IID data across devices. To further adapt to both various data distributions and different types of devices with heterogeneous embedded hardware platforms, inspired by meta-learning, a Cluster Federated Direct Neural Architecture Search (CFDNAS) framework is proposed to achieve device-aware NAS, in the sense that each device can learn a tailored deep learning model for its particular data distribution and hardware constraint. Extensive experiments on non-IID datasets have shown the state-of-the-art accuracy-efficiency trade-offs achieved by the proposed solution in the presence of both data and device heterogeneity.

preprint2021arXiv

Bilinear Optimal Control of an Advection-reaction-diffusion System

We consider the bilinear optimal control of an advection-reaction-diffusion system, where the control arises as the velocity field in the advection term. Such a problem is generally challenging from both theoretical analysis and algorithmic design perspectives mainly because the state variable depends nonlinearly on the control variable and an additional divergence-free constraint on the control is coupled together with the state equation. Mathematically, the proof of the existence of optimal solutions is delicate, and up to now, only some results are known for a few special cases where additional restrictions are imposed on the space dimension and the regularity of the control. We prove the existence of optimal controls and derive the first-order optimality conditions in general settings without any extra assumption. Computationally, the well-known conjugate gradient (CG) method can be applied conceptually. However, due to the additional divergence-free constraint on the control variable and the nonlinear relation between the state and control variables, it is challenging to compute the gradient and the optimal stepsize at each CG iteration, and thus nontrivial to implement the CG method. To address these issues, we advocate a fast inner preconditioned CG method to ensure the divergence-free constraint and an efficient inexactness strategy to determine an appropriate stepsize. An easily implementable nested CG method is thus proposed for solving such a complicated problem. For the numerical discretization, we combine finite difference methods for the time discretization and finite element methods for the space discretization. Efficiency of the proposed nested CG method is promisingly validated by the results of some preliminary numerical experiments.

preprint2020arXiv

A Generic First-Order Algorithmic Framework for Bi-Level Programming Beyond Lower-Level Singleton

In recent years, a variety of gradient-based first-order methods have been developed to solve bi-level optimization problems for learning applications. However, theoretical guarantees of these existing approaches heavily rely on the simplification that for each fixed upper-level variable, the lower-level solution must be a singleton (a.k.a., Lower-Level Singleton, LLS). In this work, we first design a counter-example to illustrate the invalidation of such LLS condition. Then by formulating BLPs from the view point of optimistic bi-level and aggregating hierarchical objective information, we establish Bi-level Descent Aggregation (BDA), a flexible and modularized algorithmic framework for generic bi-level optimization. Theoretically, we derive a new methodology to prove the convergence of BDA without the LLS condition. Our investigations also demonstrate that BDA is indeed compatible to a verify of particular first-order computation modules. Additionally, as an interesting byproduct, we also improve these conventional first-order bi-level schemes (under the LLS simplification). Particularly, we establish their convergences with weaker assumptions. Extensive experiments justify our theoretical results and demonstrate the superiority of the proposed BDA for different tasks, including hyper-parameter optimization and meta learning.

preprint2020arXiv

Implementation of the ADMM to Parabolic Optimal Control Problems with Control Constraints and Beyond

Parabolic optimal control problems with control constraints are generally challenging, from either theoretical analysis or algorithmic design perspectives. Conceptually, the well-known alternating direction method of multipliers (ADMM) can be directly applied to such a problem. An attractive advantage of this direct ADMM application is that the control constraint can be untied from the parabolic PDE constraint; these two inherently different constraints thus can be treated individually in iterations. At each iteration of the ADMM, the main computation is for solving an unconstrained parabolic optimal control problem. Because of its high dimensionality after discretization, the unconstrained parabolic optimal control problem at each iteration can be solved only inexactly by implementing certain numerical scheme internally and thus a two-layer nested iterative scheme is required. It then becomes important to find an easily implementable and efficient inexactness criterion to execute the internal iterations, and to prove the overall convergence rigorously for the resulting two-layer nested iterative scheme. To implement the ADMM efficiently, we propose an inexactness criterion that is independent of the mesh size of the involved discretization, and it can be executed automatically with no need to set empirically perceived constant accuracy a prior. The inexactness criterion turns out to allow us to solve the resulting unconstrained optimal control problems to medium or even low accuracy and thus saves computation significantly, yet convergence of the overall two-layer nested iterative scheme can be still guaranteed rigorously. Efficiency of this ADMM implementation is promisingly validated by preliminary numerical results. Our methodology can also be extended to a range of optimal control problems constrained by other linear PDEs such as elliptic equations and hyperbolic equations.

preprint2020arXiv

The flare Package for High Dimensional Linear Regression and Precision Matrix Estimation in R

This paper describes an R package named flare, which implements a family of new high dimensional regression methods (LAD Lasso, SQRT Lasso, $\ell_q$ Lasso, and Dantzig selector) and their extensions to sparse precision matrix estimation (TIGER and CLIME). These methods exploit different nonsmooth loss functions to gain modeling flexibility, estimation robustness, and tuning insensitiveness. The developed solver is based on the alternating direction method of multipliers (ADMM). The package flare is coded in double precision C, and called from R by a user-friendly interface. The memory usage is optimized by using the sparse matrix output. The experiments show that flare is efficient and can scale up to large problems.