Source author record

Mingyi Hong

Mingyi Hong appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Topics: Information Theory; math.IT; Machine Learning; math.OC; Systems and Control; eess.SP; Cryptography and Security; Distributed, Parallel, and Cluster Computing; eess.SY; Artificial Intelligence; Computer Vision; Computer Science and Game Theory; eess.IV; physics.med-ph; math.NA; math.ST; Methodology; Numerical Analysis; Statistics Theory

Catalog footprint

What is connected

67 works

19 topics

4 close collaborators

preprint, 2026, arXiv

CUDAHercules: Benchmarking Hardware-Aware Expert-level CUDA Optimization for LLMs

Large language models show promise for automated CUDA programming; however, even the strongest coding models (e.g., Claude-Opus-4.6) may still fall short of expert-level, architecture-aware optimization. We introduce CUDAHercules, a benchmark that evaluates generated CUDA against end-to-end human-expert SOTA systems. It spans single kernels, module-level operators, full applications, and unsolved challenge tasks across Ampere, Hopper, and Blackwell GPUs, with end-to-end tasks gated by domain-specific semantic validators. Evaluating models such as Claude-Opus-4.6 and GPT-5.4 shows a large gap between runnable CUDA and expert CUDA engineering: models often compile and pass tests, but rarely recover the optimization strategies needed to match expert performance. Application semantics further reduce success, and iterative or tool-augmented feedback can improve correctness while drifting toward slow fallback implementations. These results show that automated CUDA programming remains far from fully solved and requires stronger hardware reasoning, better tool use, and training objectives that connect code understanding to hardware architecture-grounded intelligence.

preprint, 2026, arXiv

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it could lead to fundamental limitations beyond pretraining in two regimes: (i) cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause amplification of noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization from prior training make whitening unstable. To address these challenges, we propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion+Suppression mechanism, which we call the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost. In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both baselines across l_1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures, e.g., reaching 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, vs. 97.0% for Muon and only 32.2% for AdamW. The advantage of Pion further extends to a real Franka Research 3 robot with a pi_0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training on Qwen3-1.7B/4B with GRPO and GMPO, Pion also outperforms AdamW on MATH and GSM8K while Muon collapses to zero.
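
As a rough illustration of the "drive all singular values toward 1" behaviour described above, the sketch below runs the classical cubic Newton-Schulz iteration on a random matrix. It is not the Muon or Pion implementation: practical optimizers typically use a tuned polynomial and only a few steps, the abstract does not specify Pion's high-pass iteration, and the function name, step count, and test matrix here are assumptions made for the example.

```python
import numpy as np

def newton_schulz_orthogonalize(m, steps=40):
    """Classical cubic Newton-Schulz iteration: X <- 1.5 X - 0.5 X X^T X."""
    # Scale by the Frobenius norm (an upper bound on the spectral norm)
    # so that every singular value starts in (0, 1].
    x = m / (np.linalg.norm(m) + 1e-12)
    for _ in range(steps):
        # Each singular value s is mapped to 1.5 s - 0.5 s^3, which pushes it toward 1.
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

rng = np.random.default_rng(0)
momentum = rng.standard_normal((4, 6))  # stand-in for a momentum matrix
update = newton_schulz_orthogonalize(momentum)
singular_values = np.linalg.svd(update, compute_uv=False)
```

After enough steps every nonzero singular value of `update` sits close to 1, which is the uniform whitening effect the abstract contrasts with a high-pass filter.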

preprint, 2026, arXiv

Revisiting the Adam-SGD Gap in LLM Pre-Training: The Role of Large Effective Learning Rates

It is widely believed that stochastic gradient descent (SGD) performs significantly worse than adaptive optimizers such as Adam in pre-training Large Language Models (LLMs). Yet the underlying reason for this gap remains unclear. In this work, we attribute a large part of the discrepancy to SGD's inability to sustain learning rates comparable to Adam's much larger effective learning rates. Through empirical and theoretical analysis of LLM pre-training dynamics, we identify that training is characterized by small gradient norms and large weight-to-gradient ratios, an effect that becomes more pronounced with larger batch sizes typical in pre-training, necessitating such large effective learning rates. However, we find that output-layer gradient magnitudes become highly uneven across token classes, and that large gradient spikes frequently occur during training. Together, these effects severely restrict the admissible learning rate of SGD. Guided by this understanding, we show that simple clipping mechanisms that stabilize SGD at large learning rates enable it to recover most of Adam's performance. In our large-scale experiments, the validation loss gap between large-learning-rate SGD and Adam shrinks from more than 50% to only about 3.5% when pre-training a 1B-parameter LLaMA model with a 1M-token batch size.
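
The abstract does not say which clipping mechanisms are used, so the following is only a minimal sketch of one common choice, global-norm gradient clipping applied before a plain SGD step. The function names, threshold, and learning rate are assumptions for illustration.

```python
import numpy as np

def clip_by_global_norm(grad, max_norm):
    """Rescale grad so that its Euclidean norm is at most max_norm."""
    norm = np.linalg.norm(grad)
    scale = min(1.0, max_norm / (norm + 1e-12))
    return grad * scale

def sgd_step(w, grad, lr, max_norm):
    """Plain SGD update applied to the clipped gradient."""
    return w - lr * clip_by_global_norm(grad, max_norm)

w = np.zeros(3)
spike = np.array([300.0, -400.0, 0.0])  # a gradient spike with norm 500
w_next = sgd_step(w, spike, lr=0.5, max_norm=1.0)
```

With clipping in place the update length is bounded by `lr * max_norm` no matter how large the spike is, which is what lets a large learning rate remain usable.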

preprint, 2024, arXiv

Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

Second-order optimization methods, such as cubic regularized Newton methods, are known for their rapid convergence rates; nevertheless, they become impractical in high-dimensional problems due to their substantial memory requirements and computational costs. One promising approach is to execute second-order updates within a lower-dimensional subspace, giving rise to subspace second-order methods. However, the majority of existing subspace second-order methods randomly select subspaces, consequently resulting in slower convergence rates depending on the problem's dimension $d$. In this paper, we introduce a novel subspace cubic regularized Newton method that achieves a dimension-independent global convergence rate of ${O}\left(\frac{1}{mk}+\frac{1}{k^2}\right)$ for solving convex optimization problems. Here, $m$ represents the subspace dimension, which can be significantly smaller than $d$. Instead of adopting a random subspace, our primary innovation involves performing the cubic regularized Newton update within the Krylov subspace associated with the Hessian and the gradient of the objective function. This result marks the first instance of a dimension-independent convergence rate for a subspace second-order method. Furthermore, when specific spectral conditions of the Hessian are met, our method recovers the convergence rate of a full-dimensional cubic regularized Newton method. Numerical experiments show our method converges faster than existing random subspace methods, especially for high-dimensional problems.
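
To make the subspace idea concrete, here is a hedged sketch of a single step: build an orthonormal basis of the Krylov subspace span{g, Hg, ..., H^(m-1) g}, project the Hessian and gradient onto it, and solve the small cubic-regularized model there. The bisection solver, the regularization constant `reg`, and the random test problem are assumptions for the example, not the paper's algorithm or parameter choices. A practical version would use Hessian-vector products rather than a dense Hessian.

```python
import numpy as np

def krylov_basis(hess, grad, m):
    """Orthonormal basis of span{g, Hg, ..., H^(m-1) g} via Gram-Schmidt."""
    basis = []
    v = grad / np.linalg.norm(grad)
    for _ in range(m):
        basis.append(v)
        w = hess @ v
        for b in basis:  # orthogonalize against all earlier basis vectors
            w = w - (b @ w) * b
        norm = np.linalg.norm(w)
        if norm < 1e-10:  # the subspace is already invariant; stop early
            break
        v = w / norm
    return np.stack(basis, axis=1)

def cubic_subproblem(h, g, reg):
    """Minimize g^T z + 0.5 z^T h z + (reg / 6) ||z||^3 for positive definite h."""
    eigvals, eigvecs = np.linalg.eigh(h)
    g_rot = eigvecs.T @ g

    def step_norm(r):
        # Norm of z(r) = -(h + 0.5 * reg * r * I)^{-1} g, decreasing in r.
        return np.linalg.norm(g_rot / (eigvals + 0.5 * reg * r))

    lo, hi = 0.0, 1.0
    while step_norm(hi) > hi:
        hi *= 2.0
    for _ in range(100):  # bisection on the fixed point ||z(r)|| = r
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > mid:
            lo = mid
        else:
            hi = mid
    return -eigvecs @ (g_rot / (eigvals + 0.5 * reg * hi))

rng = np.random.default_rng(0)
d, m, reg = 50, 5, 1.0
a = rng.standard_normal((d, d))
hess = a.T @ a / d + np.eye(d)  # a convex quadratic model, for illustration
grad = rng.standard_normal(d)

basis = krylov_basis(hess, grad, m)
h_small = basis.T @ hess @ basis  # m x m projected Hessian
g_small = basis.T @ grad          # projected gradient
z = cubic_subproblem(h_small, g_small, reg)
step = basis @ z                  # step lifted back to the full space
```

The expensive cubic solve happens in dimension `m` rather than `d`, which is the source of the memory and compute savings the abstract describes.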

preprint, 2022, arXiv

A Framework for Understanding Model Extraction Attack and Defense

The privacy of machine learning models has become a significant concern in many emerging Machine-Learning-as-a-Service applications, where prediction services based on well-trained models are offered to users via pay-per-query. The lack of a defense mechanism can impose a high risk on the privacy of the server's model since an adversary could efficiently steal the model by querying only a few 'good' data points. The interplay between a server's defense and an adversary's attack inevitably leads to an arms race dilemma, as commonly seen in Adversarial Machine Learning. To study the fundamental tradeoffs between model utility from a benign user's view and privacy from an adversary's view, we develop new metrics to quantify such tradeoffs, analyze their theoretical properties, and develop an optimization problem to understand the optimal adversarial attack and defense strategies. The developed concepts and theory match the empirical findings on the 'equilibrium' between privacy and utility. In terms of optimization, the key ingredient that enables our results is a unified representation of the attack-defense problem as a min-max bi-level problem. The developed results will be demonstrated by examples and experiments.

preprint, 2022, arXiv

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization. Bilevel optimization is a class of problems which exhibit a two-level structure, and its goal is to minimize an outer objective function with variables which are constrained to be the optimal solution to an (inner) optimization problem. We consider the case when the inner problem is unconstrained and strongly convex, while the outer problem is constrained and has a smooth objective function. We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem. In the algorithm, a stochastic gradient update with a larger step size is used for the inner problem, while a projected stochastic gradient update with a smaller step size is used for the outer problem. We analyze the convergence rates for the TTSA algorithm under various settings: when the outer problem is strongly convex (resp. weakly convex), the TTSA algorithm finds an $\mathcal{O}(K^{-2/3})$-optimal (resp. $\mathcal{O}(K^{-2/5})$-stationary) solution, where $K$ is the total iteration number. As an application, we show that a two-timescale natural actor-critic proximal policy optimization algorithm can be viewed as a special case of our TTSA framework. Importantly, the natural actor-critic algorithm is shown to converge at a rate of $\mathcal{O}(K^{-1/4})$ in terms of the gap in expected discounted reward compared to a global optimal policy.
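
The two-timescale structure can be sketched on a toy scalar problem: the inner variable takes a larger gradient step, and the outer variable takes a smaller projected step along a hypergradient estimate. The toy objectives, constant step sizes, and the interval used for projection are assumptions made for this example; the paper's step-size schedules and stochastic oracles are not reproduced here.

```python
import numpy as np

def ttsa_toy(x0, y0, iters=2000, beta=0.5, alpha=0.05, noise=0.0, seed=0):
    """Toy two-timescale loop.

    Inner problem: g(x, y) = 0.5 * (y - x)^2, so y*(x) = x.
    Outer problem: f(x, y) = 0.5 * (y - 1)^2 + 0.5 * x^2 with x in [-1, 1].
    The outer objective along y*(x) is minimized at x = 0.5.
    """
    rng = np.random.default_rng(seed)
    x, y = x0, y0
    for _ in range(iters):
        # Inner update: (optionally noisy) gradient step with the larger step size.
        y = y - beta * ((y - x) + noise * rng.standard_normal())
        # Outer update: hypergradient estimate x + (y - 1) evaluated at the current y,
        # followed by projection onto [-1, 1], with the smaller step size.
        hypergrad = x + (y - 1.0) + noise * rng.standard_normal()
        x = float(np.clip(x - alpha * hypergrad, -1.0, 1.0))
    return x, y

x_final, y_final = ttsa_toy(x0=-1.0, y0=1.0)
```

Because `beta` is much larger than `alpha`, the inner variable tracks `y*(x)` quickly while the outer variable drifts slowly toward its constrained optimum.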

preprint, 2022, arXiv

Deep Spectrum Cartography: Completing Radio Map Tensors Using Learned Neural Models

The spectrum cartography (SC) technique constructs multi-domain (e.g., frequency, space, and time) radio frequency (RF) maps from limited measurements, which can be viewed as an ill-posed tensor completion problem. Model-based cartography techniques often rely on handcrafted priors (e.g., sparsity, smoothness and low-rank structures) for the completion task. Such priors may be inadequate to capture the essence of complex wireless environments, especially when severe shadowing happens. To circumvent such challenges, offline-trained deep neural models of radio maps were considered for SC, as deep neural networks (DNNs) are able to "learn" intricate underlying structures from data. However, such deep learning (DL)-based SC approaches encounter serious challenges in both offline model learning (training) and completion (generalization), possibly because the latent state space for generating the radio maps is prohibitively large. In this work, an emitter radio map disaggregation-based approach is proposed, under which only individual emitters' radio maps are modeled by DNNs. This way, the learning and generalization challenges can both be substantially alleviated. Using the learned DNNs, a fast nonnegative matrix factorization-based two-stage SC method and a performance-enhanced iterative optimization algorithm are proposed. Theoretical aspects (such as recoverability of the radio tensor, sample complexity, and noise robustness) under the proposed framework are characterized; such theoretical properties have been elusive in the context of DL-based radio tensor completion. Experiments using synthetic and real data from indoor and heavily shadowed environments are employed to showcase the effectiveness of the proposed methods.

preprint, 2022, arXiv

Distributed Adversarial Training to Robustify Deep Neural Networks at Scale

Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach, known as adversarial training (AT), has been shown to mitigate the negative impact of adversarial attacks by virtue of a min-max robust training method. While effective, it remains unclear whether it can successfully be adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general, which supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. Theoretically, we provide, under standard conditions in the optimization theory, the convergence rate of DAT to the first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet-50 under ImageNet). Codes are available at https://github.com/dat-2022/dat.

preprint, 2022, arXiv

Dynamic Differential-Privacy Preserving SGD

The vanilla Differentially-Private Stochastic Gradient Descent (DP-SGD), including DP-Adam and other variants, ensures the privacy of training data by uniformly distributing privacy costs across training steps. The equivalent privacy costs controlled by maintaining the same gradient clipping thresholds and noise powers in each step result in unstable updates and a lower model accuracy when compared to the non-DP counterpart. In this paper, we propose the dynamic DP-SGD (along with dynamic DP-Adam, and others) to reduce the performance loss gap while maintaining privacy by dynamically adjusting clipping thresholds and noise powers while adhering to a total privacy budget constraint. Extensive experiments on a variety of deep learning tasks, including image classification, natural language processing, and federated learning, demonstrate that the proposed dynamic DP-SGD algorithm stabilizes updates and, as a result, significantly improves model accuracy in the strong privacy protection region when compared to the vanilla DP-SGD. We also conduct theoretical analysis to better understand the privacy-utility trade-off with dynamic DP-SGD, as well as to learn why Dynamic DP-SGD can outperform vanilla DP-SGD.
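
The sketch below shows the mechanics the abstract refers to: per-sample gradient clipping followed by Gaussian noise scaled to the clipping threshold, with the threshold changing from step to step. The geometric decay schedule, the parameter values, and the function names are hypothetical placeholders; the paper's actual schedules and its privacy accounting are not shown, so this code makes no privacy guarantee on its own.

```python
import numpy as np

def dp_sgd_step(w, per_sample_grads, lr, clip, noise_multiplier, rng):
    """One DP-SGD style step: clip each sample's gradient, sum, add Gaussian noise."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip / (norms + 1e-12))
    # The noise standard deviation is proportional to the clipping threshold.
    noise = noise_multiplier * clip * rng.standard_normal(w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_sample_grads)
    return w - lr * noisy_mean, clipped

def clip_schedule(step, c0=1.0, decay=0.9):
    """Hypothetical dynamic schedule: geometrically decaying clipping threshold."""
    return c0 * decay ** step

rng = np.random.default_rng(0)
w = np.zeros(4)
history = []  # (largest clipped norm, threshold) for each step
for step in range(5):
    grads = 3.0 * rng.standard_normal((8, 4))  # stand-in per-sample gradients
    c_t = clip_schedule(step)
    w, clipped = dp_sgd_step(w, grads, lr=0.1, clip=c_t, noise_multiplier=1.0, rng=rng)
    history.append((np.linalg.norm(clipped, axis=1).max(), c_t))
```

A vanilla DP-SGD run corresponds to `decay=1.0`, i.e. the same threshold and noise power at every step.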

preprint, 2022, arXiv

How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective

The lack of adversarial robustness has been recognized as an important issue for state-of-the-art machine learning (ML) models, e.g., deep neural networks (DNNs). Thereby, robustifying ML models against adversarial attacks is now a major focus of research. However, nearly all existing defense methods, particularly for robust training, made the white-box assumption that the defender has access to the details of an ML model (or its surrogate alternatives if available), e.g., its architectures and parameters. Beyond existing works, in this paper we aim to address the problem of black-box defense: How to robustify a black-box model using just input queries and output feedback? Such a problem arises in practical scenarios, where the owner of the predictive model is reluctant to share model information in order to preserve privacy. To this end, we propose a general notion of defensive operation that can be applied to black-box models, and design it through the lens of denoised smoothing (DS), a first-order (FO) certified defense technique. To allow the design of merely using model queries, we further integrate DS with the zeroth-order (gradient-free) optimization. However, a direct implementation of zeroth-order (ZO) optimization suffers from high variance of gradient estimates, and thus leads to ineffective defense. To tackle this problem, we next propose to prepend an autoencoder (AE) to a given (black-box) model so that DS can be trained using variance-reduced ZO optimization. We term the eventual defense as ZO-AE-DS. In practice, we empirically show that ZO-AE-DS can achieve improved accuracy, certified robustness, and query complexity over existing baselines. The effectiveness of our approach is demonstrated on both image classification and image reconstruction tasks. Codes are available at https://github.com/damon-demon/Black-Box-Defense.
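
For readers unfamiliar with zeroth-order optimization, the following sketch estimates a gradient using only function queries, via coordinate-wise forward differences. The abstract does not state which estimator ZO-AE-DS uses, so the estimator, the smoothing parameter `mu`, and the toy black-box function are assumptions for illustration.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-5):
    """Coordinate-wise forward-difference gradient estimate from queries of f only."""
    fx = f(x)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = 1.0
        grad[i] = (f(x + mu * e) - fx) / mu  # one extra query per coordinate
    return grad

def black_box(x):
    """Stand-in for a model we can query but not differentiate."""
    return float(np.sum(x ** 2) + x[0])

x = np.array([1.0, -2.0, 0.5])
estimate = zo_gradient(black_box, x)
true_grad = 2.0 * x + np.array([1.0, 0.0, 0.0])
```

The query count grows with the input dimension, which is one reason reducing the dimension of the optimized variables, as the autoencoder in the abstract does, makes query-based training more practical.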

preprint, 2022, arXiv

Zeroth-Order SciML: Non-intrusive Integration of Scientific Software with Deep Learning

Using deep learning (DL) to accelerate and/or improve scientific workflows can yield discoveries that are otherwise impossible. Unfortunately, DL models have yielded limited success in complex scientific domains due to large data requirements. In this work, we propose to overcome this issue by integrating the abundance of scientific knowledge sources (SKS) with the DL training process. Existing knowledge integration approaches are limited to differentiable knowledge sources, which are compatible with the first-order DL training paradigm. In contrast, our proposed approach treats the knowledge source as a black box, which in turn allows virtually any knowledge source to be integrated. To enable end-to-end training of SKS-coupled-DL, we propose to use zeroth-order optimization (ZOO) based gradient-free training schemes, which are non-intrusive, i.e., they do not require making any changes to the SKS. We evaluate the performance of our ZOO training scheme on two real-world material science applications. We show that the proposed scheme is able to effectively integrate scientific knowledge with DL training and is able to outperform a purely data-driven model for data-limited scientific applications. We also discuss some limitations of the proposed method and mention potentially worthwhile future directions.

preprint, 2021, arXiv

Decentralized Riemannian Gradient Descent on the Stiefel Manifold

We consider a distributed non-convex optimization problem where a network of agents aims at minimizing a global function over the Stiefel manifold. The global function is represented as a finite sum of smooth local functions, where each local function is associated with one agent and agents communicate with each other over an undirected connected graph. The problem is non-convex as local functions are possibly non-convex (but smooth) and the Stiefel manifold is a non-convex set. We present a decentralized Riemannian stochastic gradient method (DRSGD) with the convergence rate of $\mathcal{O}(1/\sqrt{K})$ to a stationary point. To have exact convergence with constant stepsize, we also propose a decentralized Riemannian gradient tracking algorithm (DRGTA) with the convergence rate of $\mathcal{O}(1/K)$ to a stationary point. We use multi-step consensus to preserve the iteration in the local (consensus) region. DRGTA is the first decentralized algorithm with exact convergence for distributed optimization on the Stiefel manifold.
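
As background for optimization over the Stiefel manifold (matrices with orthonormal columns), the sketch below shows the two basic single-agent building blocks: projecting a Euclidean gradient onto the tangent space and retracting back onto the manifold. It omits the decentralized parts (consensus and gradient tracking) entirely, and the QR retraction, step size, and toy objective are assumptions for the example rather than choices taken from the paper.

```python
import numpy as np

def riemannian_grad(x, egrad):
    """Project a Euclidean gradient onto the tangent space at x: G - X sym(X^T G)."""
    xtg = x.T @ egrad
    return egrad - x @ (0.5 * (xtg + xtg.T))

def qr_retraction(y):
    """Map a full-column-rank matrix back to the Stiefel manifold via thin QR."""
    q, r = np.linalg.qr(y)
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs  # flip column signs so that diag(R) is nonnegative

rng = np.random.default_rng(0)
n, p = 8, 3
a = rng.standard_normal((n, n))
a = 0.5 * (a + a.T)  # symmetric matrix defining the toy objective
x = qr_retraction(rng.standard_normal((n, p)))

for _ in range(200):
    egrad = -a @ x  # Euclidean gradient of -0.5 * trace(X^T A X)
    xi = riemannian_grad(x, egrad)
    x = qr_retraction(x - 0.05 * xi)  # Riemannian gradient step
```

Every iterate keeps orthonormal columns. Plain Euclidean averaging of such iterates across agents does not preserve that property, which is why decentralized methods on this manifold need extra care in the consensus step.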

preprint, 2021, arXiv

Hybrid Federated Learning: Algorithms and Implementation

Federated learning (FL) is a recently proposed distributed machine learning paradigm dealing with distributed and private data sets. Based on the data partition pattern, FL is often categorized into horizontal, vertical, and hybrid settings. Despite the fact that many works have been developed for the first two approaches, the hybrid FL setting (which deals with partially overlapped feature space and sample space) remains less explored, though this setting is extremely important in practice. In this paper, we first set up a new model-matching-based problem formulation for hybrid FL, then propose an efficient algorithm that can collaboratively train the global and local models to deal with full and partial featured data. We conduct numerical experiments on the multi-view ModelNet40 data set to validate the performance of the proposed algorithm. To the best of our knowledge, this is the first formulation and algorithm developed for the hybrid FL.

preprint, 2021, arXiv

Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective

There has been a growing interest in developing data-driven, and in particular deep neural network (DNN) based methods for modern communication tasks. For a few popular tasks such as power control, beamforming, and MIMO detection, these methods achieve state-of-the-art performance while requiring less computational efforts, less resources for acquiring channel state information (CSI), etc. However, it is often challenging for these approaches to learn in a dynamic environment. This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment. Specifically, we consider an "episodically dynamic" setting where the environment statistics change in "episodes", and in each episode the environment is stationary. We propose to build the notion of continual learning (CL) into wireless system design, so that the learning model can incrementally adapt to the new episodes, without forgetting knowledge learned from the previous episodes. Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples. We demonstrate the effectiveness of the CL approach by integrating it with two popular DNN based models for power control and beamforming, respectively, and testing using both synthetic and ray-tracing based data sets. These numerical results show that the proposed CL approach is not only able to adapt to the new scenarios quickly and seamlessly, but importantly, it also maintains high performance over the previously encountered scenarios as well.

preprint, 2021, arXiv

On Instabilities of Conventional Multi-Coil MRI Reconstruction to Small Adverserial Perturbations

Although deep learning (DL) has received much attention in accelerated MRI, recent studies suggest small perturbations may lead to instabilities in DL-based reconstructions, leading to concern for their clinical application. However, these works focus on single-coil acquisitions, which are not practical. We investigate instabilities caused by small adversarial attacks for multi-coil acquisitions. Our results suggest that parallel imaging and multi-coil CS exhibit considerable instabilities against small adversarial perturbations.

preprint, 2021, arXiv

On the Local Linear Rate of Consensus on the Stiefel Manifold

We study the convergence properties of the Riemannian gradient method for solving the consensus problem (for an undirected connected graph) over the Stiefel manifold. The Stiefel manifold is a non-convex set and the standard notion of averaging in the Euclidean space does not work for this problem. We propose Distributed Riemannian Consensus on Stiefel Manifold (DRCS) and prove that it enjoys a local linear convergence rate to global consensus. More importantly, this local rate asymptotically scales with the second largest singular value of the communication matrix, which is on par with the well-known rate in the Euclidean space. To the best of our knowledge, this is the first work showing the equality of the two rates. The main technical challenges include (i) developing a Riemannian restricted secant inequality for convergence analysis, and (ii) identifying the conditions (e.g., suitable step-size and initialization) under which the algorithm always stays in the local region.

preprint, 2021, arXiv

Online Proximal-ADMM For Time-varying Constrained Convex Optimization

This paper considers a convex optimization problem with cost and constraints that evolve over time. The function to be minimized is strongly convex and possibly non-differentiable, and variables are coupled through linear constraints. In this setting, the paper proposes an online algorithm based on the alternating direction method of multipliers (ADMM), to track the optimal solution trajectory of the time-varying problem; in particular, the proposed algorithm consists of a primal proximal gradient descent step and an appropriately perturbed dual ascent step. The paper derives tracking results, asymptotic bounds, and linear convergence results. The proposed algorithm is then specialized to a multi-area power grid optimization problem, and our numerical results verify the desired properties.

preprint, 2021, arXiv

Stochastic Mirror Descent for Low-Rank Tensor Decomposition Under Non-Euclidean Losses

This work considers low-rank canonical polyadic decomposition (CPD) under a class of non-Euclidean loss functions that frequently arise in statistical machine learning and signal processing. These loss functions are often used for certain types of tensor data, e.g., count and binary tensors, where the least squares loss is considered unnatural. Compared to the least squares loss, the non-Euclidean losses are generally more challenging to handle. Non-Euclidean CPD has attracted considerable interest and a number of prior works exist. However, pressing computational and theoretical challenges, such as scalability and convergence issues, still remain. This work offers a unified stochastic algorithmic framework for large-scale CPD under a variety of non-Euclidean loss functions. Our key contribution lies in a tensor fiber sampling strategy-based flexible stochastic mirror descent framework. Leveraging the sampling scheme and the multilinear algebraic structure of low-rank tensors, the proposed lightweight algorithm ensures global convergence to a stationary point under reasonable conditions. Numerical results show that our framework attains promising non-Euclidean CPD performance. The proposed framework also exhibits substantial computational savings compared to state-of-the-art methods.

preprint, 2020, arXiv

A Communication Efficient Collaborative Learning Framework for Distributed Features

We introduce a collaborative learning framework allowing multiple parties having different sets of attributes about the same user to jointly build models without exposing their raw data or model parameters. In particular, we propose a Federated Stochastic Block Coordinate Descent (FedBCD) algorithm, in which each party conducts multiple local updates before each communication to effectively reduce the number of communication rounds among parties, a principal bottleneck for collaborative learning problems. We analyze theoretically the impact of the number of local updates and show that when the batch size, sample size, and the local iterations are selected appropriately, within $T$ iterations, the algorithm performs $\mathcal{O}(\sqrt{T})$ communication rounds and achieves some $\mathcal{O}(1/\sqrt{T})$ accuracy (measured by the average of the gradient norm squared). The approach is supported by our empirical evaluations on a variety of tasks and datasets, demonstrating advantages over stochastic gradient descent (SGD) approaches.

preprint, 2020, arXiv

Dense Recurrent Neural Networks for Accelerated MRI: History-Cognizant Unrolling of Optimization Algorithms

Inverse problems for accelerated MRI typically incorporate domain-specific knowledge about the forward encoding operator in a regularized reconstruction framework. Recently physics-driven deep learning (DL) methods have been proposed to use neural networks for data-driven regularization. These methods unroll iterative optimization algorithms to solve the inverse problem objective function, by alternating between domain-specific data consistency and data-driven regularization via neural networks. The whole unrolled network is then trained end-to-end to learn the parameters of the network. Due to simplicity of data consistency updates with gradient descent steps, proximal gradient descent (PGD) is a common approach to unroll physics-driven DL reconstruction methods. However, PGD methods have slow convergence rates, necessitating a higher number of unrolled iterations, leading to memory issues in training and slower reconstruction times in testing. Inspired by efficient variants of PGD methods that use a history of the previous iterates, we propose a history-cognizant unrolling of the optimization algorithm with dense connections across iterations for improved performance. In our approach, the gradient descent steps are calculated at a trainable combination of the outputs of all the previous regularization units. We also apply this idea to unrolling variable splitting methods with quadratic relaxation. Our results in reconstruction of the fastMRI knee dataset show that the proposed history-cognizant approach reduces residual aliasing artifacts compared to its conventional unrolled counterpart without requiring extra computational power or increasing reconstruction time.

preprint, 2020, arXiv

Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond

Distributed learning has become a critical enabler of the massively connected world envisioned by many. This article discusses four key elements of scalable distributed processing and real-time intelligence: problems, data, communication and computation. Our aim is to provide a fresh and unique perspective about how these elements should work together in an effective and coherent manner. In particular, we provide a selective review of the recent techniques developed for optimizing non-convex models (i.e., problem classes), processing batch and streaming data (i.e., data types), over the networks in a distributed manner (i.e., communication and computation paradigm). We describe the intuitions and connections behind a core set of popular distributed algorithms, emphasizing how to trade off between computation and communication costs. Practical issues and future research directions will also be discussed.

preprint, 2020, arXiv

Generalization Bounds for Stochastic Saddle Point Problems

This paper studies the generalization bounds for the empirical saddle point (ESP) solution to stochastic saddle point (SSP) problems. For SSP with Lipschitz continuous and strongly convex-strongly concave objective functions, we establish an $\mathcal{O}(1/n)$ generalization bound by using a uniform stability argument. We also provide generalization bounds under a variety of assumptions, including the cases without strong convexity and without bounded domains. We illustrate our results in two examples: batch policy learning in Markov decision process, and mixed strategy Nash equilibrium estimation for stochastic games. In each of these examples, we show that a regularized ESP solution enjoys a near-optimal sample complexity. To the best of our knowledge, this is the first set of results on the generalization theory of ESP.

preprint, 2020, arXiv

Imitation Privacy

In recent years, there have been many cloud-based machine learning services, where well-trained models are provided to users on a pay-per-query scheme through a prediction API. The emergence of these services motivates this work, where we will develop a general notion of model privacy named imitation privacy. We show the broad applicability of imitation privacy in classical query-response MLaaS scenarios and new multi-organizational learning scenarios. We also exemplify the fundamental difference between imitation privacy and the usual data-level privacy.

preprint, 2020, arXiv

Joint Channel Assignment and Power Allocation for Multi-UAV Communication

The unmanned aerial vehicle (UAV) swarm has emerged as a promising novel paradigm to achieve better coverage and higher capacity for future wireless networks by exploiting the more favorable line-of-sight (LoS) propagation. To reap the potential gains of UAV swarm, the remote control signal sent by ground control unit (GCU) is essential, yet in practice the control signal quality is susceptible to the adjacent channel interference (ACI) and the external interference (EI) from radiation sources distributed across the region. To tackle these challenges, this paper considers priority-aware resource coordination in a multi-UAV communication system, where multiple UAVs are controlled by a GCU to perform certain tasks with a pre-defined trajectory. Specifically, we maximize the minimum signal-to-interference-plus-noise ratio (SINR) among all the UAVs by jointly optimizing channel assignment and power allocation strategy under stringent resource availability constraints. According to the intensity of ACI, we consider the corresponding problem in two scenarios, i.e., Null-ACI and ACI systems. By virtue of the particular problem structure in Null-ACI case, we first recast the formulation into an equivalent yet more tractable form and obtain the global optimal solution via Hungarian algorithm. For general ACI systems, we develop an efficient iterative algorithm for its solution based on the smooth approximation and alternating optimization methods. Extensive simulation results demonstrate that the proposed algorithms can significantly enhance the minimum SINR among all the UAVs and adapt the allocation of communication resources to diverse mission priority.

preprint2020arXiv

On the Divergence of Decentralized Non-Convex Optimization

We study a generic class of decentralized algorithms in which $N$ agents jointly optimize the non-convex objective $f(u):=1/N\sum_{i=1}^{N}f_i(u)$, while only communicating with their neighbors. This class of problems has become popular in modeling many signal processing and machine learning applications, and many efficient algorithms have been proposed. However, by constructing some counter-examples, we show that when certain local Lipschitz conditions (LLC) on the local function gradients $\nabla f_i$ are not satisfied, most of the existing decentralized algorithms diverge, even if the global Lipschitz condition (GLC) is satisfied, where the sum function $f$ has Lipschitz gradient. This observation raises an important open question: How to design decentralized algorithms when the LLC, or even the GLC, is not satisfied? To address the above question, we design a first-order algorithm called Multi-stage gradient tracking algorithm (MAGENTA), which is capable of computing stationary solutions with neither the LLC nor the GLC. In particular, we show that the proposed algorithm converges sublinearly to a certain $ε$-stationary solution, where the precise rate depends on various algorithmic and problem parameters. For example, if the local functions $f_i$ are $Q$th order polynomials, then the rate becomes $\mathcal{O}(1/ε^{Q-1})$. Such a rate is tight for the special case of $Q=2$ where each $f_i$ satisfies LLC. To our knowledge, this is the first attempt that studies decentralized non-convex optimization problems with neither the LLC nor the GLC.

preprint2020arXiv

Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds

We study differentially private (DP) algorithms for stochastic non-convex optimization. In this problem, the goal is to minimize the population loss over a $p$-dimensional space given $n$ i.i.d. samples drawn from a distribution. We improve upon the population gradient bound of ${\sqrt{p}}/{\sqrt{n}}$ from prior work and obtain a sharper rate of $\sqrt[4]{p}/\sqrt{n}$. We obtain this rate by providing the first analyses on a collection of private gradient-based methods, including adaptive algorithms DP RMSProp and DP Adam. Our proof technique leverages the connection between differential privacy and adaptive data analysis to bound gradient estimation error at every iterate, which circumvents the worse generalization bound from the standard uniform convergence argument. Finally, we evaluate the proposed algorithms on two popular deep learning tasks and demonstrate the empirical advantages of DP adaptive gradient methods over standard DP SGD.
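As a point of reference for the standard DP SGD baseline mentioned above, the following is a minimal sketch of one clipped, noised gradient step. It is not the paper's adaptive method: the function names (`clip`, `dp_sgd_step`), the parameter `noise_multiplier`, and the noise scaling (standard deviation equal to the multiplier times the clipping bound, added to the summed gradient) are assumptions made for illustration, and no privacy accounting is performed.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_sample_grads, lr, max_norm, noise_multiplier, rng):
    """One clipped, noised gradient step (hypothetical helper; no privacy accounting)."""
    n = len(per_sample_grads)
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    # Assumed calibration: Gaussian noise with std = noise_multiplier * max_norm
    # is added to each coordinate of the summed clipped gradient before averaging.
    sigma = noise_multiplier * max_norm
    noisy_mean = [
        (sum(g[j] for g in clipped) + rng.gauss(0.0, sigma)) / n
        for j in range(len(params))
    ]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # Euclidean norms 5.0 and 0.5
# With noise_multiplier=0.0 the step reduces to plain clipped gradient descent.
new_params = dp_sgd_step([0.0, 0.0], grads, lr=1.0, max_norm=1.0,
                         noise_multiplier=0.0, rng=rng)
```

Clipping bounds each sample's influence on the update, which is what allows the added noise to be calibrated to a fixed sensitivity.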

preprint2019arXiv

Distributed Non-Convex First-Order Optimization and Information Processing: Lower Complexity Bounds and Rate Optimal Algorithms

We consider a class of popular distributed non-convex optimization problems, in which agents connected by a network $\mathcal{G}$ collectively optimize a sum of smooth (possibly non-convex) local objective functions. We address the following question: if the agents can only access the gradients of local functions, what are the fastest rates that any distributed algorithm can achieve, and how can those rates be achieved? First, we show that there exist difficult problem instances, such that it takes a class of distributed first-order methods at least $\mathcal{O}(1/\sqrt{ξ(\mathcal{G})} \times \bar{L} /ε)$ communication rounds to achieve a certain $ε$-solution [where $ξ(\mathcal{G})$ denotes the spectral gap of the graph Laplacian matrix, and $\bar{L}$ is some Lipschitz constant]. Second, we propose (near) optimal methods whose rates match the developed lower rate bound (up to a polylog factor). The key in the algorithm design is to properly embed the classical polynomial filtering techniques into modern first-order algorithms. To the best of our knowledge, this is the first time that lower rate bounds and optimal methods have been developed for distributed non-convex optimization problems.

preprint2016arXiv

Asynchronous Distributed ADMM for Large-Scale Optimization- Part I: Algorithm and Convergence Analysis

Aiming at solving large-scale learning problems, this paper studies distributed optimization methods based on the alternating direction method of multipliers (ADMM). By formulating the learning problem as a consensus problem, the ADMM can be used to solve the consensus problem in a fully parallel fashion over a computer network with a star topology. However, traditional synchronized computation does not scale well with the problem size, as the speed of the algorithm is limited by the slowest workers. This is particularly true in a heterogeneous network where the computing nodes experience different computation and communication delays. In this paper, we propose an asynchronous distributed ADMM (AD-ADMM) which can effectively improve the time efficiency of distributed optimization. Our main interest lies in analyzing the convergence conditions of the AD-ADMM, under the popular partially asynchronous model, which is defined based on a maximum tolerable delay of the network. Specifically, by considering general and possibly non-convex cost functions, we show that the AD-ADMM is guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points as long as the algorithm parameters are chosen appropriately according to the network delay. We further illustrate that the asynchrony of the ADMM has to be handled with care, as slightly modifying the implementation of the AD-ADMM can jeopardize the algorithm convergence, even under a standard convex setting.
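For orientation, the synchronous consensus ADMM over a star topology that this abstract takes as its starting point can be sketched as below. This is the synchronized baseline, not the asynchronous AD-ADMM itself. The toy local costs $f_i(x) = \frac{1}{2}(x - a_i)^2$, the function name `consensus_admm`, and the parameter values are assumptions chosen so that every update has a closed form.

```python
def consensus_admm(targets, rho=1.0, iters=100):
    """Synchronous consensus ADMM for the toy costs f_i(x) = 0.5 * (x - a_i)^2."""
    n = len(targets)
    z = 0.0            # consensus variable held by the master node
    duals = [0.0] * n  # one multiplier per worker
    for _ in range(iters):
        # Worker step: closed-form minimizer of
        # f_i(x) + lam_i * (x - z) + (rho / 2) * (x - z)^2.
        xs = [(a - lam + rho * z) / (1.0 + rho) for a, lam in zip(targets, duals)]
        # Master step: average the workers' dual-shifted iterates.
        z = sum(x + lam / rho for x, lam in zip(xs, duals)) / n
        # Dual ascent on the consensus constraint x_i = z.
        duals = [lam + rho * (x - z) for lam, x in zip(duals, xs)]
    return z


# The minimizer of the summed costs is the mean of the targets.
z_star = consensus_admm([1.0, 2.0, 6.0])
```

In this synchronous form the master cannot update `z` until every worker has reported, which is the bottleneck the abstract attributes to the slowest workers.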

preprint2016arXiv

Decomposing Linearly Constrained Nonconvex Problems by a Proximal Primal Dual Approach: Algorithms, Convergence, and Applications

In this paper, we propose a new decomposition approach named the proximal primal dual algorithm (Prox-PDA) for smooth nonconvex linearly constrained optimization problems. The proposed approach is primal-dual based, where the primal step minimizes a certain approximation of the augmented Lagrangian of the problem, and the dual step performs an approximate dual ascent. The approximation used in the primal step is able to decompose the variable blocks, making it possible to obtain simple subproblems by leveraging the problem structures. Theoretically, we show that whenever the penalty parameter in the augmented Lagrangian is larger than a given threshold, the Prox-PDA converges to the set of stationary solutions, globally and in a sublinear manner (i.e., a certain measure of stationarity decreases at the rate of $\mathcal{O}(1/r)$, where $r$ is the iteration counter). Interestingly, when applying a variant of the Prox-PDA to the problem of distributed nonconvex optimization (over a connected undirected graph), the resulting algorithm coincides with the popular EXTRA algorithm [Shi et al. 2014], which is only known to work in convex cases. Our analysis implies that EXTRA and its variants converge globally sublinearly to stationary solutions of certain nonconvex distributed optimization problems. There are many possible extensions of the Prox-PDA, and we present one particular extension to a certain nonconvex distributed matrix factorization problem.

preprint2016arXiv

Joint Source-Relay Design for Full-Duplex MIMO AF Relay Systems

The performance of full-duplex (FD) relay systems can be greatly impacted by the self-interference (SI) at relays. By exploiting multiple antennas in FD relay systems, the spectral efficiency of FD relay systems can be enhanced through spatial SI mitigation. This paper studies joint source transmit beamforming and relay processing to achieve rate maximization for FD MIMO amplify-and-forward (AF) relay systems with consideration of relay processing delay. The problem is difficult to solve due mainly to the SI constraint induced by the relay processing delay. In this paper, we first present a sufficient condition under which the relay amplification matrix has a rank-one structure. Then, for the case of a rank-one amplification matrix, the rate maximization problem is equivalently simplified into an unconstrained problem which can be locally solved using the gradient ascent method. Next, we propose a penalty-based algorithmic framework, called P-BSUM, for a class of constrained optimization problems which have difficult equality constraints in addition to some convex constraints. By rewriting the rate maximization problem with a set of auxiliary variables, we apply the P-BSUM algorithm to the rate maximization problem in the general case. Finally, numerical results validate the efficiency of the proposed algorithms and show that the joint source-relay design approach under the rank-one assumption could be strictly suboptimal as compared to the P-BSUM-based joint source-relay design approach.

preprint2016arXiv

Sample Approximation-Based Deflation Approaches for Chance SINR Constrained Joint Power and Admission Control

Consider the joint power and admission control (JPAC) problem for a multi-user single-input single-output (SISO) interference channel. Most existing works on JPAC assume perfect instantaneous channel state information (CSI). In this paper, we consider the JPAC problem with imperfect CSI, that is, we assume that only the channel distribution information (CDI) is available. We formulate the JPAC problem as a chance (probabilistic) constrained program, where each link's SINR outage probability is enforced to be less than or equal to a specified tolerance. To circumvent the computational difficulty of the chance SINR constraints, we propose to use the sample (scenario) approximation scheme to convert them into finitely many simple linear constraints. Furthermore, we reformulate the sample approximation of the chance SINR constrained JPAC problem as a composite group sparse minimization problem and then approximate it by a second-order cone program (SOCP). The solution of the SOCP approximation can be used to check the simultaneous supportability of all links in the network and to guide an iterative link removal procedure (the deflation approach). We exploit the special structure of the SOCP approximation and custom-design an efficient algorithm for solving it. Finally, we illustrate the effectiveness and efficiency of the proposed sample approximation-based deflation approaches by simulations.

preprint2016arXiv

Stochastic Proximal Gradient Consensus Over Random Networks

We consider solving a convex, possibly stochastic optimization problem over a randomly time-varying multi-agent network. Each agent has access to some local objective function, and it only has unbiased estimates of the gradients of the smooth component. We develop a dynamic stochastic proximal-gradient consensus (DySPGC) algorithm, with the following key features: i) it works for both static and certain randomly time-varying networks, ii) it allows the agents to utilize either the exact or stochastic gradient information, iii) it is convergent with a provable rate. In particular, we show that the proposed algorithm converges to a global optimal solution, with a rate of $\mathcal{O}(1/r)$ [resp. $\mathcal{O}(1/\sqrt{r})$] when the exact (resp. stochastic) gradient is available, where $r$ is the iteration counter. Interestingly, the developed algorithm bridges a number of (seemingly unrelated) distributed optimization algorithms, such as the EXTRA (Shi et al. 2014), the PG-EXTRA (Shi et al. 2015), the IC/IDC-ADMM (Chang et al. 2014), the DLM (Ling et al. 2015), and the classical distributed subgradient method. Identifying such relationships allows for significant generalization of these methods. We also discuss one such generalization which accelerates the DySPGC (hence accelerating EXTRA, PG-EXTRA, IC-ADMM).
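To make one of the listed methods concrete, here is a minimal sketch of the EXTRA recursion with exact gradients on a static network. It is not the DySPGC algorithm. The recursion is written as it is commonly stated for EXTRA, which is an assumption here rather than something taken from this abstract, as are the toy costs $f_i(x) = \frac{1}{2}(x - a_i)^2$, the mixing matrix `W`, the step size, and the function name `extra`.

```python
def extra(targets, W, alpha=0.5, iters=100):
    """EXTRA sketch for the toy costs f_i(x) = 0.5 * (x - a_i)^2, one scalar per agent."""
    n = len(targets)

    def mix(M, x):
        # Each agent forms a weighted combination of its neighbors' values.
        return [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]

    def grad(x):
        # Local gradients, evaluated at each agent's own copy.
        return [xi - a for xi, a in zip(x, targets)]

    # W_tilde = (I + W) / 2
    W_tilde = [[(W[i][j] + (1.0 if i == j else 0.0)) / 2.0 for j in range(n)]
               for i in range(n)]
    x_prev = [0.0] * n
    g_prev = grad(x_prev)
    # The first step is a plain decentralized gradient step.
    x_curr = [wx - alpha * g for wx, g in zip(mix(W, x_prev), g_prev)]
    for _ in range(iters):
        g_curr = grad(x_curr)
        wx = mix(W, x_curr)
        wtx = mix(W_tilde, x_prev)
        # x_next = (I + W) x_curr - W_tilde x_prev - alpha * (g_curr - g_prev)
        x_next = [x_curr[i] + wx[i] - wtx[i] - alpha * (g_curr[i] - g_prev[i])
                  for i in range(n)]
        x_prev, g_prev, x_curr = x_curr, g_curr, x_next
    return x_curr


# Symmetric, doubly stochastic mixing matrix for three fully connected agents.
W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
x_final = extra([1.0, 2.0, 6.0], W)
```

On this toy problem every agent's copy approaches the mean of the targets, which is the minimizer of the summed costs.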

preprint2015arXiv

A Unified Algorithmic Framework for Block-Structured Optimization Involving Big Data

This article presents a powerful algorithmic framework for big data optimization, called the Block Successive Upper bound Minimization (BSUM). The BSUM includes as special cases many well-known methods for analyzing massive data sets, such as the Block Coordinate Descent (BCD), the Convex-Concave Procedure (CCCP), the Block Coordinate Proximal Gradient (BCPG) method, the Nonnegative Matrix Factorization (NMF), the Expectation Maximization (EM) method and so on. In this article, various features and properties of the BSUM are discussed from the viewpoint of design flexibility, computational efficiency, parallel/distributed implementation and the required communication overhead. Illustrative examples from networking, signal processing and machine learning are presented to demonstrate the practical performance of the BSUM framework.

preprint2015arXiv

Alternating direction method of multipliers for penalized zero-variance discriminant analysis

We consider the task of classification in the high dimensional setting where the number of features of the given data is significantly greater than the number of observations. To accomplish this task, we propose a heuristic, called sparse zero-variance discriminant analysis (SZVD), for simultaneously performing linear discriminant analysis and feature selection on high dimensional data. This method combines classical zero-variance discriminant analysis, where discriminant vectors are identified in the null space of the sample within-class covariance matrix, with penalization applied to induce sparse structures in the resulting vectors. To approximately solve the resulting nonconvex problem, we develop a simple algorithm based on the alternating direction method of multipliers. Further, we show that this algorithm is applicable to a larger class of penalized generalized eigenvalue problems, including a particular relaxation of the sparse principal component analysis problem. Finally, we establish theoretical guarantees for convergence of our algorithm to stationary points of the original nonconvex problem, and empirically demonstrate the effectiveness of our heuristic for classifying simulated data and data drawn from applications in time-series classification.

preprint2015arXiv

Asynchronous Distributed ADMM for Large-Scale Optimization- Part II: Linear Convergence Analysis and Numerical Performance

The alternating direction method of multipliers (ADMM) has been recognized as a versatile approach for solving modern large-scale machine learning and signal processing problems efficiently. When the data size and/or the problem dimension is large, a distributed version of ADMM can be used, which is capable of distributing the computation load and the data set to a network of computing nodes. Unfortunately, a direct synchronous implementation of such an algorithm does not scale well with the problem size, as the algorithm speed is limited by the slowest computing nodes. To address this issue, in a companion paper, we have proposed an asynchronous distributed ADMM (AD-ADMM) and studied its worst-case convergence conditions. In this paper, we further the study by characterizing the conditions under which the AD-ADMM achieves linear convergence. Our conditions as well as the resulting linear rates reveal the impact that various algorithm parameters, network delay and network size have on the algorithm performance. To demonstrate the superior time efficiency of the proposed AD-ADMM, we test the AD-ADMM on a high-performance computer cluster by solving a large-scale logistic regression problem.

preprint2015arXiv

Convergence Analysis of Alternating Direction Method of Multipliers for a Family of Nonconvex Problems

The alternating direction method of multipliers (ADMM) is widely used to solve large-scale linearly constrained optimization problems, convex or nonconvex, in many engineering fields. However there is a general lack of theoretical understanding of the algorithm when the objective function is nonconvex. In this paper we analyze the convergence of the ADMM for solving certain nonconvex consensus and sharing problems, and show that the classical ADMM converges to the set of stationary solutions, provided that the penalty parameter in the augmented Lagrangian is chosen to be sufficiently large. For the sharing problems, we show that the ADMM is convergent regardless of the number of variable blocks. Our analysis does not impose any assumptions on the iterates generated by the algorithm, and is broadly applicable to many ADMM variants involving proximal update rules and various flexible block selection rules.

preprint2015arXiv

Decomposition by Successive Convex Approximation: A Unifying Approach for Linear Transceiver Design in Heterogeneous Networks

We study the downlink linear precoder design problem in a multi-cell dense heterogeneous network (HetNet). The problem is formulated as a general sum-utility maximization (SUM) problem, which includes as special cases many practical precoder design problems such as multi-cell coordinated linear precoding, full and partial per-cell coordinated multi-point transmission, zero-forcing precoding and joint BS clustering and beamforming/precoding. The SUM problem is difficult due to its non-convexity and the tight coupling of the users' precoders. In this paper we propose a novel convex approximation technique to approximate the original problem by a series of convex subproblems, each of which decomposes across all the cells. The convexity of the subproblems allows for efficient computation, while their decomposability leads to distributed implementation. Our approach hinges upon the identification of certain key convexity properties of the sum-utility objective, which allows us to transform the problem into a form that can be solved using a popular algorithmic framework called BSUM (Block Successive Upper-Bound Minimization). Simulation experiments show that the proposed framework is effective for solving interference management problems in large HetNet.

preprint2015arXiv

Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems

The iteration complexity of the block-coordinate descent (BCD) type algorithm has been under extensive investigation. It was recently shown that for convex problems the classical cyclic BCGD (block coordinate gradient descent) achieves an $\mathcal{O}(1/r)$ complexity ($r$ is the number of passes of all blocks). However, such bounds depend at least linearly on $K$ (the number of variable blocks), and are at least $K$ times worse than those of the gradient descent (GD) and proximal gradient (PG) methods. In this paper, we aim to close this theoretical performance gap between cyclic BCD and GD/PG. First we show that for a family of quadratic nonsmooth problems, the complexity bounds for cyclic Block Coordinate Proximal Gradient (BCPG), a popular variant of BCD, can match those of the GD/PG in terms of dependency on $K$ (up to a $\log^2(K)$ factor). For the same family of problems, we also improve the bounds of the classical BCD (with exact block minimization) by an order of $K$. Second, we establish an improved complexity bound of Coordinate Gradient Descent (CGD) for general convex problems which can match that of GD in certain scenarios. Our bounds are sharper than the known bounds, which are always at least $K$ times worse than those of GD. Our analyses do not depend on the update order of block variables inside each cycle, thus our results also apply to BCD methods with random permutation (random sampling without replacement, another popular variant).

preprint2015arXiv

Iteration Complexity Analysis of Block Coordinate Descent Methods

In this paper, we provide a unified iteration complexity analysis for a family of general block coordinate descent (BCD) methods, covering popular methods such as the block coordinate gradient descent (BCGD) and the block coordinate proximal gradient (BCPG), under various coordinate update rules. We unify these algorithms under the so-called Block Successive Upper-bound Minimization (BSUM) framework, and show that for a broad class of multi-block nonsmooth convex problems, all algorithms covered by the BSUM framework achieve a global sublinear iteration complexity of $O(1/r)$, where $r$ is the iteration index. Moreover, for the case of block coordinate minimization (BCM) where each block is minimized exactly, we establish the sublinear convergence rate of $O(1/r)$ without a per-block strong convexity assumption. Further, we show that when there are only two blocks of variables, a special BSUM algorithm with the Gauss-Seidel rule can be accelerated to achieve an improved rate of $O(1/r^2)$.
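As a small illustration of the cyclic (Gauss-Seidel) update rule, the sketch below runs coordinate gradient descent on a convex quadratic $f(x) = \frac{1}{2}x^\top A x - b^\top x$, with each coordinate treated as one block. The problem instance, the function name `cyclic_bcgd`, and the step size $1/A_{ii}$ are assumptions for this example. For a quadratic, that step minimizes a coordinate-wise quadratic upper bound that happens to be exact, so the update is also an exact block minimization.

```python
def cyclic_bcgd(A, b, passes=50):
    """Cyclic coordinate gradient descent on f(x) = 0.5 * x^T A x - b^T x."""
    n = len(b)
    x = [0.0] * n
    for _ in range(passes):
        # Gauss-Seidel sweep: each coordinate update uses the latest iterate.
        for i in range(n):
            partial = sum(A[i][j] * x[j] for j in range(n)) - b[i]
            # Step 1 / A[i][i] minimizes the quadratic upper bound along coordinate i.
            x[i] -= partial / A[i][i]
    return x


# Symmetric positive definite example; the minimizer solves A x = b.
x_hat = cyclic_bcgd([[2.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Here one pass over all blocks corresponds to one unit of the iteration index $r$ in the rates quoted above.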

preprint2015arXiv

Quantized Consensus ADMM for Multi-Agent Distributed Optimization

Multi-agent distributed optimization over a network minimizes a global objective formed by a sum of local convex functions using only local computation and communication. We develop and analyze a quantized distributed algorithm based on the alternating direction method of multipliers (ADMM) when inter-agent communications are subject to finite capacity and other practical constraints. While existing quantized ADMM approaches only work for quadratic local objectives, the proposed algorithm can deal with more general objective functions (possibly non-smooth) including the LASSO. Under certain convexity assumptions, our algorithm converges to a consensus within $\log_{1+η}Ω$ iterations, where $η>0$ depends on the local objectives and the network topology, and $Ω$ is a polynomial determined by the quantization resolution, the distance between initial and optimal variable values, the local objective functions and the network topology. A tight upper bound on the consensus error is also obtained which does not depend on the size of the network.

preprint2015arXiv

SINR Constrained Beamforming for a MIMO Multi-user Downlink System

Consider a multi-input multi-output (MIMO) downlink multi-user channel. A well-studied problem in such a system is the design of linear beamformers for power minimization with quality of service (QoS) constraints. The most representative algorithms for solving this class of problems are the so-called MMSE-SOCP algorithm [11-12] and the UDD algorithm [9]. The former is based on alternating optimization of the transmit and receive beamformers, while the latter is based on the well-known uplink-downlink duality theory. Despite their wide applicability, the convergence (to KKT solutions) of both algorithms is still open in the literature. In this paper, we rigorously establish the convergence of these algorithms for the QoS-constrained power minimization (QCPM) problem with both single stream and multiple streams per user cases. Key to our analysis is the development and analysis of a new MMSE-DUAL algorithm, which connects the MMSE-SOCP and the UDD algorithm. Our numerical experiments show that 1) all these algorithms can almost always reach points with the same objective value irrespective of initialization, 2) the MMSE-SOCP/MMSE-DUAL algorithm works well while the UDD algorithm may fail with an infeasible initialization.

preprint2014arXiv

A Block Successive Upper Bound Minimization Method of Multipliers for Linearly Constrained Convex Optimization

Consider the problem of minimizing the sum of a smooth convex function and a separable nonsmooth convex function subject to linear coupling constraints. Problems of this form arise in many contemporary applications including signal processing, wireless networking and smart grid provisioning. Motivated by the huge size of these applications, we propose a new class of first order primal-dual algorithms called the block successive upper-bound minimization method of multipliers (BSUM-M) to solve this family of problems. The BSUM-M updates the primal variable blocks successively by minimizing locally tight upper-bounds of the augmented Lagrangian of the original problem, followed by a gradient type update for the dual variable in closed form. We show that under certain regularity conditions, and when the primal block variables are updated in either a deterministic or a random fashion, the BSUM-M converges to the set of optimal solutions. Moreover, in the absence of linear constraints, we show that the BSUM-M, which reduces to the block successive upper-bound minimization (BSUM) method, is capable of linear convergence without strong convexity.

preprint2014arXiv

A Distributed, Asynchronous and Incremental Algorithm for Nonconvex Optimization: An ADMM Based Approach

The alternating direction method of multipliers (ADMM) has been popular for solving many signal processing problems, convex or nonconvex. In this paper, we study an asynchronous implementation of the ADMM for solving a nonconvex nonsmooth optimization problem, whose objective is the sum of a number of component functions. The proposed algorithm allows the problem to be solved in a distributed, asynchronous and incremental manner. First, the component functions can be distributed to different computing nodes, which perform the updates asynchronously without coordinating with each other. Two sources of asynchrony are covered by our algorithm: one is caused by the heterogeneity of the computational nodes, and the other arises from unreliable communication links. Second, the algorithm can be viewed as implementing an incremental algorithm where at each step the (possibly delayed) gradients of only a subset of component functions are updated. We show that when certain bounds are put on the level of asynchrony, the proposed algorithm converges to the set of stationary solutions (resp. optimal solutions) for the nonconvex (resp. convex) problem. To the best of our knowledge, the proposed ADMM implementation can tolerate the highest degree of asynchrony among all known asynchronous variants of the ADMM. Moreover, it is the first ADMM implementation that can deal with nonconvexity and asynchrony at the same time.

preprint2014arXiv

Joint Downlink Base Station Association and Power Control for Max-Min Fairness: Computation and Complexity

In a heterogeneous network (HetNet) with a large number of low power base stations (BSs), proper user-BS association and power control is crucial to achieving desirable system performance. In this paper, we systematically study the joint BS association and power allocation problem for a downlink cellular network under the max-min fairness criterion. First, we show that this problem is NP-hard. Second, we show that the upper bound of the optimal value can be easily computed, and propose a two-stage algorithm to find a high-quality suboptimal solution. Simulation results show that the proposed algorithm is near-optimal in the high-SNR regime. Third, we show that the problem under some additional mild assumptions can be solved to global optima in polynomial time by a semi-distributed algorithm. This result is based on a transformation of the original problem to an assignment problem with gains $\log(g_{ij})$, where $\{g_{ij}\}$ are the channel gains.

preprint2014arXiv

Multi-Agent Distributed Optimization via Inexact Consensus ADMM

Multi-agent distributed consensus optimization problems arise in many signal processing applications. Recently, the alternating direction method of multipliers (ADMM) has been used for solving this family of problems. ADMM-based distributed optimization methods have been shown to converge faster than classic methods based on consensus subgradient, but they can be computationally expensive, especially for problems with complicated structures or large dimensions. In this paper, we propose low-complexity algorithms that can reduce the overall computational cost of consensus ADMM by an order of magnitude for certain large-scale problems. Central to the proposed algorithms is the use of an inexact step for each ADMM update, which enables the agents to perform cheap computation at each iteration. Our convergence analyses show that the proposed methods converge well under some convexity assumptions. Numerical results show that the proposed algorithms offer considerably lower computational complexity than the standard ADMM-based distributed optimization methods.

preprint2014arXiv

Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization

Consider the problem of minimizing the sum of a smooth (possibly non-convex) and a convex (possibly nonsmooth) function involving a large number of variables. A popular approach to solve this problem is the block coordinate descent (BCD) method whereby at each iteration only one variable block is updated while the remaining variables are held fixed. With the recent advances in multi-core parallel processing technology, it is desirable to parallelize the BCD method by allowing multiple blocks to be updated simultaneously at each iteration of the algorithm. In this work, we propose an inexact parallel BCD approach where at each iteration, a subset of the variables is updated in parallel by minimizing convex approximations of the original objective function. We investigate the convergence of this parallel BCD method for both randomized and cyclic variable selection rules. We analyze the asymptotic and non-asymptotic convergence behavior of the algorithm for both convex and non-convex objective functions. The numerical experiments suggest that for a special case of the Lasso minimization problem, the cyclic block selection rule can outperform the randomized rule.

preprint2014arXiv

Semidefinite approximation for mixed binary quadratically constrained quadratic programs

Motivated by applications in wireless communications, this paper develops semidefinite programming (SDP) relaxation techniques for some mixed binary quadratically constrained quadratic programs (MBQCQP) and analyzes their approximation performance. We consider both a minimization and a maximization model of this problem. For the minimization model, the objective is to find a minimum norm vector in $N$-dimensional real or complex Euclidean space, such that $M$ concave quadratic constraints and a cardinality constraint are satisfied with both binary and continuous variables. By employing a special randomized rounding procedure, we show that the ratio between the norm of the optimal solution of the minimization model and its SDP relaxation is upper bounded by $\mathcal{O}(Q^2(M-Q+1)+M^2)$ in the real case and by $\mathcal{O}(M(M-Q+1))$ in the complex case. For the maximization model, the goal is to find a maximum norm vector subject to a set of quadratic constraints and a cardinality constraint with both binary and continuous variables. We show that in this case the approximation ratio is bounded from below by $\mathcal{O}(ε/\ln(M))$ for both the real and the complex cases. Moreover, this ratio is tight up to a constant factor.

preprint2014arXiv

Semidefinite Relaxation for Two Mixed Binary Quadratically Constrained Quadratic Programs: Algorithms and Approximation Bounds

This paper develops new semidefinite programming (SDP) relaxation techniques for two classes of mixed binary quadratically constrained quadratic programs (MBQCQP) and analyzes their approximation performance. The first class of problem finds two minimum norm vectors in $N$-dimensional real or complex Euclidean space, such that $M$ out of $2M$ concave quadratic functions are satisfied. By employing a special randomized rounding procedure, we show that the ratio between the norm of the optimal solution of this model and its SDP relaxation is upper bounded by $\frac{54M^2}{π}$ in the real case and by $\frac{24M}{\sqrt{π}}$ in the complex case. The second class of problem finds a series of minimum norm vectors subject to a set of quadratic constraints and a cardinality constraint with both binary and continuous variables. We show that in this case the approximation ratio is also bounded and independent of problem dimension for both the real and the complex cases.

preprint2013arXiv

Base Station Activation and Linear Transceiver Design for Optimal Resource Management in Heterogeneous Networks

In a densely deployed heterogeneous network (HetNet), the number of pico/micro base stations (BS) can be comparable with the number of the users. To reduce the operational overhead of the HetNet, proper identification of the set of serving BSs becomes an important design issue. In this work, we show that by jointly optimizing the transceivers and determining the active set of BSs, high system resource utilization can be achieved with only a small number of BSs. In particular, we provide formulations and efficient algorithms for such joint optimization problem, under the following two common design criteria: i) minimization of the total power consumption at the BSs, and ii) maximization of the system spectrum efficiency. In both cases, we introduce a nonsmooth regularizer to facilitate the activation of the most appropriate BSs. We illustrate the efficiency and the efficacy of the proposed algorithms via extensive numerical simulations.

preprint2013arXiv

Joint User Grouping and Linear Virtual Beamforming: Complexity, Algorithms and Approximation Bounds

In a wireless system with a large number of distributed nodes, the quality of communication can be greatly improved by pooling the nodes to perform joint transmission/reception. In this paper, we consider the problem of optimally selecting a subset of nodes from potentially a large number of candidates to form a virtual multi-antenna system, while at the same time designing their joint linear transmission strategies. We focus on two specific application scenarios: 1) multiple single antenna transmitters cooperatively transmit to a receiver; 2) a single transmitter transmits to a receiver with the help of a number of cooperative relays. We formulate the joint node selection and beamforming problems as cardinality constrained optimization problems with both discrete variables (used for selecting cooperative nodes) and continuous variables (used for designing beamformers). For each application scenario, we first characterize the computational complexity of the joint optimization problem, and then propose novel semi-definite relaxation (SDR) techniques to obtain approximate solutions. We show that the new SDR algorithms have a guaranteed approximation performance in terms of the gap to global optimality, regardless of channel realizations. The effectiveness of the proposed algorithms is demonstrated via numerical experiments.

preprint2013arXiv

Min Flow Rate Maximization for Software Defined Radio Access Networks

We consider a heterogeneous network (HetNet) of base stations (BSs) connected via a backhaul network of routers and wired/wireless links with limited capacity. The optimal provision of such networks requires proper resource allocation across the radio access links in conjunction with appropriate traffic engineering within the backhaul network. In this paper we propose an efficient algorithm for joint resource allocation across the wireless links and the flow control within the backhaul network. The proposed algorithm, which maximizes the minimum rate among all the users and/or flows, is based on a decomposition approach that leverages both the Alternating Direction Method of Multipliers (ADMM) and the weighted-MMSE (WMMSE) algorithm. We show that this algorithm is easily parallelizable and converges globally to a stationary solution of the joint optimization problem. The proposed algorithm can also be extended to deal with per-flow quality of service constraint, or to networks with multi-antenna nodes.

preprint2013arXiv

On the Linear Convergence of the Alternating Direction Method of Multipliers

We analyze the convergence rate of the alternating direction method of multipliers (ADMM) for minimizing the sum of two or more nonsmooth convex separable functions subject to linear constraints. Previous analysis of the ADMM typically assumes that the objective function is the sum of only two convex functions defined on two separable blocks of variables even though the algorithm works well in numerical experiments for three or more blocks. Moreover, there has been no rate of convergence analysis for the ADMM without strong convexity in the objective function. In this paper we establish the global linear convergence of the ADMM for minimizing the sum of any number of convex separable functions. This result settles a key question regarding the convergence of the ADMM when the number of blocks is more than two or if the strong convexity is absent. It also implies the linear convergence of the ADMM for several contemporary applications including LASSO, Group LASSO and Sparse Group LASSO without any strong convexity assumption. Our proof is based on estimating the distance from a dual feasible solution to the optimal dual solution set by the norm of a certain proximal residual, and by requiring the dual stepsize to be sufficiently small.
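To make the ADMM iteration referenced in this abstract concrete, here is a minimal two-block sketch for a LASSO-type problem. It is not the paper's algorithm or analysis: the diagonal design `a`, the function names, and the fixed penalty `rho` are illustrative assumptions chosen so that every update has a closed form in plain Python.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_lasso_diag(a, b, lam, rho=1.0, iters=200):
    """Two-block ADMM for
        min 0.5 * sum((a_i * x_i - b_i)^2) + lam * sum(|z_i|)   s.t.  x = z.

    The diagonal design a_i is an illustrative assumption (not from the paper)
    that keeps the x-update in closed form.
    """
    n = len(b)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        # x-update: minimize the smooth block plus the quadratic penalty
        x = [(a[i] * b[i] + rho * (z[i] - u[i])) / (a[i] ** 2 + rho) for i in range(n)]
        # z-update: proximal (soft-thresholding) step on the l1 block
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # dual update on the constraint residual x - z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z
```

For this separable instance the minimizer is known in closed form, `soft_threshold(a_i * b_i, lam) / a_i**2`, which gives a simple way to check the iterates.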

preprint2013arXiv

Outage Constrained Robust Secure Transmission for MISO Wiretap Channels

In this paper we consider the robust secure beamformer design for MISO wiretap channels. Assuming that the eavesdroppers' channels are only partially available at the transmitter, we seek to maximize the secrecy rate under the transmit power and secrecy rate outage probability constraints. The outage probability constraint requires that the secrecy rate exceeds a certain threshold with high probability. Therefore, including such a constraint in the design naturally ensures the desired robustness. Unfortunately, the presence of the probabilistic constraints makes the problem non-convex and hence difficult to solve. In this paper, we investigate the outage probability constrained secrecy rate maximization problem using a novel two-step approach. Under a wide range of uncertainty models, our developed algorithms can obtain high-quality solutions, sometimes even exact global solutions, for the robust secure beamformer design problem. Simulation results are presented to verify the effectiveness and robustness of the proposed algorithms.

preprint2013arXiv

Solving Multiple-Block Separable Convex Minimization Problems Using Two-Block Alternating Direction Method of Multipliers

In this paper, we consider solving multiple-block separable convex minimization problems using the alternating direction method of multipliers (ADMM). Motivated by the fact that the existing convergence theory for ADMM is mostly limited to the two-block case, we analyze in this paper, both theoretically and numerically, a new strategy that first transforms a multi-block problem into an equivalent two-block problem (either in the primal domain or in the dual domain) and then solves it using the standard two-block ADMM. In particular, we derive convergence results for this two-block ADMM approach to solve multi-block separable convex minimization problems, including an improved O(1/ε) iteration complexity result. Moreover, we compare the numerical efficiency of this approach with the standard multi-block ADMM on several separable convex minimization problems which include basis pursuit, robust principal component analysis and latent variable Gaussian graphical model selection. The numerical results show that the multiple-block ADMM, although it lacks theoretical convergence guarantees, typically outperforms two-block ADMMs.

preprint2012arXiv

A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization

The block coordinate descent (BCD) method is widely used for minimizing a continuous function f of several block variables. At each iteration of this method, a single block of variables is optimized, while the remaining variables are held fixed. To ensure the convergence of the BCD method, the subproblem to be optimized in each iteration needs to be solved exactly to its unique optimal solution. Unfortunately, these requirements are often too restrictive for many practical scenarios. In this paper, we study an alternative inexact BCD approach which updates the variable blocks by successively minimizing a sequence of approximations of f which are either locally tight upper bounds of f or strictly convex local approximations of f. We focus on characterizing the convergence properties for a fairly wide class of such methods, especially for the cases where the objective functions are either non-differentiable or nonconvex. Our results unify and extend the existing convergence results for many classical algorithms such as the BCD method, the difference of convex functions (DC) method, the expectation maximization (EM) algorithm, as well as the alternating proximal minimization algorithm.
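The difference between exact BCD and the upper-bound variant described in this abstract can be sketched on a least-squares objective. This is only an illustration under stated assumptions: scalar blocks, a smooth quadratic objective, and a hypothetical `bound_scale` parameter that inflates the per-coordinate curvature. The function name is not from the paper.

```python
def bsum_least_squares(A, b, bound_scale=2.0, sweeps=500):
    """Cyclic block updates for f(x) = 0.5 * ||A x - b||^2 with scalar blocks.

    Each block minimizes a quadratic upper bound of f in that coordinate:
        f(x) + g_i * (y - x_i) + 0.5 * L_i * (y - x_i)^2,
    with L_i = bound_scale * ||A[:, i]||^2 (assumes nonzero columns).
    bound_scale = 1 recovers exact BCD; bound_scale > 1 gives an inexact
    update that is still a locally tight upper bound at the current point.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = [-b[j] for j in range(m)]  # residual A x - b at x = 0
    col_sq = [sum(A[j][i] ** 2 for j in range(m)) for i in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            # partial derivative of f with respect to block i
            g = sum(A[j][i] * r[j] for j in range(m))
            # minimizer of the quadratic upper bound in block i
            step = -g / (bound_scale * col_sq[i])
            x[i] += step
            # keep the residual consistent with the updated block
            for j in range(m):
                r[j] += A[j][i] * step
    return x
```

On a small nonsingular system both settings reach the same solution; the inexact variant simply takes shorter steps per block.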

preprint2012arXiv

An Adaptive Online Ad Auction Scoring Algorithm for Revenue Maximization

Sponsored search has become an easy platform for matching potential consumers' intent with merchants' advertising. Advertisers express their willingness to pay for each keyword in terms of bids to the search engine. When a user's query matches the keyword, the search engine evaluates the bids and allocates slots to the advertisers, whose ads are displayed alongside the unpaid algorithmic search results. The advertiser pays the search engine only when its ad is clicked by the user, and the price-per-click is determined by the bids of other competing advertisers.

preprint2012arXiv

Distributed Linear Precoder Optimization and Base Station Selection for an Uplink Heterogeneous Network

In a heterogeneous wireless cellular network, each user may be covered by multiple access points such as macro/pico/relay/femto base stations (BS). An effective approach to maximize the sum utility (e.g., system throughput) in such a network is to jointly optimize users' linear precoders as well as their base station associations. In this paper we first show that this joint optimization problem is NP-hard and thus is difficult to solve to global optimality. To find a locally optimal solution, we formulate the problem as a noncooperative game in which the users and the BSs both act as players. We introduce a set of new utility functions for the players and show that every Nash equilibrium (NE) of the resulting game is a stationary solution of the original sum utility maximization problem. Moreover, we develop a best-response type algorithm that allows the players to distributedly reach a NE of the game. Simulation results show that the proposed distributed algorithm can effectively relieve local BS congestion and simultaneously achieve high throughput and load balancing in a heterogeneous network.

preprint2012arXiv

Joint Access Point Selection and Power Allocation for Uplink Wireless Networks

We consider the distributed uplink resource allocation problem in a multi-carrier wireless network with multiple access points (APs). Each mobile user can optimize its own transmission rate by selecting a suitable AP and by controlling its transmit power. Our objective is to devise suitable algorithms by which mobile users can jointly perform these tasks in a distributed manner. Our approach relies on a game theoretic formulation of the joint power control and AP selection problem. In the proposed game, each user is a player with an associated strategy containing a discrete variable (the AP selection decision) and a continuous vector (the power allocation among multiple channels). We provide characterizations of the Nash Equilibrium of the proposed game, and present a set of novel algorithms that allow the users to efficiently optimize their rates. Finally, we study the properties of the proposed algorithms as well as their performance via extensive simulations.

preprint2012arXiv

Joint Base Station Clustering and Beamformer Design for Partial Coordinated Transmission in Heterogenous Networks

We consider the interference management problem in a multicell MIMO heterogeneous network. Within each cell there are a large number of distributed micro/pico base stations (BSs) that can be potentially coordinated for joint transmission. To reduce coordination overhead, we consider user-centric BS clustering so that each user is served by only a small number of (potentially overlapping) BSs. Thus, given the channel state information, our objective is to jointly design the BS clustering and the linear beamformers for all BSs in the network. In this paper, we formulate this problem from a sparse optimization perspective, and propose an efficient algorithm that is based on iteratively solving a sequence of group LASSO problems. A novel feature of the proposed algorithm is that it performs BS clustering and beamformer design jointly rather than separately as is done in the existing approaches for partial coordinated transmission. Moreover, the cluster size can be controlled by adjusting a single penalty parameter in the nonsmooth regularized utility function. The convergence of the proposed algorithm (to a local optimal solution) is guaranteed, and its effectiveness is demonstrated via extensive simulation.

preprint2012arXiv

Linear Transceiver Design for a MIMO Interfering Broadcast Channel Achieving Max-Min Fairness

We consider the problem of linear transceiver design to achieve max-min fairness in a downlink MIMO multicell network. This problem can be formulated as maximizing the minimum rate among all the users in an interfering broadcast channel (IBC). In this paper we show that when the number of antennas is at least two at each of the transmitters and the receivers, the min rate maximization problem is NP-hard in the number of users. Moreover, we develop a low-complexity algorithm for this problem by iteratively solving a sequence of convex subproblems, and establish its global convergence to a stationary point of the original minimum rate maximization problem. Numerical simulations show that this algorithm is efficient in achieving fairness among all the users.

preprint2012arXiv

Mechanism Design for Base Station Association and Resource Allocation in Downlink OFDMA Network

We consider a resource management problem in a multi-cell downlink OFDMA network, whereby the goal is to find the optimal per base station resource allocation and user-base station assignment. The users are assumed to be strategic/selfish who have private information on downlink channel states and noise levels. To induce truthfulness among the users as well as to enhance the spectrum efficiency, the resource management strategy needs to be both incentive compatible and efficient. However, due to the mixed (discrete and continuous) nature of resource management in this context, the implementation of any incentive compatible mechanism that maximizes the system throughput is NP-hard. We consider the dominant strategy implementation of an approximately optimal resource management scheme via a computationally tractable mechanism. The proposed mechanism is decentralized and dynamic. More importantly, it ensures the truthfulness of the users and it implements a resource allocation solution that yields at least 1/2 of the optimal throughput. Simulations are provided to illustrate the effectiveness of the performance of the proposed mechanism.

preprint2012arXiv

Signal Processing and Optimal Resource Allocation for the Interference Channel

In this article, we examine several design and complexity aspects of the optimal physical layer resource allocation problem for a generic interference channel (IC). The latter is a natural model for multi-user communication networks. In particular, we characterize the computational complexity, the convexity as well as the duality of the optimal resource allocation problem. Moreover, we summarize various existing algorithms for resource allocation and discuss their complexity and performance tradeoff. We also mention various open research problems throughout the article.

preprint2011arXiv

Averaged Iterative Water-Filling Algorithm: Robustness and Convergence

The convergence properties of the iterative water-filling (IWF) based algorithms have been derived in the ideal situation where the transmitters in the network are able to obtain the exact value of the interference plus noise (IPN) experienced at the corresponding receivers in each iteration of the algorithm. However, these algorithms are not robust because they diverge when there is time-varying estimation error of the IPN, a situation that arises in real communication systems. In this correspondence, we propose an algorithm that possesses convergence guarantees in the presence of various forms of such time-varying error. Moreover, we also show by simulation that in scenarios where the interference is strong, the conventional IWF diverges while our proposed algorithm still converges.
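As a rough illustration of the water-filling step that IWF-type algorithms repeat, the sketch below computes a single user's power allocation against a given IPN profile by bisection on the water level. The abstract does not spell out the averaged update, so `averaged_step` is a hypothetical convex-combination rule with an assumed step size `alpha`, included only to show what damping the water-filling response could look like.

```python
def water_fill(ipn, budget, iters=100):
    """Water-filling over parallel channels.

    Returns p with p_k = max(0, mu - ipn_k) and sum(p) = budget, where ipn[k]
    is the interference-plus-noise level on channel k (normalized by the
    channel gain) and mu is the water level found by bisection.
    """
    lo, hi = min(ipn), max(ipn) + budget  # the water level lies in [lo, hi]
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(max(0.0, mu - v) for v in ipn) > budget:
            hi = mu
        else:
            lo = mu
    mu = 0.5 * (lo + hi)
    return [max(0.0, mu - v) for v in ipn]


def averaged_step(p_old, ipn_estimate, budget, alpha):
    """Move a fraction alpha toward the water-filling response.

    Hypothetical update rule (an assumption, not taken from the abstract):
    a convex combination of the previous allocation and the new response.
    """
    target = water_fill(ipn_estimate, budget)
    return [(1.0 - alpha) * po + alpha * t for po, t in zip(p_old, target)]
```

With IPN levels `[1.0, 2.0, 5.0]` and a budget of `3.0`, the water level settles at 3, so the worst channel receives no power. Because the averaged update is a convex combination of two allocations that each meet the budget, the budget is preserved.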

preprint2011arXiv

Distributed Uplink Resource Allocation in Cognitive Radio Networks -- Part I: Equilibria and Algorithms for Power Allocation

Spectrum management has been identified as a crucial step towards enabling the technology of a cognitive radio network (CRN). Most of the current works dealing with spectrum management in the CRN focus on a single task of the problem, e.g., spectrum sensing, spectrum decision, spectrum sharing or spectrum mobility. In this two-part paper, we argue that for certain network configurations, jointly performing several tasks of the spectrum management improves the spectrum efficiency. Specifically, our aim is to study the uplink resource management problem in a CRN where there exist multiple cognitive users (CUs) and access points (APs). The CUs, in order to maximize their uplink transmission rates, have to associate to a suitable AP (spectrum decision), and to share the channels used by this AP with other CUs (spectrum sharing). These tasks are clearly interdependent, and the problem of how they should be carried out efficiently and in a distributed manner is still open in the literature.

preprint2011arXiv

Distributed Uplink Resource Allocation in Cognitive Radio Networks -- Part II: Equilibria and Algorithms for Joint Access Point Selection and Power Allocation

In the first part of this paper, we have studied solely the spectrum sharing aspect of the above problem, and proposed algorithms for the CUs in the single AP network to efficiently share the spectrum. In this second part of the paper, we build upon our previous understanding of the single AP network, and formulate the joint spectrum decision and spectrum sharing problem in a multiple AP network as a non-cooperative game, in which the feasible strategy of a player contains a discrete variable (the AP/spectrum decision) and a continuous vector (the power allocation among multiple channels). The structure of the game is hence very different from most non-cooperative spectrum management games proposed in the literature. We provide a characterization of the Nash Equilibrium (NE) of this game, and present a set of novel algorithms that allow the CUs to distributively and efficiently select the suitable AP and share the channels with other CUs. Finally, we study the properties of the proposed algorithms as well as their performance via extensive simulations.

preprint2011arXiv

Joint Distributed Access Point Selection and Power Allocation in Cognitive Radio Networks

Spectrum management has been identified as a crucial step towards enabling the technology of the cognitive radio network (CRN). Most of the current works dealing with spectrum management in the CRN focus on a single task of the problem, e.g., spectrum sensing, spectrum decision, spectrum sharing or spectrum mobility. In this work, we argue that for certain network configurations, jointly performing several tasks of the spectrum management improves the spectrum efficiency. Specifically, we study the uplink resource management problem in a CRN where there exist multiple cognitive users (CUs) and access points (APs), with each AP operating on a set of non-overlapping channels. The CUs, in order to maximize their uplink transmission rates, have to associate to a suitable AP (spectrum decision), and to share the channels belonging to this AP with other CUs (spectrum sharing). These tasks are clearly interdependent, and the problem of how they should be carried out efficiently and distributedly is still open in the literature. In this work we formulate this joint spectrum decision and spectrum sharing problem as a non-cooperative game, in which the feasible strategy of a player contains a discrete variable and a continuous vector. The structure of the game is hence very different from most non-cooperative spectrum management games proposed in the literature. We provide a characterization of the Nash Equilibrium (NE) of this game, and present a set of novel algorithms that allow the CUs to distributively and efficiently select the suitable AP and share the channels with other CUs. Finally, we study the properties of the proposed algorithms as well as their performance via extensive simulations.

preprint2011arXiv

Lower Bounds Optimization for Coordinated Linear Transmission Beamformer Design in Multicell Network Downlink

We consider the coordinated downlink beamforming problem in a cellular network with the base stations (BSs) equipped with multiple antennas, and with each user equipped with a single antenna. The BSs cooperate in sharing their local interference information, and they aim at maximizing the sum rate of the users in the network. A set of new lower bounds (one bound for each BS) of the non-convex sum rate is identified. These bounds facilitate the development of a set of algorithms that allow the BSs to update their beams by optimizing their respective lower bounds. We show that when there is a single user per-BS, the lower bound maximization problem can be solved exactly with rank-1 solutions. In this case, the overall sum rate maximization problem can be solved to a KKT point. Numerical results show that the proposed algorithms achieve high system throughput with reduced backhaul information exchange among the BSs.

67 published item(s)

CUDAHercules: Benchmarking Hardware-Aware Expert-level CUDA Optimization for LLMs

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Revisiting the Adam-SGD Gap in LLM Pre-Training: The Role of Large Effective Learning Rates

Krylov Cubic Regularized Newton: A Subspace Second-Order Method with Dimension-Free Convergence Rate

A Framework for Understanding Model Extraction Attack and Defense

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

Deep Spectrum Cartography: Completing Radio Map Tensors Using Learned Neural Models

Distributed Adversarial Training to Robustify Deep Neural Networks at Scale

Dynamic Differential-Privacy Preserving SGD

How to Robustify Black-Box ML Models? A Zeroth-Order Optimization Perspective

Zeroth-Order SciML: Non-intrusive Integration of Scientific Software with Deep Learning

Decentralized Riemannian Gradient Descent on the Stiefel Manifold

Hybrid Federated Learning: Algorithms and Implementation

Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective

On Instabilities of Conventional Multi-Coil MRI Reconstruction to Small Adverserial Perturbations

On the Local Linear Rate of Consensus on the Stiefel Manifold

Online Proximal-ADMM For Time-varying Constrained Convex Optimization

Stochastic Mirror Descent for Low-Rank Tensor Decomposition Under Non-Euclidean Losses

A Communication Efficient Collaborative Learning Framework for Distributed Features

Dense Recurrent Neural Networks for Accelerated MRI: History-Cognizant Unrolling of Optimization Algorithms

Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond

Generalization Bounds for Stochastic Saddle Point Problems

Imitation Privacy

Joint Channel Assignment and Power Allocation for Multi-UAV Communication

On the Divergence of Decentralized Non-Convex Optimization

Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds

Distributed Non-Convex First-Order Optimization and Information Processing: Lower Complexity Bounds and Rate Optimal Algorithms

Asynchronous Distributed ADMM for Large-Scale Optimization- Part I: Algorithm and Convergence Analysis

Decomposing Linearly Constrained Nonconvex Problems by a Proximal Primal Dual Approach: Algorithms, Convergence, and Applications

Joint Source-Relay Design for Full-Duplex MIMO AF Relay Systems

Sample Approximation-Based Deflation Approaches for Chance SINR Constrained Joint Power and Admission Control

Stochastic Proximal Gradient Consensus Over Random Networks

A Unified Algorithmic Framework for Block-Structured Optimization Involving Big Data

Alternating direction method of multipliers for penalized zero-variance discriminant analysis

Asynchronous Distributed ADMM for Large-Scale Optimization- Part II: Linear Convergence Analysis and Numerical Performance

Convergence Analysis of Alternating Direction Method of Multipliers for a Family of Nonconvex Problems

Decomposition by Successive Convex Approximation: A Unifying Approach for Linear Transceiver Design in Heterogeneous Networks

Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems

Iteration Complexity Analysis of Block Coordinate Descent Methods

Quantized Consensus ADMM for Multi-Agent Distributed Optimization

SINR Constrained Beamforming for a MIMO Multi-user Downlink System

A Block Successive Upper Bound Minimization Method of Multipliers for Linearly Constrained Convex Optimization

A Distributed, Asynchronous and Incremental Algorithm for Nonconvex Optimization: An ADMM Based Approach

Joint Downlink Base Station Association and Power Control for Max-Min Fairness: Computation and Complexity

Multi-Agent Distributed Optimization via Inexact Consensus ADMM

Parallel Successive Convex Approximation for Nonsmooth Nonconvex Optimization

Semidefinite approximation for mixed binary quadratically constrained quadratic programs

Semidefinite Relaxation for Two Mixed Binary Quadratically Constrained Quadratic Programs: Algorithms and Approximation Bounds

Base Station Activation and Linear Transceiver Design for Optimal Resource Management in Heterogeneous Networks

Joint User Grouping and Linear Virtual Beamforming: Complexity, Algorithms and Approximation Bounds

Min Flow Rate Maximization for Software Defined Radio Access Networks

On the Linear Convergence of the Alternating Direction Method of Multipliers

Outage Constrained Robust Secure Transmission for MISO Wiretap Channels

Solving Multiple-Block Separable Convex Minimization Problems Using Two-Block Alternating Direction Method of Multipliers

A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization

An Adaptive Online Ad Auction Scoring Algorithm for Revenue Maximization

Distributed Linear Precoder Optimization and Base Station Selection for an Uplink Heterogeneous Network

Joint Access Point Selection and Power Allocation for Uplink Wireless Networks

Joint Base Station Clustering and Beamformer Design for Partial Coordinated Transmission in Heterogenous Networks

Linear Transceiver Design for a MIMO Interfering Broadcast Channel Achieving Max-Min Fairness

Mechanism Design for Base Station Association and Resource Allocation in Downlink OFDMA Network

Signal Processing and Optimal Resource Allocation for the Interference Channel

Averaged Iterative Water-Filling Algorithm: Robustness and Convergence

Distributed Uplink Resource Allocation in Cognitive Radio Networks -- Part I: Equilibria and Algorithms for Power Allocation

Distributed Uplink Resource Allocation in Cognitive Radio Networks -- Part II: Equilibria and Algorithms for Joint Access Point Selection and Power Allocation

Joint Distributed Access Point Selection and Power Allocation in Cognitive Radio Networks

Lower Bounds Optimization for Coordinated Linear Transmission Beamformer Design in Multicell Network Downlink