Source author record

Hoi-To Wai

Hoi-To Wai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning; math.OC; eess.SP; Distributed, Parallel, and Cluster Computing; Information Theory; math.IT; eess.IV; math.PR; math.ST; Molecular Networks; Multiagent Systems; Quantitative Methods; Social and Information Networks; Statistics Theory; Systems and Control

Catalog footprint

What is connected

18 works

15 topics

4 close collaborators


Preprint · 2022 · arXiv

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization. Bilevel optimization is a class of problems that exhibit a two-level structure; its goal is to minimize an outer objective function over variables that are constrained to be the optimal solution to an (inner) optimization problem. We consider the case when the inner problem is unconstrained and strongly convex, while the outer problem is constrained and has a smooth objective function. We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem. In the algorithm, a stochastic gradient update with a larger step size is used for the inner problem, while a projected stochastic gradient update with a smaller step size is used for the outer problem. We analyze the convergence rates of the TTSA algorithm under various settings: when the outer problem is strongly convex (resp. weakly convex), the TTSA algorithm finds an $\mathcal{O}(K^{-2/3})$-optimal (resp. $\mathcal{O}(K^{-2/5})$-stationary) solution, where $K$ is the total number of iterations. As an application, we show that a two-timescale natural actor-critic proximal policy optimization algorithm can be viewed as a special case of our TTSA framework. Importantly, the natural actor-critic algorithm is shown to converge at a rate of $\mathcal{O}(K^{-1/4})$ in terms of the gap in expected discounted reward compared to a globally optimal policy.
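The two-timescale structure described above can be illustrated on a toy problem. The sketch below is not the paper's algorithm setup or experiment: the quadratic inner and outer functions, the constraint set [0, 1], the noise level, the sequential (rather than simultaneous) update order and the step-size schedules are all hypothetical choices, picked only so that the inner step size decays more slowly than the outer one and the exact solution is known in closed form.

```python
import random


def ttsa_toy(num_iters=20000, noise=0.1, seed=0):
    """Two-timescale stochastic approximation on a hypothetical toy bilevel problem.

    Inner problem (strongly convex in y): g(x, y) = 0.5 * (y - 2x)^2, so y*(x) = 2x.
    Outer objective: f(x, y) = 0.5 * (y - 1)^2 + 0.5 * x^2, with x constrained to [0, 1].
    The composite objective l(x) = f(x, y*(x)) has derivative 5x - 2,
    so the solution is x* = 0.4 and y*(x*) = 0.8.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for k in range(num_iters):
        beta = 0.5 / (k + 1) ** 0.6  # inner step: larger, decays more slowly
        alpha = 0.5 / (k + 1)        # outer step: smaller, alpha / beta -> 0
        # Inner update: noisy gradient step on g(x, .) at the current x.
        y -= beta * ((y - 2.0 * x) + rng.gauss(0.0, noise))
        # Outer update: noisy hypergradient grad_x f + (dy*/dx) * grad_y f,
        # with dy*/dx = 2 here and the current y standing in for y*(x).
        hypergrad = x + 2.0 * (y - 1.0) + rng.gauss(0.0, noise)
        x = min(1.0, max(0.0, x - alpha * hypergrad))  # projection onto [0, 1]
    return x, y


x_final, y_final = ttsa_toy()
print(round(x_final, 3), round(y_final, 3))
```

Because alpha / beta shrinks over time, the inner iterate y has time to track y*(x) between the comparatively slow moves of x, which is the intuition behind using two timescales.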

Preprint · 2022 · arXiv

Detecting Central Nodes from Low-rank Excited Graph Signals via Structured Factor Analysis

This paper treats a blind detection problem: identifying the central nodes in a graph from filtered graph signals. Unlike prior works, which impose strong restrictions on the data model, we only require the underlying graph filter to satisfy a low pass property with a generic low-rank excitation model. We treat two cases depending on the low pass graph filter's strength. When the graph filter is strong low pass, i.e., its frequency response drops sharply at high frequencies, we show that principal component analysis (PCA) detects central nodes with high accuracy. For a general low pass graph filter, we show that the graph signals can be described by a structured factor model featuring the product of a low-rank plus sparse factor and an unstructured factor. We propose a two-stage decomposition algorithm to learn the structured factor model via a judicious combination of non-negative matrix factorization and robust PCA algorithms. We analyze the identifiability conditions for the model that lead to accurate detection of central nodes. Numerical experiments on synthetic and real data are provided to support our findings, and we demonstrate significant performance gains over prior works.

Preprint · 2022 · arXiv

Multi-agent Performative Prediction with Greedy Deployment and Consensus Seeking Agents

We consider a scenario where multiple agents learn a common decision vector from data that can be influenced by the agents' decisions. This leads to the problem of multi-agent performative prediction (Multi-PfD). In this paper, we formulate Multi-PfD as a decentralized optimization problem that minimizes a sum of loss functions, where each loss function is based on a distribution influenced by the local decision vector. We first prove a necessary and sufficient condition for the Multi-PfD problem to admit a unique multi-agent performative stable (Multi-PS) solution. We show that, compared to the single-agent case, enforcing consensus leads to a laxer condition, with respect to the distributions' sensitivities, for the existence of a Multi-PS solution. Then, we study a decentralized extension of the greedy deployment scheme [Mendler-Dünner et al., 2020], called the DSGD-GD scheme. We show that DSGD-GD converges to the Multi-PS solution and analyze its non-asymptotic convergence rate. Numerical results validate our analysis.
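To make the greedy deployment idea concrete, here is a minimal sketch of a DSGD-GD-style loop under a hypothetical data model: agent i observes z ~ N(mu_i + eps_i * theta_i, noise^2), so its data shift with its own deployed decision, and it uses the squared loss 0.5 * (theta - z)^2. The ring topology with weights 1/3, the step-size schedule and the function name `dsgd_gd` are assumptions for illustration, not the paper's setup. For this toy model the Multi-PS point solves sum_i ((1 - eps_i) * theta - mu_i) = 0.

```python
import random


def dsgd_gd(mu, eps, num_iters=20000, noise=0.1, seed=0):
    """Decentralized SGD with greedy deployment on a hypothetical toy Multi-PfD model.

    Agent i draws z ~ N(mu[i] + eps[i] * theta_i, noise^2) and uses the loss
    0.5 * (theta - z)^2. Agents sit on a ring and average with weights 1/3
    (self, left neighbor, right neighbor), a doubly stochastic mixing matrix.
    """
    rng = random.Random(seed)
    n = len(mu)
    theta = [0.0] * n
    for k in range(num_iters):
        gamma = 0.5 / (k + 1) ** 0.6
        # Greedy deployment: each agent samples data from the distribution
        # induced by its *current* decision, then forms a stochastic gradient.
        grads = []
        for i in range(n):
            z = mu[i] + eps[i] * theta[i] + rng.gauss(0.0, noise)
            grads.append(theta[i] - z)
        # Consensus (mixing) step on the old iterates, followed by a local descent step.
        theta = [
            (theta[(i - 1) % n] + theta[i] + theta[(i + 1) % n]) / 3.0 - gamma * grads[i]
            for i in range(n)
        ]
    return theta


mu, eps = [1.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4]
theta_ps = sum(mu) / sum(1.0 - e for e in eps)  # Multi-PS point of this toy model
print(theta_ps, dsgd_gd(mu, eps))
```

Each agent only ever sees data generated under its own decision; the mixing step is what pulls the agents toward a common, performatively stable decision.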

Preprint · 2022 · arXiv

On the Role of Data Homogeneity in Multi-Agent Non-convex Stochastic Optimization

This paper studies the role of data homogeneity in multi-agent optimization. Concentrating on the decentralized stochastic gradient (DSGD) algorithm, we characterize the transient time, defined as the minimum number of iterations required for DSGD to achieve performance comparable to its centralized counterpart. When the Hessians of the objective functions are identical across agents, we show that the transient time of DSGD is $O( n^{4/3} / ρ^{8/3})$ for smooth (possibly non-convex) objective functions, where $n$ is the number of agents and $ρ$ is the spectral gap of the connectivity graph. This improves on the bound of $O( n^2 / ρ^4 )$ obtained without the Hessian homogeneity assumption. Our analysis leverages the property that the objective function is twice continuously differentiable. Numerical experiments are presented to illustrate the importance of data homogeneity for the fast convergence of DSGD.
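The two transient-time bounds depend on the spectral gap $ρ$, which can be computed in closed form for simple topologies. The sketch below assumes a ring of agents where each agent keeps weight 1/3 and gives 1/3 to each neighbor, and uses one common definition of the spectral gap, 1 - max_{k≠0} |λ_k| of the mixing matrix; the paper's exact definition and constants may differ, and the function names are hypothetical. It only compares the orders of magnitude stated in the abstract, with all constants dropped.

```python
import math


def ring_spectral_gap(n, self_weight=1.0 / 3.0):
    """Spectral gap 1 - max_{k != 0} |lambda_k| of a symmetric ring mixing matrix.

    Each agent keeps `self_weight` and splits the rest equally between its two
    ring neighbors. The matrix is circulant, so its eigenvalues are
    lambda_k = w0 + 2 * w1 * cos(2 * pi * k / n) for k = 0, ..., n - 1 (lambda_0 = 1).
    """
    w0 = self_weight
    w1 = (1.0 - self_weight) / 2.0
    second = max(abs(w0 + 2.0 * w1 * math.cos(2.0 * math.pi * k / n)) for k in range(1, n))
    return 1.0 - second


def transient_orders(n, rho):
    """Order-of-magnitude transient times from the abstract (constants dropped)."""
    with_hessian_homogeneity = n ** (4.0 / 3.0) / rho ** (8.0 / 3.0)
    without = n ** 2 / rho ** 4
    return with_hessian_homogeneity, without


for n in (4, 16, 64):
    rho = ring_spectral_gap(n)
    homog, hetero = transient_orders(n, rho)
    print(n, round(rho, 4), f"{homog:.3g}", f"{hetero:.3g}")
```

On a ring the gap shrinks quickly as agents are added, so the improvement from $ρ^{-4}$ to $ρ^{-8/3}$ matters most for large, sparsely connected networks.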

Preprint · 2022 · arXiv

Robust Distributed Optimization With Randomly Corrupted Gradients

In this paper, we propose a first-order distributed optimization algorithm that is provably robust to Byzantine failures, i.e., arbitrary and potentially adversarial behavior, in a setting where all the participating agents are prone to failure. We model each agent's state over time as a two-state Markov chain that indicates Byzantine or trustworthy behavior at different time instants. We place no restrictions on the maximum number of Byzantine agents at any given time. We design our method around three layers of defense: 1) temporal robust aggregation, 2) spatial robust aggregation, and 3) gradient normalization. We study two settings for stochastic optimization, namely Sample Average Approximation and Stochastic Approximation. We provide convergence guarantees for our method for strongly convex and smooth non-convex cost functions.
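Two of the three defense layers listed above, spatial robust aggregation and gradient normalization, can be sketched in a few lines. This is a hypothetical illustration only: the coordinate-wise median is one standard robust aggregator, assumed here rather than taken from the paper, the temporal aggregation layer is omitted, and the helper names are made up.

```python
import math
import statistics


def coordinatewise_median(vectors):
    """Spatial robust aggregation (assumed rule): per-coordinate median across agents."""
    return [statistics.median(coords) for coords in zip(*vectors)]


def normalize(v, eps=1e-12):
    """Gradient normalization: rescale to unit Euclidean norm, bounding the step length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / max(norm, eps) for c in v]


def robust_step(x, gradients, step):
    """One hypothetical descent step: median aggregation, then normalization."""
    direction = normalize(coordinatewise_median(gradients))
    return [xi - step * di for xi, di in zip(x, direction)]


# Two trustworthy agents and one Byzantine agent sending an arbitrary vector.
grads = [[1.0, 1.0], [1.2, 0.8], [100.0, -100.0]]
print(coordinatewise_median(grads), robust_step([0.0, 0.0], grads, step=0.1))
```

The median ignores the outlying vector, and normalization ensures that even a badly corrupted aggregate cannot move the iterate by more than `step` in one update.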

Preprint · 2021 · arXiv

Federated Block Coordinate Descent Scheme for Learning Global and Personalized Models

In federated learning, models are learned from users' data that are held privately on their edge devices, by aggregating them in the service provider's "cloud" to obtain a global model. Such a global model is of great commercial value, e.g., for improving the customer experience. In this paper, we focus on two possible areas of improvement over the state of the art. First, we take the differences between user habits into account and propose a quadratic penalty-based formulation for efficient learning of the global model that allows local models to be personalized. Second, we address the latency issue associated with heterogeneous training times on edge devices by exploiting a hierarchical structure that models communication not only between the cloud and edge devices but also within the cloud. Specifically, we devise a tailored block coordinate descent-based computation scheme, accompanied by communication protocols for both the synchronous and asynchronous cloud settings. We characterize the theoretical convergence rate of the algorithm and provide a variant that performs better empirically. We also prove that the asynchronous protocol, inspired by multi-agent consensus techniques, has the potential for large gains in latency compared to a synchronous setting when the edge-device updates are intermittent. Finally, experimental results are provided that not only corroborate the theory but also show that the system leads to faster convergence for personalized models on the edge devices compared to the state of the art.
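The interplay between a quadratic penalty and block coordinate descent can be seen on a scalar toy problem. The sketch assumes the hypothetical objective sum_i 0.5 * (w_i - b_i)^2 + (lam / 2) * sum_i (w_i - w)^2, where w_i are personalized local models and w is the global model; the paper's actual losses, penalty weights, hierarchical cloud structure and asynchronous protocol are not modeled, and `penalized_bcd` is a made-up name.

```python
def penalized_bcd(targets, lam, num_rounds=100):
    """Block coordinate descent on a hypothetical penalty-based personalization objective.

    Objective: sum_i 0.5 * (w_i - b_i)^2 + (lam / 2) * sum_i (w_i - w)^2,
    with local models w_i on edge devices and a global model w in the cloud.
    Each block update is an exact minimization, available in closed form here.
    """
    n = len(targets)
    w = 0.0
    local = [0.0] * n
    for _ in range(num_rounds):
        # Local block: each device minimizes its own loss plus the proximity penalty.
        local = [(b + lam * w) / (1.0 + lam) for b in targets]
        # Global block: the penalty is minimized over w by the average local model.
        w = sum(local) / n
    return w, local


w, local = penalized_bcd([0.0, 2.0, 4.0], lam=1.0)
print(round(w, 6), [round(v, 6) for v in local])
```

The weight `lam` interpolates between purely local models (lam = 0) and a single shared model (lam very large); intermediate values yield personalized models that are shrunk toward the global one.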

Preprint · 2021 · arXiv

Identifying First-order Lowpass Graph Signals using Perron Frobenius Theorem

This paper is concerned with the blind identification of graph filters from graph signals. Our aim is to determine whether the graph filter generating the graph signals is first-order lowpass without knowing the graph topology. Note that a lowpass graph filter is a common prerequisite for applying graph signal processing tools for sampling, denoising, and graph learning. Our method is inspired by the Perron Frobenius theorem, which implies that for a first-order lowpass graph filter, the top eigenvector of the output covariance is the only eigenvector whose elements all have the same sign. Utilizing this observation, we develop a simple detector that determines whether a given data set is produced by a first-order lowpass graph filter. We analyze the effects of finite sample size, graph size, observation noise, and the strength of the lowpass filter on the detector's performance. Numerical experiments on synthetic and real data support our findings.
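The sign-pattern observation can be turned into a simple test on a covariance matrix. The sketch below is a hypothetical, noiseless version: it finds the top eigenvector by power iteration (assuming a symmetric positive semidefinite input, as covariances are) and checks whether all entries share a sign up to a tolerance. It does not reproduce the paper's detector statistic or its handling of finite samples and observation noise, and the function names are assumptions.

```python
def top_eigenvector(cov, num_iters=500):
    """Power iteration for the leading eigenvector of a symmetric PSD matrix."""
    n = len(cov)
    # Non-uniform start, so we are unlikely to begin orthogonal to the top eigenvector.
    v = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(num_iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v


def looks_first_order_lowpass(cov, tol=1e-9):
    """Hypothetical detector: True if the top eigenvector's entries share one sign."""
    v = top_eigenvector(cov)
    return all(c > -tol for c in v) or all(c < tol for c in v)


# Nonnegative, irreducible covariance: Perron Frobenius gives a one-signed top eigenvector.
print(looks_first_order_lowpass([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]))
# Here the top eigenvector is proportional to (1, -1), so the test fails.
print(looks_first_order_lowpass([[2.0, -1.0], [-1.0, 2.0]]))
```

With sample covariances, entries of the top eigenvector near zero can flip sign due to noise, which is why a practical detector needs the finite-sample analysis described in the abstract rather than the hard threshold used here.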

Preprint · 2021 · arXiv

On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning

This paper studies the exponential stability of random matrix products driven by a general (possibly unbounded) state space Markov chain. Such stability is a cornerstone in the analysis of stochastic algorithms in machine learning (e.g., for parameter tracking in online learning or reinforcement learning). Existing results impose strong conditions such as uniform boundedness of the matrix-valued functions and uniform ergodicity of the Markov chains. Our main contribution is an exponential stability result for the $p$-th moment of the random matrix product, provided that (i) the underlying Markov chain satisfies a super-Lyapunov drift condition and (ii) the growth of the matrix-valued functions is controlled by an appropriately defined function (related to the drift condition). Using this result, we give finite-time $p$-th moment bounds for constant and decreasing step size linear stochastic approximation schemes with Markovian noise on a general state space. We illustrate these findings for linear value-function estimation in reinforcement learning, providing finite-time $p$-th moment bounds for various members of the temporal difference (TD) family of algorithms.

Preprint · 2020 · arXiv

Accelerating Incremental Gradient Optimization with Curvature Information

This paper studies an acceleration technique for the incremental aggregated gradient (IAG) method that uses curvature information to solve strongly convex finite-sum optimization problems. Such problems arise in large-scale learning applications. Our technique utilizes a curvature-aided gradient tracking step to produce accurate gradient estimates incrementally using Hessian information. We propose and analyze two methods utilizing the new technique, the curvature-aided IAG (CIAG) method and the accelerated CIAG (A-CIAG) method, which are analogous to the gradient method and Nesterov's accelerated gradient method, respectively. Letting $κ$ be the condition number of the objective function, we prove $R$-linear convergence rates of $1 - \frac{4c_0 κ}{(κ+1)^2}$ for the CIAG method and $1 - \sqrt{\frac{c_1}{2κ}}$ for the A-CIAG method, where $c_0,c_1 \leq 1$ are constants inversely proportional to the distance between the initial point and the optimal solution. When the initial iterate is close to the optimal solution, these $R$-linear convergence rates match those of the gradient and accelerated gradient methods, even though CIAG and A-CIAG operate in an incremental setting with strictly lower computational complexity. Numerical experiments confirm our findings. The source code used for this paper can be found at http://github.com/hoitowai/ciag/.
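The curvature-aided gradient tracking step can be sketched for a scalar finite sum. Each component f_i keeps its last linearization point tau_i, gradient g_i and Hessian H_i, and the full gradient at theta is estimated by sum_i [g_i + H_i * (theta - tau_i)], maintained through two running aggregates. This is an illustrative sketch, not the code from the repository above: the regularized logistic-type components, cyclic component selection, step size and function names are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_hess(a, mu, theta):
    """Gradient and Hessian of f_i(theta) = log(1 + exp(-a * theta)) + 0.5 * mu * theta^2."""
    g = -a * sigmoid(-a * theta) + mu * theta
    h = a * a * sigmoid(a * theta) * sigmoid(-a * theta) + mu
    return g, h


def ciag(coeffs, mu=0.1, step=0.5, num_iters=300):
    """Curvature-aided incremental gradient sketch on a scalar finite sum."""
    n = len(coeffs)
    theta = 0.0
    points = [theta] * n  # linearization point tau_i of each component
    gh = [grad_hess(a, mu, theta) for a in coeffs]
    # Aggregates b = sum_i (g_i - H_i * tau_i) and H = sum_i H_i, so that
    # b + H * theta = sum_i [g_i + H_i * (theta - tau_i)] (the curvature-aided estimate).
    b = sum(g - h * t for (g, h), t in zip(gh, points))
    H = sum(h for _, h in gh)
    for k in range(num_iters):
        i = k % n  # incremental: refresh one component per iteration
        g_old, h_old = gh[i]
        b -= g_old - h_old * points[i]
        H -= h_old
        points[i] = theta
        gh[i] = grad_hess(coeffs[i], mu, theta)
        g_new, h_new = gh[i]
        b += g_new - h_new * theta
        H += h_new
        theta -= step * (b + H * theta)
    return theta


def full_gradient(coeffs, mu, theta):
    return sum(grad_hess(a, mu, theta)[0] for a in coeffs)


coeffs = [1.0, -0.5, 2.0]
theta = ciag(coeffs)
print(theta, full_gradient(coeffs, 0.1, theta))
```

Only one component gradient and Hessian are evaluated per iteration, yet the stale components are corrected by their second-order Taylor terms, so the estimate error shrinks quadratically in the distance between theta and the stored points.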

Preprint · 2020 · arXiv

Block-Randomized Stochastic Proximal Gradient for Low-Rank Tensor Factorization

This work considers the problem of computing the canonical polyadic decomposition (CPD) of large tensors. Prior works mostly leverage data sparsity to handle this problem, which is not suitable for dense tensors that often arise in applications such as medical imaging, computer vision, and remote sensing. Stochastic optimization is known for its low memory cost and per-iteration complexity when handling dense data. However, existing stochastic CPD algorithms are not flexible enough to incorporate the variety of constraints/regularizations that are of interest in signal and data analytics, and the convergence properties of many such algorithms are also unclear. In this work, we propose a stochastic optimization framework for large-scale CPD with constraints/regularizations. The framework operates in a doubly randomized fashion and can be regarded as a judicious combination of randomized block coordinate descent (BCD) and stochastic proximal gradient (SPG). The algorithm enjoys lightweight updates and a small memory footprint. In addition, the framework offers considerable flexibility: many frequently used regularizers and constraints can be readily handled under the proposed scheme. The approach is also supported by convergence analysis. Numerical results on large-scale dense tensors showcase the effectiveness of the proposed approach.

Preprint · 2020 · arXiv

Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond

Distributed learning has become a critical enabler of the massively connected world envisioned by many. This article discusses four key elements of scalable distributed processing and real-time intelligence: problems, data, communication, and computation. Our aim is to provide a fresh and unique perspective on how these elements should work together in an effective and coherent manner. In particular, we provide a selective review of recent techniques developed for optimizing non-convex models (i.e., problem classes) and processing batch and streaming data (i.e., data types) over networks in a distributed manner (i.e., communication and computation paradigm). We describe the intuitions and connections behind a core set of popular distributed algorithms, emphasizing how to trade off computation and communication costs. Practical issues and future research directions are also discussed.

Preprint · 2020 · arXiv

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

The linear two-timescale stochastic approximation (SA) scheme is an important class of algorithms that has become popular in reinforcement learning (RL), particularly for the policy evaluation problem. Recently, a number of works have been devoted to establishing finite-time analyses of the scheme, especially under the Markovian (non-i.i.d.) noise settings that are ubiquitous in practice. In this paper, we provide a finite-time analysis for linear two-timescale SA. Our bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise; only the constants are affected by the mixing time of the Markov chain. With an appropriate step size schedule, the transient term in the expected error bound is $o(1/k^c)$ and the steady-state term is ${\cal O}(1/k)$, where $c>1$ and $k$ is the iteration number. Furthermore, we present an asymptotic expansion of the expected error with a matching lower bound of $Ω(1/k)$. A simple numerical experiment is presented to support our theory.

Preprint · 2020 · arXiv

Hybrid Inexact BCD for Coupled Structured Matrix Factorization in Hyperspectral Super-Resolution

This paper develops a first-order optimization method for coupled structured matrix factorization (CoSMF) problems that arise in the context of hyperspectral super-resolution (HSR) in remote sensing. To best leverage the problem structures for computational efficiency, we introduce a hybrid inexact block coordinate descent (HiBCD) scheme wherein one coordinate is updated via the fast proximal gradient (FPG) method and another via the Frank-Wolfe (FW) method. From numerical experience, FPG-type methods are known to take fewer iterations to converge, while FW-type methods can offer lower per-iteration complexity in certain cases; we wish to take the best of both. We show that the limit points of this HiBCD scheme are stationary. Our proof treats HiBCD as an optimization framework for a class of multi-block structured optimization problems, and our stationarity claim is applicable not only to CoSMF but also to many other problems. Previous optimization research showed the same stationarity result for inexact block coordinate descent with either FPG or FW updates only. Numerical results indicate that the proposed HiBCD scheme is computationally much more efficient than the state-of-the-art CoSMF schemes in HSR.

Preprint · 2016 · arXiv

On the Online Frank-Wolfe Algorithms for Convex and Non-convex Optimizations

In this paper, online variants of the classical Frank-Wolfe algorithm are considered. We consider minimizing the regret with a stochastic cost. The online algorithms only require simple iterative updates and a non-adaptive step size rule, in contrast to the hybrid schemes commonly considered in the literature. Several new results are derived for convex and non-convex losses. With a strongly convex stochastic cost, and when the optimal solution lies in the interior of the constraint set or the constraint set is a polytope, the regret bound and anytime optimality are shown to be ${\cal O}( \log^3 T / T )$ and ${\cal O}( \log^2 T / T)$, respectively, where $T$ is the number of rounds played. These results are based on an improved analysis of the stochastic Frank-Wolfe algorithms. Moreover, the online algorithms are shown to converge even when the loss is non-convex, i.e., the algorithms find a stationary point of the time-varying/stochastic loss at a rate of ${\cal O}(\sqrt{1/T})$. Numerical experiments on realistic data sets are presented to support our theoretical claims.
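An online Frank-Wolfe loop is projection-free: each round calls a linear minimization oracle over the constraint set and takes a convex-combination step with a fixed, non-adaptive step size. The sketch below is a hypothetical instance, not the paper's exact algorithm: it assumes the probability simplex as the constraint set, losses f_t(x) = 0.5 * ||x - xi_t||^2 with xi_t a noisy copy of a fixed target, a Frank-Wolfe step on the running average of the losses seen so far, and the step size 2 / (t + 1).

```python
import random


def online_frank_wolfe(target, num_rounds=5000, noise=0.05, seed=0):
    """Online Frank-Wolfe over the probability simplex (a sketch under stated assumptions).

    Round t reveals f_t(x) = 0.5 * ||x - xi_t||^2 with xi_t = target + Gaussian noise.
    The learner takes a Frank-Wolfe step on the average of f_1, ..., f_t,
    whose gradient at x is x - mean(xi_1, ..., xi_t).
    """
    rng = random.Random(seed)
    d = len(target)
    x = [1.0] + [0.0] * (d - 1)  # start at a vertex of the simplex
    running_sum = [0.0] * d
    for t in range(1, num_rounds + 1):
        xi = [c + rng.gauss(0.0, noise) for c in target]
        running_sum = [s + c for s, c in zip(running_sum, xi)]
        grad = [xj - s / t for xj, s in zip(x, running_sum)]
        # Linear minimization oracle over the simplex: the vertex e_j minimizing <grad, e_j>.
        j = min(range(d), key=lambda idx: grad[idx])
        gamma = 2.0 / (t + 1)  # non-adaptive step size (an assumed rule)
        x = [(1.0 - gamma) * xk + (gamma if k == j else 0.0) for k, xk in enumerate(x)]
    return x


print(online_frank_wolfe([0.2, 0.3, 0.5]))
```

Every iterate is a convex combination of simplex vertices, so feasibility holds without any projection; the only per-round work is an argmin over coordinates.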

Preprint · 2016 · arXiv

Optimal Pricing to Manage Electric Vehicles in Coupled Power and Transportation Networks

We study the system-level effects of introducing large populations of electric vehicles (EVs) on the power and transportation networks. We assume that each EV owner solves a decision problem to pick a cost-minimizing charge and travel plan. This individual decision takes into account traffic congestion in the transportation network, which affects travel times, as well as congestion in the power grid, which results in spatial variations in electricity prices for battery charging. We show that this decision problem is equivalent to finding the shortest path on an "extended" transportation graph, with virtual arcs that represent charging options. Using this extended graph, we study the collective effects of a large number of EV owners individually solving this path planning problem. We propose a scheme in which independent power and transportation system operators can collaborate to manage each network towards a socially optimal operating point while keeping the operational data of each system private. We further study the optimal reserve capacity requirements for pricing in the absence of such collaboration. We show numerically that a lack of attention to interdependencies between the two infrastructures can have adverse operational effects.
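The reduction to a shortest-path problem can be illustrated with Dijkstra's algorithm on a small extended graph. The construction here is a hypothetical simplification: states are (node, charged) pairs, travel arcs keep the charge state, each charging node v gets a virtual arc (v, 0) -> (v, 1) whose cost is the local electricity price, and the EV must charge exactly once before reaching its destination. The paper's extended graph may encode charging differently, and the graph, prices and function name are made up for illustration.

```python
import heapq


def cheapest_charge_and_travel(travel, charge_price, origin, dest):
    """Dijkstra on a hypothetical extended graph with virtual charging arcs.

    travel: node -> list of (neighbor, travel cost); charge_price: node -> price.
    Returns (total cost, path of (node, charged) states), or (inf, []) if unreachable.
    """
    start, goal = (origin, 0), (dest, 1)
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, state = heapq.heappop(heap)
        if state == goal:
            break
        if d > dist.get(state, float("inf")):
            continue  # stale heap entry
        node, charged = state
        neighbors = [((v, charged), cost) for v, cost in travel.get(node, [])]
        if not charged and node in charge_price:
            neighbors.append(((node, 1), charge_price[node]))  # virtual charging arc
        for nxt, cost in neighbors:
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = state
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


travel = {"O": [("A", 1.0), ("B", 2.0)], "A": [("D", 2.0)], "B": [("D", 1.0)]}
prices = {"A": 5.0, "B": 1.0}
print(cheapest_charge_and_travel(travel, prices, "O", "D"))
```

In this toy instance the EV accepts a longer first leg to reach the cheaper charging station at B; raising the price at B would shift the optimal route, which is the kind of coupling between electricity prices and traffic that the abstract describes.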

Preprint · 2016 · arXiv

RIDS: Robust Identification of Sparse Gene Regulatory Networks from Perturbation Experiments

Reconstructing the causal network in a complex dynamical system plays a crucial role in many applications, from sub-cellular biology to economic systems. Here we focus on inferring gene regulatory networks (GRNs) from perturbation or gene deletion experiments. Despite their scientific merit, such perturbation experiments are not often used for such inference because of their costly experimental procedure, which requires significant resources to complete the measurement of every single experiment. To overcome this challenge, we develop the Robust IDentification of Sparse networks (RIDS) method, which reconstructs the GRN from a small number of perturbation experiments. Our method uses the gene expression data observed in each experiment and translates it into a steady-state condition of the system's nonlinear interaction dynamics. Applying a sparse optimization criterion, we are able to extract the parameters of the underlying weighted network, even from very few experiments. In fact, we demonstrate analytically that, under certain conditions, the GRN can be perfectly reconstructed using $K = Ω(d_{max})$ perturbation experiments, where $d_{max}$ is the maximum in-degree of the GRN, a small value for realistic sparse networks; this indicates that RIDS can achieve high performance with a scalable number of experiments. We test our method on both synthetic and experimental data extracted from the DREAM5 network inference challenge. We show that RIDS achieves superior performance compared to state-of-the-art methods while requiring up to roughly 60% less experimental data. Moreover, unlike almost all competing methods, RIDS allows us to infer the directionality of the GRN links, making it possible to infer empirical GRNs without relying on the commonly provided list of transcription factors.

Preprint · 2015 · arXiv

The Social System Identification Problem

The focus of this paper is modeling what we call a Social Radar, i.e., a method to estimate the relative influence between social agents by sampling their opinions as they evolve after stubborn agents are injected into the network. The stubborn agents' opinions are not influenced by the peers they seek to sway, and their opinion bias is the known input to the social network system. The novelty lies in the model presented to probe a social network and in the solution of the associated regression problem. The model maps the observed opinions onto system equations that can be used to infer the social graph and the amount of trust that characterizes the links.

Preprint · 2012 · arXiv

A Decentralized Method for Joint Admission Control and Beamforming in Coordinated Multicell Downlink

In cellular networks, admission control and beamforming optimization are intertwined problems. While beamforming optimization aims at satisfying users' quality-of-service (QoS) requirements or improving the QoS levels, admission control looks at how a subset of users should be selected so that the beamforming optimization problem can yield a reasonable solution in terms of the QoS levels provided. However, to simplify the design, the two problems are usually treated separately. This paper considers joint admission control and beamforming (JACoB) under a coordinated multicell MISO downlink scenario. We formulate JACoB as a user number maximization problem, where selected users are guaranteed to receive the QoS levels they requested. The formulated problem is combinatorial and hard, so we derive a convex approximation to it. A merit of our convex approximation is that it can be easily decomposed for per-base-station decentralized optimization, namely, via block coordinate descent. The efficacy of the proposed decentralized method is demonstrated by simulation results.
