Researcher profile

Anthony Man-Cho So

Anthony Man-Cho So contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
12 works
0 followers
7 topics
4 close collaborators

Published work

12 published items

preprint · 2022 · arXiv

Convergence and Recovery Guarantees of the K-Subspaces Method for Subspace Clustering

The K-subspaces (KSS) method is a generalization of the K-means method for subspace clustering. In this work, we present local convergence analysis and a recovery guarantee for KSS, assuming data are generated by the semi-random union of subspaces model, where $N$ points are randomly sampled from $K \ge 2$ overlapping subspaces. We show that if the initial assignment of the KSS method lies within a neighborhood of a true clustering, it converges at a superlinear rate and finds the correct clustering within $\Theta(\log\log N)$ iterations with high probability. Moreover, we propose a thresholding inner-product based spectral method for initialization and prove that it produces a point in this neighborhood. We also present numerical results of the studied method to support our theoretical developments.
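As an illustration of the alternating structure this abstract describes, here is a minimal sketch of a K-subspaces iteration for the simplest case: one-dimensional subspaces (lines through the origin) in the plane. It is not the authors' implementation. The function names `fit_line`, `residual`, and `k_subspaces` are hypothetical, and the initial assignment is passed in by hand rather than produced by the thresholding inner-product based spectral method mentioned in the abstract.

```python
import math

def fit_line(points):
    """Unit direction of the best-fit line through the origin.

    For 2-D data this is the top principal direction of the 2x2 scatter
    matrix, which has a closed form in terms of its entries.
    """
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))

def residual(point, direction):
    """Squared distance from a point to the line spanned by `direction`."""
    x, y = point
    ux, uy = direction
    proj = x * ux + y * uy
    return x * x + y * y - proj * proj

def k_subspaces(points, labels, k, max_iter=50):
    """Alternate between subspace fitting and point reassignment."""
    labels = list(labels)
    dirs = []
    for _ in range(max_iter):
        # Step 1: fit one line per cluster from its current members.
        dirs = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Assumption: an empty cluster falls back to the x-axis.
            dirs.append(fit_line(members) if members else (1.0, 0.0))
        # Step 2: send every point to the line with the smallest residual.
        new_labels = [
            min(range(k), key=lambda j: residual(p, dirs[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, dirs
```

For example, with points on the two coordinate axes and an initial assignment that mislabels one point per cluster, calling `k_subspaces(points, init_labels, 2)` should recover the axis-aligned grouping. Higher-dimensional subspaces would need an eigendecomposition or SVD in place of the closed-form `fit_line`.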

preprint · 2022 · arXiv

Exact Community Recovery over Signed Graphs

Signed graphs encode similarity and dissimilarity relationships among different entities with positive and negative edges. In this paper, we study the problem of community recovery over signed graphs generated by the signed stochastic block model (SSBM) with two equal-sized communities. Our approach is based on the maximum likelihood estimation (MLE) of the SSBM. Unlike many existing approaches, our formulation reveals that the positive and negative edges of a signed graph should be treated unequally. We then propose a simple two-stage iterative algorithm for solving the regularized MLE. It is shown that in the logarithmic degree regime, the proposed algorithm can exactly recover the underlying communities in nearly-linear time at the information-theoretic limit. Numerical results on both synthetic and real data are reported to validate and complement our theoretical developments and demonstrate the efficacy of the proposed method.

preprint · 2022 · arXiv

On the Finite-Time Complexity and Practical Computation of Approximate Stationarity Concepts of Lipschitz Functions

We report a practical finite-time algorithmic scheme to compute approximately stationary points for nonconvex nonsmooth Lipschitz functions. In particular, we are interested in two kinds of approximate stationarity notions for nonconvex nonsmooth problems, i.e., Goldstein approximate stationarity (GAS) and near-approximate stationarity (NAS). For GAS, our scheme removes the unrealistic subgradient selection oracle assumption in (Zhang et al., 2020, Assumption 1) and computes GAS with the same finite-time complexity. For NAS, Davis & Drusvyatskiy (2019) showed that $\rho$-weakly convex functions admit finite-time computation, while Tian & So (2021) provided the matching impossibility results of dimension-free finite-time complexity for first-order methods. Complementing these developments, in this paper, we isolate a new class of functions that could be Clarke irregular (and thus not weakly convex anymore) and show that our new algorithmic scheme can compute NAS points for functions in that class within finite time. To demonstrate the wide applicability of our new theoretical framework, we show that $\rho$-margin SVM, $1$-layer, and $2$-layer ReLU neural networks, all being Clarke irregular, satisfy our new conditions.

preprint · 2022 · arXiv

Probabilistic Simplex Component Analysis

This study presents PRISM, a probabilistic simplex component analysis approach to identifying the vertices of a data-circumscribing simplex from data. The problem has a rich variety of applications, the most notable being hyperspectral unmixing in remote sensing and non-negative matrix factorization in machine learning. PRISM uses a simple probabilistic model, namely, uniform simplex data distribution and additive Gaussian noise, and it carries out inference by maximum likelihood. The inference model is sound in the sense that the vertices are provably identifiable under some assumptions, and it suggests that PRISM can be effective in combating noise when the number of data points is large. PRISM has strong, but hidden, relationships with simplex volume minimization, a powerful geometric approach for the same problem. We study these fundamental aspects, and we also consider algorithmic schemes based on importance sampling and variational inference. In particular, the variational inference scheme is shown to resemble a matrix factorization problem with a special regularizer, which draws an interesting connection to the matrix factorization approach. Numerical results are provided to demonstrate the potential of PRISM.

preprint · 2022 · arXiv

Riemannian Natural Gradient Methods

This paper studies large-scale optimization problems on Riemannian manifolds whose objective function is a finite sum of negative log-probability losses. Such problems arise in various machine learning and signal processing applications. By introducing the notion of Fisher information matrix in the manifold setting, we propose a novel Riemannian natural gradient method, which can be viewed as a natural extension of the natural gradient method from the Euclidean setting to the manifold setting. We establish the almost-sure global convergence of our proposed method under standard assumptions. Moreover, we show that if the loss function satisfies certain convexity and smoothness conditions and the input-output map satisfies a Riemannian Jacobian stability condition, then our proposed method enjoys a local linear -- or, under the Lipschitz continuity of the Riemannian Jacobian of the input-output map, even quadratic -- rate of convergence. We then prove that the Riemannian Jacobian stability condition will be satisfied by a two-layer fully connected neural network with batch normalization with high probability, provided that the width of the network is sufficiently large. This demonstrates the practical relevance of our convergence rate result. Numerical experiments on applications arising from machine learning demonstrate the advantages of the proposed method over state-of-the-art ones.

preprint · 2022 · arXiv

SISAL Revisited

Simplex identification via split augmented Lagrangian (SISAL) is a popularly-used algorithm in blind unmixing of hyperspectral images. Developed by José M. Bioucas-Dias in 2009, the algorithm is fundamentally relevant to tackling simplex-structured matrix factorization, and by extension, non-negative matrix factorization, which have many applications under their umbrellas. In this article, we revisit SISAL and provide new meanings to this quintessential algorithm. The formulation of SISAL was motivated from a geometric perspective, with no noise. We show that SISAL can be explained as an approximation scheme from a probabilistic simplex component analysis framework, which is statistical and is principally more powerful in accommodating the presence of noise. The algorithm for SISAL was designed based on a successive convex approximation method, with a focus on practical utility. It was not known, by analyses, whether the SISAL algorithm has any kind of guarantee of convergence to a stationary point. By establishing associations between the SISAL algorithm and a line-search-based proximal gradient method, we confirm that SISAL can indeed guarantee convergence to a stationary point. Our re-explanation of SISAL also reveals new formulations and algorithms. The performance of these new possibilities is demonstrated by numerical experiments.

preprint · 2022 · arXiv

Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part II

In Part I of this work, we have proposed a general framework of decentralized stochastic quasi-Newton methods, which converge linearly to the optimal solution under the assumption that the local Hessian inverse approximations have bounded positive eigenvalues. In Part II, we specify two fully decentralized stochastic quasi-Newton methods, damped regularized limited-memory DFP (Davidon-Fletcher-Powell) and damped limited-memory BFGS (Broyden-Fletcher-Goldfarb-Shanno), to locally construct such Hessian inverse approximations without extra sampling or communication. Both of the methods use a fixed moving window of $M$ past local gradient approximations and local decision variables to adaptively construct positive definite Hessian inverse approximations with bounded eigenvalues, satisfying the assumption in Part I for the linear convergence. For the proposed damped regularized limited-memory DFP, a regularization term is added to improve the performance. For the proposed damped limited-memory BFGS, a two-loop recursion is applied, leading to low storage and computation complexity. Numerical experiments demonstrate that the proposed quasi-Newton methods are much faster than the existing decentralized stochastic first-order algorithms.
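The two-loop recursion mentioned in this abstract is a standard device in limited-memory BFGS. The sketch below shows the textbook, centralized version only, assuming a window of stored pairs `(s, y)` ordered oldest first. It does not reproduce the damping, regularization, or decentralized construction proposed in the paper, and the helper names `dot` and `two_loop` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def two_loop(grad, s_list, y_list):
    """Apply the L-BFGS inverse-Hessian approximation to `grad`.

    s_list[i] is a past change in the decision variable and y_list[i] the
    matching change in the gradient, oldest pair first. Only the pairs are
    stored, never a full matrix, which keeps storage and computation low.
    """
    q = list(grad)
    rhos = [1.0 / dot(y, s) for s, y in zip(s_list, y_list)]
    history = list(zip(s_list, y_list, rhos))

    # First loop: newest pair to oldest pair.
    alphas = []
    for s, y, rho in reversed(history):
        alpha = rho * dot(s, q)
        alphas.append(alpha)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]

    # Initial scaling from the most recent pair (identity if no history).
    if history:
        gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]

    # Second loop: oldest pair to newest pair.
    for (s, y, rho), alpha in zip(history, reversed(alphas)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r
```

The returned vector plays the role of the quasi-Newton direction (up to sign). The recursion assumes every stored pair satisfies `dot(y, s) > 0`, which is the curvature condition that damping schemes are typically designed to enforce.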

preprint · 2021 · arXiv

Local Strong Convexity of Source Localization and Error Bound for Target Tracking under Time-of-Arrival Measurements

In this paper, we consider a time-varying optimization approach to the problem of tracking a moving target using noisy time-of-arrival (TOA) measurements. Specifically, we formulate the problem as that of sequential TOA-based source localization and apply online gradient descent (OGD) to it to generate the position estimates of the target. To analyze the tracking performance of OGD, we first revisit the classic least-squares formulation of the (static) TOA-based source localization problem and elucidate its estimation and geometric properties. In particular, under standard assumptions on the TOA measurement model, we establish a bound on the distance between an optimal solution to the least-squares formulation and the true target position. Using this bound, we show that the loss function in the formulation, albeit non-convex in general, is locally strongly convex at its global minima. To the best of our knowledge, these results are new and can be of independent interest. By combining them with existing techniques from online strongly convex optimization, we then establish the first non-trivial bound on the cumulative target tracking error of OGD. Our numerical results corroborate the theoretical findings and show that OGD can effectively track the target at different noise levels.
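To make the tracking scheme in this abstract concrete, the following is a minimal sketch of online gradient descent on the least-squares TOA loss in two dimensions. It assumes the TOA measurements have already been converted to ranges (distances to known anchor positions) and takes one gradient step per time step. The names `toa_gradient` and `ogd_track`, the step size, and the anchor layout in the usage note are illustrative assumptions, not taken from the paper.

```python
import math

def toa_gradient(x, anchors, ranges):
    """Gradient of sum_i (||x - a_i|| - r_i)^2 at position x."""
    gx, gy = 0.0, 0.0
    for (ax, ay), r in zip(anchors, ranges):
        dx, dy = x[0] - ax, x[1] - ay
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # the loss is not differentiable at an anchor; skip
        scale = 2.0 * (dist - r) / dist
        gx += scale * dx
        gy += scale * dy
    return gx, gy

def ogd_track(anchors, range_stream, x0, step=0.1):
    """One gradient step per incoming set of range measurements.

    range_stream yields, for each time step, the list of measured ranges
    to the anchors. Returns the position estimate after every step.
    """
    x = tuple(x0)
    estimates = []
    for ranges in range_stream:
        gx, gy = toa_gradient(x, anchors, ranges)
        x = (x[0] - step * gx, x[1] - step * gy)
        estimates.append(x)
    return estimates
```

With three non-collinear anchors, noise-free ranges to a fixed target, and a starting point near the target, the estimates should settle on the target position. For a moving target, `range_stream` would carry a fresh set of measurements at each time step, so each step chases the current loss rather than a fixed one.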

preprint · 2021 · arXiv

Quartic Perturbation-based Outage-constrained Robust Design in Two-hop One-way Relay Networks

In this work, we study a classic robust design problem in a two-hop one-way relay system. We are particularly interested in the scenario where channel uncertainty exists in both the transmitter-to-relay and relay-to-receiver links. By considering the design problem that minimizes the average amplify-and-forward power budget at the relay side while satisfying SNR outage requirements, an outage-constrained robust design problem involving quartic perturbations is formulated to guarantee robustness during transmission. This problem is in general difficult as it involves constraints on the tail probability of a high-order polynomial. Herein, we resort to a moment inequality and a Bernstein-type inequality to tackle this problem, which provide convex restrictions, or safe approximations, of the original design. We also analyze the relative tightness of the two safe approximations for a quadratic perturbation-based outage-constrained problem. Our analysis shows that the Bernstein-type inequality approach is less conservative than the moment inequality approach when the outage rate is within some prescribed regime. To the best of our knowledge, this is the first provable tightness result for these two safe approximations. Our numerical simulations verify the superiority of the robust design and corroborate the tightness results.

preprint · 2020 · arXiv

A Newton Tracking Algorithm with Exact Linear Convergence Rate for Decentralized Consensus Optimization

This paper considers the decentralized consensus optimization problem defined over a network where each node holds a second-order differentiable local objective function. Our goal is to minimize the summation of local objective functions and find the exact optimal solution using only local computation and neighboring communication. We propose a novel Newton tracking algorithm, where each node updates its local variable along a local Newton direction modified with neighboring and historical information. We investigate the connections between the proposed Newton tracking algorithm and several existing methods, including gradient tracking and second-order algorithms. Under the strong convexity assumption, we prove that it converges to the exact optimal solution at a linear rate. Numerical experiments demonstrate the efficacy of Newton tracking and validate the theoretical findings.

preprint · 2020 · arXiv

Understanding Notions of Stationarity in Non-Smooth Optimization

Many contemporary applications in signal processing and machine learning give rise to structured non-convex non-smooth optimization problems that can often be tackled by simple iterative methods quite effectively. One of the keys to understanding such a phenomenon, and in fact one of the very difficult conundrums even for experts, lies in the study of "stationary points" of the problem in question. Unlike smooth optimization, for which the definition of a stationary point is rather standard, there is a myriad of definitions of stationarity in non-smooth optimization. In this article, we give an introduction to different stationarity concepts for several important classes of non-convex non-smooth functions, discuss their geometric interpretations, and further clarify the relationships among these different concepts. We then demonstrate the relevance of these constructions in some representative applications and how they could affect the performance of iterative methods for tackling these applications.

preprint · 2020 · arXiv

Voting-Based Multi-Agent Reinforcement Learning for Intelligent IoT

The recent success of single-agent reinforcement learning (RL) in Internet of things (IoT) systems motivates the study of multi-agent reinforcement learning (MARL), which is more challenging but more useful in large-scale IoT. In this paper, we consider a voting-based MARL problem, in which the agents vote to make group decisions and the goal is to maximize the globally averaged returns. To this end, we formulate the MARL problem based on the linear programming form of the policy optimization problem and propose a distributed primal-dual algorithm to obtain the optimal solution. We also propose a voting mechanism through which the distributed learning achieves the same sublinear convergence rate as centralized learning. In other words, the distributed decision making does not slow down the process of achieving global consensus on optimality. Lastly, we verify the convergence of our proposed algorithm with numerical simulations and conduct case studies in practical multi-agent IoT systems.