Source author record

Lili Su

Lili Su appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Distributed, Parallel, and Cluster Computing; Machine Learning; math.OC; Information Theory; math.IT; Artificial Intelligence; Data Structures and Algorithms; math.NA; Networking and Internet Architecture; Numerical Analysis; physics.comp-ph; quant-ph

Catalog footprint

What is connected

13 works

12 topics

4 close collaborators

preprint, 2026, arXiv

Collaborative Yet Personalized Policy Training: Single-Timescale Federated Actor-Critic

Despite the popularity of the actor-critic method and the practical needs of collaborative policy training, existing works typically either overlook environmental heterogeneity or give up personalization altogether by training a single shared policy across all agents. We consider a federated actor-critic framework in which agents share a common linear subspace representation while maintaining personalized local policy components, and agents iteratively estimate the common subspace, local critic heads, and local policies (i.e., actors). Under canonical single-timescale updates with Markovian sampling, we establish finite-time convergence via a novel joint linear approximation framework. Specifically, we show that the critic error converges to zero at the rate of $\tilde{\mathcal{O}}(1/((1-γ)^4\sqrt{TK}))$, and the policy gradient norm converges to zero at the rate of $\tilde{\mathcal{O}}(1/((1-γ)^6\sqrt{TK}))$, where $T$ is the number of rounds, $K$ is the number of agents, and $γ\in (0,1)$ is the discount factor. These results demonstrate linear speedup with respect to the number of agents $K$, despite heterogeneous Markovian trajectories under distinct transition kernels and coupled learning dynamics. To address these challenges, we develop a new perturbation analysis for the projected subspace updates and QR decomposition steps, together with conditional mixing arguments for heterogeneous Markovian noise. Furthermore, to handle the additional complications induced by policy updates and temporal dependence, we establish fine-grained characterizations of the discrepancies between function evaluations under Markovian sampling and under temporally frozen policies. Experiments instantiate the framework within PPO on federated \texttt{Hopper-v5} action-map heterogeneity, showing gains over Single PPO and FedAvg PPO and downstream transfer from the learned shared trunk.
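As a hedged reading of the critic rate stated above, one can solve for the number of rounds needed to reach a target accuracy. The symbol $\varepsilon$ is introduced here for illustration only, and the logarithmic factors and constants hidden in $\tilde{\mathcal{O}}$ are ignored:

```latex
\frac{1}{(1-\gamma)^4\sqrt{TK}} \le \varepsilon
\quad\Longleftrightarrow\quad
T \ge \frac{1}{K\,(1-\gamma)^8\,\varepsilon^2}.
```

Under this simplified reading, doubling the number of agents $K$ halves the number of rounds $T$ required for the same accuracy, which is what linear speedup in $K$ refers to.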

preprint, 2023, arXiv

Nonlocalization of singular potentials in quantum dynamics

Nonlocal modeling has drawn increasing attention and has become steadily more powerful in scientific computing. In this paper, we demonstrate the superiority of a first-principle nonlocal model, the Wigner function, in treating singular potentials which are often used to model the interaction between point charges in quantum science. The nonlocal nature of the Wigner equation is fully exploited to convert the singular potential into the Wigner kernel with weak or even no singularity, and thus highly accurate numerical approximations are achievable, which are hard to design when the singular potential is taken into account in the local Schrödinger equation. The Dirac delta function, the logarithmic, and the inverse power potentials are considered. Numerically converged Wigner functions under all these singular potentials are obtained with an operator splitting spectral method, and display many interesting quantum behaviors as well.

preprint, 2022, arXiv

A Non-parametric View of FedAvg and FedProx: Beyond Stationary Points

Federated Learning (FL) is a promising decentralized learning framework and has great potential in privacy preservation and in lowering the computation load at the cloud. Recent work showed that FedAvg and FedProx, the two widely adopted FL algorithms, fail to reach the stationary points of the global optimization objective even for homogeneous linear regression problems. Further, there is concern that the common model learned might not generalize well locally at all in the presence of heterogeneity. In this paper, we analyze the convergence and statistical efficiency of FedAvg and FedProx, addressing the above two concerns. Our analysis is based on the standard non-parametric regression in a reproducing kernel Hilbert space (RKHS), and allows for heterogeneous local data distributions and unbalanced local datasets. We prove that the estimation errors, measured in either the empirical norm or the RKHS norm, decay with a rate of 1/t in general and exponentially for finite-rank kernels. In certain heterogeneous settings, these upper bounds also imply that both FedAvg and FedProx achieve the optimal error rate. To further analytically quantify the impact of the heterogeneity at each client, we propose and characterize a novel notion, federation gain, defined as the reduction of the estimation error for a client when it joins the FL system. We discover that when the data heterogeneity is moderate, a client with limited local data can benefit from a common model with a large federation gain. Numerical experiments further corroborate our theoretical findings.
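For readers unfamiliar with FedAvg, the following is a minimal sketch of its round structure under strong simplifying assumptions that are not taken from the paper: a scalar linear model y = w * x with squared loss, full-batch local gradient steps, and server averaging weighted by local sample counts. The function names and hyperparameters are hypothetical, and the RKHS setting analyzed in the abstract is not reproduced here.

```python
def local_steps(w, data, lr, steps):
    """Run a few full-batch gradient steps on one client's squared loss."""
    for _ in range(steps):
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg(clients, rounds, lr=0.05, steps=5, w0=0.0):
    """Assumed FedAvg loop: broadcast, local training, weighted averaging."""
    total = sum(len(data) for data in clients)
    w = w0
    for _ in range(rounds):
        local_models = [local_steps(w, data, lr, steps) for data in clients]
        # The server averages local models, weighting by local dataset size.
        w = sum(len(d) * wk for d, wk in zip(clients, local_models)) / total
    return w


# Two unbalanced clients whose noiseless data share the same slope 2.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w_hat = fedavg(clients, rounds=50)
```

In this noiseless example every client shares the same minimizer, so the averaged model approaches it. The non-stationarity and heterogeneity effects that the abstract discusses arise in settings this toy example does not cover.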

preprint, 2022, arXiv

Experimental Design Networks: A Paradigm for Serving Heterogeneous Learners under Networking Constraints

Significant advances in edge computing capabilities enable learning to occur at geographically diverse locations. In general, the training data needed in those learning tasks are not only heterogeneous but also not fully generated locally. In this paper, we propose an experimental design network paradigm, wherein learner nodes train possibly different Bayesian linear regression models via consuming data streams generated by data source nodes over a network. We formulate this problem as a social welfare optimization problem in which the global objective is defined as the sum of experimental design objectives of individual learners, and the decision variables are the data transmission strategies subject to network constraints. We first show that, assuming Poisson data streams, the global objective is a continuous DR-submodular function. We then propose a Frank-Wolfe type algorithm that outputs a solution within a 1-1/e factor from the optimal. Our algorithm contains a novel gradient estimation component which is carefully designed based on Poisson tail bounds and sampling. Finally, we complement our theoretical findings through extensive experiments. Our numerical evaluation shows that the proposed algorithm outperforms several baseline algorithms both in maximizing the global objective and in the quality of the trained models.

preprint, 2022, arXiv

Global Convergence of Federated Learning for Mixed Regression

This paper studies the problem of model training under Federated Learning when clients exhibit cluster structure. We contextualize this problem in mixed regression, where each client has limited local data generated from one of $k$ unknown regression models. We design an algorithm that achieves global convergence from any initialization, and works even when local data volume is highly unbalanced -- there could exist clients that contain $O(1)$ data points only. Our algorithm first runs moment descent on a few anchor clients (each with $\tilde{\Omega}(k)$ data points) to obtain coarse model estimates. Then each client alternately estimates its cluster labels and refines the model estimates based on FedAvg or FedProx. A key innovation in our analysis is a uniform estimate on the clustering errors, which we prove by bounding the VC dimension of general polynomial concept classes based on the theory of algebraic geometry.

preprint, 2016, arXiv

Defending Non-Bayesian Learning against Adversarial Attacks

This paper addresses the problem of non-Bayesian learning over multi-agent networks, where agents repeatedly collect partially informative observations about an unknown state of the world, and try to collaboratively learn the true state. We focus on the impact of the adversarial agents on the performance of consensus-based non-Bayesian learning, where non-faulty agents combine local learning updates with consensus primitives. In particular, we consider the scenario where an unknown subset of agents suffer Byzantine faults -- agents suffering Byzantine faults behave arbitrarily. Two different learning rules are proposed.

preprint, 2015, arXiv

Byzantine Multi-Agent Optimization: Part I

We study Byzantine fault-tolerant distributed optimization of a sum of convex (cost) functions with real-valued scalar input/output. In particular, the goal is to optimize a global cost function $\frac{1}{|\mathcal{N}|}\sum_{i\in \mathcal{N}} h_i(x)$, where $\mathcal{N}$ is the set of non-faulty agents, and $h_i(x)$ is agent $i$'s local cost function, which is initially known only to agent $i$. In general, when some of the agents may be Byzantine faulty, the above goal is unachievable, because the identity of the faulty agents is not necessarily known to the non-faulty agents, and the faulty agents may behave arbitrarily. Since the above global cost function cannot be optimized exactly in the presence of Byzantine agents, we define a weaker version of the problem. The goal for the weaker problem is to generate an output that is an optimum of a function formed as a convex combination of local cost functions of the non-faulty agents. More precisely, for some choice of weights $α_i$ for $i\in \mathcal{N}$ such that $α_i\geq 0$ and $\sum_{i\in \mathcal{N}}α_i=1$, the output must be an optimum of the cost function $\sum_{i\in \mathcal{N}} α_ih_i(x)$. Ideally, we would like $α_i=\frac{1}{|\mathcal{N}|}$ for all $i\in \mathcal{N}$ -- however, this cannot be guaranteed due to the presence of faulty agents. In fact, we show that the maximum achievable number of nonzero weights ($α_i$'s) is $|\mathcal{N}|-f$, where $f$ is the upper bound on the number of Byzantine agents. In addition, we present algorithms that ensure that at least $|\mathcal{N}|-f$ agents have weights that are bounded away from 0. We also propose a low-complexity suboptimal algorithm, which ensures that at least $\lceil \frac{n}{2}\rceil-ϕ$ agents have weights that are bounded away from 0, where $n$ is the total number of agents, and $ϕ$ ($ϕ\le f$) is the actual number of Byzantine agents.

preprint, 2015, arXiv

Byzantine Multi-Agent Optimization: Part II

In Part I of this report, we introduced a Byzantine fault-tolerant distributed optimization problem whose goal is to optimize a sum of convex (cost) functions with real-valued scalar input/output. In this second part, we introduce a condition-based variant of the original problem over arbitrary directed graphs. Specifically, for a given collection of $k$ input functions $h_1(x), \ldots, h_k(x)$, we consider the scenario when the local cost function stored at agent $j$, denoted by $g_j(x)$, is formed as a convex combination of the $k$ input functions $h_1(x), \ldots, h_k(x)$. The goal of this condition-based problem is to generate an output that is an optimum of $\frac{1}{k}\sum_{i=1}^k h_i(x)$. Depending on the availability of side information at each agent, two slightly different variants are considered. We show that for a given graph, the problem can indeed be solved despite the presence of faulty agents. In particular, even in the absence of side information at each agent, when adequate redundancy is available in the optima of input functions, a distributed algorithm is proposed in which each agent carries minimal state across iterations.

preprint, 2015, arXiv

Fault-Tolerant Distributed Optimization (Part IV): Constrained Optimization with Arbitrary Directed Networks

We study the problem of constrained distributed optimization in multi-agent networks when some of the computing agents may be faulty. In this problem, the system goal is to have all the non-faulty agents collectively minimize a global objective given by a weighted average of local cost functions, each of which is initially known to a non-faulty agent only. In particular, we are interested in the scenario when the computing agents are connected by an arbitrary directed communication network, some of the agents may suffer from crash faults or Byzantine faults, and the estimate of each agent is restricted to lie in a common constraint set. This problem finds its applications in social computing and distributed large-scale machine learning. The fault-tolerant multi-agent optimization problem was first formulated by Su and Vaidya, and is solved when the local functions are defined over the whole real line, and the networks are fully-connected. In this report, we consider arbitrary directed communication networks and focus on the scenario where local estimates at the non-faulty agents are constrained, and only local communication and minimal memory carried across iterations are allowed. In particular, we generalize our previous results on fully-connected networks and unconstrained optimization to arbitrary directed networks and constrained optimization. As a byproduct, we provide a matrix representation for iterative approximate crash consensus. The matrix representation allows us to characterize the convergence rate for crash iterative consensus.

preprint, 2015, arXiv

Fault-Tolerant Multi-Agent Optimization: Part III

We study fault-tolerant distributed optimization of a sum of convex (cost) functions with real-valued scalar input/output in the presence of crash faults or Byzantine faults. In particular, the goal is to optimize a global cost function $\frac{1}{n}\sum_{i\in \mathcal{V}} h_i(x)$, where $\mathcal{V}=\{1, \ldots, n\}$ is the collection of agents, and $h_i(x)$ is agent $i$'s local cost function, which is initially known only to agent $i$. Since the above global cost function cannot be optimized exactly in the presence of crash faults or Byzantine faults, we define two weaker versions of the problem for crash faults and Byzantine faults, respectively. When some agents may crash, the goal for the weaker problem is to generate an output that is an optimum of a function formed as $$C(\sum_{i\in \mathcal{N}} h_i(x)+\sum_{i\in \mathcal{F}} α_i h_i(x)),$$ where $\mathcal{N}$ is the set of non-faulty agents, $\mathcal{F}$ is the set of faulty agents (crashed agents), $0\le α_i\le 1$ for each $i\in \mathcal{F}$ and $C$ is a normalization constant such that $C(|\mathcal{N}|+\sum_{i\in \mathcal{F}} α_i)=1$. We present an iterative algorithm in which each agent only needs to perform local computation, and send one message per iteration. When some agents may be Byzantine, the system cannot take full advantage of the data kept by non-faulty agents. The goal for the associated weaker problem is to generate an output that is an optimum of a function formed as $$\sum_{i\in \mathcal{N}}α_i h_i(x),$$ such that $α_i\geq 0$ for each $i\in \mathcal{N}$ and $\sum_{i\in \mathcal{N}}α_i=1$. We present an iterative algorithm, where only local computation is needed and only one message per agent is sent in each iteration, that ensures that at least $|\mathcal{N}|-f$ agents have weights ($α_i$'s) that are lower bounded by $\frac{1}{2(|\mathcal{N}|-f)}$.
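The normalization constant $C$ in the crash-fault objective follows directly from the stated condition. The numbers in the worked instance below are chosen here purely for illustration and do not come from the report:

```latex
C\Big(|\mathcal{N}|+\sum_{i\in \mathcal{F}} \alpha_i\Big)=1
\quad\Longrightarrow\quad
C=\frac{1}{|\mathcal{N}|+\sum_{i\in \mathcal{F}} \alpha_i},
\qquad
\frac{1}{n}\le C\le \frac{1}{|\mathcal{N}|}.

% Illustrative instance: |N| = 3 non-faulty agents and one crashed agent with weight 1/2.
C=\frac{1}{3+\tfrac{1}{2}}=\frac{2}{7}.
```

The bounds use $0\le \alpha_i\le 1$ and $n=|\mathcal{N}|+|\mathcal{F}|$: the upper end corresponds to every crashed agent's function being dropped entirely, and the lower end to every crashed agent's function being fully counted.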

preprint, 2015, arXiv

Reaching Approximate Byzantine Consensus with Multi-hop Communication

We address the problem of reaching consensus in the presence of Byzantine faults. In particular, we are interested in investigating the impact of message relay on the network connectivity for a correct iterative approximate Byzantine consensus algorithm to exist. The network is modeled by a simple directed graph. We assume a node can send messages to another node that is up to $l$ hops away via forwarding by the intermediate nodes on the routes, where $l\in \mathbb{N}$ is a natural number. We characterize the necessary and sufficient topological conditions on the network structure. The tight conditions we found are consistent with the tight conditions identified for $l=1$, where only local communication is allowed, and are strictly weaker for $l>1$. Let $l^*$ denote the length of a longest path in the given network. For $l\ge l^*$ and undirected graphs, our conditions hold if and only if $n\ge 3f+1$ and the node-connectivity of the given graph is at least $2f+1$, where $n$ is the total number of nodes and $f$ is the maximal number of Byzantine nodes; and for $l\ge l^*$ and directed graphs, our conditions are equivalent to the tight condition found for exact Byzantine consensus. Our sufficiency is shown by constructing a correct algorithm, wherein the trim function is constructed based on investigating a newly introduced minimal messages cover property. The trim function proposed also works over multi-graphs.
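To make the idea of a trim function concrete, here is a minimal sketch of a generic trim-and-average update on a fully connected, single-hop network. It is not the paper's trim function or its multi-hop algorithm: the function names, the complete-graph assumption, and the attack model (a hypothetical `attack` callback that lets a Byzantine node send a different value to each receiver) are all assumptions made for illustration.

```python
def trimmed_update(own, received, f):
    """Discard the f smallest and f largest received values, then average."""
    if len(received) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 received values to trim")
    kept = sorted(received)[f:len(received) - f]
    # The node always keeps its own value and averages it with the survivors.
    return (own + sum(kept)) / (1 + len(kept))


def simulate(initial, byzantine, f, rounds, attack):
    """Iterate the update for honest nodes on an assumed complete graph."""
    honest = {i: v for i, v in initial.items() if i not in byzantine}
    for r in range(rounds):
        nxt = {}
        for i, own in honest.items():
            received = [honest[j] for j in honest if j != i]
            # Byzantine nodes may report arbitrary, receiver-specific values.
            received += [attack(b, i, r) for b in sorted(byzantine)]
            nxt[i] = trimmed_update(own, received, f)
        honest = nxt
    return honest


# n = 4 nodes and f = 1 Byzantine node, so n >= 3f + 1 holds.
initial = {0: 0.0, 1: 1.0, 2: 2.0, 3: 99.0}
final = simulate(
    initial,
    byzantine={3},
    f=1,
    rounds=50,
    attack=lambda sender, receiver, rnd: 1000.0 if receiver % 2 == 0 else -1000.0,
)
```

In this toy run the honest values stay inside the range of the honest inputs and move toward each other, which is the validity and convergence behavior that iterative approximate consensus aims for.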

preprint, 2014, arXiv

Computing Similarity Distances Between Rankings

We address the problem of computing distances between rankings that take into account similarities between candidates. The need for evaluating such distances is governed by applications as diverse as rank aggregation, bioinformatics, social sciences and data storage. The problem may be summarized as follows: Given two rankings and a positive cost function on transpositions that depends on the similarity of the candidates involved, find a smallest cost sequence of transpositions that converts one ranking into another. Our focus is on costs that may be described via special metric-tree structures and on complete rankings modeled as permutations. The presented results include a quadratic-time algorithm for finding a minimum cost decomposition for simple cycles, and a quadratic-time, $4/3$-approximation algorithm for permutations that contain multiple cycles. The proposed methods rely on investigating a newly introduced balancing property of cycles embedded in trees, cycle-merging methods, and shortest path optimization techniques.
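The abstract models complete rankings as permutations and reasons about their cycles. As a minimal sketch of that representation only, the code below decomposes the permutation that maps one ranking onto another into cycles. Under the simplifying assumption that every transposition costs 1 (the paper instead studies similarity-dependent costs on metric trees, which this sketch does not implement), the minimum number of transpositions equals the number of items minus the number of cycles, with fixed points counted as cycles. All function names are hypothetical.

```python
def relative_permutation(source, target):
    """Return p where p[i] is the position in target of source[i]."""
    position = {item: k for k, item in enumerate(target)}
    if len(position) != len(target) or len(source) != len(target):
        raise ValueError("rankings must be complete and free of duplicates")
    if set(source) != set(position):
        raise ValueError("rankings must contain the same items")
    return [position[item] for item in source]


def cycles(perm):
    """Decompose a permutation (list of indices) into disjoint cycles."""
    seen = [False] * len(perm)
    result = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycle, j = [], start
        while not seen[j]:
            seen[j] = True
            cycle.append(j)
            j = perm[j]
        result.append(cycle)
    return result


def unit_cost_distance(source, target):
    """Minimum number of transpositions when each one is assumed to cost 1."""
    perm = relative_permutation(source, target)
    return len(perm) - len(cycles(perm))


two_swaps = unit_cost_distance(["a", "b", "c", "d"], ["b", "a", "d", "c"])
three_cycle = unit_cost_distance(["a", "b", "c"], ["b", "c", "a"])
```

With non-uniform costs the cheapest decomposition of a cycle generally depends on which items are transposed, which is the harder problem the abstract addresses.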

preprint, 2014, arXiv

Synchronizing Rankings via Interactive Communication

We consider the problem of exact synchronization of two rankings at remote locations connected by a two-way channel. Such synchronization problems arise when items in the data are distinguishable, as is the case for playlists, tasklists, crowdvotes and recommender system rankings. Our model accounts for different constraints on the communication throughput of the forward and feedback links, resulting in different anchoring, syndrome and checksum computation strategies. Information editing is assumed to take the form of deletions, insertions, block deletions/insertions, translocations and transpositions. The protocols developed under the given model are order-optimal with respect to genie-aided lower bounds.