Source author record

Shusen Wang

Shusen Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Numerical Analysis Artificial Intelligence Discrete Mathematics math.NA math.OC Mathematical Software Multiagent Systems Social and Information Networks

Catalog footprint

What is connected

14works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Federated Reinforcement Learning with Environment Heterogeneity

We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction. We stress the constraint of environment heterogeneity, which means $n$ environments corresponding to these $n$ agents have different state transitions. To obtain a value function or a policy function which optimizes the overall performance in all environments, we propose two federated RL algorithms, \texttt{QAvg} and \texttt{PAvg}. We theoretically prove that these algorithms converge to suboptimal solutions, while such suboptimality depends on how heterogeneous these $n$ environments are. Moreover, we propose a heuristic that achieves personalization by embedding the $n$ environments into $n$ vectors. The personalization heuristic not only improves the training but also allows for better generalization to new environments.

preprint2022arXiv

Transferable Multi-Agent Reinforcement Learning with Dynamic Participating Agents

We study multi-agent reinforcement learning (MARL) with centralized training and decentralized execution. During the training, new agents may join, and existing agents may unexpectedly leave the training. In such situations, a standard deep MARL model must be trained again from scratch, which is very time-consuming. To tackle this problem, we propose a special network architecture with a few-shot learning algorithm that allows the number of agents to vary during centralized training. In particular, when a new agent joins the centralized training, our few-shot learning algorithm trains its policy network and value network using a small number of samples; when an agent leaves the training, the training process of the remaining agents is not affected. Our experiments show that using the proposed network architecture and algorithm, model adaptation when new agents join can be 100+ times faster than the baseline. Our work is applicable to any setting, including cooperative, competitive, and mixed.

preprint2020arXiv

On the Convergence of FedAvg on Non-IID Data

Federated learning enables a large amount of edge computing devices to jointly learn a model without data sharing. As a leading algorithm in this setting, Federated Averaging (\texttt{FedAvg}) runs Stochastic Gradient Descent (SGD) in parallel on a small subset of the total devices and averages the sequences only once in a while. Despite its simplicity, it lacks theoretical guarantees under realistic settings. In this paper, we analyze the convergence of \texttt{FedAvg} on non-iid data and establish a convergence rate of $\mathcal{O}(\frac{1}{T})$ for strongly convex and smooth problems, where $T$ is the number of SGDs. Importantly, our bound demonstrates a trade-off between communication-efficiency and convergence rate. As user devices may be disconnected from the server, we relax the assumption of full device participation to partial device participation and study different averaging schemes; low device participation rate can be achieved without severely slowing down the learning. Our results indicate that heterogeneity of data slows down the convergence, which matches empirical observations. Furthermore, we provide a necessary condition for \texttt{FedAvg} on non-iid data: the learning rate $η$ must decay, even if full-gradient is used; otherwise, the solution will be $Ω(η)$ away from the optimal.

preprint2019arXiv

Graph Message Passing with Cross-location Attentions for Long-term ILI Prediction

Forecasting influenza-like illness (ILI) is of prime importance to epidemiologists and health-care providers. Early prediction of epidemic outbreaks plays a pivotal role in disease intervention and control. Most existing work has either limited long-term prediction performance or lacks a comprehensive ability to capture spatio-temporal dependencies in data. Accurate and early disease forecasting models would markedly improve both epidemic prevention and managing the onset of an epidemic. In this paper, we design a cross-location attention based graph neural network (Cola-GNN) for learning time series embeddings and location aware attentions. We propose a graph message passing framework to combine learned feature embeddings and an attention matrix to model disease propagation over time. We compare the proposed method with state-of-the-art statistical approaches and deep learning models on real-world epidemic-related datasets from United States and Japan. The proposed method shows strong predictive performance and leads to interpretable results for long-term epidemic predictions.

preprint2016arXiv

SPSD Matrix Approximation vis Column Selection: Theories, Algorithms, and Extensions

Symmetric positive semidefinite (SPSD) matrix approximation is an important problem with applications in kernel methods. However, existing SPSD matrix approximation methods such as the Nyström method only have weak error bounds. In this paper we conduct in-depth studies of an SPSD matrix approximation model and establish strong relative-error bounds. We call it the prototype model for it has more efficient and effective extensions, and some of its extensions have high scalability. Though the prototype model itself is not suitable for large-scale data, it is still useful to study its properties, on which the analysis of its extensions relies. This paper offers novel theoretical analysis, efficient algorithms, and a highly accurate extension. First, we establish a lower error bound for the prototype model and improve the error bound of an existing column selection algorithm to match the lower bound. In this way, we obtain the first optimal column selection algorithm for the prototype model. We also prove that the prototype model is exact under certain conditions. Second, we develop a simple column selection algorithm with a provable error bound. Third, we propose a so-called spectral shifting model to make the approximation more accurate when the eigenvalues of the matrix decay slowly, and the improvement is theoretically quantified. The spectral shifting method can also be applied to improve other SPSD matrix approximation models.

preprint2016arXiv

Towards More Efficient SPSD Matrix Approximation and CUR Matrix Decomposition

Symmetric positive semi-definite (SPSD) matrix approximation methods have been extensively used to speed up large-scale eigenvalue computation and kernel learning methods. The standard sketch based method, which we call the prototype model, produces relatively accurate approximations, but is inefficient on large square matrices. The Nyström method is highly efficient, but can only achieve low accuracy. In this paper we propose a novel model that we call the {\it fast SPSD matrix approximation model}. The fast model is nearly as efficient as the Nyström method and as accurate as the prototype model. We show that the fast model can potentially solve eigenvalue problems and kernel learning problems in linear time with respect to the matrix size $n$ to achieve $1+ε$ relative-error, whereas both the prototype model and the Nyström method cost at least quadratic time to attain comparable error bound. Empirical comparisons among the prototype model, the Nyström method, and our fast model demonstrate the superiority of the fast model. We also contribute new understandings of the Nyström method. The Nyström method is a special instance of our fast model and is approximation to the prototype model. Our technique can be straightforwardly applied to make the CUR matrix decomposition more efficiently computed without much affecting the accuracy.

preprint2015arXiv

A Practical Guide to Randomized Matrix Computations with MATLAB Implementations

Matrix operations such as matrix inversion, eigenvalue decomposition, singular value decomposition are ubiquitous in real-world applications. Unfortunately, many of these matrix operations so time and memory expensive that they are prohibitive when the scale of data is large. In real-world applications, since the data themselves are noisy, machine-precision matrix operations are not necessary at all, and one can sacrifice a reasonable amount of accuracy for computational efficiency. In recent years, a bunch of randomized algorithms have been devised to make matrix computations more scalable. Mahoney (2011) and Woodruff (2014) have written excellent but very technical reviews of the randomized algorithms. Differently, the focus of this manuscript is on intuition, algorithm derivation, and implementation. This manuscript should be accessible to people with knowledge in elementary matrix algebra but unfamiliar with randomized matrix computations. The algorithms introduced in this manuscript are all summarized in a user-friendly way, and they can be implemented in lines of MATLAB code. The readers can easily follow the implementations even if they do not understand the maths and algorithms.

preprint2015arXiv

Adjusting Leverage Scores by Row Weighting: A Practical Approach to Coherent Matrix Completion

Low-rank matrix completion is an important problem with extensive real-world applications. When observations are uniformly sampled from the underlying matrix entries, existing methods all require the matrix to be incoherent. This paper provides the first working method for coherent matrix completion under the standard uniform sampling model. Our approach is based on the weighted nuclear norm minimization idea proposed in several recent work, and our key contribution is a practical method to compute the weighting matrices so that the leverage scores become more uniform after weighting. Under suitable conditions, we are able to derive theoretical results, showing the effectiveness of our approach. Experiments on synthetic data show that our approach recovers highly coherent matrices with high precision, whereas the standard unweighted method fails even on noise-free data.

preprint2015arXiv

Improved Analyses of the Randomized Power Method and Block Lanczos Method

The power method and block Lanczos method are popular numerical algorithms for computing the truncated singular value decomposition (SVD) and eigenvalue decomposition problems. Especially in the literature of randomized numerical linear algebra, the power method is widely applied to improve the quality of randomized sketching, and relative-error bounds have been well established. Recently, Musco & Musco (2015) proposed a block Krylov subspace method that fully exploits the intermediate results of the power iteration to accelerate convergence. They showed spectral gap-independent bounds which are stronger than the power method by order-of-magnitude. This paper offers novel error analysis techniques and significantly improves the bounds of both the randomized power method and the block Lanczos method. This paper also establishes the first gap-independent bound for the warm-start block Lanczos method.

preprint2014arXiv

Efficient Algorithms and Error Analysis for the Modified Nystrom Method

Many kernel methods suffer from high time and space complexities and are thus prohibitive in big-data applications. To tackle the computational challenge, the Nyström method has been extensively used to reduce time and space complexities by sacrificing some accuracy. The Nyström method speedups computation by constructing an approximation of the kernel matrix using only a few columns of the matrix. Recently, a variant of the Nyström method called the modified Nyström method has demonstrated significant improvement over the standard Nyström method in approximation accuracy, both theoretically and empirically. In this paper, we propose two algorithms that make the modified Nyström method practical. First, we devise a simple column selection algorithm with a provable error bound. Our algorithm is more efficient and easier to implement than and nearly as accurate as the state-of-the-art algorithm. Second, with the selected columns at hand, we propose an algorithm that computes the approximation in lower time complexity than the approach in the previous work. Furthermore, we prove that the modified Nyström method is exact under certain conditions, and we establish a lower error bound for the modified Nyström method.

preprint2014arXiv

Sharpened Error Bounds for Random Sampling Based $\ell_2$ Regression

Given a data matrix $X \in R^{n\times d}$ and a response vector $y \in R^{n}$, suppose $n>d$, it costs $O(n d^2)$ time and $O(n d)$ space to solve the least squares regression (LSR) problem. When $n$ and $d$ are both large, exactly solving the LSR problem is very expensive. When $n \gg d$, one feasible approach to speeding up LSR is to randomly embed $y$ and all columns of $X$ into a smaller subspace $R^c$; the induced LSR problem has the same number of columns but much fewer number of rows, and it can be solved in $O(c d^2)$ time and $O(c d)$ space. We discuss in this paper two random sampling based methods for solving LSR more efficiently. Previous work showed that the leverage scores based sampling based LSR achieves $1+ε$ accuracy when $c \geq O(d ε^{-2} \log d)$. In this paper we sharpen this error bound, showing that $c = O(d \log d + d ε^{-1})$ is enough for achieving $1+ε$ accuracy. We also show that when $c \geq O(μd ε^{-2} \log d)$, the uniform sampling based LSR attains a $2+ε$ bound with positive probability.

preprint2013arXiv

Improving CUR Matrix Decomposition and the Nyström Approximation via Adaptive Sampling

The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nyström approximation. In this paper we establish a more general error bound for the adaptive column/row sampling algorithm, based on which we propose more accurate CUR and Nyström algorithms with expected relative-error bounds. The proposed CUR and Nyström algorithms also have low time complexity and can avoid maintaining the whole data matrix in RAM. In addition, we give theoretical analysis for the lower error bounds of the standard Nyström method and the ensemble Nyström method. The main theoretical results established in this paper are novel, and our analysis makes no special assumption on the data matrices.

preprint2012arXiv

A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

The CUR matrix decomposition is an important extension of Nyström approximation to a general matrix. It approximates any data matrix in terms of a small number of its columns and rows. In this paper we propose a novel randomized CUR algorithm with an expected relative-error bound. The proposed algorithm has the advantages over the existing relative-error CUR algorithms that it possesses tighter theoretical bound and lower time complexity, and that it can avoid maintaining the whole data matrix in main memory. Finally, experiments on several real-world datasets demonstrate significant improvement over the existing relative-error algorithms.

preprint2012arXiv

EP-GIG Priors and Applications in Bayesian Sparse Learning

In this paper we propose a novel framework for the construction of sparsity-inducing priors. In particular, we define such priors as a mixture of exponential power distributions with a generalized inverse Gaussian density (EP-GIG). EP-GIG is a variant of generalized hyperbolic distributions, and the special cases include Gaussian scale mixtures and Laplace scale mixtures. Furthermore, Laplace scale mixtures can subserve a Bayesian framework for sparse learning with nonconvex penalization. The densities of EP-GIG can be explicitly expressed. Moreover, the corresponding posterior distribution also follows a generalized inverse Gaussian distribution. These properties lead us to EM algorithms for Bayesian sparse learning. We show that these algorithms bear an interesting resemblance to iteratively re-weighted $\ell_2$ or $\ell_1$ methods. In addition, we present two extensions for grouped variable selection and logistic regression.

Shusen Wang

What is connected

Connect this record

See the researcher in context

Building this map preview

14 published item(s)

Federated Reinforcement Learning with Environment Heterogeneity

Transferable Multi-Agent Reinforcement Learning with Dynamic Participating Agents

On the Convergence of FedAvg on Non-IID Data

Graph Message Passing with Cross-location Attentions for Long-term ILI Prediction

SPSD Matrix Approximation vis Column Selection: Theories, Algorithms, and Extensions

Towards More Efficient SPSD Matrix Approximation and CUR Matrix Decomposition

A Practical Guide to Randomized Matrix Computations with MATLAB Implementations

Adjusting Leverage Scores by Row Weighting: A Practical Approach to Coherent Matrix Completion

Improved Analyses of the Randomized Power Method and Block Lanczos Method

Efficient Algorithms and Error Analysis for the Modified Nystrom Method

Sharpened Error Bounds for Random Sampling Based $\ell_2$ Regression

Improving CUR Matrix Decomposition and the Nyström Approximation via Adaptive Sampling

A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

EP-GIG Priors and Applications in Bayesian Sparse Learning