Source author record

Sanvesh Srivastava

Sanvesh Srivastava appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Computation · math.ST · Statistics Theory · Distributed, Parallel, and Cluster Computing · Machine Learning · Methodology

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators


Preprint · 2022 · arXiv

Distributed Bayesian Varying Coefficient Modeling Using a Gaussian Process Prior

Varying coefficient models (VCMs) are widely used for estimating nonlinear regression functions for functional data. Their Bayesian variants using Gaussian process priors on the functional coefficients, however, have received limited attention in massive data applications, mainly due to the prohibitively slow posterior computations using Markov chain Monte Carlo (MCMC) algorithms. We address this problem using a divide-and-conquer Bayesian approach. We first create a large number of data subsamples with much smaller sizes. Then, we formulate the VCM as a linear mixed-effects model and develop a data augmentation algorithm for obtaining MCMC draws on all the subsets in parallel. Finally, we aggregate the MCMC-based estimates of subset posteriors into a single Aggregated Monte Carlo (AMC) posterior, which is used as a computationally efficient alternative to the true posterior distribution. Theoretically, we derive minimax optimal posterior convergence rates for the AMC posteriors of both the varying coefficients and the mean regression function. We provide quantification on the orders of subset sample sizes and the number of subsets. The empirical results show that the combination schemes that satisfy our theoretical assumptions, including the AMC posterior, have better estimation performance than their main competitors across diverse simulations and in a real data analysis.

Preprint · 2020 · arXiv

An Algorithm for Distributed Bayesian Inference in Generalized Linear Models

Monte Carlo algorithms, such as Markov chain Monte Carlo (MCMC) and Hamiltonian Monte Carlo (HMC), are routinely used for Bayesian inference in generalized linear models; however, these algorithms are prohibitively slow in massive data settings because they require multiple passes through the full data in every iteration. Addressing this problem, we develop a scalable extension of these algorithms using the divide-and-conquer (D&C) technique that divides the data into a sufficiently large number of subsets, draws parameters in parallel on the subsets using a "powered" likelihood, and produces Monte Carlo draws of the parameter by combining parameter draws obtained from each subset. These combined parameter draws play the role of draws from the original sampling algorithm. Our main contributions are two-fold. First, we demonstrate through diverse simulated and real data analyses that our distributed algorithm is comparable to the current state-of-the-art D&C algorithm in terms of statistical accuracy and computational efficiency. Second, providing theoretical support for our empirical observations, we identify regularity assumptions under which the proposed algorithm leads to asymptotically optimal inference. We illustrate our methodology through normal linear and logistic regressions, where parts of our D&C algorithm are analytically tractable.
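The effect of a powered likelihood is easiest to see in a conjugate toy model. The following is a minimal sketch, not the paper's algorithm: it assumes a normal mean with known variance `sigma2` and a normal prior `N(mu0, tau02)`, so each subset posterior is available in closed form and no sampler is needed. The function names, the default prior values, and the combination rule (averaging subset means and standard deviations, which is the one-dimensional Wasserstein barycenter of Gaussians) are illustrative assumptions and are not taken from the abstract.

```python
from typing import List, Sequence, Tuple


def powered_subset_posterior(
    y_subset: Sequence[float],
    power: float,
    sigma2: float = 1.0,   # known observation variance (assumption)
    mu0: float = 0.0,      # prior mean (assumption)
    tau02: float = 100.0,  # prior variance (assumption)
) -> Tuple[float, float]:
    """Posterior (mean, variance) of a normal mean when the subset
    likelihood is raised to `power`.

    Raising the likelihood of m observations to the power k makes the
    subset behave like k * m observations, so the subset posterior has
    roughly the same spread as the full-data posterior.
    """
    precision = 1.0 / tau02 + power * len(y_subset) / sigma2
    mean = (mu0 / tau02 + power * sum(y_subset) / sigma2) / precision
    return mean, 1.0 / precision


def combine_gaussian_posteriors(
    posteriors: List[Tuple[float, float]],
) -> Tuple[float, float]:
    """Illustrative combination rule: average the subset means and the
    subset standard deviations, then return (mean, variance)."""
    k = len(posteriors)
    mean = sum(m for m, _ in posteriors) / k
    sd = sum(v ** 0.5 for _, v in posteriors) / k
    return mean, sd ** 2
```

With two equal-sized subsets and `power=2`, the combined result in this toy model coincides with the full-data posterior obtained from `powered_subset_posterior(y, power=1)`. That agreement is a property of the conjugate Gaussian setting and should not be expected to hold exactly for logistic regression or other generalized linear models.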

Preprint · 2016 · arXiv

Robust and Scalable Bayes via a Median of Subset Posterior Measures

We propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods. Our technique is based on splitting the data into non-overlapping subgroups, evaluating the posterior distribution given each independent subgroup, and then combining the resulting measures. The main novelty of our approach is the proposed aggregation step, which is based on the evaluation of a median in the space of probability measures equipped with a suitable collection of distances that can be quickly and efficiently evaluated in practice. We present both theoretical and numerical evidence illustrating the improvements achieved by our method.
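To give a feel for why a median-type aggregate resists outliers, here is a simplified sketch. It is not the paper's method: the abstract describes a median in a space of probability measures, whereas this example assumes each subset posterior has already been reduced to a finite-dimensional summary vector (for example its posterior mean) and computes the ordinary Euclidean geometric median with Weiszfeld's iteration. The function name and its parameters are hypothetical.

```python
import math
from typing import List, Sequence


def geometric_median(
    points: Sequence[Sequence[float]],
    max_iter: int = 500,
    tol: float = 1e-10,
    eps: float = 1e-12,
) -> List[float]:
    """Weiszfeld's algorithm for the Euclidean geometric median.

    Each point stands in for one subset posterior summary (an
    assumption made for illustration only).
    """
    dim = len(points[0])
    n = len(points)
    # Start from the coordinate-wise mean.
    current = [sum(p[d] for p in points) / n for d in range(dim)]
    for _ in range(max_iter):
        # Weight each point by the inverse of its distance to the
        # current estimate; eps guards against division by zero.
        weights = [1.0 / max(math.dist(p, current), eps) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current
```

For the summaries `(0, 0)`, `(1, 0)`, `(0, 1)`, `(1, 1)` and one corrupted subset at `(50, 50)`, the coordinate-wise mean is pulled out to `(10.4, 10.4)`, while the geometric median stays inside the unit square next to the four uncorrupted points.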

Preprint · 2016 · arXiv

Simple, Scalable and Accurate Posterior Interval Estimation

There is a lack of simple and scalable algorithms for uncertainty quantification. Bayesian methods quantify uncertainty through posterior and predictive distributions, but it is difficult to rapidly estimate summaries of these distributions, such as quantiles and intervals. Variational Bayes approximations are widely used, but may badly underestimate posterior covariance. Typically, the focus of Bayesian inference is on point and interval estimates for one-dimensional functionals of interest. In small scale problems, Markov chain Monte Carlo algorithms remain the gold standard, but such algorithms face major problems in scaling up to big data. Various modifications have been proposed based on parallelization and approximations based on subsamples, but such approaches are either highly complex or lack theoretical support and/or good performance outside of narrow settings. We propose a very simple and general posterior interval estimation algorithm, which is based on running Markov chain Monte Carlo in parallel for subsets of the data and averaging quantiles estimated from each subset. We provide strong theoretical guarantees and illustrate performance in several applications.
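The quantile-averaging step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the MCMC draws for a one-dimensional functional have already been produced on each subset, and the function names and the linear-interpolation quantile rule are choices made here for the example.

```python
from typing import Sequence, Tuple


def empirical_quantile(draws: Sequence[float], p: float) -> float:
    """Sample quantile with linear interpolation, for 0 <= p <= 1."""
    xs = sorted(draws)
    if not xs:
        raise ValueError("need at least one draw")
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1.0 - frac) + xs[hi] * frac


def averaged_interval(
    subset_draws: Sequence[Sequence[float]],
    level: float = 0.95,
) -> Tuple[float, float]:
    """Average the lower and upper quantiles across subsets to get a
    single equal-tailed interval at the requested level."""
    alpha = (1.0 - level) / 2.0
    k = len(subset_draws)
    lower = sum(empirical_quantile(d, alpha) for d in subset_draws) / k
    upper = sum(empirical_quantile(d, 1.0 - alpha) for d in subset_draws) / k
    return lower, upper
```

For example, `averaged_interval([[0, 1, 2, 3, 4], [2, 3, 4, 5, 6]], level=0.5)` averages the subset quartiles `(1, 3)` and `(3, 5)` to give `(2.0, 4.0)`. In practice each inner list would hold thousands of draws from a sampler run on one data subset.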
