Source author record

Natesh S. Pillai

Natesh S. Pillai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

math.PR, math.ST, Statistics Theory, Computation, Methodology, Applications, cond-mat.soft, Data Structures and Algorithms, math-ph, math.DS, math.MP, math.NA, Quantitative Methods

Catalog footprint

What is connected

38 works

13 topics

4 close collaborators

preprint 2020 arXiv

Fast and memory-optimal dimension reduction using Kac's walk

In this work, we analyze dimension reduction algorithms based on the Kac walk and discrete variants. (1) For $n$ points in $\mathbb{R}^{d}$, we design an optimal Johnson-Lindenstrauss (JL) transform based on the Kac walk which can be applied to any vector in time $O(d\log{d})$ for essentially the same restriction on $n$ as in the best-known transforms due to Ailon and Liberty [SODA, 2008], and Bamberger and Krahmer [arXiv, 2017]. Our algorithm is memory-optimal, and outperforms existing algorithms in regimes when $n$ is sufficiently large and the distortion parameter is sufficiently small. In particular, this confirms a conjecture of Ailon and Chazelle [STOC, 2006] in a stronger form. (2) The same construction gives a simple transform with optimal Restricted Isometry Property (RIP) which can be applied in time $O(d\log{d})$ for essentially the same range of sparsity as in the best-known such transform due to Ailon and Rauhut [Discrete Comput. Geom., 2014]. (3) We show that by fixing the angle in the Kac walk to be $π/4$ throughout, one obtains optimal JL and RIP transforms with almost the same running time, thereby confirming -- up to a $\log\log{d}$ factor -- a conjecture of Avron, Maymounkov, and Toledo [SIAM J. Sci. Comput., 2010]. Our moment-based analysis of this modification of the Kac walk may also be of independent interest.
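A minimal sketch of the basic mechanism, not the paper's construction: each step of the Kac walk rotates a randomly chosen pair of coordinates, so the Euclidean norm is preserved exactly. The helper names, the number of steps, and the final "keep the first k coordinates and rescale" projection are illustrative assumptions. Passing a fixed `theta` (for example `math.pi / 4`) mimics the fixed-angle variant mentioned in item (3).

```python
import math
import random

def kac_step(x, rng, theta=None):
    """Rotate a random coordinate pair (i, j) of x in place by angle theta."""
    i, j = rng.sample(range(len(x)), 2)
    if theta is None:
        theta = rng.uniform(0.0, 2.0 * math.pi)  # random angle by default
    c, s = math.cos(theta), math.sin(theta)
    # Right-hand side is evaluated before assignment, so old values are used.
    x[i], x[j] = c * x[i] - s * x[j], s * x[i] + c * x[j]

def kac_sketch(x, k, steps, seed=0):
    """Apply `steps` Kac rotations, then keep k rescaled coordinates.

    The projection step and the choice of `steps` are assumptions made
    for illustration only.
    """
    rng = random.Random(seed)  # same seed => same rotations for every input
    y = list(x)
    for _ in range(steps):
        kac_step(y, rng)
    scale = math.sqrt(len(y) / k)
    return [scale * v for v in y[:k]]

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))
```

Reusing one seed for all input vectors matters here: pairwise distances can only be compared if every point passes through the same sequence of rotations.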

preprint 2020 arXiv

Universality and least singular values of random matrix products: a simplified approach

In this note, we show how to provide sharp control on the least singular value of a certain translated linearization matrix arising in the study of the local universality of products of independent random matrices. This problem was first considered in a recent work of Koppel, O'Rourke, and Vu, and compared to their work, our proof is substantially simpler and established in much greater generality. In particular, we only assume that the entries of the ensemble are centered, and have second and fourth moments uniformly bounded away from $0$ and infinity, whereas previous work assumed a uniform subgaussian decay condition and that the entries within each factor of the product are identically distributed. A consequence of our least singular value bound is that the four moment matching universality results for the products of independent random matrices, recently obtained by Koppel, O'Rourke, and Vu, hold under much weaker hypotheses. Our proof technique is also of independent interest in the study of structured sparse matrices.

preprint 2016 arXiv

Elementary Bounds On Mixing Times for Decomposable Markov Chains

Many finite-state reversible Markov chains can be naturally decomposed into "projection" and "restriction" chains. In this paper we provide bounds on the total variation mixing times of the original chain in terms of the mixing properties of these related chains. This paper is in the tradition of existing bounds on Poincaré and log-Sobolev constants of Markov chains in terms of similar decompositions [JSTV04, MR02, MR06, MY09]. Our proofs are simple, relying largely on recent results relating hitting and mixing times of reversible Markov chains [PS13, Oli12]. We describe situations in which our results give substantially better bounds than those obtained by applying existing decomposition results and provide examples for illustration.

preprint 2016 arXiv

Kac's Walk on $n$-sphere mixes in $n\log n$ steps

Determining the mixing time of Kac's random walk on the sphere $\mathrm{S}^{n-1}$ is a long-standing open problem. We show that the total variation mixing time of Kac's walk on $\mathrm{S}^{n-1}$ is between $\frac{1}{2} \, n \log(n)$ and $200 \,n \log(n)$. Our bound is thus optimal up to a constant factor, improving on the best-known upper bound of $O(n^{5} \log(n)^{2})$ due to Jiang. Our main tool is a `non-Markovian' coupling recently introduced by the second author for obtaining the convergence rates of certain high dimensional Gibbs samplers in continuous state spaces.

preprint 2016 arXiv

Maximum Likelihood Estimation for Single Particle, Passive Microrheology Data with Drift

Volume limitations and low yield thresholds of biological fluids have led to widespread use of passive microparticle rheology. The mean-squared-displacement (MSD) statistics of bead position time series (bead paths) are either applied directly to determine the creep compliance [Xu et al (1998)] or transformed to determine dynamic storage and loss moduli [Mason & Weitz (1995)]. A prevalent hurdle arises when there is a non-diffusive experimental drift in the data. Commensurate with the magnitude of drift relative to diffusive mobility, quantified by a Péclet number, the MSD statistics are distorted, and thus the path data must be "corrected" for drift. The standard approach is to estimate and subtract the drift from particle paths, and then calculate MSD statistics. We present an alternative, parametric approach using maximum likelihood estimation that simultaneously fits drift and diffusive model parameters from the path data; the MSD statistics (and consequently the compliance and dynamic moduli) then follow directly from the best-fit model. We illustrate and compare both methods on simulated path data over a range of Péclet numbers, where exact answers are known. We choose fractional Brownian motion as the numerical model because it affords tunable, sub-diffusive MSD statistics consistent with typical 30 second long, experimental observations of microbeads in several biological fluids. Finally, we apply and compare both methods on data from human bronchial epithelial cell culture mucus.
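The "standard approach" described above (estimate the drift, subtract it, then compute MSD statistics) can be sketched as follows for a one-dimensional path. This is only an illustration: the endpoint-based drift estimator and the function names are assumptions, and the parametric maximum likelihood method proposed in the abstract is not reproduced here.

```python
def msd(path, lag):
    """Time-averaged mean squared displacement of a 1-D path at an integer lag."""
    n = len(path) - lag
    return sum((path[k + lag] - path[k]) ** 2 for k in range(n)) / n

def subtract_linear_drift(path):
    """Remove a constant-velocity drift estimated from the two endpoints.

    The endpoint estimator is an assumption chosen for simplicity.
    """
    steps = len(path) - 1
    v = (path[-1] - path[0]) / steps  # estimated drift per time step
    return [x - v * k for k, x in enumerate(path)]
```

For a purely drifting path the raw MSD grows quadratically in the lag, while the corrected path has zero MSD, which is the distortion the drift correction is meant to remove.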

preprint 2016 arXiv

More Powerful Multiple Testing in Randomized Experiments with Non-Compliance

Two common concerns raised in analyses of randomized experiments are (i) appropriately handling issues of non-compliance, and (ii) appropriately adjusting for multiple tests (e.g., on multiple outcomes or subgroups). Although simple intention-to-treat (ITT) and Bonferroni methods are valid in terms of type I error, they can each lead to a substantial loss of power; when employing both simultaneously, the total loss may be severe. Alternatives exist to address each concern. Here we propose an analysis method for experiments involving both features that merges posterior predictive $p$-values for complier causal effects with randomization-based multiple comparisons adjustments; the results are valid familywise tests that are doubly advantageous: more powerful than both those based on standard ITT statistics and those using traditional multiple comparison adjustments. The operating characteristics and advantages of our method are demonstrated through a series of simulated experiments and an analysis of the United States Job Training Partnership Act (JTPA) Study, where our methods lead to different conclusions regarding the significance of estimated JTPA effects.

preprint 2016 arXiv

On the Mixing Time of Kac's Walk and Other High-Dimensional Gibbs Samplers with Constraints

Determining the total variation mixing time of Kac's random walk on the special orthogonal group $\mathrm{SO}(n)$ has been a long-standing open problem. In this paper, we construct a novel non-Markovian coupling for bounding this mixing time. The analysis of our coupling entails controlling the smallest singular value of a certain random matrix with highly dependent entries. The dependence of the entries in our matrix makes it not-amenable to existing techniques in random matrix theory. To circumvent this difficulty, we extend some recent bounds on the smallest singular values of matrices with independent entries to our setting. These bounds imply that the mixing time of Kac's walk on the group $\mathrm{SO}(n)$ is between $C_{1} n^{2}$ and $C_{2} n^{4} \log(n)$ for some explicit constants $0 < C_{1}, C_{2} < \infty$, substantially improving on the bound of $O(n^{5} \log(n)^{2})$ by Jiang. Our methods may also be applied to other high dimensional Gibbs samplers with constraints and thus are of independent interest. In addition to giving analytical bounds on the mixing time, our approach allows us to compute rigorous estimates of the mixing time by simulating the eigenvalues of a random matrix.

preprint 2016 arXiv

Parallel Markov Chain Monte Carlo via Spectral Clustering

As it has become common to use many computer cores in routine applications, finding good ways to parallelize popular algorithms has become increasingly important. In this paper, we present a parallelization scheme for Markov chain Monte Carlo (MCMC) methods based on spectral clustering of the underlying state space, generalizing earlier work on parallelization of MCMC methods by state space partitioning. We show empirically that this approach speeds up MCMC sampling for multimodal distributions and that it can be usefully applied in greater generality than several related algorithms. Our algorithm converges under reasonable conditions to an `optimal' MCMC algorithm. We also show that our approach can be asymptotically far more efficient than naive parallelization, even in situations such as completely flat target distributions where no unique optimal algorithm exists. Finally, we combine theoretical and empirical bounds to provide practical guidance on the choice of tuning parameters.

preprint 2016 arXiv

Ratios and Cauchy Distribution

It is well known that the ratio of two independent standard Gaussian random variables follows a Cauchy distribution. Any convex combination of independent standard Cauchy random variables also follows a Cauchy distribution. In a recent joint work, the author proved a surprising multivariate generalization of the above facts. Fix $m > 1$ and let $Σ$ be an $m\times m$ positive semi-definite matrix. Let $X,Y \sim \mathrm{N}(0,Σ)$ be independent vectors. Let $\vec{w}=(w_1, \dots, w_m)$ be a vector of non-negative numbers with $\sum_{j=1}^m w_j = 1.$ The author proved recently that the random variable $$ Z = \sum_{j=1}^m w_j\frac{X_j}{Y_j}\; $$ also has the standard Cauchy distribution. In this note, we provide some more understanding of this result and give a number of natural generalizations. In particular, we observe that if $(X,Y)$ have the same marginal distribution, they need neither be independent nor be jointly normal for $Z$ to be Cauchy distributed. In fact, our calculations suggest that joint normality of $(X,Y)$ may be the only instance in which they can be independent. Our results also give a method to construct copulas of Cauchy distributions.
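The result can be checked numerically with a small Monte Carlo sketch. The setup below is an assumption chosen for illustration: $m = 2$, unit variances, correlation `rho`, and fixed weights. Since a standard Cauchy variable satisfies $P(|Z| \le 1) = 1/2$, the fraction of draws with $|Z| \le 1$ should be close to one half whatever the correlation.

```python
import math
import random

def sample_z(weights, rho, rng):
    """Draw Z = w1*X1/Y1 + w2*X2/Y2 with X, Y iid N(0, [[1, rho], [rho, 1]])."""
    def gaussian_pair():
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Second coordinate has unit variance and correlation rho with the first.
        return g1, rho * g1 + math.sqrt(1.0 - rho * rho) * g2
    x1, x2 = gaussian_pair()
    y1, y2 = gaussian_pair()
    return weights[0] * x1 / y1 + weights[1] * x2 / y2

def fraction_within_unit(weights, rho, n, seed=0):
    """Empirical estimate of P(|Z| <= 1) from n independent draws."""
    rng = random.Random(seed)
    hits = sum(abs(sample_z(weights, rho, rng)) <= 1.0 for _ in range(n))
    return hits / n
```

Calling `fraction_within_unit((0.3, 0.7), 0.6, 20000)` gives an estimate that can be compared with 0.5; a single probability is of course only a sanity check, not a test of the full distribution.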

preprint 2016 arXiv

Sub-optimality of some continuous shrinkage priors

Two-component mixture priors provide a traditional way to induce sparsity in high-dimensional Bayes models. However, several aspects of such a prior, including computational complexities in high-dimensions, interpretation of exact zeros and non-sparse posterior summaries under standard loss functions, have motivated an amazing variety of continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians. Interestingly, we demonstrate that many commonly used shrinkage priors, including the Bayesian Lasso, do not have adequate posterior concentration in high-dimensional settings.

preprint 2015 arXiv

An unexpected encounter with Cauchy and Lévy

The Cauchy distribution is usually presented as a mathematical curiosity, an exception to the Law of Large Numbers, or even as an "Evil" distribution in some introductory courses. It therefore surprised us when Drton and Xiao (2016) proved the following result for $m=2$ and conjectured it for $m\ge 3$. Let $X= (X_1,..., X_m)$ and $Y = (Y_1, ...,Y_m)$ be i.i.d. $N(0,Σ)$, where $Σ=\{σ_{ij}\}\ge 0$ is an \textit{arbitrary} $m\times m$ covariance matrix with $σ_{jj}>0$ for all $1\leq j\leq m$. Then $$Z = \sum_{j=1}^m w_j \frac{X_j}{Y_j} \ \sim \mathrm{Cauchy}(0,1),$$ as long as $w=(w_1,..., w_m)$ is independent of $(X, Y)$, $w_j\ge 0, j=1,..., m$, and $\sum_{j=1}^m w_j=1$. In this note, we present an elementary proof of this conjecture for any $m \geq 2$ by linking $Z$ to a geometric characterization of Cauchy(0,1) given in Williams (1969). This general result is essential to the large sample behavior of Wald tests in many applications such as factor models and contingency tables. It also leads to other unexpected results such as $$ \sum_{i=1}^m\sum_{j=1}^m \frac{w_iw_jσ_{ij}}{X_iX_j} \sim {\text{Lévy}}(0, 1). $$ This generalizes the "super Cauchy phenomenon" that the average of $m$ i.i.d. standard Lévy variables (i.e., inverse chi-squared variables with one degree of freedom) has the same distribution as that of a single standard Lévy variable multiplied by $m$ (which is obtained by taking $w_j=1/m$ and $Σ$ to be the identity matrix).
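The Lévy statement can be probed with a rough simulation. The sketch below assumes $m = 2$, unit variances and a fixed weight vector, all chosen for illustration. It uses the fact that a standard Lévy variable is distributed as $1/G^2$ with $G \sim N(0,1)$, so its CDF is $F(t) = \operatorname{erfc}(1/\sqrt{2t})$.

```python
import math
import random

def sample_levy_form(weights, rho, rng):
    """Draw sum_{i,j} w_i w_j sigma_ij / (X_i X_j) for m = 2, unit variances."""
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = (g1, rho * g1 + math.sqrt(1.0 - rho * rho) * g2)  # X ~ N(0, Sigma)
    sigma = ((1.0, rho), (rho, 1.0))
    return sum(weights[i] * weights[j] * sigma[i][j] / (x[i] * x[j])
               for i in range(2) for j in range(2))

def levy_cdf(t):
    """CDF of the standard Levy law, i.e. of 1/G^2 with G ~ N(0, 1)."""
    return math.erfc(1.0 / math.sqrt(2.0 * t))

def empirical_cdf_at(t, weights, rho, n, seed=0):
    """Fraction of n simulated draws that are at most t."""
    rng = random.Random(seed)
    return sum(sample_levy_form(weights, rho, rng) <= t for _ in range(n)) / n
```

Comparing `empirical_cdf_at(1.0, (0.3, 0.7), 0.5, 20000)` with `levy_cdf(1.0)` checks the claim at one point of the distribution only.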

preprint 2015 arXiv

Bayesian Nonparametric Weighted Sampling Inference

It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference in the presence of inverse-probability weights. We use a hierarchical approach in which we model the distribution of the weights of the nonsampled units in the population and simultaneously include them as predictors in a nonparametric Gaussian process regression. We use simulation studies to evaluate the performance of our procedure and compare it to the classical design-based estimator. We apply our method to the Fragile Family and Child Wellbeing Study. Our studies find the Bayesian nonparametric finite population estimator to be more robust than the classical design-based estimator without loss in efficiency. This works because we induce regularization for small cells, which automatically smooths the highly variable weights.

preprint 2015 arXiv

Degrees of freedom for combining regression with factor analysis

In the AGEMAP genomics study, researchers were interested in detecting genes related to age in a variety of tissue types. After not finding many age-related genes in some of the analyzed tissue types, the study was criticized for having low power. It is possible that the low power is due to the presence of important unmeasured variables, and indeed we find that a latent factor model appears to explain substantial variability not captured by measured covariates. We propose including the estimated latent factors in a multiple regression model. The key difficulty in doing so is assigning appropriate degrees of freedom to the estimated factors to obtain unbiased error variance estimators and enable valid hypothesis testing. When the number of responses is large relative to the sample size, treating the estimated factors like observed covariates leads to a downward bias in the variance estimates. Many ad-hoc solutions to this problem have been proposed in the literature without the backup of a careful theoretical analysis. Using recent results from random matrix theory, we derive a simple, easy to use expression for degrees of freedom. Our estimate gives a principled alternative to ad-hoc approaches in common use. Extensive simulation results show excellent agreement between the proposed estimator and its theoretical value. Applying our methodology to the AGEMAP genomics study, we found an order of magnitude increase in the number of significant genes. Although we focus on the AGEMAP study, the methods developed in this paper are widely applicable to other multivariate models, and thus are of independent interest.

preprint 2015 arXiv

Ergodicity of Approximate MCMC Chains with Applications to Large Data Sets

In many modern applications, difficulty in evaluating the posterior density makes performing even a single MCMC step slow. This difficulty can be caused by intractable likelihood functions, but also appears for routine problems with large data sets. Many researchers have responded by running approximate versions of MCMC algorithms. In this note, we develop quantitative bounds for showing the ergodicity of these approximate samplers. We then use these bounds to study the bias-variance trade-off of approximate MCMC algorithms. We apply our results to simple versions of recently proposed algorithms, including a variant of the "austerity" framework of Korattikara et al.

preprint 2015 arXiv

Gaussian Process Regression with Location Errors

In this paper, we investigate Gaussian process regression models where inputs are subject to measurement error. In spatial statistics, input measurement errors occur when the geographical locations of observed data are not known exactly. Such sources of error are not special cases of "nugget" or microscale variation, and require alternative methods for both interpolation and parameter estimation. Gaussian process models do not straightforwardly extend to incorporate input measurement error, and simply ignoring noise in the input space can lead to poor performance for both prediction and parameter inference. We review and extend existing theory on prediction and estimation in the presence of location errors, and show that ignoring location errors may lead to Kriging that is not "self-efficient". We also introduce a Markov Chain Monte Carlo (MCMC) approach using the Hybrid Monte Carlo algorithm that obtains optimal (minimum MSE) predictions, and discuss situations that lead to multimodality of the target distribution and/or poor chain mixing. Through simulation study and analysis of global air temperature data, we show that appropriate methods for incorporating location measurement error are essential to valid inference in this regime.

preprint 2015 arXiv

Hypothesis testing for high-dimensional sparse binary regression

In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of the detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with a sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies.

preprint 2015 arXiv

Mixing times for a constrained Ising process on the torus at low density

We study a kinetically constrained Ising process (KCIP) associated with a graph G and density parameter p; this process is an interacting particle system with state space $\{0,1\}^{G}$. The stationary distribution of the KCIP Markov chain is the Binomial($|G|, p$) distribution on the number of particles, conditioned on having at least one particle. The `constraint' in the name of the process refers to the rule that a vertex cannot change its state unless it has at least one neighbour in state `1'. The KCIP has been proposed by statistical physicists as a model for the glass transition, and more recently as a simple algorithm for data storage in computer networks. In this note, we study the mixing time of this process on the torus $G = \mathbb{Z}_{L}^{d}$, $d \geq 3$, in the low-density regime $p = \frac{c}{n}$ for arbitrary $0 < c < 1$; this regime is the subject of a conjecture of Aldous and is natural in the context of computer networks. Our results provide a counterexample to Aldous' conjecture, suggest a natural modification of the conjecture, and show that this modification is correct up to logarithmic factors. The methods developed in this paper also provide a strategy for tackling Aldous' conjecture for other graphs.
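The constraint rule can be illustrated with a toy simulation. This sketch makes several simplifying assumptions that are not taken from the abstract: the graph is a cycle rather than the torus with $d \geq 3$, updates happen in discrete time at a uniformly chosen vertex, and an allowed update resamples the vertex as Bernoulli($p$).

```python
import random

def kcip_step(state, p, rng):
    """One update of a constrained Ising process on a cycle (in place)."""
    n = len(state)
    v = rng.randrange(n)
    # Constraint from the abstract: v may change only if some neighbour is 1.
    if state[(v - 1) % n] == 1 or state[(v + 1) % n] == 1:
        state[v] = 1 if rng.random() < p else 0  # assumed update rule

def run_kcip(n, p, steps, seed=0):
    """Start from a single particle at vertex 0 and run `steps` updates."""
    rng = random.Random(seed)
    state = [0] * n
    state[0] = 1
    for _ in range(steps):
        kcip_step(state, p, rng)
    return state
```

A particle can only be removed while one of its neighbours is occupied, so the configuration never becomes empty. This matches the conditioning on "at least one particle" in the stationary distribution.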

preprint 2015 arXiv

Model comparison and assessment for single particle tracking in biological fluids

State-of-the-art techniques in passive particle-tracking microscopy provide high-resolution path trajectories of diverse foreign particles in biological fluids. For particles on the order of 1 micron diameter, these paths are generally inconsistent with simple Brownian motion. Yet, despite an abundance of data confirming these findings and their wide-ranging scientific implications, stochastic modeling of the complex particle motion has received comparatively little attention. Even among posited models, there is virtually no literature on likelihood-based inference, model comparisons, and other quantitative assessments. In this article, we develop a rigorous and computationally efficient Bayesian methodology to address this gap. We analyze two of the most prevalent candidate models for 30 second paths of 1 micron diameter tracer particles in human lung mucus: fractional Brownian motion (fBM) and a Generalized Langevin Equation (GLE) consistent with viscoelastic theory. Our model comparisons distinctly favor GLE over fBM, with the former describing the data remarkably well up to the timescales for which we have reliable information.

preprint 2014 arXiv

A Function Space HMC Algorithm With Second Order Langevin Diffusion Limit

We describe a new MCMC method optimized for the sampling of probability measures on Hilbert space which have a density with respect to a Gaussian; such measures arise in the Bayesian approach to inverse problems, and in conditioned diffusions. Our algorithm is based on two key design principles: (i) algorithms which are well-defined in infinite dimensions result in methods which do not suffer from the curse of dimensionality when they are applied to approximations of the infinite dimensional target measure on $\mathbb{R}^N$; (ii) non-reversible algorithms can have better mixing properties compared to their reversible counterparts. The method we introduce is based on the hybrid Monte Carlo algorithm, tailored to incorporate these two design principles. The main result of this paper states that the new algorithm, appropriately rescaled, converges weakly to a second order Langevin diffusion on Hilbert space; as a consequence the algorithm explores the approximate target measures on $\mathbb{R}^N$ in a number of steps which is independent of $N$. We also present the underlying theory for the limiting non-reversible diffusion on Hilbert space, including characterization of the invariant measure, and we describe numerical simulations demonstrating that the proposed method has favourable mixing properties as an MCMC algorithm.

preprint 2014 arXiv

A location-mixture autoregressive model for online forecasting of lung tumor motion

Lung tumor tracking for radiotherapy requires real-time, multiple-step ahead forecasting of a quasi-periodic time series recording instantaneous tumor locations. We introduce a location-mixture autoregressive (LMAR) process that admits multimodal conditional distributions, fast approximate inference using the EM algorithm and accurate multiple-step ahead predictive distributions. LMAR outperforms several commonly used methods in terms of out-of-sample prediction accuracy using clinical data from lung tumor patients. With its superior predictive performance and real-time computation, the LMAR model could be effectively implemented for use in current tumor tracking systems.

preprint 2014 arXiv

Dirichlet-Laplace priors for optimal shrinkage

Penalized regression methods, such as $L_1$ regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such priors encounter daunting computational problems in high dimensions. This has motivated an amazing variety of continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians, facilitating computation. In sharp contrast to the frequentist literature, little is known about the properties of such priors and the convergence and concentration of the corresponding posterior distribution. In this article, we propose a new class of Dirichlet--Laplace (DL) priors, which possess optimal posterior concentration and lead to efficient posterior computation exploiting results from normalized random measure theory. Finite sample performance of Dirichlet--Laplace priors relative to alternatives is assessed in simulated and real data examples.

preprint 2014 arXiv

Finite Sample Properties of Adaptive Markov Chains via Curvature

Adaptive Markov chains are an important class of Monte Carlo methods for sampling from probability distributions. The time evolution of adaptive algorithms depends on past samples, and thus these algorithms are non-Markovian. Although there has been previous work establishing conditions for their ergodicity, not much is known theoretically about their finite sample properties. In this paper, using a notion of discrete Ricci curvature for Markov kernels introduced by Ollivier, we establish concentration inequalities and finite sample bounds for a class of adaptive Markov chains. After establishing some general results, we give quantitative bounds for `multi-level' adaptive algorithms such as the equi-energy sampler. We also provide the first rigorous proofs that the finite sample properties of an equi-energy sampler are superior to those of related parallel tempering and Metropolis-Hastings samplers after a learning period comparable to their mixing times.

preprint 2014 arXiv

Gradient Flow from a Random Walk in Hilbert Space

Consider a probability measure on a Hilbert space defined via its density with respect to a Gaussian. The purpose of this paper is to demonstrate that an appropriately defined Markov chain, which is reversible with respect to the measure in question, exhibits a diffusion limit to a noisy gradient flow, also reversible with respect to the same measure. The Markov chain is defined by applying a Metropolis-Hastings accept-reject mechanism to an Ornstein-Uhlenbeck proposal which is itself reversible with respect to the underlying Gaussian measure. The resulting noisy gradient flow is a stochastic partial differential equation driven by a Wiener process with spatial correlation given by the underlying Gaussian structure.
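A finite-dimensional sketch of a Markov chain of this kind is given below. It assumes the reference Gaussian is $N(0, I)$ on $\mathbb{R}^n$ and that the target has density proportional to $\exp(-\Phi(x))$ with respect to it; the names `phi` and `beta` are illustrative, and the exact proposal and scaling used in the paper may differ. The proposal $x' = \sqrt{1-\beta^2}\,x + \beta\,\xi$ with $\xi \sim N(0, I)$ is reversible with respect to the Gaussian, so the acceptance probability depends only on $\Phi$.

```python
import math
import random

def pcn_step(x, phi, beta, rng):
    """One Metropolis-Hastings step with an Ornstein-Uhlenbeck-type proposal.

    Target: density proportional to exp(-phi(x)) w.r.t. N(0, I) (assumed).
    Returns the new state and whether the proposal was accepted.
    """
    root = math.sqrt(1.0 - beta * beta)
    proposal = [root * xi + beta * rng.gauss(0.0, 1.0) for xi in x]
    # The Gaussian part cancels, leaving only the change in phi.
    log_accept = phi(x) - phi(proposal)
    if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
        return proposal, True
    return x, False
```

With `phi` identically zero the target is the Gaussian itself and every proposal is accepted, which is a quick sanity check on the acceptance rule.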

preprint 2014 arXiv

Posterior contraction in sparse Bayesian factor models for massive covariance matrices

Sparse Bayesian factor models are routinely implemented for parsimonious dependence modeling and dimensionality reduction in high-dimensional applications. We provide theoretical understanding of such Bayesian procedures in terms of posterior convergence rates in inferring high-dimensional covariance matrices where the dimension can be larger than the sample size. Under relevant sparsity assumptions on the true covariance matrix, we show that commonly-used point mass mixture priors on the factor loadings lead to consistent estimation in the operator norm even when $p\gg n$. One of our major contributions is to develop a new class of continuous shrinkage priors and provide insights into their concentration around sparse vectors. Using such priors for the factor loadings, we obtain a similar rate of convergence as obtained with point mass mixture priors. To obtain the convergence rates, we construct test functions to separate points in the space of high-dimensional covariance matrices using insights from random matrix theory; the tools developed may be of independent interest. We also derive minimax rates and show that the Bayesian posterior rates of convergence coincide with the minimax rates up to a $\sqrt{\log n}$ term.

preprint 2014 arXiv

Universality of covariance matrices

In this paper we prove the universality of covariance matrices of the form $H_{N\times N}={X}^{\dagger}X$ where $X$ is an ${M\times N}$ rectangular matrix with independent real valued entries $x_{ij}$ satisfying $\mathbb{E}x_{ij}=0$ and $\mathbb{E}x^2_{ij}={\frac{1}{M}}$, $N$, $M\to \infty$. Furthermore it is assumed that these entries have sub-exponential tails or a sufficiently high number of moments. We will study the asymptotics in the regime $N/M=d_N\in(0,\infty),\lim_{N\to\infty}d_N\neq0,\infty$. Our main result is the edge universality of the sample covariance matrix at both edges of the spectrum. In the case $\lim_{N\to\infty}d_N=1$, we only focus on the largest eigenvalue. Our proof is based on a novel version of the Green function comparison theorem for data matrices with dependent entries. En route to proving edge universality, we establish that the Stieltjes transform of the empirical eigenvalue distribution of $H$ is given by the Marcenko-Pastur law uniformly up to the edges of the spectrum with an error of order $(Nη)^{-1}$ where $η$ is the imaginary part of the spectral parameter in the Stieltjes transform. Combining these results with existing techniques we also show bulk universality of covariance matrices. All our results hold for both real and complex valued entries.

preprint 2013 arXiv

Regularity of laws and ergodicity of hypoelliptic SDEs driven by rough paths

We consider differential equations driven by rough paths and study the regularity of the laws and their long time behavior. In particular, we focus on the case when the driving noise is a rough path valued fractional Brownian motion with Hurst parameter $H\in(\frac{1}{3},\frac{1}{2}]$. Our contribution in this work is twofold. First, when the driving vector fields satisfy Hörmander's celebrated "Lie bracket condition," we derive explicit quantitative bounds on the inverse of the Malliavin matrix. En route to this, we provide a novel "deterministic" version of Norris's lemma for differential equations driven by rough paths. This result, with the added assumption that the linearized equation has moments, will then yield that the transition laws have a smooth density with respect to Lebesgue measure. Our second main result states that under Hörmander's condition, the solutions to rough differential equations driven by fractional Brownian motion with $H\in(\frac{1}{3},\frac{1}{2}]$ enjoy a suitable version of the strong Feller property. Under a standard controllability condition, this implies that they admit a unique stationary solution that is physical in the sense that it does not "look into the future."

preprint 2013 arXiv

Statistical Inference for Stochastic Differential Equations with Memory

In this paper we construct a framework for doing statistical inference for discretely observed stochastic differential equations (SDEs) where the driving noise has 'memory'. Classical SDE models for inference assume the driving noise to be Brownian motion, or "white noise", thus implying a Markov assumption. We focus on the case when the driving noise is a fractional Brownian motion, which is a common continuous-time modeling device for capturing long-range memory. Since the likelihood is intractable, we proceed via data augmentation, adapting a familiar discretization and missing data approach developed for the white noise case. In addition to the other SDE parameters, we take the Hurst index to be unknown and estimate it from the data. Posterior sampling is performed via a Hybrid Monte Carlo algorithm on both the parameters and the missing data simultaneously so as to improve mixing. We point out that, due to the long-range correlations of the driving noise, careful discretization of the underlying SDE is necessary for valid inference. Our approach can be adapted to other types of rough-path driving processes such as Gaussian "colored" noise. The methodology is used to estimate the evolution of the memory parameter in US short-term interest rates.

preprint2012arXiv

Bayesian shrinkage

Penalized regression methods, such as $L_1$ regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such priors encounter daunting computational problems in high dimensions. This has motivated an amazing variety of continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians, facilitating computation. In sharp contrast to the corresponding frequentist literature, very little is known about the properties of such priors. Focusing on a broad class of shrinkage priors, we provide precise results on prior and posterior concentration. Interestingly, we demonstrate that most commonly used shrinkage priors, including the Bayesian Lasso, are suboptimal in high-dimensional settings. A new class of Dirichlet Laplace (DL) priors is proposed, which are optimal and lead to efficient posterior computation exploiting results from normalized random measure theory. Finite sample performance of Dirichlet Laplace priors relative to alternatives is assessed in simulations.

preprint2012arXiv

Causal inference from $2^k$ factorial designs using the potential outcomes model

A framework for causal inference from two-level factorial designs is proposed. The framework utilizes the concept of potential outcomes that lies at the center stage of causal inference and extends Neyman's repeated sampling approach for estimation of causal effects and randomization tests based on Fisher's sharp null hypothesis to the case of 2-level factorial experiments. The framework allows for statistical inference from a finite population, permits definition and estimation of estimands other than "average factorial effects" and leads to more flexible inference procedures than those based on ordinary least squares estimation from a linear model.
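
To make the idea of a factorial effect concrete, the sketch below computes a difference-in-means style estimate for a $2^k$ design. It assumes the common convention that a factorial effect is a contrast of average outcomes across the $2^k$ treatment combinations, scaled by $2^{-(k-1)}$; that convention, the function names and the toy outcomes are assumptions for illustration and are not quoted from the abstract.

```python
from itertools import product


def treatment_combinations(k):
    """All 2^k treatment combinations, coding each factor as -1 (low) or +1 (high)."""
    return list(product((-1, 1), repeat=k))


def contrast(combo, factors):
    """Sign of one combination in the contrast for the given factor indices."""
    sign = 1
    for f in factors:
        sign *= combo[f]
    return sign


def estimate_factorial_effect(observed, k, factors):
    """Plug-in estimate from observed group means.

    observed maps each treatment combination to the list of outcomes of the
    units assigned to it. factors is a tuple of factor indices: one index for
    a main effect, several for an interaction.
    """
    total = 0.0
    for combo in treatment_combinations(k):
        outcomes = observed[combo]
        total += contrast(combo, factors) * sum(outcomes) / len(outcomes)
    return total / 2 ** (k - 1)


# Made-up outcomes for a 2^2 experiment with two units per combination.
toy = {
    (-1, -1): [0.0, 2.0],
    (-1, 1): [3.0, 3.0],
    (1, -1): [1.0, 3.0],
    (1, 1): [5.0, 7.0],
}
main_effect_1 = estimate_factorial_effect(toy, 2, (0,))
main_effect_2 = estimate_factorial_effect(toy, 2, (1,))
interaction = estimate_factorial_effect(toy, 2, (0, 1))
```

Because the estimate is built from group means rather than a fitted linear model, other contrasts of the same group means can be substituted without changing the rest of the computation.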

preprint2012arXiv

Diffusion limits of the random walk Metropolis algorithm in high dimensions

Diffusion limits of MCMC methods in high dimensions provide a useful theoretical tool for studying computational complexity. In particular, they lead directly to precise estimates of the number of steps required to explore the target measure, in stationarity, as a function of the dimension of the state space. However, to date such results have mainly been proved for target measures with a product structure, severely limiting their applicability. The purpose of this paper is to study diffusion limits for a class of naturally occurring high-dimensional measures found from the approximation of measures on a Hilbert space which are absolutely continuous with respect to a Gaussian reference measure. The diffusion limit of a random walk Metropolis algorithm to an infinite-dimensional Hilbert space valued SDE (or SPDE) is proved, facilitating understanding of the computational complexity of the algorithm.

preprint2012arXiv

Edge universality of correlation matrices

Let $\widetilde{X}_{M\times N}$ be a rectangular data matrix with independent real-valued entries $[\widetilde{x}_{ij}]$ satisfying $\mathbb {E}\widetilde{x}_{ij}=0$ and $\mathbb {E}\widetilde{x}^2_{ij}=\frac{1}{M}$, $N,M\to\infty$. These entries have a subexponential decay at the tails. We will be working in the regime $N/M=d_N,\lim_{N\to\infty}d_N\neq0,1,\infty$. In this paper we prove the edge universality of correlation matrices ${X}^{\dagger}X$, where the rectangular matrix $X$ (called the standardized matrix) is obtained by normalizing each column of the data matrix $\widetilde{X}$ by its Euclidean norm. Our main result states that asymptotically the $k$-point ($k\geq1$) correlation functions of the extreme eigenvalues (at both edges of the spectrum) of the correlation matrix ${X}^{\dagger}X$ converge to those of the Gaussian correlation matrix, that is, Tracy-Widom law, and, thus, in particular, the largest and the smallest eigenvalues of ${X}^{\dagger}X$ after appropriate centering and rescaling converge to the Tracy-Widom distribution. The asymptotic distribution of extreme eigenvalues of the Gaussian correlation matrix has been worked out only recently. As a corollary of the main result in this paper, we also obtain that the extreme eigenvalues of Gaussian correlation matrices are asymptotically distributed according to the Tracy-Widom law. The proof is based on the comparison of Green functions, but the key obstacle to be surmounted is the strong dependence of the entries of the correlation matrix. We achieve this via a novel argument which involves comparing the moments of product of the entries of the standardized data matrix to those of the raw data matrix. Our proof strategy may be extended for proving the edge universality of other random matrix ensembles with dependent entries and hence is of independent interest.

preprint2012arXiv

Geometric ergodicity of a bead-spring pair with stochastic Stokes forcing

We consider a simple model for the fluctuating hydrodynamics of a flexible polymer in dilute solution, demonstrating geometric ergodicity for a pair of particles that interact with each other through a nonlinear spring potential while being advected by a stochastic Stokes fluid velocity field. This is a generalization of previous models which have used linear spring forces as well as white-in-time fluid velocity fields. We follow previous work combining control theoretic arguments, Lyapunov functions, and hypo-elliptic diffusion theory to prove exponential convergence via a Harris chain argument. In addition we allow the possibility of excluding certain "bad" sets in phase space in which the assumptions are violated but from which the system leaves with a controllable probability. This allows for the treatment of singular drifts, such as those derived from the Lennard-Jones potential, which is a novel feature of this work.

preprint2012arXiv

Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions

The Metropolis-adjusted Langevin (MALA) algorithm is a sampling algorithm which makes local moves by incorporating information about the gradient of the logarithm of the target density. In this paper we study the efficiency of MALA on a natural class of target measures supported on an infinite dimensional Hilbert space. These natural measures have density with respect to a Gaussian random field measure and arise in many applications such as Bayesian nonparametric statistics and the theory of conditioned diffusions. We prove that, started in stationarity, a suitably interpolated and scaled version of the Markov chain corresponding to MALA converges to an infinite dimensional diffusion process. Our results imply that, in stationarity, the MALA algorithm applied to an N-dimensional approximation of the target will take $\mathcal{O}(N^{1/3})$ steps to explore the invariant measure, comparing favorably with the Random Walk Metropolis which was recently shown to require $\mathcal{O}(N)$ steps when applied to the same class of problems.
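
A minimal sketch of one MALA step may help fix ideas. It uses a standard Gaussian in $\mathbb{R}^N$ as a stand-in target (not the Hilbert-space measures studied in the paper) and takes the step size to be $\delta = \ell^2 N^{-1/3}$, which is an assumed choice consistent with the $\mathcal{O}(N^{1/3})$ step count quoted above. All function and parameter names are hypothetical.

```python
import math
import random


def log_target(x):
    """Log-density of the stand-in target, a standard Gaussian (up to a constant)."""
    return -0.5 * sum(v * v for v in x)


def grad_log_target(x):
    return [-v for v in x]


def log_proposal_density(x_from, x_to, delta):
    """Log-density of the Langevin proposal, up to a constant that cancels."""
    g = grad_log_target(x_from)
    sq = sum((t - f - 0.5 * delta * gi) ** 2 for f, t, gi in zip(x_from, x_to, g))
    return -sq / (2.0 * delta)


def log_accept_ratio(x, y, delta):
    """Log of the Metropolis-Hastings ratio for a move from x to y."""
    return (log_target(y) + log_proposal_density(y, x, delta)
            - log_target(x) - log_proposal_density(x, y, delta))


def mala_step(x, ell, rng):
    """One MALA step with the assumed scaling delta = ell^2 * N^(-1/3)."""
    n = len(x)
    delta = ell ** 2 * n ** (-1.0 / 3.0)
    g = grad_log_target(x)
    y = [xi + 0.5 * delta * gi + math.sqrt(delta) * rng.gauss(0.0, 1.0)
         for xi, gi in zip(x, g)]
    accept = rng.random() < math.exp(min(0.0, log_accept_ratio(x, y, delta)))
    return (y if accept else x), accept


rng = random.Random(0)
state = [rng.gauss(0.0, 1.0) for _ in range(50)]  # start in stationarity
accepted = 0
for _ in range(200):
    state, ok = mala_step(state, ell=1.0, rng=rng)
    accepted += ok
acceptance_rate = accepted / 200
```

The gradient term in the proposal is what distinguishes this from a random walk Metropolis step, where the proposal would be a pure Gaussian perturbation of the current state.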

preprint2012arXiv

Statistical inference for dynamical systems: a review

The topic of statistical inference for dynamical systems has been studied extensively across several fields. In this survey we focus on the problem of parameter estimation for non-linear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research.

preprint2011arXiv

On a Class of Shrinkage Priors for Covariance Matrix Estimation

We propose a flexible class of models based on scale mixture of uniform distributions to construct shrinkage priors for covariance matrix estimation. This new class of priors enjoys a number of advantages over the traditional scale mixture of normal priors, including its simplicity and flexibility in characterizing the prior density. We also exhibit a simple, easy to implement Gibbs sampler for posterior simulation which leads to efficient estimation in high dimensional problems. We first discuss the theory and computational details of this new approach and then extend the basic model to a new class of multivariate conditional autoregressive models for analyzing multivariate areal data. The proposed spatial model flexibly characterizes both the spatial and the outcome correlation structures at an appealing computational cost. Examples consisting of both synthetic and real-world data show the utility of this new framework in terms of robust estimation as well as improved predictive performance.

preprint2011arXiv

Why approximate Bayesian computational (ABC) methods cannot handle model choice problems

Approximate Bayesian computation (ABC), also known as likelihood-free methods, has become a favourite tool for the analysis of complex stochastic models, primarily in population genetics but also in financial analyses. We advocated in Grelaud et al. (2009) the use of ABC for Bayesian model choice in the specific case of Gibbs random fields (GRF), relying on a sufficiency property mainly enjoyed by GRFs to show that the approach was legitimate. Despite having previously suggested the use of ABC for model choice in a wider range of models in the DIY ABC software (Cornuet et al., 2008), we present theoretical evidence that the general use of ABC for model choice is fraught with danger in the sense that no amount of computation, however large, can guarantee a proper approximation of the posterior probabilities of the models under comparison.

preprint2010arXiv

Ergodicity of hypoelliptic SDEs driven by fractional Brownian motion

We demonstrate that stochastic differential equations (SDEs) driven by fractional Brownian motion with Hurst parameter H > 1/2 have ergodic properties similar to those of SDEs driven by standard Brownian motion. The focus in this article is on hypoelliptic systems satisfying Hörmander's condition. We show that such systems satisfy a suitable version of the strong Feller property and we conclude that they admit a unique stationary solution that is physical in the sense that it does not "look into the future". The main technical result required for the analysis is a bound on the moments of the inverse of the Malliavin covariance matrix, conditional on the past of the driving noise.

preprint2010arXiv

Optimal tuning of the Hybrid Monte-Carlo Algorithm

We investigate the properties of the Hybrid Monte-Carlo algorithm (HMC) in high dimensions. HMC develops a Markov chain reversible w.r.t. a given target distribution $\Pi$ by using separable Hamiltonian dynamics with potential $-\log\Pi$. The additional momentum variables are chosen at random from the Boltzmann distribution and the continuous-time Hamiltonian dynamics are then discretised using the leapfrog scheme. The induced bias is removed via a Metropolis-Hastings accept/reject rule. In the simplified scenario of independent, identically distributed components, we prove that, to obtain an $\mathcal{O}(1)$ acceptance probability as the dimension $d$ of the state space tends to $\infty$, the leapfrog step-size $h$ should be scaled as $h= l \times d^{-1/4}$. Therefore, in high dimensions, HMC requires $\mathcal{O}(d^{1/4})$ steps to traverse the state space. We also identify analytically the asymptotically optimal acceptance probability, which turns out to be 0.651 (to three decimal places). This is the choice which optimally balances the cost of generating a proposal, which decreases as $l$ increases, against the cost related to the average number of proposals required to obtain acceptance, which increases as $l$ increases.
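
The ingredients named in this abstract (Gaussian momentum refresh, leapfrog integration, Metropolis-Hastings correction, step size $h = l \times d^{-1/4}$) can be sketched as follows. The target is assumed to be a standard Gaussian with i.i.d. components, and the fixed integration time `path_length` and all function names are hypothetical choices for the example rather than anything specified in the paper.

```python
import math
import random


def potential(q):
    """Potential -log(target) for a standard Gaussian, up to a constant."""
    return 0.5 * sum(x * x for x in q)


def grad_potential(q):
    return list(q)


def leapfrog(q, p, h, n_steps):
    """Integrate Hamiltonian dynamics for n_steps leapfrog steps of size h."""
    q, p = list(q), list(p)
    p = [pi - 0.5 * h * gi for pi, gi in zip(p, grad_potential(q))]
    for step in range(n_steps):
        q = [qi + h * pi for qi, pi in zip(q, p)]
        # Full momentum step between position updates, half step at the end.
        scale = h if step < n_steps - 1 else 0.5 * h
        p = [pi - scale * gi for pi, gi in zip(p, grad_potential(q))]
    return q, p


def total_energy(q, p):
    return potential(q) + 0.5 * sum(x * x for x in p)


def hmc_step(q, ell, rng, path_length=1.0):
    """One HMC step with leapfrog step size h = ell * d^(-1/4)."""
    d = len(q)
    h = ell * d ** -0.25
    n_steps = max(1, round(path_length / h))
    p0 = [rng.gauss(0.0, 1.0) for _ in range(d)]
    q1, p1 = leapfrog(q, p0, h, n_steps)
    log_ratio = total_energy(q, p0) - total_energy(q1, p1)
    accept = rng.random() < math.exp(min(0.0, log_ratio))
    return (q1 if accept else q), accept


rng = random.Random(1)
state = [rng.gauss(0.0, 1.0) for _ in range(64)]
accepted = 0
for _ in range(200):
    state, ok = hmc_step(state, ell=1.0, rng=rng)
    accepted += ok
acceptance_rate = accepted / 200
```

In practice the constant `ell` would be tuned by monitoring `acceptance_rate`; the abstract identifies 0.651 as the asymptotically optimal value in the i.i.d. setting.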

38 published items

Fast and memory-optimal dimension reduction using Kac's walk

Universality and least singular values of random matrix products: a simplified approach

Elementary Bounds On Mixing Times for Decomposable Markov Chains

Kac's Walk on $n$-sphere mixes in $n\log n$ steps

Maximum Likelihood Estimation for Single Particle, Passive Microrheology Data with Drift

More Powerful Multiple Testing in Randomized Experiments with Non-Compliance

On the Mixing Time of Kac's Walk and Other High-Dimensional Gibbs Samplers with Constraints

Parallel Markov Chain Monte Carlo via Spectral Clustering

Ratios and Cauchy Distribution

Sub-optimality of some continuous shrinkage priors

An unexpected encounter with Cauchy and Lévy

Bayesian Nonparametric Weighted Sampling Inference

Degrees of freedom for combining regression with factor analysis

Ergodicity of Approximate MCMC Chains with Applications to Large Data Sets

Gaussian Process Regression with Location Errors

Hypothesis testing for high-dimensional sparse binary regression

Mixing times for a constrained Ising process on the torus at low density

Model comparison and assessment for single particle tracking in biological fluids

A Function Space HMC Algorithm With Second Order Langevin Diffusion Limit

A location-mixture autoregressive model for online forecasting of lung tumor motion

Dirichlet-Laplace priors for optimal shrinkage

Finite Sample Properties of Adaptive Markov Chains via Curvature

Gradient Flow from a Random Walk in Hilbert Space

Posterior contraction in sparse Bayesian factor models for massive covariance matrices

Universality of covariance matrices

Regularity of laws and ergodicity of hypoelliptic SDEs driven by rough paths

Statistical Inference for Stochastic Differential Equations with Memory

Bayesian shrinkage

Causal inference from $2^k$ factorial designs using the potential outcomes model

Diffusion limits of the random walk Metropolis algorithm in high dimensions

Edge universality of correlation matrices

Geometric ergodicity of a bead-spring pair with stochastic Stokes forcing

Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions

Statistical inference for dynamical systems: a review

On a Class of Shrinkage Priors for Covariance Matrix Estimation

Why approximate Bayesian computational (ABC) methods cannot handle model choice problems

Ergodicity of hypoelliptic SDEs driven by fractional Brownian motion

Optimal tuning of the Hybrid Monte-Carlo Algorithm