Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
21topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2022arXiv

A Simple Non-Stationary Mean Ergodic Theorem, with Bonus Weak Law of Large Numbers

This brief pedagogical note re-proves a simple theorem on the convergence, in $L_2$ and in probability, of time averages of non-stationary time series to the mean of expectation values. The basic condition is that the sum of covariances grows sub-quadratically with the length of the time series. I make no claim to originality; the result is widely, but unevenly, spread bit of folklore among users of applied probability. The goal of this note is merely to even out that distribution.

preprint2022arXiv

Evaluating Posterior Distributions by Selectively Breeding Prior Samples

Using Markov chain Monte Carlo to sample from posterior distributions was the key innovation which made Bayesian data analysis practical. Notoriously, however, MCMC is hard to tune, hard to diagnose, and hard to parallelize. This pedagogical note explores variants on a universal {\em non}-Markov-chain Monte Carlo scheme for sampling from posterior distributions. The basic idea is to draw parameter values from the prior distributions, evaluate the likelihood of each draw, and then copy that draw a number of times proportional to its likelihood. The distribution after copying is an approximation to the posterior which becomes exact as the number of initial samples goes to infinity; the convergence of the approximation is easily analyzed, and is uniform over Glivenko-Cantelli classes. While not {\em entirely} practical, the schemes are straightforward to implement (a few lines of R), easily parallelized, and require no rejection, burn-in, convergence diagnostics, or tuning of any control settings. I provide references to the prior art which deals with some of the practical obstacles, at some cost in computational and analytical simplicity.

preprint2016arXiv

Nonparametric risk bounds for time-series forecasting

We derive generalization error bounds for traditional time-series forecasting models. Our results hold for many standard forecasting tools including autoregressive models, moving average models, and, more generally, linear state-space models. These non-asymptotic bounds need only weak assumptions on the data-generating process, yet allow forecasters to select among competing models and to guarantee, with high probability, that their chosen model will perform well. We motivate our techniques with and apply them to standard economic and financial forecasting tools---a GARCH model for predicting equity volatility and a dynamic stochastic general equilibrium model (DSGE), the standard tool in macroeconomic forecasting. We demonstrate in particular how our techniques can aid forecasters and policy makers in choosing models which behave well under uncertainty and mis-specification.

preprint2013arXiv

Consistency under sampling of exponential random graph models

The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM's expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.

preprint2010arXiv

Approximate Methods for State-Space Models

State-space models provide an important body of techniques for analyzing time-series, but their use requires estimating unobserved states. The optimal estimate of the state is its conditional expectation given the observation histories, and computing this expectation is hard when there are nonlinearities. Existing filtering methods, including sequential Monte Carlo, tend to be either inaccurate or slow. In this paper, we study a nonlinear filter for nonlinear/non-Gaussian state-space models, which uses Laplace's method, an asymptotic series expansion, to approximate the state's conditional mean and variance, together with a Gaussian conditional distribution. This {\em Laplace-Gaussian filter} (LGF) gives fast, recursive, deterministic state estimates, with an error which is set by the stochastic characteristics of the model and is, we show, stable over time. We illustrate the estimation ability of the LGF by applying it to the problem of neural decoding and compare it to sequential Monte Carlo both in simulations and with real data. We find that the LGF can deliver superior results in a small fraction of the computing time.

preprint2010arXiv

Homophily and Contagion Are Generically Confounded in Observational Social Network Studies

We consider processes on social networks that can potentially involve three factors: homophily, or the formation of social ties due to matching individual traits; social contagion, also known as social influence; and the causal effect of an individual's covariates on their behavior or other measurable responses. We show that, generically, all of these are confounded with each other. Distinguishing them from one another requires strong assumptions on the parametrization of the social process or on the adequacy of the covariates used (or both). In particular we demonstrate, with simple examples, that asymmetries in regression coefficients cannot identify causal effects, and that very simple models of imitation (a form of social contagion) can produce substantial correlations between an individual's enduring traits and their choices, even when there is no intrinsic affinity between them. We also suggest some possible constructive responses to these results.

preprint2009arXiv

Dynamics of Bayesian Updating with Dependent Data and Misspecified Models

Much is now known about the consistency of Bayesian updating on infinite-dimensional parameter spaces with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the correct neighborhoods of the data-generating distribution; various sufficient conditions further restrict the prior in ways analogous to capacity control in frequentist nonparametrics. The asymptotics of Bayesian updating with mis-specified models or priors, or non-Markovian data, are far less well explored. Here I establish sufficient conditions for posterior convergence when all hypotheses are wrong, and the data have complex dependencies. The main dynamical assumption is the asymptotic equipartition (Shannon-McMillan-Breiman) property of information theory. This, along with Egorov's Theorem on uniform convergence, lets me build a sieve-like structure for the prior. The main statistical assumption, also a form of capacity control, concerns the compatibility of the prior and the data-generating process, controlling the fluctuations in the log-likelihood when averaged over the sieve-like sets. In addition to posterior convergence, I derive a kind of large deviations principle for the posterior measure, extending in some cases to rates of convergence, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between these results and the replicator dynamics of evolutionary theory.

preprint2009arXiv

The Computational Structure of Spike Trains

Neurons perform computations, and convey the results of those computations through the statistical structure of their output spike trains. Here we present a practical method, grounded in the information-theoretic analysis of prediction, for inferring a minimal representation of that structure and for characterizing its complexity. Starting from spike trains, our approach finds their causal state models (CSMs), the minimal hidden Markov models or stochastic automata capable of generating statistically identical time series. We then use these CSMs to objectively quantify both the generalizable structure and the idiosyncratic randomness of the spike train. Specifically, we show that the expected algorithmic information content (the information needed to describe the spike train exactly) can be split into three parts describing (1) the time-invariant structure (complexity) of the minimal spike-generating process, which describes the spike train statistically; (2) the randomness (internal entropy rate) of the minimal spike-generating process; and (3) a residual pure noise term not described by the minimal spike-generating process. We use CSMs to approximate each of these quantities. The CSMs are inferred nonparametrically from the data, making only mild regularity assumptions, via the causal state splitting reconstruction algorithm. The methods presented here complement more traditional spike train analyses by describing not only spiking probability and spike train entropy, but also the complexity of a spike train's structure. We demonstrate our approach using both simulated spike trains and experimental data recorded in rat barrel cortex during vibrissa stimulation.

preprint2003arXiv

Optimal Nonlinear Prediction of Random Fields on Networks

It is increasingly common to encounter time-varying random fields on networks (metabolic networks, sensor arrays, distributed computing, etc.). This paper considers the problem of optimal, nonlinear prediction of these fields, showing from an information-theoretic perspective that it is formally identical to the problem of finding minimal local sufficient statistics. I derive general properties of these statistics, show that they can be composed into global predictors, and explore their recursive estimation properties. For the special case of discrete-valued fields, I describe a convergent algorithm to identify the local predictors from empirical data, with minimal prior information about the field, and no distributional assumptions.

preprint2000arXiv

Computational Mechanics: Pattern and Prediction, Structure and Simplicity

Computational mechanics, an approach to structural complexity, defines a process's causal states and gives a procedure for finding them. We show that the causal-state representation--an $ε$-machine--is the minimal one consistent with accurate prediction. We establish several results on $ε$-machine optimality and uniqueness and on how $ε$-machines compare to alternative representations. Further results relate measures of randomness and structural complexity obtained from $ε$-machines to those from ergodic and information theories.

preprint2000arXiv

Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction

Discovering relevant, but possibly hidden, variables is a key step in constructing useful and predictive theories about the natural world. This brief note explains the connections between three approaches to this problem: the recently introduced information-bottleneck method, the computational mechanics approach to inferring optimal models, and Salmon's statistical relevance basis.