Researcher profile

Ashwin Pananjady

Ashwin Pananjady contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
9works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

9 published item(s)

preprint2026arXiv

Multiscale replay: A robust algorithm for stochastic variational inequalities with a Markovian buffer

We introduce the Multiscale Experience Replay (MER) algorithm for solving a class of stochastic variational inequalities (VIs) in settings where samples are generated from a Markov chain and we have access to a memory buffer to store them. Rather than uniformly sampling from the buffer, MER utilizes a multi-scale sampling scheme to emulate the behavior of VI algorithms designed for independent and identically distributed samples, overcoming bias in the de facto serial scheme and thereby accelerating convergence. Notably, unlike standard sample-skipping variants of serial algorithms, MER is robust in that it achieves this acceleration in iteration complexity whenever possible, and without requiring knowledge of the mixing time of the Markov chain. We also discuss applications of MER, particularly in policy evaluation with temporal difference learning and in training generalized linear models with dependent data.

preprint2026arXiv

Predictive inference for time series: why is split conformal effective despite temporal dependence?

We consider the problem of uncertainty quantification for prediction in a time series: if we use past data to forecast the next time point, can we provide valid prediction intervals around our forecasts? To avoid placing distributional assumptions on the data, in recent years the conformal prediction method has been a popular approach for predictive inference, since it provides distribution-free coverage for any iid or exchangeable data distribution. However, in the time series setting, the strong empirical performance of conformal prediction methods is not well understood, since even short-range temporal dependence is a strong violation of the exchangeability assumption. Using predictors with "memory" -- i.e., predictors that utilize past observations, such as autoregressive models -- further exacerbates this problem. In this work, we examine the theoretical properties of split conformal prediction in the time series setting, including the case where predictors may have memory. Our results bound the loss of coverage of these methods in terms of a new "switch coefficient", measuring the extent to which temporal dependence within the time series creates violations of exchangeability. Our characterization of the coverage probability is sharp over the class of stationary, $β$-mixing processes. Along the way, we introduce tools that may prove useful in analyzing other predictive inference methods for dependent data.

preprint2023arXiv

Do algorithms and barriers for sparse principal component analysis extend to other structured settings?

We study a principal component analysis problem under the spiked Wishart model in which the structure in the signal is captured by a class of union-of-subspace models. This general class includes vanilla sparse PCA as well as its variants with graph sparsity. With the goal of studying these problems under a unified statistical and computational lens, we establish fundamental limits that depend on the geometry of the problem instance, and show that a natural projected power method exhibits local convergence to the statistically near-optimal neighborhood of the solution. We complement these results with end-to-end analyses of two important special cases given by path and tree sparsity in a general basis, showing initialization methods and matching evidence of computational hardness. Overall, our results indicate that several of the phenomena observed for vanilla sparse PCA extend in a natural fashion to its structured counterparts.

preprint2022arXiv

A Dual Accelerated Method for Online Stochastic Distributed Averaging: From Consensus to Decentralized Policy Evaluation

Motivated by decentralized sensing and policy evaluation problems, we consider a particular type of distributed stochastic optimization problem over a network, called the online stochastic distributed averaging problem. We design a dual-based method for this distributed consensus problem with Polyak--Ruppert averaging and analyze its behavior. We show that the proposed algorithm attains an accelerated deterministic error depending optimally on the condition number of the network, and also that it has an order-optimal stochastic error. This improves on the guarantees of state-of-the-art distributed stochastic optimization algorithms when specialized to this setting, and yields -- among other things -- corollaries for decentralized policy evaluation. Our proofs rely on explicitly studying the evolution of several relevant linear systems, and may be of independent interest. Numerical experiments are provided, which validate our theoretical results and demonstrate that our approach outperforms existing methods in finite-sample scenarios on several natural network topologies.

preprint2022arXiv

Accelerated and instance-optimal policy evaluation with linear function approximation

We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model. Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results. Our theoretical guarantees of optimality are corroborated by numerical experiments.

preprint2022arXiv

Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement learning assume the execution of an optimal policy, and thereby suffer from an identifiability issue. In contrast, we propose to leverage the demonstrator's behavior en route to optimality, and in particular, the exploration phase, for reward estimation. We begin by establishing a general information-theoretic lower bound under this paradigm that applies to any demonstrator algorithm, which characterizes a fundamental tradeoff between reward estimation and the amount of exploration of the demonstrator. Then, we develop simple and efficient reward estimators for upper-confidence-based demonstrator algorithms that attain the optimal tradeoff, showing in particular that consistent reward estimation -- free of identifiability issues -- is possible under our paradigm. Extensive simulations on both synthetic and semi-synthetic data corroborate our theoretical results.

preprint2020arXiv

Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. We show that these methods provably converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial of the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals some interesting differences between the settings of additive driving noise and random initialization, as well as the settings of one-point and two-point reward feedback. Our theory is corroborated by extensive simulations of derivative-free methods on these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems.

preprint2020arXiv

Instance-dependent $\ell_\infty$-bounds for policy evaluation in tabular reinforcement learning

Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, and artificial intelligence, as well as communication and transportation networks. In many of these cases, such as in the policy evaluation problem encountered in reinforcement learning, the goal is to estimate the long-term value function of such a process without access to the underlying population transition and reward functions. Working with samples generated under the synchronous model, we study the problem of estimating the value function of an infinite-horizon, discounted MRP on finitely many states in the $\ell_\infty$-norm. We analyze both the standard plug-in approach to this problem and a more robust variant, and establish non-asymptotic bounds that depend on the (unknown) problem instance, as well as data-dependent bounds that can be evaluated based on the observations of state-transitions and rewards. We show that these approaches are minimax-optimal up to constant factors over natural sub-classes of MRPs. Our analysis makes use of a leave-one-out decoupling argument tailored to the policy evaluation problem, one which may be of independent interest.

preprint2020arXiv

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations show that the widely-used temporal difference (TD) algorithm is strictly suboptimal when evaluated in a non-asymptotic setting, even when combined with Polyak-Ruppert iterate averaging. We remedy this issue by introducing and analyzing variance-reduced forms of stochastic approximation, showing that they achieve non-asymptotic, instance-dependent optimality up to logarithmic factors.