Source author record

Ashwin Pananjady

Ashwin Pananjady appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning, math.OC, Data Structures and Algorithms, Information Theory, math.IT, math.ST, Networking and Internet Architecture, Statistics Theory, math.PR, Artificial Intelligence, Discrete Mathematics, eess.SY, math.FA, Multiagent Systems, Robotics, Social and Information Networks, Systems and Control

Catalog footprint

What is connected

16 works

17 topics

4 close collaborators


Preprint, 2026, arXiv

Multiscale replay: A robust algorithm for stochastic variational inequalities with a Markovian buffer

We introduce the Multiscale Experience Replay (MER) algorithm for solving a class of stochastic variational inequalities (VIs) in settings where samples are generated from a Markov chain and we have access to a memory buffer to store them. Rather than uniformly sampling from the buffer, MER utilizes a multi-scale sampling scheme to emulate the behavior of VI algorithms designed for independent and identically distributed samples, overcoming bias in the de facto serial scheme and thereby accelerating convergence. Notably, unlike standard sample-skipping variants of serial algorithms, MER is robust in that it achieves this acceleration in iteration complexity whenever possible, and without requiring knowledge of the mixing time of the Markov chain. We also discuss applications of MER, particularly in policy evaluation with temporal difference learning and in training generalized linear models with dependent data.

Preprint, 2026, arXiv

Predictive inference for time series: why is split conformal effective despite temporal dependence?

We consider the problem of uncertainty quantification for prediction in a time series: if we use past data to forecast the next time point, can we provide valid prediction intervals around our forecasts? To avoid placing distributional assumptions on the data, in recent years the conformal prediction method has been a popular approach for predictive inference, since it provides distribution-free coverage for any iid or exchangeable data distribution. However, in the time series setting, the strong empirical performance of conformal prediction methods is not well understood, since even short-range temporal dependence is a strong violation of the exchangeability assumption. Using predictors with "memory" -- i.e., predictors that utilize past observations, such as autoregressive models -- further exacerbates this problem. In this work, we examine the theoretical properties of split conformal prediction in the time series setting, including the case where predictors may have memory. Our results bound the loss of coverage of these methods in terms of a new "switch coefficient", measuring the extent to which temporal dependence within the time series creates violations of exchangeability. Our characterization of the coverage probability is sharp over the class of stationary, $β$-mixing processes. Along the way, we introduce tools that may prove useful in analyzing other predictive inference methods for dependent data.
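As a rough illustration of the split conformal recipe this abstract refers to, the sketch below builds a prediction interval from absolute residuals on a held-out calibration set. The function name `split_conformal_interval`, the absolute-residual score, and the toy persistence forecaster in the usage lines are assumptions made for illustration, not details taken from the paper.

```python
import math


def split_conformal_interval(calib_preds, calib_targets, new_pred, alpha=0.1):
    """Return (lower, upper) around new_pred using absolute-residual scores."""
    # Nonconformity scores on the calibration split, sorted ascending.
    scores = sorted(abs(y - p) for p, y in zip(calib_preds, calib_targets))
    n = len(scores)
    # Conformal quantile rank ceil((n + 1) * (1 - alpha)), counted from 1.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this level: the interval is trivial.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)


# Usage on a toy series with a persistence forecaster (predict the previous
# value), which is a simple example of a predictor with "memory".
series = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 1.2, 1.1, 0.8, 1.0]
calib_preds = series[:-1]
calib_targets = series[1:]
lower, upper = split_conformal_interval(
    calib_preds, calib_targets, new_pred=series[-1], alpha=0.2
)
```

For exchangeable data this construction covers the next point with probability at least 1 - alpha. The abstract's concern is the time series case, where that guarantee no longer applies directly and the loss of coverage is what the paper bounds.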

Preprint, 2023, arXiv

Do algorithms and barriers for sparse principal component analysis extend to other structured settings?

We study a principal component analysis problem under the spiked Wishart model in which the structure in the signal is captured by a class of union-of-subspace models. This general class includes vanilla sparse PCA as well as its variants with graph sparsity. With the goal of studying these problems under a unified statistical and computational lens, we establish fundamental limits that depend on the geometry of the problem instance, and show that a natural projected power method exhibits local convergence to the statistically near-optimal neighborhood of the solution. We complement these results with end-to-end analyses of two important special cases given by path and tree sparsity in a general basis, showing initialization methods and matching evidence of computational hardness. Overall, our results indicate that several of the phenomena observed for vanilla sparse PCA extend in a natural fashion to its structured counterparts.

Preprint, 2022, arXiv

A Dual Accelerated Method for Online Stochastic Distributed Averaging: From Consensus to Decentralized Policy Evaluation

Motivated by decentralized sensing and policy evaluation problems, we consider a particular type of distributed stochastic optimization problem over a network, called the online stochastic distributed averaging problem. We design a dual-based method for this distributed consensus problem with Polyak--Ruppert averaging and analyze its behavior. We show that the proposed algorithm attains an accelerated deterministic error depending optimally on the condition number of the network, and also that it has an order-optimal stochastic error. This improves on the guarantees of state-of-the-art distributed stochastic optimization algorithms when specialized to this setting, and yields -- among other things -- corollaries for decentralized policy evaluation. Our proofs rely on explicitly studying the evolution of several relevant linear systems, and may be of independent interest. Numerical experiments are provided, which validate our theoretical results and demonstrate that our approach outperforms existing methods in finite-sample scenarios on several natural network topologies.

Preprint, 2022, arXiv

Accelerated and instance-optimal policy evaluation with linear function approximation

We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model. Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results. Our theoretical guarantees of optimality are corroborated by numerical experiments.

Preprint, 2022, arXiv

Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement learning assume the execution of an optimal policy, and thereby suffer from an identifiability issue. In contrast, we propose to leverage the demonstrator's behavior en route to optimality, and in particular, the exploration phase, for reward estimation. We begin by establishing a general information-theoretic lower bound under this paradigm that applies to any demonstrator algorithm, which characterizes a fundamental tradeoff between reward estimation and the amount of exploration of the demonstrator. Then, we develop simple and efficient reward estimators for upper-confidence-based demonstrator algorithms that attain the optimal tradeoff, showing in particular that consistent reward estimation -- free of identifiability issues -- is possible under our paradigm. Extensive simulations on both synthetic and semi-synthetic data corroborate our theoretical results.

Preprint, 2020, arXiv

Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. We show that these methods provably converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial of the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals some interesting differences between the settings of additive driving noise and random initialization, as well as the settings of one-point and two-point reward feedback. Our theory is corroborated by extensive simulations of derivative-free methods on these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems.

Preprint, 2020, arXiv

Instance-dependent $\ell_\infty$-bounds for policy evaluation in tabular reinforcement learning

Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, and artificial intelligence, as well as communication and transportation networks. In many of these cases, such as in the policy evaluation problem encountered in reinforcement learning, the goal is to estimate the long-term value function of such a process without access to the underlying population transition and reward functions. Working with samples generated under the synchronous model, we study the problem of estimating the value function of an infinite-horizon, discounted MRP on finitely many states in the $\ell_\infty$-norm. We analyze both the standard plug-in approach to this problem and a more robust variant, and establish non-asymptotic bounds that depend on the (unknown) problem instance, as well as data-dependent bounds that can be evaluated based on the observations of state-transitions and rewards. We show that these approaches are minimax-optimal up to constant factors over natural sub-classes of MRPs. Our analysis makes use of a leave-one-out decoupling argument tailored to the policy evaluation problem, one which may be of independent interest.
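The standard plug-in approach mentioned in this abstract can be sketched as follows: estimate the transition matrix and mean rewards from samples drawn at every state, then solve the Bellman equation for the estimated model. The names `plugin_value_estimate`, `next_states`, and `rewards`, and the choice of fixed-point iteration instead of a direct linear solve, are illustrative assumptions. The paper's more robust variant is not shown.

```python
def plugin_value_estimate(next_states, rewards, gamma, tol=1e-10, max_iter=100_000):
    """Plug-in value estimate for a discounted MRP (requires 0 <= gamma < 1).

    next_states[s] and rewards[s] hold the samples drawn from state s,
    as in the synchronous model where every state is sampled equally often.
    """
    n = len(next_states)
    # Empirical transition matrix built from observed next states.
    p_hat = [[0.0] * n for _ in range(n)]
    for s, samples in enumerate(next_states):
        for s_next in samples:
            p_hat[s][s_next] += 1.0 / len(samples)
    # Empirical mean reward at each state.
    r_hat = [sum(rs) / len(rs) for rs in rewards]
    # Solve V = r_hat + gamma * P_hat V by fixed-point iteration,
    # which contracts in the sup norm because gamma < 1.
    v = [0.0] * n
    for _ in range(max_iter):
        v_new = [
            r_hat[s] + gamma * sum(p_hat[s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return v_new
        v = v_new
    return v


# Usage: state 0 always moves to state 1 with reward 0,
# state 1 loops on itself with reward 1.
values = plugin_value_estimate(
    next_states=[[1, 1, 1], [1, 1, 1]],
    rewards=[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    gamma=0.5,
)
```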

Preprint, 2020, arXiv

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations show that the widely-used temporal difference (TD) algorithm is strictly suboptimal when evaluated in a non-asymptotic setting, even when combined with Polyak-Ruppert iterate averaging. We remedy this issue by introducing and analyzing variance-reduced forms of stochastic approximation, showing that they achieve non-asymptotic, instance-dependent optimality up to logarithmic factors.
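For orientation, here is a minimal sketch of the baseline this abstract analyzes: synchronous TD(0) under a generative model, combined with Polyak-Ruppert averaging of the iterates. The function name `td_with_averaging`, the sampler interface `sample_transition(state, rng)`, and the step size schedule `k ** -0.75` are assumptions chosen for illustration. The variance-reduced methods the paper proposes are not shown.

```python
import random


def td_with_averaging(sample_transition, n_states, gamma, n_iters, seed=0):
    """Synchronous TD(0) with Polyak-Ruppert averaging.

    sample_transition(state, rng) is a hypothetical generative-model sampler
    returning (reward, next_state). Returns (last_iterate, averaged_iterate).
    """
    rng = random.Random(seed)
    v = [0.0] * n_states
    v_bar = [0.0] * n_states
    for k in range(1, n_iters + 1):
        step = 1.0 / k ** 0.75  # assumed polynomially decaying step size
        v_next = list(v)
        for s in range(n_states):
            reward, s_next = sample_transition(s, rng)
            # TD(0) update toward the one-sample Bellman target.
            v_next[s] = v[s] + step * (reward + gamma * v[s_next] - v[s])
        v = v_next
        # Running mean of the iterates v_1, ..., v_k.
        v_bar = [vb + (vi - vb) / k for vb, vi in zip(v_bar, v)]
    return v, v_bar


# Usage: a two-state chain with noisy rewards and uniform transitions.
def noisy_chain(state, rng):
    reward = float(state) + rng.gauss(0.0, 0.1)
    return reward, rng.randrange(2)


last, averaged = td_with_averaging(noisy_chain, n_states=2, gamma=0.9, n_iters=2000)
```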

Preprint, 2016, arXiv

Linear Regression with an Unknown Permutation: Statistical and Computational Limits

Consider a noisy linear observation model with an unknown permutation, based on observing $y = Π^* A x^* + w$, where $x^* \in \mathbb{R}^d$ is an unknown vector, $Π^*$ is an unknown $n \times n$ permutation matrix, and $w \in \mathbb{R}^n$ is additive Gaussian noise. We analyze the problem of permutation recovery in a random design setting in which the entries of the matrix $A$ are drawn i.i.d. from a standard Gaussian distribution, and establish sharp conditions on the SNR, sample size $n$, and dimension $d$ under which $Π^*$ is exactly and approximately recoverable. On the computational front, we show that the maximum likelihood estimate of $Π^*$ is NP-hard to compute, while also providing a polynomial time algorithm when $d =1$.
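For the $d = 1$ case, one natural polynomial-time approach is sorting: for a fixed permutation the best scalar is $\langle Πa, y \rangle / \|a\|^2$, so the least-squares fit maximizes $|\langle Πa, y \rangle|$, and by the rearrangement inequality the maximizer pairs sorted $y$ with $a$ sorted in the same or the opposite order. The sketch below follows that reasoning. It is not necessarily the algorithm given in the paper, and the name `permuted_regression_1d` is hypothetical.

```python
def permuted_regression_1d(a, y):
    """Least-squares fit of y[i] ~ a[assignment[i]] * x over assignments and x.

    Assumes a is not identically zero. Returns (x, assignment), where
    assignment[i] is the index of the entry of a matched to observation y[i].
    """
    n = len(a)
    norm_sq = sum(v * v for v in a)
    y_order = sorted(range(n), key=lambda i: y[i])
    best = None
    # Same-order matching handles x >= 0, reverse-order matching handles x < 0.
    for reverse in (False, True):
        a_order = sorted(range(n), key=lambda j: a[j], reverse=reverse)
        assignment = [0] * n
        for yi, aj in zip(y_order, a_order):
            assignment[yi] = aj
        # Closed-form least-squares scalar for this matching.
        x = sum(a[assignment[i]] * y[i] for i in range(n)) / norm_sq
        resid = sum((y[i] - a[assignment[i]] * x) ** 2 for i in range(n))
        if best is None or resid < best[0]:
            best = (resid, x, assignment)
    return best[1], best[2]


# Usage: noiseless observations of a = [1, 2, 3] scaled by 2 and shuffled.
x_hat, matching = permuted_regression_1d([1.0, 2.0, 3.0], [6.0, 2.0, 4.0])
```

The cost is dominated by the sorts, so the sketch runs in $O(n \log n)$ time.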

Preprint, 2016, arXiv

Wasserstein Stability of the Entropy Power Inequality for Log-Concave Densities

We establish quantitative stability results for the entropy power inequality (EPI). Specifically, we show that if uniformly log-concave densities nearly saturate the EPI, then they must be close to Gaussian densities in the quadratic Wasserstein distance. Further, if one of the densities is log-concave and the other is Gaussian, then the deficit in the EPI can be controlled in terms of the $L^1$-Wasserstein distance. As a counterpoint, an example shows that the EPI can be unstable with respect to the quadratic Wasserstein distance when densities are uniformly log-concave on sets of measure arbitrarily close to one. Our stability results can be extended to non-log-concave densities, provided certain regularity conditions are met. The proofs are based on optimal transportation.

Preprint, 2015, arXiv

Compressing Sparse Sequences under Local Decodability Constraints

We consider a variable-length source coding problem subject to local decodability constraints. In particular, we investigate the blocklength scaling behavior attainable by encodings of $r$-sparse binary sequences, under the constraint that any source bit can be correctly decoded upon probing at most $d$ codeword bits. We consider both adaptive and non-adaptive access models, and derive upper and lower bounds that often coincide up to constant factors. Notably, such a characterization for the fixed-blocklength analog of our problem remains unknown, despite considerable research over the last three decades. Connections to communication complexity are also briefly discussed.

Preprint, 2014, arXiv

On the Complexity of Making a Distinguished Vertex Minimum or Maximum Degree by Vertex Deletion

In this paper, we investigate the approximability of two node deletion problems. Given a vertex-weighted graph $G=(V,E)$ and a specified, or "distinguished", vertex $p \in V$, MDD(min) is the problem of finding a minimum-weight vertex set $S \subseteq V\setminus \{p\}$ such that $p$ becomes the minimum-degree vertex in $G[V \setminus S]$, and MDD(max) is the problem of finding a minimum-weight vertex set $S \subseteq V\setminus \{p\}$ such that $p$ becomes the maximum-degree vertex in $G[V \setminus S]$. These problems are known to be $NP$-complete and have been studied from the parameterized complexity point of view in previous work. Here, we prove that for any $ε > 0$, neither problem can be approximated within a factor $(1 - ε)\log n$, unless $NP \subseteq DTIME(n^{\log\log n})$. We also show that for any $ε > 0$, MDD(min) cannot be approximated within a factor $(1 - ε)\log n$ on bipartite graphs, unless $NP \subseteq DTIME(n^{\log\log n})$, and that for any $ε > 0$, MDD(max) cannot be approximated within a factor $(1/2 - ε)\log n$ on bipartite graphs, unless $NP \subseteq DTIME(n^{\log\log n})$. We give an $O(\log n)$-factor approximation algorithm for MDD(max) on general graphs, provided the degree of $p$ is $O(\log n)$. We then show that if the degree of $p$ is $n-O(\log n)$, a similar result holds for MDD(min). We prove that MDD(max) is $APX$-complete on 3-regular unweighted graphs and provide an approximation algorithm with ratio $1.583$ when $G$ is a 3-regular unweighted graph. In addition, we show that MDD(min) can be solved in polynomial time when $G$ is a regular graph of constant degree.

Preprint, 2014, arXiv

Optimally Approximating the Coverage Lifetime of Wireless Sensor Networks

We consider the problem of maximizing the lifetime of coverage (MLCP) of targets in a wireless sensor network with battery-limited sensors. We first show that the MLCP cannot be approximated within a factor less than $\ln n$ by any polynomial-time algorithm, where $n$ is the number of targets. This provides closure to the long-standing open problem of showing optimality of previously known $\ln n$-approximation algorithms. We also derive a new $\ln n$-approximation to the MLCP by showing a $\ln n$-approximation to the maximum disjoint set cover problem (DSCP), which has many advantages over previous MLCP algorithms, including an easy extension to the $k$-coverage problem. We then present an improvement (in certain cases) to the $\ln n$ algorithm in terms of a newly defined quantity, the "expansiveness" of the network. For the special one-dimensional case, where each sensor can monitor a contiguous region of possibly different lengths, we show that the MLCP solution is equal to the DSCP solution, and can be found in polynomial time. Finally, for the special two-dimensional case, where each sensor can monitor a circular area with a given radius around itself, we combine existing results to derive a $(1+ε)$-approximation algorithm for solving MLCP for any $ε>0$.

Preprint, 2014, arXiv

The Online Disjoint Set Cover Problem and its Applications

Given a universe $U$ of $n$ elements and a collection of subsets $\mathcal{S}$ of $U$, the maximum disjoint set cover problem (DSCP) is to partition $\mathcal{S}$ into as many set covers as possible, where a set cover is defined as a collection of subsets whose union is $U$. We consider the online DSCP, in which the subsets arrive one by one (possibly in an order chosen by an adversary), and must be irrevocably assigned to some partition on arrival with the objective of minimizing the competitive ratio. The competitive ratio of an online DSCP algorithm $A$ is defined as the maximum ratio of the number of disjoint set covers obtained by the optimal offline algorithm to the number of disjoint set covers obtained by $A$ across all inputs. We propose an online algorithm for solving the DSCP with competitive ratio $\ln n$. We then show a lower bound of $Ω(\sqrt{\ln n})$ on the competitive ratio for any online DSCP algorithm. The online disjoint set cover problem has wide ranging applications in practice, including the online crowd-sourcing problem, the online coverage lifetime maximization problem in wireless sensor networks, and in online resource allocation problems.

Preprint, 2013, arXiv

Maximizing Utility Among Selfish Users in Social Groups

We consider the problem of a social group of users trying to obtain a "universe" of files, first from a server and then via exchange amongst themselves. We consider the selfish file-exchange paradigm of give-and-take, whereby two users can exchange files only if each has something unique to offer the other. We are interested in maximizing the number of users who can obtain the universe through a schedule of file exchanges. We first present a practical paradigm of file acquisition. We then present an algorithm which ensures that at least half the users obtain the universe with high probability for $n$ files and $m=O(\log n)$ users when $n\rightarrow\infty$, thereby showing an approximation ratio of 2. Extending these ideas, we show a $(1+ε_1)$-approximation algorithm for $m=O(n)$, $ε_1>0$, and a $((1+z)/2 +ε_2)$-approximation algorithm for $m=O(n^z)$, $z>1$, $ε_2>0$. Finally, we show that for any $m=O(e^{o(n)})$, there exists a schedule of file exchanges which ensures that at least half the users obtain the universe.
