Researcher profile

Koulik Khamaru

Koulik Khamaru contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
7topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Avoiding the Price of Adaptivity: Inference in Linear Contextual Bandits via Stability

Statistical inference in contextual bandits is challenging due to the adaptive, non-i.i.d. nature of the data. A growing body of work shows that classical least-squares inference can fail under adaptive sampling, and that valid confidence intervals for linear functionals typically require an inflation of order $\sqrt{d \log T}$. This phenomenon -- often termed the price of adaptivity -- reflects the intrinsic difficulty of reliable inference under general contextual bandit policies. A key structural condition that overcomes this limitation is the stability condition of Lai and Wei, which requires the empirical feature covariance to converge to a deterministic limit. When stability holds, the ordinary least-squares estimator satisfies a central limit theorem, and classical Wald-type confidence intervals remain asymptotically valid under adaptation, without incurring the $\sqrt{d \log T}$ price of adaptivity. In this paper, we propose and analyze a regularized EXP4 algorithm for linear contextual bandits. Our first main result shows that this procedure satisfies the Lai--Wei stability condition and therefore admits valid Wald-type confidence intervals for linear functionals. We additionally provide quantitative rates of convergence in the associated central limit theorem. Our second result establishes that the same algorithm achieves regret guarantees that are minimax optimal up to logarithmic factors, demonstrating that stability and statistical efficiency can coexist within a single contextual bandit method. As an application of our theory, we show how it can be used to construct confidence intervals for the conditional average treatment effect (CATE) under adaptively collected data. Finally, we complement our theory with simulations illustrating the empirical normality of the resulting estimators and the sharpness of the corresponding confidence intervals.

preprint2024arXiv

PICS: A sequential approach to obtain optimal designs for non-linear models leveraging closed-form solutions for faster convergence

D-Optimal designs for estimating parameters of response models are derived by maximizing the determinant of the Fisher information matrix. For non-linear models, the Fisher information matrix depends on the unknown parameter vector of interest, leading to a weird situation that in order to obtain the D-optimal design, one needs to have knowledge of the parameter to be estimated. One solution to this problem is to choose the design points sequentially, optimizing the D-optimality criterion using parameter estimates based on available data, followed by updating the parameter estimates using maximum likelihood estimation. On the other hand, there are many non-linear models for which closed-form results for D-optimal designs are available, but because such solutions involve the parameters to be estimated, they can only be used by substituting "guestimates" of parameters. In this paper, a hybrid sequential strategy called PICS (Plug into closed-form solution) is proposed that replaces the optimization of the objective function at every single step by a draw from the probability distribution induced by the known optimal design by plugging in the current estimates. Under regularity conditions, asymptotic normality of the sequence of estimators generated by this approach are established. Usefulness of this approach in terms of saving computational time and achieving greater efficiency of estimation compared to the standard sequential approach are demonstrated with simulations conducted from two different sets of models.

preprint2022arXiv

Instability, Computational Efficiency and Statistical Accuracy

Many statistical estimators are defined as the fixed point of a data-dependent operator, with estimators based on minimizing a cost function being an important special case. The limiting performance of such estimators depends on the properties of the population-level operator in the idealized limit of infinitely many samples. We develop a general framework that yields bounds on statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (in)stability when applied to an empirical object based on $n$ samples. Using this framework, we analyze both stable forms of gradient descent and some higher-order and unstable algorithms, including Newton's method and its cubic-regularized variant, as well as the EM algorithm. We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models. We exhibit cases in which an unstable algorithm can achieve the same statistical accuracy as a stable algorithm in exponentially fewer steps -- namely, with the number of iterations being reduced from polynomial to logarithmic in sample size $n$.

preprint2022arXiv

Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired a growing effort in obtaining instance-dependent guarantees and deriving instance-optimal algorithms for RL problems. This research has been carried out, however, primarily within the confines of theory, providing guarantees that explain \textit{ex post} the performance differences observed. A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice. We address the problem of obtaining sharp instance-dependent confidence regions for the policy evaluation problem and the optimal value estimation problem of an MDP, given access to an instance-optimal algorithm. As a consequence, we propose a data-dependent stopping rule for instance-optimal algorithms. The proposed stopping rule adapts to the instance-specific difficulty of the problem and allows for early termination for problems with favorable structure.

preprint2020arXiv

Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. We show that these methods provably converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial of the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals some interesting differences between the settings of additive driving noise and random initialization, as well as the settings of one-point and two-point reward feedback. Our theory is corroborated by extensive simulations of derivative-free methods on these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems.

preprint2020arXiv

Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations show that the widely-used temporal difference (TD) algorithm is strictly suboptimal when evaluated in a non-asymptotic setting, even when combined with Polyak-Ruppert iterate averaging. We remedy this issue by introducing and analyzing variance-reduced forms of stochastic approximation, showing that they achieve non-asymptotic, instance-dependent optimality up to logarithmic factors.

preprint2020arXiv

Singularity, Misspecification, and the Convergence Rate of EM

A line of recent work has analyzed the behavior of the Expectation-Maximization (EM) algorithm in the well-specified setting, in which the population likelihood is locally strongly concave around its maximizing argument. Examples include suitably separated Gaussian mixture models and mixtures of linear regressions. We consider over-specified settings in which the number of fitted components is larger than the number of components in the true distribution. Such misspecified settings can lead to singularity in the Fisher information matrix, and moreover, the maximum likelihood estimator based on $n$ i.i.d. samples in $d$ dimensions can have a non-standard $\mathcal{O}((d/n)^{\frac{1}{4}})$ rate of convergence. Focusing on the simple setting of two-component mixtures fit to a $d$-dimensional Gaussian distribution, we study the behavior of the EM algorithm both when the mixture weights are different (unbalanced case), and are equal (balanced case). Our analysis reveals a sharp distinction between these two cases: in the former, the EM algorithm converges geometrically to a point at Euclidean distance of $\mathcal{O}((d/n)^{\frac{1}{2}})$ from the true parameter, whereas in the latter case, the convergence rate is exponentially slower, and the fixed point has a much lower $\mathcal{O}((d/n)^{\frac{1}{4}})$ accuracy. Analysis of this singular case requires the introduction of some novel techniques: in particular, we make use of a careful form of localization in the associated empirical process, and develop a recursive argument to progressively sharpen the statistical rate.