Researcher profile

Gugan Thoppe

Gugan Thoppe contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Adversary-Robust Learning from Fully Asynchronous Directional Derivative Estimates

We propose FAR-SIGN (Fully Asynchronous Robust optimization via SIGNed directional projections) for adversary-resilient learning in parameter-server--worker systems. FAR-SIGN achieves robustness through sign-based updates along carefully designed directions and mitigates the resulting bias via a two-timescale mechanism. It admits both first-order and zeroth-order implementations and enables fully asynchronous execution without requiring a private reference dataset at the server. We establish almost-sure convergence of FAR-SIGN to the set of stationary points for smooth, nonconvex objectives. Moreover, we prove the near-optimal rate of $O(n^{-1/4+ε})$ in the first-order setting and the standard $O(n^{-1/6+ε})$ in the zeroth-order setting, where $n$ is the iteration count and $ε>0$ can be chosen arbitrarily small. Experiments on MNIST show that FAR-SIGN outperforms robust aggregation-based methods in both accuracy and wall-clock time.

preprint2026arXiv

Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs

Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion setting. Building on the Bellman-type equation for exponential utility studied in \cite{porteus1975optimality}, we derive two Q-value-style extensions and show that the associated operators are contractions in the $L_\infty$ and sup-log/Thompson metrics, respectively. We characterize their fixed points and prove that the induced greedy stationary policy is optimal for the exponential-utility objective among stationary policies. These structural results lead to two model-free algorithms: a two-timescale Q-learning--style algorithm, for which we establish almost-sure convergence and provide finite-time convergence rates via timescale separation, and a one-timescale algorithm governed by a sublinear power-law operator. Since the latter does not admit a global contraction in standard metrics, we prove its convergence using delicate arguments based on local Lipschitzness, monotonicity, homogeneity, and Dini derivatives, and provide a scalar finite-time analysis that highlights the challenges in obtaining convergence rates in the vector case. Our work provides a foundation for value-based RL under exponential-utility objectives.

preprint2022arXiv

A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning

In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as also with each other, for solving a shared problem in sequential decision-making. It has wide-ranging applications in gaming, robotics, finance, etc. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than the existing ones, which only discuss convergence rates in the expected or the CLT sense. Importantly, our result holds under significantly weaker assumptions: neither the gossip matrix needs to be doubly stochastic nor the stepsizes square summable. As an application, we show that, for the stepsize $n^{-γ}$ with $γ\in (0, 1),$ the distributed TD(0) algorithm with linear function approximation has a convergence rate of $O(\sqrt{n^{-γ} \ln n })$ a.s.; for the $1/n$ type stepsize, the same is $O(\sqrt{n^{-1} \ln \ln n})$ a.s. These decay rates do not depend on the graph depicting the interactions among the different agents.

preprint2022arXiv

Does Momentum Help? A Sample Complexity Analysis

Stochastic Heavy Ball (SHB) and Nesterov's Accelerated Stochastic Gradient (ASG) are popular momentum methods in stochastic optimization. While benefits of such acceleration ideas in deterministic settings are well understood, their advantages in stochastic optimization is still unclear. In fact, in some specific instances, it is known that momentum does not help in the sample complexity sense. Our work shows that a similar outcome actually holds for the whole of quadratic optimization. Specifically, we obtain a lower bound on the sample complexity of SHB and ASG for this family and show that the same bound can be achieved by the vanilla SGD. We note that there exist results claiming the superiority of momentum based methods in quadratic optimization, but these are based on one-sided or flawed analyses.

preprint2021arXiv

Limit theorems for topological invariants of the dynamic multi-parameter simplicial complex

Topological study of existing random simplicial complexes is non-trivial and has led to several seminal works. However, the applicability of such studies is limited since the randomness there is usually governed by a single parameter. With this in mind, we focus here on the topology of the recently proposed multi-parameter random simplicial complex and, more importantly, of its dynamic analogue that we introduce here. In this dynamic setup, the temporal evolution of simplices is determined by stationary and possibly non-Markovian processes with a renewal structure. The dynamic versions of the clique complex and the Linial-Meshulum complex are special cases of our setup. Our key result concerns the regime where face-counts of a particular dimension dominate. We show that the Betti numbers corresponding to this dimension and the Euler characteristic satisfy functional strong law of large numbers and functional central limit theorems. Surprisingly, in the latter result, the limiting Gaussian process depends only upon the dynamics in the smallest non-trivial dimension.

preprint2020arXiv

Change Rate Estimation and Optimal Freshness in Web Page Crawling

For providing quick and accurate results, a search engine maintains a local snapshot of the entire web. And, to keep this local cache fresh, it employs a crawler for tracking changes across various web pages. However, finite bandwidth availability and server restrictions impose some constraints on the crawling frequency. Consequently, the ideal crawling rates are the ones that maximise the freshness of the local cache and also respect the above constraints. Azar et al. 2018 recently proposed a tractable algorithm to solve this optimisation problem. However, they assume the knowledge of the exact page change rates, which is unrealistic in practice. We address this issue here. Specifically, we provide two novel schemes for online estimation of page change rates. Both schemes only need partial information about the page change process, i.e., they only need to know if the page has changed or not since the last crawled instance. For both these schemes, we prove convergence and, also, derive their convergence rates. Finally, we provide some numerical experiments to compare the performance of our proposed estimators with the existing ones (e.g., MLE).

preprint2020arXiv

Randomly Weighted $d-$complexes: Minimal Spanning Acycles and Persistence Diagrams

A weighted $d-$complex is a simplicial complex of dimension $d$ in which each face is assigned a real-valued weight. We derive three key results here concerning persistence diagrams and minimal spanning acycles (MSAs) of such complexes. First, we establish an equivalence between the MSA face-weights and \emph{death times} in the persistence diagram. Next, we show a novel stability result for the MSA face-weights which, due to our first result, also holds true for the death and birth times, separately. Our final result concerns a perturbation of a mean-field model of randomly weighted $d-$complexes. The $d-$face weights here are perturbation of some i.i.d. distribution while all the lower-dimensional faces have a weight of $0$. If the perturbations decay sufficiently quickly, we show that suitably scaled extremal nearest face-weights, face-weights of the $d-$MSA, and the associated death times converge to an inhomogeneous Poisson point process. This result completely characterizes the extremal points of persistence diagrams and MSAs. The point process convergence and the asymptotic equivalence of three point processes are new for any weighted random complex model, including even the non-perturbed case. Lastly, as a consequence of our stability result, we show that Frieze's $ζ(3)$ limit for random minimal spanning trees and the recent extension to random MSAs by Hino and Kanazawa also hold in suitable noisy settings.