Researcher profile

Vivek S. Borkar

Vivek S. Borkar contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2026arXiv

A Concentration Bound for TD(0) with Function Approximation

We derive uniform all-time concentration bound of the type 'for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm, with both martingale and Markov noises. Markov noise is handled using the Poisson equation and the lack of almost sure guarantees on boundedness of iterates is handled using the concept of relaxed concentration inequalities.

preprint2026arXiv

Adynamical systems view of training generativemodels and the memorization phenomenon

Using recent works of one of the authors (VSB) on collapse in generative models and two time scale dynamics in stochastic gradient descent in high dimensions, we give a system theoretic explanation of the memorization phenomenon in generative models. This relies purely on the dynamic aspects of the training phase. Specifically, we use a result of Austin [2016] to motivate a stylized model for the loss function for stochastic gradient descent (SGD) wherein the loss function has a strong dependence on some variables and weak dependence on the rest in a precise sense. This naturally leads to two distinct time scales in the constant step size SGD that is commonly used in machine learning. This fact has been used to explain the double descent phenomenon in SGD in Borkar [2026]. In conjunction with a mathematical model for collapse phenomenon in SGD developed in Borkar [2025a], we analyze the constant step size SGD using the recent results of Azizian et al. [2024] in order to explain the phenomenon of memorization wherein a generative model that is concurrently being tuned yields the same or similar outputs for significant stretches of time. This gives a novel perspective on the aforementioned phenomena reported in machine learning literature and their interrelationships, using a dynamical systems viewpoint.

preprint2025arXiv

Lagrangian Index Policy for Restless Bandits with Average Reward

We study the Lagrangian Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with the performance of the Whittle Index Policy (WIP), both heuristic policies known to be asymptotically optimal under certain natural conditions. Even though in most cases their performances are very similar, in the cases when WIP shows bad performance, LIP continues to perform very well. We then propose reinforcement learning algorithms, both tabular and NN-based, to obtain online learning schemes for LIP in the model-free setting. The proposed reinforcement learning schemes for LIP require significantly less memory than the analogous schemes for WIP. We calculate analytically the Lagrangian index for the restart model, which applies to the optimal web crawling and the minimization of the weighted age of information. We also give a new proof of asymptotic optimality in case of homogeneous arms as the number of arms goes to infinity, based on exchangeability and de Finetti's theorem.

preprint2022arXiv

A selection procedure for extracting the unique Feller weak solution of degenerate diffusions

In this work, we show that for the martingale problem for a class of degenerate diffusions with bounded continuous drift and diffusion coefficients, the small noise limit of non-degenerate approximations leads to a unique Feller limit. The proof uses the theory of viscosity solutions applied to the associated backward Kolmogorov equations. Under appropriate conditions on drift and diffusion coefficients, we will establish a comparison principle and a one-one correspondence between Feller solutions to the martingale problem and continuous viscosity solutions of the associated Kolmogorov equation. This work can be considered as an extension to the work of V. S. Borkar and K. S. Kumar (2010).

preprint2022arXiv

Ergodic Risk-sensitive control -- A survey

Risk-sensitive control has received considerable interest since the seminal work of Howard and Matheson [120] because of its ability to account for fluctuations about the mean, its connection with $H_\infty$ control, and its application to financial mathematics. In this article, we attempt to put together a comprehensive survey on the research done on ergodic risk-sensitive control over the last four decades.

preprint2022arXiv

Scheduling in Wireless Networks using Whittle Index Theory

We consider the problem of scheduling packet transmissions in a wireless network of users while minimizing the energy consumed and the transmission delay. A challenge is that transmissions of users that are close to each other mutually interfere, while users that are far apart can transmit simultaneously without much interference. Each user has a queue of packets that are transmitted on a single channel and mutually non interfering users reuse the spectrum. Using the theory of Whittle index for cost minimizing restless bandits, we design four index-based policies and compare their performance with that of the well-known policies: Slotted ALOHA, maximum weight scheduling, quadratic Lyapunov drift, Cella and Cesa Bianchi algorithm, and two Whittle index based policies from a recently published paper. We make the code used to perform our simulations publicly available, so that it can be used for future work by the research community at large.

preprint2020arXiv

A variational characterization of the risk-sensitive average reward for controlled diffusions on $\mathbb{R}^d$

We address the variational formulation of the risk-sensitive reward problem for non-degenerate diffusions on $\mathbb{R}^d$ controlled through the drift. We establish a variational formula on the whole space and also show that the risk-sensitive value equals the generalized principal eigenvalue of the semilinear operator. This can be viewed as a controlled version of the variational formulas for principal eigenvalues of diffusion operators arising in large deviations. We also revisit the average risk-sensitive minimization problem and by employing a gradient estimate developed in this paper, we extend earlier results to unbounded drifts and running costs.

preprint2020arXiv

Scheduling in Wireless Networks with Spatial Reuse of Spectrum as Restless Bandits

We study the problem of scheduling packet transmissions with the aim of minimizing the energy consumption and data transmission delay of users in a wireless network in which spatial reuse of spectrum is employed. We approach this problem using the theory of Whittle index for cost minimizing restless bandits, which has been used to effectively solve problems in a variety of applications. We design two Whittle index based policies the first by treating the graph representing the network as a clique and the second based on interference constraints derived from the original graph. We evaluate the performance of these two policies via extensive simulations, in terms of average cost and packets dropped, and show that they outperform the well-known Slotted ALOHA and maximum weight scheduling algorithms.