Source author record

Vivek S. Borkar

Vivek S. Borkar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


math.OC; math.PR; Machine Learning; Systems and Control; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; eess.SY; Information Theory; math.IT; Networking and Internet Architecture; Computation; eess.SP; math.AP; Social and Information Networks

Catalog footprint

What is connected

19 works

14 topics

4 close collaborators


preprint, 2026, arXiv

A Concentration Bound for TD(0) with Function Approximation

We derive a uniform all-time concentration bound of the type 'for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation. We work with online TD learning with samples from a single sample path of the underlying Markov chain. This makes our analysis significantly different from offline TD learning or TD learning with access to independent samples from the stationary distribution of the Markov chain. We treat TD(0) as a contractive stochastic approximation algorithm, with both martingale and Markov noises. Markov noise is handled using the Poisson equation, and the lack of almost sure guarantees on boundedness of iterates is handled using the concept of relaxed concentration inequalities.
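As a rough illustration of the setting in this abstract, the following is a minimal sketch of online TD(0) with linear function approximation driven by a single sample path. The two-state chain, rewards, discount factor, one-hot features, initial state and step-size schedule are all assumptions chosen for the example; none of them come from the paper, and the sketch does not reproduce its concentration analysis.

```python
import random


def td0_linear(P, r, phi, gamma, n_steps, seed=0):
    """Online TD(0) with linear value estimate V(s) ~ phi[s] . theta.

    Samples come from one trajectory of the chain with transition matrix P
    (no i.i.d. draws from the stationary distribution).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(phi[0])
    states = range(len(P))
    s = 0  # arbitrary initial state (assumption)
    for n in range(n_steps):
        # Next state drawn from row s of the transition matrix.
        s_next = rng.choices(states, weights=P[s])[0]
        v_s = sum(t * f for t, f in zip(theta, phi[s]))
        v_next = sum(t * f for t, f in zip(theta, phi[s_next]))
        delta = r[s] + gamma * v_next - v_s   # temporal-difference error
        a_n = 1.0 / (n + 1) ** 0.6            # decaying step size (assumption)
        theta = [t + a_n * delta * f for t, f in zip(theta, phi[s])]
        s = s_next
    return theta


# Hypothetical two-state example with one-hot (tabular) features.
P = [[0.9, 0.1], [0.2, 0.8]]
r = [1.0, 0.0]
phi = [[1.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    print(td0_linear(P, r, phi, gamma=0.5, n_steps=200_000))
```

With one-hot features the TD fixed point coincides with the solution of $V = r + \gamma P V$, which for this toy chain is roughly $(1.846, 0.308)$, so the returned iterate should settle near those values.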

preprint, 2026, arXiv

A dynamical systems view of training generative models and the memorization phenomenon

Using recent works of one of the authors (VSB) on collapse in generative models and two time scale dynamics in stochastic gradient descent in high dimensions, we give a system theoretic explanation of the memorization phenomenon in generative models. This relies purely on the dynamic aspects of the training phase. Specifically, we use a result of Austin [2016] to motivate a stylized model for the loss function for stochastic gradient descent (SGD) wherein the loss function has a strong dependence on some variables and weak dependence on the rest in a precise sense. This naturally leads to two distinct time scales in the constant step size SGD that is commonly used in machine learning. This fact has been used to explain the double descent phenomenon in SGD in Borkar [2026]. In conjunction with a mathematical model for the collapse phenomenon in SGD developed in Borkar [2025a], we analyze the constant step size SGD using the recent results of Azizian et al. [2024] in order to explain the phenomenon of memorization wherein a generative model that is concurrently being tuned yields the same or similar outputs for significant stretches of time. This gives a novel perspective on the aforementioned phenomena reported in the machine learning literature and their interrelationships, using a dynamical systems viewpoint.

preprint, 2025, arXiv

Lagrangian Index Policy for Restless Bandits with Average Reward

We study the Lagrangian Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with the performance of the Whittle Index Policy (WIP), both heuristic policies known to be asymptotically optimal under certain natural conditions. Even though in most cases their performances are very similar, in the cases when WIP shows bad performance, LIP continues to perform very well. We then propose reinforcement learning algorithms, both tabular and NN-based, to obtain online learning schemes for LIP in the model-free setting. The proposed reinforcement learning schemes for LIP require significantly less memory than the analogous schemes for WIP. We calculate analytically the Lagrangian index for the restart model, which applies to the optimal web crawling and the minimization of the weighted age of information. We also give a new proof of asymptotic optimality in case of homogeneous arms as the number of arms goes to infinity, based on exchangeability and de Finetti's theorem.

preprint, 2022, arXiv

A selection procedure for extracting the unique Feller weak solution of degenerate diffusions

In this work, we show that for the martingale problem for a class of degenerate diffusions with bounded continuous drift and diffusion coefficients, the small noise limit of non-degenerate approximations leads to a unique Feller limit. The proof uses the theory of viscosity solutions applied to the associated backward Kolmogorov equations. Under appropriate conditions on drift and diffusion coefficients, we will establish a comparison principle and a one-one correspondence between Feller solutions to the martingale problem and continuous viscosity solutions of the associated Kolmogorov equation. This work can be considered as an extension to the work of V. S. Borkar and K. S. Kumar (2010).

preprint, 2022, arXiv

Concentration of Contractive Stochastic Approximation and Reinforcement Learning

Using a martingale concentration inequality, concentration bounds 'from time $n_0$ on' are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises. These are applied to reinforcement learning algorithms, in particular to asynchronous Q-learning and TD(0).

preprint, 2022, arXiv

Ergodic Risk-sensitive control -- A survey

Risk-sensitive control has received considerable interest since the seminal work of Howard and Matheson [120] because of its ability to account for fluctuations about the mean, its connection with $H_\infty$ control, and its application to financial mathematics. In this article, we attempt to put together a comprehensive survey on the research done on ergodic risk-sensitive control over the last four decades.

preprint, 2022, arXiv

Scheduling in Wireless Networks using Whittle Index Theory

We consider the problem of scheduling packet transmissions in a wireless network of users while minimizing the energy consumed and the transmission delay. A challenge is that transmissions of users that are close to each other mutually interfere, while users that are far apart can transmit simultaneously without much interference. Each user has a queue of packets that are transmitted on a single channel, and mutually non-interfering users reuse the spectrum. Using the theory of Whittle index for cost minimizing restless bandits, we design four index-based policies and compare their performance with that of well-known policies: Slotted ALOHA, maximum weight scheduling, quadratic Lyapunov drift, the Cella and Cesa-Bianchi algorithm, and two Whittle index based policies from a recently published paper. We make the code used to perform our simulations publicly available, so that it can be used for future work by the research community at large.

preprint, 2020, arXiv

A variational characterization of the optimal exit rate for controlled diffusions

The main result in this paper is a variational formula for the exit rate from a bounded domain for a diffusion process in terms of the stationary law of the diffusion constrained to remain in this domain forever. Related results on the geometric ergodicity of the controlled Q-process are also presented.

preprint, 2020, arXiv

A variational characterization of the risk-sensitive average reward for controlled diffusions on $\mathbb{R}^d$

We address the variational formulation of the risk-sensitive reward problem for non-degenerate diffusions on $\mathbb{R}^d$ controlled through the drift. We establish a variational formula on the whole space and also show that the risk-sensitive value equals the generalized principal eigenvalue of the semilinear operator. This can be viewed as a controlled version of the variational formulas for principal eigenvalues of diffusion operators arising in large deviations. We also revisit the average risk-sensitive minimization problem and by employing a gradient estimate developed in this paper, we extend earlier results to unbounded drifts and running costs.

preprint, 2020, arXiv

Scheduling in Wireless Networks with Spatial Reuse of Spectrum as Restless Bandits

We study the problem of scheduling packet transmissions with the aim of minimizing the energy consumption and data transmission delay of users in a wireless network in which spatial reuse of spectrum is employed. We approach this problem using the theory of Whittle index for cost minimizing restless bandits, which has been used to effectively solve problems in a variety of applications. We design two Whittle index based policies: the first by treating the graph representing the network as a clique, and the second based on interference constraints derived from the original graph. We evaluate the performance of these two policies via extensive simulations, in terms of average cost and packets dropped, and show that they outperform the well-known Slotted ALOHA and maximum weight scheduling algorithms.

preprint, 2016, arXiv

Gradient Estimation with Simultaneous Perturbation and Compressive Sensing

This paper aims at achieving a "good" estimator for the gradient of a function on a high-dimensional space. Often such functions are not sensitive in all coordinates and the gradient of the function is almost sparse. We propose a method for gradient estimation that combines ideas from Spall's Simultaneous Perturbation Stochastic Approximation with compressive sensing. The aim is to obtain a "good" estimator without too many function evaluations. Applications to estimating the gradient outer product matrix as well as to standard optimization problems are illustrated via simulations.
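To make the simultaneous-perturbation ingredient concrete, here is a minimal sketch of an averaged two-sided simultaneous-perturbation gradient estimate in the style of Spall's method. It covers only that ingredient: the compressive-sensing recovery step described in the abstract is not included, and the test function `f_sparse`, the perturbation size `c` and the number of averaged draws are hypothetical choices for illustration.

```python
import random


def sp_gradient(f, x, c=1e-2, n_avg=4000, seed=0):
    """Average of simultaneous-perturbation gradient estimates of f at x."""
    rng = random.Random(seed)
    d = len(x)
    g = [0.0] * d
    for _ in range(n_avg):
        # Perturb every coordinate at once with independent random signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        x_plus = [xi + c * di for xi, di in zip(x, delta)]
        x_minus = [xi - c * di for xi, di in zip(x, delta)]
        # Two function evaluations per draw, whatever the dimension d.
        diff = (f(x_plus) - f(x_minus)) / (2.0 * c)
        for i in range(d):
            g[i] += diff / delta[i] / n_avg
    return g


def f_sparse(x):
    # Hypothetical test function: depends on only two of its five coordinates.
    return 3.0 * x[0] - 2.0 * x[3]


if __name__ == "__main__":
    print(sp_gradient(f_sparse, [0.0] * 5))
```

Each draw costs two evaluations of `f` regardless of dimension, which is the property that makes the approach attractive in high dimensions; the averaged estimate here should be close to the sparse gradient $(3, 0, 0, -2, 0)$, up to sampling noise.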

preprint, 2016, arXiv

Randomized Kaczmarz for Rank Aggregation from Pairwise Comparisons

We revisit the problem of inferring the overall ranking among entities in the framework of Bradley-Terry-Luce (BTL) model, based on available empirical data on pairwise preferences. By a simple transformation, we can cast the problem as that of solving a noisy linear system, for which a ready algorithm is available in the form of the randomized Kaczmarz method. This scheme is provably convergent, has excellent empirical performance, and is amenable to on-line, distributed and asynchronous variants. Convergence, convergence rate, and error analysis of the proposed algorithm are presented and several numerical experiments are conducted whose results validate our theoretical findings.
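The reduction to a linear system can be sketched as follows. The abstract does not spell out the transformation, so this example assumes the standard log-odds identity for the BTL model, $\log\big(p_{ij}/(1-p_{ij})\big) = w_i - w_j$, where $p_{ij}$ is the probability that item $i$ beats item $j$ and $w$ holds the log-scores; the data, function names and iteration count are hypothetical, and the example is noise-free.

```python
import math
import random


def randomized_kaczmarz(A, b, n_iters, seed=0):
    """Randomized Kaczmarz for A x = b, rows sampled with prob ~ ||a_i||^2."""
    rng = random.Random(seed)
    norms = [sum(a * a for a in row) for row in A]
    x = [0.0] * len(A[0])
    rows = range(len(A))
    for _ in range(n_iters):
        i = rng.choices(rows, weights=norms)[0]
        resid = b[i] - sum(a * xi for a, xi in zip(A[i], x))
        # Project x onto the hyperplane a_i . x = b_i.
        x = [xi + resid / norms[i] * a for xi, a in zip(x, A[i])]
    return x


def btl_system(pairs, p_win, n_items):
    """Rows e_i - e_j with right-hand side log(p / (1 - p)) (assumed transform)."""
    A, b = [], []
    for (i, j), p in zip(pairs, p_win):
        row = [0.0] * n_items
        row[i], row[j] = 1.0, -1.0
        A.append(row)
        b.append(math.log(p / (1.0 - p)))
    return A, b


# Hypothetical noise-free data generated from log-scores w = (1, 0, -1).
w_true = [1.0, 0.0, -1.0]
pairs = [(0, 1), (1, 2), (0, 2)]
p_win = [1.0 / (1.0 + math.exp(-(w_true[i] - w_true[j]))) for i, j in pairs]
A, b = btl_system(pairs, p_win, n_items=3)

if __name__ == "__main__":
    print(randomized_kaczmarz(A, b, n_iters=500))
```

The scores are identifiable only up to an additive constant. Starting from the zero vector keeps the iterates in the row space of the system, so the sketch returns the solution whose entries sum to zero.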

preprint, 2014, arXiv

Asynchronous Gossip for Averaging and Spectral Ranking

We consider two variants of the classical gossip algorithm. The first variant is a version of asynchronous stochastic approximation. We highlight a fundamental difficulty associated with the classical asynchronous gossip scheme, viz., that it may not converge to a desired average, and suggest an alternative scheme based on reinforcement learning that has guaranteed convergence to the desired average. We then discuss a potential application to a wireless network setting with simultaneous link activation constraints. The second variant is a gossip algorithm for distributed computation of the Perron-Frobenius eigenvector of a nonnegative matrix. While the first variant draws upon a reinforcement learning algorithm for an average cost controlled Markov decision problem, the second variant draws upon a reinforcement learning algorithm for risk-sensitive control. We then discuss potential applications of the second variant to ranking schemes, reputation networks, and principal component analysis.

preprint, 2014, arXiv

Greedy Block Coordinate Descent (GBCD) Method for High Dimensional Quadratic Programs

High dimensional unconstrained quadratic programs (UQPs) involving massive datasets are now common in application areas such as web, social networks, etc. Unless computational resources that match up to these datasets are available, solving such problems using classical UQP methods is very difficult. This paper discusses alternatives. We first define high dimensional compliant (HDC) methods for UQPs, that is, methods that can solve high dimensional UQPs by adapting to available computational resources. We then show that the classes of block Kaczmarz and block coordinate descent (BCD) methods are the only existing methods that can be made HDC. As a possible answer to the question of the 'best' amongst BCD methods for UQP, we propose a novel greedy BCD (GBCD) method with serial, parallel and distributed variants. Convergence rates and numerical tests confirm that the GBCD is indeed an effective method to solve high dimensional UQPs. In fact, it sometimes beats even the conjugate gradient.
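The abstract does not state the block selection rule, so the following is only a sketch of the general idea of greedy coordinate descent on an unconstrained quadratic program, using blocks of size one and the Gauss-Southwell rule (pick the coordinate with the largest partial derivative in magnitude). It should not be read as the paper's GBCD method; the matrix `P_demo`, vector `q_demo` and iteration count are hypothetical.

```python
def greedy_coordinate_descent(P, q, n_iters):
    """Minimize 0.5 * x^T P x - q^T x for symmetric positive definite P.

    Greedy rule with blocks of size one (Gauss-Southwell): update the
    coordinate whose partial derivative is largest in magnitude.
    """
    n = len(q)
    x = [0.0] * n
    grad = [-qi for qi in q]  # gradient P x - q evaluated at x = 0
    for _ in range(n_iters):
        i = max(range(n), key=lambda k: abs(grad[k]))
        step = -grad[i] / P[i][i]   # exact minimization along coordinate i
        x[i] += step
        for k in range(n):          # gradient update touches one column of P
            grad[k] += step * P[k][i]
    return x


# Hypothetical 2x2 example; the exact minimizer is (1/11, 7/11).
P_demo = [[4.0, 1.0], [1.0, 3.0]]
q_demo = [1.0, 2.0]

if __name__ == "__main__":
    print(greedy_coordinate_descent(P_demo, q_demo, n_iters=200))
```

Each iteration reads a single column of the matrix, which hints at why coordinate-type methods can adapt to limited memory; a block version would select and solve for several coordinates at once.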

preprint, 2013, arXiv

A Stochastic Kaczmarz Algorithm for Network Tomography

We develop a stochastic approximation version of the classical Kaczmarz algorithm that is incremental in nature and takes as input noisy real-time data. Our analysis shows that with probability one it mimics the behavior of the original scheme: starting from the same initial point, our algorithm and the corresponding deterministic Kaczmarz algorithm converge to precisely the same point. The motivation for this work comes from network tomography, where network parameters are to be estimated based upon end-to-end measurements. Numerical examples via MATLAB-based simulations demonstrate the efficacy of the algorithm.

preprint, 2013, arXiv

Asymptotics of the Invariant Measure in Mean Field Models with Jumps

We consider the asymptotics of the invariant measure for the process of the empirical spatial distribution of $N$ coupled Markov chains in the limit of a large number of chains. Each chain reflects the stochastic evolution of one particle. The chains are coupled through the dependence of the transition rates on this spatial distribution of particles in the various states. Our model is a caricature for medium access interactions in wireless local area networks. It is also applicable to the study of spread of epidemics in a network. The limiting process satisfies a deterministic ordinary differential equation called the McKean-Vlasov equation. When this differential equation has a unique globally asymptotically stable equilibrium, the spatial distribution asymptotically concentrates on this equilibrium. More generally, its limit points are supported on a subset of the $\omega$-limit sets of the McKean-Vlasov equation. Using a control-theoretic approach, we examine the question of large deviations of the invariant measure from this limit.

preprint, 2013, arXiv

Distributed Reinforcement Learning via Gossip

We consider the classical TD(0) algorithm implemented on a network of agents wherein the agents also incorporate the updates received from neighboring agents using a gossip-like mechanism. The combined scheme is shown to converge for both discounted and average cost problems.

preprint, 2013, arXiv

Reinforcement Learning for Matrix Computations: PageRank as an Example

Reinforcement learning has gained wide popularity as a technique for simulation-driven approximate dynamic programming. A less known aspect is that the very reasons that make it effective in dynamic programming can also be leveraged for using it for distributed schemes for certain matrix computations involving non-negative matrices. In this spirit, we propose a reinforcement learning algorithm for PageRank computation that is fashioned after analogous schemes for approximate dynamic programming. The algorithm has the advantage of ease of distributed implementation and more importantly, of being model-free, i.e., not dependent on any specific assumptions about the transition probabilities in the random web-surfer model. We analyze its convergence and finite time behavior and present some supporting numerical experiments.

preprint, 2011, arXiv

Small Noise Asymptotics for Invariant Densities for a Class of Diffusions: A Control Theoretic View (with Erratum)

The uniqueness argument in the proof of Theorem 5, p. 483, of "Small noise asymptotics for invariant densities for a class of diffusions: a control theoretic view, J. Math. Anal. and Appl. (2009)" is flawed. We give here a corrected proof.
