Source author record

Sujay Bhatt

Sujay Bhatt appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning; Social and Information Networks; math.OC; eess.SY; math.ST; Methodology; Multiagent Systems; physics.soc-ph; q-fin.TR; Statistics Theory; Systems and Control

Catalog footprint

What is connected

8 works

11 topics

4 close collaborators

preprint, 2026, arXiv

Rethinking Neural Network Learning Rates: A Stackelberg Perspective

Neural networks are typically trained with a single learning rate across all layers. While recent empirical evidence suggests that assigning layer-specific learning rates can accelerate training, a principled understanding of the conditions and mechanisms under which non-uniform learning rates are beneficial remains limited. In this work, we investigate non-uniform learning rates through the lens of Stackelberg optimization. Specifically, we demonstrate that training neural networks with a smaller learning rate for the body layers and a larger learning rate for the final layer can be interpreted as a two-time-scale alternating gradient descent algorithm applied to a Stackelberg reformulation of the original objective. We establish finite-time convergence guarantees for the algorithm under broad conditions that accommodate constraint sets and non-smooth activation functions. Beyond convergence, we identify two mechanisms by which non-uniform learning rates can outperform uniform learning rates: (i) we show that certain problem instances induce a Stackelberg objective with stronger optimization structure than the original objective, yielding faster convergence to globally optimal solutions, (ii) our numerical analysis reveals that the Stackelberg objective can exhibit substantially sharper local curvature, especially in early training, which leads to more informative gradients and learning acceleration. Experiments in supervised learning and reinforcement learning support our findings.
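The two-time-scale idea in this abstract can be sketched on a toy problem. The block below is a minimal illustration, not the paper's algorithm or experimental setup: it assumes a scalar "body" weight `w` and a scalar "head" weight `a`, and alternates gradient steps with a larger step size for the head. The model, data, function names and step sizes are all hypothetical.

```python
def loss(w, a, data):
    """Half mean squared error of the scalar two-layer model x -> a * (w * x)."""
    return sum((a * w * x - y) ** 2 for x, y in data) / (2 * len(data))


def grads(w, a, data):
    """Gradients of the loss with respect to the body weight w and head weight a."""
    n = len(data)
    gw = sum((a * w * x - y) * a * x for x, y in data) / n
    ga = sum((a * w * x - y) * w * x for x, y in data) / n
    return gw, ga


def train(data, lr_body=0.02, lr_head=0.2, steps=2000):
    """Alternating gradient descent with a fast head and a slow body (assumed step sizes)."""
    w, a = 0.5, 0.5
    for _ in range(steps):
        # Fast time scale: update the final layer first.
        _, ga = grads(w, a, data)
        a -= lr_head * ga
        # Slow time scale: update the body against the refreshed head.
        gw, _ = grads(w, a, data)
        w -= lr_body * gw
    return w, a


# Toy regression target y = 2x, so any pair with a * w = 2 is optimal.
data = [(x, 2.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
w, a = train(data)
print(f"a*w = {a * w:.6f}, loss = {loss(w, a, data):.2e}")
```

Setting `lr_body` equal to `lr_head` recovers ordinary alternating gradient descent with a single learning rate, which gives a simple baseline for comparison on this toy problem.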

preprint, 2022, arXiv

Catoni-style Confidence Sequences under Infinite Variance

In this paper, we provide an extension of confidence sequences for settings where the variance of the data-generating distribution does not exist or is infinite. Confidence sequences furnish confidence intervals that are valid at arbitrary data-dependent stopping times, naturally having a wide range of applications. We first establish a lower bound for the width of the Catoni-style confidence sequences for the finite variance case to highlight the looseness of the existing results. Next, we derive tight Catoni-style confidence sequences for data distributions having a relaxed bounded $p$-th moment, where $p \in (1,2]$, and strengthen the results for the finite variance case of $p = 2$. The derived results are shown to be better than confidence sequences obtained using the Dubins-Savage inequality.
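For context, the building block behind Catoni-style methods is an M-estimator of the mean whose influence function grows only logarithmically. The sketch below shows the classical fixed-sample Catoni estimator, not the confidence sequences derived in the paper. The tuning parameter `alpha`, the sample values and the function names are assumptions made for illustration.

```python
import math


def catoni_psi(x):
    """Catoni's influence function: sign(x) * log(1 + |x| + x^2 / 2)."""
    return math.copysign(math.log1p(abs(x) + 0.5 * x * x), x)


def catoni_mean(samples, alpha, tol=1e-10):
    """Solve sum_i psi(alpha * (x_i - theta)) = 0 for theta by bisection."""
    def score(theta):
        return sum(catoni_psi(alpha * (x - theta)) for x in samples)

    # The score is decreasing in theta and changes sign between min and max.
    lo, hi = min(samples), max(samples)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Nine well-behaved observations near 1.0 plus one heavy-tailed outlier.
samples = [0.9, 1.1, 1.0, 0.8, 1.2, 1.0, 0.95, 1.05, 1.0, 50.0]
robust = catoni_mean(samples, alpha=1.0)  # alpha is a hypothetical tuning choice
plain = sum(samples) / len(samples)
print(f"sample mean = {plain:.3f}, Catoni estimate = {robust:.3f}")
```

In the literature `alpha` is chosen from the sample size and a moment bound; here it is fixed by hand purely to show how the logarithmic influence function damps the outlier.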

preprint, 2022, arXiv

Offline Change Detection under Contamination

In this work, we propose a non-parametric and robust change detection algorithm to detect multiple change points in time series data under contamination. The contamination model is sufficiently general that the most common model used in the context of change detection, the Huber contamination model, is a special case. The contamination model is also oblivious and arbitrary. The change detection algorithm is designed for the offline setting, where the objective is to detect changes after all data have been received. We only make weak moment assumptions on the inliers (uncorrupted data) to handle a large class of distributions. The robust scan statistic in the algorithm is fashioned using mean estimators based on influence functions. We establish the consistency of the estimated change point indexes as the number of samples increases, and provide empirical evidence to support the consistency results.
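A robust scan statistic of the general kind described here can be sketched as follows. This is a simplified illustration under stated assumptions: it locates a single change point only, and it substitutes the sample median for the influence-function-based mean estimators used in the paper. The function name, the weighting and the `min_segment` parameter are hypothetical.

```python
import math
import statistics


def robust_scan_change_point(series, min_segment=5):
    """Return (index, statistic) maximising a weighted gap between robust segment centres."""
    n = len(series)
    best_index, best_stat = None, -math.inf
    for t in range(min_segment, n - min_segment + 1):
        left, right = series[:t], series[t:]
        # The median stands in for a robust mean estimator (an assumption of this sketch).
        gap = abs(statistics.median(left) - statistics.median(right))
        # CUSUM-style weight that favours balanced splits.
        stat = math.sqrt(t * (n - t) / n) * gap
        if stat > best_stat:
            best_index, best_stat = t, stat
    return best_index, best_stat


# Level shift from 0 to 5 at index 20, with one gross outlier on each side.
series = [0.0] * 20 + [5.0] * 20
series[5] = 100.0
series[30] = -100.0
index, stat = robust_scan_change_point(series)
print(index, round(stat, 3))
```

Replacing `statistics.median` with `statistics.fmean` in this sketch lets the two outliers distort the segment gaps, which is the failure mode the robust statistic is meant to avoid.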

preprint, 2020, arXiv

Controlled Sequential Information Fusion with Social Sensors

A sequence of social sensors estimate an unknown parameter (modeled as a state of nature) by performing Bayesian Social Learning, and myopically optimize individual reward functions. The decisions of the social sensors contain quantized information about the underlying state. How should a fusion center dynamically incentivize the social sensors for acquiring information about the underlying state? This paper presents five results. First, sufficient conditions on the model parameters are provided under which the optimal policy for the fusion center has a threshold structure. The optimal policy is determined in closed form, and is such that it switches between two exactly specified incentive policies at the threshold. Second, it is shown that the optimal incentive sequence is a sub-martingale, i.e., the optimal incentives increase on average over time. Third, it is shown that it is possible for the fusion center to learn the true state asymptotically by employing a sub-optimal policy; in other words, controlled information fusion with social sensors can be consistent. Fourth, uniform bounds on the average additional cost incurred by the fusion center for employing a sub-optimal policy are provided. This characterizes the trade-off between the cost of information acquisition and consistency for the fusion center. Finally, when it is sufficient to estimate the state with a degree of confidence, uniform bounds on the budget saved by employing policies that guarantee state estimation in finite time are provided.

preprint, 2020, arXiv

Policy Gradient using Weak Derivatives for Reinforcement Learning

This paper considers policy search in continuous state-action reinforcement learning problems. Typically, one computes search directions using a classic expression for the policy gradient called the Policy Gradient Theorem, which decomposes the gradient of the value function into two factors: the score function and the Q-function. This paper presents four results: (i) an alternative policy gradient theorem using weak (measure-valued) derivatives instead of the score function is established; (ii) the stochastic gradient estimates thus derived are shown to be unbiased and to yield algorithms that converge almost surely to stationary points of the non-convex value function of the reinforcement learning problem; (iii) the sample complexity of the algorithm is derived and is shown to be $O(1/\sqrt{k})$; (iv) finally, the expected variance of the gradient estimates obtained using weak derivatives is shown to be lower than that obtained using the popular score-function approach. Experiments on the OpenAI Gym pendulum environment show superior performance of the proposed algorithm.
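The difference between score-function and weak-derivative gradient estimates can be seen in a one-dimensional toy case. The sketch below is not the paper's reinforcement learning algorithm: it estimates the derivative of $E[f(X)]$ with respect to the mean of a Gaussian $X \sim N(\mu, \sigma^2)$, using the standard measure-valued decomposition in which the positive and negative parts are shifted Rayleigh variables. The test function `f`, the parameter values and all names are assumptions.

```python
import math
import random
import statistics


def gradient_estimates(f, mu, sigma, n, seed=0):
    """Per-sample estimates of d/dmu E[f(X)], X ~ N(mu, sigma^2), by two methods."""
    rng = random.Random(seed)
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # weak-derivative normalising constant
    score, weak = [], []
    for _ in range(n):
        # Score-function estimate: f(X) * d/dmu log p(X) = f(X) * (X - mu) / sigma^2.
        z = rng.gauss(0.0, 1.0)
        score.append(f(mu + sigma * z) * z / sigma)
        # Weak-derivative estimate: difference of f at two coupled Rayleigh-shifted points.
        w = math.sqrt(-2.0 * math.log(1.0 - rng.random()))  # Rayleigh(1) sample
        weak.append(c * (f(mu + sigma * w) - f(mu - sigma * w)))
    return score, weak


# For f(x) = x^2 the exact derivative of E[f(X)] = mu^2 + sigma^2 is 2 * mu = 3.0.
score_est, weak_est = gradient_estimates(lambda x: x * x, mu=1.5, sigma=1.0, n=20000)
print("score-function:", statistics.fmean(score_est), statistics.variance(score_est))
print("weak-derivative:", statistics.fmean(weak_est), statistics.variance(weak_est))
```

Both estimators target the same derivative, but in this toy example the weak-derivative samples have visibly smaller empirical variance. That observation is specific to this example and does not by itself establish the general result stated in the abstract.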

preprint, 2016, arXiv

Opportunistic Advertisement Scheduling in Live Social Media: A Multiple Stopping Time POMDP Approach

Live online social broadcasting services like YouTube Live and Twitch have steadily gained popularity due to improved bandwidth, ease of generating content and the ability to earn revenue on the generated content. In contrast to traditional cable television, revenue in online services is generated solely through advertisements, and depends on the number of clicks generated. Channel owners aim to opportunistically schedule advertisements so as to generate maximum revenue. This paper considers the problem of optimal scheduling of advertisements in live online social media. The problem is formulated as a multiple stopping problem and is addressed in a partially observed Markov decision process (POMDP) framework. Structural results are provided on the optimal advertisement scheduling policy. By exploiting the structure of the optimal policy, best linear thresholds are computed using stochastic approximation. The proposed model and framework are validated on real datasets, and the following observations are made: (i) the policy obtained by the multiple stopping problem can be used to detect changes in ground truth from online search data; (ii) numerical results show a significant improvement in the expected revenue by opportunistically scheduling the advertisements. The revenue can be improved by 20-30% in comparison to currently employed periodic scheduling.

preprint, 2016, arXiv

Tracking Infection Diffusion in Social Networks: Filtering Algorithms and Threshold Bounds

This paper deals with statistical signal processing over graphs for tracking infection diffusion in social networks. Infection (or information) diffusion is modeled using the Susceptible-Infected-Susceptible (SIS) model. Mean field approximation is employed to approximate the discrete valued infected degree distribution evolution by a deterministic ordinary differential equation for obtaining a generative model for the infection diffusion. The infected degree distribution is shown to follow polynomial dynamics and is estimated using an exact non-linear Bayesian filter. We compute posterior Cramér-Rao bounds to obtain the fundamental limits of the filter, which depend on the structure of the network. Considering the time-varying nature of real world networks, the relationship between the diffusion thresholds and the degree distribution is investigated using generative models for real world networks. In addition, we validate the efficacy of our method with diffusion data from a real-world online social system, Twitter. We find that the SIS model is a good fit for the information diffusion and that the non-linear filter effectively tracks the information diffusion.
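The mean-field dynamics mentioned in this abstract can be illustrated with the standard degree-based SIS approximation, in which the infected fraction $\rho_k$ of degree-$k$ nodes follows $\dot{\rho}_k = -\delta \rho_k + \beta k (1 - \rho_k)\,\Theta$, with $\Theta = \sum_k k P(k) \rho_k / \langle k \rangle$. The sketch below Euler-integrates this textbook model. The paper's exact parameterization may differ, and the degree distribution, rates and function name are assumptions.

```python
def simulate_sis(degree_dist, beta, delta, rho0=0.01, dt=0.01, steps=5000):
    """Euler-integrate the degree-based mean-field SIS model and return final prevalence."""
    mean_degree = sum(k * p for k, p in degree_dist.items())
    rho = {k: rho0 for k in degree_dist}  # infected fraction per degree class
    for _ in range(steps):
        # Probability that a randomly chosen link points to an infected node.
        theta = sum(k * p * rho[k] for k, p in degree_dist.items()) / mean_degree
        # Recovery at rate delta, infection at rate beta * k * theta (quadratic in rho).
        rho = {
            k: r + dt * (-delta * r + beta * k * (1.0 - r) * theta)
            for k, r in rho.items()
        }
    return sum(p * rho[k] for k, p in degree_dist.items())


# Hypothetical degree distribution P(k): <k> = 2.1, <k^2> = 6.7.
degree_dist = {1: 0.5, 2: 0.3, 5: 0.2}
endemic = simulate_sis(degree_dist, beta=1.0, delta=1.0)   # beta/delta above <k>/<k^2>
extinct = simulate_sis(degree_dist, beta=0.1, delta=1.0)   # beta/delta below <k>/<k^2>
print(f"endemic prevalence = {endemic:.3f}, sub-threshold prevalence = {extinct:.2e}")
```

The two runs sit on either side of the mean-field threshold $\beta/\delta = \langle k \rangle / \langle k^2 \rangle \approx 0.313$ for this distribution, which shows how the diffusion threshold depends on the degree distribution.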

preprint, 2015, arXiv

Sequential Detection of Market shocks using Risk-averse Agent Based Models

This paper considers a statistical signal processing problem involving agent based models of financial markets which at a micro-level are driven by socially aware and risk-averse trading agents. These agents trade (buy or sell) stocks by exploiting information about the decisions of previous agents (social learning) via an order book in addition to a private (noisy) signal they receive on the value of the stock. We are interested in the following: (1) modelling the dynamics of these risk-averse agents; (2) sequential detection of a market shock based on the behaviour of these agents. Structural results which characterize social learning under a risk measure, CVaR (Conditional Value-at-Risk), are presented and a formulation of the Bayesian change point detection problem is provided. The structural results exhibit two interesting properties: (i) risk-averse agents herd more often than risk-neutral agents; (ii) the stopping set in the sequential detection problem is non-convex. The framework is validated on data from the Yahoo! Tech Buzz game dataset.
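The risk measure named in this abstract, CVaR, has a simple empirical form: the average of the worst outcomes beyond a chosen quantile. The sketch below uses one common convention, the mean of the largest $(1 - \alpha)$ fraction of losses. It illustrates the risk measure only, not the paper's social learning or detection model, and the function name and data are hypothetical.

```python
import math


def empirical_cvar(losses, alpha):
    """Average of the worst (1 - alpha) fraction of losses (larger loss = worse)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)
    # Rounding guards against floating-point noise before taking the ceiling.
    tail = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:tail]) / tail


losses = [float(v) for v in range(1, 11)]   # losses 1, 2, ..., 10
print(empirical_cvar(losses, alpha=0.8))    # mean of the two worst losses
print(empirical_cvar(losses, alpha=0.0))    # reduces to the plain mean
```

At `alpha = 0` the measure coincides with the expected loss, which corresponds to a risk-neutral agent; raising `alpha` puts more weight on the tail and models increasing risk aversion.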