Source author record

Vaibhav Srivastava

Vaibhav Srivastava appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.OC · Systems and Control · eess.SY · Machine Learning · Robotics · math.DS · Multiagent Systems · math.PR · Neurons and Cognition · q-fin.MF · Artificial Intelligence · Computer Science and Game Theory · eess.SP · Human-Computer Interaction · math.CA · Social and Information Networks

Catalog footprint

What is connected

20 works

16 topics

4 close collaborators

preprint · 2026 · arXiv

Co-Learning Port-Hamiltonian Systems and Optimal Energy-Shaping Control

We develop a physics-informed learning framework for energy-shaping control of port-Hamiltonian (pH) systems from trajectory data. The proposed approach co-learns a pH system model and an optimal energy-balancing passivity-based controller (EB-PBC) through alternating optimization with policy-aware data collection. At each iteration, the system model is refined using trajectory data collected under the current control policy, and the controller is re-optimized on the updated model. Both components are parameterized by neural networks that embed the pH dynamics and EB-PBC structure, ensuring interpretability in terms of energy interactions. The learned controller renders the closed-loop system inherently passive and provably stable, and exploits passive plant dynamics without canceling the natural potential. A dissipation regularization enforces strict energy decay during training, thereby enhancing robustness to sim-to-real gaps. The proposed framework is validated on state-regulation and swing-up tasks for planar and torsional pendulum systems.

preprint · 2022 · arXiv

Epidemic Propagation under Evolutionary Behavioral Dynamics: Stability and Bifurcation Analysis

We consider the class of SIS epidemic models in which a large population of individuals chooses whether to adopt protection or to remain unprotected as the epidemic evolves. For a susceptible individual, adopting protection reduces the probability of becoming infected but it comes with a cost that is weighed with the instantaneous risk of becoming infected. An infected individual adopting protection transmits a new infection with a smaller probability compared to an unprotected infected individual. We focus on the replicator evolutionary dynamics to model the evolution of protection decisions by susceptible and infected subpopulations. We completely characterize the existence and local stability of the equilibria of the resulting coupled epidemic and replicator dynamics. We further show how the stability of different equilibrium points gets exchanged as certain parameters change. Finally, we investigate the system behavior under timescale separation between the epidemic and the evolutionary dynamics.
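As a point of reference for the equilibrium analysis described above, the following is a minimal sketch of the basic single-population SIS model only, dx/dt = beta*x*(1 - x) - delta*x, without the protection decisions or replicator dynamics studied in the paper. The names `beta` (infection rate), `delta` (recovery rate), and both function names are assumptions introduced for illustration.

```python
def simulate_sis(beta, delta, x0, dt=0.01, steps=20000):
    """Forward-Euler integration of dx/dt = beta*x*(1-x) - delta*x.

    x is the infected fraction of the population. This is the textbook
    scalar SIS model, not the coupled epidemic-replicator system.
    """
    x = x0
    for _ in range(steps):
        x += dt * (beta * x * (1.0 - x) - delta * x)
    return x


def endemic_equilibrium(beta, delta):
    """Nonzero equilibrium 1 - delta/beta, which exists only when beta > delta."""
    return max(0.0, 1.0 - delta / beta)


if __name__ == "__main__":
    # beta > delta: the infected fraction settles at the endemic level.
    print(simulate_sis(0.5, 0.2, 0.01), endemic_equilibrium(0.5, 0.2))
    # beta < delta: the infection dies out.
    print(simulate_sis(0.2, 0.5, 0.5), endemic_equilibrium(0.2, 0.5))
```

In this simplified model the disease-free and endemic equilibria exchange stability at beta = delta, which is the simplest instance of the kind of stability exchange the abstract refers to.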

preprint · 2022 · arXiv

Towards Modeling Human Motor Learning Dynamics in High-Dimensional Spaces

Designing effective rehabilitation strategies for upper extremities, particularly hands and fingers, requires a computational model of human motor learning. The large number of degrees of freedom (DoFs) available in these systems makes it difficult to balance the trade-off between learning the full dexterity and accomplishing manipulation goals. The motor learning literature argues that humans use motor synergies to reduce the dimension of control space. Using the low-dimensional space spanned by these synergies, we develop a computational model based on the internal model theory of motor control. We analyze the proposed model in terms of its convergence properties and fit it to the data collected from human experiments. We compare the performance of the fitted model to the experimental data and show that it captures human motor learning behavior well.

preprint · 2021 · arXiv

Nonstationary Stochastic Multiarmed Bandits: UCB Policies and Minimax Regret

We study the nonstationary stochastic Multi-Armed Bandit (MAB) problem in which the distributions of rewards associated with each arm are assumed to be time-varying and the total variation in the expected rewards is subject to a variation budget. The regret of a policy is defined by the difference in the expected cumulative rewards obtained using the policy and using an oracle that selects the arm with the maximum mean reward at each time. We characterize the performance of the proposed policies in terms of the worst-case regret, which is the supremum of the regret over the set of reward distribution sequences satisfying the variation budget. We extend Upper-Confidence Bound (UCB)-based policies with three different approaches, namely, periodic resetting, sliding observation window and discount factor, and show that they are order-optimal with respect to the minimax regret, i.e., the minimum worst-case regret achieved by any policy. We also relax the sub-Gaussian assumption on reward distributions and develop robust versions of the proposed policies that can handle heavy-tailed reward distributions and maintain their performance guarantees.
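The sliding-observation-window idea can be illustrated with a short sketch. The abstract does not state the exact index used in the paper, so this version simply applies a standard UCB1-style exploration bonus to the most recent `window` observations; the function name, the constant `c`, and the `reward_fn(arm, t)` interface are all assumptions made for illustration.

```python
import math
from collections import deque


def sliding_window_ucb(reward_fn, n_arms, horizon, window, c=2.0):
    """Play `horizon` rounds, using only the last `window` observations.

    reward_fn(arm, t) returns the reward of pulling `arm` at time t.
    Returns the list of arms chosen. Illustrative sketch, not the
    policy analyzed in the paper.
    """
    history = deque()            # (arm, reward) pairs inside the window
    counts = [0] * n_arms        # pulls per arm inside the window
    sums = [0.0] * n_arms        # reward totals per arm inside the window
    choices = []
    for t in range(horizon):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            # An arm with no samples in the window is pulled first.
            arm = untried[0]
        else:
            n = len(history)
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(c * math.log(n) / counts[a]),
            )
        reward = reward_fn(arm, t)
        history.append((arm, reward))
        counts[arm] += 1
        sums[arm] += reward
        if len(history) > window:
            # Forget the oldest observation so stale rewards stop counting.
            old_arm, old_reward = history.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_reward
        choices.append(arm)
    return choices


if __name__ == "__main__":
    # The best arm switches from 0 to 1 halfway through the run.
    switching = lambda arm, t: 1.0 if arm == (0 if t < 500 else 1) else 0.0
    picks = sliding_window_ucb(switching, n_arms=2, horizon=1000, window=100)
    print(sum(p == 0 for p in picks[400:500]), sum(p == 1 for p in picks[-100:]))
```

A shorter window tracks changes faster but keeps fewer samples per arm, which is the trade-off that a variation budget makes precise.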

preprint · 2021 · arXiv

Phase Reduction and Synchronization of Coupled Noisy Oscillators

We study the synchronization behavior of a noisy network in which each system is driven by two sources of state-dependent noise: (1) an intrinsic noise which is common among all systems and can be generated by the environment or any internal fluctuations, and (2) a coupling noise which is generated by interactions with other systems. After providing sufficient conditions that foster synchronization in networks of general noisy systems, we focus on weakly coupled networks of noisy oscillators and, using the first- and second-order phase response curves (PRCs), we derive a reduced order stochastic differential equation to describe the corresponding phase evolutions. Finally, we derive synchronization conditions based on the PRCs and illustrate the theoretical results on a couple of models.

preprint · 2020 · arXiv

An Incremental Approach to Online Dynamic Mode Decomposition for Time-Varying Systems with Applications to EEG Data Modeling

Dynamic Mode Decomposition (DMD) is a data-driven technique to identify low-dimensional linear time-invariant dynamics underlying high-dimensional data. For systems in which such underlying low-dimensional dynamics are time-varying, a time-invariant approximation of such dynamics computed through standard DMD techniques may not be appropriate. We focus on DMD techniques for such time-varying systems and develop incremental algorithms for systems without and with exogenous control inputs. We extend the work in [35] to scenarios in which high-dimensional data are governed by low-dimensional time-varying dynamics. We consider two classes of algorithms that rely on (i) a discount factor on previous observations, and (ii) a sliding window of observations. Our algorithms leverage existing techniques for incremental singular value decomposition, allow us to determine an appropriately reduced model at each time, and are applicable even if the data matrix is singular. We apply the developed algorithms for autonomous systems to Electroencephalographic (EEG) data and demonstrate their effectiveness in terms of reconstruction and prediction. Our algorithms for non-autonomous systems are illustrated using randomly generated linear time-varying systems.

preprint · 2020 · arXiv

Design of Robust Path-Following Control System for Self-driving Vehicles Using Extended High-Gain Observer

In the real world, self-driving vehicles are required to achieve steering maneuvers in both uncontrolled and uncertain environments while maintaining high levels of safety and passenger comfort. Ignoring these requirements would inherently cause a significant degradation in the performance of the control system and, consequently, could lead to life-threatening scenarios. In this paper, we present a robust path-following controller for a self-driving vehicle under mismatched perturbations due to the effect of parametric uncertainties, vehicle side-slip angle, and road banking. In particular, the proposed control framework includes two parts. The first part ensures that the lateral and the yaw dynamics behave with nominal desired dynamics by canceling undesired dynamics. The second part is composed of two extended high-gain observers to estimate the system state variables and the perturbation terms. Our stability analysis of the closed-loop systems confirms exponential stability properties under the proposed control law. To validate the proposed control system, the controller is implemented experimentally on an autonomous vehicle research platform and tested in different road conditions that include flat, inclined, and banked roads. The experimental results show the effectiveness of the controller; they also illustrate the controller's ability to achieve performance on inclined and banked roads comparable to that on flat roads over a range of longitudinal velocities.

preprint · 2020 · arXiv

Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits

We study a distributed decision-making problem in which multiple agents face the same multi-armed bandit (MAB), and each agent makes sequential choices among arms to maximize its own individual reward. The agents cooperate by sharing their estimates over a fixed communication graph. We consider an unconstrained reward model in which two or more agents can choose the same arm and collect independent rewards. We also consider a constrained reward model in which agents that choose the same arm at the same time receive no reward. We design a dynamic, consensus-based, distributed estimation algorithm for cooperative estimation of mean rewards at each arm. We leverage the estimates from this algorithm to develop two distributed algorithms: coop-UCB2 and coop-UCB2-selective-learning, for the unconstrained and constrained reward models, respectively. We show that both algorithms achieve group performance close to the performance of a centralized fusion center. Further, we investigate the influence of the communication graph structure on performance. We propose a novel graph explore-exploit index that predicts the relative performance of groups in terms of the communication graph, and we propose a novel nodal explore-exploit centrality index that predicts the relative performance of agents in terms of the agent locations in the communication graph.

preprint · 2020 · arXiv

Expedited Multi-Target Search with Guaranteed Performance via Multi-fidelity Gaussian Processes

We consider a scenario in which an autonomous vehicle equipped with a downward-facing camera operates in a 3D environment and is tasked with searching for an unknown number of stationary targets on the 2D floor of the environment. The key challenge is to minimize the search time while ensuring a high detection accuracy. We model the sensing field using a multi-fidelity Gaussian process that systematically describes the sensing information available at different altitudes from the floor. Based on the sensing model, we design a novel algorithm called Expedited Multi-Target Search (EMTS) that (i) addresses the coverage-accuracy trade-off: sampling at locations farther from the floor provides a wider field of view but less accurate measurements, (ii) computes an occupancy map of the floor within a prescribed accuracy and quickly eliminates unoccupied regions from the search space, and (iii) travels efficiently to collect the required samples for target detection. We rigorously analyze the algorithm and establish formal guarantees on the target detection accuracy and the expected detection time. We illustrate the algorithm using a simulated multi-target search scenario.

preprint · 2020 · arXiv

Influence Spread in the Heterogeneous Multiplex Linear Threshold Model

The linear threshold model (LTM) has been used to study spread on single-layer networks defined by one inter-agent sensing modality and agents homogeneous in protocol. We define and analyze the heterogeneous multiplex LTM to study spread on multi-layer networks with each layer representing a different sensing modality and agents heterogeneous in protocol. Protocols are designed to distinguish signals from different layers: an agent becomes active if a sufficient number of its neighbors in each of any $a$ of the $m$ layers is active. We focus on Protocol OR, when $a=1$, and Protocol AND, when $a=m$, which model agents that are most and least readily activated, respectively. We develop theory and algorithms to compute the size of the spread at steady state for any set of initially active agents and to analyze the role of distinguished sensing modalities, network structure, and heterogeneity. We show how heterogeneity manages the tension in spreading dynamics between sensitivity to inputs and robustness to disturbances.
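The activation rule stated above (an agent becomes active once enough neighbors are active in each of any $a$ of the $m$ layers) can be sketched as a fixed-point computation. This is an illustration under assumptions, not the paper's algorithm: thresholds are taken to be integer neighbor counts, each layer is an adjacency dictionary, and the names `multiplex_ltm_spread`, `thresholds`, and `required_layers` are hypothetical.

```python
def multiplex_ltm_spread(layers, thresholds, required_layers, seeds):
    """Return the set of agents active at steady state.

    layers:          list of {agent: set of neighbors}, one dict per layer
    thresholds:      {agent: minimum number of active neighbors in a layer}
    required_layers: {agent: a}, the number of layers that must meet the
                     threshold (a = 1 is Protocol OR, a = m is Protocol AND)
    seeds:           initially active agents
    """
    active = set(seeds)
    agents = set(thresholds)
    changed = True
    while changed:
        changed = False
        for v in sorted(agents - active):
            # Count the layers in which v sees enough active neighbors.
            satisfied = sum(
                1
                for layer in layers
                if len(layer.get(v, set()) & active) >= thresholds[v]
            )
            if satisfied >= required_layers[v]:
                active.add(v)   # activation is permanent
                changed = True
    return active


if __name__ == "__main__":
    layer1 = {0: {1}, 1: {0, 2}, 2: {1}}   # path 0 - 1 - 2
    layer2 = {0: {1}, 1: {0}}              # agent 2 is isolated in this layer
    thresholds = {0: 1, 1: 1, 2: 1}
    print(multiplex_ltm_spread([layer1, layer2], thresholds, {0: 1, 1: 1, 2: 1}, {0}))
    print(multiplex_ltm_spread([layer1, layer2], thresholds, {0: 2, 1: 2, 2: 2}, {0}))
```

Because activation is permanent and the rule is monotone in the active set, the order in which agents are checked does not change the final set. In the small example, Protocol OR spreads to all three agents while Protocol AND stops at agent 1, consistent with OR agents being the most readily activated.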

preprint · 2020 · arXiv

SIS Epidemic Model under Mobility on Multi-layer Networks

We study the influence of heterogeneous mobility patterns in a population on the SIS epidemic model. In particular, we consider a patchy environment in which each patch comprises individuals belonging to different classes, e.g., individuals in different socio-economic strata. We model the mobility of individuals of each class across different patches through an associated Continuous Time Markov Chain (CTMC). The topology of these multiple CTMCs constitutes the multi-layer network of mobility. At each time, individuals move in the multi-layer network of spatially-distributed patches according to their CTMC and subsequently interact with the local individuals in the patch according to an SIS epidemic model. We derive a deterministic continuum limit model describing these mobility-epidemic interactions. We establish the existence of a Disease-Free Equilibrium (DFE) and an Endemic Equilibrium (EE) under different parameter regimes and establish their (almost) global asymptotic stability using Lyapunov techniques. We derive simple sufficient conditions that highlight the influence of the multi-layer network on the stability of DFE. Finally, we numerically illustrate that the derived model provides a good approximation to the stochastic model with a finite population and also demonstrate the influence of the multi-layer network structure on the transient performance.

preprint · 2016 · arXiv

A martingale analysis of first passage times of time-dependent Wiener diffusion models

Research in psychology and neuroscience has successfully modeled decision making as a process of noisy evidence accumulation to a decision bound. While there are several variants and implementations of this idea, the majority of these models make use of a noisy accumulation between two absorbing boundaries. A common assumption of these models is that decision parameters, e.g., the rate of accumulation (drift rate), remain fixed over the course of a decision, allowing the derivation of analytic formulas for the probabilities of hitting the upper or lower decision threshold, and the mean decision time. There is reason to believe, however, that many types of behavior would be better described by a model in which the parameters were allowed to vary over the course of the decision process. In this paper, we use martingale theory to derive formulas for the mean decision time, hitting probabilities, and first passage time (FPT) densities of a Wiener process with time-varying drift between two time-varying absorbing boundaries. This model was first studied by Ratcliff (1980) in the two-stage form, and here we consider the same model for an arbitrary number of stages (i.e. intervals of time during which parameters are constant). Our calculations enable direct computation of mean decision times and hitting probabilities for the associated multistage process. We also provide a review of how martingale theory may be used to analyze similar models employing Wiener processes by re-deriving some classical results. In concert with a variety of numerical tools already available, the current derivations should encourage mathematical analysis of more complex models of decision making with time-varying evidence.

preprint · 2016 · arXiv

Explicit moments of decision times for single- and double-threshold drift-diffusion processes

We derive expressions for the first three moments of the decision time (DT) distribution produced via first threshold crossings by sample paths of a drift-diffusion equation. The "pure" and "extended" diffusion processes are widely used to model two-alternative forced choice decisions, and, while simple formulae for accuracy, mean DT and coefficient of variation are readily available, third and higher moments and conditioned moments are not generally available. We provide explicit formulae for these, describe their behaviors as drift rates and starting points approach interesting limits, and, with the support of numerical simulations, discuss how trial-to-trial variability of drift rates, starting points, and non-decision times affect these behaviors in the extended diffusion model. Both unconditioned moments and those conditioned on correct and erroneous responses are treated. We argue that the results will assist in exploring mechanisms of evidence accumulation and in fitting parameters to experimental data.
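For context, the "simple formulae" for accuracy and mean DT mentioned above have a well-known closed form in the pure, unbiased case. The sketch below assumes a drift-diffusion process dx = a dt + sigma dW started at 0 with symmetric thresholds at +z and -z; it does not cover the higher moments or the extended model treated in the paper. The function and parameter names are assumptions.

```python
import math


def ddm_error_rate(drift, threshold, sigma=1.0):
    """Probability of hitting the wrong (-threshold) boundary first.

    Classical result for the pure DDM started midway between thresholds.
    """
    return 1.0 / (1.0 + math.exp(2.0 * drift * threshold / sigma**2))


def ddm_mean_decision_time(drift, threshold, sigma=1.0):
    """Mean first-passage time to either boundary for the pure DDM."""
    if drift == 0.0:
        # Driftless limit: mean exit time of scaled Brownian motion.
        return threshold**2 / sigma**2
    return (threshold / drift) * math.tanh(drift * threshold / sigma**2)


if __name__ == "__main__":
    for z in (0.5, 1.0, 2.0):
        print(z, ddm_error_rate(1.0, z), ddm_mean_decision_time(1.0, z))
```

Raising the threshold lowers the error rate and lengthens the mean decision time, which is the speed-accuracy trade-off in its simplest form.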

preprint · 2016 · arXiv

Satisficing in multi-armed bandit problems

Satisficing is a relaxation of maximizing and allows for less risky decision making in the face of uncertainty. We propose two sets of satisficing objectives for the multi-armed bandit problem, where the objective is to achieve reward-based decision-making performance above a given threshold. We show that these new problems are equivalent to various standard multi-armed bandit problems with maximizing objectives and use the equivalence to find bounds on performance. The different objectives can result in qualitatively different behavior; for example, agents explore their options continually in one case and only a finite number of times in another. For the case of Gaussian rewards we show an additional equivalence between the two sets of satisficing objectives that allows algorithms developed for one set to be applied to the other. We then develop variants of the Upper Credible Limit (UCL) algorithm that solve the problems with satisficing objectives and show that these modified UCL algorithms achieve efficient satisficing performance.

preprint · 2015 · arXiv

Correlated Multiarmed Bandit Problem: Bayesian Algorithms and Regret Analysis

We consider the correlated multiarmed bandit (MAB) problem in which the rewards associated with each arm are modeled by a multivariate Gaussian random variable, and we investigate the influence of the assumptions in the Bayesian prior on the performance of the upper credible limit (UCL) algorithm and a new correlated UCL algorithm. We rigorously characterize the influence of accuracy, confidence, and correlation scale in the prior on the decision-making performance of the algorithms. Our results show how priors and correlation structure can be leveraged to improve performance.

preprint · 2014 · arXiv

Collective Decision-Making in Ideal Networks: The Speed-Accuracy Tradeoff

We study collective decision-making in a model of human groups, with network interactions, performing two alternative choice tasks. We focus on the speed-accuracy tradeoff, i.e., the tradeoff between a quick decision and a reliable decision, for individuals in the network. We model the evidence aggregation process across the network using a coupled drift diffusion model (DDM) and consider the free response paradigm in which individuals take their time to make the decision. We develop reduced DDMs as decoupled approximations to the coupled DDM and characterize their efficiency. We determine high probability bounds on the error rate and the expected decision time for the reduced DDM. We show the effect of the decision-maker's location in the network on their decision-making performance under several threshold selection criteria. Finally, we extend the coupled DDM to the coupled Ornstein-Uhlenbeck model for decision-making in two alternative choice tasks with recency effects, and to the coupled race model for decision-making in multiple alternative choice tasks.

preprint · 2013 · arXiv

Mixed Human-Robot Team Surveillance

We study the mixed human-robot team design in a system theoretic setting using the context of a surveillance mission. The three key coupled components of a mixed team design are (i) policies for the human operator, (ii) policies to account for erroneous human decisions, and (iii) policies to control the automaton. In this paper, we survey elements of human decision-making, including evidence aggregation, situational awareness, fatigue, and memory effects. We bring together the models for these elements in human decision-making to develop a single coherent model for human decision-making in a two-alternative choice task. We utilize the developed model to design efficient attention allocation policies for the human operator. We propose an anomaly detection algorithm that utilizes potentially erroneous decisions by the operator to ascertain an anomalous region among the set of regions surveilled. Finally, we propose a stochastic vehicle routing policy that surveils an anomalous region with high probability. Our mixed team design relies on the certainty-equivalent receding-horizon control framework.

preprint · 2012 · arXiv

Distributed Random Convex Programming via Constraints Consensus

This paper discusses distributed approaches for the solution of random convex programs (RCP). RCPs are convex optimization problems with a (usually large) number N of randomly extracted constraints; they arise in several application areas, especially in the context of decision under uncertainty, see [2],[3]. Here we consider a setup in which instances of the random constraints (the scenario) are not held by a single centralized processing unit, but are distributed among different nodes of a network. Each node "sees" only a small subset of the constraints, and may communicate with neighbors. The objective is to make all nodes converge to the same solution as the centralized RCP problem. To this end, we develop two distributed algorithms that are variants of the constraints consensus algorithm [4],[5]: the active constraints consensus (ACC) algorithm, and the vertex constraints consensus (VCC) algorithm. We show that the ACC algorithm computes the overall optimal solution in finite time, and with almost surely bounded communication at each iteration. The VCC algorithm is instead tailored for the special case in which the constraint functions are convex also w.r.t. the uncertain parameters, and it computes the solution in a number of iterations bounded by the diameter of the communication graph. We further devise a variant of the VCC algorithm, namely quantized vertex constraints consensus (qVCC), to cope with the case in which communication bandwidth among processors is bounded. We discuss several applications of the proposed distributed techniques, including estimation, classification, and random model predictive control, and we present a numerical analysis of the performance of the proposed methods. As a complementary numerical result, we show that the parallel computation of the scenario solution using the ACC algorithm significantly outperforms its centralized equivalent.

preprint · 2012 · arXiv

Stochastic Surveillance Strategies for Spatial Quickest Detection

We design persistent surveillance strategies for the quickest detection of anomalies taking place in an environment of interest. From a set of predefined regions in the environment, a team of autonomous vehicles collects noisy observations, which a control center processes. The overall objective is to minimize detection delay while maintaining the false alarm rate below a desired threshold. We present joint (i) anomaly detection algorithms for the control center and (ii) vehicle routing policies. For the control center, we propose parallel cumulative sum (CUSUM) algorithms (one for each region) to detect anomalies from noisy observations. For the vehicles, we propose a stochastic routing policy, in which the regions to be visited are chosen according to a probability vector. We study stationary routing policies (the probability vector is constant) as well as adaptive routing policies (the probability vector varies in time as a function of the likelihood of regional anomalies). In the context of stationary policies, we design a performance metric and minimize it to design an efficient stationary routing policy. Our adaptive policy improves upon the stationary counterpart by adaptively increasing the selection probability of regions with high likelihood of anomaly. Finally, we show the effectiveness of the proposed algorithms through numerical simulations and a persistent surveillance experiment.
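A single-region CUSUM test, the building block that the control center is described as running once per region, can be sketched as follows. The Gaussian mean-shift observation model and the names `mu0`, `mu1`, `sigma`, and `threshold` are assumptions for illustration; the abstract does not specify the observation model or the routing side, which is omitted here.

```python
def cusum_alarm_time(observations, mu0, mu1, sigma, threshold):
    """Return the index of the first CUSUM alarm, or None if none is raised.

    Assumes observations are N(mu0, sigma^2) before the anomaly and
    N(mu1, sigma^2) after it. The statistic accumulates log-likelihood
    ratios and is clipped at zero.
    """
    stat = 0.0
    for t, y in enumerate(observations):
        # Log-likelihood ratio of "anomaly" versus "nominal" for sample y.
        llr = (mu1 - mu0) / sigma**2 * (y - 0.5 * (mu0 + mu1))
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return t
    return None


if __name__ == "__main__":
    # Noise-free toy data: the mean shifts from 0 to 1 at index 50.
    data = [0.0] * 50 + [1.0] * 50
    print(cusum_alarm_time(data, mu0=0.0, mu1=1.0, sigma=1.0, threshold=3.0))
```

A larger `threshold` reduces false alarms at the cost of a longer detection delay, mirroring the objective stated above of minimizing delay subject to a false alarm constraint.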

preprint · 2010 · arXiv

Task Release Control for Decision Making Queues

We consider the optimal duration allocation in a decision making queue. Decision making tasks arrive at a given rate to a human operator. The correctness of the decision made by the human evolves as a sigmoidal function of the duration allocated to the task. Each task in the queue loses its value continuously. We elucidate this trade-off and determine optimal policies for the human operator. We show that the optimal policy requires the human to drop some tasks. We present a receding horizon optimization strategy, and compare it with the greedy policy.