Source author record

Vikram Krishnamurthy

Vikram Krishnamurthy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

48 works

25 topics

4 close collaborators


preprint 2022 arXiv

Adaptive Filtering Algorithms for Set-Valued Observations -- Symmetric Measurement Approach to Unlabeled and Anonymized Data

Suppose $L$ simultaneous independent stochastic systems generate observations, where the observations from each system depend on the underlying parameter of that system. The observations are unlabeled (anonymized), in the sense that an analyst does not know which observation came from which stochastic system. How can the analyst estimate the underlying parameters of the $L$ systems? Since the anonymized observations at each time are an unordered set of $L$ measurements (rather than a vector), classical stochastic gradient algorithms cannot be directly used. By using symmetric polynomials, we formulate a symmetric measurement equation that maps the observation set to a unique vector. By exploiting the fact that the algebraic ring of multi-variable polynomials is a unique factorization domain over the ring of one-variable polynomials, we construct an adaptive filtering algorithm that yields a statistically consistent estimate of the underlying parameters. We analyze the asymptotic covariance of these estimates to quantify the effect of anonymization. Finally, we characterize the anonymity of the observations in terms of the error probability of the maximum a posteriori Bayesian estimator. Using Blackwell dominance of mean preserving spreads, we construct a partial ordering of the noise densities which relates the anonymity of the observations to the asymptotic covariance of the adaptive filtering algorithm.
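The idea of mapping an unordered observation set to a unique vector can be illustrated with elementary symmetric polynomials. The sketch below assumes scalar observations and uses the elementary symmetric polynomials as one possible choice; the function name `elementary_symmetric` and the toy data are hypothetical, and the paper's actual measurement equation may differ.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(values):
    """Return [e_1, ..., e_L], the elementary symmetric polynomials of `values`.

    e_k is the sum of the products of all k-element subsets, so the result
    does not depend on the order in which the observations are listed.
    """
    n = len(values)
    return [
        sum(prod(subset) for subset in combinations(values, k))
        for k in range(1, n + 1)
    ]


# The same three observations under two different (unknown) labelings.
obs_a = [3, 1, 2]
obs_b = [2, 3, 1]

sym_a = elementary_symmetric(obs_a)
sym_b = elementary_symmetric(obs_b)
```

Both orderings give the same vector, and the original set can be recovered (up to ordering) as the roots of the polynomial whose coefficients are built from these values, which is why the map from set to vector is one-to-one.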

preprint 2022 arXiv

Estimating Exposure to Information on Social Networks

This paper considers the problem of estimating exposure to information in a social network. Given a piece of information (e.g., a URL of a news article on Facebook, a hashtag on Twitter), our aim is to find the fraction of people on the network who have been exposed to it. The exact value of exposure to a piece of information is determined by two features: the structure of the underlying social network and the set of people who shared the piece of information. Often, neither feature is publicly available (i.e., access to them is limited to the internal administrators of the platform) and both are difficult to estimate from data. As a solution, we propose two methods to estimate the exposure to a piece of information in an unbiased manner: a vanilla method based on sampling the network uniformly, and a method that samples the network non-uniformly, motivated by the Friendship Paradox. We provide theoretical results which characterize the conditions (in terms of properties of the network and the piece of information) under which one method outperforms the other. Further, we outline extensions of the proposed methods to dynamic information cascades (where the exposure needs to be tracked in real time). We demonstrate the practical feasibility of the proposed methods via experiments on multiple synthetic and real-world datasets.
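As a rough illustration of the quantity being estimated, the sketch below computes exposure exactly on a toy graph and then approximates it by uniform node sampling (the "vanilla" method). It assumes a person counts as exposed when at least one neighbor shared the item; that definition, the toy graph and all function names are hypothetical and not taken from the paper.

```python
import random


def is_exposed(node, neighbors, sharers):
    """A node is treated as exposed if at least one of its neighbors shared."""
    return any(u in sharers for u in neighbors[node])


def exposure_fraction(neighbors, sharers):
    """Exact exposure: fraction of all nodes that are exposed."""
    exposed = sum(1 for v in neighbors if is_exposed(v, neighbors, sharers))
    return exposed / len(neighbors)


def uniform_estimate(neighbors, sharers, n_samples, seed=0):
    """Vanilla estimate: average the exposure indicator over uniform samples."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    hits = sum(
        1 for _ in range(n_samples)
        if is_exposed(rng.choice(nodes), neighbors, sharers)
    )
    return hits / n_samples


# Toy undirected graph as an adjacency dictionary; only node "a" shared.
neighbors = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}
sharers = {"a"}

exact = exposure_fraction(neighbors, sharers)
estimate = uniform_estimate(neighbors, sharers, n_samples=20000)
```

The uniform estimator is unbiased because each sampled indicator has expectation equal to the exact exposure; a Friendship Paradox based sampler would instead reweight samples drawn through random links.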

preprint 2022 arXiv

Hawkes Process Modeling of Block Arrivals in Bitcoin Blockchain

The paper constructs a multi-variate Hawkes process model of Bitcoin block arrivals and price jumps. Hawkes processes are self-exciting point processes that can capture the self- and cross-excitation effects of block mining and Bitcoin price volatility. We use publicly available blockchain datasets to estimate the model parameters via maximum likelihood estimation. The results show that Bitcoin price volatility boosts the block mining rate and that Bitcoin investment return demonstrates mean reversion. Quantile-Quantile plots show that the proposed Hawkes process model is a better fit to the blockchain datasets than a Poisson process model.
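Self-excitation can be made concrete with the conditional intensity of a Hawkes process. The sketch below is a univariate simplification with an exponential kernel; the kernel choice and the parameter names `mu`, `alpha` and `beta` are assumptions for illustration, whereas the paper's model is multivariate.

```python
import math


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t given past event times.

    mu    : baseline rate (a Poisson process is the special case alpha = 0)
    alpha : jump in intensity caused by each past event
    beta  : exponential decay rate of that jump
    """
    excitation = sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)
    return mu + excitation


# Intensity shortly after a burst of three events versus long after it.
burst = [0.0, 0.1, 0.2]
lam_soon = hawkes_intensity(0.3, burst, mu=0.2, alpha=0.5, beta=1.0)
lam_late = hawkes_intensity(20.0, burst, mu=0.2, alpha=0.5, beta=1.0)
```

Right after the burst the intensity sits well above the baseline, and it relaxes back toward `mu` as the excitation decays.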

preprint 2022 arXiv

Inverse-Inverse Reinforcement Learning. How to Hide Strategy from an Adversarial Inverse Reinforcement Learner

Inverse reinforcement learning (IRL) deals with estimating an agent's utility function from its actions. In this paper, we consider how an agent can hide its strategy and mitigate an adversarial IRL attack; we call this inverse IRL (I-IRL). How should the decision maker choose its response to ensure a poor reconstruction of its strategy by an adversary performing IRL to estimate the agent's strategy? This paper comprises four results. First, we present an adversarial IRL algorithm that estimates the agent's strategy while controlling the agent's utility function. Our second result, for I-IRL, spoofs the IRL algorithm used by the adversary. Our I-IRL results are based on revealed preference theory in micro-economics. The key idea is for the agent to deliberately choose sub-optimal responses that sufficiently mask its true strategy. Third, we give a sample complexity result for our main I-IRL result when the agent has noisy estimates of the adversary-specified utility function. Finally, we illustrate our I-IRL scheme in a radar problem where a meta-cognitive radar is trying to mitigate an adversarial target.

preprint 2022 arXiv

Lyapunov based Stochastic Stability of a Quantum Decision System for Human-Machine Interaction

In mathematical psychology, decision makers are modeled using the Lindbladian equations from quantum mechanics to capture important human-centric features such as order effects and violation of the sure-thing principle. We consider human-machine interaction involving a quantum decision maker (human) and a controller (machine). Given a sequence of human decisions over time, how can the controller dynamically provide input messages to adapt these decisions so as to converge to a specific decision? We show via novel stochastic Lyapunov arguments how the Lindbladian dynamics of the quantum decision maker can be controlled to converge to a specific decision asymptotically. Our methodology yields a useful mathematical framework for human-sensor decision making. The stochastic Lyapunov results are also of independent interest as they generalize recent results in the literature.

preprint 2022 arXiv

Lyapunov based Stochastic Stability of Human-Machine Interaction: A Quantum Decision System Approach

preprint 2022 arXiv

Meta-Cognition. An Inverse-Inverse Reinforcement Learning Approach for Cognitive Radars

This paper considers meta-cognitive radars in an adversarial setting. A cognitive radar optimally adapts its waveform (response) in response to maneuvers (probes) of a possibly adversarial moving target. A meta-cognitive radar is aware of the adversarial nature of the target and seeks to mitigate the adversarial target. How should the meta-cognitive radar choose its responses to sufficiently confuse the adversary trying to estimate the radar's utility function? This paper abstracts the radar's meta-cognition problem in terms of the spectra (eigenvalues) of the state and observation noise covariance matrices, and embeds the algebraic Riccati equation into an economics-based utility maximization setup. The adversarial target is an inverse reinforcement learner. By observing a noisy sequence of the radar's responses (waveforms), the adversarial target uses a statistical hypothesis test to detect if the radar is a utility maximizer. In turn, the meta-cognitive radar deliberately chooses sub-optimal responses that increase the Type-I error probability of the adversary's detector. We call this counter-adversarial step taken by the meta-cognitive radar inverse inverse reinforcement learning (I-IRL). We illustrate the meta-cognition results of this paper via simple numerical examples. Our approach for meta-cognition in this paper is based on revealed preference theory in micro-economics and inspired by results in differential privacy and adversarial obfuscation in machine learning.

preprint 2022 arXiv

Quickest Detection for Human-Sensor Systems using Quantum Decision Theory

In mathematical psychology, recent models for human decision-making use Quantum Decision Theory to capture important human-centric features such as order effects and violation of the sure-thing principle (total probability law). We construct and analyze a human-sensor system where a quickest detector aims to detect a change in an underlying state by observing human decisions that are influenced by the state. Apart from providing an analytical framework for such human-sensor systems, we also analyze the structure of the quickest detection policy. We show that the quickest detection policy has a single threshold and the optimal cost incurred is lower bounded by that of the classical quickest detector. This indicates that intermediate human decisions strictly hinder detection performance. We also analyze the sensitivity of the quickest detection cost with respect to the quantum decision parameters of the human decision maker, revealing that the performance is robust to inaccurate knowledge of the decision-making process. Numerical results are provided which suggest that observing the decisions of more rational decision makers will improve the quickest detection performance. Finally, we illustrate a numerical implementation of this quickest detector in the context of the Prisoner's Dilemma problem, in which it has been observed that Quantum Decision Theory can uniquely model empirically tested violations of the sure-thing principle.
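For reference, the classical Bayesian quickest detector that the abstract uses as a benchmark can be sketched as a posterior recursion with a single threshold. The Gaussian observation model, the geometric change-time prior `rho` and the threshold value are illustrative assumptions, and the sketch observes the state-dependent signal directly rather than through quantum-decision-theoretic human decisions.

```python
import math


def gaussian_pdf(y, mean, std=1.0):
    """Density of a normal distribution evaluated at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_update(pi, y, rho, mean0=0.0, mean1=1.0):
    """One step of the change-point posterior P(change has occurred | data).

    rho is the per-step probability that the change happens (geometric prior).
    """
    prior = pi + (1.0 - pi) * rho            # prediction step
    num = prior * gaussian_pdf(y, mean1)     # post-change likelihood
    den = num + (1.0 - prior) * gaussian_pdf(y, mean0)
    return num / den


def stopping_time(observations, rho=0.05, threshold=0.9):
    """Declare a change the first time the posterior crosses the threshold."""
    pi = 0.0
    for k, y in enumerate(observations, start=1):
        pi = posterior_update(pi, y, rho)
        if pi >= threshold:
            return k
    return None


# Five pre-change observations followed by clearly shifted observations.
observations = [0.0] * 5 + [2.0] * 10
tau = stopping_time(observations)
```

With these inputs the posterior stays low during the first five samples and crosses the threshold a few samples after the shift, which is the single-threshold behavior described in the abstract.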

preprint 2021 arXiv

Langevin Dynamics for Adaptive Inverse Reinforcement Learning of Stochastic Gradient Algorithms

Inverse reinforcement learning (IRL) aims to estimate the reward function of optimizing agents by observing their response (estimates or actions). This paper considers IRL when noisy estimates of the gradient of a reward function generated by multiple stochastic gradient agents are observed. We present a generalized Langevin dynamics algorithm to estimate the reward function $R(\theta)$; specifically, the resulting Langevin algorithm asymptotically generates samples from the distribution proportional to $\exp(R(\theta))$. The proposed IRL algorithms use kernel-based passive learning schemes. We also construct multi-kernel passive Langevin algorithms for IRL which are suitable for high dimensional data. The performance of the proposed IRL algorithms is illustrated on examples in adaptive Bayesian learning, logistic regression (high dimensional problem) and constrained Markov decision processes. We prove weak convergence of the proposed IRL algorithms using martingale averaging methods. We also analyze the tracking performance of the IRL algorithms in non-stationary environments where the utility function $R(\theta)$ undergoes jump changes over time according to a slow Markov chain.
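The claim that a Langevin algorithm generates samples from a density proportional to $\exp(R(\theta))$ can be checked numerically with the classical unadjusted Langevin iteration. This is a minimal sketch that assumes direct access to the gradient of $R$; it is not the passive, kernel-based algorithm of the paper, and the quadratic reward and all names are hypothetical.

```python
import math
import random


def langevin_samples(grad_R, theta0, step, n_steps, seed=0):
    """Unadjusted Langevin iteration targeting a density proportional to exp(R)."""
    rng = random.Random(seed)
    theta = theta0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient ascent on R plus injected Gaussian noise of variance 2 * step.
        theta = theta + step * grad_R(theta) + math.sqrt(2.0 * step) * noise
        samples.append(theta)
    return samples


# Hypothetical reward R(theta) = -(theta - 1)^2 / 2, so exp(R) is a unit-variance
# Gaussian centered at 1.
def grad_R(theta):
    return -(theta - 1.0)


samples = langevin_samples(grad_R, theta0=0.0, step=0.05, n_steps=50000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The empirical mean and variance land near 1, up to Monte Carlo error and the small discretization bias of the fixed step size.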

preprint 2021 arXiv

Maximum Likelihood Estimation of Power-law Degree Distributions via Friendship Paradox based Sampling

This paper considers the problem of estimating a power-law degree distribution of an undirected network using sampled data. Although power-law degree distributions are ubiquitous in nature, the widely used parametric methods for estimating them (e.g. linear regression on double-logarithmic axes, maximum likelihood estimation with uniformly sampled nodes) suffer from the large variance introduced by the lack of data points from the tail portion of the power-law degree distribution. As a solution, we present a novel maximum likelihood estimation approach that exploits the friendship paradox to sample more efficiently from the tail of the degree distribution. We analytically show that the proposed method results in a smaller bias, variance and Cramér-Rao lower bound compared to the vanilla maximum-likelihood estimate obtained with uniformly sampled nodes (which is the most commonly used method in the literature). Detailed numerical and empirical results are presented to illustrate the performance of the proposed method under different conditions and how it compares with alternative methods. We also show that the proposed method and its desirable properties (i.e. smaller bias, variance and Cramér-Rao lower bound compared to the vanilla method based on uniform samples) extend to parametric degree distributions other than the power-law, such as exponential degree distributions. All the numerical and empirical results are reproducible and the code is publicly available on GitHub.
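Why friendship paradox based sampling reaches the tail of the degree distribution can be seen from a small calculation. The sketch below assumes a "random friend" means a uniformly chosen end of a uniformly chosen link, so a node is drawn with probability proportional to its degree; the toy star graph and function names are hypothetical.

```python
def mean_degree(neighbors):
    """Expected degree of a uniformly sampled node."""
    return sum(len(nbrs) for nbrs in neighbors.values()) / len(neighbors)


def mean_friend_degree(neighbors):
    """Expected degree of a random end of a random link.

    Each node appears as a link end once per incident link, so it is sampled
    with probability proportional to its degree; the mean is E[d^2] / E[d].
    """
    degrees = [len(nbrs) for nbrs in neighbors.values()]
    return sum(d * d for d in degrees) / sum(degrees)


# Star graph: one hub connected to four leaves.
star = {
    "hub": {"l1", "l2", "l3", "l4"},
    "l1": {"hub"},
    "l2": {"hub"},
    "l3": {"hub"},
    "l4": {"hub"},
}

uniform_mean = mean_degree(star)
friend_mean = mean_friend_degree(star)
```

The friend-based mean is never smaller than the uniform mean, so high-degree (tail) nodes are over-represented in the friend samples.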

preprint 2021 arXiv

Multi-kernel Passive Stochastic Gradient Algorithms and Transfer Learning

This paper develops a novel passive stochastic gradient algorithm. In passive stochastic approximation, the stochastic gradient algorithm does not have control over the location where noisy gradients of the cost function are evaluated. Classical passive stochastic gradient algorithms use a kernel that approximates a Dirac delta to weigh the gradients based on how far they are evaluated from the desired point. In this paper we construct a multi-kernel passive stochastic gradient algorithm. The algorithm performs substantially better in high dimensional problems and incorporates variance reduction. We analyze the weak convergence of the multi-kernel algorithm and its rate of convergence. In numerical examples, we study the multi-kernel version of the passive least mean squares (LMS) algorithm for transfer learning to compare the performance with the classical passive version.
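The classical single-kernel passive scheme described above can be sketched as follows. Nature picks the evaluation points, and the update weighs each noisy gradient by a kernel of the distance between that point and the current estimate. The quadratic cost, the uniform sampling of evaluation points, the Gaussian kernel and every parameter value are assumptions for illustration; the multi-kernel algorithm of the paper is not reproduced here.

```python
import math
import random


def gaussian_kernel(u):
    """Smooth kernel that approximates a Dirac delta as the bandwidth shrinks."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def passive_sgd(n_steps=5000, step=0.1, bandwidth=0.5, theta0=0.0, seed=1):
    """Passive stochastic gradient for the cost C(theta) = (theta - 2)^2 / 2."""
    rng = random.Random(seed)
    theta = theta0
    tail = []
    for k in range(n_steps):
        x = rng.uniform(-5.0, 9.0)               # point chosen by nature, not by us
        grad = (x - 2.0) + rng.gauss(0.0, 0.1)   # noisy gradient evaluated at x
        # Gradients evaluated far from the current estimate get almost no weight.
        weight = gaussian_kernel((x - theta) / bandwidth) / bandwidth
        theta -= step * weight * grad
        if k >= n_steps - 1000:
            tail.append(theta)
    return sum(tail) / len(tail)                 # average of the last iterates


estimate = passive_sgd()
```

The averaged estimate settles near the minimizer at 2, even though no gradient was ever requested at a point of the algorithm's choosing.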

preprint 2020 arXiv

A Markov Decision Process Approach to Active Meta Learning

In supervised learning, we fit a single statistical model to a given data set, assuming that the data is associated with a single task, which yields well-tuned models for specific use, but does not adapt well to new contexts. By contrast, in meta-learning, the data is associated with numerous tasks, and we seek a model that may perform well on all tasks simultaneously, in pursuit of greater generalization. One challenge in meta-learning is how to exploit relationships between tasks and classes, which is overlooked by commonly used random or cyclic passes through data. In this work, we propose actively selecting samples on which to train by discerning covariates inside and between meta-training sets. Specifically, we cast the problem of selecting a sample from a number of meta-training sets as either a multi-armed bandit or a Markov Decision Process (MDP), depending on how one encapsulates correlation across tasks. We develop scheduling schemes based on Upper Confidence Bound (UCB), Gittins Index and tabular Markov Decision Problems (MDPs) solved with linear programming, where the reward is the scaled statistical accuracy to ensure it is a time-invariant function of state and action. Across a variety of experimental contexts, we observe significant reductions in the sample complexity of the active selection scheme relative to cyclic or i.i.d. sampling, demonstrating the merit of exploiting covariates in practice.

preprint 2020 arXiv

Adversarial Radar Inference. From Inverse Tracking to Inverse Reinforcement Learning of Cognitive Radar

Cognitive sensing refers to a reconfigurable sensor that dynamically adapts its sensing mechanism by using stochastic control to optimize its sensing resources. For example, cognitive radars are sophisticated dynamical systems; they use stochastic control to sense the environment, learn from it relevant information about the target and background, then adapt the radar sensor to satisfy the needs of their mission. The last two decades have witnessed intense research in cognitive/adaptive radars. This paper addresses the next logical step, namely inverse cognitive sensing. By observing the emissions of a sensor (e.g. radar or in general a controlled stochastic dynamical system) in real time, how can we detect if the sensor is cognitive (rational utility maximizer) and how can we predict its future actions? The scientific challenges involve extending Bayesian filtering, inverse reinforcement learning and stochastic optimization of dynamical systems to a data-driven adversarial setting. Our methodology transcends classical statistical signal processing (sensing and estimation/detection theory) to address the deeper issue of how to infer strategy from sensing. The generative models, adversarial inference algorithms and associated mathematical analysis will lead to advances in understanding how sophisticated adaptive sensors such as cognitive radars operate.

preprint 2020 arXiv

Controlled Sequential Information Fusion with Social Sensors

A sequence of social sensors estimates an unknown parameter (modeled as a state of nature) by performing Bayesian Social Learning, and myopically optimizes individual reward functions. The decisions of the social sensors contain quantized information about the underlying state. How should a fusion center dynamically incentivize the social sensors for acquiring information about the underlying state? This paper presents five results. First, sufficient conditions on the model parameters are provided under which the optimal policy for the fusion center has a threshold structure. The optimal policy is determined in closed form, and is such that it switches between two exactly specified incentive policies at the threshold. Second, it is shown that the optimal incentive sequence is a sub-martingale, i.e., the optimal incentives increase on average over time. Third, it is shown that it is possible for the fusion center to learn the true state asymptotically by employing a sub-optimal policy; in other words, controlled information fusion with social sensors can be consistent. Fourth, uniform bounds on the average additional cost incurred by the fusion center for employing a sub-optimal policy are provided. This characterizes the trade-off between the cost of information acquisition and consistency for the fusion center. Finally, when it is sufficient to estimate the state with a degree of confidence, uniform bounds on the budget saved by employing policies that guarantee state estimation in finite time are provided.

preprint 2020 arXiv

Policy Gradient using Weak Derivatives for Reinforcement Learning

This paper considers policy search in continuous state-action reinforcement learning problems. Typically, one computes search directions using a classic expression for the policy gradient called the Policy Gradient Theorem, which decomposes the gradient of the value function into two factors: the score function and the Q-function. This paper presents four results: (i) an alternative policy gradient theorem using weak (measure-valued) derivatives instead of the score function is established; (ii) the stochastic gradient estimates thus derived are shown to be unbiased and to yield algorithms that converge almost surely to stationary points of the non-convex value function of the reinforcement learning problem; (iii) the sample complexity of the algorithm is derived and is shown to be $O(1/\sqrt{k})$; (iv) finally, the expected variance of the gradient estimates obtained using weak derivatives is shown to be lower than that obtained using the popular score-function approach. Experiments on the OpenAI Gym pendulum environment show superior performance of the proposed algorithm.
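For orientation, the two gradient representations contrasted in this abstract can be written side by side. The first line is the standard textbook score-function form; the second sketches the kind of decomposition of the policy's derivative into a difference of two probability measures on which measure-valued (weak) derivatives rely. The symbols $J$, $\rho^{\pi_\theta}$, $Q^{\pi_\theta}$, $g_\theta$, $\pi_\theta^{\oplus}$ and $\pi_\theta^{\ominus}$ are assumed notation for illustration and are not taken from the paper.

```latex
% Score-function form: gradient = E[score x Q-function]
\nabla_\theta J(\theta) \;\propto\;
  \mathbb{E}_{s \sim \rho^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
  \big[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \big]

% Weak (measure-valued) derivative: the signed measure is split into a
% scaled difference of two probability measures
\nabla_\theta \pi_\theta(\cdot \mid s) \;=\;
  g_\theta(s) \big[ \pi_\theta^{\oplus}(\cdot \mid s) - \pi_\theta^{\ominus}(\cdot \mid s) \big]
```

Under the second representation the gradient becomes a scaled difference of Q-values evaluated at actions drawn from the two measures, which avoids multiplying by the score function.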

preprint 2020 arXiv

Quickest Change Detection of Time Inconsistent Anticipatory Agents. Human-Sensor and Cyber-Physical Systems

In behavioral economics, human decision makers are modeled as anticipatory agents that make decisions by taking into account the probability of future decisions (plans). We consider cyber-physical systems involving the interaction between anticipatory agents and statistical detection. A sensing device records the decisions of an anticipatory agent. Given these decisions, how can the sensing device achieve quickest detection of a change in the anticipatory system? From a decision theoretic point of view, anticipatory models are time inconsistent, meaning that Bellman's principle of optimality does not hold. The appropriate formalism is the subgame Nash equilibrium. We show that the interaction between anticipatory agents and sequential quickest detection results in an unusual (nonconvex) structure of the quickest change detection policy. Our methodology yields a useful framework for situation awareness systems and anticipatory human decision makers interacting with sequential detectors.

preprint 2020 arXiv

Rationally Inattentive Inverse Reinforcement Learning Explains YouTube Commenting Behavior

We consider a novel application of inverse reinforcement learning with behavioral economics constraints to model, learn and predict the commenting behavior of YouTube viewers. Each group of users is modeled as a rationally inattentive Bayesian agent which solves a contextual bandit problem. Our methodology integrates three key components. First, to identify distinct commenting patterns, we use deep embedded clustering to estimate framing information (essential extrinsic features) that clusters users into distinct groups. Second, we present an inverse reinforcement learning algorithm that uses Bayesian revealed preferences to test for rationality: does there exist a utility function that rationalizes the given data, and if yes, can it be used to predict commenting behavior? Finally, we impose behavioral economics constraints stemming from rational inattention to characterize the attention span of groups of users. The test imposes a Rényi mutual information cost constraint which impacts how the agent can select attention strategies to maximize their expected utility. After a careful analysis of a massive YouTube dataset, our surprising result is that in most YouTube user groups, the commenting behavior is consistent with optimizing a Bayesian utility with rationally inattentive constraints. The paper also highlights how the rational inattention model can accurately predict commenting behavior. The massive YouTube dataset and analysis used in this paper are available on GitHub and completely reproducible.

preprint 2019 arXiv

Friendship Paradox Biases Perceptions in Directed Networks

How popular a topic or an opinion appears to be in a network can be very different from its actual popularity. For example, in an online network of a social media platform, the number of people who mention a topic in their posts (i.e., its global popularity) can be dramatically different from how people see it in their social feeds (i.e., its perceived popularity), where the feeds aggregate their friends' posts. We trace the origin of this discrepancy to the friendship paradox in directed networks, which states that people are less popular than their friends (or followers) are, on average. We identify conditions on network structure that give rise to this perception bias, and validate the findings empirically using data from Twitter. Within messages posted by Twitter users in our sample, we identify topics that appear more frequently within the users' social feeds than they do globally, i.e., among all posts. In addition, we present a polling algorithm that leverages the friendship paradox to obtain a statistically efficient estimate of a topic's global prevalence from biased perceptions of individuals. We characterize the bias of the polling estimate, provide an upper bound for its variance, and validate the algorithm's efficiency through synthetic polling experiments on our Twitter data. Our paper elucidates the non-intuitive ways in which the structure of directed networks can distort social perceptions and resulting behaviors.

preprint 2019 arXiv

How to Calibrate your Adversary's Capabilities? Inverse Filtering for Counter-Autonomous Systems

We consider an adversarial Bayesian signal processing problem involving "us" and an "adversary". The adversary observes our state in noise, updates its posterior distribution of the state, and then chooses an action based on this posterior. Given knowledge of "our" state and the sequence of the adversary's actions observed in noise, we consider three problems: (i) How can the adversary's posterior distribution be estimated? Estimating the posterior is an inverse filtering problem involving a random measure; we formulate and solve several versions of this problem in a Bayesian setting. (ii) How can the adversary's observation likelihood be estimated? This tells us how accurate the adversary's sensors are. We compute the maximum likelihood estimator for the adversary's observation likelihood given our measurements of the adversary's actions, where the adversary's actions are in response to estimating our state. (iii) How can the state be chosen by us to minimize the covariance of the estimate of the adversary's observation likelihood? "Our" state can be viewed as a probe signal which causes the adversary to act, so choosing the optimal state sequence is an input design problem. The above questions are motivated by the design of counter-autonomous systems: given measurements of the actions of a sophisticated autonomous adversary, how can our counter-autonomous system estimate the underlying belief of the adversary, predict future actions and therefore guard against these actions?

preprint 2017 arXiv

POMDP Structural Results for Controlled Sensing

This article provides a short review of some structural results in controlled sensing when the problem is formulated as a partially observed Markov decision process. In particular, monotone value functions, Blackwell dominance and quickest detection are described.

preprint 2016 arXiv

A Comprehensive Methodology for Volt-VAR Optimization in Active Smart Grids

This paper considers the problem of Volt-VAR Optimization (VVO) in active smart grids. Active smart grids are equipped with distributed generators, distributed storage systems, and tie-line switches that allow for topological reconfiguration. In this paper, the joint operation of the remotely controllable switches, storage units, under load tap changers, and shunt capacitors is formulated for a day-ahead operation scheme. The proposed VVO problem aims at minimizing the expected active power loss in the system and is formulated as a mixed-integer quadratic program. The stochasticity of the wind power generation is addressed through a first-order Markov chain model. Numerical results on a 33-node, 12.66 kV active smart grid using real data from smart meters and wind turbines are presented and compared for various test cases.

preprint 2016 arXiv

Engagement dynamics and sensitivity analysis of YouTube videos

YouTube, with millions of content creators, has become the preferred destination for watching videos online. Through the Partner program, YouTube allows content creators to monetize their popular videos. Of significant importance for content creators is which meta-level features (e.g. title, tag, thumbnail) are most sensitive for promoting video popularity. The popularity of videos also depends on the social dynamics, i.e. the interaction of the content creators (or channels) with YouTube users. Using real-world data consisting of about 6 million videos spread over 25 thousand channels, we empirically examine the sensitivity of YouTube meta-level features and social dynamics. The key meta-level features that impact the view counts of a video include first day view count, number of subscribers, contrast of the video thumbnail, Google hits, number of keywords, video category, title length, and number of upper-case letters in the title, respectively; we illustrate that these meta-level features can be used to estimate the popularity of a video. In addition, optimizing the meta-level features after a video is posted increases the popularity of videos. In the context of social dynamics, we discover that there is a causal relationship between views to a channel and the associated number of subscribers. Additionally, insights into the effects of scheduling and video playthrough in a channel are also provided. Our findings provide a useful understanding of user engagement in YouTube.

preprint 2016 arXiv

Filterbased Stochastic Volatility in Continuous-Time Hidden Markov Models

Regime-switching models, in particular Hidden Markov Models (HMMs) where the switching is driven by an unobservable Markov chain, are widely used in financial applications, due to their tractability and good econometric properties. In this work we consider HMMs in continuous time with both constant and switching volatility. In the continuous-time model with switching volatility the underlying Markov chain could be observed due to this stochastic volatility, and no estimation (filtering) of it is needed (in theory), while in the discretized model or the model with constant volatility one has to filter for the underlying Markov chain. The motivations for continuous-time models are explicit computations in finance. To have a realistic model with an unobservable Markov chain in continuous time and good econometric properties, we introduce a regime-switching model where the volatility depends on the filter for the underlying chain and state the filtering equations. We prove an approximation result for a fixed information filtration and further motivate the model by considering social learning arguments. We analyze its relation to the switching volatility model and present a convergence result for the discretized model. We then illustrate its econometric properties by considering numerical simulations.

preprint 2016 arXiv

Opportunistic Advertisement Scheduling in Live Social Media: A Multiple Stopping Time POMDP Approach

Live online social broadcasting services like YouTube Live and Twitch have steadily gained popularity due to improved bandwidth, ease of generating content and the ability to earn revenue on the generated content. In contrast to traditional cable television, revenue in online services is generated solely through advertisements, and depends on the number of clicks generated. Channel owners aim to opportunistically schedule advertisements so as to generate maximum revenue. This paper considers the problem of optimal scheduling of advertisements in live online social media. The problem is formulated as a multiple stopping problem and is addressed in a partially observed Markov decision process (POMDP) framework. Structural results are provided on the optimal advertisement scheduling policy. By exploiting the structure of the optimal policy, best linear thresholds are computed using stochastic approximation. The proposed model and framework are validated on real datasets, and the following observations are made: (i) the policy obtained by the multiple stopping problem can be used to detect changes in ground truth from online search data; (ii) numerical results show a significant improvement in the expected revenue by opportunistically scheduling the advertisements. The revenue can be improved by $20$-$30\%$ in comparison to the currently employed periodic scheduling.

preprint 2016 arXiv

Partially Observed Markov Decision Processes. Problem Sets and Internet Supplement

This document is an internet supplement to my book "Partially Observed Markov Decision Processes - From Filtering to Controlled Sensing" published by Cambridge University Press in 2016. This internet supplement contains exercises, examples and case studies. The material appears in this internet supplement (instead of the book) so that it can be updated. This document will evolve over time and further discussion and examples will be added. This internet supplement document is work in progress and will be updated periodically. I welcome constructive comments from readers of the book and this internet supplement.

preprint 2016 arXiv

Tracking Infection Diffusion in Social Networks: Filtering Algorithms and Threshold Bounds

This paper deals with statistical signal processing over graphs for tracking infection diffusion in social networks. Infection (or information) diffusion is modeled using the Susceptible-Infected-Susceptible (SIS) model. Mean field approximation is employed to approximate the discrete valued infected degree distribution evolution by a deterministic ordinary differential equation for obtaining a generative model for the infection diffusion. The infected degree distribution is shown to follow polynomial dynamics and is estimated using an exact non-linear Bayesian filter. We compute posterior Cramer-Rao bounds to obtain the fundamental limits of the filter which depend on the structure of the network. Considering the time-varying nature of real world networks, the relationship between the diffusion thresholds and the degree distribution is investigated using generative models for real world networks. In addition, we validate the efficacy of our method with the diffusion data from a real-world online social system, Twitter. We find that the SIS model is a good fit for the information diffusion and the non-linear filter effectively tracks the information diffusion.
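As a rough illustration of the mean-field idea mentioned in this abstract, the sketch below integrates the textbook homogeneous SIS equation d(rho)/dt = beta*k*rho*(1 - rho) - delta*rho with Euler steps. It is a simplification that ignores the degree-dependent dynamics and the Bayesian filter studied in the paper; the function name and the parameters `beta`, `delta` and `k` are assumptions chosen for illustration only.

```python
def sis_mean_field(rho0, beta, delta, k, dt=0.01, steps=20000):
    """Euler-integrate d(rho)/dt = beta*k*rho*(1 - rho) - delta*rho.

    rho   : fraction of infected nodes (hypothetical homogeneous network)
    beta  : infection rate per contact (assumed)
    delta : recovery rate (assumed)
    k     : common node degree (assumed)
    """
    rho = rho0
    for _ in range(steps):
        rho += dt * (beta * k * rho * (1.0 - rho) - delta * rho)
    return rho


# Above the diffusion threshold (beta*k > delta) the infection persists
# near 1 - delta/(beta*k); below the threshold it dies out.
rho_above = sis_mean_field(0.01, beta=0.2, delta=0.5, k=5)
rho_below = sis_mean_field(0.5, beta=0.05, delta=0.5, k=5)
```

In this simplified model the threshold is the point where beta*k equals delta, which gives a one-line example of how a diffusion threshold depends on network degree.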

preprint2015arXiv

Myopic Bounds for Optimal Policy of POMDPs: An extension of Lovejoy's structural results

This paper provides a relaxation of the sufficient conditions, and also an extension of the structural results for Partially Observed Markov Decision Processes (POMDPs) given in Lovejoy (1987). Sufficient conditions are provided so that the optimal policy can be upper and lower bounded by judiciously chosen myopic policies. These myopic policy bounds are constructed to maximize the volume of belief states where they coincide with the optimal policy. Numerical examples illustrate these myopic bounds for both continuous and discrete observation sets.

preprint2015arXiv

Online Reputation and Polling Systems: Data Incest, Social Learning and Revealed Preferences

This paper considers online reputation and polling systems where individuals make recommendations based on their private observations and recommendations of friends. Such interaction of individuals and their social influence is modelled as social learning on a directed acyclic graph. Data incest (misinformation propagation) occurs due to unintentional re-use of identical actions in the formation of public belief in social learning; the information gathered by each agent is mistakenly considered to be independent. This results in overconfidence and bias in estimates of the state. Necessary and sufficient conditions are given on the structure of information exchange graph to mitigate data incest. Incest removal algorithms are presented. Experimental results on human subjects are presented to illustrate the effect of social influence and data incest on decision making. These experimental results indicate that social learning protocols require careful design to handle and mitigate data incest. The incest removal algorithms are illustrated in an expectation polling system where participants in a poll respond with a summary of their friends' beliefs. Finally, the principle of revealed preferences arising in micro-economics theory is used to parse Twitter datasets to determine if social sensors are utility maximizers and then determine their utility functions.

preprint2015arXiv

Sequential Detection of Market shocks using Risk-averse Agent Based Models

This paper considers a statistical signal processing problem involving agent based models of financial markets which at a micro-level are driven by socially aware and risk-averse trading agents. These agents trade (buy or sell) stocks by exploiting information about the decisions of previous agents (social learning) via an order book in addition to a private (noisy) signal they receive on the value of the stock. We are interested in the following: (1) Modelling the dynamics of these risk averse agents, (2) Sequential detection of a market shock based on the behaviour of these agents. Structural results which characterize social learning under a risk measure, CVaR (Conditional Value-at-risk), are presented and formulation of the Bayesian change point detection problem is provided. The structural results exhibit two interesting properties: (i) risk averse agents herd more often than risk neutral agents; (ii) the stopping set in the sequential detection problem is non-convex. The framework is validated on data from the Yahoo! Tech Buzz game dataset.

preprint2015arXiv

Structural Results for Partially Observed Markov Decision Processes

This article provides an introductory tutorial on structural results in partially observed Markov decision processes (POMDPs). Typically, computing the optimal policy of a POMDP is computationally intractable. We use lattice programming methods to characterize the structure of the optimal policy of a POMDP without brute force computations.

preprint2014arXiv

Adaptive Search Algorithms for Discrete Stochastic Optimization: A Smooth Best-Response Approach

This paper considers simulation-based optimization of the performance of a regime-switching stochastic system over a finite set of feasible configurations. Inspired by the stochastic fictitious play learning rules in game theory, we propose an adaptive simulation-based search algorithm that uses a smooth best-response sampling strategy and tracks the set of global optima, yet distributes the search so that most of the effort is spent on simulating the system performance at the global optima. The algorithm converges weakly to the set of global optima even when the observation data is correlated (as long as a weak law of large numbers holds). Numerical examples show that the proposed scheme yields a faster convergence for finite sample lengths compared with several existing random search and pure exploration methods in the literature.

preprint2014arXiv

Boundary value problems in consensus networks

This paper studies the effect of boundary value conditions on consensus networks. Consider a network where some nodes keep their estimates constant while other nodes average their estimates with that of their neighbors. We analyze such networks and show that in contrast to standard consensus networks, the network estimate converges to a general harmonic function on the graph. Furthermore, the final value depends only on the value at the boundary nodes. This has important implications in consensus networks -- for example, we show that consensus networks are extremely sensitive to the existence of a single malicious node or consistent errors in a single node. We also discuss applications of this result in social and sensor networks. We investigate the existence of boundary nodes in human social networks via an experimental study involving human subjects. Finally, the paper is concluded with the numerical studies of the boundary value problems in consensus networks.
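The claim that the final estimate depends only on the boundary values can be illustrated with a small sketch. It assumes a five-node path graph whose two end nodes hold fixed values while interior nodes repeatedly average their own estimate with their neighbors' estimates; the graph, the function name and the iteration count are hypothetical choices, not taken from the paper.

```python
def boundary_consensus(neighbors, boundary, init=0.0, iters=2000):
    """Neighbor averaging where nodes listed in `boundary` never update."""
    x = {v: boundary.get(v, init) for v in neighbors}
    for _ in range(iters):
        new_x = {}
        for v, nbrs in neighbors.items():
            if v in boundary:
                new_x[v] = boundary[v]  # boundary node keeps its value
            else:
                # interior node averages itself with its neighbors
                new_x[v] = (x[v] + sum(x[u] for u in nbrs)) / (len(nbrs) + 1)
        x = new_x
    return x


# Hypothetical path graph 0 - 1 - 2 - 3 - 4 with the two ends held fixed.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
from_zero = boundary_consensus(path, {0: 0.0, 4: 1.0}, init=0.0)
from_five = boundary_consensus(path, {0: 0.0, 4: 1.0}, init=5.0)
```

Both runs settle on the same linear profile between the two boundary values regardless of how the interior nodes were initialized, which is the discrete harmonic function for this graph. It also suggests why a single node that never updates can steer the whole network.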

preprint2014arXiv

Interactive Sensing and Decision Making in Social Networks

The proliferation of social media such as real time microblogging and online reputation systems facilitates real time sensing of social patterns and behavior. In the last decade, sensing and decision making in social networks have witnessed significant progress in the electrical engineering, computer science, economics, finance, and sociology research communities. Research in this area involves the interaction of dynamic random graphs, socio-economic analysis, and statistical inference algorithms. This monograph provides a survey, tutorial development, and discussion of four highly stylized examples: social learning for interactive sensing; tracking the degree distribution of social networks; sensing and information diffusion; and coordination of decision making via game-theoretic learning. Each of the four examples is motivated by practical examples, and comprises a literature survey together with careful problem formulation and mathematical analysis. Despite being highly stylized, these examples provide a rich variety of models, algorithms and analysis tools that are readily accessible to a signal processing, control/systems theory, and applied mathematics audience.

preprint2014arXiv

Reduced Complexity Filtering with Stochastic Dominance Bounds: A Convex Optimization Approach

This paper uses stochastic dominance principles to construct upper and lower sample path bounds for Hidden Markov Model (HMM) filters. Given an HMM, by using convex optimization methods for nuclear norm minimization with copositive constraints, we construct low rank stochastic matrices so that the optimal filters using these matrices provably lower and upper bound (with respect to a partially ordered set) the true filtered distribution at each time instant. Since these matrices are low rank (say R), the computational cost of evaluating the filtering bounds is O(XR) instead of O(X^2). A Monte-Carlo importance sampling filter is presented that exploits these upper and lower bounds to estimate the optimal posterior. Finally, using the Dobrushin coefficient, explicit bounds are given on the variational norm between the true posterior and the upper and lower bounds.
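The complexity claim can be sketched as follows: when an X-by-X transition matrix factors as P = U V with U of size X-by-R and V of size R-by-X, the prediction step of the HMM filter can be computed through the R-dimensional intermediate vector, costing on the order of XR operations rather than X^2. The factorization below is written by hand for illustration; it is not obtained by the nuclear norm minimization described in the abstract, and all names and numbers are assumptions.

```python
def hmm_filter_step(pi, P, lik):
    """One HMM filter update with a full transition matrix: O(X^2) work."""
    X = len(pi)
    pred = [sum(pi[i] * P[i][j] for i in range(X)) for j in range(X)]
    unnorm = [lik[j] * pred[j] for j in range(X)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def hmm_filter_step_lowrank(pi, U, V, lik):
    """Same update using P = U V with inner dimension R: O(XR) work."""
    X, R = len(U), len(V)
    z = [sum(pi[i] * U[i][r] for i in range(X)) for r in range(R)]     # X*R
    pred = [sum(z[r] * V[r][j] for r in range(R)) for j in range(X)]   # R*X
    unnorm = [lik[j] * pred[j] for j in range(X)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical rank-2 stochastic matrix on X = 3 states.
U = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
V = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
P = [[sum(U[i][r] * V[r][j] for r in range(2)) for j in range(3)]
     for i in range(3)]

prior = [0.2, 0.5, 0.3]
likelihood = [0.9, 0.5, 0.1]  # assumed observation likelihoods per state
full = hmm_filter_step(prior, P, likelihood)
fast = hmm_filter_step_lowrank(prior, U, V, likelihood)
```

The two updates return the same posterior here because the example matrix is exactly rank 2; the savings only become meaningful when X is much larger than R.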

preprint2014arXiv

Reinforcement Learning and Nonparametric Detection of Game-Theoretic Equilibrium Play in Social Networks

This paper studies two important signal processing aspects of equilibrium behavior in non-cooperative games arising in social networks, namely, reinforcement learning and detection of equilibrium play. The first part of the paper presents a reinforcement learning (adaptive filtering) algorithm that facilitates learning an equilibrium by resorting to diffusion cooperation strategies in a social network. Agents form homophilic social groups, within which they exchange past experiences over an undirected graph. It is shown that, if all agents follow the proposed algorithm, their global behavior is attracted to the correlated equilibria set of the game. The second part of the paper provides a test to detect if the actions of agents are consistent with play from the equilibrium of a concave potential game. The theory of revealed preference from microeconomics is used to construct a non-parametric decision test and statistical test which only require the probe and associated actions of agents. A stochastic gradient algorithm is given to optimize the probe in real time to minimize the Type-II error probabilities of the detection test subject to specified Type-I error probability. We provide a real-world example using the energy market, and a numerical example to detect malicious agents in an online social network.

preprint2014arXiv

Social Learning in a Human Society: An Experimental Study

This paper presents an experimental study to investigate the learning and decision making behavior of individuals in a human society. Social learning is used as the mathematical basis for modelling interaction of individuals that aim to perform a perceptual task interactively. A psychology experiment was conducted on a group of undergraduate students at the University of British Columbia to examine whether the decision (action) of one individual affects the decision of the subsequent individuals. The major experimental observation that stands out here is that the participants of the experiment (agents) were affected by decisions of their partners in a relatively large fraction (60%) of trials. We fit a social learning model that mimics the interactions between participants of the psychology experiment. Misinformation propagation (also known as data incest) within the society under study is further investigated in this paper.

preprint2013arXiv

Interactive Sensing in Social Networks

This paper presents models and algorithms for interactive sensing in social networks where individuals act as sensors and the information exchange between individuals is exploited to optimize sensing. Social learning is used to model the interaction between individuals that aim to estimate an underlying state of nature. In this context the following questions are addressed: How can self-interested agents that interact via social learning achieve a tradeoff between individual privacy and reputation of the social group? How can protocols be designed to prevent data incest in online reputation blogs where individuals make recommendations? How can sensing by individuals that interact with each other be used by a global decision maker to detect changes in the underlying state of nature? When individual agents possess limited sensing, computation and communication capabilities, can a network of agents achieve sophisticated global behavior? Social and game theoretic learning are natural settings for addressing these questions. This article presents an overview, insights and discussion of social learning models in the context of data incest propagation, change detection and coordination of decision making.

preprint2013arXiv

Removal of Data Incest in Multi-agent Social Learning in Social Networks

Motivated by online reputation systems, we investigate social learning in a network where agents interact on a time dependent graph to estimate an underlying state of nature. Agents record their own private observations, then update their private beliefs about the state of nature using Bayes' rule. Based on their belief, each agent then chooses an action (rating) from a finite set and transmits this action over the social network. An important consequence of such social learning over a network is the ruinous multiple re-use of information known as data incest (or mis-information propagation). In this paper, the data incest management problem in social learning context is formulated on a directed acyclic graph. We give necessary and sufficient conditions on the graph topology of social interactions to eliminate data incest. A data incest removal algorithm is proposed such that the public belief of social learning (and hence the actions of agents) is not affected by data incest propagation. This results in an online reputation system with a higher trust rating. Numerical examples are provided to illustrate the performance of the proposed optimal data incest removal algorithm.

preprint2013arXiv

Tracking the Empirical Distribution of a Markov-modulated Duplication-Deletion Random Graph

This paper considers a Markov-modulated duplication-deletion random graph where at each time instant, one node can either join or leave the network; the probabilities of joining or leaving evolve according to the realization of a finite state Markov chain. The paper comprises two results. First, motivated by social network applications, we analyze the asymptotic behavior of the degree distribution of the Markov-modulated random graph. Using the asymptotic degree distribution, an expression is obtained for the delay in searching such graphs. Second, a stochastic approximation algorithm is presented to track the empirical degree distribution as it evolves over time. The tracking performance of the algorithm is analyzed in terms of mean square error and a functional central limit theorem is presented for the asymptotic tracking error.

preprint2012arXiv

Quickest Detection with Social Learning: Interaction of local and global decision makers

We consider how local and global decision policies interact in stopping time problems such as quickest time change detection. Individual agents make myopic local decisions via social learning, that is, each agent records a private observation of a noisy underlying state process, selfishly optimizes its local utility and then broadcasts its local decision. Given these local decisions, how can a global decision maker achieve quickest time change detection when the underlying state changes according to a phase-type distribution? The paper presents four results. First, using Blackwell dominance of measures, it is shown that the optimal cost incurred in social learning based quickest detection is always larger than that of classical quickest detection. Second, it is shown that in general the optimal decision policy for social learning based quickest detection is characterized by multiple thresholds within the space of Bayesian distributions. Third, using lattice programming and stochastic dominance, sufficient conditions are given for the optimal decision policy to consist of a single linear hyperplane, or, more generally, a threshold curve. Estimation of the optimal linear approximation to this threshold curve is formulated as a simulation-based stochastic optimization problem. Finally, the paper shows that in multi-agent sensor management with quickest detection, where each agent views the world according to its prior, the optimal policy has a similar structure to social learning.

preprint2012arXiv

When to look at a noisy Markov chain in sequential decision making if measurements are costly?

A decision maker records measurements of a finite-state Markov chain corrupted by noise. The goal is to decide when the Markov chain hits a specific target state. The decision maker can choose from a finite set of sampling intervals to pick the next time to look at the Markov chain. The aim is to optimize an objective comprising false alarm, delay cost and cumulative measurement sampling cost. Taking more frequent measurements yields accurate estimates but incurs a higher measurement cost. Making an erroneous decision too soon incurs a false alarm penalty. Waiting too long to declare the target state incurs a delay penalty. What is the optimal sequential strategy for the decision maker? The paper shows that under reasonable conditions, the optimal strategy has the following intuitive structure: when the Bayesian estimate (posterior distribution) of the Markov chain is away from the target state, look less frequently; while if the posterior is close to the target state, look more frequently. Bounds are derived for the optimal strategy. Also the achievable optimal cost of the sequential detector as a function of transition dynamics and observation distribution is analyzed. The sensitivity of the optimal achievable cost to parameter variations is bounded in terms of the Kullback divergence. To prove the results in this paper, novel stochastic dominance results on the Bayesian filtering recursion are derived. The formulation in this paper generalizes quickest time change detection to consider optimal sampling and also yields useful results in sensor scheduling (active sensing).

preprint2011arXiv

Average-Consensus Algorithms in a Deterministic Framework

We consider the average-consensus problem in a multi-node network of finite size. Communication between nodes is modeled by a sequence of directed signals with arbitrary communication delays. Four distributed algorithms that achieve average-consensus are proposed. Necessary and sufficient communication conditions are given for each algorithm to achieve average-consensus. Resource costs for each algorithm are derived based on the number of scalar values that are required for communication and storage at each node. Numerical examples are provided to illustrate the empirical convergence rate of the four algorithms in comparison with a well-known "gossip" algorithm as well as a randomized information spreading algorithm when assuming a fully connected random graph with instantaneous communication.

preprint2011arXiv

Bayesian Sequential Detection with Phase-Distributed Change Time and Nonlinear Penalty -- A POMDP Approach

We show that the optimal decision policy for several types of Bayesian sequential detection problems has a threshold switching curve structure on the space of posterior distributions. This is established by using lattice programming and stochastic orders in a partially observed Markov decision process (POMDP) framework. A stochastic gradient algorithm is presented to estimate the optimal linear approximation to this threshold curve. We illustrate these results by first considering quickest time detection with phase-type distributed change time and a variance stopping penalty. Then it is proved that the threshold switching curve also arises in several other Bayesian decision problems such as quickest transient detection, exponential delay (risk-sensitive) penalties, stopping time problems in social learning, and multi-agent scheduling in a changing world. Using Blackwell dominance, it is shown that for dynamic decision making problems, the optimal decision policy is lower bounded by a myopic policy. Finally, it is shown how the achievable cost of the optimal decision policy varies with change time distribution by imposing a partial order on transition matrices.
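For the simplest special case, a geometric change time (a phase-type distribution with a single phase) and two hypotheses, a threshold rule on the posterior can be sketched as below. The Gaussian observation model, the change probability `rho` and the threshold value 0.9 are assumptions for illustration; the threshold is not optimized and the sketch does not cover the variance penalty or the other generalizations in the abstract.

```python
import math


def gaussian_pdf(y, mean):
    """Unit-variance Gaussian density."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)


def detect_change(observations, rho=0.05, threshold=0.9):
    """Declare a change once P(change has occurred | data) >= threshold.

    Assumed model: mean 0 before the change, mean 1 after it, and a
    change probability rho at each step (geometric change time).
    Returns (stop_time, posteriors); stop_time is 1-based, or None.
    """
    pi = 0.0  # posterior probability that the change has occurred
    history = []
    for k, y in enumerate(observations, start=1):
        pred = pi + (1.0 - pi) * rho                 # prediction step
        num = pred * gaussian_pdf(y, 1.0)            # post-change likelihood
        den = num + (1.0 - pred) * gaussian_pdf(y, 0.0)
        pi = num / den                               # Bayes update
        history.append(pi)
        if pi >= threshold:
            return k, history
    return None, history


stop_time, posteriors = detect_change([0.0, 0.0, 0.0, 3.0, 3.0, 3.0, 3.0])
```

With this hand-picked sequence the posterior stays low for the first three samples and crosses the threshold two samples after the observations jump, so the rule stops at the fifth sample.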

preprint2011arXiv

Biosensor Arrays for Estimating Molecular Concentration in Fluid Flows

This paper constructs dynamical models and estimation algorithms for the concentration of target molecules in a fluid flow using an array of novel biosensors. Each biosensor is constructed out of protein molecules embedded in a synthetic cell membrane. The concentration evolves according to an advection-diffusion partial differential equation which is coupled with chemical reaction equations on the biosensor surface. By using averaging theory methods and the divergence theorem, an approximate model is constructed that describes the asymptotic behaviour of the concentration as a system of ordinary differential equations. The estimate of target molecules is then obtained by solving a nonlinear least squares problem. It is shown that the estimator is strongly consistent and asymptotically normal. An explicit expression is obtained for the asymptotic variance of the estimation error. As an example, the results are illustrated for a novel biosensor built out of protein molecules.

preprint2011arXiv

Intent Inference and Syntactic Tracking with GMTI Measurements

In conventional target tracking systems, human operators use the estimated target tracks to make higher level inference of the target behaviour/intent. This paper develops syntactic filtering algorithms that assist human operators by extracting spatial patterns from target tracks to identify suspicious/anomalous spatial trajectories. The targets' spatial trajectories are modeled by a stochastic context free grammar (SCFG) and a switched mode state space model. Bayesian filtering algorithms for stochastic context free grammars are presented for extracting the syntactic structure and illustrated for a ground moving target indicator (GMTI) radar example. The performance of the algorithms is tested with the experimental data collected using DRDC Ottawa's X-band Wideband Experimental Airborne Radar (XWEAR).

preprint2011arXiv

QoS Provisioning for Multimedia Transmission in Cognitive Radio Networks

In cognitive radio (CR) networks, the perceived reduction of application layer quality of service (QoS), such as multimedia distortion, by secondary users may impede the success of CR technologies. Most previous work in CR networks ignores application layer QoS. In this paper we take an integrated design approach to jointly optimize multimedia intra refreshing rate, an application layer parameter, together with access strategy, and spectrum sensing for multimedia transmission in a CR system with time varying wireless channels. Primary network usage and channel gain are modeled as a finite state Markov process. With channel sensing and channel state information errors, the system state cannot be directly observed. We formulate the QoS optimization problem as a partially observable Markov decision process (POMDP). A low complexity dynamic programming framework is presented to obtain the optimal policy. Simulation results show the effectiveness of the proposed scheme.

preprint2011arXiv

Quickest Time Herding and Detection for Optimal Social Learning

This paper considers social learning amongst rational agents (for example, sensors in a network). We consider three models of social learning in increasing order of sophistication. In the first model, based on its private observation of a noisy underlying state process, each agent selfishly optimizes its local utility and broadcasts its action. This protocol leads to a herding behavior where the agents eventually choose the same action irrespective of their observations. We then formulate a second more general model where each agent is benevolent and chooses its sensor-mode to optimize a social welfare function to facilitate social learning. Using lattice programming and stochastic orders, it is shown that the optimal decision each agent makes is characterized by a switching curve on the space of Bayesian distributions. We then present a third more general model where social learning takes place to achieve quickest time change detection. Both geometric and phase-type change time distributions are considered. It is proved that the optimal decision is again characterized by a switching curve. We present stochastic approximation (adaptive filtering) algorithms to estimate this switching curve. Finally, we present extensions of the social learning model in a changing world (Markovian target) where agents learn in multiple iterations. By using Blackwell stochastic dominance, we give conditions under which myopic decisions are optimal. We also analyze the effect of target dynamics on the social welfare cost.
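The herding behavior of the first model can be sketched with a binary state and binary private signals. Each agent combines the public belief with its own signal, picks the more likely state, and broadcasts that action; observers then update the public belief from the action alone. The prior 0.55, the signal accuracy 0.7 and all function names are assumptions for illustration, not values from the paper.

```python
def posterior(p, y, q):
    """P(state = 1 | public belief p, private signal y), signal accuracy q."""
    like1 = q if y == 1 else 1.0 - q
    like0 = 1.0 - q if y == 1 else q
    return p * like1 / (p * like1 + (1.0 - p) * like0)


def act(p, y, q):
    """Myopic action: choose the state that looks more likely."""
    return 1 if posterior(p, y, q) > 0.5 else 0


def social_learning(signals, p0=0.55, q=0.7):
    """Run agents sequentially; return their actions and the public beliefs."""
    p = p0
    actions, beliefs = [], []
    for y in signals:
        a = act(p, y, q)
        # Probability of seeing action a under each state, summed over the
        # private signals that would have produced it.
        l1 = sum((q if s == 1 else 1.0 - q) for s in (0, 1) if act(p, s, q) == a)
        l0 = sum((1.0 - q if s == 1 else q) for s in (0, 1) if act(p, s, q) == a)
        p = p * l1 / (p * l1 + (1.0 - p) * l0)
        actions.append(a)
        beliefs.append(p)
    return actions, beliefs


# The first agent sees signal 1; every later agent sees signal 0.
actions, beliefs = social_learning([1, 0, 0, 0, 0])
```

After the first action the public belief is strong enough that later agents choose action 1 whatever they observe, so their actions carry no information and the public belief stops moving.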

preprint2011arXiv

Sequential Detection with Mutual Information Stopping Cost

This paper formulates and solves a sequential detection problem that involves the mutual information (stochastic observability) of a Gaussian process observed in noise with missing measurements. The main result is that the optimal decision is characterized by a monotone policy on the partially ordered set of positive definite covariance matrices. This monotone structure implies that numerically efficient algorithms can be designed to estimate and implement monotone parametrized decision policies. The sequential detection problem is motivated by applications in radar scheduling where the aim is to maintain the mutual information of all targets within a specified bound. We illustrate the problem formulation and performance of monotone parametrized policies via numerical examples in fly-by and persistent-surveillance applications involving a GMTI (Ground Moving Target Indicator) radar.

48 published item(s)

Adaptive Filtering Algorithms for Set-Valued Observations -- Symmetric Measurement Approach to Unlabeled and Anonymized Data

Estimating Exposure to Information on Social Networks

Hawkes Process Modeling of Block Arrivals in Bitcoin Blockchain

Inverse-Inverse Reinforcement Learning. How to Hide Strategy from an Adversarial Inverse Reinforcement Learner

Lyapunov based Stochastic Stability of a Quantum Decision System for Human-Machine Interaction

Lyapunov based Stochastic Stability of Human-Machine Interaction: A Quantum Decision System Approach

Meta-Cognition. An Inverse-Inverse Reinforcement Learning Approach for Cognitive Radars

Quickest Detection for Human-Sensor Systems using Quantum Decision Theory

Langevin Dynamics for Adaptive Inverse Reinforcement Learning of Stochastic Gradient Algorithms

Maximum Likelihood Estimation of Power-law Degree Distributions via Friendship Paradox based Sampling

Multi-kernel Passive Stochastic Gradient Algorithms and Transfer Learning

A Markov Decision Process Approach to Active Meta Learning

Adversarial Radar Inference. From Inverse Tracking to Inverse Reinforcement Learning of Cognitive Radar

Controlled Sequential Information Fusion with Social Sensors

Policy Gradient using Weak Derivatives for Reinforcement Learning

Quickest Change Detection of Time Inconsistent Anticipatory Agents. Human-Sensor and Cyber-Physical Systems

Rationally Inattentive Inverse Reinforcement Learning Explains YouTube Commenting Behavior

Friendship Paradox Biases Perceptions in Directed Networks

How to Calibrate your Adversary's Capabilities? Inverse Filtering for Counter-Autonomous Systems

POMDP Structural Results for Controlled Sensing

A Comprehensive Methodology for Volt-VAR Optimization in Active Smart Grids

Engagement dynamics and sensitivity analysis of YouTube videos

Filterbased Stochastic Volatility in Continuous-Time Hidden Markov Models

Opportunistic Advertisement Scheduling in Live Social Media: A Multiple Stopping Time POMDP Approach

Partially Observed Markov Decision Processes. Problem Sets and Internet Supplement

Tracking Infection Diffusion in Social Networks: Filtering Algorithms and Threshold Bounds

Myopic Bounds for Optimal Policy of POMDPs: An extension of Lovejoy's structural results

Online Reputation and Polling Systems: Data Incest, Social Learning and Revealed Preferences

Sequential Detection of Market shocks using Risk-averse Agent Based Models

Structural Results for Partially Observed Markov Decision Processes

Adaptive Search Algorithms for Discrete Stochastic Optimization: A Smooth Best-Response Approach

Boundary value problems in consensus networks

Interactive Sensing and Decision Making in Social Networks

Reduced Complexity Filtering with Stochastic Dominance Bounds: A Convex Optimization Approach

Reinforcement Learning and Nonparametric Detection of Game-Theoretic Equilibrium Play in Social Networks

Social Learning in a Human Society: An Experimental Study

Interactive Sensing in Social Networks

Removal of Data Incest in Multi-agent Social Learning in Social Networks

Tracking the Empirical Distribution of a Markov-modulated Duplication-Deletion Random Graph

Quickest Detection with Social Learning: Interaction of local and global decision makers

When to look at a noisy Markov chain in sequential decision making if measurements are costly?

Average-Consensus Algorithms in a Deterministic Framework

Bayesian Sequential Detection with Phase-Distributed Change Time and Nonlinear Penalty -- A POMDP Approach

Biosensor Arrays for Estimating Molecular Concentration in Fluid Flows

Intent Inference and Syntactic Tracking with GMTI Measurements

QoS Provisioning for Multimedia Transmission in Cognitive Radio Networks

Quickest Time Herding and Detection for Optimal Social Learning

Sequential Detection with Mutual Information Stopping Cost