Researcher profile

Vikram Krishnamurthy

Vikram Krishnamurthy contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
19 works
0 followers
16 topics
4 close collaborators

Published work

19 published items

preprint | 2022 | arXiv

Adaptive Filtering Algorithms for Set-Valued Observations -- Symmetric Measurement Approach to Unlabeled and Anonymized Data

Suppose $L$ simultaneous independent stochastic systems generate observations, where the observations from each system depend on the underlying parameter of that system. The observations are unlabeled (anonymized), in the sense that an analyst does not know which observation came from which stochastic system. How can the analyst estimate the underlying parameters of the $L$ systems? Since the anonymized observations at each time are an unordered set of $L$ measurements (rather than a vector), classical stochastic gradient algorithms cannot be directly used. By using symmetric polynomials, we formulate a symmetric measurement equation that maps the observation set to a unique vector. By exploiting the fact that the algebraic ring of multi-variable polynomials is a unique factorization domain over the ring of one-variable polynomials, we construct an adaptive filtering algorithm that yields a statistically consistent estimate of the underlying parameters. We analyze the asymptotic covariance of these estimates to quantify the effect of anonymization. Finally, we characterize the anonymity of the observations in terms of the error probability of the maximum a posteriori Bayesian estimator. Using Blackwell dominance of mean preserving spreads, we construct a partial ordering of the noise densities which relates the anonymity of the observations to the asymptotic covariance of the adaptive filtering algorithm.

preprint | 2022 | arXiv

Estimating Exposure to Information on Social Networks

This paper considers the problem of estimating exposure to information in a social network. Given a piece of information (e.g., a URL of a news article on Facebook, a hashtag on Twitter), our aim is to find the fraction of people on the network who have been exposed to it. The exact value of exposure to a piece of information is determined by two features: the structure of the underlying social network and the set of people who shared the piece of information. Often, neither feature is publicly available (i.e., access to the two features is limited to the internal administrators of the platform) and both are difficult to estimate from data. As a solution, we propose two methods to estimate the exposure to a piece of information in an unbiased manner: a vanilla method which is based on sampling the network uniformly and a method which non-uniformly samples the network motivated by the Friendship Paradox. We provide theoretical results which characterize the conditions (in terms of properties of the network and the piece of information) under which one method outperforms the other. Further, we outline extensions of the proposed methods to dynamic information cascades (where the exposure needs to be tracked in real-time). We demonstrate the practical feasibility of the proposed methods via experiments on multiple synthetic and real-world datasets.

preprint | 2022 | arXiv

Hawkes Process Modeling of Block Arrivals in Bitcoin Blockchain

The paper constructs a multi-variate Hawkes process model of Bitcoin block arrivals and price jumps. Hawkes processes are self-exciting point processes that can capture the self- and cross-excitation effects of block mining and Bitcoin price volatility. We use publicly available blockchain datasets to estimate the model parameters via maximum likelihood estimation. The results show that Bitcoin price volatility boosts the block mining rate and that Bitcoin investment returns demonstrate mean reversion. Quantile-Quantile plots show that the proposed Hawkes process model is a better fit to the blockchain datasets than a Poisson process model.
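
To make the notion of self-excitation concrete, here is a minimal sketch of a univariate Hawkes process with an exponential kernel, simulated by Ogata's thinning method. It is not the paper's multi-variate model or its estimation procedure; the function name `simulate_hawkes` and the parameter values (`mu`, `alpha`, `beta`) are illustrative assumptions.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times of a univariate Hawkes process on (0, horizon].

    Intensity: lambda(t) = mu + sum over past events t_i of
    alpha * exp(-beta * (t - t_i)). Choose alpha < beta for stationarity.
    All parameter names are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # current value of the self-exciting sum at time t
    while True:
        # Between events the intensity only decays, so the current
        # intensity is a valid upper bound for the thinning step.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait > horizon:
            break
        t += wait
        excitation *= math.exp(-beta * wait)  # decay up to the candidate time
        if rng.random() * lam_bar <= mu + excitation:
            events.append(t)       # accept the candidate as an event
            excitation += alpha    # each event raises the future intensity
    return events


if __name__ == "__main__":
    times = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=200.0)
    print(len(times), "events; first few:", [round(x, 2) for x in times[:5]])
```

With `alpha = 0`, the same routine reduces to a homogeneous Poisson process of rate `mu`, which is the baseline model the abstract compares against.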

preprint | 2022 | arXiv

Inverse-Inverse Reinforcement Learning. How to Hide Strategy from an Adversarial Inverse Reinforcement Learner

Inverse reinforcement learning (IRL) deals with estimating an agent's utility function from its actions. In this paper, we consider how an agent can hide its strategy and mitigate an adversarial IRL attack; we call this inverse IRL (I-IRL). How should the decision maker choose its response to ensure a poor reconstruction of its strategy by an adversary performing IRL to estimate the agent's strategy? This paper comprises four results: First, we present an adversarial IRL algorithm that estimates the agent's strategy while controlling the agent's utility function. Our second result, for I-IRL, spoofs the IRL algorithm used by the adversary. Our I-IRL results are based on revealed preference theory in micro-economics. The key idea is for the agent to deliberately choose sub-optimal responses that sufficiently mask its true strategy. Third, we give a sample complexity result for our main I-IRL result when the agent has noisy estimates of the adversary-specified utility function. Finally, we illustrate our I-IRL scheme in a radar problem where a meta-cognitive radar is trying to mitigate an adversarial target.

preprint | 2022 | arXiv

Lyapunov based Stochastic Stability of a Quantum Decision System for Human-Machine Interaction

In mathematical psychology, decision makers are modeled using the Lindbladian equations from quantum mechanics to capture important human-centric features such as order effects and violation of the sure thing principle. We consider human-machine interaction involving a quantum decision maker (human) and a controller (machine). Given a sequence of human decisions over time, how can the controller dynamically provide input messages to adapt these decisions so as to converge to a specific decision? We show via novel stochastic Lyapunov arguments how the Lindbladian dynamics of the quantum decision maker can be controlled to converge to a specific decision asymptotically. Our methodology yields a useful mathematical framework for human-sensor decision making. The stochastic Lyapunov results are also of independent interest as they generalize recent results in the literature.

preprint | 2022 | arXiv

Lyapunov based Stochastic Stability of Human-Machine Interaction: A Quantum Decision System Approach

In mathematical psychology, decision makers are modeled using the Lindbladian equations from quantum mechanics to capture important human-centric features such as order effects and violation of the sure thing principle. We consider human-machine interaction involving a quantum decision maker (human) and a controller (machine). Given a sequence of human decisions over time, how can the controller dynamically provide input messages to adapt these decisions so as to converge to a specific decision? We show via novel stochastic Lyapunov arguments how the Lindbladian dynamics of the quantum decision maker can be controlled to converge to a specific decision asymptotically. Our methodology yields a useful mathematical framework for human-sensor decision making. The stochastic Lyapunov results are also of independent interest as they generalize recent results in the literature.

preprint | 2022 | arXiv

Meta-Cognition. An Inverse-Inverse Reinforcement Learning Approach for Cognitive Radars

This paper considers meta-cognitive radars in an adversarial setting. A cognitive radar optimally adapts its waveform (response) in response to maneuvers (probes) of a possibly adversarial moving target. A meta-cognitive radar is aware of the adversarial nature of the target and seeks to mitigate the adversarial target. How should the meta-cognitive radar choose its responses to sufficiently confuse the adversary trying to estimate the radar's utility function? This paper abstracts the radar's meta-cognition problem in terms of the spectra (eigenvalues) of the state and observation noise covariance matrices, and embeds the algebraic Riccati equation into an economics-based utility maximization setup. This adversarial target is an inverse reinforcement learner. By observing a noisy sequence of the radar's responses (waveforms), the adversarial target uses a statistical hypothesis test to detect if the radar is a utility maximizer. In turn, the meta-cognitive radar deliberately chooses sub-optimal responses that increase the Type-I error probability of the adversary's detector. We call this counter-adversarial step taken by the meta-cognitive radar inverse inverse reinforcement learning (I-IRL). We illustrate the meta-cognition results of this paper via simple numerical examples. Our approach for meta-cognition in this paper is based on revealed preference theory in micro-economics and inspired by results in differential privacy and adversarial obfuscation in machine learning.

preprint | 2022 | arXiv

Quickest Detection for Human-Sensor Systems using Quantum Decision Theory

In mathematical psychology, recent models for human decision-making use Quantum Decision Theory to capture important human-centric features such as order effects and violation of the sure-thing principle (total probability law). We construct and analyze a human-sensor system where a quickest detector aims to detect a change in an underlying state by observing human decisions that are influenced by the state. Apart from providing an analytical framework for such human-sensor systems, we also analyze the structure of the quickest detection policy. We show that the quickest detection policy has a single threshold and the optimal cost incurred is lower bounded by that of the classical quickest detector. This indicates that intermediate human decisions strictly hinder detection performance. We also analyze the sensitivity of the quickest detection cost with respect to the quantum decision parameters of the human decision maker, revealing that the performance is robust to inaccurate knowledge of the decision-making process. Numerical results are provided which suggest that observing the decisions of more rational decision makers will improve the quickest detection performance. Finally, we illustrate a numerical implementation of this quickest detector in the context of the Prisoner's Dilemma problem, in which it has been observed that Quantum Decision Theory can uniquely model empirically tested violations of the sure-thing principle.

preprint | 2021 | arXiv

Langevin Dynamics for Adaptive Inverse Reinforcement Learning of Stochastic Gradient Algorithms

Inverse reinforcement learning (IRL) aims to estimate the reward function of optimizing agents by observing their response (estimates or actions). This paper considers IRL when noisy estimates of the gradient of a reward function generated by multiple stochastic gradient agents are observed. We present a generalized Langevin dynamics algorithm to estimate the reward function $R(\theta)$; specifically, the resulting Langevin algorithm asymptotically generates samples from the distribution proportional to $\exp(R(\theta))$. The proposed IRL algorithms use kernel-based passive learning schemes. We also construct multi-kernel passive Langevin algorithms for IRL which are suitable for high dimensional data. The performance of the proposed IRL algorithms is illustrated on examples in adaptive Bayesian learning, logistic regression (high dimensional problem) and constrained Markov decision processes. We prove weak convergence of the proposed IRL algorithms using martingale averaging methods. We also analyze the tracking performance of the IRL algorithms in non-stationary environments where the utility function $R(\theta)$ jump changes over time as a slow Markov chain.
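
The sampling claim above (draws from a density proportional to $\exp(R(\theta))$) can be illustrated with the classical unadjusted Langevin iteration. This is only a sketch of the standard, non-passive algorithm with exact gradients, not the paper's kernel-based passive scheme; the reward $R(\theta) = -\theta^2/2$, the step size, and the function names are assumptions chosen so that the target is a standard normal.

```python
import math
import random


def grad_reward(theta):
    # Hypothetical reward R(theta) = -theta**2 / 2, so exp(R) is
    # proportional to the standard normal density.
    return -theta


def langevin_samples(n_steps, step=0.05, burn_in=1000, seed=0):
    """Unadjusted Langevin iteration:
    theta <- theta + (step / 2) * grad R(theta) + sqrt(step) * N(0, 1).
    For small step sizes the iterates are approximately distributed
    according to the density proportional to exp(R(theta)).
    """
    rng = random.Random(seed)
    theta = 3.0  # deliberately start away from the mode
    samples = []
    for k in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        theta = theta + 0.5 * step * grad_reward(theta) + math.sqrt(step) * noise
        if k >= burn_in:
            samples.append(theta)
    return samples


if __name__ == "__main__":
    draws = langevin_samples(100_000)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print("sample mean:", round(mean, 3), "sample variance:", round(var, 3))
```

In the IRL setting described in the abstract, the exact gradient used here would be replaced by noisy gradient observations from the agents being watched, which is where the kernel weighting enters.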

preprint | 2021 | arXiv

Maximum Likelihood Estimation of Power-law Degree Distributions via Friendship Paradox based Sampling

This paper considers the problem of estimating a power-law degree distribution of an undirected network using sampled data. Although power-law degree distributions are ubiquitous in nature, the widely used parametric methods for estimating them (e.g. linear regression on double-logarithmic axes, maximum likelihood estimation with uniformly sampled nodes) suffer from the large variance introduced by the lack of data-points from the tail portion of the power-law degree distribution. As a solution, we present a novel maximum likelihood estimation approach that exploits the friendship paradox to sample more efficiently from the tail of the degree distribution. We analytically show that the proposed method results in a smaller bias, variance and Cramér-Rao lower bound compared to the vanilla maximum-likelihood estimate obtained with uniformly sampled nodes (which is the most commonly used method in the literature). Detailed numerical and empirical results are presented to illustrate the performance of the proposed method under different conditions and how it compares with alternative methods. We also show that the proposed method and its desirable properties (i.e. smaller bias, variance and Cramér-Rao lower bound compared to the vanilla method based on uniform samples) extend to parametric degree distributions other than the power-law, such as exponential degree distributions. All the numerical and empirical results are reproducible and the code is publicly available on GitHub.
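
A small sketch of why friendship-paradox sampling reaches the high-degree tail: on any undirected graph without isolated nodes, a random friend of a random node has, on average, at least as many neighbors as a random node. The toy graph and helper names below are hypothetical, and the paper's exact sampling scheme may differ from the "random friend of a random node" variant assumed here.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def mean_degree_uniform(adj):
    """Expected degree of a node sampled uniformly at random."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def mean_degree_random_friend(adj):
    """Expected degree of a uniformly chosen friend of a uniformly chosen node."""
    total = 0.0
    for nbrs in adj.values():
        total += sum(len(adj[f]) for f in nbrs) / len(nbrs)
    return total / len(adj)


if __name__ == "__main__":
    # Hypothetical toy graph: a hub (node 0) with five leaves, plus one extra edge.
    edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (5, 6)]
    adj = build_adjacency(edges)
    print("uniform node:  ", mean_degree_uniform(adj))
    print("random friend: ", mean_degree_random_friend(adj))
```

Because random friends over-represent hubs, a sample of friends contains more observations from the tail of the degree distribution than a uniform sample of the same size, which is the effect the estimator in the abstract exploits.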

preprint | 2021 | arXiv

Multi-kernel Passive Stochastic Gradient Algorithms and Transfer Learning

This paper develops a novel passive stochastic gradient algorithm. In passive stochastic approximation, the stochastic gradient algorithm does not have control over the location where noisy gradients of the cost function are evaluated. Classical passive stochastic gradient algorithms use a kernel that approximates a Dirac delta to weigh the gradients based on how far they are evaluated from the desired point. In this paper we construct a multi-kernel passive stochastic gradient algorithm. The algorithm performs substantially better in high dimensional problems and incorporates variance reduction. We analyze the weak convergence of the multi-kernel algorithm and its rate of convergence. In numerical examples, we study the multi-kernel version of the passive least mean squares (LMS) algorithm for transfer learning to compare the performance with the classical passive version.
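
The classical single-kernel passive scheme described above can be sketched as follows. The algorithm receives noisy gradients at points it does not choose and weights each one by a kernel centered at its current estimate. This is an assumption-laden toy (quadratic cost, Gaussian kernel, uniform evaluation points, hypothetical function name `passive_sgd`), not the multi-kernel algorithm the paper develops.

```python
import math
import random


def passive_sgd(n_steps, step=0.01, bandwidth=0.5, target=3.0, seed=0):
    """Classical kernel-weighted passive stochastic gradient iteration.

    Cost (hypothetical): C(x) = (x - target)**2 / 2, minimized at `target`.
    At each step a noisy gradient is observed at a random point x that the
    algorithm does not control; a Gaussian kernel of width `bandwidth`
    down-weights gradients evaluated far from the current estimate.
    """
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(n_steps):
        x = rng.uniform(-2.0, 8.0)                  # evaluation point chosen by "nature"
        grad = (x - target) + rng.gauss(0.0, 1.0)   # noisy gradient of C at x
        u = (x - theta) / bandwidth
        weight = math.exp(-0.5 * u * u) / (bandwidth * math.sqrt(2.0 * math.pi))
        theta -= step * weight * grad               # kernel-weighted update
    return theta


if __name__ == "__main__":
    print("estimate of the minimizer:", round(passive_sgd(20_000), 3))
```

As the kernel width shrinks, the weighting approaches a Dirac delta at the current estimate, at the price of using fewer of the observed gradients. That trade-off becomes harder to manage as the dimension grows, which is the regime the abstract says the multi-kernel construction addresses.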

preprint | 2020 | arXiv

A Markov Decision Process Approach to Active Meta Learning

In supervised learning, we fit a single statistical model to a given data set, assuming that the data is associated with a singular task, which yields well-tuned models for specific use, but does not adapt well to new contexts. By contrast, in meta-learning, the data is associated with numerous tasks, and we seek a model that may perform well on all tasks simultaneously, in pursuit of greater generalization. One challenge in meta-learning is how to exploit relationships between tasks and classes, which is overlooked by commonly used random or cyclic passes through data. In this work, we propose actively selecting samples on which to train by discerning covariates inside and between meta-training sets. Specifically, we cast the problem of selecting a sample from a number of meta-training sets as either a multi-armed bandit or a Markov Decision Process (MDP), depending on how one encapsulates correlation across tasks. We develop scheduling schemes based on Upper Confidence Bound (UCB), Gittins Index and tabular Markov Decision Problems (MDPs) solved with linear programming, where the reward is the scaled statistical accuracy to ensure it is a time-invariant function of state and action. Across a variety of experimental contexts, we observe significant reductions in the sample complexity of the active selection scheme relative to cyclic or i.i.d. sampling, demonstrating the merit of exploiting covariates in practice.
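
As an illustration of the bandit view, the sketch below runs the standard UCB1 index rule on Bernoulli arms. Here each arm stands in for one meta-training set and the reward for a scaled accuracy signal; that mapping, the arm means, and the function name `ucb1` are assumptions for illustration rather than the paper's scheduling scheme.

```python
import math
import random


def ucb1(arm_means, n_rounds, seed=0):
    """Run UCB1 on Bernoulli arms and return how often each arm was pulled.

    `arm_means` are hypothetical success probabilities, unknown to the
    learner. Each arm is pulled once, then the arm with the largest index
    (empirical mean + sqrt(2 * ln t / pulls)) is chosen.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: try every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print("pulls per arm:", ucb1([0.2, 0.5, 0.8], n_rounds=5000))
```

The exploration bonus shrinks as an arm accumulates pulls, so sampling concentrates on the most rewarding source instead of cycling through all of them uniformly.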

preprint | 2020 | arXiv

Adversarial Radar Inference. From Inverse Tracking to Inverse Reinforcement Learning of Cognitive Radar

Cognitive sensing refers to a reconfigurable sensor that dynamically adapts its sensing mechanism by using stochastic control to optimize its sensing resources. For example, cognitive radars are sophisticated dynamical systems; they use stochastic control to sense the environment, learn from it relevant information about the target and background, then adapt the radar sensor to satisfy the needs of their mission. The last two decades have witnessed intense research in cognitive/adaptive radars. This paper addresses the next logical step, namely inverse cognitive sensing. By observing the emissions of a sensor (e.g. radar or in general a controlled stochastic dynamical system) in real time, how can we detect if the sensor is cognitive (rational utility maximizer) and how can we predict its future actions? The scientific challenges involve extending Bayesian filtering, inverse reinforcement learning and stochastic optimization of dynamical systems to a data-driven adversarial setting. Our methodology transcends classical statistical signal processing (sensing and estimation/detection theory) to address the deeper issue of how to infer strategy from sensing. The generative models, adversarial inference algorithms and associated mathematical analysis will lead to advances in understanding how sophisticated adaptive sensors such as cognitive radars operate.

preprint | 2020 | arXiv

Controlled Sequential Information Fusion with Social Sensors

A sequence of social sensors estimate an unknown parameter (modeled as a state of nature) by performing Bayesian Social Learning, and myopically optimize individual reward functions. The decisions of the social sensors contain quantized information about the underlying state. How should a fusion center dynamically incentivize the social sensors for acquiring information about the underlying state? This paper presents five results. First, sufficient conditions on the model parameters are provided under which the optimal policy for the fusion center has a threshold structure. The optimal policy is determined in closed form, and is such that it switches between two exactly specified incentive policies at the threshold. Second, it is shown that the optimal incentive sequence is a sub-martingale, i.e., the optimal incentives increase on average over time. Third, it is shown that it is possible for the fusion center to learn the true state asymptotically by employing a sub-optimal policy; in other words, controlled information fusion with social sensors can be consistent. Fourth, uniform bounds on the average additional cost incurred by the fusion center for employing a sub-optimal policy are provided. This characterizes the trade-off between the cost of information acquisition and consistency for the fusion center. Finally, when it is sufficient to estimate the state with a degree of confidence, uniform bounds on the budget saved by employing policies that guarantee state estimation in finite time are provided.

preprint | 2020 | arXiv

Policy Gradient using Weak Derivatives for Reinforcement Learning

This paper considers policy search in continuous state-action reinforcement learning problems. Typically, one computes search directions using a classic expression for the policy gradient called the Policy Gradient Theorem, which decomposes the gradient of the value function into two factors: the score function and the Q-function. This paper presents four results: (i) an alternative policy gradient theorem using weak (measure-valued) derivatives instead of the score function is established; (ii) the stochastic gradient estimates thus derived are shown to be unbiased and to yield algorithms that converge almost surely to stationary points of the non-convex value function of the reinforcement learning problem; (iii) the sample complexity of the algorithm is derived and is shown to be $O(1/\sqrt{k})$; (iv) finally, the expected variance of the gradient estimates obtained using weak derivatives is shown to be lower than those obtained using the popular score-function approach. Experiments on the OpenAI Gym pendulum environment show superior performance of the proposed algorithm.

preprint | 2020 | arXiv

Quickest Change Detection of Time Inconsistent Anticipatory Agents. Human-Sensor and Cyber-Physical Systems

In behavioral economics, human decision makers are modeled as anticipatory agents that make decisions by taking into account the probability of future decisions (plans). We consider cyber-physical systems involving the interaction between anticipatory agents and statistical detection. A sensing device records the decisions of an anticipatory agent. Given these decisions, how can the sensing device achieve quickest detection of a change in the anticipatory system? From a decision theoretic point of view, anticipatory models are time inconsistent, meaning that Bellman's principle of optimality does not hold. The appropriate formalism is the subgame Nash equilibrium. We show that the interaction between anticipatory agents and sequential quickest detection results in an unusual (nonconvex) structure of the quickest change detection policy. Our methodology yields a useful framework for situation awareness systems and anticipatory human decision makers interacting with sequential detectors.

preprint | 2020 | arXiv

Rationally Inattentive Inverse Reinforcement Learning Explains YouTube Commenting Behavior

We consider a novel application of inverse reinforcement learning with behavioral economics constraints to model, learn and predict the commenting behavior of YouTube viewers. Each group of users is modeled as a rationally inattentive Bayesian agent which solves a contextual bandit problem. Our methodology integrates three key components. First, to identify distinct commenting patterns, we use deep embedded clustering to estimate framing information (essential extrinsic features) that clusters users into distinct groups. Second, we present an inverse reinforcement learning algorithm that uses Bayesian revealed preferences to test for rationality: does there exist a utility function that rationalizes the given data, and if yes, can it be used to predict commenting behavior? Finally, we impose behavioral economics constraints stemming from rational inattention to characterize the attention span of groups of users. The test imposes a Rényi mutual information cost constraint which impacts how the agent can select attention strategies to maximize their expected utility. After a careful analysis of a massive YouTube dataset, our surprising result is that in most YouTube user groups, the commenting behavior is consistent with optimizing a Bayesian utility with rationally inattentive constraints. The paper also highlights how the rational inattention model can accurately predict commenting behavior. The massive YouTube dataset and analysis used in this paper are available on GitHub and completely reproducible.

preprint | 2019 | arXiv

Friendship Paradox Biases Perceptions in Directed Networks

How popular a topic or an opinion appears to be in a network can be very different from its actual popularity. For example, in an online network of a social media platform, the number of people who mention a topic in their posts (i.e., its global popularity) can be dramatically different from how people see it in their social feeds (i.e., its perceived popularity), where the feeds aggregate their friends' posts. We trace the origin of this discrepancy to the friendship paradox in directed networks, which states that people are less popular than their friends (or followers) are, on average. We identify conditions on network structure that give rise to this perception bias, and validate the findings empirically using data from Twitter. Within messages posted by Twitter users in our sample, we identify topics that appear more frequently within the users' social feeds than they do globally, i.e., among all posts. In addition, we present a polling algorithm that leverages the friendship paradox to obtain a statistically efficient estimate of a topic's global prevalence from biased perceptions of individuals. We characterize the bias of the polling estimate, provide an upper bound for its variance, and validate the algorithm's efficiency through synthetic polling experiments on our Twitter data. Our paper elucidates the non-intuitive ways in which the structure of directed networks can distort social perceptions and resulting behaviors.

preprint | 2019 | arXiv

How to Calibrate your Adversary's Capabilities? Inverse Filtering for Counter-Autonomous Systems

We consider an adversarial Bayesian signal processing problem involving "us" and an "adversary". The adversary observes our state in noise, updates its posterior distribution of the state, and then chooses an action based on this posterior. Given knowledge of "our" state and the sequence of the adversary's actions observed in noise, we consider three problems: (i) How can the adversary's posterior distribution be estimated? Estimating the posterior is an inverse filtering problem involving a random measure; we formulate and solve several versions of this problem in a Bayesian setting. (ii) How can the adversary's observation likelihood be estimated? This tells us how accurate the adversary's sensors are. We compute the maximum likelihood estimator for the adversary's observation likelihood given our measurements of the adversary's actions, where the adversary's actions are in response to estimating our state. (iii) How can the state be chosen by us to minimize the covariance of the estimate of the adversary's observation likelihood? "Our" state can be viewed as a probe signal which causes the adversary to act; so choosing the optimal state sequence is an input design problem. The above questions are motivated by the design of counter-autonomous systems: given measurements of the actions of a sophisticated autonomous adversary, how can our counter-autonomous system estimate the underlying belief of the adversary, predict future actions and therefore guard against these actions?