Source author record

Balaraman Ravindran

Balaraman Ravindran appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Computation and Language Computer Vision Neural and Evolutionary Computing Social and Information Networks Robotics Computer Science and Game Theory physics.soc-ph cs.CY Multiagent Systems

Catalog footprint

What is connected

37works

11topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Evolutionary Approach to Security Games with Signaling

Green Security Games have become a popular way to model scenarios involving the protection of natural resources, such as wildlife. Sensors (e.g. drones equipped with cameras) have also begun to play a role in these scenarios by providing real-time information. Incorporating both human and sensor defender resources strategically is the subject of recent work on Security Games with Signaling (SGS). However, current methods to solve SGS do not scale well in terms of time or memory. We therefore propose a novel approach to SGS, which, for the first time in this domain, employs an Evolutionary Computation paradigm: EASGS. EASGS effectively searches the huge SGS solution space via suitable solution encoding in a chromosome and a specially-designed set of operators. The operators include three types of mutations, each focusing on a particular aspect of the SGS solution, optimized crossover and a local coverage improvement scheme (a memetic aspect of EASGS). We also introduce a new set of benchmark games, based on dense or locally-dense graphs that reflect real-world SGS settings. In the majority of 342 test game instances, EASGS outperforms state-of-the-art methods, including a reinforcement learning method, in terms of time scalability, nearly constant memory utilization, and quality of the returned defender's strategies (expected payoffs).

preprint2022arXiv

GrabQC: Graph based Query Contextualization for automated ICD coding

Automated medical coding is a process of codifying clinical notes to appropriate diagnosis and procedure codes automatically from the standard taxonomies such as ICD (International Classification of Diseases) and CPT (Current Procedure Terminology). The manual coding process involves the identification of entities from the clinical notes followed by querying a commercial or non-commercial medical codes Information Retrieval (IR) system that follows the Centre for Medicare and Medicaid Services (CMS) guidelines. We propose to automate this manual process by automatically constructing a query for the IR system using the entities auto-extracted from the clinical notes. We propose \textbf{GrabQC}, a \textbf{Gra}ph \textbf{b}ased \textbf{Q}uery \textbf{C}ontextualization method that automatically extracts queries from the clinical text, contextualizes the queries using a Graph Neural Network (GNN) model and obtains the ICD Codes using an external IR system. We also propose a method for labelling the dataset for training the model. We perform experiments on two datasets of clinical text in three different setups to assert the effectiveness of our approach. The experimental results show that our proposed method is better than the compared baselines in all three settings.

preprint2022arXiv

Multi-Variate Time Series Forecasting on Variable Subsets

We formulate a new inference task in the domain of multivariate time series forecasting (MTSF), called Variable Subset Forecast (VSF), where only a small subset of the variables is available during inference. Variables are absent during inference because of long-term data loss (eg. sensor failures) or high -> low-resource domain shift between train / test. To the best of our knowledge, robustness of MTSF models in presence of such failures, has not been studied in the literature. Through extensive evaluation, we first show that the performance of state of the art methods degrade significantly in the VSF setting. We propose a non-parametric, wrapper technique that can be applied on top any existing forecast models. Through systematic experiments across 4 datasets and 5 forecast models, we show that our technique is able to recover close to 95\% performance of the models even when only 15\% of the original variables are present.

preprint2022arXiv

TAG: Task-based Accumulated Gradients for Lifelong learning

When an agent encounters a continual stream of new tasks in the lifelong learning setting, it leverages the knowledge it gained from the earlier tasks to help learn the new tasks better. In such a scenario, identifying an efficient knowledge representation becomes a challenging problem. Most research works propose to either store a subset of examples from the past tasks in a replay buffer, dedicate a separate set of parameters to each task or penalize excessive updates over parameters by introducing a regularization term. While existing methods employ the general task-agnostic stochastic gradient descent update rule, we propose a task-aware optimizer that adapts the learning rate based on the relatedness among tasks. We utilize the directions taken by the parameters during the updates by accumulating the gradients specific to each task. These task-based accumulated gradients act as a knowledge base that is maintained and updated throughout the stream. We empirically show that our proposed adaptive learning rate not only accounts for catastrophic forgetting but also allows positive backward transfer. We also show that our method performs better than several state-of-the-art methods in lifelong learning on complex datasets with a large number of tasks.

preprint2021arXiv

Hyperedge Prediction using Tensor Eigenvalue Decomposition

Link prediction in graphs is studied by modeling the dyadic interactions among two nodes. The relationships can be more complex than simple dyadic interactions and could require the user to model super-dyadic associations among nodes. Such interactions can be modeled using a hypergraph, which is a generalization of a graph where a hyperedge can connect more than two nodes. In this work, we consider the problem of hyperedge prediction in a $k-$uniform hypergraph. We utilize the tensor-based representation of hypergraphs and propose a novel interpretation of the tensor eigenvectors. This is further used to propose a hyperedge prediction algorithm. The proposed algorithm utilizes the \textit{Fiedler} eigenvector computed using tensor eigenvalue decomposition of hypergraph Laplacian. The \textit{Fiedler} eigenvector is used to evaluate the construction cost of new hyperedges, which is further utilized to determine the most probable hyperedges to be constructed. The functioning and efficacy of the proposed method are illustrated using some example hypergraphs and a few real datasets. The code for the proposed method is available on https://github.com/d-maurya/hypred_ tensorEVD

preprint2021arXiv

Neural Fitted Q Iteration based Optimal Bidding Strategy in Real Time Reactive Power Market_1

In real time electricity markets, the objective of generation companies while bidding is to maximize their profit. The strategies for learning optimal bidding have been formulated through game theoretical approaches and stochastic optimization problems. Similar studies in reactive power markets have not been reported so far because the network voltage operating conditions have an increased impact on reactive power markets than on active power markets. Contrary to active power markets, the bids of rivals are not directly related to fuel costs in reactive power markets. Hence, the assumption of a suitable probability distribution function is unrealistic, making the strategies adopted in active power markets unsuitable for learning optimal bids in reactive power market mechanisms. Therefore, a bidding strategy is to be learnt from market observations and experience in imperfect oligopolistic competition-based markets. In this paper, a pioneer work on learning optimal bidding strategies from observation and experience in a three-stage reactive power market is reported.

preprint2021arXiv

qRRT: Quality-Biased Incremental RRT for Optimal Motion Planning in Non-Holonomic Systems

This paper presents a sampling-based method for optimal motion planning in non-holonomic systems in the absence of known cost functions. It uses the principle of learning through experience to deduce the cost-to-go of regions within the workspace. This cost information is used to bias an incremental graph-based search algorithm that produces solution trajectories. Iterative improvement of cost information and search biasing produces solutions that are proven to be asymptotically optimal. The proposed framework builds on incremental Rapidly-exploring Random Trees (RRT) for random sampling-based search and Reinforcement Learning (RL) to learn workspace costs. A series of experiments were performed to evaluate and demonstrate the performance of the proposed method.

preprint2020arXiv

Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain

Transferring knowledge from prior source tasks in solving a new target task can be useful in several learning applications. The application of transfer poses two serious challenges which have not been adequately addressed. First, the agent should be able to avoid negative transfer, which happens when the transfer hampers or slows down the learning instead of helping it. Second, the agent should be able to selectively transfer, which is the ability to select and transfer from different and multiple source tasks for different parts of the state space of the target task. We propose A2T (Attend, Adapt and Transfer), an attentive deep architecture which adapts and transfers from these source tasks. Our model is generic enough to effect transfer of either policies or value functions. Empirical evaluations on different learning algorithms show that A2T is an effective architecture for transfer by being able to avoid negative transfer while transferring selectively from multiple source tasks in the same domain.

preprint2020arXiv

Dynamic Frame skip Deep Q Network

Deep Reinforcement Learning methods have achieved state of the art performance in learning control policies for the games in the Atari 2600 domain. One of the important parameters in the Arcade Learning Environment (ALE) is the frame skip rate. It decides the granularity at which agents can control game play. A frame skip value of $k$ allows the agent to repeat a selected action $k$ number of times. The current state of the art architectures like Deep Q-Network (DQN) and Dueling Network Architectures (DuDQN) consist of a framework with a static frame skip rate, where the action output from the network is repeated for a fixed number of frames regardless of the current state. In this paper, we propose a new architecture, Dynamic Frame skip Deep Q-Network (DFDQN) which makes the frame skip rate a dynamic learnable parameter. This allows us to choose the number of times an action is to be repeated based on the current state. We show empirically that such a setting improves the performance on relatively harder games like Seaquest.

preprint2020arXiv

EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness against Adversarial Attacks

Ensuring robustness of Deep Neural Networks (DNNs) is crucial to their adoption in safety-critical applications such as self-driving cars, drones, and healthcare. Notably, DNNs are vulnerable to adversarial attacks in which small input perturbations can produce catastrophic misclassifications. In this work, we propose EMPIR, ensembles of quantized DNN models with different numerical precisions, as a new approach to increase robustness against adversarial attacks. EMPIR is based on the observation that quantized neural networks often demonstrate much higher robustness to adversarial attacks than full precision networks, but at the cost of a substantial loss in accuracy on the original (unperturbed) inputs. EMPIR overcomes this limitation to achieve the 'best of both worlds', i.e., the higher unperturbed accuracies of the full precision models combined with the higher robustness of the low precision models, by composing them in an ensemble. Further, as low precision DNN models have significantly lower computational and storage requirements than full precision models, EMPIR models only incur modest compute and memory overheads compared to a single full-precision model (<25% in our evaluations). We evaluate EMPIR across a suite of DNNs for 3 different image recognition tasks (MNIST, CIFAR-10 and ImageNet) and under 4 different adversarial attacks. Our results indicate that EMPIR boosts the average adversarial accuracies by 42.6%, 15.2% and 10.5% for the DNN models trained on the MNIST, CIFAR-10 and ImageNet datasets respectively, when compared to single full-precision models, without sacrificing accuracy on the unperturbed inputs.

preprint2020arXiv

ExTra: Transfer-guided Exploration

In this work we present a novel approach for transfer-guided exploration in reinforcement learning that is inspired by the human tendency to leverage experiences from similar encounters in the past while navigating a new task. Given an optimal policy in a related task-environment, we show that its bisimulation distance from the current task-environment gives a lower bound on the optimal advantage of state-action pairs in the current task-environment. Transfer-guided Exploration (ExTra) samples actions from a Softmax distribution over these lower bounds. In this way, actions with potentially higher optimum advantage are sampled more frequently. In our experiments on gridworld environments, we demonstrate that given access to an optimal policy in a related task-environment, ExTra can outperform popular domain-specific exploration strategies viz. epsilon greedy, Model-Based Interval Estimation - Exploration Bonus (MBIE-EB), Pursuit and Boltzmann in rate of convergence. We further show that ExTra is robust to choices of source task and shows a graceful degradation of performance as the dissimilarity of the source task increases. We also demonstrate that ExTra, when used alongside traditional exploration algorithms, improves their rate of convergence. Thus it is capable of complementing the efficacy of traditional exploration algorithms.

preprint2020arXiv

HPRA: Hyperedge Prediction using Resource Allocation

Many real-world systems involve higher-order interactions and thus demand complex models such as hypergraphs. For instance, a research article could have multiple collaborating authors, and therefore the co-authorship network is best represented as a hypergraph. In this work, we focus on the problem of hyperedge prediction. This problem has immense applications in multiple domains, such as predicting new collaborations in social networks, discovering new chemical reactions in metabolic networks, etc. Despite having significant importance, the problem of hyperedge prediction hasn't received adequate attention, mainly because of its inherent complexity. In a graph with $n$ nodes the number of potential edges is $\mathcal{O}(n^{2})$, whereas in a hypergraph, the number of potential hyperedges is $\mathcal{O}(2^{n})$. To avoid searching through such a huge space, current methods restrain the original problem in the following two ways. One class of algorithms assume the hypergraphs to be $k$-uniform. However, many real-world systems are not confined only to have interactions involving $k$ components. Thus, these algorithms are not suitable for many real-world applications. The second class of algorithms requires a candidate set of hyperedges from which the potential hyperedges are chosen. In the absence of domain knowledge, the candidate set can have $\mathcal{O}(2^{n})$ possible hyperedges, which makes this problem intractable. We propose HPRA - Hyperedge Prediction using Resource Allocation, the first of its kind algorithm, which overcomes these issues and predicts hyperedges of any cardinality without using any candidate hyperedge set. HPRA is a similarity-based method working on the principles of the resource allocation process. In addition to recovering missing hyperedges, we demonstrate that HPRA can predict future hyperedges in a wide range of hypergraphs.

preprint2020arXiv

Influence maximization in unknown social networks: Learning Policies for Effective Graph Sampling

A serious challenge when finding influential actors in real-world social networks is the lack of knowledge about the structure of the underlying network. Current state-of-the-art methods rely on hand-crafted sampling algorithms; these methods sample nodes and their neighbours in a carefully constructed order and choose opinion leaders from this discovered network to maximize influence spread in the (unknown) complete network. In this work, we propose a reinforcement learning framework for network discovery that automatically learns useful node and graph representations that encode important structural properties of the network. At training time, the method identifies portions of the network such that the nodes selected from this sampled subgraph can effectively influence nodes in the complete network. The realization of such transferable network structure based adaptable policies is attributed to the meticulous design of the framework that encodes relevant node and graph signatures driven by an appropriate reward scheme. We experiment with real-world social networks from four different domains and show that the policies learned by our RL agent provide a 10-36% improvement over the current state-of-the-art method.

preprint2020arXiv

Learning to Prevent Monocular SLAM Failure using Reinforcement Learning

Monocular SLAM refers to using a single camera to estimate robot ego motion while building a map of the environment. While Monocular SLAM is a well studied problem, automating Monocular SLAM by integrating it with trajectory planning frameworks is particularly challenging. This paper presents a novel formulation based on Reinforcement Learning (RL) that generates fail safe trajectories wherein the SLAM generated outputs do not deviate largely from their true values. Quintessentially, the RL framework successfully learns the otherwise complex relation between perceptual inputs and motor actions and uses this knowledge to generate trajectories that do not cause failure of SLAM. We show systematically in simulations how the quality of the SLAM dramatically improves when trajectories are computed using RL. Our method scales effectively across Monocular SLAM frameworks in both simulation and in real world experiments with a mobile robot.

preprint2020arXiv

Learning to Prevent Monocular SLAM Failure using Reinforcement Learning

preprint2020arXiv

Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning

Reinforcement Learning algorithms can learn complex behavioral patterns for sequential decision making tasks wherein an agent interacts with an environment and acquires feedback in the form of rewards sampled from it. Traditionally, such algorithms make decisions, i.e., select actions to execute, at every single time step of the agent-environment interactions. In this paper, we propose a novel framework, Fine Grained Action Repetition (FiGAR), which enables the agent to decide the action as well as the time scale of repeating it. FiGAR can be used for improving any Deep Reinforcement Learning algorithm which maintains an explicit policy estimate by enabling temporal abstractions in the action space. We empirically demonstrate the efficacy of our framework by showing performance improvements on top of three policy search algorithms in different domains: Asynchronous Advantage Actor Critic in the Atari 2600 domain, Trust Region Policy Optimization in Mujoco domain and Deep Deterministic Policy Gradients in the TORCS car racing domain.

preprint2020arXiv

Missed calls, Automated Calls and Health Support: Using AI to improve maternal health outcomes by increasing program engagement

India accounts for 11% of maternal deaths globally where a woman dies in childbirth every fifteen minutes. Lack of access to preventive care information is a significant problem contributing to high maternal morbidity and mortality numbers, especially in low-income households. We work with ARMMAN, a non-profit based in India, to further the use of call-based information programs by early-on identifying women who might not engage on these programs that are proven to affect health parameters positively.We analyzed anonymized call-records of over 300,000 women registered in an awareness program created by ARMMAN that uses cellphone calls to regularly disseminate health related information. We built robust deep learning based models to predict short term and long term dropout risk from call logs and beneficiaries' demographic information. Our model performs 13% better than competitive baselines for short-term forecasting and 7% better for long term forecasting. We also discuss the applicability of this method in the real world through a pilot validation that uses our method to perform targeted interventions.

preprint2020arXiv

On Incorporating Structural Information to improve Dialogue Response Generation

We consider the task of generating dialogue responses from background knowledge comprising of domain specific resources. Specifically, given a conversation around a movie, the task is to generate the next response based on background knowledge about the movie such as the plot, review, Reddit comments etc. This requires capturing structural, sequential and semantic information from the conversation context and the background resources. This is a new task and has not received much attention from the community. We propose a new architecture that uses the ability of BERT to capture deep contextualized representations in conjunction with explicit structure and sequence information. More specifically, we use (i) Graph Convolutional Networks (GCNs) to capture structural information, (ii) LSTMs to capture sequential information and (iii) BERT for the deep contextualized representations that capture semantic information. We analyze the proposed architecture extensively. To this end, we propose a plug-and-play Semantics-Sequences-Structures (SSS) framework which allows us to effectively combine such linguistic information. Through a series of experiments we make some interesting observations. First, we observe that the popular adaptation of the GCN model for NLP tasks where structural information (GCNs) was added on top of sequential information (LSTMs) performs poorly on our task. This leads us to explore interesting ways of combining semantic and structural information to improve the performance. Second, we observe that while BERT already outperforms other deep contextualized representations such as ELMo, it still benefits from the additional structural information explicitly added using GCNs. This is a bit surprising given the recent claims that BERT already captures structural information. Lastly, the proposed SSS framework gives an improvement of 7.95% over the baseline.

preprint2020arXiv

Option Discovery in Hierarchical Reinforcement Learning using Spatio-Temporal Clustering

This paper introduces an automated skill acquisition framework in reinforcement learning which involves identifying a hierarchical description of the given task in terms of abstract states and extended actions between abstract states. Identifying such structures present in the task provides ways to simplify and speed up reinforcement learning algorithms. These structures also help to generalize such algorithms over multiple tasks without relearning policies from scratch. We use ideas from dynamical systems to find metastable regions in the state space and associate them with abstract states. The spectral clustering algorithm PCCA+ is used to identify suitable abstractions aligned to the underlying structure. Skills are defined in terms of the sequence of actions that lead to transitions between such abstract states. The connectivity information from PCCA+ is used to generate these skills or options. These skills are independent of the learning task and can be efficiently reused across a variety of tasks defined over the same model. This approach works well even without the exact model of the environment by using sample trajectories to construct an approximate estimate. We also present our approach to scaling the skill acquisition framework to complex tasks with large state spaces for which we perform state aggregation using the representation learned from an action conditional video prediction network and use the skill acquisition framework on the aggregated state space.

preprint2020arXiv

Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning

Option discovery and skill acquisition frameworks are integral to the functioning of a Hierarchically organized Reinforcement learning agent. However, such techniques often yield a large number of options or skills, which can potentially be represented succinctly by filtering out any redundant information. Such a reduction can reduce the required computation while also improving the performance on a target task. In order to compress an array of option policies, we attempt to find a policy basis that accurately captures the set of all options. In this work, we propose Option Encoder, an auto-encoder based framework with intelligently constrained weights, that helps discover a collection of basis policies. The policy basis can be used as a proxy for the original set of skills in a suitable hierarchically organized framework. We demonstrate the efficacy of our method on a collection of grid-worlds and on the high-dimensional Fetch-Reach robotic manipulation task by evaluating the obtained policy basis on a set of downstream tasks.

preprint2020arXiv

Reinforcement Learning for Improving Object Detection

The performance of a trained object detection neural network depends a lot on the image quality. Generally, images are pre-processed before feeding them into the neural network and domain knowledge about the image dataset is used to choose the pre-processing techniques. In this paper, we introduce an algorithm called ObjectRL to choose the amount of a particular pre-processing to be applied to improve the object detection performances of pre-trained networks. The main motivation for ObjectRL is that an image which looks good to a human eye may not necessarily be the optimal one for a pre-trained object detector to detect objects.

preprint2020arXiv

Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

This paper describes the application of reinforcement learning (RL) to multi-product inventory management in supply chains. The problem description and solution are both adapted from a real-world business solution. The novelty of this problem with respect to supply chain literature is (i) we consider concurrent inventory management of a large number (50 to 1000) of products with shared capacity, (ii) we consider a multi-node supply chain consisting of a warehouse which supplies three stores, (iii) the warehouse, stores, and transportation from warehouse to stores have finite capacities, (iv) warehouse and store replenishment happen at different time scales and with realistic time lags, and (v) demand for products at the stores is stochastic. We describe a novel formulation in a multi-agent (hierarchical) reinforcement learning framework that can be used for parallelised decision-making, and use the advantage actor critic (A2C) algorithm with quantised action spaces to solve the problem. Experiments show that the proposed approach is able to handle a multi-objective reward comprised of maximising product sales and minimising wastage of perishable products.

preprint2020arXiv

Towards Accurate Vehicle Behaviour Classification With Multi-Relational Graph Convolutional Networks

Understanding on-road vehicle behaviour from a temporal sequence of sensor data is gaining in popularity. In this paper, we propose a pipeline for understanding vehicle behaviour from a monocular image sequence or video. A monocular sequence along with scene semantics, optical flow and object labels are used to get spatial information about the object (vehicle) of interest and other objects (semantically contiguous set of locations) in the scene. This spatial information is encoded by a Multi-Relational Graph Convolutional Network (MR-GCN), and a temporal sequence of such encodings is fed to a recurrent network to label vehicle behaviours. The proposed framework can classify a variety of vehicle behaviours to high fidelity on datasets that are diverse and include European, Chinese and Indian on-road scenes. The framework also provides for seamless transfer of models across datasets without entailing re-annotation, retraining and even fine-tuning. We show comparative performance gain over baseline Spatio-temporal classifiers and detail a variety of ablations to showcase the efficacy of the framework.

preprint2020arXiv

Towards Transparent and Explainable Attention Models

Recent studies on interpretability of attention distributions have led to notions of faithful and plausible explanations for a model's predictions. Attention distributions can be considered a faithful explanation if a higher attention weight implies a greater impact on the model's prediction. They can be considered a plausible explanation if they provide a human-understandable justification for the model's predictions. In this work, we first explain why current attention mechanisms in LSTM based encoders can neither provide a faithful nor a plausible explanation of the model's predictions. We observe that in LSTM based encoders the hidden representations at different time-steps are very similar to each other (high conicity) and attention weights in these situations do not carry much meaning because even a random permutation of the attention weights does not affect the model's predictions. Based on experiments on a wide variety of tasks and datasets, we observe attention distributions often attribute the model's predictions to unimportant words such as punctuation and fail to offer a plausible explanation for the predictions. To make attention mechanisms more faithful and plausible, we propose a modified LSTM cell with a diversity-driven training objective that ensures that the hidden representations learned at different time steps are diverse. We show that the resulting attention distributions offer more transparency as they (i) provide a more precise importance ranking of the hidden states (ii) are better indicative of words important for the model's predictions (iii) correlate better with gradient-based attribution methods. Human evaluations indicate that the attention distributions learned by our model offer a plausible explanation of the model's predictions. Our code has been made publicly available at https://github.com/akashkm99/Interpretable-Attention

preprint2020arXiv

Understanding Dynamic Scenes using Graph Convolution Networks

We present a novel Multi-Relational Graph Convolutional Network (MRGCN) based framework to model on-road vehicle behaviors from a sequence of temporally ordered frames as grabbed by a moving monocular camera. The input to MRGCN is a multi-relational graph where the graph's nodes represent the active and passive agents/objects in the scene, and the bidirectional edges that connect every pair of nodes are encodings of their Spatio-temporal relations. We show that this proposed explicit encoding and usage of an intermediate spatio-temporal interaction graph to be well suited for our tasks over learning end-end directly on a set of temporally ordered spatial relations. We also propose an attention mechanism for MRGCNs that conditioned on the scene dynamically scores the importance of information from different interaction types. The proposed framework achieves significant performance gain over prior methods on vehicle-behavior classification tasks on four datasets. We also show a seamless transfer of learning to multiple datasets without resorting to fine-tuning. Such behavior prediction methods find immediate relevance in a variety of navigation tasks such as behavior planning, state estimation, and applications relating to the detection of traffic violations over videos.

preprint2019arXiv

Interpretability with Accurate Small Models

Models often need to be constrained to a certain size for them to be considered interpretable. For example, a decision tree of depth 5 is much easier to understand than one of depth 50. Limiting model size, however, often reduces accuracy. We suggest a practical technique that minimizes this trade-off between interpretability and classification accuracy. This enables an arbitrary learning algorithm to produce highly accurate small-sized models. Our technique identifies the training data distribution to learn from that leads to the highest accuracy for a model of a given size. We represent the training distribution as a combination of sampling schemes. Each scheme is defined by a parameterized probability mass function applied to the segmentation produced by a decision tree. An Infinite Mixture Model with Beta components is used to represent a combination of such schemes. The mixture model parameters are learned using Bayesian Optimization. Under simplistic assumptions, we would need to optimize for $O(d)$ variables for a distribution over a $d$-dimensional input space, which is cumbersome for most real-world data. However, we show that our technique significantly reduces this number to a \emph{fixed set of eight variables} at the cost of relatively cheap preprocessing. The proposed technique is flexible: it is \emph{model-agnostic}, i.e., it may be applied to the learning algorithm for any model family, and it admits a general notion of model size. We demonstrate its effectiveness using multiple real-world datasets to construct decision trees, linear probability models and gradient boosted models with different sizes. We observe significant improvements in the F1-score in most instances, exceeding an improvement of $100\%$ in some cases.

preprint2016arXiv

Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning

Recently there has been a lot of interest in learning common representations for multiple views of data. Typically, such common representations are learned using a parallel corpus between the two views (say, 1M images and their English captions). In this work, we address a real-world scenario where no direct parallel data is available between two views of interest (say, $V_1$ and $V_2$) but parallel data is available between each of these views and a pivot view ($V_3$). We propose a model for learning a common representation for $V_1$, $V_2$ and $V_3$ using only the parallel data available between $V_1V_3$ and $V_2V_3$. The proposed model is generic and even works when there are $n$ views of interest and only one pivot view which acts as a bridge between them. There are two specific downstream applications that we focus on (i) transfer learning between languages $L_1$,$L_2$,...,$L_n$ using a pivot language $L$ and (ii) cross modal access between images and a language $L_1$ using a pivot language $L_2$. Our model achieves state-of-the-art performance in multilingual document classification on the publicly available multilingual TED corpus and promising results in multilingual multimodal retrieval on a new dataset created and released as a part of this work.

preprint2016arXiv

Exploration for Multi-task Reinforcement Learning with Deep Generative Models

Exploration in multi-task reinforcement learning is critical in training agents to deduce the underlying MDP. Many of the existing exploration frameworks such as $E^3$, $R_{max}$, Thompson sampling assume a single stationary MDP and are not suitable for system identification in the multi-task setting. We present a novel method to facilitate exploration in multi-task reinforcement learning using deep generative models. We supplement our method with a low dimensional energy model to learn the underlying MDP distribution and provide a resilient and adaptive exploration signal to the agent. We evaluate our method on a new set of environments and provide intuitive interpretation of our results.

preprint2016arXiv

HEMI: Hyperedge Majority Influence Maximization

In this work, we consider the problem of influence maximization on a hypergraph. We first extend the Independent Cascade (IC) model to hypergraphs, and prove that the traditional influence maximization problem remains submodular. We then present a variant of the influence maximization problem (HEMI) where one seeks to maximize the number of hyperedges, a majority of whose nodes are influenced. We prove that HEMI is non-submodular under the diffusion model proposed.

preprint2016arXiv

Linear Bandit algorithms using the Bootstrap

This study presents two new algorithms for solving linear stochastic bandit problems. The proposed methods use an approach from non-parametric statistics called bootstrapping to create confidence bounds. This is achieved without making any assumptions about the distribution of noise in the underlying system. We present the X-Random and X-Fixed bootstrap bandits which correspond to the two well-known approaches for conducting bootstraps on models, in the literature. The proposed methods are compared to other popular solutions for linear stochastic bandit problems, namely, OFUL, LinUCB and Thompson Sampling. The comparisons are carried out using a simulation study on a hierarchical probability meta-model, built from published data of experiments, which are run on real systems. The model representing the response surfaces is conceptualized as a Bayesian Network which is presented with varying degrees of noise for the simulations. One of the proposed methods, X-Random bootstrap, performs better than the baselines in-terms of cumulative regret across various degrees of noise and different number of trials. In certain settings the cumulative regret of this method is less than half of the best baseline. The X-Fixed bootstrap performs comparably in most situations and particularly well when the number of trials is low. The study concludes that these algorithms could be a preferred alternative for solving linear bandit problems, especially when the distribution of the noise in the system is unknown.

preprint2015arXiv

A Reinforcement Learning Approach to Online Learning of Decision Trees

Online decision tree learning algorithms typically examine all features of a new data point to update model parameters. We propose a novel alternative, Reinforcement Learning- based Decision Trees (RLDT), that uses Reinforcement Learning (RL) to actively examine a minimal number of features of a data point to classify it with high accuracy. Furthermore, RLDT optimizes a long term return, providing a better alternative to the traditional myopic greedy approach to growing decision trees. We demonstrate that this approach performs as well as batch learning algorithms and other online decision tree learning algorithms, while making significantly fewer queries about the features of the data points. We also show that RLDT can effectively handle concept drift.

preprint2015arXiv

Correlational Neural Networks

Common Representation Learning (CRL), wherein different descriptions (or views) of the data are embedded in a common subspace, is receiving a lot of attention recently. Two popular paradigms here are Canonical Correlation Analysis (CCA) based approaches and Autoencoder (AE) based approaches. CCA based approaches learn a joint representation by maximizing correlation of the views when projected to the common subspace. AE based methods learn a common representation by minimizing the error of reconstructing the two views. Each of these approaches has its own advantages and disadvantages. For example, while CCA based approaches outperform AE based approaches for the task of transfer learning, they are not as scalable as the latter. In this work we propose an AE based approach called Correlational Neural Network (CorrNet), that explicitly maximizes correlation among the views when projected to the common subspace. Through a series of experiments, we demonstrate that the proposed CorrNet is better than the above mentioned approaches with respect to its ability to learn correlated common representations. Further, we employ CorrNet for several cross language tasks and show that the representations learned using CorrNet perform better than the ones learned using other state of the art approaches.

preprint2015arXiv

Scalable Positional Analysis for Studying Evolution of Nodes in Networks

In social network analysis, the fundamental idea behind the notion of position is to discover actors who have similar structural signatures. Positional analysis of social networks involves partitioning the actors into disjoint sets using a notion of equivalence which captures the structure of relationships among actors. Classical approaches to Positional Analysis, such as Regular equivalence and Equitable Partitions, are too strict in grouping actors and often lead to trivial partitioning of actors in real world networks. An Epsilon Equitable Partition (EEP) of a graph, which is similar in spirit to Stochastic Blockmodels, is a useful relaxation to the notion of structural equivalence which results in meaningful partitioning of actors. In this paper we propose and implement a new scalable distributed algorithm based on MapReduce methodology to find EEP of a graph. Empirical studies on random power-law graphs show that our algorithm is highly scalable for sparse graphs, thereby giving us the ability to study positional analysis on very large scale networks. We also present the results of our algorithm on time evolving snapshots of the facebook and flickr social graphs. Results show the importance of positional analysis on large dynamic networks.

preprint2015arXiv

TSEB: More Efficient Thompson Sampling for Policy Learning

In model-based solution approaches to the problem of learning in an unknown environment, exploring to learn the model parameters takes a toll on the regret. The optimal performance with respect to regret or PAC bounds is achievable, if the algorithm exploits with respect to reward or explores with respect to the model parameters, respectively. In this paper, we propose TSEB, a Thompson Sampling based algorithm with adaptive exploration bonus that aims to solve the problem with tighter PAC guarantees, while being cautious on the regret as well. The proposed approach maintains distributions over the model parameters which are successively refined with more experience. At any given time, the agent solves a model sampled from this distribution, and the sampled reward distribution is skewed by an exploration bonus in order to generate more informative exploration. The policy by solving is then used for generating more experience that helps in updating the posterior over the model parameters. We provide a detailed analysis of the PAC guarantees, and convergence of the proposed approach. We show that our adaptive exploration bonus encourages the additional exploration required for better PAC bounds on the algorithm. We provide empirical analysis on two different simulated domains.

preprint2014arXiv

An Autoencoder Approach to Learning Bilingual Word Representations

Cross-language learning allows us to use training data from one language to build models for a different language. Many approaches to bilingual learning require that we have word-level alignment of sentences from parallel corpora. In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are aligned between two languages, while not relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can in fact learn high-quality representations and do without word alignments. Since training autoencoders on word observations presents certain computational issues, we propose and compare different variations adapted to this setting. We also propose an explicit correlation maximizing regularizer that leads to significant improvement in the performance. We empirically investigate the success of our approach on the problem of cross-language test classification, where a classifier trained on a given language (e.g., English) must learn to generalize to a different language (e.g., German). These experiments demonstrate that our approaches are competitive with the state-of-the-art, achieving up to 10-14 percentage point improvements over the best reported results on this task.

preprint2014arXiv

Efficient Computation of the Shapley Value for Game-Theoretic Network Centrality

The Shapley value---probably the most important normative payoff division scheme in coalitional games---has recently been advocated as a useful measure of centrality in networks. However, although this approach has a variety of real-world applications (including social and organisational networks, biological networks and communication networks), its computational properties have not been widely studied. To date, the only practicable approach to compute Shapley value-based centrality has been via Monte Carlo simulations which are computationally expensive and not guaranteed to give an exact answer. Against this background, this paper presents the first study of the computational aspects of the Shapley value for network centralities. Specifically, we develop exact analytical formulae for Shapley value-based centrality in both weighted and unweighted networks and develop efficient (polynomial time) and exact algorithms based on them. We empirically evaluate these algorithms on two real-life examples (an infrastructure network representing the topology of the Western States Power Grid and a collaboration network from the field of astrophysics) and demonstrate that they deliver significant speedups over the Monte Carlo approach. For instance, in the case of unweighted networks our algorithms are able to return the exact solution about 1600 times faster than the Monte Carlo approximation, even if we allow for a generous 10% error margin for the latter method.

preprint2012arXiv

Fractional Moments on Bandit Problems

Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of problems in stateless environments that represent this explore/exploit situation. We propose a learning algorithm for bandit problems based on fractional expectation of rewards acquired. The algorithm is theoretically shown to converge on an eta-optimal arm and achieve O(n) sample complexity. Experimental results show the algorithm incurs substantially lower regrets than parameter-optimized eta-greedy and SoftMax approaches and other low sample complexity state-of-the-art techniques.

Balaraman Ravindran

What is connected

Connect this record

See the researcher in context

Building this map preview

37 published item(s)

Evolutionary Approach to Security Games with Signaling

GrabQC: Graph based Query Contextualization for automated ICD coding

Multi-Variate Time Series Forecasting on Variable Subsets

TAG: Task-based Accumulated Gradients for Lifelong learning

Hyperedge Prediction using Tensor Eigenvalue Decomposition

Neural Fitted Q Iteration based Optimal Bidding Strategy in Real Time Reactive Power Market_1

qRRT: Quality-Biased Incremental RRT for Optimal Motion Planning in Non-Holonomic Systems

Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain

Dynamic Frame skip Deep Q Network

EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness against Adversarial Attacks

ExTra: Transfer-guided Exploration

HPRA: Hyperedge Prediction using Resource Allocation

Influence maximization in unknown social networks: Learning Policies for Effective Graph Sampling

Learning to Prevent Monocular SLAM Failure using Reinforcement Learning

Learning to Prevent Monocular SLAM Failure using Reinforcement Learning

Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning

Missed calls, Automated Calls and Health Support: Using AI to improve maternal health outcomes by increasing program engagement

On Incorporating Structural Information to improve Dialogue Response Generation

Option Discovery in Hierarchical Reinforcement Learning using Spatio-Temporal Clustering

Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning

Reinforcement Learning for Improving Object Detection

Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

Towards Accurate Vehicle Behaviour Classification With Multi-Relational Graph Convolutional Networks

Towards Transparent and Explainable Attention Models

Understanding Dynamic Scenes using Graph Convolution Networks

Interpretability with Accurate Small Models

Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning

Exploration for Multi-task Reinforcement Learning with Deep Generative Models

HEMI: Hyperedge Majority Influence Maximization

Linear Bandit algorithms using the Bootstrap

A Reinforcement Learning Approach to Online Learning of Decision Trees

Correlational Neural Networks

Scalable Positional Analysis for Studying Evolution of Nodes in Networks

TSEB: More Efficient Thompson Sampling for Policy Learning

An Autoencoder Approach to Learning Bilingual Word Representations

Efficient Computation of the Shapley Value for Game-Theoretic Network Centrality

Fractional Moments on Bandit Problems