Source author record

Mihaela van der Schaar

Mihaela van der Schaar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Catalog footprint

What is connected

138 works

24 topics

4 close collaborators

preprint, 2026, arXiv

CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators

Despite surpassing human performance across mathematics, coding, and other knowledge-intensive tasks, large language models (LLMs) continue to struggle with causal reasoning. A core obstacle is the target data itself: causal systems are complex and often expressed in non-executable forms, while ground-truth answers to causal queries are inherently scarce. We introduce CauSim, a framework that turns causal reasoning from a scarce-label problem into a scalable supervised one. CauSim constructs increasingly complex causal simulators: executable structural causal models (SCMs), incrementally built by LLMs, that scale to globally complex systems while maintaining verifiable answers to causal queries. CauSim operates across representations by formalizing non-executable causal knowledge into code, enabling data augmentation, and translating executable SCMs into natural language, enabling supervision in previously difficult-to-supervise representations. We structure our research into two parts: (1) how to construct increasingly complex causal simulators, and (2) a systematic study of what CauSim enables, demonstrating generalization across representations, consistent gains from curriculum scaling and data volume, LLM self-improvement through self-generated simulators, and data augmentation via formalization of existing domain knowledge.
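As a rough illustration of what an executable SCM with verifiable answers can look like, the sketch below defines a three-variable toy model and answers an interventional query by simulation. The variables, coefficients, and the helper names `simulate` and `average_effect` are hypothetical and are not taken from CauSim.

```python
import random

def simulate(n, do=None, seed=0):
    """Sample n units from a toy SCM; `do` maps variable names to forced values."""
    do = do or {}
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Exogenous noise terms, always drawn so the stream is identical across interventions.
        u_z, u_x, u_y = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        # Structural equations, evaluated in topological order Z -> X -> Y.
        z = do.get("Z", u_z)
        x = do.get("X", 0.8 * z + u_x)
        y = do.get("Y", 2.0 * x - 1.0 * z + u_y)
        rows.append({"Z": z, "X": x, "Y": y})
    return rows

def mean(rows, var):
    return sum(r[var] for r in rows) / len(rows)

def average_effect(n=1000):
    """Estimate E[Y | do(X=1)] - E[Y | do(X=0)] using shared noise (same seed)."""
    y1 = mean(simulate(n, do={"X": 1.0}), "Y")
    y0 = mean(simulate(n, do={"X": 0.0}), "Y")
    return y1 - y0
```

Because the structural equation for Y is linear in X with coefficient 2.0, the simulated effect can be checked against the known ground truth, which is the property that makes such simulators usable as a source of supervised labels.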

preprint, 2026, arXiv

Discovery of Hidden Miscalibration Regimes

Calibration is commonly evaluated by comparing model confidence with its empirical correctness, implicitly treating reliability as a function of the confidence score alone. However, this view can hide substantial structure: models may be systematically overconfident on some kinds of inputs and underconfident on others, causing global reliability diagnostics to obscure localised calibration failures. To address this, we formulate the problem of discovering hidden miscalibration regimes without assuming access to predefined data slices. We define the corresponding miscalibration field and propose a diagnostic framework for estimating it. Our approach learns a calibration-aware representation of the input space and estimates signed local miscalibration by kernel smoothing in the learned geometry. Across four real-world LLM benchmarks and twelve LLMs, we find that input-dependent calibration heterogeneity is prevalent. We further show that the discovered fields are actionable: they support local confidence correction and reduce calibration error in systematically miscalibrated regions where confidence-based methods such as isotonic regression and temperature scaling are less effective.
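The kernel-smoothing step described above can be sketched as a Nadaraya-Watson average of the signed gap between confidence and correctness. This is a minimal sketch under stated assumptions: the learned calibration-aware representation is replaced by precomputed coordinates, the Gaussian kernel and `bandwidth` are illustrative choices, and the function name is hypothetical.

```python
import math

def local_miscalibration(query, points, confidences, correct, bandwidth=1.0):
    """Kernel-smoothed signed miscalibration at `query`.

    Positive values indicate local overconfidence, negative values underconfidence.
    `points` are coordinates in some representation space (assumed given here).
    """
    num = 0.0
    den = 0.0
    for p, c, y in zip(points, confidences, correct):
        sq_dist = sum((a - b) ** 2 for a, b in zip(query, p))
        w = math.exp(-sq_dist / (2.0 * bandwidth ** 2))  # Gaussian kernel weight
        num += w * (c - y)  # signed gap: confidence minus correctness (0 or 1)
        den += w
    # If every weight underflows to zero there is no local evidence; report 0.
    return num / den if den > 0 else 0.0
```

For example, a cluster of confident but wrong predictions yields a positive value near that cluster, while a distant cluster of hesitant but correct predictions yields a negative one, even if the two cancel out in a global reliability diagram.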

preprint, 2026, arXiv

Skill Neologisms: Towards Skill-based Continual Learning

Modern LLMs show mastery over an ever-growing range of skills, as well as the ability to compose them flexibly. However, extending model capabilities to new skills in a scalable manner is an open problem: fine-tuning and parameter-efficient variants risk catastrophic forgetting, while context-based approaches have limited expressiveness and are constrained by the model's effective context. We explore skill neologisms--soft tokens integrated in the model's vocabulary and optimized to improve capabilities over a specific skill--as a way to selectively acquire new skills without weight updates. We first observe that pre-trained LLMs already exhibit tokens associated with procedural knowledge. We then show on a controlled synthetic task that skill neologisms can be learned to improve model capabilities on specific skills while being composable with out-of-distribution skills, and that independently trained skill neologisms can be composed zero-shot. Finally, we validate zero-shot composition of independently learned skill neologisms on the more realistic natural language setting of the Skill-Mix benchmark. These results suggest that skill neologisms may provide a scalable path towards skill-based continual learning.

preprint, 2026, arXiv

The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

State-of-the-art large language model (LLM) pipelines rely on bootstrapped reasoning loops: sampling diverse chains of thought and reinforcing the highest-scoring ones, mainly optimizing correctness. We analyze how this design choice is sensitive to the collapse of the model's distribution over reasoning paths, slashing semantic entropy and undermining creative problem-solving. To analyze this failure, we introduce Distributional Creative Reasoning (DCR), a unified variational objective that casts training as gradient flow through probability measures on solution traces. STaR, GRPO, and DPO, as well as entropy bonuses, and other methods, all constitute special cases of the same loss. The framework delivers three core results: (i) the diversity decay theorem, describing how correctness-based objectives lead to distinct modes of diversity decay for STaR, GRPO, and DPO; (ii) designs that ensure convergence to a stable and diverse policy, effectively preventing collapse; and (iii) simple, actionable recipes to achieve this in practice. DCR thus offers the first principled recipe for LLMs that remain both correct and creative.

preprint, 2026, arXiv

TimeTok: Granularity-Controllable Time-Series Generation via Hierarchical Tokenization

Time-series generative models often lack control over temporal granularity, forcing users to accept whatever granularity the model produces. To enable truly user-driven generation, we introduce TimeTok, a unified framework for Granularity-Controllable Time-Series Generation (GC-TSG), which generates time series at any target granularity from any coarser input (e.g., rough sketches) or from scratch. At the core of TimeTok is a hierarchical tokenization strategy that maps time series into an ordered sequence of tokens, from coarse to fine temporal granularity. Our autoregressive generation process operates across these granularity levels, producing token blocks that are decoded back into continuous time series. This design naturally enables GC-TSG - including standard generation - within a single framework, where controlling the number of token blocks provides explicit control over output detail. Experiments show that TimeTok excels at GC-TSG tasks while achieving state-of-the-art performance in standard generation. Furthermore, we showcase TimeTok's potential as a foundational tokenizer by training on multiple datasets with heterogeneous temporal granularities, verifying strong transferability that consistently outperforms models trained on individual datasets. To our knowledge, this is the first unified framework that covers the full generative spectrum for time series, offering a valuable foundation for models that benefit from diverse temporal granularities.

preprint, 2023, arXiv

Composite Feature Selection using Deep Ensembles

In many real world problems, features do not act alone but in combination with each other. For example, in genomics, diseases might not be caused by any single mutation but require the presence of multiple mutations. Prior work on feature selection either seeks to identify individual features or can only determine relevant groups from a predefined set. We investigate the problem of discovering groups of predictive features without predefined grouping. To do so, we define predictive groups in terms of linear and non-linear interactions between features. We introduce a novel deep learning architecture that uses an ensemble of feature selection models to find predictive groups, without requiring candidate groups to be provided. The selected groups are sparse and exhibit minimum overlap. Furthermore, we propose a new metric to measure similarity between discovered groups and the ground truth. We demonstrate the utility of our model on multiple synthetic tasks and semi-synthetic chemistry datasets, where the ground truth structure is known, as well as an image dataset and a real-world cancer dataset.

preprint, 2023, arXiv

Deep Generative Symbolic Regression

Symbolic regression (SR) aims to discover concise closed-form mathematical equations from data, a task fundamental to scientific discovery. However, the problem is highly challenging because closed-form equations lie in a complex combinatorial search space. Existing methods, ranging from heuristic search to reinforcement learning, fail to scale with the number of input variables. We make the observation that closed-form equations often have structural characteristics and invariances (e.g., the commutative law) that could be further exploited to build more effective symbolic regression solutions. Motivated by this observation, our key contribution is to leverage pre-trained deep generative models to capture the intrinsic regularities of equations, thereby providing a solid foundation for subsequent optimization steps. We show that our novel formalism unifies several prominent approaches of symbolic regression and offers a new perspective to justify and improve on the previous ad hoc designs, such as the usage of cross-entropy loss during pre-training. Specifically, we propose an instantiation of our framework, Deep Generative Symbolic Regression (DGSR). In our experiments, we show that DGSR achieves a higher recovery rate of true equations in the setting of a larger number of input variables, and it is more computationally efficient at inference time than state-of-the-art RL symbolic regression solutions.

preprint, 2023, arXiv

Synthcity: facilitating innovative use cases of synthetic data in different data modalities

Synthcity is an open-source software package for innovative use cases of synthetic data in ML fairness, privacy and augmentation across diverse tabular data modalities, including static data, regular and irregular time series, data with censoring, multi-source data, composite data, and more. Synthcity provides the practitioners with a single access point to cutting edge research and tools in synthetic data. It also offers the community a playground for rapid experimentation and prototyping, a one-stop-shop for SOTA benchmarks, and an opportunity for extending research impact. The library can be accessed on GitHub (https://github.com/vanderschaarlab/synthcity) and pip (https://pypi.org/project/synthcity/). We warmly invite the community to join the development effort by providing feedback, reporting bugs, and contributing code.

preprint, 2022, arXiv

Accounting for Unobserved Confounding in Domain Generalization

This paper investigates the problem of learning robust, generalizable prediction models from a combination of multiple datasets and qualitative assumptions about the underlying data-generating model. Part of the challenge of learning robust models lies in the influence of unobserved confounders that void many of the invariances and principles of minimum error presently used for this problem. Our approach is to define a different invariance property of causal solutions in the presence of unobserved confounders which, through a relaxation of this invariance, can be connected with an explicit distributionally robust optimization problem over a set of affine combinations of data distributions. Concretely, our objective takes the form of a standard loss, plus a regularization term that encourages partial equality of error derivatives with respect to model parameters. We demonstrate the empirical performance of our approach on healthcare data from different modalities, including image, speech and tabular data.

preprint, 2022, arXiv

Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability

Estimating personalized effects of treatments is a complex, yet pervasive problem. To tackle it, recent developments in the machine learning (ML) literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools: due to their flexibility, modularity and ability to learn constrained representations, neural networks in particular have become central to this literature. Unfortunately, the assets of such black boxes come at a cost: models typically involve countless nontrivial operations, making it difficult to understand what they have learned. Yet, understanding these models can be crucial -- in a medical context, for example, discovered knowledge on treatment effect heterogeneity could inform treatment prescription in clinical practice. In this work, we therefore use post-hoc feature importance methods to identify features that influence the model's predictions. This allows us to evaluate treatment effect estimators along a new and important dimension that has been overlooked in previous work: We construct a benchmarking environment to empirically investigate the ability of personalized treatment effect models to identify predictive covariates -- covariates that determine differential responses to treatment. Our benchmarking environment then enables us to provide new insight into the strengths and weaknesses of different types of treatment effects models as we modulate different challenges specific to treatment effect estimation -- e.g. the ratio of prognostic to predictive information, the possible nonlinearity of potential outcomes and the presence and type of confounding.

preprint, 2022, arXiv

Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects

Estimating heterogeneous treatment effects is an important problem across many domains. In order to accurately estimate such treatment effects, one typically relies on data from observational studies or randomized experiments. Currently, most existing works rely exclusively on observational data, which is often confounded and, hence, yields biased estimates. While observational data is confounded, randomized data is unconfounded, but its sample size is usually too small to learn heterogeneous treatment effects. In this paper, we propose to estimate heterogeneous treatment effects by combining large amounts of observational data and small amounts of randomized data via representation learning. In particular, we introduce a two-step framework: first, we use observational data to learn a shared structure (in the form of a representation); and then, we use randomized data to learn the data-specific structures. We analyze the finite sample properties of our framework and compare them to several natural baselines. As such, we derive conditions for when combining observational and randomized data is beneficial, and for when it is not. Based on this, we introduce a sample-efficient algorithm, called CorNet. We use extensive simulation studies to verify the theoretical properties of CorNet and multiple real-world datasets to demonstrate our method's superiority compared to existing methods.

preprint, 2022, arXiv

Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations

Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare by assisting decision-makers to answer "what-if" questions. Existing causal inference approaches typically consider regular, discrete-time intervals between observations and treatment decisions and hence are unable to naturally model irregularly sampled data, which is the common setting in practice. To handle arbitrary observation patterns, we interpret the data as samples from an underlying continuous-time process and propose to model its latent trajectory explicitly using the mathematics of controlled differential equations. This leads to a new approach, the Treatment Effect Neural Controlled Differential Equation (TE-CDE), that allows the potential outcomes to be evaluated at any time point. In addition, adversarial training is used to adjust for time-dependent confounding, which is critical in longitudinal settings and is an added challenge not encountered in conventional time-series. To assess solutions to this problem, we propose a controllable simulation environment based on a model of tumor growth for a range of scenarios with irregular sampling reflective of a variety of clinical scenarios. TE-CDE consistently outperforms existing approaches in all simulated scenarios with irregular sampling.

preprint, 2022, arXiv

DAPDAG: Domain Adaptation via Perturbed DAG Reconstruction

Leveraging labelled data from multiple domains to enable prediction in another domain without labels is a significant, yet challenging problem. To address this problem, we introduce the framework DAPDAG (Domain Adaptation via Perturbed DAG Reconstruction) and propose to learn an auto-encoder that undertakes inference on population statistics given features and reconstructs a directed acyclic graph (DAG) as an auxiliary task. The underlying DAG structure is assumed invariant among observed variables whose conditional distributions are allowed to vary across domains, driven by a latent environmental variable $E$. The encoder is designed to serve as an inference device on $E$ while the decoder reconstructs each observed variable conditioned on its graphical parents in the DAG and the inferred $E$. We train the encoder and decoder jointly in an end-to-end manner and conduct experiments on synthetic and real datasets with mixed variables. Empirical results demonstrate that reconstructing the DAG benefits the approximate inference. Furthermore, our approach can achieve competitive performance against other benchmarks in prediction tasks, with better adaptation ability, especially when the target domain is significantly different from the source domains.

preprint, 2022, arXiv

Data-SUITE: Data-centric identification of in-distribution incongruous examples

Systematic quantification of data quality is critical for consistent model performance. Prior works have focused on out-of-distribution data. Instead, we tackle an understudied yet equally important problem of characterizing incongruous regions of in-distribution (ID) data, which may arise from feature space heterogeneity. To this end, we propose a paradigm shift with Data-SUITE: a data-centric AI framework to identify these regions, independent of a task-specific model. Data-SUITE leverages copula modeling, representation learning, and conformal prediction to build feature-wise confidence interval estimators based on a set of training instances. These estimators can be used to evaluate the congruence of test instances with respect to the training set, to answer two practically useful questions: (1) which test instances will be reliably predicted by a model trained with the training instances? and (2) can we identify incongruous regions of the feature space so that data owners understand the data's limitations or guide future data collection? We empirically validate Data-SUITE's performance and coverage guarantees and demonstrate on cross-site medical data, biased data, and data with concept drift, that Data-SUITE best identifies ID regions where a downstream model may be reliable (independent of said model). We also illustrate how these identified regions can provide insights into datasets and highlight their limitations.

preprint, 2022, arXiv

How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models

Devising domain- and model-agnostic evaluation metrics for generative models is an important and as yet unresolved problem. Most existing metrics, which were tailored solely to the image synthesis setup, exhibit a limited capacity for diagnosing the different modes of failure of generative models across broader application domains. In this paper, we introduce a 3-dimensional evaluation metric, ($α$-Precision, $β$-Recall, Authenticity), that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion. Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity. We introduce generalization as an additional, independent dimension (to the fidelity-diversity trade-off) that quantifies the extent to which a model copies training data -- a crucial performance indicator when modeling sensitive data with requirements on privacy. The three metric components correspond to (interpretable) probabilistic quantities, and are estimated via sample-level binary classification. The sample-level nature of our metric inspires a novel use case which we call model auditing, wherein we judge the quality of individual samples generated by a (black-box) model, discarding low-quality samples and hence improving the overall model performance in a post-hoc manner.

preprint, 2022, arXiv

HyperImpute: Generalized Iterative Imputation with Automatic Model Selection

Consider the problem of imputing missing values in a dataset. On the one hand, conventional approaches using iterative imputation benefit from the simplicity and customizability of learning conditional distributions directly, but suffer from the practical requirement for appropriate model specification of each and every variable. On the other hand, recent methods using deep generative modeling benefit from the capacity and efficiency of learning with neural network function approximators, but are often difficult to optimize and rely on stronger data assumptions. In this work, we study an approach that marries the advantages of both: We propose HyperImpute, a generalized iterative imputation framework for adaptively and automatically configuring column-wise models and their hyperparameters. Practically, we provide a concrete implementation with out-of-the-box learners, optimizers, simulators, and extensible interfaces. Empirically, we investigate this framework via comprehensive experiments and sensitivities on a variety of public datasets, and demonstrate its ability to generate accurate imputations relative to a strong suite of benchmarks. Contrary to recent work, we believe our findings constitute a strong defense of the iterative imputation paradigm.

preprint, 2022, arXiv

Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects

Conditional average treatment effects (CATEs) allow us to understand the effect heterogeneity across a large population of individuals. However, typical CATE learners assume all confounding variables are measured in order for the CATE to be identifiable. This requirement can be satisfied by collecting many variables, at the expense of increased sample complexity for estimating CATEs. To combat this, we propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function. With our EBM we introduce a preprocessing step that alleviates the dimensionality curse for any existing learner developed for estimating CATEs. We prove that our EBM keeps the representations partially identifiable up to some universal constant, as well as having universal approximation capability. These properties enable the representations to converge and keep the CATE estimates consistent. Experiments demonstrate the convergence of the representations, as well as show that estimating CATEs on our representations performs better than on the variables or the representations obtained through other dimensionality reduction methods.

preprint, 2022, arXiv

Inferring Lexicographically-Ordered Rewards from Preferences

Modeling the preferences of agents over a set of alternatives is a principal concern in many areas. The dominant approach has been to find a single reward/utility function with the property that alternatives yielding higher rewards are preferred over alternatives yielding lower rewards. However, in many settings, preferences are based on multiple, often competing, objectives; a single reward function is not adequate to represent such preferences. This paper proposes a method for inferring multi-objective reward-based representations of an agent's observed preferences. We model the agent's priorities over different objectives as entering lexicographically, so that objectives with lower priorities matter only when the agent is indifferent with respect to objectives with higher priorities. We offer two example applications in healthcare, one inspired by cancer treatment, the other inspired by organ transplantation, to illustrate how the lexicographically-ordered rewards we learn can provide a better understanding of a decision-maker's preferences and help improve policies when used in reinforcement learning.
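The lexicographic ordering described above can be illustrated with a small comparison function. This sketch shows only how such an ordering ranks two alternatives once per-objective rewards are known; it does not implement the paper's inference method, and the `tolerance` used to model indifference is an assumption made for illustration.

```python
def lex_prefers(rewards_a, rewards_b, tolerance=1e-6):
    """Compare two alternatives under lexicographically ordered objectives.

    Rewards are listed from the highest-priority objective to the lowest.
    Returns 1 if a is preferred, -1 if b is preferred, 0 if indifferent.
    """
    for ra, rb in zip(rewards_a, rewards_b):
        # A lower-priority objective is consulted only when the agent is
        # indifferent (within `tolerance`) on every higher-priority objective.
        if abs(ra - rb) > tolerance:
            return 1 if ra > rb else -1
    return 0
```

For instance, `lex_prefers([1.0, 0.0], [0.5, 10.0])` favours the first alternative because the first objective dominates, no matter how large the gap on the second objective is.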

preprint, 2022, arXiv

Inverse Contextual Bandits: Learning How Behavior Evolves over Time

Understanding a decision-maker's priorities by observing their behavior is critical for transparency and accountability in decision processes, such as in healthcare. Though conventional approaches to policy learning almost invariably assume stationarity in behavior, this is hardly true in practice: Medical practice is constantly evolving as clinical professionals fine-tune their knowledge over time. For instance, as the medical community's understanding of organ transplantations has progressed over the years, a pertinent question is: How have actual organ allocation policies been evolving? To give an answer, we desire a policy learning method that provides interpretable representations of decision-making, in particular capturing an agent's non-stationary knowledge of the world, as well as operating in an offline manner. First, we model the evolving behavior of decision-makers in terms of contextual bandits, and formalize the problem of Inverse Contextual Bandits (ICB). Second, we propose two concrete algorithms as solutions, learning parametric and nonparametric representations of an agent's behavior. Finally, using both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method, as well as benchmarking and validating its accuracy.

preprint, 2022, arXiv

Label-Free Explainability for Unsupervised Models

Unsupervised black-box models are challenging to interpret. Indeed, most existing explainability methods require labels to select which component(s) of the black-box's output to interpret. In the absence of labels, black-box outputs often are representation vectors whose components do not correspond to any meaningful quantity. Hence, choosing which component(s) to interpret in a label-free unsupervised/self-supervised setting is an important, yet unsolved problem. To bridge this gap in the literature, we introduce two crucial extensions of post-hoc explanation techniques: (1) label-free feature importance and (2) label-free example importance that respectively highlight influential features and training examples for a black-box to construct representations at inference time. We demonstrate that our extensions can be successfully implemented as simple wrappers around many existing feature and example importance methods. We illustrate the utility of our label-free explainability paradigm through a qualitative and quantitative comparison of representation spaces learned by various autoencoders trained on distinct unsupervised tasks.

preprint, 2022, arXiv

Neural graphical modelling in continuous-time: consistency guarantees and algorithms

The discovery of structure from time series data is a key problem in fields of study working with complex systems. Most identifiability results and learning algorithms assume the underlying dynamics to be discrete in time. Comparatively few, in contrast, explicitly define dependencies in infinitesimal intervals of time, independently of the scale of observation and of the regularity of sampling. In this paper, we consider score-based structure learning for the study of dynamical systems. We prove that for vector fields parameterized in a large class of neural networks, least squares optimization with adaptive regularization schemes consistently recovers directed graphs of local independencies in systems of stochastic differential equations. Using this insight, we propose a score-based learning algorithm based on penalized Neural Ordinary Differential Equations (modelling the mean process) that we show to be applicable to the general setting of irregularly-sampled multivariate time series and to outperform the state of the art across a range of dynamical systems.

preprint, 2022, arXiv

SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data

We study the problem of inferring heterogeneous treatment effects from time-to-event data. While both the related problems of (i) estimating treatment effects for binary or continuous outcomes and (ii) predicting survival outcomes have been well studied in the recent machine learning literature, their combination -- albeit of high practical relevance -- has received considerably less attention. With the ultimate goal of reliably estimating the effects of treatments on instantaneous risk and survival probabilities, we focus on the problem of learning (discrete-time) treatment-specific conditional hazard functions. We find that unique challenges arise in this context due to a variety of covariate shift issues that go beyond a mere combination of well-studied confounding and censoring biases. We theoretically analyse their effects by adapting recent generalization bounds from domain adaptation and treatment effect estimation to our setting and discuss implications for model design. We use the resulting insights to propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations. We investigate performance across a range of experimental settings and empirically confirm that our method outperforms baselines by addressing covariate shifts from various sources.
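To make the link between discrete-time hazards and survival probabilities concrete, the sketch below converts treatment-specific hazard sequences into survival curves via the standard identity S(t) = prod over k <= t of (1 - h(k)), and takes their difference as an effect on survival. The hazard values are assumed to come from some already-fitted estimator; the function names are hypothetical and the estimation method itself is not shown.

```python
def survival_curve(hazards):
    """Turn discrete-time conditional hazards h(1..T) into survival probabilities S(1..T)."""
    surv = []
    s = 1.0
    for h in hazards:
        s *= (1.0 - h)  # survive step k given survival up to step k-1
        surv.append(s)
    return surv

def survival_effect(hazards_treated, hazards_control):
    """Difference in survival probability (treated minus control) at each time step."""
    s1 = survival_curve(hazards_treated)
    s0 = survival_curve(hazards_control)
    return [a - b for a, b in zip(s1, s0)]
```

Calling `survival_effect` with the two hazard sequences predicted for one individual gives a per-time-step estimate of how treatment shifts that individual's survival curve.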

preprint, 2022, arXiv

The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation

Understanding decision-making in clinical environments is of paramount importance if we are to bring the strengths of machine learning to ultimately improve patient outcomes. Several factors, including the availability of public data, the intrinsically offline nature of the problem, and the complexity of human decision making, have meant that the mainstream development of algorithms is often geared towards optimal performance in tasks that do not necessarily translate well into the medical regime, often overlooking more niche issues commonly associated with the area. We therefore present a new benchmarking suite designed specifically for medical sequential decision making: the Medkit-Learn(ing) Environment, a publicly available Python package providing simple and easy access to high-fidelity synthetic medical data. While providing a standardised way to compare algorithms in a realistic medical setting, we employ a generating process that disentangles the policy and environment dynamics to allow for a range of customisations, thus enabling systematic evaluation of algorithms' robustness against specific challenges prevalent in healthcare.

preprint, 2021, arXiv

A Variational Information Bottleneck Approach to Multi-Omics Data Integration

Integration of data from multiple omics techniques is becoming increasingly important in biomedical research. Due to non-uniformity and technical limitations in omics platforms, such integrative analyses on multiple omics, which we refer to as views, involve learning from incomplete observations with various view-missing patterns. This is challenging because i) complex interactions within and across observed views need to be properly addressed for optimal predictive power and ii) observations with various view-missing patterns need to be flexibly integrated. To address such challenges, we propose a deep variational information bottleneck (IB) approach for incomplete multi-view observations. Our method applies the IB framework on marginal and joint representations of the observed views to focus on intra-view and inter-view interactions that are relevant for the target. Most importantly, by modeling the joint representations as a product of marginal representations, we can efficiently learn from observed views with various view-missing patterns. Experiments on real-world datasets show that our method consistently achieves gain from data integration and outperforms state-of-the-art benchmarks.
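One common way to realise a joint representation as a product of marginal representations is a product of Gaussian experts, where missing views are simply left out of the product. The sketch below assumes one-dimensional Gaussian marginals for brevity; the abstract does not state the distributional form, so this is an illustrative assumption rather than the paper's exact construction, and the function name is hypothetical.

```python
def product_of_gaussians(views):
    """Combine per-view Gaussian posteriors into a joint Gaussian.

    `views` is a list of (mean, variance) tuples, with None marking a missing view.
    The product of Gaussians has precision equal to the sum of precisions and a
    precision-weighted mean.
    """
    precision = 0.0
    weighted_mean = 0.0
    for view in views:
        if view is None:
            continue  # a missing view contributes nothing to the product
        mean, var = view
        precision += 1.0 / var
        weighted_mean += mean / var
    if precision == 0.0:
        raise ValueError("at least one view must be observed")
    return weighted_mean / precision, 1.0 / precision
```

Because absent views drop out of the sum, the same function handles every view-missing pattern without a separate model per pattern.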

preprint2021arXiv

Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values

In high-stakes applications of data-driven decision making like healthcare, it is of paramount importance to learn a policy that maximizes the reward while avoiding potentially dangerous actions when there is uncertainty. There are two main challenges usually associated with this problem. Firstly, learning through online exploration is not possible due to the critical nature of such applications. Therefore, we need to resort to observational datasets with no counterfactuals. Secondly, such datasets are usually imperfect, additionally cursed with missing values in the attributes of features. In this paper, we consider the problem of constructing personalized policies using logged data when there are missing values in the attributes of features in both training and test data. The goal is to recommend an action (treatment) when $\tilde{\mathbf{X}}$, a degraded version of $\mathbf{X}$ with missing values, is observed. We consider three strategies for dealing with missingness. In particular, we introduce the *conservative strategy*, where the policy is designed to safely handle the uncertainty due to missingness. To implement this strategy, we need to estimate the posterior distribution $p(\mathbf{X} \mid \tilde{\mathbf{X}})$, for which we use a variational autoencoder. Specifically, our method is based on partial variational autoencoders (PVAE), which are designed to capture the underlying structure of features with missing values.

preprint2021arXiv

Estimating Structural Target Functions using Machine Learning and Influence Functions

We aim to construct a class of learning algorithms that are of practical value to applied researchers in fields such as biostatistics, epidemiology and econometrics, where the need to learn from incompletely observed information is ubiquitous. We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models, which we call 'IF-learning' due to its reliance on influence functions (IFs). This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics: we can consider any target function for which an IF of a population-averaged version exists in analytic form. Throughout, we put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information. This includes problems such as treatment effect estimation and inference in the presence of missing outcome data. Within this framework, we propose two general learning algorithms that build on the idea of nonparametric plug-in bias removal via IFs: the 'IF-learner', which uses pseudo-outcomes motivated by uncentered IFs for regression in large samples and outputs entire target functions without confidence bands, and the 'Group-IF-learner', which outputs only approximations to a function but can give confidence estimates if sufficient information on coarsening mechanisms is available. We apply both in a simulation study on inferring treatment effects.

preprint2021arXiv

Kernel Hypothesis Testing with Set-valued Data

We present a general framework for hypothesis testing on distributions of sets of individual examples. Sets may represent many common data sources such as groups of observations in time series, collections of words in text or a batch of images of a given phenomenon. This observation pattern, however, differs from the common assumptions required for hypothesis testing: each set differs in size, may have differing levels of noise, and also may incorporate nuisance variability, irrelevant for the analysis of the phenomenon of interest; all features that bias test decisions if not accounted for. In this paper, we propose to interpret sets as independent samples from a collection of latent probability distributions, and introduce kernel two-sample and independence tests in this latent space of distributions. We prove the consistency of the tests and observe that they outperform alternatives in a wide range of synthetic experiments. Finally, we showcase their use in practice with experiments on healthcare and climate data, where previously heuristics were needed for feature extraction and testing.

preprint2021arXiv

Learning Matching Representations for Individualized Organ Transplantation Allocation

Organ transplantation is often the last resort for treating end-stage illness, but the probability of a successful transplantation depends greatly on compatibility between donors and recipients. Current medical practice relies on coarse rules for donor-recipient matching, but is short of domain knowledge regarding the complex factors underlying organ compatibility. In this paper, we formulate the problem of learning data-driven rules for organ matching using observational data for organ allocations and transplant outcomes. This problem departs from the standard supervised learning setup in that it involves matching the two feature spaces (i.e., donors and recipients), and requires estimating transplant outcomes under counterfactual matches not observed in the data. To address these problems, we propose a model based on representation learning to predict donor-recipient compatibility; our model learns representations that cluster donor features, and applies donor-invariant transformations to recipient features to predict outcomes for a given donor-recipient feature instance. Experiments on semi-synthetic and real-world datasets show that our model outperforms state-of-the-art allocation methods and policies executed by human experts.

preprint2021arXiv

Model-Attentive Ensemble Learning for Sequence Modeling

Medical time-series datasets have unique characteristics that make prediction tasks challenging. Most notably, patient trajectories often contain longitudinal variations in their input-output relationships, generally referred to as temporal conditional shift. Designing sequence models capable of adapting to such time-varying distributions remains a prevailing problem. To address this, we present Model-Attentive Ensemble learning for Sequence modeling (MAES). MAES is a mixture of time-series experts which leverages an attention-based gating mechanism to specialize the experts on different sequence dynamics and adaptively weight their predictions. We demonstrate that MAES significantly outperforms popular sequence models on datasets subject to temporal shift.

preprint2021arXiv

Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms

The need to evaluate treatment effectiveness is ubiquitous in most of empirical science, and interest in flexibly investigating effect heterogeneity is growing rapidly. To do so, a multitude of model-agnostic, nonparametric meta-learners have been proposed in recent years. Such learners decompose the treatment effect estimation problem into separate sub-problems, each solvable using standard supervised learning methods. Choosing between different meta-learners in a data-driven manner is difficult, as it requires access to counterfactual information. Therefore, with the ultimate goal of building better understanding of the conditions under which some learners can be expected to perform better than others a priori, we theoretically analyze four broad meta-learning strategies which rely on plug-in estimation and pseudo-outcome regression. We highlight how this theoretical reasoning can be used to guide principled algorithm design and translate our analyses into practice by considering a variety of neural network architectures as base-learners for the discussed meta-learning strategies. In a simulation study, we showcase the relative strengths of the learners under different data-generating processes.
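To make the idea of a plug-in meta-learner concrete, the following is a minimal sketch of a T-learner-style estimator: fit one outcome model per treatment arm with an ordinary supervised learner, then take the difference of the two predictions. It is an illustration only, not the paper's code. The helper names `fit_line` and `t_learner`, the single covariate, the one-dimensional least-squares base learner, and the noiseless toy data are all assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single covariate; returns a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def t_learner(xs, treatments, ys):
    """Plug-in estimate of the conditional average treatment effect.

    Fits separate outcome models on treated and control units and
    returns a function x -> mu1(x) - mu0(x).
    """
    treated = [(x, y) for x, w, y in zip(xs, treatments, ys) if w == 1]
    control = [(x, y) for x, w, y in zip(xs, treatments, ys) if w == 0]
    mu1 = fit_line([x for x, _ in treated], [y for _, y in treated])
    mu0 = fit_line([x for x, _ in control], [y for _, y in control])
    return lambda x: mu1(x) - mu0(x)


# Hypothetical toy data: control outcome is x, treated outcome is 2x + 2,
# so the true effect at covariate value x is x + 2.
xs = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
treatments = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0.0, 1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 8.0]

cate = t_learner(xs, treatments, ys)
effect_at_1_5 = cate(1.5)
```

On this noiseless toy data the estimate at `x = 1.5` recovers the true effect of 3.5. Pseudo-outcome strategies, the other family named in the abstract, would instead regress a constructed outcome on the covariates in a second stage; that variant is not shown here.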

preprint2021arXiv

Personalized Education in the AI Era: What to Expect Next?

The objective of personalized learning is to design an effective knowledge acquisition track that matches the learner's strengths and bypasses her weaknesses to ultimately meet her desired goal. This concept emerged several years ago and is being adopted by a rapidly growing number of educational institutions around the globe. In recent years, the rise of artificial intelligence (AI) and machine learning (ML), together with advances in big data analysis, has opened novel perspectives for enhancing personalized education in numerous dimensions. By taking advantage of AI/ML methods, the educational platform precisely acquires the student's characteristics. This is done, in part, by observing past experiences as well as analyzing the available big data through exploring the learners' features and similarities. It can, for example, recommend the most appropriate content among numerous accessible ones, advise a well-designed long-term curriculum, connect appropriate learners by suggestion, evaluate performance accurately, and the like. Still, several aspects of AI-based personalized education remain unexplored. These include, among others, compensating for the adverse effects of the absence of peers, creating and maintaining motivation for learning, increasing diversity, and removing the biases induced by the data and algorithms. In this paper, while providing a brief review of state-of-the-art research, we investigate the challenges of AI/ML-based personalized education and discuss potential solutions.

preprint2021arXiv

Policy Analysis using Synthetic Controls in Continuous-Time

Counterfactual estimation using synthetic controls is one of the most successful recent methodological developments in causal inference. Despite its popularity, the current description only considers time series aligned across units and synthetic controls expressed as linear combinations of observed control units. We propose a continuous-time alternative that models the latent counterfactual path explicitly using the formalism of controlled differential equations. This model is directly applicable to the general setting of irregularly-aligned multivariate time series and may be optimized in rich function spaces -- thereby improving on some limitations of existing approaches.

preprint2021arXiv

SDF-Bayes: Cautious Optimism in Safe Dose-Finding Clinical Trials with Drug Combinations and Heterogeneous Patient Groups

Phase I clinical trials are designed to test the safety (non-toxicity) of drugs and find the maximum tolerated dose (MTD). This task becomes significantly more challenging when multiple-drug dose-combinations (DC) are involved, due to the inherent conflict between the exponentially increasing DC candidates and the limited patient budget. This paper proposes a novel Bayesian design, SDF-Bayes, for finding the MTD for drug combinations in the presence of safety constraints. Rather than the conventional principle of escalating or de-escalating the current dose of one drug (perhaps alternating between drugs), SDF-Bayes proceeds by cautious optimism: it chooses the next DC that, on the basis of current information, is most likely to be the MTD (optimism), subject to the constraint that it only chooses DCs that have a high probability of being safe (caution). We also propose an extension, SDF-Bayes-AR, that accounts for patient heterogeneity and enables heterogeneous patient recruitment. Extensive experiments based on both synthetic and real-world datasets demonstrate the advantages of SDF-Bayes over state-of-the-art DC trial designs in terms of accuracy and safety.

preprint2021arXiv

Selecting Treatment Effects Models for Domain Adaptation Using Causal Knowledge

Selecting causal inference models for estimating individualized treatment effects (ITE) from observational data presents a unique challenge since the counterfactual outcomes are never observed. The problem becomes more challenging still in the unsupervised domain adaptation (UDA) setting, where we only have access to labeled samples in the source domain but wish to select a model that achieves good performance on a target domain for which only unlabeled samples are available. Existing techniques for UDA model selection are designed for the predictive setting. These methods examine discriminative density ratios between the input covariates in the source and target domain and do not factor in the model's predictions in the target domain. Because of this, two models with identical performance on the source domain would receive the same risk score by existing methods but, in reality, have significantly different performance in the test domain. We leverage the invariance of causal structures across domains to propose a novel model selection metric specifically designed for ITE methods under the UDA setting. In particular, we propose selecting models whose predictions of interventions' effects satisfy known causal structures in the target domain. Experimentally, our method selects ITE models that are more robust to covariate shifts on several healthcare datasets, including estimating the effect of ventilation in COVID-19 patients from different geographic locations.

preprint2021arXiv

Strictly Batch Imitation Learning by Energy-based Distribution Matching

Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment. This *strictly batch imitation learning* problem arises wherever live experimentation is costly, such as in healthcare. One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting. But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient. We argue that a good solution should be able to explicitly parameterize a policy (i.e. respecting action conditionals), implicitly learn from rollout dynamics (i.e. leveraging state marginals), and -- crucially -- operate in an entirely offline fashion. To address this challenge, we propose a novel technique by *energy-based distribution matching* (EDM): By identifying parameterizations of the (discriminative) model of a policy with the (generative) energy function for state distributions, EDM yields a simple but effective solution that equivalently minimizes a divergence between the occupancy measure for the demonstrator and a model thereof for the imitator. Through experiments with application to control and healthcare settings, we illustrate consistent performance gains over existing algorithms for strictly batch imitation learning.

preprint2020arXiv

A Non-Stationary Bandit-Learning Approach to Energy-Efficient Femto-Caching with Rateless-Coded Transmission

The ever-increasing demand for media streaming together with limited backhaul capacity renders developing efficient file-delivery methods imperative. One such method is femto-caching, which, despite its great potential, imposes several challenges such as efficient resource management. We study a resource allocation problem for joint caching and transmission in small cell networks, where the system operates in two consecutive phases: (i) cache placement, and (ii) joint file- and transmit power selection followed by broadcasting. We define the utility of every small base station in terms of the number of successful reconstructions per unit of transmission power. We then formulate the problem as selecting a file from the cache together with a transmission power level for every broadcast round so that the accumulated utility over the horizon is maximized. The former problem boils down to a stochastic knapsack problem, and we cast the latter as a multi-armed bandit problem. We develop a solution to each problem and provide theoretical and numerical evaluations. In contrast to the state-of-the-art research, the proposed approach is especially suitable for networks with time-variant statistical properties. Moreover, it is applicable and operates well even when no initial information about the statistical characteristics of the random parameters such as file popularity and channel quality is available.

preprint2020arXiv

A primer on coupled state-switching models for multiple interacting time series

State-switching models such as hidden Markov models or Markov-switching regression models are routinely applied to analyse sequences of observations that are driven by underlying non-observable states. Coupled state-switching models extend these approaches to address the case of multiple observation sequences whose underlying state variables interact. In this paper, we provide an overview of the modelling techniques related to coupling in state-switching models, thereby forming a rich and flexible statistical framework particularly useful for modelling correlated time series. Simulation experiments demonstrate the relevance of being able to account for an asynchronous evolution as well as interactions between the underlying latent processes. The models are further illustrated using two case studies related to a) interactions between a dolphin mother and her calf as inferred from movement data; and b) electronic health record data collected on 696 patients within an intensive care unit.

preprint2020arXiv

AutoCP: Automated Pipelines for Accurate Prediction Intervals

Successful application of machine learning models to real-world prediction problems, e.g. financial forecasting and personalized medicine, has proved to be challenging, because such settings require limiting and quantifying the uncertainty in the model predictions, i.e. providing valid and accurate prediction intervals. Conformal Prediction is a distribution-free approach to construct valid prediction intervals in finite samples. However, the prediction intervals constructed by Conformal Prediction are often (because of over-fitting, inappropriate measures of nonconformity, or other issues) overly conservative and hence inadequate for the application(s) at hand. This paper proposes an AutoML framework called Automatic Machine Learning for Conformal Prediction (AutoCP). Unlike the familiar AutoML frameworks that attempt to select the best prediction model, AutoCP constructs prediction intervals that achieve the user-specified target coverage rate while optimizing the interval length to be accurate and less conservative. We tested AutoCP on a variety of datasets and found that it significantly outperforms benchmark algorithms.
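For orientation, here is a minimal sketch of plain split conformal prediction, the distribution-free baseline that the abstract describes AutoCP as improving on. It is not AutoCP and performs no AutoML search. The function name `split_conformal_interval`, the use of absolute residuals as the nonconformity score, and the toy calibration data are assumptions made for this example.

```python
import math


def split_conformal_interval(cal_preds, cal_targets, new_pred, alpha=0.1):
    """Return (lower, upper) around new_pred targeting 1 - alpha coverage.

    cal_preds / cal_targets come from a held-out calibration set that the
    underlying model was not trained on. Absolute residuals serve as the
    nonconformity score, which is one common choice among many.
    """
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th
    # smallest score (1-indexed) instead of the plain empirical quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for the requested coverage level.
        return (-math.inf, math.inf)
    half_width = scores[rank - 1]
    return (new_pred - half_width, new_pred + half_width)


# Hypothetical calibration data: model predictions and observed targets.
cal_preds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_targets = [1.5, 1.0, 3.2, 4.4, 4.0, 6.1, 7.3, 8.9, 9.6]

lower, upper = split_conformal_interval(cal_preds, cal_targets, new_pred=5.0, alpha=0.2)
```

Because every interval here has the same half-width regardless of the input, this baseline can be overly conservative for easy cases, which is the kind of inefficiency the abstract says AutoCP targets by optimizing interval length.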

preprint2020arXiv

Contextual Constrained Learning for Dose-Finding Clinical Trials

Clinical trials in the medical domain are constrained by budgets. The number of patients that can be recruited is therefore limited. When a patient population is heterogeneous, this creates difficulties in learning subgroup-specific responses to a particular drug, especially for a variety of dosages. In addition, patient recruitment can be made difficult by the fact that clinical trials do not aim to provide a benefit to any given patient in the trial. In this paper, we propose C3T-Budget, a contextual constrained clinical trial algorithm for dose-finding under both budget and safety constraints. The algorithm aims to maximize drug efficacy within the clinical trial while also learning about the drug being tested. C3T-Budget recruits patients with consideration of the remaining budget, the remaining time, and the characteristics of each group, such as the population distribution, estimated expected efficacy, and estimation credibility. In addition, the algorithm aims to avoid unsafe dosages. These characteristics are further illustrated in a simulated clinical trial study, which corroborates the theoretical analysis and demonstrates an efficient budget usage as well as a balanced learning-treatment trade-off.

preprint2020arXiv

CPAS: the UK's National Machine Learning-based Hospital Capacity Planning System for COVID-19

The coronavirus disease 2019 (COVID-19) global pandemic poses the threat of overwhelming healthcare systems with unprecedented demands for intensive care resources. Managing these demands cannot be effectively conducted without a nationwide collective effort that relies on data to forecast hospital demands on the national, regional, hospital and individual levels. To this end, we developed the COVID-19 Capacity Planning and Analysis System (CPAS) - a machine learning-based system for hospital resource planning that we have successfully deployed at individual hospitals and across regions in the UK in coordination with NHS Digital. In this paper, we discuss the main challenges of deploying a machine learning-based decision support system at national scale, and explain how CPAS addresses these challenges by (1) defining the appropriate learning problem, (2) combining bottom-up and top-down analytical approaches, (3) using state-of-the-art machine learning algorithms, (4) integrating heterogeneous data sources, and (5) presenting the result with an interactive and transparent interface. CPAS is one of the first machine learning-based systems to be deployed in hospitals on a national scale to address the COVID-19 pandemic - we conclude the paper with a summary of the lessons learned from this experience.

preprint2020arXiv

Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions

Deep learning models achieve high predictive accuracy across a broad spectrum of tasks, but rigorously quantifying their predictive uncertainty remains challenging. Usable estimates of predictive uncertainty should (1) cover the true prediction targets with high probability, and (2) discriminate between high- and low-confidence prediction instances. Existing methods for uncertainty quantification are based predominantly on Bayesian neural networks; these may fall short of (1) and (2) -- i.e., Bayesian credible intervals do not guarantee frequentist coverage, and approximate posterior inference undermines discriminative accuracy. In this paper, we develop the discriminative jackknife (DJ), a frequentist procedure that utilizes influence functions of a model's loss functional to construct a jackknife (or leave-one-out) estimator of predictive confidence intervals. The DJ satisfies (1) and (2), is applicable to a wide range of deep learning models, is easy to implement, and can be applied in a post-hoc fashion without interfering with model training or compromising its accuracy. Experiments demonstrate that DJ performs competitively compared to existing Bayesian and non-Bayesian regression baselines.

preprint2020arXiv

Estimating Counterfactual Treatment Outcomes over Time Through Adversarially Balanced Representations

Identifying when to give treatments to patients and how to select among multiple treatments over time are important medical problems with few existing solutions. In this paper, we introduce the Counterfactual Recurrent Network (CRN), a novel sequence-to-sequence model that leverages the increasingly available patient observational data to estimate treatment effects over time and answer such medical questions. To handle the bias from time-varying confounders, covariates affecting the treatment assignment policy in the observational data, CRN uses domain adversarial training to build balancing representations of the patient history. At each timestep, CRN constructs a treatment invariant representation which removes the association between patient history and treatment assignments and thus can be reliably used for making counterfactual predictions. On a simulated model of tumour growth, with varying degrees of time-dependent confounding, we show how our model achieves lower error in estimating counterfactuals and in choosing the correct treatment and timing of treatment than current state-of-the-art methods.

preprint2020arXiv

Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions

Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data. Yet, when using RNNs to inform decision-making, predictions by themselves are not sufficient; we also need estimates of predictive uncertainty. Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods; these are computationally prohibitive, and require major alterations to the RNN architecture and training. Capitalizing on ideas from classical jackknife resampling, we develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals. Our method derives predictive uncertainty from the variability of the (jackknife) sampling distribution of the RNN outputs, which is estimated by repeatedly deleting blocks of (temporally-correlated) training data, and collecting the predictions of the RNN re-trained on the remaining data. To avoid exhaustive re-training, we utilize influence functions to estimate the effect of removing training data blocks on the learned RNN parameters. Using data from a critical care setting, we demonstrate the utility of uncertainty quantification in sequential decision-making.
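The classical jackknife that the abstract builds on can be sketched in a few lines: recompute an estimator with each observation deleted in turn, and read the uncertainty off the spread of those leave-one-out values. The sketch below shows only this classical delete-one version applied to a simple statistic. It does not delete temporally correlated blocks, retrain an RNN, or use influence functions, all of which the abstract attributes to the proposed method. The names `jackknife_standard_error` and `mean` and the toy data are assumptions for illustration.

```python
import math


def jackknife_standard_error(data, estimator):
    """Delete-one jackknife estimate of an estimator's standard error."""
    n = len(data)
    # Recompute the estimator n times, each time leaving out one observation.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    # The (n - 1) / n factor rescales the small spread of the
    # leave-one-out values up to the variability of the full estimator.
    variance = (n - 1) / n * sum((v - center) ** 2 for v in leave_one_out)
    return math.sqrt(variance)


def mean(values):
    return sum(values) / len(values)


# Hypothetical toy sample.
observations = [1.0, 2.0, 3.0, 4.0, 5.0]
se = jackknife_standard_error(observations, mean)
```

For the sample mean, this reproduces the familiar standard error (sample standard deviation divided by the square root of n), which makes it a convenient sanity check. Replacing each exact recomputation with an approximation is what makes the approach affordable when the "estimator" is an expensive trained model.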

preprint2020arXiv

Hide-and-Seek Privacy Challenge

The clinical time-series setting poses a unique combination of challenges to data modeling and sharing. Due to the high dimensionality of clinical time series, adequate de-identification to preserve privacy while retaining data utility is difficult to achieve using common de-identification techniques. An innovative approach to this problem is synthetic data generation. From a technical perspective, a good generative model for time-series data should preserve temporal dynamics, in the sense that new sequences respect the original relationships between high-dimensional variables across time. From the privacy perspective, the model should prevent patient re-identification by limiting vulnerability to membership inference attacks. The NeurIPS 2020 Hide-and-Seek Privacy Challenge is a novel two-tracked competition to simultaneously accelerate progress in tackling both problems. In our head-to-head format, participants in the synthetic data generation track (i.e. "hiders") and the patient re-identification track (i.e. "seekers") are directly pitted against each other by way of a new, high-quality intensive care time-series dataset: the AmsterdamUMCdb dataset. Ultimately, we seek to advance generative techniques for dense and high-dimensional temporal data streams that are (1) clinically meaningful in terms of fidelity and predictivity, as well as (2) capable of minimizing membership privacy risks in terms of the concrete notion of patient re-identification.

preprint2020arXiv

Inverse Active Sensing: Modeling and Understanding Timely Decision-Making

Evidence-based decision-making entails collecting (costly) observations about an underlying phenomenon of interest, and subsequently committing to an (informed) decision on the basis of accumulated evidence. In this setting, active sensing is the goal-oriented problem of efficiently selecting which acquisitions to make, and when and what decision to settle on. As its complement, inverse active sensing seeks to uncover an agent's preferences and strategy given their observable decision-making behavior. In this paper, we develop an expressive, unified framework for the general setting of evidence-based decision-making under endogenous, context-dependent time pressure---which requires negotiating (subjective) tradeoffs between accuracy, speediness, and cost of information. Using this language, we demonstrate how it enables modeling intuitive notions of surprise, suspense, and optimality in decision strategies (the forward problem). Finally, we illustrate how this formulation enables understanding decision-making behavior by quantifying preferences implicit in observed decision strategies (the inverse problem).

preprint2020arXiv

Learning Dynamic and Personalized Comorbidity Networks from Event Data using Deep Diffusion Processes

Comorbid diseases co-occur and progress via complex temporal patterns that vary among individuals. In electronic health records we can observe the different diseases a patient has, but can only infer the temporal relationship between each co-morbid condition. Learning such temporal patterns from event data is crucial for understanding disease pathology and predicting prognoses. To this end, we develop deep diffusion processes (DDP) to model "dynamic comorbidity networks", i.e., the temporal relationships between comorbid disease onsets expressed through a dynamic graph. A DDP comprises events modelled as a multi-dimensional point process, with an intensity function parameterized by the edges of a dynamic weighted graph. The graph structure is modulated by a neural network that maps patient history to edge weights, enabling rich temporal representations for disease trajectories. The DDP parameters decouple into clinically meaningful components, which enables serving the dual purpose of accurate risk prediction and intelligible representation of disease pathology. We illustrate these features in experiments using cancer registry data.

preprint2020arXiv

Learning for Dose Allocation in Adaptive Clinical Trials with Safety Constraints

Phase I dose-finding trials are increasingly challenging as the relationship between efficacy and toxicity of new compounds (or combinations of them) becomes more complex. Despite this, most commonly used methods in practice focus on identifying a Maximum Tolerated Dose (MTD) by learning only from toxicity events. We present a novel adaptive clinical trial methodology, called Safe Efficacy Exploration Dose Allocation (SEEDA), that aims at maximizing the cumulative efficacies while satisfying the toxicity safety constraint with high probability. We evaluate performance objectives that have operational meanings in practical clinical trials, including cumulative efficacy, recommendation/allocation success probabilities, toxicity violation probability, and sample efficiency. An extended SEEDA-Plateau algorithm that is tailored for the increase-then-plateau efficacy behavior of molecularly targeted agents (MTA) is also presented. Through numerical experiments using both synthetic and real-world datasets, we show that SEEDA outperforms state-of-the-art clinical trial designs by finding the optimal dose with a higher success rate and fewer patients.

preprint2020arXiv

Learning Overlapping Representations for the Estimation of Individualized Treatment Effects

The choice of making an intervention depends on its potential benefit or harm in comparison to alternatives. Estimating the likely outcome of alternatives from observational data is a challenging problem, as the full set of outcomes is never observed and selection bias precludes the direct comparison of differently intervened groups. Despite their empirical success, we show that algorithms that learn domain-invariant representations of inputs (on which to make predictions) are often inappropriate, and develop generalization bounds that demonstrate the dependence on domain overlap and highlight the need for invertible latent maps. Based on these results, we develop a deep kernel regression algorithm and posterior regularization framework that substantially outperforms the state-of-the-art on a variety of benchmark data sets.

preprint2020arXiv

Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning

An essential problem in automated machine learning (AutoML) is that of model selection. A unique challenge in the sequential setting is the fact that the optimal model itself may vary over time, depending on the distribution of features and labels available up to each point in time. In this paper, we propose a novel Bayesian optimization (BO) algorithm to tackle the challenge of model selection in this setting. This is accomplished by treating the performance at each time step as its own black-box function. In order to solve the resulting multiple black-box function optimization problem jointly and efficiently, we exploit potential correlations among black-box functions using deep kernel learning (DKL). To the best of our knowledge, we are the first to formulate the problem of stepwise model selection (SMS) for sequence prediction, and to design and demonstrate an efficient joint-learning algorithm for this purpose. Using multiple real-world datasets, we verify that our proposed method outperforms both standard BO and multi-objective BO algorithms on a variety of sequence prediction tasks.

preprint2020arXiv

Target-Embedding Autoencoders for Supervised Representation Learning

Autoencoder-based learning has emerged as a staple for disciplining representations in unsupervised and semi-supervised settings. This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional. We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, learning intermediate latent representations jointly optimized to be both predictable from features as well as predictive of targets---encoding the prior that variations in targets are driven by a compact set of underlying factors. As our theoretical contribution, we provide a guarantee of generalization for linear TEAs by demonstrating uniform stability, interpreting the benefit of the auxiliary reconstruction task as a form of regularization. As our empirical contribution, we extend validation of this approach beyond existing static classification applications to multivariate sequence forecasting, verifying their advantage on both linear and nonlinear recurrent architectures---thereby underscoring the further generality of this framework beyond feedforward instantiations.

preprint2020arXiv

Temporal Phenotyping using Deep Predictive Clustering of Disease Progression

Due to the wider availability of modern electronic health records, patient care data is often being stored in the form of time-series. Clustering such time-series data is crucial for patient phenotyping, anticipating patients' prognoses by identifying "similar" patients, and designing treatment guidelines that are tailored to homogeneous patient subgroups. In this paper, we develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest (e.g., adverse events, the onset of comorbidities). To encourage each cluster to have homogeneous future outcomes, the clustering is carried out by learning discrete representations that best describe the future outcome distribution based on novel loss functions. Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks and identifies meaningful clusters that can be translated into actionable information for clinical decision-making.

preprint2020arXiv

Time Series Deconfounder: Estimating Treatment Effects over Time in the Presence of Hidden Confounders

The estimation of treatment effects is a pervasive problem in medicine. Existing methods for estimating treatment effects from longitudinal observational data assume that there are no hidden confounders, an assumption that is not testable in practice and, if it does not hold, leads to biased estimates. In this paper, we develop the Time Series Deconfounder, a method that leverages the assignment of multiple treatments over time to enable the estimation of treatment effects in the presence of multi-cause hidden confounders. The Time Series Deconfounder uses a novel recurrent neural network architecture with multitask output to build a factor model over time and infer latent variables that render the assigned treatments conditionally independent; then, it performs causal inference using these latent variables that act as substitutes for the multi-cause unobserved confounders. We provide a theoretical analysis for obtaining unbiased causal effects of time-varying exposures using the Time Series Deconfounder. Using both simulated and real data we show the effectiveness of our method in deconfounding the estimation of treatment responses over time.

preprint2020arXiv

Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift

Modern neural networks have proven to be powerful function approximators, providing state-of-the-art performance in a multitude of applications. They however fall short in their ability to quantify confidence in their predictions - this is crucial in high-stakes applications that involve critical decision-making. Bayesian neural networks (BNNs) aim at solving this problem by placing a prior distribution over the network's parameters, thereby inducing a posterior distribution that encapsulates predictive uncertainty. While existing variants of BNNs based on Monte Carlo dropout produce reliable (albeit approximate) uncertainty estimates over in-distribution data, they tend to exhibit over-confidence in predictions made on target data whose feature distribution differs from the training data, i.e., the covariate shift setup. In this paper, we develop an approximate Bayesian inference scheme based on posterior regularisation, wherein unlabelled target data are used as "pseudo-labels" of model confidence that are used to regularise the model's loss on labelled source data. We show that this approach significantly improves the accuracy of uncertainty quantification on covariate-shifted data sets, with minimal modification to the underlying model architecture. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.

preprint2020arXiv

When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes

The coronavirus disease 2019 (COVID-19) global pandemic has led many countries to impose unprecedented lockdown measures in order to slow down the outbreak. Questions on whether governments have acted promptly enough, and whether lockdown measures can be lifted soon have since been central in public discourse. Data-driven models that predict COVID-19 fatalities under different lockdown policy scenarios are essential for addressing these questions and informing governments on future policy directions. To this end, this paper develops a Bayesian model for predicting the effects of COVID-19 lockdown policies in a global context -- we treat each country as a distinct data point, and exploit variations of policies across countries to learn country-specific policy effects. Our model utilizes a two-layer Gaussian process (GP) prior -- the lower layer uses a compartmental SEIR (Susceptible, Exposed, Infected, Recovered) model as a prior mean function with "country-and-policy-specific" parameters that capture fatality curves under "counterfactual" policies within each country, whereas the upper layer is shared across all countries, and learns lower-layer SEIR parameters as a function of a country's features and its policy indicators. Our model combines the solid mechanistic foundations of SEIR models (Bayesian priors) with the flexible data-driven modeling and gradient-based optimization routines of machine learning (Bayesian posteriors) -- i.e., the entire model is trained end-to-end via stochastic variational inference. We compare the projections of COVID-19 fatalities by our model with other models listed by the Centers for Disease Control and Prevention (CDC), and provide scenario analyses for various lockdown and reopening strategies highlighting their impact on COVID-19 fatalities.
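As a rough illustration of the compartmental SEIR dynamics that the abstract names as the lower-layer prior mean, the following minimal sketch integrates the standard SEIR equations with a forward-Euler step. It is not the authors' model: it omits the Gaussian process layers and the fatality curve, and the function name, parameter names and numeric values are all hypothetical choices made for the example.

```python
def simulate_seir(beta, sigma, gamma, days,
                  population=1_000_000, initial_exposed=10, dt=0.1):
    """Integrate the standard SEIR equations with forward Euler.

    beta  : transmission rate (assumed here to be what a policy changes)
    sigma : rate at which exposed individuals become infectious
    gamma : recovery rate
    All names and defaults are illustrative assumptions.
    """
    s = float(population - initial_exposed)
    e = float(initial_exposed)
    i = 0.0
    r = 0.0
    trajectory = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        trajectory.append((s, e, i, r))  # one record per simulated day
    return trajectory


# Two hypothetical policy scenarios that differ only in the transmission rate.
no_lockdown = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1, days=200)
lockdown = simulate_seir(beta=0.15, sigma=0.2, gamma=0.1, days=200)
```

Comparing the final recovered compartment of the two runs gives a crude sense of how a policy-dependent transmission rate changes the epidemic curve; in the paper these parameters are learned per country and policy rather than fixed by hand.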

preprint2018arXiv

Distributed Task Management in Cyber-Physical Systems: How to Cooperate under Uncertainty?

We consider the problem of task allocation in a network of cyber-physical systems (CPSs). The network can have different states, and the tasks are of different types. The task arrival is stochastic and state-dependent. Every CPS is capable of performing each type of task with some specific state-dependent efficiency. The CPSs have to agree on task allocation prior to knowing about the realized network's state and/or the arrived tasks. We model the problem as a multi-state stochastic cooperative game with state uncertainty. We then use the concept of deterministic equivalence and sequential core to solve the problem. We establish the non-emptiness of the strong sequential core in our designed task allocation game and investigate its characteristics including uniqueness and optimality. Moreover, we prove that in the task allocation game, the strong sequential core is equivalent to Walrasian equilibrium under state uncertainty; consequently, it can be implemented by using the Walras' tatonnement process.

preprint2016arXiv

A Hidden Absorbing Semi-Markov Model for Informatively Censored Temporal Data: Learning and Inference

Modeling continuous-time physiological processes that manifest a patient's evolving clinical states is a key step in approaching many problems in healthcare. In this paper, we develop the Hidden Absorbing Semi-Markov Model (HASMM): a versatile probabilistic model that is capable of capturing modern electronic health record (EHR) data. Unlike existing models, an HASMM accommodates irregularly sampled, temporally correlated, and informatively censored physiological data, and can describe non-stationary clinical state transitions. Learning an HASMM from the EHR data is achieved via a novel forward-filtering backward-sampling Monte-Carlo EM algorithm that exploits the knowledge of the end-point clinical outcomes (informative censoring) in the EHR data, and implements the E-step by sequentially sampling the patients' clinical states in the reverse-time direction while conditioning on the future states. Real-time inferences are drawn via a forward-filtering algorithm that operates on a virtually constructed discrete-time embedded Markov chain that mirrors the patient's continuous-time state trajectory. We demonstrate the diagnostic and prognostic utility of the HASMM in a critical care prognosis setting using a real-world dataset for patients admitted to the Ronald Reagan UCLA Medical Center.

preprint2016arXiv

A Non-stochastic Learning Approach to Energy Efficient Mobility Management

Energy efficient mobility management is an important problem in modern wireless networks with heterogeneous cell sizes and increased node densities. We show that optimization-based mobility protocols cannot achieve long-term optimal energy consumption, particularly for ultra-dense networks (UDN). To address the complex dynamics of UDN, we propose a non-stochastic online-learning approach which does not make any assumption on the statistical behavior of the small base station (SBS) activities. In addition, we introduce handover cost to the overall energy consumption, which forces the resulting solution to explicitly minimize frequent handovers. The proposed Batched Randomization with Exponential Weighting (BREW) algorithm relies on batching to explore in bulk, and hence reduces unnecessary handovers. We prove that the regret of BREW is sublinear in time, thus guaranteeing its convergence to the optimal SBS selection. We further study the robustness of the BREW algorithm to delayed or missing feedback. Moreover, we study the setting where SBSs can be dynamically turned on and off. We prove that sublinear regret is impossible with respect to arbitrary SBS on/off, and then develop a novel learning strategy, called ranking expert (RE), that simultaneously takes into account the handover cost and the availability of SBS. To address the high complexity of RE, we propose a contextual ranking expert (CRE) algorithm that only assigns experts in a given context. Rigorous regret bounds are proved for both RE and CRE with respect to the best expert. Simulations show that not only do the proposed mobility algorithms greatly reduce the system energy consumption, but they are also robust to various dynamics which are common in practical ultra-dense wireless networks.
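The idea of combining exponential weighting with batching can be sketched generically as follows. This is an Exp3-style illustration under stated assumptions, not the BREW algorithm from the paper: the function name, the loss interface `loss_fn(t, arm)` with losses in [0, 1], and the learning rate `eta` are all hypothetical. The point it shows is that an arm (here standing in for an SBS) is sampled once per batch and held for the whole batch, so switches (handovers) can occur only at batch boundaries.

```python
import math
import random


def batched_exp_weights(loss_fn, n_arms, n_batches, batch_len, eta, seed=0):
    """Exp3-style exponential weighting with one arm choice per batch.

    loss_fn(t, arm) is assumed to return a loss in [0, 1] for playing
    `arm` at time step t. Returns the per-batch choices and log-weights.
    """
    rng = random.Random(seed)
    log_weights = [0.0] * n_arms  # log-domain weights avoid underflow
    choices = []
    for b in range(n_batches):
        # Softmax over log-weights gives the sampling distribution.
        top = max(log_weights)
        exps = [math.exp(lw - top) for lw in log_weights]
        total = sum(exps)
        probs = [x / total for x in exps]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        # Hold the same arm for the whole batch: no switch inside a batch.
        mean_loss = sum(
            loss_fn(b * batch_len + t, arm) for t in range(batch_len)
        ) / batch_len
        # Importance-weighted loss estimate, updating only the played arm.
        log_weights[arm] -= eta * mean_loss / probs[arm]
        choices.append(arm)
    return choices, log_weights


# Toy losses: arm 0 always costs nothing, every other arm costs 1.
choices, log_weights = batched_exp_weights(
    loss_fn=lambda t, arm: 0.0 if arm == 0 else 1.0,
    n_arms=3, n_batches=50, batch_len=10, eta=0.1,
)
switches = sum(1 for a, b in zip(choices, choices[1:]) if a != b)
```

With 50 batches of 10 steps each, at most 49 switches are possible over 500 time steps, which is the sense in which batching limits handovers.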

preprint2016arXiv

A Semi-Markov Switching Linear Gaussian Model for Censored Physiological Data

Critically ill patients in regular wards are vulnerable to unanticipated clinical deterioration which requires timely transfer to the intensive care unit (ICU). To allow for risk scoring and patient monitoring in such a setting, we develop a novel Semi-Markov Switching Linear Gaussian Model (SSLGM) for the inpatients' physiology. The model captures the patients' latent clinical states and their corresponding observable lab tests and vital signs. We present an efficient unsupervised learning algorithm that capitalizes on the informatively censored data in the electronic health records (EHR) to learn the parameters of the SSLGM; the learned model is then used to assess the new inpatients' risk for clinical deterioration in an online fashion, allowing for timely ICU admission. Experiments conducted on a heterogeneous cohort of 6,094 patients admitted to a large academic medical center show that the proposed model significantly outperforms the currently deployed risk scores such as the Rothman index, MEWS, SOFA and APACHE.

preprint2016arXiv

A Theory of Individualism, Collectivism and Economic Outcomes

This paper presents a dynamic model to study the impact on the economic outcomes in different societies during the Malthusian Era of individualism (time spent working alone) and collectivism (complementary time spent working with others). The model is driven by opposing forces: a greater degree of collectivism provides a higher safety net for low quality workers but a greater degree of individualism allows high quality workers to leave larger bequests. The model suggests that more individualistic societies display smaller populations, greater per capita income and greater income inequality. Some (limited) historical evidence is consistent with these predictions.

preprint2016arXiv

Adaptive Ensemble Learning with Confidence Bounds

Extracting actionable intelligence from distributed, heterogeneous, correlated and high-dimensional data sources requires run-time processing and learning both locally and globally. In the last decade, a large number of meta-learning techniques have been proposed in which local learners make online predictions based on their locally-collected data instances, and feed these predictions to an ensemble learner, which fuses them and issues a global prediction. However, most of these works do not provide performance guarantees or, when they do, these guarantees are asymptotic. None of these existing works provide confidence estimates about the issued predictions or rate of learning guarantees for the ensemble learner. In this paper, we provide a systematic ensemble learning method called Hedged Bandits, which comes with both long run (asymptotic) and short run (rate of learning) performance guarantees. Moreover, our approach yields performance guarantees with respect to the optimal local prediction strategy, and is also able to adapt its predictions in a data-driven manner. We illustrate the performance of Hedged Bandits in the context of medical informatics and show that it outperforms numerous online and offline ensemble learning methods.

preprint2016arXiv

Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition

We develop a Bayesian model for decision-making under time pressure with endogenous information acquisition. In our model, the decision maker decides when to observe (costly) information by sampling an underlying continuous-time stochastic process (time series) that conveys information about the potential occurrence or non-occurrence of an adverse event which will terminate the decision-making process. In her attempt to predict the occurrence of the adverse event, the decision-maker follows a policy that determines when to acquire information from the time series (continuation), and when to stop acquiring information and make a final prediction (stopping). We show that the optimal policy has a rendezvous structure, i.e. a structure in which whenever a new information sample is gathered from the time series, the optimal "date" for acquiring the next sample becomes computable. The optimal interval between two information samples balances a trade-off between the decision maker's surprise, i.e. the drift in her posterior belief after observing new information, and suspense, i.e. the probability that the adverse event occurs in the time interval between two information samples. Moreover, we characterize the continuation and stopping regions in the decision-maker's state-space, and show that they depend not only on the decision-maker's beliefs, but also on the context, i.e. the current realization of the time series.

preprint2016arXiv

ConfidentCare: A Clinical Decision Support System for Personalized Breast Cancer Screening

Breast cancer screening policies attempt to achieve timely diagnosis by the regular screening of apparently healthy women. Various clinical decisions are needed to manage the screening process; those include: selecting the screening tests for a woman to take, interpreting the test outcomes, and deciding whether or not a woman should be referred to a diagnostic test. Such decisions are currently guided by clinical practice guidelines (CPGs), which represent a one-size-fits-all approach that is designed to work well on average for a population, without guaranteeing that it will work well uniformly over that population. Since the risks and benefits of screening are functions of each patient's features, personalized screening policies that are tailored to the features of individuals are needed in order to ensure that the right tests are recommended to the right woman. In order to address this issue, we present ConfidentCare: a computer-aided clinical decision support system that learns a personalized screening policy from the electronic health record (EHR) data. ConfidentCare operates by recognizing clusters of similar patients, and learning the best screening policy to adopt for each cluster. A cluster of patients is a set of patients with similar features (e.g. age, breast density, family history, etc.), and the screening policy is a set of guidelines on what actions to recommend for a woman given her features and screening test scores. The ConfidentCare algorithm ensures that the policy adopted for every cluster of patients satisfies a predefined accuracy requirement with a high level of confidence. We show that our algorithm outperforms the current CPGs in terms of cost-efficiency and false positive rates.

preprint2016arXiv

Personalized Course Sequence Recommendations

Given the variability in student learning it is becoming increasingly important to tailor courses as well as course sequences to student needs. This paper presents a systematic methodology for offering personalized course sequence recommendations to students. First, a forward-search backward-induction algorithm is developed that can optimally select course sequences to decrease the time required for a student to graduate. The algorithm accounts for prerequisite requirements (typically present in higher level education) and course availability. Second, using the tools of multi-armed bandits, an algorithm is developed that can optimally recommend a course sequence that both reduces the time to graduate while also increasing the overall GPA of the student. The algorithm dynamically learns how students with different contextual backgrounds perform for given course sequences and then recommends an optimal course sequence for new students. Using real-world student data from the UCLA Mechanical and Aerospace Engineering department, we illustrate how the proposed algorithms outperform other methods that do not include student contextual information when making course sequence recommendations.

preprint2016arXiv

Personalized Donor-Recipient Matching for Organ Transplantation

Organ transplants can improve the life expectancy and quality of life for the recipient but carry the risk of serious post-operative complications, such as septic shock and organ rejection. The probability of a successful transplant depends in a very subtle fashion on compatibility between the donor and the recipient, but current medical practice is short of domain knowledge regarding the complex nature of recipient-donor compatibility. Hence a data-driven approach for learning compatibility has the potential for significant improvements in match quality. This paper proposes a novel system (ConfidentMatch) that is trained using data from electronic health records. ConfidentMatch predicts the success of an organ transplant (in terms of the 3-year survival rates) on the basis of clinical and demographic traits of the donor and recipient. ConfidentMatch captures the heterogeneity of the donor and recipient traits by optimally dividing the feature space into clusters and constructing different optimal predictive models for each cluster. The system controls the complexity of the learned predictive model in a way that allows for assuring more granular and confident predictions for a larger number of potential recipient-donor pairs, thereby ensuring that predictions are "personalized" and tailored to individual characteristics to the finest possible granularity. Experiments conducted on the UNOS heart transplant dataset show the superiority of the prognostic value of ConfidentMatch to other competing benchmarks; ConfidentMatch can provide predictions of success with 95% confidence for 5,489 patients of a total population of 9,620 patients, which corresponds to 410 more patients than the most competitive benchmark algorithm (DeepBoost).

preprint2016arXiv

Personalized Risk Scoring for Critical Care Patients using Mixtures of Gaussian Process Experts

We develop a personalized real-time risk scoring algorithm that provides timely and granular assessments for the clinical acuity of ward patients based on their (temporal) lab tests and vital signs. Heterogeneity of the patient population is captured via a hierarchical latent class model. The proposed algorithm aims to discover the number of latent classes in the patient population, and train a mixture of Gaussian Process (GP) experts, where each expert models the physiological data streams associated with a specific class. Self-taught transfer learning is used to transfer the knowledge of latent classes learned from the domain of clinically stable patients to the domain of clinically deteriorating patients. For new patients, the posterior beliefs of all GP experts about the patient's clinical status given her physiological data stream are computed, and a personalized risk score is evaluated as a weighted average of those beliefs, where the weights are learned from the patient's hospital admission information. Experiments on a heterogeneous cohort of 6,313 patients admitted to Ronald Reagan UCLA Medical Center show that our risk score outperforms the currently deployed risk scores, such as MEWS and Rothman scores.

preprint2016arXiv

Personalized Risk Scoring for Critical Care Prognosis using Mixtures of Gaussian Processes

Objective: In this paper, we develop a personalized real-time risk scoring algorithm that provides timely and granular assessments for the clinical acuity of ward patients based on their (temporal) lab tests and vital signs; the proposed risk scoring system ensures timely intensive care unit (ICU) admissions for clinically deteriorating patients. Methods: The risk scoring system learns a set of latent patient subtypes from the offline electronic health record data, and trains a mixture of Gaussian Process (GP) experts, where each expert models the physiological data streams associated with a specific patient subtype. Transfer learning techniques are used to learn the relationship between a patient's latent subtype and her static admission information (e.g. age, gender, transfer status, ICD-9 codes, etc). Results: Experiments conducted on data from a heterogeneous cohort of 6,321 patients admitted to Ronald Reagan UCLA Medical Center show that our risk score significantly and consistently outperforms the currently deployed risk scores, such as the Rothman index, MEWS, APACHE and SOFA scores, in terms of timeliness, true positive rate (TPR), and positive predictive value (PPV). Conclusion: Our results reflect the importance of adopting the concepts of personalized medicine in critical care settings; significant accuracy and timeliness gains can be achieved by accounting for the patients' heterogeneity. Significance: The proposed risk scoring methodology can confer huge clinical and social benefits on more than 200,000 critically ill inpatients who exhibit cardiac arrests in the US every year.

preprint2016arXiv

Predicting Grades

To increase efficacy in traditional classroom courses as well as in Massive Open Online Courses (MOOCs), automated systems supporting the instructor are needed. One important problem is to automatically detect students that are going to do poorly in a course early enough to be able to take remedial actions. Existing grade prediction systems focus on maximizing the accuracy of the prediction while overlooking the importance of issuing timely and personalized predictions. This paper proposes an algorithm that predicts the final grade of each student in a class. It issues a prediction for each student individually, when the expected accuracy of the prediction is sufficient. The algorithm learns online what the optimal prediction is and when to issue it, based on the past history of students' performance in a course. We derive a confidence estimate for the prediction accuracy and demonstrate the performance of our algorithm on a dataset obtained based on the performance of approximately 700 UCLA undergraduate students who have taken an introductory digital signal processing course over the past 7 years. We demonstrate that for 85% of the students we can predict with 76% accuracy whether they are going to do well or poorly in the class after the 4th course week. Using data obtained from a pilot course, our methodology suggests that it is effective to perform early in-class assessments such as quizzes, which result in timely performance prediction for each student, thereby enabling timely interventions by the instructor (at the student or class level) when necessary.

preprint2016arXiv

Reputational Learning and Network Dynamics

In many real world networks agents are initially unsure of each other's qualities and must learn about each other over time via repeated interactions. This paper is the first to provide a methodology for studying the dynamics of such networks, taking into account that agents differ from each other, that they begin with incomplete information, and that they must learn through past experiences which connections/links to form and which to break. The network dynamics in our model vary drastically from the dynamics in models of complete information. With incomplete information and learning, agents who provide high benefits will develop high reputations and remain in the network, while agents who provide low benefits will drop in reputation and become ostracized. We show, among many other things, that the information to which agents have access and the speed at which they learn and act can have a tremendous impact on the resulting network dynamics. Using our model, we can also compute the ex ante social welfare given an arbitrary initial network, which allows us to characterize the socially optimal network structures for different sets of agents. Importantly, we show through examples that the optimal network structure depends sharply on both the initial beliefs of the agents, as well as the rate of learning by the agents. Due to the potential negative consequences of ostracism, it may be necessary to place agents with lower initial reputations at less central positions within the network.

preprint2015arXiv

A Micro-foundation of Social Capital in Evolving Social Networks

A social network confers benefits and advantages on individuals (and on groups); the literature refers to these advantages as social capital. This paper presents a micro-founded mathematical model of the evolution of a social network and of the social capital of individuals within the network. The evolution of the network is influenced by the extent to which individuals are homophilic, structurally opportunistic, socially gregarious and by the distribution of types in the society. In the analysis, we identify different kinds of social capital: bonding capital, popularity capital, and bridging capital. Bonding capital is created by forming a circle of connections; homophily increases bonding capital because it makes this circle of connections more homogeneous. Popularity capital leads to preferential attachment: individuals who become popular tend to become more popular because others are more likely to link to them. Homophily creates asymmetries in the levels of popularity attained by different social groups; more gregarious types of agents are more likely to become popular. However, in homophilic societies, individuals who belong to less gregarious, less opportunistic, or major types are likely to be more central in the network and thus acquire bridging capital.

preprint2015arXiv

Contextual Online Learning for Multimedia Content Aggregation

The last decade has witnessed a tremendous growth in the volume as well as the diversity of multimedia content generated by a multitude of sources (news agencies, social media, etc.). Faced with a variety of content choices, consumers are exhibiting diverse preferences for content; their preferences often depend on the context in which they consume content as well as various exogenous events. To satisfy the consumers' demand for such diverse content, multimedia content aggregators (CAs) have emerged which gather content from numerous multimedia sources. A key challenge for such systems is to accurately predict what type of content each of its consumers prefers in a certain context, and adapt these predictions to the evolving consumers' preferences, contexts and content characteristics. We propose a novel, distributed, online multimedia content aggregation framework, which gathers content generated by multiple heterogeneous producers to fulfill its consumers' demand for content. Since both the multimedia content characteristics and the consumers' preferences and contexts are unknown, the optimal content aggregation strategy is unknown a priori. Our proposed content aggregation algorithm is able to learn online what content to gather and how to match content and users by exploiting similarities between consumer types. We prove bounds for our proposed learning algorithms that guarantee both the accuracy of the predictions as well as the learning speed. Importantly, our algorithms operate efficiently even when feedback from consumers is missing or content and preferences evolve over time. Illustrative results highlight the merits of the proposed content aggregation system in a variety of settings.

preprint2015arXiv

Distributed Interference Management Policies for Heterogeneous Small Cell Networks

We study the problem of interference management in large-scale small cell networks, where each user equipment (UE) needs to determine in a distributed manner when and at what power level it should transmit to its serving small cell base station (SBS) such that a given network performance criterion is maximized subject to minimum quality of service (QoS) requirements by the UEs. We first propose a distributed algorithm for the UE-SBS pairs to find a subset of weakly interfering UE-SBS pairs, namely the maximal independent sets (MISs) of the interference graph in logarithmic time (with respect to the number of UEs). Then we propose a novel problem formulation which enables UE-SBS pairs to determine the optimal fractions of time occupied by each MIS in a distributed manner. We analytically bound the performance of our distributed policy in terms of the competitive ratio with respect to the optimal network performance, which is obtained in a centralized manner with NP (non-deterministic polynomial time) complexity. Remarkably, the competitive ratio is independent of the network size, which guarantees scalability in terms of performance for arbitrarily large networks. Through simulations, we show that our proposed policies achieve significant performance improvements (from 150% to 700%) over the existing policies.
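To make the notion of a maximal independent set (MIS) of an interference graph concrete, here is a minimal centralized greedy sketch. It is not the distributed, logarithmic-time procedure described in the abstract; the function name and the adjacency-dictionary representation (nodes stand for UE-SBS pairs, edges for strong mutual interference) are assumptions made for the example.

```python
def greedy_maximal_independent_set(adjacency):
    """Return one maximal independent set of an undirected graph.

    adjacency maps each node to the set of its neighbours. Nodes are
    visited in sorted order; a node is selected unless a neighbour has
    already been selected.
    """
    selected = set()
    blocked = set()
    for node in sorted(adjacency):
        if node not in blocked:
            selected.add(node)
            blocked.add(node)
            blocked.update(adjacency[node])  # neighbours may not join
    return selected


# Hypothetical interference graph: a path 0 - 1 - 2 - 3.
interference = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
mis = greedy_maximal_independent_set(interference)
print(mis)  # {0, 2}
```

Pairs in the returned set do not interfere strongly with one another and could therefore transmit in the same time slot; every pair left out has at least one neighbour in the set, which is what makes the set maximal.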

preprint2015arXiv

Distributed Online Learning via Cooperative Contextual Bandits

In this paper we propose a novel framework for decentralized, online learning by many learners. At each moment of time, an instance characterized by a certain context may arrive at each learner; based on the context, the learner can select one of its own actions (which gives a reward and provides information) or request assistance from another learner. In the latter case, the requester pays a cost and receives the reward but the provider learns the information. In our framework, learners are modeled as cooperative contextual bandits. Each learner seeks to maximize the expected reward from its arrivals, which involves trading off the reward received from its own actions, the information learned from its own actions, the reward received from the actions requested of others and the cost paid for these actions - taking into account what it has learned about the value of assistance from each other learner. We develop distributed online learning algorithms and provide analytic bounds that compare their efficiency with that of the complete-knowledge (oracle) benchmark (in which the expected reward of every action in every context is known by every learner). Our estimates show that regret - the loss incurred by the algorithm - is sublinear in time. Our theoretical framework can be used in many practical applications including Big Data mining, event detection in surveillance sensor networks and distributed online recommendation systems.
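For reference, sublinear regret can be written in a generic form as follows. The symbols are assumptions introduced here and are not the paper's notation: $r^{*}_t$ denotes the expected net reward obtained by the oracle benchmark at time $t$, and $r_t$ the net reward (reward minus any cost paid for assistance) obtained by the learning algorithm.

```latex
R(T) = \sum_{t=1}^{T} r^{*}_t - \mathbb{E}\left[\sum_{t=1}^{T} r_t\right],
\qquad
\text{sublinear regret:}\quad \lim_{T\to\infty} \frac{R(T)}{T} = 0 .
```

Under this reading, sublinearity means the average per-step loss relative to the oracle vanishes as the time horizon grows.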

preprint2015arXiv

Dynamic Network Formation with Foresighted Agents

What networks can form and persist when agents are self-interested? Can such networks be efficient? A substantial theoretical literature predicts that the only networks that can form and persist must have very special shapes and that such networks cannot be efficient, but these predictions are in stark contrast to empirical findings. In this paper, we present a new model of network formation. In contrast to the existing literature, our model is dynamic (rather than static), we model agents as foresighted (rather than myopic) and we allow for the possibility that agents are heterogeneous (rather than homogeneous). We show that a very wide variety of networks can form and persist; in particular, efficient networks can form and persist if they provide every agent a strictly positive payoff. For the widely-studied connections model, we provide a full characterization of the set of efficient networks that can form and persist. Our predictions are consistent with empirical findings.

preprint2015arXiv

Efficient Interference Management Policies for Femtocell Networks

Managing interference in a network of macrocells underlaid with femtocells presents an important, yet challenging problem. A majority of spatial (frequency/time) reuse based approaches partition the users based on coloring the interference graph, which is shown to be suboptimal. Some spatial time reuse based approaches schedule the maximal independent sets (MISs) in a cyclic, (weighted) round-robin fashion, which is inefficient for delay-sensitive applications. Our proposed policies schedule the MISs in a non-cyclic fashion and aim to optimize any given network performance criterion for delay-sensitive applications while fulfilling minimum throughput requirements of the users. Importantly, we do not take the interference graph as given as in existing works; we propose an optimal construction of the interference graph. We prove that under certain conditions, the proposed policy achieves the optimal network performance. For large networks, we propose a low-complexity algorithm for computing the proposed policy. We show that the policy computed achieves a constant competitive ratio (with respect to the optimal network performance), which is independent of the network size, under a wide range of deployment scenarios. The policy can be implemented in a decentralized manner by the users. Compared to the existing policies, our proposed policies can achieve an improvement of up to 130% in large-scale deployments.

preprint2015arXiv

Evolution of Social Networks: A Microfounded Model

Many societies are organized in networks that are formed by people who meet and interact over time. In this paper, we present a first model to capture the micro-foundations of social networks evolution, where boundedly rational agents of different types join the network; meet other agents stochastically over time; and consequently decide to form social ties. A basic premise of our model is that in real-world networks, agents form links by reasoning about the benefits that agents they meet over time can bestow. We study the evolution of the emerging networks in terms of friendship and popularity acquisition given the following exogenous parameters: structural opportunism, type distribution, homophily, and social gregariousness. We show that the time needed for an agent to find "friends" is influenced by the exogenous parameters: agents who are more gregarious, more homophilic, less opportunistic, or belong to a type "minority" spend a longer time on average searching for friendships. Moreover, we show that preferential attachment is a consequence of an emerging doubly preferential meeting process: a process that guides agents of a certain type to meet more popular similar-type agents with a higher probability, thereby creating asymmetries in the popularity evolution of different types of agents.

preprint2015arXiv

From Acquaintances to Friends: Homophily and Learning in Networks

This paper considers the evolution of a network in a discrete time, stochastic setting in which agents learn about each other through repeated interactions and maintain/break links on the basis of what they learn from these interactions. Agents have homophilous preferences and limited capacity, so they maintain links with others who are learned to be similar to themselves and cut links to others who are learned to be dissimilar to themselves. Thus learning influences the evolution of the network, but learning is imperfect so the evolution is stochastic. Homophily matters. Higher levels of homophily decrease the (average) number of links that agents form. However, the effect of homophily is anomalous: mutually beneficial links may be dropped before learning is completed, thereby resulting in sparser networks and less clustering than under complete information. There may be big differences between the networks that emerge under complete and incomplete information. Homophily matters here as well: initially, greater levels of homophily increase the difference between the complete and incomplete information networks, but sufficiently high levels of homophily eventually decrease the difference. Complete and incomplete information networks differ the most when the degree of homophily is intermediate. With multiple stages of life, the effects of incomplete information are large initially but fade somewhat over time.

preprint2015arXiv

Information-Sharing over Adaptive Networks with Self-interested Agents

We examine the behavior of multi-agent networks where information-sharing is subject to a positive communications cost over the edges linking the agents. We consider a general mean-square-error formulation where all agents are interested in estimating the same target vector. We first show that, in the absence of any incentives to cooperate, the optimal strategy for the agents is to behave in a selfish manner with each agent seeking the optimal solution independently of the other agents. Pareto inefficiency arises as a result of the fact that agents are not using historical data to predict the behavior of their neighbors and to know whether they will reciprocate and participate in sharing information. Motivated by this observation, we develop a reputation protocol to summarize the opponent's past actions into a reputation score, which can then be used to form a belief about the opponent's subsequent actions. The reputation protocol entices agents to cooperate and turns their optimal strategy into an action-choosing strategy that enhances the overall social benefit of the network. In particular, we show that when the communications cost becomes large, the expected social benefit of the proposed protocol outperforms the social benefit that is obtained by cooperative agents that always share data. We perform a detailed mean-square-error analysis of the evolution of the network over three domains: far field, near-field, and middle-field, and show that the network behavior is stable for sufficiently small step-sizes. The various theoretical results are illustrated by numerical simulations.

preprint2015arXiv

Self-organizing Networks of Information Gathering Cognitive Agents

In many scenarios, networks emerge endogenously as cognitive agents establish links in order to exchange information. Network formation has been widely studied in economics, but only on the basis of simplistic models that assume that the value of each additional piece of information is constant. In this paper we present a first model and associated analysis for network formation under the much more realistic assumption that the value of each additional piece of information depends on the type of that piece of information and on the information already possessed: information may be complementary or redundant. We model the formation of a network as a non-cooperative game in which the actions are the formation of links and the benefit of forming a link is the value of the information exchanged minus the cost of forming the link. We characterize the topologies of the networks emerging at a Nash equilibrium (NE) of this game and compare the efficiency of equilibrium networks with the efficiency of centrally designed networks. To quantify the impact of information redundancy and linking cost on social information loss, we provide estimates for the Price of Anarchy (PoA); to quantify the impact on individual information loss we introduce and provide estimates for a measure we call Maximum Information Loss (MIL). Finally, we consider the setting in which agents are not endowed with information, but must produce it. We show that the validity of the well-known "law of the few" depends on how information aggregates; in particular, the "law of the few" fails when information displays complementarities.

preprint2015arXiv

The user base dynamics of websites

In this work we study for the first time the interaction between marketing and network effects. We build a model in which the online firm starts with an initial user base and controls the growth of the user base by choosing the intensity of advertisements and referrals to potential users. A large user base provides more profits to the online firm, but building a large user base through advertisements and referrals is costly; therefore, the optimal policy must balance the marginal benefits of adding users against the marginal costs of sending advertisements and referrals. Our work offers three main insights: (1) The optimal policy prescribes that a new online firm should offer many advertisements and referrals initially, but then it should decrease advertisements and referrals over time. (2) If the network effects decrease, then the change in the optimal policy depends heavily on two factors: (i) the level of patience of the online firm, where patient online firms are oriented towards long term profits and impatient online firms are oriented towards short term profits, and (ii) the size of the user base. If the online firm is very patient (impatient) and if the network effects decrease, then the optimal policy prescribes it to be more (less) aggressive in posting advertisements and referrals at low user base levels and less (more) aggressive in posting advertisements and referrals at high user base levels. (3) The change in the optimal policy when network effects decrease also depends heavily on the heterogeneity in the user base, as measured in terms of the revenue generated by each user. An online firm that generates most of its revenue from a core group of users should be more aggressive and protective of its user base than a firm that generates revenue uniformly from its users.

preprint2014arXiv

A Dynamic Network Formation Model for Understanding Bacterial Self-Organization into Micro-Colonies

We propose a general parametrizable model to capture the dynamic interaction among bacteria in the formation of micro-colonies. Micro-colonies represent the first social step towards the formation of structured multicellular communities known as bacterial biofilms, which protect the bacteria against antimicrobials. In our model, bacteria can form links in the form of intercellular adhesins (such as polysaccharides) to collaborate in the production of resources that are fundamental to protect them against antimicrobials. Since maintaining a link can be costly, we assume that each bacterium forms and maintains a link only if the benefit received from the link is larger than the cost, and we formalize the interaction among bacteria as a dynamic network formation game. We rigorously characterize some of the key properties of the network evolution depending on the parameters of the system. In particular, we derive the parameters under which it is guaranteed that all bacteria will join micro-colonies and the parameters under which it is guaranteed that some bacteria will not join micro-colonies. Importantly, our study does not only characterize the properties of networks emerging in equilibrium, but it also provides important insights on how the network dynamically evolves and on how the formation history impacts the emerging networks in equilibrium. This analysis can be used to develop methods to influence on-the-fly the evolution of the network, and such methods can be useful to treat or prevent biofilm-related diseases.

preprint2014arXiv

Adaptive Prioritized Random Linear Coding and Scheduling for Layered Data Delivery from Multiple Servers

In this paper, we deal with the problem of jointly determining the optimal coding strategy and the scheduling decisions when receivers obtain layered data from multiple servers. The layered data is encoded by means of Prioritized Random Linear Coding (PRLC) in order to be resilient to channel loss while respecting the unequal levels of importance in the data, and data blocks are transmitted simultaneously in order to reduce decoding delays and improve the delivery performance. We formulate the optimal coding and scheduling decisions problem in our novel framework with the help of Markov Decision Processes (MDP), which are effective tools for modeling adaptive streaming systems. Reinforcement learning approaches are then proposed to derive reduced computational complexity solutions to the adaptive coding and scheduling problems. The novel reinforcement learning approaches and the MDP solution are examined in an illustrative example for scalable video transmission. Our methods offer large performance gains over competing methods that deliver the data blocks sequentially. The experimental evaluation also shows that our novel algorithms offer continuous playback and guarantee small quality variations, which is not the case for baseline solutions. Finally, our work highlights the advantages of reinforcement learning algorithms to forecast the temporal evolution of data demands and to make the optimal coding and scheduling decisions.

preprint2014arXiv

Distributed Online Learning in Social Recommender Systems

In this paper, we consider decentralized sequential decision making in distributed online recommender systems, where items are recommended to users based on their search query as well as their specific background including history of bought items, gender and age, all of which comprise the context information of the user. In contrast to centralized recommender systems, in which there is a single centralized seller who has access to the complete inventory of items as well as the complete record of sales and user information, in decentralized recommender systems each seller/learner only has access to the inventory of items and user information for its own products and not the products and user information of other sellers, but can get commission if it sells an item of another seller. Therefore the sellers must distributedly find out for an incoming user which items to recommend (from the set of own items or items of another seller), in order to maximize the revenue from own sales and commissions. We formulate this problem as a cooperative contextual bandit problem, analytically bound the performance of the sellers compared to the best recommendation strategy given the complete realization of user arrivals and the inventory of items, as well as the context-dependent purchase probabilities of each item, and verify our results via numerical examples on a distributed data set adapted based on Amazon data. We evaluate the dependence of the performance of a seller on the inventory of items the seller has, the number of connections it has with the other sellers, and the commissions which the seller gets by selling items of other sellers to its users.

preprint2014arXiv

Dynamic Network Formation with Incomplete Information

How do networks form and what is their ultimate topology? Most of the literature that addresses these questions assumes complete information: agents know in advance the value of linking to other agents, even with agents they have never met and with whom they have had no previous interaction (direct or indirect). This paper addresses the same questions under what seems to us to be the much more natural assumption of incomplete information: agents do not know in advance -- but must learn -- the value of linking to agents they have never met. We show that the assumption of incomplete information has profound implications for the process of network formation and the topology of networks that ultimately form. Under complete information, the networks that form and are stable typically have a star, wheel or core-periphery form, with high-value agents in the core. Under incomplete information, the presence of positive externalities (the value of indirect links) implies that a much wider collection of network topologies can emerge and be stable. Moreover, even when the topologies that emerge are the same, the locations of agents can be very different. For instance, when information is incomplete, it is possible for a hub-and-spokes network with a low-value agent in the center to form and endure permanently: an agent can achieve a central position purely as the result of chance rather than as the result of merit. Perhaps even more strikingly: when information is incomplete, a connected network could form and persist even if, when information were complete, no links would ever form, so that the final form would be a totally disconnected network. All of this can occur even in settings where agents eventually learn everything so that information, although initially incomplete, eventually becomes complete.

preprint2014arXiv

Energy-Efficient Nonstationary Spectrum Sharing

We develop a novel design framework for energy-efficient spectrum sharing among autonomous users who aim to minimize their energy consumptions subject to minimum throughput requirements. Most existing works proposed stationary spectrum sharing policies, in which users transmit at fixed power levels. Since users transmit simultaneously under stationary policies, to fulfill minimum throughput requirements, they need to transmit at high power levels to overcome interference. To improve energy efficiency, we construct nonstationary spectrum sharing policies, in which the users transmit at time-varying power levels. Specifically, we focus on TDMA (time-division multiple access) policies in which one user transmits at each time (but not in a round-robin fashion). The proposed policy can be implemented by each user running a low-complexity algorithm in a decentralized manner. It achieves high energy efficiency even when the users have erroneous and binary feedback about their interference levels. Moreover, it can adapt to the dynamic entry and exit of users. The proposed policy is also deviation-proof, namely autonomous users will find it in their self-interests to follow it. Compared to existing policies, the proposed policy can achieve an energy saving of up to 90% when the number of users is high.

preprint2014arXiv

eTutor: Online Learning for Personalized Education

Given recent advances in information technology and artificial intelligence, web-based education systems have become complementary and, in some cases, viable alternatives to traditional classroom teaching. The popularity of these systems stems from their ability to make education available to a large demographic (see MOOCs). However, existing systems do not take advantage of the personalization which becomes possible when web-based education is offered: they continue to be one-size-fits-all. In this paper, we aim to provide a first systematic method for designing a personalized web-based education system. Personalizing education is challenging: (i) students need to be provided personalized teaching and training depending on their contexts (e.g. classes already taken, methods of learning preferred, etc.), (ii) for each specific context, the best teaching and training method (e.g. type and order of teaching materials to be shown) must be learned, (iii) teaching and training should be adapted online, based on the scores/feedback (e.g. tests, quizzes, final exam, likes/dislikes etc.) of the students. Our personalized online system, eTutor, is able to address these challenges by learning how to adapt the teaching methodology (in this case what sequence of teaching material to present to a student) to maximize the student's performance in the final exam, while minimizing the time spent by the student to learn the course (and possibly dropouts). We illustrate the efficiency of the proposed method on a real-world eTutor platform which is used for remedial training for a Digital Signal Processing (DSP) course.

preprint2014arXiv

Forecasting Popularity of Videos using Social Media

This paper presents a systematic online prediction method (Social-Forecast) that is capable of accurately forecasting the popularity of videos promoted by social media. Social-Forecast explicitly considers the dynamically changing and evolving propagation patterns of videos in social media when making popularity forecasts, thereby being situation and context aware. Social-Forecast aims to maximize the forecast reward, which is defined as a tradeoff between the popularity prediction accuracy and the timeliness with which a prediction is issued. The forecasting is performed online and requires no training phase or a priori knowledge. We analytically bound the prediction performance loss of Social-Forecast as compared to that obtained by an omniscient oracle and prove that the bound is sublinear in the number of video arrivals, thereby guaranteeing its short-term performance as well as its asymptotic convergence to the optimal performance. In addition, we conduct extensive experiments using real-world data traces collected from the videos shared in RenRen, one of the largest online social networks in China. These experiments show that our proposed method outperforms existing view-based approaches for popularity prediction (which are not context-aware) by more than 30% in terms of prediction rewards.

preprint2014arXiv

Foresighted Demand Side Management

We consider a smart grid with an independent system operator (ISO), and distributed aggregators who have energy storage and purchase energy from the ISO to serve their customers. All the entities in the system are foresighted: each aggregator seeks to minimize its own long-term payments for energy purchase and operational costs of energy storage by deciding how much energy to buy from the ISO, and the ISO seeks to minimize the long-term total cost of the system (e.g. energy generation costs and the aggregators' costs) by dispatching the energy production among the generators. The decision making of the entities is complicated for two reasons. First, the information is decentralized: the ISO does not know the aggregators' states (i.e. their energy consumption requests from customers and the amount of energy in their storage), and each aggregator does not know the other aggregators' states or the ISO's state (i.e. the energy generation costs and the status of the transmission lines). Second, the coupling among the aggregators is unknown to them. Specifically, each aggregator's energy purchase affects the price, and hence the payments of the other aggregators. However, none of them knows how its decision influences the price because the price is determined by the ISO based on its state. We propose a design framework in which the ISO provides each aggregator with a conjectured future price, and each aggregator distributively minimizes its own long-term cost based on its conjectured price as well as its local information. The proposed framework can achieve the social optimum despite being decentralized and involving complex coupling among the various entities.

preprint2014arXiv

Global Bandits with Hölder Continuity

Standard Multi-Armed Bandit (MAB) problems assume that the arms are independent. However, in many application scenarios, the information obtained by playing an arm provides information about the remainder of the arms. Hence, in such applications, this informativeness can and should be exploited to enable faster convergence to the optimal solution. In this paper, we introduce and formalize the Global MAB (GMAB), in which arms are globally informative through a global parameter, i.e., choosing an arm reveals information about all the arms. We propose a greedy policy for the GMAB which always selects the arm with the highest estimated expected reward, and prove that it achieves bounded parameter-dependent regret. Hence, this policy selects suboptimal arms only finitely many times, and after a finite number of initial time steps, the optimal arm is selected in all of the remaining time steps with probability one. In addition, we also study how the informativeness of the arms about each other's rewards affects the speed of learning. Specifically, we prove that the parameter-free (worst-case) regret is sublinear in time, and decreases with the informativeness of the arms. We also prove a sublinear in time Bayesian risk bound for the GMAB which reduces to the well-known Bayesian risk bound for linearly parameterized bandits when the arms are fully informative. GMABs have applications ranging from drug and treatment discovery to dynamic pricing.
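The greedy idea described above can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration rather than the paper's construction: the global parameter `theta` lies in [0, 1], each arm's mean reward is a known invertible function of `theta`, per-arm estimates of `theta` are combined weighted by play counts, and the three example arms in `ARMS` are hypothetical.

```python
import random

# Hypothetical arms: (mean reward as a function of theta, inverse of that function).
ARMS = [
    (lambda th: th, lambda r: r),
    (lambda th: 1.0 - th, lambda r: 1.0 - r),
    (lambda th: 0.4 + 0.2 * th, lambda r: (r - 0.4) / 0.2),
]


def clip(x, lo=0.0, hi=1.0):
    """Keep an estimate inside the assumed parameter range [0, 1]."""
    return max(lo, min(hi, x))


def run_greedy(theta_true, horizon, noise=0.05, seed=0):
    """Simulate a greedy policy and return the sequence of chosen arms."""
    rng = random.Random(seed)
    counts = [0] * len(ARMS)
    sums = [0.0] * len(ARMS)
    picks = []
    for _ in range(horizon):
        total = sum(counts)
        if total == 0:
            arm = 0  # arbitrary first pull; no estimate exists yet
        else:
            # Each played arm yields its own estimate of theta by inverting
            # its sample-mean reward; combine them weighted by play counts.
            theta_hat = sum(
                counts[k] * clip(ARMS[k][1](sums[k] / counts[k]))
                for k in range(len(ARMS)) if counts[k] > 0
            ) / total
            # Greedy step: pick the arm with the highest estimated mean reward.
            arm = max(range(len(ARMS)), key=lambda k: ARMS[k][0](theta_hat))
        # Observe a noisy reward from the chosen arm.
        reward = ARMS[arm][0](theta_true) + rng.gauss(0.0, noise)
        counts[arm] += 1
        sums[arm] += reward
        picks.append(arm)
    return picks
```

With `noise=0.0` and `theta_true=0.2`, the sketch pulls arm 0 once, learns the parameter from that single reward, and then settles on arm 1. That illustrates how one arm's reward can be informative about all the others.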

preprint2014arXiv

Incentive Design in Peer Review: Rating and Repeated Endogenous Matching

Peer review (e.g., grading assignments in Massive Open Online Courses (MOOCs), academic paper review) is an effective and scalable method to evaluate the products (e.g., assignments, papers) of a large number of agents when the number of dedicated reviewing experts (e.g., teaching assistants, editors) is limited. Peer review poses two key challenges: 1) identifying the reviewers' intrinsic capabilities (i.e., adverse selection) and 2) incentivizing the reviewers to exert high effort (i.e., moral hazard). Some works in mechanism design address pure adverse selection using one-shot matching rules, and pure moral hazard was addressed in repeated games with exogenously given and fixed matching rules. However, in peer review systems exhibiting both adverse selection and moral hazard, one-shot or exogenous matching rules do not link agents' current behavior with future matches and future payoffs, and as we prove, will induce myopic behavior (i.e., exerting the lowest effort) resulting in the lowest review quality. In this paper, we propose for the first time a solution that simultaneously solves adverse selection and moral hazard. Our solution exploits the repeated interactions of agents, utilizes ratings to summarize agents' past review quality, and designs matching rules that endogenously depend on agents' ratings. Our proposed matching rules are easy to implement and require no knowledge about agents' private information (e.g., their benefit and cost functions). Yet, they are effective in guiding the system to an equilibrium where the agents are incentivized to exert high effort and receive ratings that precisely reflect their review quality. Using several illustrative examples, we quantify the significant performance gains obtained by our proposed mechanism as compared to existing one-shot or exogenous matching rules.

preprint2014arXiv

Jamming Bandits

Can an intelligent jammer learn and adapt to unknown environments in an electronic warfare-type scenario? In this paper, we answer this question in the positive, by developing a cognitive jammer that adaptively and optimally disrupts the communication between a victim transmitter-receiver pair. We formalize the problem using a novel multi-armed bandit framework where the jammer can choose various physical layer parameters such as the signaling scheme, power level and the on-off/pulsing duration in an attempt to obtain power efficient jamming strategies. We first present novel online learning algorithms to maximize the jamming efficacy against static transmitter-receiver pairs and prove that our learning algorithm converges to the optimal (in terms of the error rate inflicted at the victim and the energy used) jamming strategy. Even more importantly, we prove that the rate of convergence to the optimal jamming strategy is sub-linear, i.e. the learning is fast in comparison to existing reinforcement learning algorithms, which is particularly important in dynamically changing wireless environments. Also, we characterize the performance of the proposed bandit-based learning algorithm against multiple static and adaptive transmitter-receiver pairs.

preprint2014arXiv

Non-stationary Resource Allocation Policies for Delay-constrained Video Streaming: Application to Video over Internet-of-Things-enabled Networks

Due to the high bandwidth requirements and stringent delay constraints of multi-user wireless video transmission applications, ensuring that all video senders have sufficient transmission opportunities to use before their delay deadlines expire is a longstanding research problem. We propose a novel solution that addresses this problem without assuming detailed packet-level knowledge, which is unavailable at resource allocation time. Instead, we translate the transmission delay deadlines of each sender's video packets into a monotonically-decreasing weight distribution within the considered time horizon. Higher weights are assigned to the slots that have higher probability for deadline-abiding delivery. Given the sets of weights of the senders' video streams, we propose the low-complexity Delay-Aware Resource Allocation (DARA) approach to compute the optimal slot allocation policy that maximizes the deadline-abiding delivery of all senders. A unique characteristic of the DARA approach is that it yields a non-stationary slot allocation policy that depends on the allocation of previous slots. We prove that the DARA approach is optimal for weight distributions that are exponentially decreasing in time. We further implement our framework for real-time video streaming in wireless personal area networks that are gaining significant traction within the new Internet-of-Things (IoT) paradigm. For multiple surveillance videos encoded with H.264/AVC and streamed via the 6tisch framework that simulates the IoT-oriented IEEE 802.15.4e TSCH medium access control, our solution is shown to be the only one that ensures all video bitstreams are delivered with acceptable quality in a deadline-abiding manner.

preprint2014arXiv

To Relay or Not to Relay: Learning Device-to-Device Relaying Strategies in Cellular Networks

We consider a cellular network where mobile transceiver devices that are owned by self-interested users are incentivized to cooperate with each other using tokens, which they exchange electronically to "buy" and "sell" downlink relay services, thereby increasing the network's capacity compared to a network that only supports base station-to-device (B2D) communications. We investigate how an individual device in the network can learn its optimal cooperation policy online, which it uses to decide whether or not to provide downlink relay services for other devices in exchange for tokens. We propose a supervised learning algorithm that devices can deploy to learn their optimal cooperation strategies online given their experienced network environment. We then systematically evaluate the learning algorithm in various deployment scenarios. Our simulation results suggest that devices have the greatest incentive to cooperate when the network contains (i) many devices with high energy budgets for relaying, (ii) many highly mobile users (e.g., users in motor vehicles), and (iii) neither too few nor too many tokens. Additionally, within the token system, self-interested devices can effectively learn to cooperate online, and achieve over 20% higher throughput on average than with B2D communications alone, all while selfishly maximizing their own utilities.

preprint2014arXiv

Towards a Theory of Societal Co-Evolution: Individualism versus Collectivism

Substantial empirical research has shown that the level of individualism vs. collectivism is one of the most critical and important determinants of societal traits, such as economic growth, economic institutions and health conditions. But the exact nature of this impact has thus far not been well understood in an analytical setting. In this work, we develop one of the first theoretical models that analytically studies the impact of individualism-collectivism on the society. We model the growth of an individual's welfare (wealth, resources and health) as depending not only on himself, but also on the level of collectivism, i.e. the level of dependence on the rest of the individuals in the society, which leads to a co-evolutionary setting. Based on our model, we are able to predict the impact of individualism-collectivism on various societal metrics, such as average welfare, average life-time, total population, cumulative welfare and average inequality. We analytically show that individualism has a positive impact on average welfare and cumulative welfare, but comes with the drawbacks of lower average life-time, lower total population and higher average inequality.

preprint2013arXiv

Decentralized Online Big Data Classification - a Bandit Framework

Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework where data is gathered by distributed data sources and processed by a heterogeneous set of distributed learners which learn online, at run-time, how to classify the different data streams either by using their locally available classification functions or by helping each other by classifying each other's data. Importantly, since the data is gathered at different locations, sending the data to another learner to process incurs additional costs such as delays, and hence this is beneficial only if the benefits obtained from a better classification exceed the costs. We assume that the classification functions available to each processing element are fixed, but their prediction accuracies for various types of incoming data are unknown and can change dynamically over time, and thus they need to be learned online. We model the problem of joint classification by the distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem where each data instance is characterized by a specific context. We develop distributed online learning algorithms for which we can prove that they have sublinear regret. Compared to prior work in distributed online data mining, our work is the first to provide analytic regret results characterizing the performance of the proposed algorithms.

preprint2013arXiv

Demand Side Management in Smart Grids using a Repeated Game Framework

Demand side management (DSM) is a key solution for reducing the peak-time power consumption in smart grids. To provide incentives for consumers to shift their consumption to off-peak times, the utility company charges consumers differential pricing for using power at different times of the day. Consumers take into account these differential prices when deciding when and how much power to consume daily. Importantly, while consumers enjoy lower billing costs when shifting their power usage to off-peak times, they also incur discomfort costs due to the altering of their power consumption patterns. Existing works propose stationary strategies for the myopic consumers to minimize their short-term billing and discomfort costs. In contrast, we model the interaction emerging among self-interested, foresighted consumers as a repeated energy scheduling game and prove that the stationary strategies are suboptimal in terms of long-term total billing and discomfort costs. Subsequently, we propose a novel framework for determining optimal nonstationary DSM strategies, in which consumers can choose different daily power consumption patterns depending on their preferences, routines, and needs. As a direct consequence of the nonstationary DSM policy, different subsets of consumers are allowed to use power in peak times at a low price. The subset of consumers that are selected daily to have their joint discomfort and billing costs minimized is determined based on the consumers' power consumption preferences as well as on the past history of which consumers have shifted their usage previously. Importantly, we show that the proposed strategies are incentive-compatible. Simulations confirm that, given the same peak-to-average ratio, the proposed strategy can reduce the total cost (billing and discomfort costs) by up to 50% compared to existing DSM strategies.

preprint2013arXiv

Designing Efficient Resource Sharing For Impatient Players Using Limited Monitoring

The problem of efficient sharing of a resource is nearly ubiquitous. Except for pure public goods, each agent's use creates a negative externality; often the negative externality is so strong that efficient sharing is impossible in the short run. We show that, paradoxically, the impossibility of efficient sharing in the short run enhances the possibility of efficient sharing in the long run, even if outcomes depend stochastically on actions, monitoring is limited and users are not patient. We base our analysis on the familiar framework of repeated games with imperfect public monitoring, but we extend the framework to view the monitoring structure as chosen by a designer who balances the benefits and costs of more accurate observations and reports. Our conclusions are much stronger than in the usual folk theorems: we do not require a rich signal structure or patient users and provide an explicit online construction of equilibrium strategies.

preprint2013arXiv

Distributed Online Big Data Classification Using Context Information

Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework where data is gathered by distributed data sources and processed by a heterogeneous set of distributed learners which learn online, at run-time, how to classify the different data streams either by using their locally available classification functions or by helping each other by classifying each other's data. Importantly, since the data is gathered at different locations, sending the data to another learner to process incurs additional costs such as delays, and hence this is beneficial only if the benefits obtained from a better classification exceed the costs. We model the problem of joint classification by the distributed and heterogeneous learners from multiple data sources as a distributed contextual bandit problem where each data instance is characterized by a specific context. We develop a distributed online learning algorithm for which we can prove sublinear regret. Compared to prior work in distributed online data mining, our work is the first to provide analytic regret results characterizing the performance of the proposed algorithm.

preprint2013arXiv

Ensemble of Distributed Learners for Online Classification of Dynamic Data Streams

We present an efficient distributed online learning scheme to classify data captured from distributed, heterogeneous, and dynamic data sources. Our scheme consists of multiple distributed local learners that analyze different streams of data that are correlated to a common event that needs to be classified. Each learner uses a local classifier to make a local prediction. The local predictions are then collected by each learner and combined using a weighted majority rule to output the final prediction. We propose a novel online ensemble learning algorithm to update the aggregation rule in order to adapt to the underlying data dynamics. We rigorously determine a bound for the worst-case misclassification probability of our algorithm, which depends on the misclassification probabilities of the best static aggregation rule and of the best local classifier. Importantly, the worst-case misclassification probability of our algorithm tends asymptotically to 0 if the misclassification probability of the best static aggregation rule or the misclassification probability of the best local classifier tends to 0. Then we extend our algorithm to address challenges specific to the distributed implementation and we prove new bounds that apply to these settings. Finally, we test our scheme by performing an evaluation study on several data sets. When applied to data sets widely used in the literature dealing with dynamic data streams and concept drift, our scheme exhibits performance gains ranging from 34% to 71% with respect to state-of-the-art solutions.
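The weighted majority rule mentioned in this abstract can be sketched as follows. The update shown is the classic multiplicative Weighted Majority rule, used here only as a stand-in, since the abstract does not specify the paper's own update. The function names, the label set {-1, +1} and the penalty factor `beta` are assumptions.

```python
def weighted_majority_predict(weights, local_predictions):
    """Combine local predictions in {-1, +1} by the sign of their weighted sum."""
    score = sum(w * p for w, p in zip(weights, local_predictions))
    return 1 if score >= 0 else -1  # ties are broken toward +1


def update_weights(weights, local_predictions, label, beta=0.5):
    """Multiply the weight of every learner that erred by beta (0 < beta < 1)."""
    return [w * beta if p != label else w for w, p in zip(weights, local_predictions)]


def run_stream(stream, n_learners, beta=0.5):
    """Process (local_predictions, label) pairs; return final weights and ensemble mistakes."""
    weights = [1.0] * n_learners
    mistakes = 0
    for local_predictions, label in stream:
        if weighted_majority_predict(weights, local_predictions) != label:
            mistakes += 1
        weights = update_weights(weights, local_predictions, label, beta)
    return weights, mistakes


if __name__ == "__main__":
    # Learner 0 is always right; learners 1 and 2 are always wrong.
    labels = [1, -1, 1, -1]
    stream = [([y, -y, -y], y) for y in labels]
    print(run_stream(stream, n_learners=3))
```

In the small usage example, the two unreliable learners outvote the reliable one at first, but their weights shrink after each error and the ensemble then follows the reliable learner.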

preprint2013arXiv

Incentive Design for Direct Load Control Programs

We study the problem of optimal incentive design for voluntary participation of electricity customers in a Direct Load Scheduling (DLS) program, a new form of Direct Load Control (DLC) based on a three-way communication protocol between customers, embedded controls in flexible appliances, and the central entity in charge of the program. Participation decisions are made in real time on an event-based basis, with every customer that needs to use a flexible appliance considering whether to join the program given current incentives. Customers have different perceptions of the level of risk associated with handing over control of the consumption schedule of their devices to an operator, and these risk levels are only privately known. The operator maximizes its expected profit from operating the DLS program by posting the right participation incentives for different appliance types in a publicly available and dynamically updated table. Customers are then faced with the dynamic decision-making problem of whether to take the incentives and participate or not. We define an optimization framework to determine the profit-maximizing incentives for the operator. In doing so, we also investigate the utility that the operator expects to gain from recruiting different types of devices. These utilities also provide an upper bound on the benefits that can be attained from any type of demand response program.

preprint2013arXiv

Information Sharing in Networks of Strategic Agents

To ensure that social networks (e.g. opinion consensus, cooperative estimation, distributed learning and adaptation etc.) proliferate and efficiently operate, the participating agents need to collaborate with each other by repeatedly sharing information. However, sharing information is often costly for the agents while resulting in no direct immediate benefit for them. Hence, lacking incentives to collaborate, strategic agents who aim to maximize their own individual utilities will withhold rather than share information, leading to inefficient operation or even collapse of networks. In this paper, we develop a systematic framework for designing distributed rating protocols aimed at incentivizing the strategic agents to collaborate with each other by sharing information. The proposed incentive protocols exploit the ongoing nature of the agents' interactions to assign ratings and through them, determine future rewards and punishments: agents that have behaved as directed enjoy high ratings -- and hence greater future access to the information of others; agents that have not behaved as directed enjoy low ratings -- and hence less future access to the information of others. Unlike existing rating protocols, the proposed protocol operates in a distributed manner, online, and takes into consideration the underlying interconnectivity of agents as well as their heterogeneity. We prove that in many deployment scenarios the price of anarchy (PoA) obtained by adopting the proposed rating protocols is one. In settings in which the PoA is larger than one, we show that the proposed rating protocol still significantly outperforms existing incentive mechanisms such as Tit-for-Tat. Importantly, the proposed rating protocols can also operate efficiently in deployment scenarios where the strategic agents interact over time-varying network topologies where new agents join the network over time.

preprint2013arXiv

Optimal Foresighted Multi-User Wireless Video

Recent years have seen an explosion in wireless video communication systems. Optimization in such systems is crucial - but most existing methods intended to optimize the performance of multi-user wireless video transmission are inefficient. Some works (e.g. Network Utility Maximization (NUM)) are myopic: they choose actions to maximize instantaneous video quality while ignoring the future impact of these actions. Such myopic solutions are known to be inferior to foresighted solutions that optimize the long-term video quality. Alternatively, foresighted solutions such as rate-distortion optimized packet scheduling focus on single-user wireless video transmission, while ignoring the resource allocation among the users. In this paper, we propose an optimal solution for performing joint foresighted resource allocation and packet scheduling among multiple users transmitting video over a shared wireless network. A key challenge in developing foresighted solutions for multiple video users is that the users' decisions are coupled. To decouple the users' decisions, we adopt a novel dual decomposition approach, which differs from the conventional optimization solutions such as NUM, and determines foresighted policies. Specifically, we propose an informationally-decentralized algorithm in which the network manager updates resource "prices" (i.e. the dual variables associated with the resource constraints), and the users make individual video packet scheduling decisions based on these prices. Because a priori knowledge of the system dynamics is almost never available at run-time, the proposed solution can learn online, concurrently with performing the foresighted optimization. Simulation results show 7 dB and 3 dB improvements in Peak Signal-to-Noise Ratio (PSNR) over myopic solutions and existing foresighted solutions, respectively.
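The price-update mechanism described in this abstract, where a network manager adjusts dual variables and users respond individually, can be illustrated on a deliberately simplified static problem. The sketch below is not the paper's foresighted algorithm: it assumes a single shared resource, a hypothetical utility `weight * log(1 + x)` per user, and a projected subgradient step on the price. All function names and parameters are assumptions.

```python
def user_demand(weight, price):
    """Best response of a user maximizing weight*log(1+x) - price*x (toy assumption)."""
    return max(0.0, weight / price - 1.0)


def update_price(price, total_demand, capacity, step=0.05, floor=1e-6):
    """Projected subgradient step on the dual variable of the capacity constraint."""
    return max(floor, price + step * (total_demand - capacity))


def solve_prices(weights, capacity, price=0.5, iterations=300):
    """Alternate user best responses and manager price updates; return (price, demands)."""
    for _ in range(iterations):
        demands = [user_demand(w, price) for w in weights]
        price = update_price(price, sum(demands), capacity)
    demands = [user_demand(w, price) for w in weights]
    return price, demands


if __name__ == "__main__":
    price, demands = solve_prices(weights=[1.0, 2.0, 3.0], capacity=3.0)
    print("price:", round(price, 4), "demands:", [round(d, 4) for d in demands])
```

The manager only needs the aggregate demand and each user only needs the current price, which is the informational decentralization the abstract refers to. For these toy weights the price settles near 1, where total demand matches the capacity of 3.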

preprint2013arXiv

Socially-Optimal Design of Service Exchange Platforms with Imperfect Monitoring

In service exchange platforms, anonymous users exchange services with each other: clients request services and are matched to servers who provide services. Because providing good-quality services requires effort, in any single interaction a server will have no incentive to exert effort and will shirk. We show that if current servers will later become clients and want good-quality services, shirking can be eliminated by rating protocols, which maintain ratings for each user, prescribe behavior in each client-server interaction, and update ratings based on whether observed/reported behavior conforms with prescribed behavior. The rating protocols proposed are the first to achieve social optimum even when observation/reporting is imperfect (quality is incorrectly assessed/reported or reports are lost). The proposed protocols are remarkably simple, requiring only binary ratings and three possible prescribed behaviors. Key to the efficacy of the proposed protocols is that they are nonstationary, and tailor prescriptions to both current and past rating distributions.

preprint2012arXiv

Designing Information Revelation and Intervention with an Application to Flow Control

There are many familiar situations in which a manager seeks to design a system in which users share a resource, but outcomes depend on the information held and actions taken by users. If communication is possible, the manager can ask users to report their private information and then, using this information, instruct them on what actions they should take. If the users are compliant, this reduces the manager's optimization problem to a well-studied problem of optimal control. However, if the users are self-interested and not compliant, the problem is much more complicated: when asked to report their private information, the users might lie; upon receiving instructions, the users might disobey. Here we ask whether the manager can design the system to get around both of these difficulties. To do so, the manager must provide the users with incentives to report truthfully and to follow the instructions, despite the fact that the users are self-interested. For a class of environments that includes many resource allocation games in communication networks, we provide tools for the manager to design an efficient system. In addition to reports and recommendations, the design we employ allows the manager to intervene in the system after the users take actions. In an abstracted environment, we find conditions under which the manager can achieve the same outcome it could if users were compliant, and conditions under which it cannot. We then apply our framework and results to design a flow control management system.

preprint2012arXiv

Designing Practical Distributed Exchange for Online Communities

In many online systems, individuals provide services for each other; the recipient of the service obtains a benefit but the provider of the service incurs a cost. If benefit exceeds cost, provision of the service increases social welfare and should therefore be encouraged -- but the individuals providing the service gain no (immediate) benefit from providing the service and hence have an incentive to withhold service. Hence there is scope for designing a system that improves welfare by encouraging exchange. To operate successfully within the confines of the online environment, such a system should be distributed, practicable, and consistent with individual incentives. This paper proposes and analyzes a simple such system that relies on the exchange of tokens; the emphasis is on the design of a protocol (number of tokens and suggested strategies). We provide estimates for the efficiency of such protocols and show that choosing the right protocol will lead to almost full efficiency if agents are sufficiently patient. However, choosing the wrong protocols may lead to an enormous loss of efficiency.

preprint2012arXiv

Designing Rating Systems to Promote Mutual Security for Interconnected Networks

Interconnected autonomous systems (ASs) often share security risks. However, an AS lacks the incentive to make (sufficient) security investments if the cost exceeds its own benefit, even though doing so would be socially beneficial. In this paper, we develop a systematic and rigorous framework for analyzing and significantly improving the mutual security of a collection of ASs that interact frequently over a long period of time. Using this framework, we show that simple incentive schemes based on rating systems can be designed to encourage the ASs' security investments, thereby significantly improving their mutual security.

preprint2012arXiv

Dynamic Spectrum Sharing Among Repeatedly Interacting Selfish Users With Imperfect Monitoring

We develop a novel design framework for dynamic distributed spectrum sharing among secondary users (SUs) who adjust their power levels to compete for spectrum opportunities while satisfying the interference temperature (IT) constraints imposed by primary users. The considered interaction among the SUs is characterized by the following three features. First, since the SUs are decentralized, they are selfish and aim to maximize their own long-term payoffs from utilizing the network rather than obeying the prescribed allocation of a centralized controller. Second, the SUs interact with each other repeatedly and they can coexist in the system for a long time. Third, the SUs have limited and imperfect monitoring ability: they only observe whether the IT constraints are violated, and their observation is imperfect due to the erroneous measurements. To capture these features, we model the interaction of the SUs as a repeated game with imperfect monitoring. We first characterize the set of Pareto optimal payoffs that can be achieved by deviation-proof spectrum sharing policies, which are policies that the selfish users find it in their interest to comply with. Next, for any given payoff in this set, we show how to construct a deviation-proof policy to achieve it. The constructed deviation-proof policy is amenable to distributed implementation, and allows users to transmit in a time-division multiple-access (TDMA) fashion. In the presence of strong multi-user interference, our policy outperforms existing spectrum sharing policies that dictate users to transmit at constant power levels simultaneously. Moreover, our policy can achieve Pareto optimality even when the SUs have limited and imperfect monitoring ability, as opposed to existing solutions based on repeated games, which require perfect monitoring abilities.

preprint2012arXiv

Entry and Spectrum Sharing Scheme Selection in Femtocell Markets

Focusing on a femtocell communications market, we study the entrant network service provider's (NSP's) long-term decision: whether to enter the market and which spectrum sharing technology to select to maximize its profit. This long-term decision is closely related to the entrant's pricing strategy and the users' aggregate demand, which we model as medium-term and short-term decisions, respectively. We consider two markets, one with no incumbent and the other with one incumbent. For both markets, we show the existence and uniqueness of an equilibrium point in the user subscription dynamics, and provide a sufficient condition for the convergence of the dynamics. For the market with no incumbent, we derive upper and lower bounds on the optimal price and market share that maximize the entrant's revenue, based on which the entrant selects an available technology to maximize its long-term profit. For the market with one incumbent, we model competition between the two NSPs as a non-cooperative game, in which the incumbent and the entrant choose their market shares independently, and provide a sufficient condition that guarantees the existence of at least one pure Nash equilibrium. Finally, we formalize the problem of entry and spectrum sharing scheme selection for the entrant and provide numerical results to complement our analysis.

preprint2012arXiv

Markov Decision Process Based Energy-Efficient On-Line Scheduling for Slice-Parallel Video Decoders on Multicore Systems

We consider the problem of energy-efficient on-line scheduling for slice-parallel video decoders on multicore systems. We assume that each of the processors is Dynamic Voltage Frequency Scaling (DVFS) enabled such that they can independently trade off performance for power, while taking the video decoding workload into account. In the past, scheduling and DVFS policies in multicore systems have been formulated heuristically due to the inherent complexity of the on-line multicore scheduling problem. The key contribution of this report is that we rigorously formulate the problem as a Markov decision process (MDP), which simultaneously takes into account the on-line scheduling and per-core DVFS capabilities; the power consumption of the processor cores and caches; and the loss-tolerant and dynamic nature of the video decoder's traffic. In particular, we model the video traffic using a Directed Acyclic Graph (DAG) to capture the precedence constraints among frames in a Group of Pictures (GOP) structure, while also accounting for the fact that frames have different display/decoding deadlines and non-deterministic decoding complexities. The objective of the MDP is to minimize long-term power consumption subject to a minimum Quality of Service (QoS) constraint related to the decoder's throughput. Although MDPs notoriously suffer from the curse of dimensionality, we show that, with appropriate simplifications and approximations, the complexity of the MDP can be mitigated. We implement a slice-parallel version of H.264 on a multiprocessor ARM (MPARM) virtual platform simulator, which provides cycle-accurate and bus signal-accurate simulation for different processors. We use this platform to generate realistic video decoding traces with which we evaluate the proposed on-line scheduling algorithm in Matlab.

preprint2012arXiv

Pricing and Intervention in Slotted-Aloha: Technical Report

In many wireless communication networks a common channel is shared by multiple users who must compete to gain access to it. The operation of the network by self-interested and strategic users usually leads to the overuse of the channel resources and to substantial inefficiencies. Hence, incentive schemes are needed to overcome the inefficiencies of non-cooperative equilibrium. In this work we consider a slotted-Aloha-like random access protocol and two incentive schemes: pricing and intervention. We provide some criteria for the designer of the protocol to choose between the two schemes and to design the best policy for the selected scheme, depending on the system parameters. Our results show that intervention can achieve the maximum efficiency in the perfect monitoring scenario. In the imperfect monitoring scenario, instead, the performance of the system depends on the information held by the different entities and, in some cases, there exists a threshold for the number of users such that, for a number of users lower than the threshold, intervention outperforms pricing, whereas, for a number of users higher than the threshold, pricing outperforms intervention.
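The overuse problem described in this abstract can be made concrete with the standard slotted-Aloha success probability: when each of n users transmits independently with probability p in a slot, the slot carries a successful transmission with probability n p (1 - p)^(n - 1). The sketch below is a textbook calculation under that symmetric assumption, not the paper's model of pricing or intervention. The function names are hypothetical.

```python
def aloha_throughput(n_users, p):
    """Probability that exactly one of n_users transmits in a slot (symmetric users)."""
    return n_users * p * (1 - p) ** (n_users - 1)


def best_probability(n_users, grid=1000):
    """Grid search for the transmission probability that maximizes throughput."""
    candidates = [k / grid for k in range(grid + 1)]
    return max(candidates, key=lambda p: aloha_throughput(n_users, p))


if __name__ == "__main__":
    n = 4
    print("throughput at p = 1/n:", aloha_throughput(n, 1 / n))
    print("throughput if everyone always transmits:", aloha_throughput(n, 1.0))
    print("best p on the grid:", best_probability(n))
```

With four users, always transmitting yields zero throughput because every slot collides, while the cooperative choice p = 1/4 yields about 0.42. This gap is what incentive schemes such as pricing or intervention aim to close.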

preprint2012arXiv

Social Norm Design for Information Exchange Systems with Limited Observations

Information exchange systems differ in many ways, but all share a common vulnerability to selfish behavior and free-riding. In this paper, we build incentive schemes based on social norms. Social norms prescribe a social strategy for the users in the system to follow and deploy reputation schemes to reward or penalize users depending on their behaviors. Because users in these systems often have only limited capability to observe the global system information, e.g. the reputation distribution of the users participating in the system, their beliefs about the reputation distribution are heterogeneous and biased. Such belief heterogeneity causes a positive fraction of users to not follow the social strategy. In such practical scenarios, the standard equilibrium analysis deployed in the economics literature is no longer directly applicable and hence, the system design needs to consider these differences. To investigate how the system designs need to change when the participating users have only limited observations, we focus on a simple social norm with binary reputation labels but allow adjusting the punishment severity through randomization. First, we model the belief heterogeneity using a suitable Bayesian belief function. Next, we formalize the users' optimal decision problems and derive the scenarios in which they follow the prescribed social strategy. With this result, we then study the system dynamics and formally define equilibrium in the sense that the system is stable when users strategically optimize their decisions. By rigorously studying two specific cases where users' belief distribution is constant or is linearly influenced by the true reputation distribution, we prove that the optimal reputation update rule is to choose the mildest possible punishment. This result is further confirmed for higher order beliefs in simulations.

preprint2012arXiv

Technology Choices and Pricing Policies in Public and Private Wireless Networks

This paper studies the provision of a wireless network by a monopolistic provider who may be either benevolent (seeking to maximize social welfare) or selfish (seeking to maximize provider profit). The paper addresses questions that do not seem to have been studied before in the engineering literature on wireless networks: Under what circumstances is it feasible for a provider, either benevolent or selfish, to operate a network in such a way as to cover costs? How is the optimal behavior of a benevolent provider different from the optimal behavior of a selfish provider, and how does this difference affect social welfare? And, most importantly, how does the medium access control (MAC) technology influence the answers to these questions? To address these questions, we build a general model, and provide analysis and simulations for simplified but typical scenarios; the focus in these scenarios is on the contrast between the outcomes obtained under carrier-sensing multiple access (CSMA) and outcomes obtained under time-division multiple access (TDMA). Simulation results demonstrate that differences in MAC technology can have a significant effect on social welfare, on provider profit, and even on the (financial) feasibility of a wireless network.

preprint2012arXiv

User Subscription, Revenue Maximization, and Competition in Communications Markets

An updated version of this paper (but with a different title) can be found at arXiv:1204.4262

preprint2011arXiv

Business Mode Selection in Digital Content Markets

In this paper, we consider a two-sided digital content market, and study which of the two business modes, i.e., Business-to-Customer (B2C) and Customer-to-Customer (C2C), should be selected and when it should be selected. The considered market is managed by an intermediary, through which content producers can sell their contents to consumers. The intermediary can select B2C or C2C as its business mode, while the content producers and consumers are rational agents that maximize their own utilities. The content producers are differentiated by their content qualities. First, given the intermediary's business mode, we show that there always exists a unique equilibrium at which neither the content producers nor the consumers change their decisions. Moreover, if there are a sufficiently large number of consumers, then the decision process based on the content producers' naive expectation can reach the unique equilibrium. Next, we show that in a market with only one intermediary, C2C should be selected if the intermediary aims at maximizing its profit. Then, by considering a particular scenario where the contents are not highly substitutable, we prove that when the intermediary chooses to maximize the social welfare, C2C should be selected if the content producers can receive sufficient compensation for content sales, and B2C should be selected otherwise.

preprint2011arXiv

Intervention in Power Control Games With Selfish Users

We study the power control problem in wireless ad hoc networks with selfish users. Without incentive schemes, selfish users tend to transmit at their maximum power levels, causing significant interference to each other. In this paper, we study a class of incentive schemes based on intervention to induce selfish users to transmit at desired power levels. An intervention scheme can be implemented by introducing an intervention device that can monitor the power levels of users and then transmit power to cause interference to users. We mainly consider first-order intervention rules based on individual transmit powers. We derive conditions on design parameters and the intervention capability to achieve a desired outcome as a (unique) Nash equilibrium and propose a dynamic adjustment process that the designer can use to guide users and the intervention device to the desired outcome. The effect of using intervention rules based on aggregate receive power is also analyzed. Our results show that, with perfect monitoring, intervention schemes can be designed to achieve any positive power profile while using interference from the intervention device only as a threat. We also analyze the case of imperfect monitoring and show that a performance loss can occur. Lastly, simulation results are presented to illustrate the performance improvement from using intervention rules and to compare the performance of different intervention rules.

preprint2011arXiv

Minimizing weighted sum download time for one-to-many file transfer in peer-to-peer networks

This paper considers the problem of transferring a file from one source node to multiple receivers in a peer-to-peer (P2P) network. The objective is to minimize the weighted sum download time (WSDT) for the one-to-many file transfer. Previous work has shown that, given an order in which the receivers finish downloading, the minimum WSDT can be found in polynomial time by convex optimization, and can be achieved by linear network coding, assuming that node uplinks are the only bottleneck in the network. This paper, however, considers heterogeneous peers with both uplink and downlink bandwidth constraints specified. The static scenario is a file-transfer scheme in which the network resource allocation remains static until all receivers finish downloading. This paper first shows that the static scenario may be optimized in polynomial time by convex optimization, and the associated optimal static WSDT can be achieved by linear network coding. This paper then presents a lower bound on the minimum WSDT that is easily computed and turns out to be tight across a wide range of parameterizations of the problem. This paper also proposes a static routing-based scheme and a static rateless-coding-based scheme, both of which have almost-optimal empirical performance. The dynamic scenario is a file-transfer scheme which can re-allocate the network resource during the file transfer. This paper proposes a dynamic rateless-coding-based scheme, which provides significantly smaller WSDT than the optimal static scenario does.

preprint2011arXiv

Peer-to-Peer Multimedia Sharing based on Social Norms

Empirical data shows that, in the absence of incentives, a peer participating in a peer-to-peer (P2P) network tends to free-ride. Most solutions for providing incentives in P2P networks are based on direct reciprocity and are not appropriate for most P2P multimedia sharing networks due to the unique features exhibited by such networks: large populations of anonymous agents interacting infrequently, asymmetric interests of peers, network errors, and multiple concurrent transactions. In this paper, we design and rigorously analyze a new family of incentive protocols that utilize indirect reciprocity and are based on the design of efficient social norms. In the proposed P2P protocols, the social norms consist of a social strategy, which represents the rule prescribing to the peers when they should or should not provide content to other peers, and a reputation scheme, which rewards or punishes peers depending on whether or not they comply with the social strategy. We first define the concept of a sustainable social norm, under which no peer has an incentive to deviate. We then formulate the problem of designing optimal social norms, which selects the social norm that maximizes the network performance among all sustainable social norms. Hence, we prove that it becomes in the self-interest of peers to contribute their content to the network rather than to free-ride. We also investigate the impact of various punishment schemes on the social welfare, as well as how the optimal social norms should be designed if altruistic and malicious peers are active in the network. Our results show that optimal social norms are capable of providing significant improvements in the sharing efficiency of multimedia P2P networks.

preprint2011arXiv

Production and Network Formation Games with Content Heterogeneity

Online social networks (e.g. Facebook, Twitter, YouTube) provide a popular, cost-effective and scalable framework for sharing user-generated contents. This paper addresses the intrinsic incentive problems residing in social networks using a game-theoretic model where individual users selfishly trade off the costs of forming links (i.e. whom they interact with) and producing contents personally against the potential rewards from doing so. Departing from the assumption that content produced by different users is perfectly substitutable, we explicitly consider heterogeneity in user-generated contents and study how it influences users' behavior and the structure of social networks. Given content heterogeneity, we rigorously prove that when the population of a social network is sufficiently large, every (strict) non-cooperative equilibrium should consist of either a symmetric network topology where each user produces the same amount of content and has the same degree, or a two-level hierarchical topology with all users belonging to either of the two types: influencers who produce large amounts of contents and subscribers who produce small amounts of contents and get most of their contents from influencers. Meanwhile, the law of the few disappears in such networks. Moreover, we prove that the social optimum is always achieved by networks with symmetric topologies, where the sum of users' utilities is maximized. To provide users with incentives for producing and mutually sharing the socially optimal amount of contents, a pricing scheme is proposed, with which we show that the social optimum can be achieved as a non-cooperative equilibrium with the pricing of content acquisition and link formation.

preprint2011arXiv

Repeated Games With Intervention: Theory and Applications in Communications

In communication systems where users share common resources, users' selfish behavior usually results in suboptimal resource utilization. There have been extensive works that model communication systems with selfish users as one-shot games and propose incentive schemes to achieve Pareto optimal action profiles as non-cooperative equilibria. However, in many communication systems, due to strong negative externalities among users, the sets of feasible payoffs in one-shot games are nonconvex. Thus, it is possible to expand the set of feasible payoffs by having users choose convex combinations of different payoffs. In this paper, we propose a repeated game model generalized by intervention. First, we use repeated games to convexify the set of feasible payoffs in one-shot games. Second, we combine conventional repeated games with intervention, originally proposed for one-shot games, to achieve a larger set of equilibrium payoffs and loosen requirements for users' patience to achieve it. We study the problem of maximizing a welfare function defined on users' equilibrium payoffs, subject to minimum payoff guarantees. Given the optimal equilibrium payoff, we derive the minimum intervention capability required and design corresponding equilibrium strategies. The proposed generalized repeated game model applies to various communication systems, such as power control and flow control.

preprint2011arXiv

Reputation-based Incentive Protocols in Crowdsourcing Applications

Crowdsourcing websites (e.g. Yahoo! Answers and Amazon Mechanical Turk) have emerged in recent years, allowing requesters from all around the world to post tasks and seek help from an equally global pool of workers. However, intrinsic incentive problems reside in crowdsourcing applications, as workers and requesters are selfish and aim to strategically maximize their own benefit. In this paper, we propose to provide incentives for workers to exert effort using a novel game-theoretic model based on repeated games. As there is always a gap in the social welfare between the non-cooperative equilibria emerging when workers pursue their self-interests and the desirable Pareto efficient outcome, we propose a novel class of incentive protocols based on social norms which integrate reputation mechanisms into the existing pricing schemes currently implemented on crowdsourcing websites, in order to improve the performance of the non-cooperative equilibria emerging in such applications. We first formulate the exchanges on a crowdsourcing website as a two-sided market where requesters and workers are matched and play gift-giving games repeatedly. Subsequently, we study the protocol designer's problem of finding an optimal and sustainable (equilibrium) protocol which achieves the highest social welfare for that website. We prove that the proposed incentive protocol can make the website operate close to Pareto efficiency. Moreover, we also examine an alternative scenario, where the protocol designer aims at maximizing the revenue of the website, and evaluate the performance of the optimal protocol.

preprint2011arXiv

Robust Additively Coupled Games

We study the robust Nash equilibrium (RNE) for a class of games in communications systems and networks where the impact of users on each other is an additive function of their strategies. Each user measures this impact, which may be corrupted by uncertainty in feedback delays, estimation errors, movements of users, etc. To study the outcome of the game in which such uncertainties are encountered, we utilize the worst-case robust optimization theory. The existence and uniqueness conditions of RNE are derived using finite-dimensional variational inequalities. To describe the effect of uncertainty on the performance of the system, we use two criteria measured at the RNE and at the equilibrium of the game without uncertainty. The first is the difference between the respective social utilities of users, and the second is the difference between the strategies of users at their respective equilibria. These differences are obtained for the case of a unique NE and multiple NEs. To reach the RNE, we propose a distributed algorithm based on the proximal response map and derive the conditions for its convergence. Simulations of the power control game in interference channels and of Jackson networks validate our analysis.

preprint2011arXiv

Robust Stackelberg game in communication systems

This paper uses the formulation of Stackelberg games to study multi-user communication systems with two groups of users: leaders, which possess system information, and followers, which have no system information. In such games, the leaders play and choose their actions based on their information about the system and the followers choose their actions myopically according to their observations of the aggregate impact of other users. However, obtaining the exact value of these parameters is not practical in communication systems. To study the effect of uncertainty and preserve the players' utilities in these conditions, we introduce a robust equilibrium for Stackelberg games. In this framework, the leaders' information and the followers' observations are uncertain parameters, and the leaders and the followers choose their actions by solving the worst-case robust optimizations. We show that the followers' uncertain parameters always increase the leaders' utilities and decrease the followers' utilities. Conversely, the leaders' uncertain information reduces the leaders' utilities and increases the followers' utilities. We illustrate our theoretical results with numerical results based on power control games in interference channels.

preprint2011arXiv

Social Norms for Online Communities

Sustaining cooperation among self-interested agents is critical for the proliferation of emerging online social communities, such as online communities formed through social networking services. Providing incentives for cooperation in social communities is particularly challenging because of their unique features: a large population of anonymous agents interacting infrequently, having asymmetric interests, and dynamically joining and leaving the community; operation errors; and low-cost reputation whitewashing. In this paper, taking these features into consideration, we propose a framework for the design and analysis of a class of incentive schemes based on a social norm, which consists of a reputation scheme and a social strategy. We first define the concept of a sustainable social norm under which every agent has an incentive to follow the social strategy given the reputation scheme. We then formulate the problem of designing an optimal social norm, which selects a social norm that maximizes overall social welfare among all sustainable social norms. Using the proposed framework, we study the structure of optimal social norms and the impacts of punishment lengths and whitewashing on optimal social norms. Our results show that optimal social norms are capable of sustaining cooperation, with the amount of cooperation varying depending on the community characteristics.

preprint2011arXiv

Strategic Learning and Robust Protocol Design for Online Communities with Selfish Users

This paper focuses on analyzing the free-riding behavior of self-interested users in online communities. Hence, traditional optimization methods for communities composed of compliant users such as network utility maximization cannot be applied here. In our prior work, we showed how social reciprocation protocols can be designed in online communities which have populations consisting of a continuum of users and are stationary under stochastic permutations. Under these assumptions, we were able to prove that users voluntarily comply with the pre-determined social norms and cooperate with other users in the community by providing their services. In this paper, we generalize the study by analyzing the interactions of self-interested users in online communities that have finite populations and are not stationary. To optimize their long-term performance based on their knowledge, users adapt their strategies to play their best response by solving individual stochastic control problems. The best-response dynamic introduces a stochastic dynamic process in the community, in which the strategies of users evolve over time. We then investigate the long-term evolution of a community, and prove that the community will converge to stochastically stable equilibria which are stable against stochastic permutations. Understanding the evolution of a community provides protocol designers with guidelines for designing social norms in which no user has incentives to adapt its strategy and deviate from the prescribed protocol, thereby ensuring that the adopted protocol will enable the community to achieve the optimal social welfare.

preprint2011arXiv

The Theory of Intervention Games for Resource Sharing in Wireless Communications

This paper develops a game-theoretic framework for the design and analysis of a new class of incentive schemes called intervention schemes. We formulate intervention games, propose a solution concept of intervention equilibrium, and prove its existence in a finite intervention game. We apply our framework to resource sharing scenarios in wireless communications, whose non-cooperative outcomes without intervention yield suboptimal performance. We derive analytical results and analyze illustrative examples in the cases of imperfect and perfect monitoring. In the case of imperfect monitoring, intervention schemes can improve the suboptimal performance of non-cooperative equilibrium when the intervention device has a sufficiently accurate monitoring technology, although it may not be possible to achieve the best feasible performance. In the case of perfect monitoring, the best feasible performance can be obtained with an intervention scheme when the intervention device has a sufficiently strong intervention capability.

preprint2011arXiv

Transmitting important bits and sailing high radio waves: a decentralized cross-layer approach to cooperative video transmission

We investigate the impact of cooperative relaying on uplink and downlink multi-user (MU) wireless video transmissions. The objective is to maximize the long-term sum of utilities across the video terminals in a decentralized fashion, by jointly optimizing the packet scheduling, the resource allocation, and the cooperation decisions, under the assumption that some nodes are willing to act as cooperative relays. A pricing-based distributed resource allocation framework is adopted, where the price reflects the expected future congestion in the network. Specifically, we formulate the wireless video transmission problem as an MU Markov decision process (MDP) that explicitly considers the cooperation at the physical layer and the medium access control sublayer, the video users' heterogeneous traffic characteristics, the dynamically varying network conditions, and the coupling among the users' transmission strategies across time due to the shared wireless resource. Although MDPs notoriously suffer from the curse of dimensionality, our study shows that, with appropriate simplifications and approximations, the complexity of the MU-MDP can be significantly mitigated. Our simulation results demonstrate that integrating cooperative decisions into the MU-MDP optimization can increase the resource price in networks that only support low transmission rates and can decrease the price in networks that support high transmission rates. Additionally, our results show that cooperation allows users with feeble direct signals to achieve improvements in video quality on the order of 5-10 dB peak signal-to-noise ratio (PSNR), with less than 0.8 dB quality loss by users with strong direct signals, and with a moderate increase in total network energy consumption that is significantly less than the energy that a distant node would require to achieve an equivalent PSNR without exploiting cooperative diversity.

preprint2010arXiv

A Game Theoretic Analysis of Incentives in Content Production and Sharing over Peer-to-Peer Networks

User-generated content can be distributed at a low cost using peer-to-peer (P2P) networks, but the free-rider problem hinders the utilization of P2P networks. In order to achieve an efficient use of P2P networks, we investigate fundamental issues on incentives in content production and sharing using game theory. We build a basic model to analyze non-cooperative outcomes without an incentive scheme and then use different game formulations derived from the basic model to examine five incentive schemes: cooperative, payment, repeated interaction, intervention, and enforced full sharing. The results of this paper show that 1) cooperative peers share all produced content while non-cooperative peers do not share at all without an incentive scheme; 2) a cooperative scheme allows peers to consume more content than non-cooperative outcomes do; 3) a cooperative outcome can be achieved among non-cooperative peers by introducing an incentive scheme based on payment, repeated interaction, or intervention; and 4) enforced full sharing has ambiguous welfare effects on peers. In addition to describing the solutions of different formulations, we discuss enforcement and informational requirements to implement each solution, aiming to offer a guideline for protocol designers when designing incentive schemes for P2P networks.

preprint2010arXiv

Cognitive MAC Protocols Using Memory for Distributed Spectrum Sharing Under Limited Spectrum Sensing

The main challenges of cognitive radio include spectrum sensing at the physical (PHY) layer to detect the activity of primary users and spectrum sharing at the medium access control (MAC) layer to coordinate access among coexisting secondary users. In this paper, we consider a cognitive radio network in which a primary user shares a channel with secondary users that cannot distinguish the signals of the primary user from those of a secondary user. We propose a class of distributed cognitive MAC protocols to achieve efficient spectrum sharing among the secondary users while protecting the primary user from potential interference by the secondary users. By using a MAC protocol with one-slot memory, we can obtain high channel utilization by the secondary users while limiting interference to the primary user at a low level. The results of this paper suggest the possibility of utilizing MAC design in cognitive radio networks to overcome limitations in spectrum sensing at the PHY layer as well as to achieve spectrum sharing at the MAC layer.

preprint2010arXiv

Designing Incentive Schemes Based on Intervention: The Case of Imperfect Monitoring

We propose an incentive scheme based on intervention to sustain cooperation among self-interested users. In the proposed scheme, an intervention device collects imperfect signals about the actions of the users for a test period, and then chooses the level of intervention that degrades the performance of the network for the remaining time period. We analyze the problems of designing an optimal intervention rule given a test period and choosing an optimal length of the test period. The intervention device can provide the incentive for cooperation by exerting intervention following signals that involve a high likelihood of deviation. Increasing the length of the test period has two counteracting effects on the performance: It improves the quality of signals, but at the same time it weakens the incentive for cooperation due to increased delay.

preprint2010arXiv

Designing Incentive Schemes Based on Intervention: The Case of Perfect Monitoring

This paper studies a class of incentive schemes based on intervention, where there exists an intervention device that is able to monitor the actions of users and to take an action that affects the payoffs of users. We consider the case of perfect monitoring, where the intervention device can immediately observe the actions of users without errors. We also assume that there exist actions of the intervention device that are most and least preferred by all the users and the intervention device, regardless of the actions of users. We derive analytical results about the outcomes achievable with intervention, and illustrate our results with an example based on the Cournot model.

preprint2010arXiv

Distributed Power Allocation in Multi-User Multi-Channel Relay Networks

This paper has been withdrawn by the authors as they feel it inappropriate to publish this paper for the time being.

preprint2010arXiv

Intervention Mechanism Design for Networks With Selfish Users

We consider a multi-user network where a network manager and selfish users interact. The network manager monitors the behavior of users and intervenes in the interaction among users if necessary, while users make decisions independently to optimize their individual objectives. In this paper, we develop a framework of intervention mechanism design, which aims to optimize the objective of the manager, or the network performance, taking the incentives of selfish users into account. Our framework is general enough to cover a wide range of application scenarios, and it has advantages over existing approaches such as Stackelberg strategies and pricing. To design an intervention mechanism and to predict the resulting operating point, we formulate a new class of games called intervention games and a new solution concept called intervention equilibrium. We provide analytic results about intervention equilibrium and optimal intervention mechanisms in the case of a benevolent manager with perfect monitoring. We illustrate these results with a random access model. Our illustrative example suggests that intervention requires less knowledge about users than pricing.

preprint2010arXiv

Medium Access Control Protocols With Memory

Many existing medium access control (MAC) protocols utilize past information (e.g., the results of transmission attempts) to adjust the transmission parameters of users. This paper provides a general framework to express and evaluate distributed MAC protocols utilizing a finite length of memory for a given form of feedback information. We define protocols with memory in the context of a slotted random access network with saturated arrivals. We introduce two performance metrics, throughput and average delay, and formulate the problem of finding an optimal protocol. We first show that a TDMA outcome, which is the best outcome in the considered scenario, can be obtained after a transient period by a protocol with (N-1)-slot memory, where N is the total number of users. Next, we analyze the performance of protocols with 1-slot memory using a Markov chain and numerical methods. Protocols with 1-slot memory can achieve throughput arbitrarily close to 1 (i.e., 100% channel utilization) at the expense of large average delay, by correlating successful users in two consecutive slots. Finally, we apply our framework to wireless local area networks.
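
As a rough illustration of the last point about 1-slot memory, the following Python sketch simulates a slotted channel in which each user looks only at the previous slot. The specific rule used here (a user that just succeeded retransmits with probability `keep_prob`, the other users wait after a success, and everyone contends with probability 1/N after an idle or collision slot), the success/idle/collision feedback, and all names are assumptions made for this sketch; they are not taken from the paper.

```python
import random


def simulate_one_slot_memory(num_users=5, keep_prob=0.95,
                             num_slots=20_000, seed=0):
    """Estimate throughput of a hypothetical 1-slot-memory protocol.

    Assumed rule (illustrative only):
      - after a success, the winner transmits again w.p. keep_prob,
        and every other user waits;
      - after an idle or collision slot, each user transmits w.p. 1/N.
    Returns the fraction of slots that carried exactly one transmission.
    """
    rng = random.Random(seed)
    last_winner = None  # user that succeeded in the previous slot, if any
    successes = 0
    for _ in range(num_slots):
        if last_winner is not None:
            # Previous slot was a success: only the winner may transmit.
            transmitters = [last_winner] if rng.random() < keep_prob else []
        else:
            # Previous slot was idle or a collision: everyone contends.
            transmitters = [u for u in range(num_users)
                            if rng.random() < 1.0 / num_users]
        if len(transmitters) == 1:
            last_winner = transmitters[0]
            successes += 1
        else:
            last_winner = None
    return successes / num_slots


if __name__ == "__main__":
    for q in (0.5, 0.9, 0.99):
        print(q, simulate_one_slot_memory(keep_prob=q))
```

Under these assumptions, pushing `keep_prob` toward 1 pushes throughput toward 1 while the other users wait longer for a turn, which is the throughput and delay trade-off the abstract describes.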

preprint2010arXiv

Near-Optimal Deviation-Proof Medium Access Control Designs in Wireless Networks

Distributed medium access control (MAC) protocols are essential for the proliferation of low cost, decentralized wireless local area networks (WLANs). Most MAC protocols are designed with the presumption that nodes comply with prescribed rules. However, selfish nodes have natural motives to manipulate protocols in order to improve their own performance. This often degrades the performance of other nodes as well as that of the overall system. In this work, we propose a class of protocols that limit the performance gain which nodes can obtain through selfish manipulation while incurring only a small efficiency loss. The proposed protocols are based on the idea of a review strategy, with which nodes collect signals about the actions of other nodes over a period of time, use a statistical test to infer whether or not other nodes are following the prescribed protocol, and trigger a punishment if a departure from the protocol is perceived. We consider the cases of private and public signals and provide analytical and numerical results to demonstrate the properties of the proposed protocols.
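
The review-strategy idea described above can be sketched in a few lines of Python. This is only a toy version under stated assumptions: the signal is taken to be a count of observed transmissions, the statistical test is a simple threshold on the observed frequency, and the names `deviation_perceived`, `run_review`, and all parameter values are hypothetical rather than the paper's design.

```python
import random


def deviation_perceived(transmit_count, period, prescribed_prob, threshold):
    """Threshold test (assumed form): flag a deviation when the observed
    transmission frequency exceeds the prescribed probability by more
    than `threshold`."""
    return transmit_count / period > prescribed_prob + threshold


def run_review(actual_prob, prescribed_prob=0.2, period=500,
               threshold=0.1, seed=0):
    """Simulate one review period for a node transmitting w.p. actual_prob
    and report whether a punishment phase would be triggered."""
    rng = random.Random(seed)
    count = sum(1 for _ in range(period) if rng.random() < actual_prob)
    return deviation_perceived(count, period, prescribed_prob, threshold)


if __name__ == "__main__":
    print("compliant node punished:", run_review(actual_prob=0.2))
    print("aggressive node punished:", run_review(actual_prob=0.6))
```

A longer review period makes the test more reliable but delays any punishment, so the period and threshold would have to be tuned together in any real design.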

preprint2010arXiv

Reinforcement Learning in BitTorrent Systems

Recent research efforts have shown that the popular BitTorrent protocol does not provide fair resource reciprocation and may allow free-riding. In this paper, we propose a BitTorrent-like protocol that replaces the peer selection mechanisms in the regular BitTorrent protocol with a novel reinforcement learning (RL) based mechanism. Due to the inherent operation of P2P systems, which involves repeated interactions among peers over a long period of time, the peers can efficiently identify free-riders as well as desirable collaborators by learning the behavior of their associated peers. Thus, the mechanism can help peers improve their download rates and discourage free-riding, while improving fairness in the system. We model the peers' interactions in the BitTorrent-like network as a repeated interaction game, where we explicitly consider the strategic behavior of the peers. A peer that applies the RL-based mechanism uses a partial history of the observations on associated peers' statistical reciprocal behaviors to determine its best responses and estimate the corresponding impact on its expected utility. The policy determines the peer's resource reciprocations with other peers, which would maximize the peer's long-term performance, thereby making foresighted decisions. We have implemented the proposed reinforcement-learning based mechanism and incorporated it into an existing BitTorrent client. We have performed extensive experiments on a controlled PlanetLab test bed. Our results confirm that our proposed protocol (1) promotes fairness in terms of incentives to each peer's contribution, e.g. high-capacity peers improve their download completion time by up to 33%, (2) improves the system stability and robustness, e.g. reducing the peer selection fluctuations by 57%, and (3) discourages free-riding, e.g. peers reduce their upload to free-riders by 64%, in comparison to the regular BitTorrent protocol.

preprint2010arXiv

Structural Solutions For Additively Coupled Sum Constrained Games

We propose and analyze a broad family of games played by resource-constrained players, which are characterized by the following central features: 1) each user has a multi-dimensional action space, subject to a single sum resource constraint; 2) each user's utility in a particular dimension depends on an additive coupling between the user's action in the same dimension and the actions of the other users; and 3) each user's total utility is the sum of the utilities obtained in each dimension. Familiar examples of such multi-user environments in communication systems include power control over frequency-selective Gaussian interference channels and flow control in Jackson networks. In settings where users cannot exchange messages in real-time, we study how users can adjust their actions based on their local observations. We derive sufficient conditions under which a unique Nash equilibrium exists and the best-response algorithm converges globally and linearly to the Nash equilibrium. In settings where users can exchange messages in real-time, we focus on user choices that optimize the overall utility. We provide the convergence conditions of two distributed action update mechanisms, gradient play and Jacobi update.
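
For the power control example mentioned above, a best-response update can be sketched as water-filling over channels. The code below is a minimal sketch that assumes each user's per-channel utility is log(1 + p / (noise + weighted interference)), so that the best response to the current interference is the classical water-filling allocation. The function names, the simultaneous update order, and the numerical values are illustrative assumptions; the weak cross gains are chosen so that the iteration settles quickly, not to reproduce the paper's convergence conditions.

```python
def water_fill(interference, budget):
    """Split `budget` across channels: p_k = max(0, water - interference_k),
    with the water level chosen so that the powers sum to `budget`."""
    levels = sorted(interference)
    for m in range(len(levels), 0, -1):
        # Candidate water level if only the m quietest channels are used.
        water = (budget + sum(levels[:m])) / m
        if water > levels[m - 1]:
            break
    return [max(0.0, water - x) for x in interference]


def best_response_dynamics(noise, cross_gain, budgets, iters=200, tol=1e-10):
    """All users repeatedly best-respond to the others' current powers.

    noise[i][k]         : noise seen by user i on channel k
    cross_gain[i][j][k] : weight of user j's power in user i's interference
    budgets[i]          : total power available to user i
    """
    n, k = len(noise), len(noise[0])
    powers = [[budgets[i] / k] * k for i in range(n)]
    for _ in range(iters):
        new_powers = []
        for i in range(n):
            interference = [
                noise[i][c] + sum(cross_gain[i][j][c] * powers[j][c]
                                  for j in range(n) if j != i)
                for c in range(k)
            ]
            new_powers.append(water_fill(interference, budgets[i]))
        change = max(abs(a - b)
                     for old, new in zip(powers, new_powers)
                     for a, b in zip(old, new))
        powers = new_powers
        if change < tol:
            break
    return powers


if __name__ == "__main__":
    noise = [[1.0, 2.0, 0.5], [0.8, 1.5, 1.2]]
    gain = [[[0.0] * 3, [0.1] * 3], [[0.1] * 3, [0.0] * 3]]
    print(best_response_dynamics(noise, gain, budgets=[3.0, 3.0]))
```

Each user needs only its own measured interference per channel, which matches the setting without message exchange; the additive coupling is what lets the other users' actions enter as a single interference term.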

preprint2010arXiv

Structural Solutions to Dynamic Scheduling for Multimedia Transmission in Unknown Wireless Environments

In this paper, we propose a systematic solution to the problem of scheduling delay-sensitive media data for transmission over time-varying wireless channels. We first formulate the dynamic scheduling problem as a Markov decision process (MDP) that explicitly considers the users' heterogeneous multimedia data characteristics (e.g. delay deadlines, distortion impacts and dependencies etc.) and time-varying channel conditions, which are not simultaneously considered in state-of-the-art packet scheduling algorithms. This formulation allows us to perform foresighted decisions to schedule multiple data units for transmission at each time in order to optimize the long-term utilities of the multimedia applications. The heterogeneity of the media data enables us to express the transmission priorities between the different data units as a priority graph, which is a directed acyclic graph (DAG). This priority graph provides us with an elegant structure to decompose the multi-data unit foresighted decision at each time into multiple single-data unit foresighted decisions which can be performed sequentially, from the high priority data units to the low priority data units, thereby significantly reducing the computation complexity. When the statistical knowledge of the multimedia data characteristics and channel conditions is unknown a priori, we develop a low-complexity online learning algorithm to update the value functions which capture the impact of the current decision on the future utility. The simulation results show that the proposed solution significantly outperforms existing state-of-the-art scheduling solutions.

preprint 2010 arXiv

Structure-Aware Stochastic Control for Transmission Scheduling

In this paper, we consider the problem of real-time transmission scheduling over time-varying channels. We first formulate the transmission scheduling problem as a Markov decision process (MDP) and systematically unravel the structural properties (e.g., concavity in the state-value function and monotonicity in the optimal scheduling policy) exhibited by the optimal solutions. We then propose an online learning algorithm which preserves these structural properties and achieves ε-optimal solutions for an arbitrarily small ε. The advantages of the proposed online method are that: (i) it does not require a priori knowledge of the traffic arrival and channel statistics and (ii) it adaptively approximates the state-value functions using piecewise linear functions and has low storage and computation complexity. We also extend the proposed low-complexity online learning solution to prioritized data transmission. The simulation results demonstrate that the proposed method achieves significantly better utility (or delay)-energy trade-offs compared with existing state-of-the-art online optimization methods.
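To make the storage argument in the abstract above concrete, here is a minimal sketch of approximating a concave function by linear interpolation between a few stored knots. The square-root function and the knot positions are stand-ins chosen for illustration, and the paper's adaptive knot selection is not reproduced.

```python
import bisect
import math


def make_pwl(knots, values):
    """Return a piecewise linear interpolant through (knots[i], values[i]).

    knots must be sorted and contain at least two points.
    """
    def approx(x):
        # Locate the segment containing x, clamped to the outer segments.
        i = bisect.bisect_right(knots, x) - 1
        i = max(0, min(i, len(knots) - 2))
        x0, x1 = knots[i], knots[i + 1]
        t = (x - x0) / (x1 - x0)
        return (1 - t) * values[i] + t * values[i + 1]
    return approx


# Store only four points of a concave stand-in value function.
knots = [0.0, 1.0, 4.0, 9.0]
values = [math.sqrt(k) for k in knots]
value_fn = make_pwl(knots, values)
```

For a concave function the interpolant never exceeds the true value between knots, so the approximation error is one-sided and shrinks as knots are added.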

preprint 2009 arXiv

Online Reinforcement Learning for Dynamic Multimedia Systems

In our previous work, we proposed a systematic cross-layer framework for dynamic multimedia systems, which allows each layer to make autonomous and foresighted decisions that maximize the system's long-term performance, while meeting the application's real-time delay constraints. The proposed solution solved the cross-layer optimization offline, under the assumption that the multimedia system's probabilistic dynamics were known a priori. In practice, however, these dynamics are unknown a priori and therefore must be learned online. In this paper, we address this problem by allowing the multimedia system layers to learn, through repeated interactions with each other, to autonomously optimize the system's long-term performance at run-time. We propose two reinforcement learning algorithms for optimizing the system under different design constraints: the first algorithm solves the cross-layer optimization in a centralized manner, and the second solves it in a decentralized manner. We analyze both algorithms in terms of their required computation, memory, and inter-layer communication overheads. After noting that the proposed reinforcement learning algorithms learn too slowly, we introduce a complementary accelerated learning algorithm that exploits partial knowledge about the system's dynamics in order to dramatically improve the system's performance. In our experiments, we demonstrate that decentralized learning can perform as well as centralized learning, while enabling the layers to act autonomously. Additionally, we show that existing application-independent reinforcement learning algorithms, and existing myopic learning algorithms deployed in multimedia systems, perform significantly worse than our proposed application-aware and foresighted learning methods.
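For readers unfamiliar with learning system dynamics online, the following is a sketch of a standard tabular Q-learning update. It is a generic textbook rule used here for illustration, not the centralized, decentralized, or accelerated algorithms proposed in the paper; the state and action labels, learning rate, and discount factor are all hypothetical.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one Q-learning step and return the updated value.

    q is a dict keyed by (state, action); missing entries count as 0.0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    td_target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (td_target - old)
    return q[(state, action)]


q_table = {}
actions = ["low_power", "high_power"]
first = q_update(q_table, "buffer_full", "high_power", 1.0, "buffer_empty", actions)
second = q_update(q_table, "buffer_full", "high_power", 1.0, "buffer_empty", actions)
```

Each update moves the estimate a fraction `alpha` of the way toward the observed reward plus the discounted best next value, so no prior model of the dynamics is needed.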

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2605.04970:author:3:mihaela-van-der-schaar

Imported May 20, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.13484:author:2:mihaela-van-der-schaar

Imported May 20, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.01418:author:4:mihaela-van-der-schaar

Imported May 20, 2026. Synced May 20, 2026.

arxiv, confidence 95%

external id: arxiv:2605.09079:author:3:mihaela-van-der-schaar

Imported May 20, 2026. Synced May 20, 2026.

21 works

Ahmed M. Alaa

Researcher

17 works

Yuanzhang Xiao

Researcher

15 works

Jaeok Park

Researcher

11 works

Cem Tekin

Researcher

138 published items

CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators

Discovery of Hidden Miscalibration Regimes

Skill Neologisms: Towards Skill-based Continual Learning

The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

TimeTok: Granularity-Controllable Time-Series Generation via Hierarchical Tokenization

Composite Feature Selection using Deep Ensembles

Deep Generative Symbolic Regression

Synthcity: facilitating innovative use cases of synthetic data in different data modalities

Accounting for Unobserved Confounding in Domain Generalization

Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability

Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects

Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations

DAPDAG: Domain Adaptation via Perturbed DAG Reconstruction

Data-SUITE: Data-centric identification of in-distribution incongruous examples

How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models

HyperImpute: Generalized Iterative Imputation with Automatic Model Selection

Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects

Inferring Lexicographically-Ordered Rewards from Preferences

Inverse Contextual Bandits: Learning How Behavior Evolves over Time

Label-Free Explainability for Unsupervised Models

Neural graphical modelling in continuous-time: consistency guarantees and algorithms

SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data

The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation

A Variational Information Bottleneck Approach to Multi-Omics Data Integration

Conservative Policy Construction Using Variational Autoencoders for Logged Data with Missing Values

Estimating Structural Target Functions using Machine Learning and Influence Functions

Kernel Hypothesis Testing with Set-valued Data

Learning Matching Representations for Individualized Organ Transplantation Allocation

Model-Attentive Ensemble Learning for Sequence Modeling

Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms

Personalized Education in the AI Era: What to Expect Next?

Policy Analysis using Synthetic Controls in Continuous-Time

SDF-Bayes: Cautious Optimism in Safe Dose-Finding Clinical Trials with Drug Combinations and Heterogeneous Patient Groups

Selecting Treatment Effects Models for Domain Adaptation Using Causal Knowledge

Strictly Batch Imitation Learning by Energy-based Distribution Matching

A Non-Stationary Bandit-Learning Approach to Energy-Efficient Femto-Caching with Rateless-Coded Transmission

A primer on coupled state-switching models for multiple interacting time series

AutoCP: Automated Pipelines for Accurate Prediction Intervals

Contextual Constrained Learning for Dose-Finding Clinical Trials

CPAS: the UK's National Machine Learning-based Hospital Capacity Planning System for COVID-19

Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions

Estimating Counterfactual Treatment Outcomes over Time Through Adversarially Balanced Representations

Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions

Hide-and-Seek Privacy Challenge

Inverse Active Sensing: Modeling and Understanding Timely Decision-Making

Learning Dynamic and Personalized Comorbidity Networks from Event Data using Deep Diffusion Processes

Learning for Dose Allocation in Adaptive Clinical Trials with Safety Constraints

Learning Overlapping Representations for the Estimation of Individualized Treatment Effects

Stepwise Model Selection for Sequence Prediction via Deep Kernel Learning

Target-Embedding Autoencoders for Supervised Representation Learning

Temporal Phenotyping using Deep Predictive Clustering of Disease Progression

Time Series Deconfounder: Estimating Treatment Effects over Time in the Presence of Hidden Confounders

Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift

When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes

Distributed Task Management in Cyber-Physical Systems: How to Cooperate under Uncertainty?

A Hidden Absorbing Semi-Markov Model for Informatively Censored Temporal Data: Learning and Inference

A Non-stochastic Learning Approach to Energy Efficient Mobility Management

A Semi-Markov Switching Linear Gaussian Model for Censored Physiological Data

A Theory of Individualism, Collectivism and Economic Outcomes

Adaptive Ensemble Learning with Confidence Bounds

Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition

ConfidentCare: A Clinical Decision Support System for Personalized Breast Cancer Screening

Personalized Course Sequence Recommendations

Personalized Donor-Recipient Matching for Organ Transplantation

Personalized Risk Scoring for Critical Care Patients using Mixtures of Gaussian Process Experts

Personalized Risk Scoring for Critical Care Prognosis using Mixtures of Gaussian Processes

Predicting Grades

Reputational Learning and Network Dynamics

A Micro-foundation of Social Capital in Evolving Social Networks

Contextual Online Learning for Multimedia Content Aggregation

Distributed Interference Management Policies for Heterogeneous Small Cell Networks

Distributed Online Learning via Cooperative Contextual Bandits

Dynamic Network Formation with Foresighted Agents

Efficient Interference Management Policies for Femtocell Networks