Source author record

Karthikeyan Shanmugam

Karthikeyan Shanmugam appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Information Theory math.IT Artificial Intelligence Cryptography and Security Data Structures and Algorithms Distributed, Parallel, and Cluster Computing Social and Information Networks astro-ph.SR Computer Vision cs.CY Databases Discrete Mathematics Networking and Internet Architecture Systems and Control

Catalog footprint

What is connected

33works

15topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Inferring Asteroseismic Parameters from Short Observations Using Deep Learning: Application to TESS and K2 Red Giants

Asteroseismology is the study of resonant oscillations of stars to infer their internal structure and dynamics. It is also a powerful tool for precisely determining stellar parameters such as mass, radius, surface gravity, and age. The ongoing TESS mission, with its nearly complete sky coverage, presents a unique opportunity to uniformly probe stellar populations across the Milky Way. TESS is estimated to have observed more than 300,000 oscillating red giants, most of which have one to two months of observations. Given the scale of this dataset, we need a fast, efficient, and robust way to analyse the data. In this work, our objective is to develop a machine learning (ML) based method to infer asteroseismic parameters from short-duration observations. Specifically, we focus on two global seismic parameters, the large frequency separation ($Δν$) and the frequency at maximum power ($ν_{\mathrm{max}}$), from one-month-long TESS observations of red giants. Meanwhile, for K2 data, our focus extends to inferring the period spacings of dipolar gravity modes ($ΔΠ_{1}$), in addition to $Δν$ and $ν_{\mathrm{max}}$. Our findings demonstrate that our machine learning algorithm can accurately infer $Δν$ and $ν_{\mathrm{max}}$ for approximately 50% of samples created by taking one-month Kepler and K2 observations. For TESS one sector data however, we recover reliable $Δν$ for only about 23% of the stars. Additionally, we get reliable $ΔΠ_{1}$ inferences for about 200 young red-giants from K2. For these $ΔΠ_{1}$ inferences, we see a good match with the well known $Δν-ΔΠ_{1}$ degenerate sequence observed in Kepler red-giants.

preprint2024arXiv

Fairness under Covariate Shift: Improving Fairness-Accuracy tradeoff with few Unlabeled Test Samples

Covariate shift in the test data is a common practical phenomena that can significantly downgrade both the accuracy and the fairness performance of the model. Ensuring fairness across different sensitive groups under covariate shift is of paramount importance due to societal implications like criminal justice. We operate in the unsupervised regime where only a small set of unlabeled test samples along with a labeled training set is available. Towards improving fairness under this highly challenging yet realistic scenario, we make three contributions. First is a novel composite weighted entropy based objective for prediction accuracy which is optimized along with a representation matching loss for fairness. We experimentally verify that optimizing with our loss formulation outperforms a number of state-of-the-art baselines in the pareto sense with respect to the fairness-accuracy tradeoff on several standard datasets. Our second contribution is a new setting we term Asymmetric Covariate Shift that, to the best of our knowledge, has not been studied before. Asymmetric covariate shift occurs when distribution of covariates of one group shifts significantly compared to the other groups and this happens when a dominant group is over-represented. While this setting is extremely challenging for current baselines, We show that our proposed method significantly outperforms them. Our third contribution is theoretical, where we show that our weighted entropy term along with prediction loss on the training set approximates test loss under covariate shift. Empirically and through formal sample complexity bounds, we show that this approximation to the unseen test loss does not depend on importance sampling variance which affects many other baselines.

preprint2024arXiv

Selective classification using a robust meta-learning approach

Predictive uncertainty-a model's self awareness regarding its accuracy on an input-is key for both building robust models via training interventions and for test-time applications such as selective classification. We propose a novel instance-conditioned reweighting approach that captures predictive uncertainty using an auxiliary network and unifies these train- and test-time applications. The auxiliary network is trained using a meta-objective in a bilevel optimization framework. A key contribution of our proposal is the meta-objective of minimizing the dropout variance, an approximation of Bayesian Predictive uncertainty. We show in controlled experiments that we effectively capture the diverse specific notions of uncertainty through this meta-objective, while previous approaches only capture certain aspects. These results translate to significant gains in real-world settings-selective classification, label noise, domain adaptation, calibration-and across datasets-Imagenet, Cifar100, diabetic retinopathy, Camelyon, WILDs, Imagenet-C,-A,-R, Clothing1M, etc. For Diabetic Retinopathy, we see upto 3.4%/3.3% accuracy and AUC gains over SOTA in selective classification. We also improve upon large-scale pretrained models such as PLEX.

preprint2022arXiv

Auto-Transfer: Learning to Route Transferrable Representations

Knowledge transfer between heterogeneous source and target networks and tasks has received a lot of attention in recent times as large amounts of quality labeled data can be difficult to obtain in many applications. Existing approaches typically constrain the target deep neural network (DNN) feature representations to be close to the source DNNs feature representations, which can be limiting. We, in this paper, propose a novel adversarial multi-armed bandit approach that automatically learns to route source representations to appropriate target representations following which they are combined in meaningful ways to produce accurate target models. We see upwards of 5\% accuracy improvements compared with the state-of-the-art knowledge transfer methods on four benchmark (target) image datasets CUB200, Stanford Dogs, MIT67, and Stanford40 where the source dataset is ImageNet. We qualitatively analyze the goodness of our transfer scheme by showing individual examples of the important features focused on by our target network at different layers compared with the (closest) competitors. We also observe that our improvement over other methods is higher for smaller target datasets making it an effective tool for small data applications that may benefit from transfer learning.

preprint2022arXiv

Causal Feature Selection for Algorithmic Fairness

The use of machine learning (ML) in high-stakes societal decisions has encouraged the consideration of fairness throughout the ML lifecycle. Although data integration is one of the primary steps to generate high quality training data, most of the fairness literature ignores this stage. In this work, we consider fairness in the integration component of data management, aiming to identify features that improve prediction without adding any bias to the dataset. We work under the causal interventional fairness paradigm. Without requiring the underlying structural causal model a priori, we propose an approach to identify a sub-collection of features that ensure the fairness of the dataset by performing conditional independence tests between different subsets of features. We use group testing to improve the complexity of the approach. We theoretically prove the correctness of the proposed algorithm to identify features that ensure interventional fairness and show that sub-linear conditional independence tests are sufficient to identify these variables. A detailed empirical evaluation is performed on real-world datasets to demonstrate the efficacy and efficiency of our technique.

preprint2022arXiv

Empirical or Invariant Risk Minimization? A Sample Complexity Perspective

Recently, invariant risk minimization (IRM) was proposed as a promising solution to address out-of-distribution (OOD) generalization. However, it is unclear when IRM should be preferred over the widely-employed empirical risk minimization (ERM) framework. In this work, we analyze both these frameworks from the perspective of sample complexity, thus taking a firm step towards answering this important question. We find that depending on the type of data generation mechanism, the two approaches might have very different finite sample and asymptotic behavior. For example, in the covariate shift setting we see that the two approaches not only arrive at the same asymptotic solution, but also have similar finite sample behavior with no clear winner. For other distribution shifts such as those involving confounders or anti-causal variables, however, the two approaches arrive at different asymptotic solutions where IRM is guaranteed to be close to the desired OOD solutions in the finite sample regime, while ERM is biased even asymptotically. We further investigate how different factors -- the number of environments, complexity of the model, and IRM penalty weight -- impact the sample complexity of IRM in relation to its distance from the OOD solutions

preprint2022arXiv

Finding Valid Adjustments under Non-ignorability with Minimal DAG Knowledge

Treatment effect estimation from observational data is a fundamental problem in causal inference. There are two very different schools of thought that have tackled this problem. On one hand, Pearlian framework commonly assumes structural knowledge (provided by an expert) in form of directed acyclic graphs and provides graphical criteria such as back-door criterion to identify valid adjustment sets. On other hand, potential outcomes (PO) framework commonly assumes that all observed features satisfy ignorability (i.e., no hidden confounding), which in general is untestable. In prior works that attempted to bridge these frameworks, there is an observational criteria to identify an anchor variable and if a subset of covariates (not involving the anchor variable) passes a suitable conditional independence criteria, then that subset is a valid back-door. Our main result strengthens these prior results by showing that under a different expert-driven structural knowledge -- that one variable is a direct causal parent of treatment variable -- remarkably, testing for subsets (not involving the known parent variable) that are valid back-doors is equivalent to an invariance test. Importantly, we also cover the non-trivial case where entire set of observed features is not ignorable (generalizing the PO framework) without requiring knowledge of all parents of treatment variable. Our key technical idea involves generation of a synthetic sub-sampling (or environment) variable that is a function of the known parent variable. In addition to designing an invariance test, this sub-sampling variable allows us to leverage Invariant Risk Minimization, and thus, connects finding valid adjustments (in non-ignorable observational setting) to representation learning. We demonstrate effectiveness and tradeoffs of our approaches on a variety of synthetic data as well as real causal effect estimation benchmarks.

preprint2022arXiv

Fourier Representations for Black-Box Optimization over Categorical Variables

Optimization of real-world black-box functions defined over purely categorical variables is an active area of research. In particular, optimization and design of biological sequences with specific functional or structural properties have a profound impact in medicine, materials science, and biotechnology. Standalone search algorithms, such as simulated annealing (SA) and Monte Carlo tree search (MCTS), are typically used for such optimization problems. In order to improve the performance and sample efficiency of such algorithms, we propose to use existing methods in conjunction with a surrogate model for the black-box evaluations over purely categorical variables. To this end, we present two different representations, a group-theoretic Fourier expansion and an abridged one-hot encoded Boolean Fourier expansion. To learn such representations, we consider two different settings to update our surrogate model. First, we utilize an adversarial online regression setting where Fourier characters of each representation are considered as experts and their respective coefficients are updated via an exponential weight update rule each time the black box is evaluated. Second, we consider a Bayesian setting where queries are selected via Thompson sampling and the posterior is updated via a sparse Bayesian regression model (over our proposed representation) with a regularized horseshoe prior. Numerical experiments over synthetic benchmarks as well as real-world RNA sequence optimization and design problems demonstrate the representational power of the proposed methods, which achieve competitive or superior performance compared to state-of-the-art counterparts, while improving the computation cost and/or sample efficiency, substantially.

preprint2022arXiv

PAC Generalization via Invariant Representations

One method for obtaining generalizable solutions to machine learning tasks when presented with diverse training environments is to find \textit{invariant representations} of the data. These are representations of the covariates such that the best model on top of the representation is invariant across training environments. In the context of linear Structural Equation Models (SEMs), invariant representations might allow us to learn models with out-of-distribution guarantees, i.e., models that are robust to interventions in the SEM. To address the invariant representation problem in a {\em finite sample} setting, we consider the notion of $ε$-approximate invariance. We study the following question: If a representation is approximately invariant with respect to a given number of training interventions, will it continue to be approximately invariant on a larger collection of unseen SEMs? This larger collection of SEMs is generated through a parameterized family of interventions. Inspired by PAC learning, we obtain finite-sample out-of-distribution generalization guarantees for approximate invariance that holds \textit{probabilistically} over a family of linear SEMs without faithfulness assumptions. Our results show bounds that do not scale in ambient dimension when intervention sites are restricted to lie in a constant size subset of in-degree bounded nodes. We also show how to extend our results to a linear indirect observation model that incorporates latent variables.

preprint2021arXiv

AI Explainability 360: Impact and Design

As artificial intelligence and machine learning algorithms become increasingly prevalent in society, multiple stakeholders are calling for these algorithms to provide explanations. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, have different explanation needs. To address these needs, in 2019, we created AI Explainability 360 (Arya et al. 2020), an open source software toolkit featuring ten diverse and state-of-the-art explainability methods and two evaluation metrics. This paper examines the impact of the toolkit with several case studies, statistics, and community feedback. The different ways in which users have experienced AI Explainability 360 have resulted in multiple types of impact and improvements in multiple metrics, highlighted by the adoption of the toolkit by the independent LF AI & Data Foundation. The paper also describes the flexible design of the toolkit, examples of its use, and the significant educational material and documentation available to its users.

preprint2021arXiv

Contextual Bandits with Stochastic Experts

We consider the problem of contextual bandits with stochastic experts, which is a variation of the traditional stochastic contextual bandit with experts problem. In our problem setting, we assume access to a class of stochastic experts, where each expert is a conditional distribution over the arms given a context. We propose upper-confidence bound (UCB) algorithms for this problem, which employ two different importance sampling based estimators for the mean reward for each expert. Both these estimators leverage information leakage among the experts, thus using samples collected under all the experts to estimate the mean reward of any given expert. This leads to instance dependent regret bounds of $\mathcal{O}\left(λ(\pmbμ)\mathcal{M}\log T/Δ\right)$, where $λ(\pmbμ)$ is a term that depends on the mean rewards of the experts, $Δ$ is the smallest gap between the mean reward of the optimal expert and the rest, and $\mathcal{M}$ quantifies the information leakage among the experts. We show that under some assumptions $λ(\pmbμ)$ is typically $\mathcal{O}(\log N)$, where $N$ is the number of experts. We implement our algorithm with stochastic experts generated from cost-sensitive classification oracles and show superior empirical performance on real-world datasets, when compared to other state of the art contextual bandit algorithms.

preprint2021arXiv

Efficient Encrypted Inference on Ensembles of Decision Trees

Data privacy concerns often prevent the use of cloud-based machine learning services for sensitive personal data. While homomorphic encryption (HE) offers a potential solution by enabling computations on encrypted data, the challenge is to obtain accurate machine learning models that work within the multiplicative depth constraints of a leveled HE scheme. Existing approaches for encrypted inference either make ad-hoc simplifications to a pre-trained model (e.g., replace hard comparisons in a decision tree with soft comparators) at the cost of accuracy or directly train a new depth-constrained model using the original training set. In this work, we propose a framework to transfer knowledge extracted by complex decision tree ensembles to shallow neural networks (referred to as DTNets) that are highly conducive to encrypted inference. Our approach minimizes the accuracy loss by searching for the best DTNet architecture that operates within the given depth constraints and training this DTNet using only synthetic data sampled from the training data distribution. Extensive experiments on real-world datasets demonstrate that these characteristics are critical in ensuring that DTNet accuracy approaches that of the original tree ensemble. Our system is highly scalable and can perform efficient inference on batched encrypted (134 bits of security) data with amortized time in milliseconds. This is approximately three orders of magnitude faster than the standard approach of applying soft comparison at the internal nodes of the ensemble trees.

preprint2021arXiv

Stochastic Linear Bandits with Protected Subspace

We study a variant of the stochastic linear bandit problem wherein we optimize a linear objective function but rewards are accrued only orthogonal to an unknown subspace (which we interpret as a \textit{protected space}) given only zero-order stochastic oracle access to both the objective itself and protected subspace. In particular, at each round, the learner must choose whether to query the objective or the protected subspace alongside choosing an action. Our algorithm, derived from the OFUL principle, uses some of the queries to get an estimate of the protected space, and (in almost all rounds) plays optimistically with respect to a confidence set for this space. We provide a $\tilde{O}(sd\sqrt{T})$ regret upper bound in the case where the action space is the complete unit ball in $\mathbb{R}^d$, $s < d$ is the dimension of the protected subspace, and $T$ is the time horizon. Moreover, we demonstrate that a discrete action space can lead to linear regret with an optimistic algorithm, reinforcing the sub-optimality of optimism in certain settings. We also show that protection constraints imply that for certain settings, no consistent algorithm can have a regret smaller than $Ω(T^{3/4}).$ We finally empirically validate our results with synthetic and real datasets.

preprint2020arXiv

A Multi-Channel Neural Graphical Event Model with Negative Evidence

Event datasets are sequences of events of various types occurring irregularly over the time-line, and they are increasingly prevalent in numerous domains. Existing work for modeling events using conditional intensities rely on either using some underlying parametric form to capture historical dependencies, or on non-parametric models that focus primarily on tasks such as prediction. We propose a non-parametric deep neural network approach in order to estimate the underlying intensity functions. We use a novel multi-channel RNN that optimally reinforces the negative evidence of no observable events with the introduction of fake event epochs within each consecutive inter-event interval. We evaluate our method against state-of-the-art baselines on model fitting tasks as gauged by log-likelihood. Through experiments on both synthetic and real-world datasets, we find that our proposed approach outperforms existing baselines on most of the datasets studied.

preprint2020arXiv

Differentially Private Distributed Data Summarization under Covariate Shift

We envision AI marketplaces to be platforms where consumers, with very less data for a target task, can obtain a relevant model by accessing many private data sources with vast number of data samples. One of the key challenges is to construct a training dataset that matches a target task without compromising on privacy of the data sources. To this end, we consider the following distributed data summarizataion problem. Given K private source datasets denoted by $[D_i]_{i\in [K]}$ and a small target validation set $D_v$, which may involve a considerable covariate shift with respect to the sources, compute a summary dataset $D_s\subseteq \bigcup_{i\in [K]} D_i$ such that its statistical distance from the validation dataset $D_v$ is minimized. We use the popular Maximum Mean Discrepancy as the measure of statistical distance. The non-private problem has received considerable attention in prior art, for example in prototype selection (Kim et al., NIPS 2016). Our work is the first to obtain strong differential privacy guarantees while ensuring the quality guarantees of the non-private version. We study this problem in a Parsimonious Curator Privacy Model, where a trusted curator coordinates the summarization process while minimizing the amount of private information accessed. Our central result is a novel protocol that (a) ensures the curator accesses at most $O(K^{\frac{1}{3}}|D_s| + |D_v|)$ points (b) has formal privacy guarantees on the leakage of information between the data owners and (c) closely matches the best known non-private greedy algorithm. Our protocol uses two hash functions, one inspired by the Rahimi-Recht random features method and the second leverages state of the art differential privacy mechanisms. We introduce a novel "noiseless" differentially private auctioning protocol for winner notification and demonstrate the efficacy of our protocol using real-world datasets.

preprint2020arXiv

Enhancing Simple Models by Exploiting What They Already Know

There has been recent interest in improving performance of simple models for multiple reasons such as interpretability, robust learning from small data, deployment in memory constrained settings as well as environmental considerations. In this paper, we propose a novel method SRatio that can utilize information from high performing complex models (viz. deep neural networks, boosted trees, random forests) to reweight a training dataset for a potentially low performing simple model of much lower complexity such as a decision tree or a shallow network enhancing its performance. Our method also leverages the per sample hardness estimate of the simple model which is not the case with the prior works which primarily consider the complex model's confidences/predictions and is thus conceptually novel. Moreover, we generalize and formalize the concept of attaching probes to intermediate layers of a neural network to other commonly used classifiers and incorporate this into our method. The benefit of these contributions is witnessed in the experiments where on 6 UCI datasets and CIFAR-10 we outperform competitors in a majority (16 out of 27) of the cases and tie for best performance in the remaining cases. In fact, in a couple of cases, we even approach the complex model's performance. We also conduct further experiments to validate assertions and intuitively understand why our method works. Theoretically, we motivate our approach by showing that the weighted loss minimized by simple models using our weighting upper bounds the loss of the complex model.

preprint2020arXiv

Invariant Risk Minimization Games

The standard risk minimization paradigm of machine learning is brittle when operating in environments whose test distributions are different from the training distribution due to spurious correlations. Training on data from many environments and finding invariant predictors reduces the effect of spurious features by concentrating models on features that have a causal relationship with the outcome. In this work, we pose such invariant risk minimization as finding the Nash equilibrium of an ensemble game among several environments. By doing so, we develop a simple training algorithm that uses best response dynamics and, in our experiments, yields similar or better empirical accuracy with much lower variance than the challenging bi-level optimization problem of Arjovsky et al. (2019). One key theoretical contribution is showing that the set of Nash equilibria for the proposed game are equivalent to the set of invariant predictors for any finite number of environments, even with nonlinear classifiers and transformations. As a result, our method also retains the generalization guarantees to a large set of environments shown in Arjovsky et al. (2019). The proposed algorithm adds to the collection of successful game-theoretic machine learning algorithms such as generative adversarial networks.

preprint2020arXiv

Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions

We consider a covariate shift problem where one has access to several different training datasets for the same learning problem and a small validation set which possibly differs from all the individual training distributions. This covariate shift is caused, in part, due to unobserved features in the datasets. The objective, then, is to find the best mixture distribution over the training datasets (with only observed features) such that training a learning algorithm using this mixture has the best validation performance. Our proposed algorithm, ${\sf Mix\&Match}$, combines stochastic gradient descent (SGD) with optimistic tree search and model re-use (evolving partially trained models with samples from different mixture distributions) over the space of mixtures, for this task. We prove simple regret guarantees for our algorithm with respect to recovering the optimal mixture, given a total budget of SGD evaluations. Finally, we validate our algorithm on two real-world datasets.

preprint2016arXiv

Contextual Bandits with Latent Confounders: An NMF Approach

Motivated by online recommendation and advertising systems, we consider a causal model for stochastic contextual bandits with a latent low-dimensional confounder. In our model, there are $L$ observed contexts and $K$ arms of the bandit. The observed context influences the reward obtained through a latent confounder variable with cardinality $m$ ($m \ll L,K$). The arm choice and the latent confounder causally determines the reward while the observed context is correlated with the confounder. Under this model, the $L \times K$ mean reward matrix $\mathbf{U}$ (for each context in $[L]$ and each arm in $[K]$) factorizes into non-negative factors $\mathbf{A}$ ($L \times m$) and $\mathbf{W}$ ($m \times K$). This insight enables us to propose an $ε$-greedy NMF-Bandit algorithm that designs a sequence of interventions (selecting specific arms), that achieves a balance between learning this low-dimensional structure and selecting the best arm to minimize regret. Our algorithm achieves a regret of $\mathcal{O}\left(L\mathrm{poly}(m, \log K) \log T \right)$ at time $T$, as compared to $\mathcal{O}(LK\log T)$ for conventional contextual bandits, assuming a constant gap between the best arm and the rest for each context. These guarantees are obtained under mild sufficiency conditions on the factors that are weaker versions of the well-known Statistical RIP condition. We further propose a class of generative models that satisfy our sufficient conditions, and derive a lower bound of $\mathcal{O}\left(Km\log T\right)$. These are the first regret guarantees for online matrix completion with bandit feedback, when the rank is greater than one. We further compare the performance of our algorithm with the state of the art, on synthetic and real world data-sets.

preprint2016arXiv

Distributed Estimation of Graph 4-Profiles

We present a novel distributed algorithm for counting all four-node induced subgraphs in a big graph. These counts, called the $4$-profile, describe a graph's connectivity properties and have found several uses ranging from bioinformatics to spam detection. We also study the more complicated problem of estimating the local $4$-profiles centered at each vertex of the graph. The local $4$-profile embeds every vertex in an $11$-dimensional space that characterizes the local geometry of its neighborhood: vertices that connect different clusters will have different local $4$-profiles compared to those that are only part of one dense cluster. Our algorithm is a local, distributed message-passing scheme on the graph and computes all the local $4$-profiles in parallel. We rely on two novel theoretical contributions: we show that local $4$-profiles can be calculated using compressed two-hop information and also establish novel concentration results that show that graphs can be substantially sparsified and still retain good approximation quality for the global $4$-profile. We empirically evaluate our algorithm using a distributed GraphLab implementation that we scaled up to $640$ cores. We show that our algorithm can compute global and local $4$-profiles of graphs with millions of edges in a few minutes, significantly improving upon the previous state of the art.

preprint2015arXiv

An Efficient Multiple-Groupcast Coded Multicasting Scheme for Finite Fractional Caching

Coded multicasting has been shown to improve the caching performance of content delivery networks with multiple caches downstream of a common multicast link. However, the schemes that have been shown to achieve order-optimal perfor- mance require content items to be partitioned into a number of packets that grows exponentially with the number of users [1]. In this paper, we first extend the analysis of the achievable scheme in [2] to the case of heterogeneous cache sizes and demand distribu- tions, providing an achievable scheme and an upper bound on the limiting average performance when the number of packets goes to infinity while the remaining system parameters are kept constant. We then show how the scheme achieving this upper bound can very quickly loose its multiplicative caching gain for finite content packetization. To overcome this limitation, we design a novel polynomial-time algorithm based on greedy local graph-coloring that, while keeping the same content packetization, recovers a significant part of the multiplicative caching gain. Our results show that the achievable schemes proposed to date to quantify the limiting performance, must be properly designed for practical finite system parameters.

preprint2015arXiv

Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

We study the problem of approximating the $3$-profile of a large graph. $3$-profiles are generalizations of triangle counts that specify the number of times a small graph appears as an induced subgraph of a large graph. Our algorithm uses the novel concept of $3$-profile sparsifiers: sparse graphs that can be used to approximate the full $3$-profile counts for a given large graph. Further, we study the problem of estimating local and ego $3$-profiles, two graph quantities that characterize the local neighborhood of each vertex of a graph. Our algorithm is distributed and operates as a vertex program over the GraphLab PowerGraph framework. We introduce the concept of edge pivoting which allows us to collect $2$-hop information without maintaining an explicit $2$-hop neighborhood list at each vertex. This enables the computation of all the local $3$-profiles in parallel with minimal communication. We test out implementation in several experiments scaling up to $640$ cores on Amazon EC2. We find that our algorithm can estimate the $3$-profile of a graph in approximately the same time as triangle counting. For the harder problem of ego $3$-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, in the timescale of minutes.

preprint2015arXiv

Finite Length Analysis of Caching-Aided Coded Multicasting

In this work, we study a noiseless broadcast link serving $K$ users whose requests arise from a library of $N$ files. Every user is equipped with a cache of size $M$ files each. It has been shown that by splitting all the files into packets and placing individual packets in a random independent manner across all the caches, it requires at most $N/M$ file transmissions for any set of demands from the library. The achievable delivery scheme involves linearly combining packets of different files following a greedy clique cover solution to the underlying index coding problem. This remarkable multiplicative gain of random placement and coded delivery has been established in the asymptotic regime when the number of packets per file $F$ scales to infinity. In this work, we initiate the finite-length analysis of random caching schemes when the number of packets $F$ is a function of the system parameters $M,N,K$. Specifically, we show that existing random placement and clique cover delivery schemes that achieve optimality in the asymptotic regime can have at most a multiplicative gain of $2$ if the number of packets is sub-exponential. Further, for any clique cover based coded delivery and a large class of random caching schemes, that includes the existing ones, we show that the number of packets required to get a multiplicative gain of $\frac{4}{3}g$ is at least $O((N/M)^g)$. We exhibit a random placement and an efficient clique cover based coded delivery scheme that approximately achieves this lower bound. We also provide tight concentration results that show that the average (over the random caching involved) number of transmissions concentrates very well requiring only polynomial number of packets in the rest of the parameters.

preprint2015arXiv

Learning Causal Graphs with Small Interventions

We consider the problem of learning causal networks with interventions, when each intervention is limited in size under Pearl's Structural Equation Model with independent errors (SEM-IE). The objective is to minimize the number of experiments to discover the causal directions of all the edges in a causal graph. Previous work has focused on the use of separating systems for complete graphs for this task. We prove that any deterministic adaptive algorithm needs to be a separating system in order to learn complete graphs in the worst case. In addition, we present a novel separating system construction, whose size is close to optimal and is arguably simpler than previous work in combinatorics. We also develop a novel information theoretic lower bound on the number of interventions that applies in full generality, including for randomized adaptive learning algorithms. For general chordal graphs, we derive worst case lower bounds on the number of interventions. Building on observations about induced trees, we give a new deterministic adaptive algorithm to learn directions on any chordal skeleton completely. In the worst case, our achievable scheme is an $α$-approximation algorithm where $α$ is the independence number of the graph. We also show that there exist graph classes for which the sufficient number of experiments is close to the lower bound. In the other extreme, there are graph classes for which the required number of experiments is multiplicatively $α$ away from our lower bound. In simulations, our algorithm almost always performs very close to the lower bound, while the approach based on separating systems for complete graphs is significantly worse for random chordal graphs.

preprint2015arXiv

On Approximating the Sum-Rate for Multiple-Unicasts

We study upper bounds on the sum-rate of multiple-unicasts. We approximate the Generalized Network Sharing Bound (GNS cut) of the multiple-unicasts network coding problem with $k$ independent sources. Our approximation algorithm runs in polynomial time and yields an upper bound on the joint source entropy rate, which is within an $O(\log^2 k)$ factor from the GNS cut. It further yields a vector-linear network code that achieves joint source entropy rate within an $O(\log^2 k)$ factor from the GNS cut, but \emph{not} with independent sources: the code induces a correlation pattern among the sources. Our second contribution is establishing a separation result for vector-linear network codes: for any given field $\mathbb{F}$ there exist networks for which the optimum sum-rate supported by vector-linear codes over $\mathbb{F}$ for independent sources can be multiplicatively separated by a factor of $k^{1-δ}$, for any constant ${δ>0}$, from the optimum joint entropy rate supported by a code that allows correlation between sources. Finally, we establish a similar separation result for the asymmetric optimum vector-linear sum-rates achieved over two distinct fields $\mathbb{F}_{p}$ and $\mathbb{F}_{q}$ for independent sources, revealing that the choice of field can heavily impact the performance of a linear network code.

preprint2014arXiv

Bounding Multiple Unicasts through Index Coding and Locally Repairable Codes

We establish a duality result between linear index coding and Locally Repairable Codes (LRCs). Specifically, we show that a natural extension of LRCs we call Generalized Locally Repairable Codes (GLCRs) are exactly dual to linear index codes. In a GLRC, every node is decodable from a specific set of other nodes and these sets induce a recoverability directed graph. We show that the dual linear subspace of a GLRC is a solution to an index coding instance where the side information graph is this GLRC recoverability graph. We show that the GLRC rate is equivalent to the complementary index coding rate, i.e. the number of transmissions saved by coding. Our second result uses this duality to establish a new upper bound for the multiple unicast network coding problem. In multiple unicast network coding, we are given a directed acyclic graph and r sources that want to send independent messages to r corresponding destinations. Our new upper bound is efficiently computable and relies on a strong approximation result for complementary index coding. We believe that our bound could lead to a logarithmic approximation factor for multiple unicast network coding if a plausible connection we state is verified.

preprint2014arXiv

Graph Theory versus Minimum Rank for Index Coding

We obtain novel index coding schemes and show that they provably outperform all previously known graph theoretic bounds proposed so far. Further, we establish a rather strong negative result: all known graph theoretic bounds are within a logarithmic factor from the chromatic number. This is in striking contrast to minrank since prior work has shown that it can outperform the chromatic number by a polynomial factor in some cases. The conclusion is that all known graph theoretic bounds are not much stronger than the chromatic number.

preprint2014arXiv

On the Information Theoretic Limits of Learning Ising Models

We provide a general framework for computing lower-bounds on the sample complexity of recovering the underlying graphs of Ising models, given i.i.d samples. While there have been recent results for specific graph classes, these involve fairly extensive technical arguments that are specialized to each specific graph class. In contrast, we isolate two key graph-structural ingredients that can then be used to specify sample complexity lower-bounds. Presence of these structural properties makes the graph class hard to learn. We derive corollaries of our main result that not only recover existing recent results, but also provide lower bounds for novel graph classes not considered previously. We also extend our framework to the random graph setting and derive corollaries for Erdős-Rényi graphs in a certain dense setting.

preprint2014arXiv

Sparse Polynomial Learning and Graph Sketching

Let $f:\{-1,1\}^n$ be a polynomial with at most $s$ non-zero real coefficients. We give an algorithm for exactly reconstructing f given random examples from the uniform distribution on $\{-1,1\}^n$ that runs in time polynomial in $n$ and $2s$ and succeeds if the function satisfies the unique sign property: there is one output value which corresponds to a unique set of values of the participating parities. This sufficient condition is satisfied when every coefficient of f is perturbed by a small random noise, or satisfied with high probability when s parity functions are chosen randomly or when all the coefficients are positive. Learning sparse polynomials over the Boolean domain in time polynomial in $n$ and $2s$ is considered notoriously hard in the worst-case. Our result shows that the problem is tractable for almost all sparse polynomials. Then, we show an application of this result to hypergraph sketching which is the problem of learning a sparse (both in the number of hyperedges and the size of the hyperedges) hypergraph from uniformly drawn random cuts. We also provide experimental results on a real world dataset.

preprint2013arXiv

A Repair Framework for Scalar MDS Codes

Several works have developed vector-linear maximum-distance separable (MDS) storage codes that min- imize the total communication cost required to repair a single coded symbol after an erasure, referred to as repair bandwidth (BW). Vector codes allow communicating fewer sub-symbols per node, instead of the entire content. This allows non trivial savings in repair BW. In sharp contrast, classic codes, like Reed- Solomon (RS), used in current storage systems, are deemed to suffer from naive repair, i.e. downloading the entire stored message to repair one failed node. This mainly happens because they are scalar-linear. In this work, we present a simple framework that treats scalar codes as vector-linear. In some cases, this allows significant savings in repair BW. We show that vectorized scalar codes exhibit properties that simplify the design of repair schemes. Our framework can be seen as a finite field analogue of real interference alignment. Using our simplified framework, we design a scheme that we call clique-repair which provably identifies the best linear repair strategy for any scalar 2-parity MDS code, under some conditions on the sub-field chosen for vectorization. We specify optimal repair schemes for specific (5,3)- and (6,4)-Reed- Solomon (RS) codes. Further, we present a repair strategy for the RS code currently deployed in the Facebook Analytics Hadoop cluster that leads to 20% of repair BW savings over naive repair which is the repair scheme currently used for this code.

preprint2013arXiv

FemtoCaching: Wireless Video Content Delivery through Distributed Caching Helpers

Video on-demand streaming from Internet-based servers is becoming one of the most important services offered by wireless networks today. In order to improve the area spectral efficiency of video transmission in cellular systems, small cells heterogeneous architectures (e.g., femtocells, WiFi off-loading) are being proposed, such that video traffic to nomadic users can be handled by short-range links to the nearest small cell access points (referred to as "helpers"). As the helper deployment density increases, the backhaul capacity becomes the system bottleneck. In order to alleviate such bottleneck we propose a system where helpers with low-rate backhaul but high storage capacity cache popular video files. Files not available from helpers are transmitted by the cellular base station. We analyze the optimum way of assigning files to the helpers, in order to minimize the expected downloading time for files. We distinguish between the uncoded case (where only complete files are stored) and the coded case, where segments of Fountain-encoded versions of the video files are stored at helpers. We show that the uncoded optimum file assignment is NP-hard, and develop a greedy strategy that is provably within a factor 2 of the optimum. Further, for a special case we provide an efficient algorithm achieving a provably better approximation ratio of $1-(1-1/d)^d$, where $d$ is the maximum number of helpers a user can be connected to. We also show that the coded optimum cache assignment problem is convex that can be further reduced to a linear program. We present numerical results comparing the proposed schemes.

preprint2013arXiv

Index Coding Problem with Side Information Repositories

To tackle the expected enormous increase in mobile video traffic in cellular networks, an architecture involving a base station along with caching femto stations (referred to as helpers), storing popular files near users, has been proposed [1]. The primary benefit of caching is the enormous increase in downloading rate when a popular file is available at helpers near a user requesting that file. In this work, we explore a secondary benefit of caching in this architecture through the lens of index coding. We assume a system with n users and constant number of caching helpers. Only helpers store files, i.e. have side information. We investigate the following scenario: Each user requests a distinct file that is not found in the set of helpers nearby. Users are served coded packets (through an index code) by an omniscient base station. Every user decodes its desired packet from the coded packets and the side information packets from helpers nearby. We assume that users can obtain any file stored in their neighboring helpers without incurring transmission costs. With respect to the index code employed, we investigate two achievable schemes: 1) XOR coloring based on coloring of the side information graph associated with the problem and 2)Vector XOR coloring based on fractional coloring of the side information graph. We show that the general problem reduces to a canonical problem where every user is connected to exactly one helper under some topological constraints. For the canonical problem, with constant number of helpers (k), we show that the complexity of computing the best XOR/vector XOR coloring schemes are polynomial in the number of users n. The result exploits a special complete bi-partite structure that the side information graphs exhibit for any finite k.

preprint2013arXiv

Local Graph Coloring and Index Coding

We present a novel upper bound for the optimal index coding rate. Our bound uses a graph theoretic quantity called the local chromatic number. We show how a good local coloring can be used to create a good index code. The local coloring is used as an alignment guide to assign index coding vectors from a general position MDS code. We further show that a natural LP relaxation yields an even stronger index code. Our bounds provably outperform the state of the art on index coding but at most by a constant factor.

Karthikeyan Shanmugam

What is connected

Connect this record

See the researcher in context

Building this map preview

33 published item(s)

Inferring Asteroseismic Parameters from Short Observations Using Deep Learning: Application to TESS and K2 Red Giants

Fairness under Covariate Shift: Improving Fairness-Accuracy tradeoff with few Unlabeled Test Samples

Selective classification using a robust meta-learning approach

Auto-Transfer: Learning to Route Transferrable Representations

Causal Feature Selection for Algorithmic Fairness

Empirical or Invariant Risk Minimization? A Sample Complexity Perspective

Finding Valid Adjustments under Non-ignorability with Minimal DAG Knowledge

Fourier Representations for Black-Box Optimization over Categorical Variables

PAC Generalization via Invariant Representations

AI Explainability 360: Impact and Design

Contextual Bandits with Stochastic Experts

Efficient Encrypted Inference on Ensembles of Decision Trees

Stochastic Linear Bandits with Protected Subspace

A Multi-Channel Neural Graphical Event Model with Negative Evidence

Differentially Private Distributed Data Summarization under Covariate Shift

Enhancing Simple Models by Exploiting What They Already Know

Invariant Risk Minimization Games

Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions

Contextual Bandits with Latent Confounders: An NMF Approach

Distributed Estimation of Graph 4-Profiles

An Efficient Multiple-Groupcast Coded Multicasting Scheme for Finite Fractional Caching

Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

Finite Length Analysis of Caching-Aided Coded Multicasting

Learning Causal Graphs with Small Interventions

On Approximating the Sum-Rate for Multiple-Unicasts

Bounding Multiple Unicasts through Index Coding and Locally Repairable Codes

Graph Theory versus Minimum Rank for Index Coding

On the Information Theoretic Limits of Learning Ising Models

Sparse Polynomial Learning and Graph Sketching

A Repair Framework for Scalar MDS Codes

FemtoCaching: Wireless Video Content Delivery through Distributed Caching Helpers

Index Coding Problem with Side Information Repositories

Local Graph Coloring and Index Coding