Source author record

Sampath Kannan

Sampath Kannan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Science and Game Theory, Data Structures and Algorithms, Machine Learning, cs.CY, econ.TH, Artificial Intelligence, Cryptography and Security, Discrete Mathematics, econ.EM, Multiagent Systems, physics.soc-ph, Populations and Evolution

Catalog footprint

What is connected

13 works

12 topics

4 close collaborators


Preprint, 2022, arXiv

Quantifying the Burden of Exploration and the Unfairness of Free Riding

We consider the multi-armed bandit setting with a twist. Rather than having just one decision maker deciding which arm to pull in each round, we have $n$ different decision makers (agents). In the simple stochastic setting, we show that a "free-riding" agent observing another "self-reliant" agent can achieve just $O(1)$ regret, as opposed to the regret lower bound of $\Omega(\log t)$ when one decision maker is playing in isolation. This result holds whenever the self-reliant agent's strategy satisfies either one of two assumptions: (1) each arm is pulled at least $\gamma \ln t$ times in expectation for a constant $\gamma$ that we compute, or (2) the self-reliant agent achieves $o(t)$ realized regret with high probability. Both of these assumptions are satisfied by standard zero-regret algorithms. Under the second assumption, we further show that the free rider only needs to observe the number of times each arm is pulled by the self-reliant agent, and not the rewards realized. In the linear contextual setting, each arm has a distribution over parameter vectors, each agent has a context vector, and the reward realized when an agent pulls an arm is the inner product of that agent's context vector with a parameter vector sampled from the pulled arm's distribution. We show that the free rider can achieve $O(1)$ regret in this setting whenever the free rider's context is a small (in $L_2$-norm) linear combination of other agents' contexts and all other agents pull each arm $\Omega(\log t)$ times with high probability. Again, this condition on the self-reliant players is satisfied by standard zero-regret algorithms like UCB. We also prove a number of lower bounds.
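The count-only observation described in this abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's construction: the self-reliant agent runs textbook UCB1 on Bernoulli arms, and the free rider simply copies whichever arm the self-reliant agent has pulled most often so far. The function name `ucb1_with_free_rider` and the copy-the-most-pulled-arm rule are hypothetical choices made for illustration.

```python
import math
import random


def ucb1_with_free_rider(means, horizon, seed=0):
    """Simulate a UCB1 agent and a free rider who sees only pull counts.

    means   -- true Bernoulli means of the arms (unknown to both agents)
    horizon -- number of rounds to play
    Returns (pseudo-regret of UCB1 agent, pseudo-regret of free rider, pulls).
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k       # pull counts of the self-reliant agent
    totals = [0.0] * k    # summed rewards of the self-reliant agent
    best = max(means)
    regret_self = 0.0
    regret_free = 0.0

    for t in range(1, horizon + 1):
        # Free rider (assumed rule): copy the arm pulled most often so far.
        # It never looks at rewards, only at the counts from earlier rounds.
        copied = max(range(k), key=lambda a: pulls[a])
        regret_free += best - means[copied]

        # Self-reliant agent: play each arm once, then follow the UCB1 index.
        if t <= k:
            arm = t - 1
        else:
            arm = max(
                range(k),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        regret_self += best - means[arm]

    return regret_self, regret_free, pulls


if __name__ == "__main__":
    r_self, r_free, counts = ucb1_with_free_rider([0.9, 0.5], horizon=5000)
    print("self-reliant regret:", r_self)
    print("free-rider regret:  ", r_free)
    print("pull counts:        ", counts)
```

Comparing the two regret totals for a few horizons gives an informal feel for the gap between exploring and free riding; the simulation does not verify the $O(1)$ bound.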

Preprint, 2022, arXiv

Reconstructing Ultrametric Trees from Noisy Experiments

The problem of reconstructing evolutionary trees or phylogenies is of great interest in computational biology. A popular model for this problem assumes that we are given the set of leaves (current species) of an unknown binary tree and the results of 'experiments' on triples of leaves (a,b,c), which return the pair with the deepest least common ancestor. If the tree is assumed to be an ultrametric (i.e., all root-leaf paths have the same length), the experiment can be equivalently seen to return the closest pair of leaves. In this model, efficient algorithms are known for tree reconstruction. In reality, since the data on which these 'experiments' are run is itself generated by the stochastic process of evolution, these experiments are noisy. In all reasonable models of evolution, if the branches leading to the leaves in a triple separate from each other at common ancestors that are very close to each other in the tree, the result of the experiment should be close to uniformly random. Motivated by this, we consider a model where the noise on any triple is just dependent on the three pairwise distances (referred to as distance-based noise). Our results are the following:

1. Suppose the length of every edge in the unknown tree is at least $\tilde{O}(\frac{1}{\sqrt n})$ fraction of the length of a root-leaf path. Then, we give an efficient algorithm to reconstruct the topology of the tree for a broad family of distance-based noise models. Further, we show that if the edges are asymptotically shorter, then topology reconstruction is information-theoretically impossible.

2. Further, for a specific distance-based noise model--which we refer to as the homogeneous noise model--we show that the edge weights can also be approximately reconstructed under the same quantitative lower bound on the edge lengths.
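The triple 'experiment' in this abstract can be made concrete with a noiseless sketch. The tree encoding below (nested 2-tuples with string leaf labels) and the helper names `leaf_paths`, `lca_depth` and `experiment` are assumptions made for illustration; the distance-based noise that the paper studies is not modeled here.

```python
def leaf_paths(tree, prefix=()):
    """Map each leaf label to its root-to-leaf path of child indices.

    A tree is either a leaf label (any non-tuple value) or a tuple of subtrees.
    """
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (index,)))
    return paths


def lca_depth(path_a, path_b):
    """Depth of the least common ancestor = length of the shared path prefix."""
    depth = 0
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        depth += 1
    return depth


def experiment(paths, a, b, c):
    """Noiseless experiment: return the pair whose LCA is deepest."""
    pairs = [(a, b), (a, c), (b, c)]
    return max(pairs, key=lambda pair: lca_depth(paths[pair[0]], paths[pair[1]]))


if __name__ == "__main__":
    # Hypothetical four-leaf binary tree: ((a, b), c) joined with d at the root.
    paths = leaf_paths(((("a", "b"), "c"), "d"))
    print(experiment(paths, "a", "b", "c"))  # ('a', 'b')
    print(experiment(paths, "a", "c", "d"))  # ('a', 'c')
```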

Preprint, 2021, arXiv

Best vs. All: Equity and Accuracy of Standardized Test Score Reporting

We study a game theoretic model of standardized testing for college admissions. Students are of two types; High and Low. There is a college that would like to admit the High type students. Students take a potentially costly standardized exam which provides a noisy signal of their type. The students come from two populations, which are identical in talent (i.e. the type distribution is the same), but differ in their access to resources: the higher resourced population can at their option take the exam multiple times, whereas the lower resourced population can only take the exam once. We study two models of score reporting, which capture existing policies used by colleges. The first policy (sometimes known as "super-scoring") allows students to report the max of the scores they achieve. The other policy requires that all scores be reported. We find in our model that requiring that all scores be reported results in superior outcomes in equilibrium, both from the perspective of the college (the admissions rule is more accurate), and from the perspective of equity across populations: a student's probability of admission is independent of their population, conditional on their type. In particular, the false positive rates and false negative rates are identical in this setting, across the highly and poorly resourced student populations. This is the case despite the fact that the more highly resourced students can -- at their option -- either report a more accurate signal of their type, or pool with the lower resourced population under this policy.

Preprint, 2020, arXiv

Fair Prediction with Endogenous Behavior

There is increasing regulatory interest in whether machine learning algorithms deployed in consequential domains (e.g. in criminal justice) treat different demographic groups "fairly." However, there are several proposed notions of fairness, typically mutually incompatible. Using criminal justice as an example, we study a model in which society chooses an incarceration rule. Agents of different demographic groups differ in their outside options (e.g. opportunity for legal employment) and decide whether to commit crimes. We show that equalizing type I and type II errors across groups is consistent with the goal of minimizing the overall crime rate; other popular notions of fairness are not.

Preprint, 2020, arXiv

Near-Perfect Recovery in the One-Dimensional Latent Space Model

Suppose a graph $G$ is stochastically created by uniformly sampling vertices along a line segment and connecting each pair of vertices with a probability that is a known decreasing function of their distance. We ask if it is possible to reconstruct the actual positions of the vertices in $G$ by only observing the generated unlabeled graph. We study this question for two natural edge probability functions -- one where the probability of an edge decays exponentially with the distance and another where this probability decays only linearly. We initiate our study with the weaker goal of recovering only the order in which vertices appear on the line segment. For a segment of length $n$ and a precision parameter $\delta$, we show that for both exponential and linear decay edge probability functions, there is an efficient algorithm that correctly recovers (up to reflection symmetry) the order of all vertices that are at least $\delta$ apart, using only $\tilde{O}(\frac{n}{\delta^2})$ samples (vertices). Building on this result, we then show that $O(\frac{n^2 \log n}{\delta^2})$ vertices (samples) are sufficient to additionally recover the location of each vertex on the line to within a precision of $\delta$. We complement this result with an $\Omega(\frac{n^{1.5}}{\delta})$ lower bound on samples needed for reconstructing positions (even by a computationally unbounded algorithm), showing that the task of recovering positions is information-theoretically harder than recovering the order. We give experimental results showing that our algorithm recovers the positions of almost all points with high accuracy.
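The generative process in this abstract is easy to sketch. The code below only samples a graph; it does not implement the recovery algorithm. The specific decay functions, $e^{-d}$ for the exponential case and $\max(0, 1 - d)$ for the linear case, are assumptions chosen for illustration, as is the function name `sample_latent_graph`.

```python
import math
import random


def sample_latent_graph(num_vertices, length, decay="exponential", seed=0):
    """Sample hidden positions on [0, length] and a random graph on them.

    decay -- "exponential" uses edge probability exp(-d) (assumed form);
             anything else uses the linear form max(0, 1 - d) (assumed form),
             where d is the distance between the two hidden positions.
    Returns (positions, edges) with edges stored as pairs (i, j), i < j.
    """
    rng = random.Random(seed)
    positions = [rng.uniform(0.0, length) for _ in range(num_vertices)]
    edges = set()
    for i in range(num_vertices):
        for j in range(i + 1, num_vertices):
            distance = abs(positions[i] - positions[j])
            if decay == "exponential":
                probability = math.exp(-distance)
            else:
                probability = max(0.0, 1.0 - distance)
            if rng.random() < probability:
                edges.add((i, j))
    return positions, edges


if __name__ == "__main__":
    positions, edges = sample_latent_graph(50, length=10.0, decay="linear")
    print(len(edges), "edges among", len(positions), "vertices")
```

A reconstruction method would see only `edges` (with vertex labels carrying no positional information) and try to recover `positions` up to reflection.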

Preprint, 2020, arXiv

Pipeline Interventions

We introduce the pipeline intervention problem, defined by a layered directed acyclic graph and a set of stochastic matrices governing transitions between successive layers. The graph is a stylized model for how people from different populations are presented opportunities, eventually leading to some reward. In our model, individuals are born into an initial position (i.e. some node in the first layer of the graph) according to a fixed probability distribution, and then stochastically progress through the graph according to the transition matrices, until they reach a node in the final layer of the graph; each node in the final layer has a reward associated with it. The pipeline intervention problem asks how to best make costly changes to the transition matrices governing people's stochastic transitions through the graph, subject to a budget constraint. We consider two objectives: social welfare maximization, and a fairness-motivated maximin objective that seeks to maximize the value to the population (starting node) with the least expected value. We consider two variants of the maximin objective that turn out to be distinct, depending on whether we demand a deterministic solution or allow randomization. For each objective, we give an efficient approximation algorithm (an additive FPTAS) for constant-width networks. We also tightly characterize the "price of fairness" in our setting: the ratio between the highest achievable social welfare and the highest social welfare consistent with a maximin optimal solution. Finally, we show that for polynomial-width networks, even approximating the maximin objective to any constant factor is NP-hard, even for networks with constant depth. This shows that the restriction on the width in our positive results is essential.
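The two objectives in this abstract can be evaluated for a fixed set of transition matrices by backward induction. The sketch below only computes the objectives; it does not search for interventions or handle budgets. The function names and the list-of-rows matrix encoding are assumptions made for illustration.

```python
def expected_values(transitions, rewards):
    """Expected final reward for each node in the first layer.

    transitions -- list of row-stochastic matrices (lists of rows), where
                   transitions[l][i][j] is the probability of moving from
                   node i in layer l to node j in layer l + 1
    rewards     -- reward attached to each node in the final layer
    """
    values = list(rewards)
    # Work backwards: the value of a node is the expected value of its successor.
    for matrix in reversed(transitions):
        values = [sum(p * v for p, v in zip(row, values)) for row in matrix]
    return values


def social_welfare(start_distribution, values):
    """Average value over the population of starting nodes."""
    return sum(q * v for q, v in zip(start_distribution, values))


def maximin_value(values):
    """Value of the worst-off starting node."""
    return min(values)


if __name__ == "__main__":
    # Hypothetical three-layer pipeline with two nodes per layer.
    transitions = [
        [[0.5, 0.5], [0.0, 1.0]],
        [[1.0, 0.0], [0.5, 0.5]],
    ]
    rewards = [1.0, 0.0]
    values = expected_values(transitions, rewards)
    print(values)                              # [0.75, 0.5]
    print(social_welfare([0.5, 0.5], values))  # 0.625
    print(maximin_value(values))               # 0.5
```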

Preprint, 2020, arXiv

Sublinear Algorithms and Lower Bounds for Metric TSP Cost Estimation

We consider the problem of designing sublinear time algorithms for estimating the cost of a minimum metric traveling salesman (TSP) tour. Specifically, given access to an $n \times n$ distance matrix $D$ that specifies pairwise distances between $n$ points, the goal is to estimate the TSP cost by performing only sublinear (in the size of $D$) queries. For the closely related problem of estimating the weight of a metric minimum spanning tree (MST), it is known that for any $\varepsilon > 0$, there exists an $\tilde{O}(n/\varepsilon^{O(1)})$ time algorithm that returns a $(1 + \varepsilon)$-approximate estimate of the MST cost. This result immediately implies an $\tilde{O}(n/\varepsilon^{O(1)})$ time algorithm to estimate the TSP cost to within a $(2 + \varepsilon)$ factor for any $\varepsilon > 0$. However, no $o(n^2)$ time algorithms are known to approximate metric TSP to a factor that is strictly better than $2$. On the other hand, there were also no known barriers that rule out the existence of $(1 + \varepsilon)$-approximate estimation algorithms for metric TSP with $\tilde{O}(n)$ time for any fixed $\varepsilon > 0$. In this paper, we make progress on both algorithms and lower bounds for estimating metric TSP cost. We also show that the problem of estimating metric TSP cost is closely connected to the problem of estimating the size of a maximum matching in a graph.
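The factor-2 relationship mentioned in this abstract rests on a standard fact: in a metric, the MST weight $w$ satisfies $w \le \text{TSP} \le 2w$. The sketch below illustrates that bound with an exact Prim's computation. It reads the whole distance matrix, so it is not sublinear and is not the estimator the paper refers to; the function names are assumptions made for illustration.

```python
def mst_weight(dist):
    """Weight of a minimum spanning tree of a complete graph (Prim's algorithm).

    dist -- symmetric n x n matrix of pairwise distances
    """
    n = len(dist)
    if n == 0:
        return 0.0
    in_tree = [False] * n
    cheapest = [float("inf")] * n  # cheapest known edge into the tree
    cheapest[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the vertex outside the tree with the cheapest connecting edge.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: cheapest[v])
        in_tree[u] = True
        total += cheapest[u]
        for v in range(n):
            if not in_tree[v] and dist[u][v] < cheapest[v]:
                cheapest[v] = dist[u][v]
    return total


def tsp_cost_bounds(dist):
    """Lower and upper bounds on the metric TSP cost: (MST, 2 * MST)."""
    weight = mst_weight(dist)
    return weight, 2.0 * weight


if __name__ == "__main__":
    # Hypothetical example: four points on a line at coordinates 0, 1, 2, 3.
    points = [0, 1, 2, 3]
    dist = [[abs(a - b) for b in points] for a in points]
    print(tsp_cost_bounds(dist))  # (3.0, 6.0); the optimal tour costs 6 here
```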

Preprint, 2017, arXiv

Privacy-Preserving Data Analysis for the Federal Statistical Agencies

Government statistical agencies collect enormously valuable data on the nation's population and business activities. Wide access to these data enables evidence-based policy making, supports new research that improves society, facilitates training for students in data science, and provides resources for the public to better understand and participate in their society. These data also affect the private sector. For example, the Employment Situation in the United States, published by the Bureau of Labor Statistics, moves markets. Nonetheless, government agencies are under increasing pressure to limit access to data because of a growing understanding of the threats to data privacy and confidentiality. "De-identification" - stripping obvious identifiers like names, addresses, and identification numbers - has been found inadequate in the face of modern computational and informational resources. Unfortunately, the problem extends even to the release of aggregate data statistics. This counter-intuitive phenomenon has come to be known as the Fundamental Law of Information Recovery. It says that overly accurate estimates of too many statistics can completely destroy privacy. One may think of this as death by a thousand cuts. Every statistic computed from a data set leaks a small amount of information about each member of the data set - a tiny cut. This is true even if the exact value of the statistic is distorted a bit in order to preserve privacy. But while each statistical release is an almost harmless little cut in terms of privacy risk for any individual, the cumulative effect can be to completely compromise the privacy of some individuals.

Preprint, 2016, arXiv

Linear Sketching over $\mathbb F_2$

We initiate a systematic study of linear sketching over $\mathbb F_2$. For a given Boolean function $f \colon \{0,1\}^n \to \{0,1\}$ a randomized $\mathbb F_2$-sketch is a distribution $\mathcal M$ over $d \times n$ matrices with elements over $\mathbb F_2$ such that $\mathcal Mx$ suffices for computing $f(x)$ with high probability. We study a connection between $\mathbb F_2$-sketching and a two-player one-way communication game for the corresponding XOR-function. Our results show that this communication game characterizes $\mathbb F_2$-sketching under the uniform distribution (up to dependence on error). Implications of this result include: 1) a composition theorem for $\mathbb F_2$-sketching complexity of a recursive majority function, 2) a tight relationship between $\mathbb F_2$-sketching complexity and Fourier sparsity, 3) lower bounds for a certain subclass of symmetric functions. We also fully resolve a conjecture of Montanaro and Osborne regarding one-way communication complexity of linear threshold functions by designing an $\mathbb F_2$-sketch of optimal size. Furthermore, we show that (non-uniform) streaming algorithms that have to process random updates over $\mathbb F_2$ can be constructed as $\mathbb F_2$-sketches for the uniform distribution with only a minor loss. In contrast with the previous work of Li, Nguyen and Woodruff (STOC'14), who show an analogous result for linear sketches over integers in the adversarial setting, our result doesn't require the stream length to be triply exponential in $n$ and holds for streams of length $\tilde O(n)$ constructed through uniformly random updates. Finally, we state a conjecture that asks whether optimal one-way communication protocols for XOR-functions can be constructed as $\mathbb F_2$-sketches with only a small loss.
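A single draw from an $\mathbb F_2$-sketch is just a matrix-vector product modulo 2. The sketch below is a minimal illustration, assuming a fixed matrix rather than the distribution $\mathcal M$ over matrices from the abstract; the function names `f2_sketch` and `xor_vectors` are hypothetical. It also checks linearity, $M(x \oplus y) = Mx \oplus My$, which is the property that lets a sketch absorb streaming updates.

```python
def f2_sketch(matrix, x):
    """Compute M x over F_2 for a 0/1 matrix (list of rows) and a 0/1 vector."""
    return [sum(m & bit for m, bit in zip(row, x)) % 2 for row in matrix]


def xor_vectors(x, y):
    """Coordinate-wise XOR, i.e. vector addition over F_2."""
    return [a ^ b for a, b in zip(x, y)]


if __name__ == "__main__":
    # A 1 x 4 sketch that computes f(x) = x_1 XOR x_3 exactly (d = 1).
    parity_sketch = [[1, 0, 1, 0]]
    print(f2_sketch(parity_sketch, [1, 1, 0, 1]))  # [1]
    print(f2_sketch(parity_sketch, [1, 0, 1, 1]))  # [0]

    # Linearity: sketching an XOR update equals XOR-ing the sketches.
    matrix = [[1, 1, 0, 0], [0, 1, 1, 1]]
    x, y = [1, 0, 1, 1], [0, 1, 1, 0]
    left = f2_sketch(matrix, xor_vectors(x, y))
    right = xor_vectors(f2_sketch(matrix, x), f2_sketch(matrix, y))
    print(left == right)  # True
```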

Preprint, 2015, arXiv

Private Pareto Optimal Exchange

We consider the problem of implementing an individually rational, asymptotically Pareto optimal allocation in a barter-exchange economy where agents are endowed with goods and have preferences over the goods of others, but may not use money as a medium of exchange. Because one of the most important instantiations of such economies is kidney exchange -- where the "input" to the problem consists of sensitive patient medical records -- we ask to what extent such exchanges can be carried out while providing formal privacy guarantees to the participants. We show that individually rational allocations cannot achieve any non-trivial approximation to Pareto optimality if carried out under the constraint of differential privacy -- or even the relaxation of joint differential privacy, under which it is known that asymptotically optimal allocations can be computed in two-sided markets, where there is a distinction between buyers and sellers and we are concerned only with privacy of the buyers~\citep{Matching}. We therefore consider a further relaxation that we call marginal differential privacy -- which promises, informally, that the privacy of every agent $i$ is protected from every other agent $j \neq i$ so long as $j$ does not collude or share allocation information with other agents. We show that, under marginal differential privacy, it is possible to compute an individually rational and asymptotically Pareto optimal allocation in such exchange economies.

Preprint, 2014, arXiv

Approximately Stable, School Optimal, and Student-Truthful Many-to-One Matchings (via Differential Privacy)

We present a mechanism for computing asymptotically stable school optimal matchings, while guaranteeing that it is an asymptotic dominant strategy for every student to report their true preferences to the mechanism. Our main tool in this endeavor is differential privacy: we give an algorithm that coordinates a stable matching using differentially private signals, which lead to our truthfulness guarantee. This is the first setting in which it is known how to achieve nontrivial truthfulness guarantees for students when computing school optimal matchings, assuming worst-case preferences (for schools and students) in large markets.

Preprint, 2014, arXiv

Variance on the Leaves of a Tree Markov Random Field: Detecting Character Dependencies in Phylogenies

Stochastic models of evolution (Markov random fields on trivalent trees) generally assume that different characters (different runs of the stochastic process) are independent and identically distributed. In this paper we take the first steps towards addressing dependent characters. Specifically we show that, under certain technical assumptions regarding the evolution of individual characters, we can detect any significant, history independent, correlation between any pair of multistate characters. For the special case of the Cavender-Farris-Neyman (CFN) model on two states with symmetric transition matrices, our analysis needs milder assumptions. To perform the analysis, we need to prove a new concentration result for multistate random variables of a Markov random field on arbitrary trivalent trees: we show that the random variable counting the number of leaves in any particular subset of states has variance that is subquadratic in the number of leaves.

Preprint, 2012, arXiv

The Exponential Mechanism for Social Welfare: Private, Truthful, and Nearly Optimal

In this paper we show that for any mechanism design problem with the objective of maximizing social welfare, the exponential mechanism can be implemented as a truthful mechanism while still preserving differential privacy. Our instantiation of the exponential mechanism can be interpreted as a generalization of the VCG mechanism in the sense that the VCG mechanism is the extreme case when the privacy parameter goes to infinity. To our knowledge, this is the first general tool for designing mechanisms that are both truthful and differentially private.
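For orientation, the textbook exponential mechanism selects an outcome with probability proportional to $\exp(\varepsilon \cdot \text{welfare} / (2\Delta))$, where $\Delta$ is the sensitivity of the welfare score. The sketch below shows only that sampling step under those standard assumptions; the payment rule that makes the paper's instantiation truthful is not shown, and the function names are hypothetical. As $\varepsilon$ grows, the distribution concentrates on the welfare-maximizing outcome, which matches the limiting case the abstract describes.

```python
import math
import random


def exponential_mechanism_probs(welfare, epsilon, sensitivity=1.0):
    """Selection probabilities proportional to exp(eps * welfare / (2 * sensitivity)).

    welfare     -- social welfare of each candidate outcome
    epsilon     -- privacy parameter (larger means less privacy, more accuracy)
    sensitivity -- assumed bound on how much one agent can change any welfare
    """
    scale = epsilon / (2.0 * sensitivity)
    top = max(welfare)
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = [math.exp(scale * (w - top)) for w in welfare]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


def sample_outcome(welfare, epsilon, sensitivity=1.0, seed=None):
    """Draw one outcome index from the exponential mechanism distribution."""
    rng = random.Random(seed)
    probabilities = exponential_mechanism_probs(welfare, epsilon, sensitivity)
    return rng.choices(range(len(welfare)), weights=probabilities)[0]


if __name__ == "__main__":
    welfare = [3.0, 5.0, 4.0]  # hypothetical welfare of three outcomes
    for epsilon in (0.0, 1.0, 50.0):
        print(epsilon, exponential_mechanism_probs(welfare, epsilon))
    print("sampled outcome:", sample_outcome(welfare, epsilon=1.0, seed=0))
```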
