Source author record

Ardhendu Tripathy

Ardhendu Tripathy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Information Theory Machine Learning math.IT math.ST Statistics Theory

Catalog footprint

What is connected

7works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Chernoff Sampling for Active Testing and Extension to Active Regression

Active learning can reduce the number of samples needed to perform a hypothesis test and to estimate the parameters of a model. In this paper, we revisit the work of Chernoff that described an asymptotically optimal algorithm for performing a hypothesis test. We obtain a novel sample complexity bound for Chernoff's algorithm, with a non-asymptotic term that characterizes its performance at a fixed confidence level. We also develop an extension of Chernoff sampling that can be used to estimate the parameters of a wide variety of models and we obtain a non-asymptotic bound on the estimation error. We apply our extension of Chernoff sampling to actively learn neural network models and to estimate parameters in real-data linear and non-linear regression problems, where our approach performs favorably to state-of-the-art methods.

preprint2021arXiv

Nearest Neighbor Search Under Uncertainty

Nearest Neighbor Search (NNS) is a central task in knowledge representation, learning, and reasoning. There is vast literature on efficient algorithms for constructing data structures and performing exact and approximate NNS. This paper studies NNS under Uncertainty (NNSU). Specifically, consider the setting in which an NNS algorithm has access only to a stochastic distance oracle that provides a noisy, unbiased estimate of the distance between any pair of points, rather than the exact distance. This models many situations of practical importance, including NNS based on human similarity judgements, physical measurements, or fast, randomized approximations to exact distances. A naive approach to NNSU could employ any standard NNS algorithm and repeatedly query and average results from the stochastic oracle (to reduce noise) whenever it needs a pairwise distance. The problem is that a sufficient number of repeated queries is unknown in advance; e.g., a point maybe distant from all but one other point (crude distance estimates suffice) or it may be close to a large number of other points (accurate estimates are necessary). This paper shows how ideas from cover trees and multi-armed bandits can be leveraged to develop an NNSU algorithm that has optimal dependence on the dataset size and the (unknown)geometry of the dataset.

preprint2021arXiv

Optimal Confidence Regions for the Multinomial Parameter

Construction of tight confidence regions and intervals is central to statistical inference and decision making. This paper develops new theory showing minimum average volume confidence regions for categorical data. More precisely, consider an empirical distribution $\widehat{\boldsymbol{p}}$ generated from $n$ iid realizations of a random variable that takes one of $k$ possible values according to an unknown distribution $\boldsymbol{p}$. This is analogous to a single draw from a multinomial distribution. A confidence region is a subset of the probability simplex that depends on $\widehat{\boldsymbol{p}}$ and contains the unknown $\boldsymbol{p}$ with a specified confidence. This paper shows how one can construct minimum average volume confidence regions, answering a long standing question. We also show the optimality of the regions directly translates to optimal confidence intervals of linear functionals such as the mean, implying sample complexity and regret improvements for adaptive machine learning algorithms.

preprint2020arXiv

Finding All ε-Good Arms in Stochastic Bandits

The pure-exploration problem in stochastic multi-armed bandits aims to find one or more arms with the largest (or near largest) means. Examples include finding an ε-good arm, best-arm identification, top-k arm identification, and finding all arms with means above a specified threshold. However, the problem of finding all ε-good arms has been overlooked in past work, although arguably this may be the most natural objective in many applications. For example, a virologist may conduct preliminary laboratory experiments on a large candidate set of treatments and move all ε-good treatments into more expensive clinical trials. Since the ultimate clinical efficacy is uncertain, it is important to identify all ε-good candidates. Mathematically, the all-ε-good arm identification problem presents significant new challenges and surprises that do not arise in the pure-exploration objectives studied in the past. We introduce two algorithms to overcome these and demonstrate their great empirical performance on a large-scale crowd-sourced dataset of 2.2M ratings collected by the New Yorker Caption Contest as well as a dataset testing hundreds of possible cancer drugs.

preprint2016arXiv

On Computation Rates for Arithmetic Sum

For zero-error function computation over directed acyclic networks, existing upper and lower bounds on the computation capacity are known to be loose. In this work we consider the problem of computing the arithmetic sum over a specific directed acyclic network that is not a tree. We assume the sources to be i.i.d. Bernoulli with parameter $1/2$. Even in this simple setting, we demonstrate that upper bounding the computation rate is quite nontrivial. In particular, it requires us to consider variable length network codes and relate the upper bound to equivalently lower bounding the entropy of descriptions observed by the terminal conditioned on the function value. This lower bound is obtained by further lower bounding the entropy of a so-called \textit{clumpy distribution}. We also demonstrate an achievable scheme that uses variable length network codes and in-network compression.

preprint2016arXiv

Sum-networks from undirected graphs: construction and capacity analysis

We consider a directed acyclic network with multiple sources and multiple terminals where each terminal is interested in decoding the sum of independent sources generated at the source nodes. We describe a procedure whereby a simple undirected graph can be used to construct such a sum-network and demonstrate an upper bound on its computation rate. Furthermore, we show sufficient conditions for the construction of a linear network code that achieves this upper bound. Our procedure allows us to construct sum-networks that have any arbitrary computation rate $\frac{p}{q}$ (where $p,q$ are non-negative integers). Our work significantly generalizes a previous approach for constructing sum-networks with arbitrary capacities. Specifically, we answer an open question in prior work by demonstrating sum-networks with significantly fewer number of sources and terminals.

preprint2015arXiv

Capacity of Sum-networks for Different Message Alphabets

A sum-network is a directed acyclic network in which all terminal nodes demand the `sum' of the independent information observed at the source nodes. Many characteristics of the well-studied multiple-unicast network communication problem also hold for sum-networks due to a known reduction between instances of these two problems. Our main result is that unlike a multiple unicast network, the coding capacity of a sum-network is dependent on the message alphabet. We demonstrate this using a construction procedure and show that the choice of a message alphabet can reduce the coding capacity of a sum-network from $1$ to close to $0$.

Ardhendu Tripathy

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Chernoff Sampling for Active Testing and Extension to Active Regression

Nearest Neighbor Search Under Uncertainty

Optimal Confidence Regions for the Multinomial Parameter

Finding All ε-Good Arms in Stochastic Bandits

On Computation Rates for Arithmetic Sum

Sum-networks from undirected graphs: construction and capacity analysis

Capacity of Sum-networks for Different Message Alphabets