Source author record

Kaushik Sinha

Kaushik Sinha appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Data Structures and Algorithms Cryptography and Security math.CO physics.soc-ph Social and Information Networks

Catalog footprint

What is connected

6works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Neural Neighborhood Encoding for Classification

Inspired by the fruit-fly olfactory circuit, the Fly Bloom Filter [Dasgupta et al., 2018] is able to efficiently summarize the data with a single pass and has been used for novelty detection. We propose a new classifier (for binary and multi-class classification) that effectively encodes the different local neighborhoods for each class with a per-class Fly Bloom Filter. The inference on test data requires an efficient {\tt FlyHash} [Dasgupta, et al., 2017] operation followed by a high-dimensional, but {\em sparse}, dot product with the per-class Bloom Filters. The learning is trivially parallelizable. On the theoretical side, we establish conditions under which the prediction of our proposed classifier on any test example agrees with the prediction of the nearest neighbor classifier with high probability. We extensively evaluate our proposed scheme with over $50$ data sets of varied data dimensionality to demonstrate that the predictive performance of our proposed neuroscience inspired classifier is competitive the the nearest-neighbor classifiers and other single-pass classifiers.

preprint2016arXiv

Matrix Energy as a Measure of Topological Complexity of a Graph

The complexity of highly interconnected systems is rooted in the interwoven architecture defined by its connectivity structure. In this paper, we develop matrix energy of the underlying connectivity structure as a measure of topological complexity and highlight interpretations about certain global features of underlying system connectivity patterns. The proposed complexity metric is shown to satisfy the Weyuker criteria as a measure of its validity as a formal complexity metric. We also introduce the notion of P point in the graph density space. The P point acts as a boundary between multiple connectivity regimes for finite-size graphs.

preprint2013arXiv

Near-Optimal Algorithms for Differentially-Private Principal Components

Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data in high dimension. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method.

preprint2013arXiv

Randomized partition trees for exact nearest neighbor search

The k-d tree was one of the first spatial data structures proposed for nearest neighbor search. Its efficacy is diminished in high-dimensional spaces, but several variants, with randomization and overlapping cells, have proved to be successful in practice. We analyze three such schemes. We show that the probability that they fail to find the nearest neighbor, for any data set and any query point, is directly related to a simple potential function that captures the difficulty of the point configuration. We then bound this potential function in two situations of interest: the first, when data come from a doubling measure, and the second, when the data are documents from a topic model.

preprint2010arXiv

Learning Gaussian Mixtures with Arbitrary Separation

In this paper we present a method for learning the parameters of a mixture of $k$ identical spherical Gaussians in $n$-dimensional space with an arbitrarily small separation between the components. Our algorithm is polynomial in all parameters other than $k$. The algorithm is based on an appropriate grid search over the space of parameters. The theoretical analysis of the algorithm hinges on a reduction of the problem to 1 dimension and showing that two 1-dimensional mixtures whose densities are close in the $L^2$ norm must have similar means and mixing coefficients. To produce such a lower bound for the $L^2$ norm in terms of the distances between the corresponding means, we analyze the behavior of the Fourier transform of a mixture of Gaussians in 1 dimension around the origin, which turns out to be closely related to the properties of the Vandermonde matrix obtained from the component means. Analysis of this matrix together with basic function approximation results allows us to provide a lower bound for the norm of the mixture in the Fourier domain. In recent years much research has been aimed at understanding the computational aspects of learning parameters of Gaussians mixture distributions in high dimension. To the best of our knowledge all existing work on learning parameters of Gaussian mixtures assumes minimum separation between components of the mixture which is an increasing function of either the dimension of the space $n$ or the number of components $k$. In our paper we prove the first result showing that parameters of a $n$-dimensional Gaussian mixture model with arbitrarily small component separation can be learned in time polynomial in $n$.

preprint2010arXiv

Polynomial Learning of Distribution Families

The question of polynomial learnability of probability distributions, particularly Gaussian mixture distributions, has recently received significant attention in theoretical computer science and machine learning. However, despite major progress, the general question of polynomial learnability of Gaussian mixture distributions still remained open. The current work resolves the question of polynomial learnability for Gaussian mixtures in high dimension with an arbitrary fixed number of components. The result on learning Gaussian mixtures relies on an analysis of distributions belonging to what we call "polynomial families" in low dimension. These families are characterized by their moments being polynomial in parameters and include almost all common probability distributions as well as their mixtures and products. Using tools from real algebraic geometry, we show that parameters of any distribution belonging to such a family can be learned in polynomial time and using a polynomial number of sample points. The result on learning polynomial families is quite general and is of independent interest. To estimate parameters of a Gaussian mixture distribution in high dimensions, we provide a deterministic algorithm for dimensionality reduction. This allows us to reduce learning a high-dimensional mixture to a polynomial number of parameter estimations in low dimension. Combining this reduction with the results on polynomial families yields our result on learning arbitrary Gaussian mixtures in high dimensions.

Kaushik Sinha

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Neural Neighborhood Encoding for Classification

Matrix Energy as a Measure of Topological Complexity of a Graph

Near-Optimal Algorithms for Differentially-Private Principal Components

Randomized partition trees for exact nearest neighbor search

Learning Gaussian Mixtures with Arbitrary Separation

Polynomial Learning of Distribution Families