Source author record

Ravindran Kannan

Ravindran Kannan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Data Structures and Algorithms; Machine Learning; Distributed, Parallel, and Cluster Computing; math.CO; math.PR; Numerical Analysis

Catalog footprint

What is connected

7 works

6 topics

4 close collaborators

preprint, 2020, arXiv

Finding a latent k-simplex in O(k . nnz(data)) time via Subset Smoothing

In this paper we show that a large class of latent variable models, such as Mixed Membership Stochastic Block (MMSB) Models, Topic Models, and Adversarial Clustering, can be unified through a geometric perspective, replacing model-specific assumptions and algorithms for individual models. The geometric perspective leads to the formulation: find a latent $k$-polytope $K$ in $\mathbf{R}^d$ given $n$ data points, each obtained by perturbing a latent point in $K$. This problem does not seem to have been considered in the literature. The most important contribution of this paper is to show that the latent $k$-polytope problem admits an efficient algorithm under deterministic assumptions which naturally hold in the latent variable models considered in this paper. Our algorithm runs in time $O^*(k \cdot \mathrm{nnz})$, matching the best running time of algorithms in the special cases considered here, and is better when the data is sparse, as is the case in applications. An important novelty of the algorithm is the introduction of the subset smoothed polytope, $K'$, the convex hull of the $\binom{n}{\delta n}$ points obtained by averaging all subsets of $\delta n$ data points, for a given $\delta \in (0,1)$. We show that $K$ and $K'$ are close in Hausdorff distance. Among the consequences of our algorithm are the following: (a) MMSB Models and Topic Models: the first quasi-input-sparsity time algorithm for parameter estimation for $k \in O^*(1)$; (b) Adversarial Clustering: in $k$-means, if an adversary is allowed to move many data points from each cluster an arbitrary amount towards the convex hull of the centers of other clusters, our algorithm still estimates cluster centers well.
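
The definition of the subset smoothed polytope $K'$ can be made concrete with a small sketch. It illustrates only one elementary consequence of the definition, not the paper's algorithm: a linear function $\langle u, x \rangle$ is maximized over $K'$ at the average of the $\delta n$ data points with the largest inner product with $u$, so no enumeration of subsets is needed. The function names and the toy data are hypothetical, and `m` stands in for $\delta n$.

```python
from itertools import combinations


def dot(u, x):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, x))


def mean(points):
    """Coordinate-wise average of a non-empty collection of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def smoothed_extreme_point(points, u, m):
    """Point of K' maximizing <u, x>: the average of the m data points
    with the largest inner product with u (m plays the role of delta * n)."""
    top = sorted(points, key=lambda x: dot(u, x), reverse=True)[:m]
    return mean(top)


def brute_force_max(points, u, m):
    """Same maximum, found by enumerating every m-subset (tiny inputs only)."""
    return max(dot(u, mean(subset)) for subset in combinations(points, m))


# Hypothetical toy data: five points in the plane, subsets of size 2.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
direction = (1.0, 2.0)
vertex = smoothed_extreme_point(data, direction, 2)
```

The sort-based routine and the brute-force enumeration agree on the maximum value, which is why the exponentially many averaged points never have to be formed explicitly for this kind of query.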

preprint, 2014, arXiv

A provable SVD-based algorithm for learning topics in dominant admixture corpus

Topic models, such as Latent Dirichlet Allocation (LDA), posit that documents are drawn from admixtures of distributions over words, known as topics. The inference problem of recovering topics from admixtures is NP-hard. Assuming separability, a strong assumption, [4] gave the first provable algorithm for inference. For the LDA model, [6] gave a provable algorithm using tensor methods. But [4,6] do not learn topic vectors with bounded $l_1$ error (a natural measure for probability vectors). Our aim is to develop a model which makes intuitive and empirically supported assumptions and to design an algorithm with natural, simple components such as SVD, which provably solves the inference problem for the model with bounded $l_1$ error. A topic in LDA and other models is essentially characterized by a group of co-occurring words. Motivated by this, we introduce topic-specific Catchwords, groups of words which occur with strictly greater frequency in a topic than in any other topic individually and are required to have high frequency together rather than individually. A major contribution of the paper is to show that under this more realistic assumption, which is empirically verified on real corpora, a singular value decomposition (SVD) based algorithm with a crucial pre-processing step of thresholding can provably recover the topics from a collection of documents drawn from Dominant admixtures. Dominant admixtures are convex combinations of distributions in which one distribution has a significantly higher contribution than the others. Apart from the simplicity of the algorithm, the sample complexity has near-optimal dependence on $w_0$, the lowest probability that a topic is dominant, and is better than [4]. Empirical evidence shows that on several real-world corpora, both the Catchwords and Dominant admixture assumptions hold and the proposed algorithm substantially outperforms the state of the art [5].

preprint, 2014, arXiv

Principal Component Analysis and Higher Correlations for Distributed Data

We consider algorithmic problems in the setting in which the input data has been partitioned arbitrarily on many servers. The goal is to compute a function of all the data, and the bottleneck is the communication used by the algorithm. We present algorithms for two illustrative problems on massive data sets: (1) computing a low-rank approximation of a matrix $A=A^1 + A^2 + \ldots + A^s$, with matrix $A^t$ stored on server $t$ and (2) computing a function of a vector $a_1 + a_2 + \ldots + a_s$, where server $t$ has the vector $a_t$; this includes the well-studied special case of computing frequency moments and separable functions, as well as higher-order correlations such as the number of subgraphs of a specified type occurring in a graph. For both problems we give algorithms with nearly optimal communication, and in particular the only dependence on $n$, the size of the data, is in the number of bits needed to represent indices and words ($O(\log n)$).

preprint, 2014, arXiv

Spectral Approaches to Nearest Neighbor Search

We study spectral algorithms for the high-dimensional Nearest Neighbor Search problem (NNS). In particular, we consider a semi-random setting where a dataset $P$ in $\mathbb{R}^d$ is chosen arbitrarily from an unknown subspace of low dimension $k\ll d$, and then perturbed by fully $d$-dimensional Gaussian noise. We design spectral NNS algorithms whose query time depends polynomially on $d$ and $\log n$ (where $n=|P|$) for large ranges of $k$, $d$ and $n$. Our algorithms use a repeated computation of the top PCA vector/subspace, and are effective even when the random-noise magnitude is much larger than the interpoint distances in $P$. Our motivation is that in practice, a number of spectral NNS algorithms outperform the random-projection methods that seem otherwise theoretically optimal on worst-case datasets. In this paper we aim to provide theoretical justification for this disparity.

preprint, 2010, arXiv

Clustering with Spectral Norm and the k-means Algorithm

There has been much progress on efficient algorithms for clustering data points generated by a mixture of $k$ probability distributions under the assumption that the means of the distributions are well-separated, i.e., the distance between the means of any two distributions is at least $Ω(k)$ standard deviations. These results generally make heavy use of the generative model and particular properties of the distributions. In this paper, we show that a simple clustering algorithm works without assuming any generative (probabilistic) model. Our only assumption is what we call a "proximity condition": the projection of any data point onto the line joining its cluster center to any other cluster center is $Ω(k)$ standard deviations closer to its own center than the other center. Here the notion of standard deviations is based on the spectral norm of the matrix whose rows represent the difference between a point and the mean of the cluster to which it belongs. We show that in the generative models studied, our proximity condition is satisfied and so we are able to derive most known results for generative models as corollaries of our main result. We also prove some new results for generative models - e.g., we can cluster all but a small fraction of points only assuming a bound on the variance. Our algorithm relies on the well known $k$-means algorithm, and along the way, we prove a result of independent interest -- that the $k$-means algorithm converges to the "true centers" even in the presence of spurious points provided the initial (estimated) centers are close enough to the corresponding actual centers and all but a small fraction of the points satisfy the proximity condition. Finally, we present a new technique for boosting the ratio of inter-center separation to standard deviation.
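
The geometric quantity behind the proximity condition can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function names are hypothetical, and the `threshold` argument is left to the caller, whereas the abstract sets it to $Ω(k)$ standard deviations with the standard deviation defined through a spectral norm that is not computed here.

```python
import math


def proximity_margin(x, own_center, other_center):
    """Distance from the projection of x to other_center minus its distance
    to own_center, measured along the line joining the two centers."""
    direction = [b - a for a, b in zip(own_center, other_center)]
    gap = math.sqrt(sum(d * d for d in direction))
    if gap == 0:
        raise ValueError("centers must be distinct")
    # Signed position of the projection along the line, with own_center at 0
    # and other_center at `gap`.
    t = sum((xi - a) * d for xi, a, d in zip(x, own_center, direction)) / gap
    return abs(gap - t) - abs(t)


def satisfies_proximity(x, own_center, other_centers, threshold):
    """True if the margin to every other center is at least `threshold`
    (a caller-supplied stand-in for the bound stated in the abstract)."""
    return all(proximity_margin(x, own_center, c) >= threshold
               for c in other_centers)
```

For instance, with centers at (0, 0) and (4, 0), the point (1, 3) projects to position 1 on the joining line, so it is 2 units closer to its own center than to the other one; the coordinate orthogonal to the line plays no role.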

preprint, 2010, arXiv

Spectral Methods for Matrices and Tensors

While Spectral Methods have long been used for Principal Component Analysis, this survey focusses on work over the last 15 years with three salient features: (i) Spectral methods are useful not only for numerical problems, but also for discrete optimization problems (Constraint Optimization Problems - CSPs) like the max-cut problem, and similar mathematical considerations underlie both areas. (ii) Spectral methods can be extended to tensors. The theory and algorithms for tensors are not as simple/clean as for matrices, but the survey describes methods for low-rank approximation which extend to tensors. These tensor approximations help us solve Max-$r$-CSPs for $r>2$ as well as numerical tensor problems. (iii) Sampling on the fly plays a prominent role in these methods. A primary result is that for any matrix, a random submatrix of rows/columns picked with probabilities proportional to the squared lengths (of rows/columns) yields estimates of the singular values as well as an approximation to the whole matrix.
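
The row-sampling step described in point (iii) can be sketched in a few lines. This is a minimal illustration, assuming sampling with replacement and the usual rescaling of each picked row by $1/\sqrt{s\,p_i}$; the rescaling, the function names and the toy matrix are assumptions for the example and are not taken from the abstract.

```python
import math
import random


def squared_length_probabilities(rows):
    """Probability of each row, proportional to its squared Euclidean length."""
    sq = [sum(v * v for v in row) for row in rows]
    total = sum(sq)
    if total == 0:
        raise ValueError("matrix has no nonzero row")
    return [s / total for s in sq]


def sample_rows(rows, s, rng):
    """Draw s rows with replacement and rescale each by 1 / sqrt(s * p_i),
    so that S^T S equals A^T A in expectation."""
    probs = squared_length_probabilities(rows)
    picks = rng.choices(range(len(rows)), weights=probs, k=s)
    return [[v / math.sqrt(s * probs[i]) for v in rows[i]] for i in picks]


# Hypothetical toy matrix with squared row lengths 25, 0 and 1.
A = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
S = sample_rows(A, 4, random.Random(0))
```

With this rescaling every sampled row has squared length equal to the squared Frobenius norm of the matrix divided by `s`, so the sample preserves the squared Frobenius norm regardless of which rows are drawn, and zero rows are never picked.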

preprint, 2010, arXiv

Two new Probability inequalities and Concentration Results

Concentration results and probabilistic analysis for combinatorial problems like the TSP, MWST, and graph coloring have received much attention, but generally only for i.i.d. samples (i.i.d. points in the unit square for the TSP, for example). Here, we prove two probability inequalities which generalize and strengthen martingale inequalities. The inequalities provide the tools to deal with more general heavy-tailed and inhomogeneous distributions for combinatorial problems. We prove a wide range of applications: in addition to the TSP, MWST, and graph coloring, we also prove more general results than previously known for concentration in bin-packing, sub-graph counts, and the Johnson-Lindenstrauss random projection theorem. It is hoped that the strength of the inequalities will serve many more purposes.
