Researcher profile

Moses Charikar

Moses Charikar contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
10works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

10 published item(s)

preprint2024arXiv

A Quasi-Monte Carlo Data Structure for Smooth Kernel Evaluations

In the kernel density estimation (KDE) problem one is given a kernel $K(x, y)$ and a dataset $P$ of points in a Euclidean space, and must prepare a data structure that can quickly answer density queries: given a point $q$, output a $(1+ε)$-approximation to $μ:=\frac1{|P|}\sum_{p\in P} K(p, q)$. The classical approach to KDE is the celebrated fast multipole method of [Greengard and Rokhlin]. The fast multipole method combines a basic space partitioning approach with a multidimensional Taylor expansion, which yields a $\approx \log^d (n/ε)$ query time (exponential in the dimension $d$). A recent line of work initiated by [Charikar and Siminelakis] achieved polynomial dependence on $d$ via a combination of random sampling and randomized space partitioning, with [Backurs et al.] giving an efficient data structure with query time $\approx \mathrm{poly}{\log(1/μ)}/ε^2$ for smooth kernels. Quadratic dependence on $ε$, inherent to the sampling methods, is prohibitively expensive for small $ε$. This issue is addressed by quasi-Monte Carlo methods in numerical analysis. The high level idea in quasi-Monte Carlo methods is to replace random sampling with a discrepancy based approach -- an idea recently applied to coresets for KDE by [Phillips and Tai]. The work of Phillips and Tai gives a space efficient data structure with query complexity $\approx 1/(εμ)$. This is polynomially better in $1/ε$, but exponentially worse in $1/μ$. We achieve the best of both: a data structure with $\approx \mathrm{poly}{\log(1/μ)}/ε$ query time for smooth kernel KDE. Our main insight is a new way to combine discrepancy theory with randomized space partitioning inspired by, but significantly more efficient than, that of the fast multipole methods. We hope that our techniques will find further applications to linear algebra for kernel matrices.

preprint2022arXiv

Almost 3-Approximate Correlation Clustering in Constant Rounds

We study parallel algorithms for correlation clustering. Each pair among $n$ objects is labeled as either "similar" or "dissimilar". The goal is to partition the objects into arbitrarily many clusters while minimizing the number of disagreements with the labels. Our main result is an algorithm that for any $ε> 0$ obtains a $(3+ε)$-approximation in $O(1/ε)$ rounds (of models such as massively parallel computation, local, and semi-streaming). This is a culminating point for the rich literature on parallel correlation clustering. On the one hand, the approximation (almost) matches a natural barrier of 3 for combinatorial algorithms. On the other hand, the algorithm's round-complexity is essentially constant. To achieve this result, we introduce a simple $O(1/ε)$-round parallel algorithm. Our main result is to provide an analysis of this algorithm, showing that it achieves a $(3+ε)$-approximation. Our analysis draws on new connections to sublinear-time algorithms. Specifically, it builds on the work of Yoshida, Yamamoto, and Ito [STOC'09] on bounding the "query complexity" of greedy maximal independent set. To our knowledge, this is the first application of this method in analyzing the approximation ratio of any algorithm.

preprint2022arXiv

Polylogarithmic Sketches for Clustering

Given $n$ points in $\ell_p^d$, we consider the problem of partitioning points into $k$ clusters with associated centers. The cost of a clustering is the sum of $p^{\text{th}}$ powers of distances of points to their cluster centers. For $p \in [1,2]$, we design sketches of size poly$(\log(nd),k,1/ε)$ such that the cost of the optimal clustering can be estimated to within factor $1+ε$, despite the fact that the compressed representation does not contain enough information to recover the cluster centers or the partition into clusters. This leads to a streaming algorithm for estimating the clustering cost with space poly$(\log(nd),k,1/ε)$. We also obtain a distributed memory algorithm, where the $n$ points are arbitrarily partitioned amongst $m$ machines, each of which sends information to a central party who then computes an approximation of the clustering cost. Prior to this work, no such streaming or distributed-memory algorithm was known with sublinear dependence on $d$ for $p \in [1,2)$.

preprint2021arXiv

Approximation Algorithms for Orthogonal Non-negative Matrix Factorization

In the non-negative matrix factorization (NMF) problem, the input is an $m\times n$ matrix $M$ with non-negative entries and the goal is to factorize it as $M\approx AW$. The $m\times k$ matrix $A$ and the $k\times n$ matrix $W$ are both constrained to have non-negative entries. This is in contrast to singular value decomposition, where the matrices $A$ and $W$ can have negative entries but must satisfy the orthogonality constraint: the columns of $A$ are orthogonal and the rows of $W$ are also orthogonal. The orthogonal non-negative matrix factorization (ONMF) problem imposes both the non-negativity and the orthogonality constraints, and previous work showed that it leads to better performances than NMF on many clustering tasks. We give the first constant-factor approximation algorithm for ONMF when one or both of $A$ and $W$ are subject to the orthogonality constraint. We also show an interesting connection to the correlation clustering problem on bipartite graphs. Our experiments on synthetic and real-world data show that our algorithm achieves similar or smaller errors compared to previous ONMF algorithms while ensuring perfect orthogonality (many previous algorithms do not satisfy the hard orthogonality constraint).

preprint2020arXiv

A General Framework for Symmetric Property Estimation

In this paper we provide a general framework for estimating symmetric properties of distributions from i.i.d. samples. For a broad class of symmetric properties we identify the easy region where empirical estimation works and the difficult region where more complex estimators are required. We show that by approximately computing the profile maximum likelihood (PML) distribution \cite{ADOS16} in this difficult region we obtain a symmetric property estimation framework that is sample complexity optimal for many properties in a broader parameter regime than previous universal estimation approaches based on PML. The resulting algorithms based on these pseudo PML distributions are also more practical.

preprint2020arXiv

A Simple Sublinear Algorithm for Gap Edit Distance

We study the problem of estimating the edit distance between two $n$-character strings. While exact computation in the worst case is believed to require near-quadratic time, previous work showed that in certain regimes it is possible to solve the following {\em gap edit distance} problem in sub-linear time: distinguish between inputs of distance $\le k$ and $>k^2$. Our main result is a very simple algorithm for this benchmark that runs in time $\tilde O(n/\sqrt{k})$, and in particular settles the open problem of obtaining a truly sublinear time for the entire range of relevant $k$. Building on the same framework, we also obtain a $k$-vs-$k^2$ algorithm for the one-sided preprocessing model with $\tilde O(n)$ preprocessing time and $\tilde O(n/k)$ query time (improving over a recent $\tilde O(n/k+k^2)$-query time algorithm for the same problem [GRS'20].

preprint2020arXiv

Nearest Neighbor Search for Hyperbolic Embeddings

Embedding into hyperbolic space is emerging as an effective representation technique for datasets that exhibit hierarchical structure. This development motivates the need for algorithms that are able to effectively extract knowledge and insights from datapoints embedded in negatively curved spaces. We focus on the problem of nearest neighbor search, a fundamental problem in data analysis. We present efficient algorithmic solutions that build upon established methods for nearest neighbor search in Euclidean space, allowing for easy adoption and integration with existing systems. We prove theoretical guarantees for our techniques and our experiments demonstrate the effectiveness of our approach on real datasets over competing algorithms.

preprint2020arXiv

New lower bounds for Massively Parallel Computation from query complexity

Roughgarden, Vassilvitskii, and Wang (JACM 18) recently introduced a novel framework for proving lower bounds for Massively Parallel Computation using techniques from boolean function complexity. We extend their framework in two different ways, to capture two common features of Massively Parallel Computation: $\circ$ Adaptivity, where machines can write to and adaptively read from shared memory throughout the execution of the computation. Recent work of Behnezhad et al. (SPAA 19) showed that adaptivity enables significantly improved round complexities for a number of central graph problems. $\circ$ Promise problems, where the algorithm only has to succeed on certain inputs. These inputs may have special structure that is of particular interest, or they may be representative of hard instances of the overall problem. Using this extended framework, we give the first unconditional lower bounds on the complexity of distinguishing whether an input graph is a cycle of length $n$ or two cycles of length $n/2$. This promise problem, 1v2-Cycle, has emerged as a central problem in the study of Massively Parallel Computation. We prove that any adaptive algorithm for the 1v2-Cycle problem with I/O capacity $O(n^{\varepsilon})$ per machine requires $Ω(1/\varepsilon)$ rounds, matching a recent upper bound of Behnezhad et al. In addition to strengthening the connections between Massively Parallel Computation and boolean function complexity, we also develop new machinery to reason about the latter. At the heart of our proofs are optimal lower bounds on the query complexity and approximate certificate complexity of the 1v2-Cycle problem.

preprint2020arXiv

Storyboard: Optimizing Precomputed Summaries for Aggregation

An emerging class of data systems partition their data and precompute approximate summaries (i.e., sketches and samples) for each segment to reduce query costs. They can then aggregate and combine the segment summaries to estimate results without scanning the raw data. However, given limited storage space each summary introduces approximation errors that affect query accuracy. For instance, systems that use existing mergeable summaries cannot reduce query error below the error of an individual precomputed summary. We introduce Storyboard, a query system that optimizes item frequency and quantile summaries for accuracy when aggregating over multiple segments. Compared to conventional mergeable summaries, Storyboard leverages additional memory available for summary construction and aggregation to derive a more precise combined result. This reduces error by up to 25x over interval aggregations and 4.4x over data cube aggregations on industrial datasets compared to standard summarization methods, with provable worst-case error guarantees.

preprint2020arXiv

The Bethe and Sinkhorn Permanents of Low Rank Matrices and Implications for Profile Maximum Likelihood

In this paper we consider the problem of computing the likelihood of the profile of a discrete distribution, i.e., the probability of observing the multiset of element frequencies, and computing a profile maximum likelihood (PML) distribution, i.e., a distribution with the maximum profile likelihood. For each problem we provide polynomial time algorithms that given $n$ i.i.d.\ samples from a discrete distribution, achieve an approximation factor of $\exp\left(-O(\sqrt{n} \log n) \right)$, improving upon the previous best-known bound achievable in polynomial time of $\exp(-O(n^{2/3} \log n))$ (Charikar, Shiragur and Sidford, 2019). Through the work of Acharya, Das, Orlitsky and Suresh (2016), this implies a polynomial time universal estimator for symmetric properties of discrete distributions in a broader range of error parameter. We achieve these results by providing new bounds on the quality of approximation of the Bethe and Sinkhorn permanents (Vontobel, 2012 and 2014). We show that each of these are $\exp(O(k \log(N/k)))$ approximations to the permanent of $N \times N$ matrices with non-negative rank at most $k$, improving upon the previous known bounds of $\exp(O(N))$. To obtain our results on PML, we exploit the fact that the PML objective is proportional to the permanent of a certain Vandermonde matrix with $\sqrt{n}$ distinct columns, i.e. with non-negative rank at most $\sqrt{n}$. As a by-product of our work we establish a surprising connection between the convex relaxation in prior work (CSS19) and the well-studied Bethe and Sinkhorn approximations.