Source author record

Michael Kapralov

Michael Kapralov appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Machine Learning Discrete Mathematics Distributed, Parallel, and Cluster Computing Networking and Internet Architecture

Catalog footprint

What is connected

17works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Provable Quantization with Randomized Hadamard Transform

Vector quantization via random projection followed by scalar quantization is a fundamental primitive in machine learning, with applications ranging from similarity search to federated learning and KV cache compression. While dense random rotations yield clean theoretical guarantees, they require $Θ(d^2)$ time. The randomized Hadamard transform $HD$ reduces this cost to $O(d \log d)$, but its discrete structure complicates analysis and leads to weaker or purely empirical compression guarantees. In this work, we study a variant of this approach: dithered quantization with a single randomized Hadamard transform. Specifically, the quantizer applies $HD$ to the input vector and subtracts a random scalar offset before quantizing, injecting additional randomness at negligible cost. We prove that this approach is unbiased and provides mean squared error bounds that asymptotically match those achievable with truly random rotation matrices. In particular, we prove that a dithered version of TurboQuant achieves mean squared error $\bigl(π\sqrt{3}/2 + o(1)\bigr) \cdot 4^{-b}$ at $b$ bits per coordinate, where the $o(1)$ term vanishes uniformly over all unit vectors and all dimensions as the number of quantization levels grows.

preprint2026arXiv

Spectral Clustering in Birthday Paradox Time

Given a vertex in a $(k, φ, ε)$-clusterable graph, i.e. a graph whose vertex set can be partitioned into a disjoint union of $φ$-expanders of size $\approx n/k$ with outer conductance bounded by $ε$, can one quickly tell which cluster it belongs to? This question goes back to the expansion testing problem of Goldreich and Ron'11. For $k=2$ a sample of $\approx n^{1/2+O(ε/φ^2)}$ logarithmic length walks from a given vertex approximately determines its cluster membership by the birthday paradox: two vertices whose random walk samples are `close' are likely in the same cluster. The study of the general case $k>2$ was initiated by Czumaj, Peng and Sohler [STOC'15], and the works of Chiplunkar et al. [FOCS'18], Gluch et al. [SODA'21] showed that $\approx \text{poly}(k)\cdot n^{1/2+O(ε/φ^2)}$ random walk samples suffice for general $k$. This matches the $k=2$ result up to polynomial factors in $k$, but creates a conceptual inconsistency: if the birthday paradox is the guiding phenomenon, then the query complexity should decrease with the number of clusters $k$! Since clusters have size $\approx n/k$, we expect to need $\approx (n/k)^{1/2+O(ε/φ^2)}$ random walk samples, which decreases with $k$. We design a novel representation of vertices in a $(k, φ, ε)$-clusterable graph by a mixture of logarithmic length walks. This representation uses the optimal $\approx (n/k)^{1/2+O(ε/φ^2)}$ walks per vertex, and allows for a fast nearest neighbor search: given $k$ vertices representing the clusters, we can find the cluster of a given query vertex $x$ using nearly linear time in the representation size of $x$. This gives a clustering oracle with query time $\approx (n/k)^{1/2+O(ε/φ^2)}$ and space complexity $k\cdot (n/k)^{1/2+O(ε/φ^2)}$, matching the birthday paradox bound.

preprint2024arXiv

A Quasi-Monte Carlo Data Structure for Smooth Kernel Evaluations

In the kernel density estimation (KDE) problem one is given a kernel $K(x, y)$ and a dataset $P$ of points in a Euclidean space, and must prepare a data structure that can quickly answer density queries: given a point $q$, output a $(1+ε)$-approximation to $μ:=\frac1{|P|}\sum_{p\in P} K(p, q)$. The classical approach to KDE is the celebrated fast multipole method of [Greengard and Rokhlin]. The fast multipole method combines a basic space partitioning approach with a multidimensional Taylor expansion, which yields a $\approx \log^d (n/ε)$ query time (exponential in the dimension $d$). A recent line of work initiated by [Charikar and Siminelakis] achieved polynomial dependence on $d$ via a combination of random sampling and randomized space partitioning, with [Backurs et al.] giving an efficient data structure with query time $\approx \mathrm{poly}{\log(1/μ)}/ε^2$ for smooth kernels. Quadratic dependence on $ε$, inherent to the sampling methods, is prohibitively expensive for small $ε$. This issue is addressed by quasi-Monte Carlo methods in numerical analysis. The high level idea in quasi-Monte Carlo methods is to replace random sampling with a discrepancy based approach -- an idea recently applied to coresets for KDE by [Phillips and Tai]. The work of Phillips and Tai gives a space efficient data structure with query complexity $\approx 1/(εμ)$. This is polynomially better in $1/ε$, but exponentially worse in $1/μ$. We achieve the best of both: a data structure with $\approx \mathrm{poly}{\log(1/μ)}/ε$ query time for smooth kernel KDE. Our main insight is a new way to combine discrepancy theory with randomized space partitioning inspired by, but significantly more efficient than, that of the fast multipole methods. We hope that our techniques will find further applications to linear algebra for kernel matrices.

preprint2022arXiv

Motif Cut Sparsifiers

A motif is a frequently occurring subgraph of a given directed or undirected graph $G$. Motifs capture higher order organizational structure of $G$ beyond edge relationships, and, therefore, have found wide applications such as in graph clustering, community detection, and analysis of biological and physical networks to name a few. In these applications, the cut structure of motifs plays a crucial role as vertices are partitioned into clusters by cuts whose conductance is based on the number of instances of a particular motif, as opposed to just the number of edges, crossing the cuts. In this paper, we introduce the concept of a motif cut sparsifier. We show that one can compute in polynomial time a sparse weighted subgraph $G'$ with only $\widetilde{O}(n/ε^2)$ edges such that for every cut, the weighted number of copies of $M$ crossing the cut in $G'$ is within a $1+ε$ factor of the number of copies of $M$ crossing the cut in $G$, for every constant size motif $M$. Our work carefully combines the viewpoints of both graph sparsification and hypergraph sparsification. We sample edges which requires us to extend and strengthen the concept of cut sparsifiers introduced in the seminal work of to the motif setting. We adapt the importance sampling framework through the viewpoint of hypergraph sparsification by deriving the edge sampling probabilities from the strong connectivity values of a hypergraph whose hyperedges represent motif instances. Finally, an iterative sparsification primitive inspired by both viewpoints is used to reduce the number of edges in $G$ to nearly linear. In addition, we present a strong lower bound ruling out a similar result for sparsification with respect to induced occurrences of motifs.

preprint2022arXiv

Noisy Boolean Hidden Matching with Applications

The Boolean Hidden Matching (BHM) problem, introduced in a seminal paper of Gavinsky et. al. [STOC'07], has played an important role in the streaming lower bounds for graph problems such as triangle and subgraph counting, maximum matching, MAX-CUT, Schatten $p$-norm approximation, maximum acyclic subgraph, testing bipartiteness, $k$-connectivity, and cycle-freeness. The one-way communication complexity of the Boolean Hidden Matching problem on a universe of size $n$ is $Θ(\sqrt{n})$, resulting in $Ω(\sqrt{n})$ lower bounds for constant factor approximations to several of the aforementioned graph problems. The related (and, in fact, more general) Boolean Hidden Hypermatching (BHH) problem introduced by Verbin and Yu [SODA'11] provides an approach to proving higher lower bounds of $Ω(n^{1-1/t})$ for integer $t\geq 2$. Reductions based on Boolean Hidden Hypermatching generate distributions on graphs with connected components of diameter about $t$, and basically show that long range exploration is hard in the streaming model of computation with adversarial arrivals. In this paper we introduce a natural variant of the BHM problem, called noisy BHM (and its natural noisy BHH variant), that we use to obtain higher than $Ω(\sqrt{n})$ lower bounds for approximating several of the aforementioned problems in graph streams when the input graphs consist only of components of diameter bounded by a fixed constant. We also use the noisy BHM problem to show that the problem of classifying whether an underlying graph is isomorphic to a complete binary tree in insertion-only streams requires $Ω(n)$ space, which seems challenging to show using BHM or BHH alone.

preprint2020arXiv

Scaling up Kernel Ridge Regression via Locality Sensitive Hashing

Random binning features, introduced in the seminal paper of Rahimi and Recht (2007), are an efficient method for approximating a kernel matrix using locality sensitive hashing. Random binning features provide a very simple and efficient way of approximating the Laplace kernel but unfortunately do not apply to many important classes of kernels, notably ones that generate smooth Gaussian processes, such as the Gaussian kernel and Matern kernel. In this paper, we introduce a simple weighted version of random binning features and show that the corresponding kernel function generates Gaussian processes of any desired smoothness. We show that our weighted random binning features provide a spectral approximation to the corresponding kernel matrix, leading to efficient algorithms for kernel ridge regression. Experiments on large scale regression datasets show that our method outperforms the accuracy of random Fourier features method.

preprint2016arXiv

Sparse Fourier Transform in Any Constant Dimension with Nearly-Optimal Sample Complexity in Sublinear Time

We consider the problem of computing a $k$-sparse approximation to the Fourier transform of a length $N$ signal. Our main result is a randomized algorithm for computing such an approximation (i.e. achieving the $\ell_2/\ell_2$ sparse recovery guarantees using Fourier measurements) using $O_d(k\log N\log\log N)$ samples of the signal in time domain that runs in time $O_d(k\log^{d+3} N)$, where $d\geq 1$ is the dimensionality of the Fourier transform. The sample complexity matches the lower bound of $Ω(k\log (N/k))$ for non-adaptive algorithms due to \cite{DIPW} for any $k\leq N^{1-δ}$ for a constant $δ>0$ up to an $O(\log\log N)$ factor. Prior to our work a result with comparable sample complexity $k\log N \log^{O(1)}\log N$ and sublinear runtime was known for the Fourier transform on the line \cite{IKP}, but for any dimension $d\geq 2$ previously known techniques either suffered from a polylogarithmic factor loss in sample complexity or required $Ω(N)$ runtime.

preprint2016arXiv

Subgraph Counting: Color Coding Beyond Trees

The problem of counting occurrences of query graphs in a large data graph, known as subgraph counting, is fundamental to several domains such as genomics and social network analysis. Many important special cases (e.g. triangle counting) have received significant attention. Color coding is a very general and powerful algorithmic technique for subgraph counting. Color coding has been shown to be effective in several applications, but scalable implementations are only known for the special case of {\em tree queries} (i.e. queries of treewidth one). In this paper we present the first efficient distributed implementation for color coding that goes beyond tree queries: our algorithm applies to any query graph of treewidth $2$. Since tree queries can be solved in time linear in the size of the data graph, our contribution is the first step into the realm of colour coding for queries that require superlinear running time in the worst case. This superlinear complexity leads to significant load balancing problems on graphs with heavy tailed degree distributions. Our algorithm structures the computation to work around high degree nodes in the data graph, and achieves very good runtime and scalability on a diverse collection of data and query graph pairs as a result. We also provide theoretical analysis of our algorithmic techniques, showing asymptotic improvements in runtime on random graphs with power law degree distributions, a popular model for real world graphs.

preprint2015arXiv

Single Pass Spectral Sparsification in Dynamic Streams

We present the first single pass algorithm for computing spectral sparsifiers of graphs in the dynamic semi-streaming model. Given a single pass over a stream containing insertions and deletions of edges to a graph G, our algorithm maintains a randomized linear sketch of the incidence matrix of G into dimension O((1/epsilon^2) n polylog(n)). Using this sketch, at any point, the algorithm can output a (1 +/- epsilon) spectral sparsifier for G with high probability. While O((1/epsilon^2) n polylog(n)) space algorithms are known for computing "cut sparsifiers" in dynamic streams [AGM12b, GKP12] and spectral sparsifiers in "insertion-only" streams [KL11], prior to our work, the best known single pass algorithm for maintaining spectral sparsifiers in dynamic streams required sketches of dimension Omega((1/epsilon^2) n^(5/3)) [AGM14]. To achieve our result, we show that, using a coarse sparsifier of G and a linear sketch of G's incidence matrix, it is possible to sample edges by effective resistance, obtaining a spectral sparsifier of arbitrary precision. Sampling from the sketch requires a novel application of ell_2/ell_2 sparse recovery, a natural extension of the ell_0 methods used for cut sparsifiers in [AGM12b]. Recent work of [MP12] on row sampling for matrix approximation gives a recursive approach for obtaining the required coarse sparsifiers. Under certain restrictions, our approach also extends to the problem of maintaining a spectral approximation for a general matrix A^T A given a stream of updates to rows in A.

preprint2014arXiv

Sample-Optimal Fourier Sampling in Any Constant Dimension -- Part I

We give an algorithm for $\ell_2/\ell_2$ sparse recovery from Fourier measurements using $O(k\log N)$ samples, matching the lower bound of \cite{DIPW} for non-adaptive algorithms up to constant factors for any $k\leq N^{1-δ}$. The algorithm runs in $\tilde O(N)$ time. Our algorithm extends to higher dimensions, leading to sample complexity of $O_d(k\log N)$, which is optimal up to constant factors for any $d=O(1)$. These are the first sample optimal algorithms for these problems. A preliminary experimental evaluation indicates that our algorithm has empirical sampling complexity comparable to that of other recovery methods known in the literature, while providing strong provable guarantees on the recovery quality.

preprint2014arXiv

Streaming Lower Bounds for Approximating MAX-CUT

We consider the problem of estimating the value of max cut in a graph in the streaming model of computation. At one extreme, there is a trivial $2$-approximation for this problem that uses only $O(\log n)$ space, namely, count the number of edges and output half of this value as the estimate for max cut value. On the other extreme, if one allows $\tilde{O}(n)$ space, then a near-optimal solution to the max cut value can be obtained by storing an $\tilde{O}(n)$-size sparsifier that essentially preserves the max cut. An intriguing question is if poly-logarithmic space suffices to obtain a non-trivial approximation to the max-cut value (that is, beating the factor $2$). It was recently shown that the problem of estimating the size of a maximum matching in a graph admits a non-trivial approximation in poly-logarithmic space. Our main result is that any streaming algorithm that breaks the $2$-approximation barrier requires $\tildeΩ(\sqrt{n})$ space even if the edges of the input graph are presented in random order. Our result is obtained by exhibiting a distribution over graphs which are either bipartite or $\frac{1}{2}$-far from being bipartite, and establishing that $\tildeΩ(\sqrt{n})$ space is necessary to differentiate between these two cases. Thus as a direct corollary we obtain that $\tildeΩ(\sqrt{n})$ space is also necessary to test if a graph is bipartite or $\frac{1}{2}$-far from being bipartite. We also show that for any $ε> 0$, any streaming algorithm that obtains a $(1 + ε)$-approximation to the max cut value when edges arrive in adversarial order requires $n^{1 - O(ε)}$ space, implying that $Ω(n)$ space is necessary to obtain an arbitrarily good approximation to the max cut value.

preprint2013arXiv

Online submodular welfare maximization: Greedy is optimal

We prove that no online algorithm (even randomized, against an oblivious adversary) is better than 1/2-competitive for welfare maximization with coverage valuations, unless $NP = RP$. Since the Greedy algorithm is known to be 1/2-competitive for monotone submodular valuations, of which coverage is a special case, this proves that Greedy provides the optimal competitive ratio. On the other hand, we prove that Greedy in a stochastic setting with i.i.d.items and valuations satisfying diminishing returns is $(1-1/e)$-competitive, which is optimal even for coverage valuations, unless $NP=RP$. For online budget-additive allocation, we prove that no algorithm can be 0.612-competitive with respect to a natural LP which has been used previously for this problem.

preprint2012arXiv

Optimal bandwidth-aware VM allocation for Infrastructure-as-a-Service

Infrastructure-as-a-Service (IaaS) providers need to offer richer services to be competitive while optimizing their resource usage to keep costs down. Richer service offerings include new resource request models involving bandwidth guarantees between virtual machines (VMs). Thus we consider the following problem: given a VM request graph (where nodes are VMs and edges represent virtual network connectivity between the VMs) and a real data center topology, find an allocation of VMs to servers that satisfies the bandwidth guarantees for every virtual network edge---which maps to a path in the physical network---and minimizes congestion of the network. Previous work has shown that for arbitrary networks and requests, finding the optimal embedding satisfying bandwidth requests is $\mathcal{NP}$-hard. However, in most data center architectures, the routing protocols employed are based on a spanning tree of the physical network. In this paper, we prove that the problem remains $\mathcal{NP}$-hard even when the physical network topology is restricted to be a tree, and the request graph topology is also restricted. We also present a dynamic programming algorithm for computing the optimal embedding in a tree network which runs in time $O(3^kn)$, where $n$ is the number of nodes in the physical topology and $k$ is the size of the request graph, which is well suited for practical requests which have small $k$. Such requests form a large class of web-service and enterprise workloads. Also, if we restrict the requests topology to a clique (all VMs connected to a virtual switch with uniform bandwidth requirements), we show that the dynamic programming algorithm can be modified to output the minimum congestion embedding in time $O(k^2n)$.

preprint2012arXiv

Prediction strategies without loss

Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say 'predict 0' or 'predict 1', and our payoff is +1 if the prediction is correct and -1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length $T$ our algorithm has regret $14εT $ and loss $2\sqrt{T}e^{-ε^2 T} $ in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of $N$ experts, where the related problem of trading off regret to the best expert for regret to the `special' expert has been studied by Even-Dar et al. (COLT'07). We obtain essentially zero loss with respect to the special expert and optimal loss/regret tradeoff, improving upon the results of Even-Dar et al and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. A simple iterative application of our algorithm gives essentially optimal regret bounds at multiple time scales, bounds with respect to $k$-shifting optima as well as regret bounds with respect to higher norms of the input sequence.

preprint2012arXiv

Single pass sparsification in the streaming model with edge deletions

In this paper we give a construction of cut sparsifiers of Benczur and Karger in the {\em dynamic} streaming setting in a single pass over the data stream. Previous constructions either required multiple passes or were unable to handle edge deletions. We use $\tilde{O}(1/\e^2)$ time for each stream update and $\tilde{O}(n/\e^2)$ time to construct a sparsifier. Our $\e$-sparsifiers have $O(n\log^3 n/\e^2)$ edges. The main tools behind our result are an application of sketching techniques of Ahn et al.[SODA'12] to estimate edge connectivity together with a novel application of sampling with limited independence and sparse recovery to produce the edges of the sparsifier.

preprint2010arXiv

Graph Sparsification via Refinement Sampling

A graph G'(V,E') is an \eps-sparsification of G for some \eps>0, if every (weighted) cut in G' is within (1\pm \eps) of the corresponding cut in G. A celebrated result of Benczur and Karger shows that for every undirected graph G, an \eps-sparsification with O(n\log n/\e^2) edges can be constructed in O(m\log^2n) time. Applications to modern massive data sets often constrain algorithms to use computation models that restrict random access to the input. The semi-streaming model, in which the algorithm is constrained to use \tilde O(n) space, has been shown to be a good abstraction for analyzing graph algorithms in applications to large data sets. Recently, a semi-streaming algorithm for graph sparsification was presented by Anh and Guha; the total running time of their implementation is Ω(mn), too large for applications where both space and time are important. In this paper, we introduce a new technique for graph sparsification, namely refinement sampling, that gives an \tilde{O}(m) time semi-streaming algorithm for graph sparsification. Specifically, we show that refinement sampling can be used to design a one-pass streaming algorithm for sparsification that takes O(\log\log n) time per edge, uses O(\log^2 n) space per node, and outputs an \eps-sparsifier with O(n\log^3 n/\eps^2) edges. At a slightly increased space and time complexity, we can reduce the sparsifier size to O(n \log n/\e^2) edges matching the Benczur-Karger result, while improving upon the Benczur-Karger runtime for m=ω(n\log^3 n). Finally, we show that an \eps-sparsifier with O(n \log n/\eps^2) edges can be constructed in two passes over the data and O(m) time whenever m =Ω(n^{1+δ}) for some constant δ>0. As a by-product of our approach, we also obtain an O(m\log\log n+n \log n) time streaming algorithm to compute a sparse k-connectivity certificate of a graph.

preprint2010arXiv

Perfect Matchings in O(n \log n) Time in Regular Bipartite Graphs

In this paper we consider the well-studied problem of finding a perfect matching in a d-regular bipartite graph on 2n nodes with m=nd edges. The best-known algorithm for general bipartite graphs (due to Hopcroft and Karp) takes time O(m\sqrt{n}). In regular bipartite graphs, however, a matching is known to be computable in O(m) time (due to Cole, Ost and Schirra). In a recent line of work by Goel, Kapralov and Khanna the O(m) time algorithm was improved first to \tilde O(min{m, n^{2.5}/d}) and then to \tilde O(min{m, n^2/d}). It was also shown that the latter algorithm is optimal up to polylogarithmic factors among all algorithms that use non-adaptive uniform sampling to reduce the size of the graph as a first step. In this paper, we give a randomized algorithm that finds a perfect matching in a d-regular graph and runs in O(n\log n) time (both in expectation and with high probability). The algorithm performs an appropriately truncated random walk on a modified graph to successively find augmenting paths. Our algorithm may be viewed as using adaptive uniform sampling, and is thus able to bypass the limitations of (non-adaptive) uniform sampling established in earlier work. We also show that randomization is crucial for obtaining o(nd) time algorithms by establishing an Ω(nd) lower bound for any deterministic algorithm. Our techniques also give an algorithm that successively finds a matching in the support of a doubly stochastic matrix in expected time O(n\log^2 n) time, with O(m) pre-processing time; this gives a simple O(m+mn\log^2 n) time algorithm for finding the Birkhoff-von Neumann decomposition of a doubly stochastic matrix.

Michael Kapralov

What is connected

Connect this record

See the researcher in context

Building this map preview

17 published item(s)

Provable Quantization with Randomized Hadamard Transform

Spectral Clustering in Birthday Paradox Time

A Quasi-Monte Carlo Data Structure for Smooth Kernel Evaluations

Motif Cut Sparsifiers

Noisy Boolean Hidden Matching with Applications

Scaling up Kernel Ridge Regression via Locality Sensitive Hashing

Sparse Fourier Transform in Any Constant Dimension with Nearly-Optimal Sample Complexity in Sublinear Time

Subgraph Counting: Color Coding Beyond Trees

Single Pass Spectral Sparsification in Dynamic Streams

Sample-Optimal Fourier Sampling in Any Constant Dimension -- Part I

Streaming Lower Bounds for Approximating MAX-CUT

Online submodular welfare maximization: Greedy is optimal

Optimal bandwidth-aware VM allocation for Infrastructure-as-a-Service

Prediction strategies without loss

Single pass sparsification in the streaming model with edge deletions

Graph Sparsification via Refinement Sampling

Perfect Matchings in O(n \log n) Time in Regular Bipartite Graphs