Researcher profile

Robert Krauthgamer

Robert Krauthgamer contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
12works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

12 published item(s)

preprint2022arXiv

Almost-Smooth Histograms and Sliding-Window Graph Algorithms

We study algorithms for the sliding-window model, an important variant of the data-stream model, in which the goal is to compute some function of a fixed-length suffix of the stream. We extend the smooth-histogram framework of Braverman and Ostrovsky (FOCS 2007) to almost-smooth functions, which includes all subadditive functions. Specifically, we show that if a subadditive function can be $(1+ε)$-approximated in the insertion-only streaming model, then it can be $(2+ε)$-approximated also in the sliding-window model with space complexity larger by factor $O(ε^{-1}\log w)$, where $w$ is the window size. We demonstrate how our framework yields new approximation algorithms with relatively little effort for a variety of problems that do not admit the smooth-histogram technique. For example, in the frequency-vector model, a symmetric norm is subadditive and thus we obtain a sliding-window $(2+ε)$-approximation algorithm for it. Another example is for streaming matrices, where we derive a new sliding-window $(\sqrt{2}+ε)$-approximation algorithm for Schatten $4$-norm. We then consider graph streams and show that many graph problems are subadditive, including maximum submodular matching, minimum vertex-cover, and maximum $k$-cover, thereby deriving sliding-window $O(1)$-approximation algorithms for them almost for free (using known insertion-only algorithms). Finally, we design for every $d\in (1,2]$ an artificial function, based on the maximum-matching size, whose almost-smoothness parameter is exactly $d$.

preprint2022arXiv

Breaking the Cubic Barrier for All-Pairs Max-Flow: Gomory-Hu Tree in Nearly Quadratic Time

In 1961, Gomory and Hu showed that the All-Pairs Max-Flow problem of computing the max-flow between all $n\choose 2$ pairs of vertices in an undirected graph can be solved using only $n-1$ calls to any (single-pair) max-flow algorithm. Even assuming a linear-time max-flow algorithm, this yields a running time of $O(mn)$, which is $O(n^3)$ when $m = Θ(n^2)$. While subsequent work has improved this bound for various special graph classes, no subcubic-time algorithm has been obtained in the last 60 years for general graphs. We break this longstanding barrier by giving an $\tilde{O}(n^{2})$-time algorithm on general, weighted graphs. Combined with a popular complexity assumption, we establish a counter-intuitive separation: all-pairs max-flows are strictly easier to compute than all-pairs shortest-paths. Our algorithm produces a cut-equivalent tree, known as the Gomory-Hu tree, from which the max-flow value for any pair can be retrieved in near-constant time. For unweighted graphs, we refine our techniques further to produce a Gomory-Hu tree in the time of a poly-logarithmic number of calls to any max-flow algorithm. This shows an equivalence between the all-pairs and single-pair max-flow problems, and is optimal up to poly-logarithmic factors. Using the recently announced $m^{1+o(1)}$-time max-flow algorithm (Chen et al., March 2022), our Gomory-Hu tree algorithm for unweighted graphs also runs in $m^{1+o(1)}$-time.

preprint2022arXiv

Distributed Sparse Normal Means Estimation with Sublinear Communication

We consider the problem of sparse normal means estimation in a distributed setting with communication constraints. We assume there are $M$ machines, each holding $d$-dimensional observations of a $K$-sparse vector $μ$ corrupted by additive Gaussian noise. The $M$ machines are connected in a star topology to a fusion center, whose goal is to estimate the vector $μ$ with a low communication budget. Previous works have shown that to achieve the centralized minimax rate for the $\ell_2$ risk, the total communication must be high - at least linear in the dimension $d$. This phenomenon occurs, however, at very weak signals. We show that at signal-to-noise ratios (SNRs) that are sufficiently high - but not enough for recovery by any individual machine - the support of $μ$ can be correctly recovered with significantly less communication. Specifically, we present two algorithms for distributed estimation of a sparse mean vector corrupted by either Gaussian or sub-Gaussian noise. We then prove that above certain SNR thresholds, with high probability, these algorithms recover the correct support with total communication that is sublinear in the dimension $d$. Furthermore, the communication decreases exponentially as a function of signal strength. If in addition $KM\ll \tfrac{d}{\log d}$, then with an additional round of sublinear communication, our algorithms achieve the centralized rate for the $\ell_2$ risk. Finally, we present simulations that illustrate the performance of our algorithms in different parameter regimes.

preprint2022arXiv

Exact Flow Sparsification Requires Unbounded Size

Given a large edge-capacitated network $G$ and a subset of $k$ vertices called terminals, an (exact) flow sparsifier is a small network $G'$ that preserves (exactly) all multicommodity flows that can be routed between the terminals. Flow sparsifiers were introduced by Leighton and Moitra [STOC 2010], and have been studied and used in many algorithmic contexts. A fundamental question that remained open for over a decade, asks whether every $k$-terminal network admits an exact flow sparsifier whose size is bounded by some function $f(k)$ (regardless of the size of $G$ or its capacities). We resolve this question in the negative by proving that there exist $6$-terminal networks $G$ whose flow sparsifiers $G'$ must have arbitrarily large size. This unboundedness is perhaps surprising, since the analogous sparsification that preserves all terminal cuts (called exact cut sparsifier or mimicking network) admits sparsifiers of size $f_0(k)\leq 2^{2^k}$ [Hagerup, Katajainen, Nishimura, and Ragde, JCSS 1998]. We prove our results by analyzing the set of all feasible demands in the network, known as the demand polytope. We identify an invariant of this polytope, essentially the slope of certain facets, that can be made arbitrarily large even for $k=6$, and implies an explicit lower bound on the size of the network. We further use this technique to answer, again in the negative, an open question of Seymour [JCTB 2015] regarding flow-sparsification that uses only contractions and preserves the infeasibility of one demand vector.

preprint2022arXiv

Faster Algorithms for Orienteering and $k$-TSP

We consider the rooted orienteering problem in Euclidean space: Given $n$ points $P$ in $\mathbb R^d$, a root point $s\in P$ and a budget $\mathcal B>0$, find a path that starts from $s$, has total length at most $\mathcal B$, and visits as many points of $P$ as possible. This problem is known to be NP-hard, hence we study $(1-δ)$-approximation algorithms. The previous Polynomial-Time Approximation Scheme (PTAS) for this problem, due to Chen and Har-Peled (2008), runs in time $n^{O(d\sqrt{d}/δ)}(\log n)^{(d/δ)^{O(d)}}$, and improving on this time bound was left as an open problem. Our main contribution is a PTAS with a significantly improved time complexity of $n^{O(1/δ)}(\log n)^{(d/δ)^{O(d)}}$. A known technique for approximating the orienteering problem is to reduce it to solving $1/δ$ correlated instances of rooted $k$-TSP (a $k$-TSP tour is one that visits at least $k$ points). However, the $k$-TSP tours in this reduction must achieve a certain excess guarantee (namely, their length can surpass the optimum length only in proportion to a parameter of the optimum called excess) that is stronger than the usual $(1+δ)$-approximation. Our main technical contribution is to improve the running time of these $k$-TSP variants, particularly in its dependence on the dimension $d$. Indeed, our running time is polynomial even for a moderately large dimension, roughly up to $d=O(\log\log n)$ instead of $d=O(1)$.

preprint2022arXiv

Near-Linear $\varepsilon$-Emulators for Planar Graphs

We study vertex sparsification for distances, in the setting of planar graphs with distortion: Given a planar graph $G$ (with edge weights) and a subset of $k$ terminal vertices, the goal is to construct an $\varepsilon$-emulator, which is a small planar graph $G'$ that contains the terminals and preserves the distances between the terminals up to factor $1+\varepsilon$. We construct the first $\varepsilon$-emulators for planar graphs of near-linear size $\tilde O(k/\varepsilon^{O(1)})$. In terms of $k$, this is a dramatic improvement over the previous quadratic upper bound of Cheung, Goranci and Henzinger, and breaks below known quadratic lower bounds for exact emulators (the case when $\varepsilon=0$). Moreover, our emulators can be computed in (near-)linear time, which lead to fast $(1+\varepsilon)$-approximation algorithms for basic optimization problems on planar graphs, including multiple-source shortest paths, minimum $(s,t)$-cut, graph diameter, and dynamic distace oracle.

preprint2022arXiv

Optimal Vertex-Cut Sparsification of Quasi-Bipartite Graphs

In vertex-cut sparsification, given a graph $G=(V,E)$ with a terminal set $T\subseteq V$, we wish to construct a graph $G'=(V',E')$ with $T\subseteq V'$, such that for every two sets of terminals $A,B\subseteq T$, the size of a minimum $(A,B)$-vertex-cut in $G'$ is the same as in $G$. In the most basic setting, $G$ is unweighted and undirected, and we wish to bound the size of $G'$ by a function of $k=|T|$. Kratsch and Wahlström [JACM 2020] proved that every graph $G$ (possibly directed), admits a vertex-cut sparsifier $G'$ with $O(k^3)$ vertices, which can in fact be constructed in randomized polynomial time. We study (possibly directed) graphs $G$ that are quasi-bipartite, i.e., every edge has at least one endpoint in $T$, and prove that they admit a vertex-cut sparsifier with $O(k^2)$ edges and vertices, which can in fact be constructed in deterministic polynomial time. In fact, this bound naturally extends to all graphs with a small separator into bounded-size sets. Finally, we prove information-theoretically a nearly-matching lower bound, i.e., that $\tildeΩ(k^2)$ edges are required to sparsify quasi-bipartite undirected graphs.

preprint2020arXiv

Coresets for Clustering in Excluded-minor Graphs and Beyond

Coresets are modern data-reduction tools that are widely used in data analysis to improve efficiency in terms of running time, space and communication complexity. Our main result is a fast algorithm to construct a small coreset for k-Median in (the shortest-path metric of) an excluded-minor graph. Specifically, we give the first coreset of size that depends only on $k$, $ε$ and the excluded-minor size, and our running time is quasi-linear (in the size of the input graph). The main innovation in our new algorithm is that is iterative; it first reduces the $n$ input points to roughly $O(\log n)$ reweighted points, then to $O(\log\log n)$, and so forth until the size is independent of $n$. Each step in this iterative size reduction is based on the importance sampling framework of Feldman and Langberg (STOC 2011), with a crucial adaptation that reduces the number of \emph{distinct points}, by employing a terminal embedding (where low distortion is guaranteed only for the distance from every terminal to all other points). Our terminal embedding is technically involved and relies on shortest-path separators, a standard tool in planar and excluded-minor graphs. Furthermore, our new algorithm is applicable also in Euclidean metrics, by simply using a recent terminal embedding result of Narayanan and Nelson, (STOC 2019), which extends the Johnson-Lindenstrauss Lemma. We thus obtain an efficient coreset construction in high-dimensional Euclidean spaces, thereby matching and simplifying state-of-the-art results (Sohler and Woodruff, FOCS 2018; Huang and Vishnoi, STOC 2020). In addition, we also employ terminal embedding with additive distortion to obtain small coresets in graphs with bounded highway dimension, and use applications of our coresets to obtain improved approximation schemes, e.g., an improved PTAS for planar k-Median via a new centroid set.

preprint2020arXiv

Cut-Equivalent Trees are Optimal for Min-Cut Queries

Min-Cut queries are fundamental: Preprocess an undirected edge-weighted graph, to quickly report a minimum-weight cut that separates a query pair of nodes $s,t$. The best data structure known for this problem simply builds a cut-equivalent tree, discovered 60 years ago by Gomory and Hu, who also showed how to construct it using $n-1$ minimum $st$-cut computations. Using state-of-the-art algorithms for minimum $st$-cut (Lee and Sidford, FOCS 2014) arXiv:1312.6713, one can construct the tree in time $\tilde{O}(mn^{3/2})$, which is also the preprocessing time of the data structure. (Throughout, we focus on polynomially-bounded edge weights, noting that faster algorithms are known for small/unit edge weights.) Our main result shows the following equivalence: Cut-equivalent trees can be constructed in near-linear time if and only if there is a data structure for Min-Cut queries with near-linear preprocessing time and polylogarithmic (amortized) query time, and even if the queries are restricted to a fixed source. That is, equivalent trees are an essentially optimal solution for Min-Cut queries. This equivalence holds even for every minor-closed family of graphs, such as bounded-treewidth graphs, for which a two-decade old data structure (Arikati et al., J.~Algorithms 1998) implies the first near-linear time construction of cut-equivalent trees. Moreover, unlike all previous techniques for constructing cut-equivalent trees, ours is robust to relying on approximation algorithms. In particular, using the almost-linear time algorithm for $(1+ε)$-approximate minimum $st$-cut (Kelner et al., SODA 2014), we can construct a $(1+ε)$-approximate flow-equivalent tree (which is a slightly weaker notion) in time $n^{2+o(1)}$. This leads to the first $(1+ε)$-approximation for All-Pairs Max-Flow that runs in time $n^{2+o(1)}$, and matches the output size almost-optimally.

preprint2020arXiv

Schatten Norms in Matrix Streams: Hello Sparsity, Goodbye Dimension

Spectral functions of large matrices contains important structural information about the underlying data, and is thus becoming increasingly important. Many times, large matrices representing real-world data are \emph{sparse} or \emph{doubly sparse} (i.e., sparse in both rows and columns), and are accessed as a \emph{stream} of updates, typically organized in \emph{row-order}. In this setting, where space (memory) is the limiting resource, all known algorithms require space that is polynomial in the dimension of the matrix, even for sparse matrices. We address this challenge by providing the first algorithms whose space requirement is \emph{independent of the matrix dimension}, assuming the matrix is doubly-sparse and presented in row-order. Our algorithms approximate the Schatten $p$-norms, which we use in turn to approximate other spectral functions, such as logarithm of the determinant, trace of matrix inverse, and Estrada index. We validate these theoretical performance bounds by numerical experiments on real-world matrices representing social networks. We further prove that multiple passes are unavoidable in this setting, and show extensions of our primary technique, including a trade-off between space requirements and number of passes.

preprint2020arXiv

Tight Recovery Guarantees for Orthogonal Matching Pursuit Under Gaussian Noise

Orthogonal Matching pursuit (OMP) is a popular algorithm to estimate an unknown sparse vector from multiple linear measurements of it. Assuming exact sparsity and that the measurements are corrupted by additive Gaussian noise, the success of OMP is often formulated as exactly recovering the support of the sparse vector. Several authors derived a sufficient condition for exact support recovery by OMP with high probability depending on the signal-to-noise ratio, defined as the magnitude of the smallest non-zero coefficient of the vector divided by the noise level. We make two contributions. First, we derive a slightly sharper sufficient condition for two variants of OMP, in which either the sparsity level or the noise level is known. Next, we show that this sharper sufficient condition is tight, in the following sense: for a wide range of problem parameters, there exist a dictionary of linear measurements and a sparse vector with a signal-to-noise ratio slightly below that of the sufficient condition, for which with high probability OMP fails to recover its support. Finally, we present simulations which illustrate that our condition is tight for a much broader range of dictionaries.

preprint2020arXiv

Universal Streaming of Subset Norms

Most known algorithms in the streaming model of computation aim to approximate a single function such as an $\ell_p$-norm. In 2009, Nelson [\url{https://sublinear.info}, Open Problem 30] asked if it possible to design \emph{universal algorithms}, that simultaneously approximate multiple functions of the stream. In this paper we answer the question of Nelson for the class of \emph{subset $\ell_0$-norms} in the insertion-only frequency-vector model. Given a family of subsets $\mathcal{S}\subset 2^{[n]}$, we provide a single streaming algorithm that can $(1\pm ε)$-approximate the subset-norm for every $S\in\mathcal{S}$. Here, the subset-$\ell_p$-norm of $v\in \mathbb{R}^n$ with respect to set $S\subseteq [n]$ is the $\ell_p$-norm of vector $v_{|S}$ (which denotes restricting $v$ to $S$, by zeroing all other coordinates). Our main result is a near-tight characterization of the space complexity of every family $\mathcal{S}\subset 2^{[n]}$ of subset-$\ell_0$-norms in insertion-only streams, expressed in terms of the "heavy-hitter dimension" of $\mathcal{S}$, a new combinatorial quantity that is related to the VC-dimension of $\mathcal{S}$. In contrast, we show that the more general turnstile and sliding-window models require a much larger space usage. All these results easily extend to $\ell_1$. In addition, we design algorithms for two other subset-$\ell_p$-norm variants. These can be compared to the Priority Sampling algorithm of Duffield, Lund and Thorup [JACM 2007], which achieves additive approximation $ε\|{v}\|$ for all possible subsets ($\mathcal{S}=2^{[n]}$) in the entry-wise update model. One of our algorithms extends this algorithm to handle turnstile updates, and another one achieves multiplicative approximation given a family $\mathcal{S}$.