Source author record

Christoph Grunau

Christoph Grunau appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; Machine Learning; Discrete Mathematics; math.CO

Catalog footprint

What is connected

7 works

5 topics

4 close collaborators

Preprint, 2022, arXiv

A Nearly Tight Analysis of Greedy k-means++

The famous $k$-means++ algorithm of Arthur and Vassilvitskii [SODA 2007] is the most popular way of solving the $k$-means problem in practice. The algorithm is very simple: it samples the first center uniformly at random, and each of the following $k-1$ centers is then sampled with probability proportional to its squared distance to the closest center so far. Afterward, Lloyd's iterative algorithm is run. The $k$-means++ algorithm is known to return a $\Theta(\log k)$-approximate solution in expectation. In their seminal work, Arthur and Vassilvitskii [SODA 2007] asked about the guarantees for the following greedy variant: in every step, we sample $\ell$ candidate centers instead of one and then pick the one that minimizes the new cost. This is also how $k$-means++ is implemented in, e.g., the popular Scikit-learn library [Pedregosa et al.; JMLR 2011]. We present nearly matching lower and upper bounds for greedy $k$-means++: we prove that it is an $O(\ell^3 \log^3 k)$-approximation algorithm. On the other hand, we prove a lower bound of $\Omega(\ell^3 \log^3 k / \log^2(\ell\log k))$. Previously, only an $\Omega(\ell \log k)$ lower bound was known [Bhattacharya, Eube, Röglin, Schmidt; ESA 2020] and there was no known upper bound.
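The seeding procedure described in this abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' code and not Scikit-learn's implementation. The function names, the tuple representation of points, and the choice to draw the first center as a single uniform sample are assumptions made for illustration. Setting `ell=1` gives ordinary $k$-means++ seeding, and Lloyd's iterations are left out.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def greedy_kmeanspp_seeding(points, k, ell, rng=None):
    """Choose k centers; each step draws `ell` candidates with probability
    proportional to squared distance and keeps the cheapest one.

    Hypothetical helper for illustration only.
    """
    if rng is None:
        rng = random.Random()
    # Assumption: the first center is one uniform sample.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its closest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point already coincides with a center; sample uniformly.
            candidates = [rng.choice(points) for _ in range(ell)]
        else:
            candidates = rng.choices(points, weights=d2, k=ell)
        best, best_d2, best_cost = None, None, float("inf")
        for c in candidates:
            # Cost of the clustering if candidate c were added.
            new_d2 = [min(d, sq_dist(p, c)) for d, p in zip(d2, points)]
            new_cost = sum(new_d2)
            if new_cost < best_cost:
                best, best_d2, best_cost = c, new_d2, new_cost
        centers.append(best)
        d2 = best_d2
    return centers, sum(d2)
```

The returned pair is the list of chosen centers and the $k$-means cost of the input with respect to them. Passing a seeded `random.Random` instance makes a run reproducible.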

Preprint, 2022, arXiv

Deterministic Low-Diameter Decompositions for Weighted Graphs and Distributed and Parallel Applications

This paper presents new deterministic and distributed low-diameter decomposition algorithms for weighted graphs. In particular, we show that if one can efficiently compute approximate distances in a parallel or a distributed setting, one can also efficiently compute low-diameter decompositions. This consequently implies solutions to many fundamental distance based problems using a polylogarithmic number of approximate distance computations. Our low-diameter decomposition generalizes and extends the line of work starting from [Rozhoň, Ghaffari STOC 2020] to weighted graphs in a very model-independent manner. Moreover, our clustering results have additional useful properties, including strong-diameter guarantees, separation properties, restricting cluster centers to specified terminals, and more. Applications include:

-- The first near-linear work and polylogarithmic depth randomized and deterministic parallel algorithm for low-stretch spanning trees (LSST) with polylogarithmic stretch. Previously, the best parallel LSST algorithm required $m \cdot n^{o(1)}$ work and $n^{o(1)}$ depth and was inherently randomized. No deterministic LSST algorithm with truly sub-quadratic work and sub-linear depth was known.

-- The first near-linear work and polylogarithmic depth deterministic algorithm for computing an $\ell_1$-embedding into polylogarithmic dimensional space with polylogarithmic distortion. The best prior deterministic algorithms for $\ell_1$-embeddings either require large polynomial work or are inherently sequential.

Even when we apply our techniques to the classical problem of computing a ball-carving with strong-diameter $O(\log^2 n)$ in an unweighted graph, our new clustering algorithm still leads to an improvement in round complexity from $O(\log^{10} n)$ rounds [Chang, Ghaffari PODC 21] to $O(\log^{4} n)$.

Preprint, 2022, arXiv

Improved Deterministic Connectivity in Massively Parallel Computation

A long line of research about connectivity in the Massively Parallel Computation model has culminated in the seminal works of Andoni et al. [FOCS'18] and Behnezhad et al. [FOCS'19]. They provide a randomized algorithm for low-space MPC with round complexity $O(\log D + \log \log_{\frac m n} n)$, which is conjectured to be optimal, and $O(m)$ space, for graphs on $n$ vertices with $m$ edges and diameter $D$. Surprisingly, a recent result of Coy and Czumaj [STOC'22] shows how to achieve the same deterministically. Unfortunately, however, their algorithm suffers from large local computation time. We present a deterministic connectivity algorithm that matches all the parameters of the randomized algorithm and, in addition, significantly reduces the local computation time to nearly linear. Our derandomization method is based on reducing the amount of randomness needed to allow for a simpler efficient search. While similar randomness reduction approaches have been used before, our result is not only strikingly simpler, but it is also the first to have efficient local computation. This is why we believe it can serve as a starting point for the systematic development of computation-efficient derandomization approaches in low-memory MPC.

Preprint, 2020, arXiv

Generalizing the Sharp Threshold Phenomenon for the Distributed Complexity of the Lovász Local Lemma

Recently, Brandt, Maus and Uitto [PODC'19] showed that, in a restricted setting, the dependency of the complexity of the distributed Lovász Local Lemma (LLL) on the chosen LLL criterion exhibits a sharp threshold phenomenon: They proved that, under the LLL criterion $p2^d < 1$, if each random variable affects at most $3$ events, the deterministic complexity of the LLL in the LOCAL model is $O(d^2 + \log^* n)$. In stark contrast, under the criterion $p2^d \leq 1$, there is a randomized lower bound of $\Omega(\log \log n)$ by Brandt et al. [STOC'16] and a deterministic lower bound of $\Omega(\log n)$ by Chang, Kopelowitz and Pettie [FOCS'16]. Brandt, Maus and Uitto conjectured that the same behavior holds for the unrestricted setting where each random variable affects arbitrarily many events. We prove their conjecture, by providing an algorithm that solves the LLL in time $O(d^2 + \log^* n)$ under the LLL criterion $p2^d < 1$, which is tight in bounded-degree graphs due to an $\Omega(\log^* n)$ lower bound by Chung, Pettie and Su [PODC'14]. By the work of Brandt, Maus and Uitto, obtaining such an algorithm can be reduced to proving that all members in a certain family of functions in arbitrarily high dimensions are convex on some specific domain. Unfortunately, an analytical description of these functions is known only for dimension at most $3$, which led to the aforementioned restriction of their result. While obtaining those descriptions for functions of (substantially) higher dimension seems out of the reach of current techniques, we show that their convexity can be inferred by combinatorial means.

Preprint, 2020, arXiv

Improved Deterministic Network Decomposition

Network decomposition is a central tool in distributed graph algorithms. We present two improvements on the state of the art for network decomposition, which thus lead to improvements in the (deterministic and randomized) complexity of several well-studied graph problems.

- We provide a deterministic distributed network decomposition algorithm with $O(\log^5 n)$ round complexity, using $O(\log n)$-bit messages. This improves on the $O(\log^7 n)$-round algorithm of Rozhoň and Ghaffari [STOC'20], which used large messages, and their $O(\log^8 n)$-round algorithm with $O(\log n)$-bit messages. This directly leads to similar improvements for a wide range of deterministic and randomized distributed algorithms, whose solution relies on network decomposition, including the general distributed derandomization of Ghaffari, Kuhn, and Harris [FOCS'18].

- One drawback of the algorithm of Rozhoň and Ghaffari, in the $\mathsf{CONGEST}$ model, was its dependence on the length of the identifiers. Because of this, for instance, the algorithm could not be used in the shattering framework in the $\mathsf{CONGEST}$ model. Thus, the state of the art randomized complexity of several problems in this model remained with an additive $2^{O(\sqrt{\log\log n})}$ term, which was a clear leftover of the older network decomposition complexity [Panconesi and Srinivasan STOC'92]. We present a modified version that remedies this, constructing a decomposition whose quality does not depend on the identifiers, and thus improves the randomized round complexity for various problems.

Preprint, 2020, arXiv

Improved MPC Algorithms for MIS, Matching, and Coloring on Trees and Beyond

We present $O(\log\log n)$ round scalable Massively Parallel Computation algorithms for maximal independent set and maximal matching, in trees and more generally graphs of bounded arboricity, as well as for constant coloring trees. Following standard terminology, by a scalable MPC algorithm, we mean that these algorithms can work on machines that have capacity/memory as small as $n^\delta$ for any positive constant $\delta<1$. Our results improve over the $O(\log^2\log n)$ round algorithms of Behnezhad et al. [PODC'19]. Moreover, our matching algorithm is presumably optimal as its bound matches an $\Omega(\log\log n)$ conditional lower bound of Ghaffari, Kuhn, and Uitto [FOCS'19].

Preprint, 2020, arXiv

k-means++: few more steps yield constant approximation

The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is a state-of-the-art algorithm for solving the k-means clustering problem and is known to give an O(log k)-approximation in expectation. Recently, Lattanzi and Sohler (ICML 2019) proposed augmenting k-means++ with O(k log log k) local search steps to yield a constant approximation (in expectation) to the k-means clustering problem. In this paper, we improve their analysis to show that, for any arbitrarily small constant $\varepsilon > 0$, with only $\varepsilon k$ additional local search steps, one can achieve a constant approximation guarantee (with high probability in k), resolving an open problem in their paper.
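The abstract does not spell out what a local search step is. The sketch below shows one common reading of such a step, under stated assumptions: sample a point with probability proportional to its squared distance to the current centers, try swapping it for each existing center, and keep the best swap only if it lowers the cost. The names `clustering_cost` and `local_search_step` are hypothetical, and the sketch is not taken from either paper.

```python
import random


def clustering_cost(points, centers):
    """Sum over points of the squared distance to the closest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )


def local_search_step(points, centers, rng):
    """One swap-based local search step (illustrative assumption).

    Returns a list of centers of the same length whose cost is never
    higher than the cost of the input centers.
    """
    d2 = [
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    ]
    current = sum(d2)
    if current == 0:
        # Nothing to improve: every point already sits on a center.
        return centers
    # Sample the swap-in candidate proportionally to squared distance.
    candidate = rng.choices(points, weights=d2, k=1)[0]
    best_centers, best_cost = centers, current
    for i in range(len(centers)):
        # Replace the i-th center by the candidate and evaluate the cost.
        trial = centers[:i] + [candidate] + centers[i + 1:]
        trial_cost = clustering_cost(points, trial)
        if trial_cost < best_cost:
            best_centers, best_cost = trial, trial_cost
    return best_centers
```

In this reading, the procedure would be run for a fixed number of steps after seeding, for example about $\varepsilon k$ steps in the setting of the abstract. Because a swap is accepted only when it strictly lowers the cost, the cost never increases from one step to the next.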
