Source author record

Václav Rozhoň

Václav Rozhoň appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms math.CO Distributed, Parallel, and Cluster Computing Discrete Mathematics Machine Learning

Catalog footprint

What is connected

9works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A Nearly Tight Analysis of Greedy k-means++

The famous $k$-means++ algorithm of Arthur and Vassilvitskii [SODA 2007] is the most popular way of solving the $k$-means problem in practice. The algorithm is very simple: it samples the first center uniformly at random and each of the following $k-1$ centers is then always sampled proportional to its squared distance to the closest center so far. Afterward, Lloyd's iterative algorithm is run. The $k$-means++ algorithm is known to return a $Θ(\log k)$ approximate solution in expectation. In their seminal work, Arthur and Vassilvitskii [SODA 2007] asked about the guarantees for its following \emph{greedy} variant: in every step, we sample $\ell$ candidate centers instead of one and then pick the one that minimizes the new cost. This is also how $k$-means++ is implemented in e.g. the popular Scikit-learn library [Pedregosa et al.; JMLR 2011]. We present nearly matching lower and upper bounds for the greedy $k$-means++: We prove that it is an $O(\ell^3 \log^3 k)$-approximation algorithm. On the other hand, we prove a lower bound of $Ω(\ell^3 \log^3 k / \log^2(\ell\log k))$. Previously, only an $Ω(\ell \log k)$ lower bound was known [Bhattacharya, Eube, Röglin, Schmidt; ESA 2020] and there was no known upper bound.

preprint2022arXiv

Deterministic Low-Diameter Decompositions for Weighted Graphs and Distributed and Parallel Applications

This paper presents new deterministic and distributed low-diameter decomposition algorithms for weighted graphs. In particular, we show that if one can efficiently compute approximate distances in a parallel or a distributed setting, one can also efficiently compute low-diameter decompositions. This consequently implies solutions to many fundamental distance based problems using a polylogarithmic number of approximate distance computations. Our low-diameter decomposition generalizes and extends the line of work starting from [Rozhoň, Ghaffari STOC 2020] to weighted graphs in a very model-independent manner. Moreover, our clustering results have additional useful properties, including strong-diameter guarantees, separation properties, restricting cluster centers to specified terminals, and more. Applications include: -- The first near-linear work and polylogarithmic depth randomized and deterministic parallel algorithm for low-stretch spanning trees (LSST) with polylogarithmic stretch. Previously, the best parallel LSST algorithm required $m \cdot n^{o(1)}$ work and $n^{o(1)}$ depth and was inherently randomized. No deterministic LSST algorithm with truly sub-quadratic work and sub-linear depth was known. -- The first near-linear work and polylogarithmic depth deterministic algorithm for computing an $\ell_1$-embedding into polylogarithmic dimensional space with polylogarithmic distortion. The best prior deterministic algorithms for $\ell_1$-embeddings either require large polynomial work or are inherently sequential. Even when we apply our techniques to the classical problem of computing a ball-carving with strong-diameter $O(\log^2 n)$ in an unweighted graph, our new clustering algorithm still leads to an improvement in round complexity from $O(\log^{10} n)$ rounds [Chang, Ghaffari PODC 21] to $O(\log^{4} n)$.

preprint2021arXiv

Cut distance identifying graphon parameters over weak* limits

The theory of graphons comes with the so-called cut norm and the derived cut distance. The cut norm is finer than the weak* topology (when considering the predual of $L^{1}$-functions). Doležal and Hladký [J. Combin. Theory Ser. B 137 (2019), 232-263] showed, that given a sequence of graphons, a cut distance accumulation graphon can be pinpointed in the set of weak* accumulation points as a minimizer of the entropy. Motivated by this, we study graphon parameters with the property that their minimizers or maximizers identify cut distance accumulation points over the set of weak* accumulation points. We call such parameters cut distance identifying. Of particular importance are cut distance identifying parameters coming from homomorphism densities, $t(H,\cdot)$. This concept is closely related to the emerging field of graph norms, and the notions of the step Sidorenko property and the step forcing property introduced by Kráľ, Martins, Pach and Wrochna [J. Combin. Theory Ser. A 162 (2019), 34-54]. We prove that a connected graph is weakly norming if and only if it is step Sidorenko, and that if a graph is norming then it is step forcing. Further, we study convexity properties of cut distance identifying graphon parameters, and find a way to identify cut distance limits using spectra of graphons. We also show that continuous cut distance identifying graphon parameters have the «pumping property», and thus can be used in the proof of the Frieze-Kannan regularity lemma.

preprint2020arXiv

Generalizing the Sharp Threshold Phenomenon for the Distributed Complexity of the Lovász Local Lemma

Recently, Brandt, Maus and Uitto [PODC'19] showed that, in a restricted setting, the dependency of the complexity of the distributed Lovász Local Lemma (LLL) on the chosen LLL criterion exhibits a sharp threshold phenomenon: They proved that, under the LLL criterion $p2^d < 1$, if each random variable affects at most $3$ events, the deterministic complexity of the LLL in the LOCAL model is $O(d^2 + \log^* n)$. In stark contrast, under the criterion $p2^d \leq 1$, there is a randomized lower bound of $Ω(\log \log n)$ by Brandt et al. [STOC'16] and a deterministic lower bound of $Ω(\log n)$ by Chang, Kopelowitz and Pettie [FOCS'16]. Brandt, Maus and Uitto conjectured that the same behavior holds for the unrestricted setting where each random variable affects arbitrarily many events. We prove their conjecture, by providing an algorithm that solves the LLL in time $O(d^2 + \log^* n)$ under the LLL criterion $p2^d < 1$, which is tight in bounded-degree graphs due to an $Ω(\log^* n)$ lower bound by Chung, Pettie and Su [PODC'14]. By the work of Brandt, Maus and Uitto, obtaining such an algorithm can be reduced to proving that all members in a certain family of functions in arbitrarily high dimensions are convex on some specific domain. Unfortunately, an analytical description of these functions is known only for dimension at most $3$, which led to the aforementioned restriction of their result. While obtaining those descriptions for functions of (substantially) higher dimension seems out of the reach of current techniques, we show that their convexity can be inferred by combinatorial means.

preprint2020arXiv

Improved Deterministic Network Decomposition

Network decomposition is a central tool in distributed graph algorithms. We present two improvements on the state of the art for network decomposition, which thus lead to improvements in the (deterministic and randomized) complexity of several well-studied graph problems. - We provide a deterministic distributed network decomposition algorithm with $O(\log^5 n)$ round complexity, using $O(\log n)$-bit messages. This improves on the $O(\log^7 n)$-round algorithm of Rozhoň and Ghaffari [STOC'20], which used large messages, and their $O(\log^8 n)$-round algorithm with $O(\log n)$-bit messages. This directly leads to similar improvements for a wide range of deterministic and randomized distributed algorithms, whose solution relies on network decomposition, including the general distributed derandomization of Ghaffari, Kuhn, and Harris [FOCS'18]. - One drawback of the algorithm of Rozhoň and Ghaffari, in the $\mathsf{CONGEST}$ model, was its dependence on the length of the identifiers. Because of this, for instance, the algorithm could not be used in the shattering framework in the $\mathsf{CONGEST}$ model. Thus, the state of the art randomized complexity of several problems in this model remained with an additive $2^{O(\sqrt{\log\log n})}$ term, which was a clear leftover of the older network decomposition complexity [Panconesi and Srinivasan STOC'92]. We present a modified version that remedies this, constructing a decomposition whose quality does not depend on the identifiers, and thus improves the randomized round complexity for various problems.

preprint2020arXiv

k-means++: few more steps yield constant approximation

The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is a state-of-the-art algorithm for solving the k-means clustering problem and is known to give an O(log k)-approximation in expectation. Recently, Lattanzi and Sohler (ICML 2019) proposed augmenting k-means++ with O(k log log k) local search steps to yield a constant approximation (in expectation) to the k-means clustering problem. In this paper, we improve their analysis to show that, for any arbitrarily small constant $\eps > 0$, with only $\eps k$ additional local search steps, one can achieve a constant approximation guarantee (with high probability in k), resolving an open problem in their paper.

preprint2020arXiv

Polylogarithmic-Time Deterministic Network Decomposition and Distributed Derandomization

We present a simple polylogarithmic-time deterministic distributed algorithm for network decomposition. This improves on a celebrated $2^{O(\sqrt{\log n})}$-time algorithm of Panconesi and Srinivasan [STOC'92] and settles a central and long-standing question in distributed graph algorithms. It also leads to the first polylogarithmic-time deterministic distributed algorithms for numerous other problems, hence resolving several well-known and decades-old open problems, including Linial's question about the deterministic complexity of maximal independent set [FOCS'87; SICOMP'92]---which had been called the most outstanding problem in the area. The main implication is a more general distributed derandomization theorem: Put together with the results of Ghaffari, Kuhn, and Maus [STOC'17] and Ghaffari, Harris, and Kuhn [FOCS'18], our network decomposition implies that $$\mathsf{P}\textit{-}\mathsf{RLOCAL} = \mathsf{P}\textit{-}\mathsf{LOCAL}.$$ That is, for any problem whose solution can be checked deterministically in polylogarithmic-time, any polylogarithmic-time randomized algorithm can be derandomized to a polylogarithmic-time deterministic algorithm. Informally, for the standard first-order interpretation of efficiency as polylogarithmic-time, distributed algorithms do not need randomness for efficiency. By known connections, our result leads also to substantially faster randomized distributed algorithms for a number of well-studied problems including $(Δ+1)$-coloring, maximal independent set, and Lovász Local Lemma, as well as massively parallel algorithms for $(Δ+1)$-coloring.

preprint2020arXiv

Relating the cut distance and the weak* topology for graphons

The theory of graphons is ultimately connected with the so-called cut norm. In this paper, we approach the cut norm topology via the weak* topology (when considering a predual of $L^{1}$-functions). We prove that a sequence $W_1,W_2,W_3,\ldots$ of graphons converges in the cut distance if and only if we have equality of the sets of weak* accumulation points and of weak* limit points of all sequences of graphons $W_1',W_2',W_3',\ldots$ that are weakly isomorphic to $W_1,W_2,W_3,\ldots$. We further give a short descriptive set theoretic argument that each sequence of graphons contains a subsequence with the property above. This in particular provides an alternative proof of the theorem of Lovász and Szegedy about compactness of the space of graphons. We connect these results to "multiway cut" characterization of cut distance convergence from [Ann. of Math. (2) 176 (2012), no. 1, 151-219]. These results are more naturally phrased in the Vietoris hyperspace $K$ over graphons with the weak* topology. We show that graphons with the cut distance topology are homeomorphic to a closed subset of $K$, and deduce several consequences of this fact. From these concepts a new order on the space of graphons emerges. This order allows to compare how structured two graphons are. We establish basic properties of this "structurdness order".

preprint2020arXiv

Simple and sharp analysis of k-means||

We present a simple analysis of k-means|| (Bahmani et al., PVLDB 2012) -- a distributed variant of the k-means++ algorithm (Arthur and Vassilvitskii, SODA 2007). Moreover, the bound on the number of rounds is improved from $O(\log n)$ to $O(\log n / \log\log n)$, which we show to be tight.

Václav Rozhoň

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

A Nearly Tight Analysis of Greedy k-means++

Deterministic Low-Diameter Decompositions for Weighted Graphs and Distributed and Parallel Applications

Cut distance identifying graphon parameters over weak* limits

Generalizing the Sharp Threshold Phenomenon for the Distributed Complexity of the Lovász Local Lemma

Improved Deterministic Network Decomposition

k-means++: few more steps yield constant approximation

Polylogarithmic-Time Deterministic Network Decomposition and Distributed Derandomization

Relating the cut distance and the weak* topology for graphons

Simple and sharp analysis of k-means||