Source author record

Jakub Łącki

Jakub Łącki appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Distributed, Parallel, and Cluster Computing Discrete Mathematics Logic in Computer Science Machine Learning math.CO Social and Information Networks

Catalog footprint

What is connected

13works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Constant Approximation for Normalized Modularity and Associations Clustering

We study the problem of graph clustering under a broad class of objectives in which the quality of a cluster is defined based on the ratio between the number of edges in the cluster, and the total weight of vertices in the cluster. We show that our definition is closely related to popular clustering measures, namely normalized associations, which is a dual of the normalized cut objective, and normalized modularity. We give a linear time constant-approximate algorithm for our objective, which implies the first constant-factor approximation algorithms for normalized modularity and normalized associations.

preprint2022arXiv

Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth

Obtaining scalable algorithms for hierarchical agglomerative clustering (HAC) is of significant interest due to the massive size of real-world datasets. At the same time, efficiently parallelizing HAC is difficult due to the seemingly sequential nature of the algorithm. In this paper, we address this issue and present ParHAC, the first efficient parallel HAC algorithm with sublinear depth for the widely-used average-linkage function. In particular, we provide a $(1+ε)$-approximation algorithm for this problem on $m$ edge graphs using $\tilde{O}(m)$ work and poly-logarithmic depth. Moreover, we show that obtaining similar bounds for exact average-linkage HAC is not possible under standard complexity-theoretic assumptions. We complement our theoretical results with a comprehensive study of the ParHAC algorithm in terms of its scalability, performance, and quality, and compare with several state-of-the-art sequential and parallel baselines. On a broad set of large publicly-available real-world datasets, we find that ParHAC obtains a 50.1x speedup on average over the best sequential baseline, while achieving quality similar to the exact HAC algorithm. We also show that ParHAC can cluster one of the largest publicly available graph datasets with 124 billion edges in a little over three hours using a commodity multicore machine.

preprint2022arXiv

Hierarchical Clustering in Graph Streams: Single-Pass Algorithms and Space Lower Bounds

The Hierarchical Clustering (HC) problem consists of building a hierarchy of clusters to represent a given dataset. Motivated by the modern large-scale applications, we study the problem in the \streaming model, in which the memory is heavily limited and only a single or very few passes over the input are allowed. Specifically, we investigate whether a good hierarchical clustering can be obtained, or at least whether we can approximately estimate the value of the optimal hierarchy. To measure the quality of a hierarchy, we use the HC minimization objective introduced by Dasgupta. Assuming that the input is an $n$-vertex weighted graph whose edges arrive in a stream, we derive the following results on space-vs-accuracy tradeoffs: * With $O(n\cdot \text{polylog}\,{n})$ space, we develop a single-pass algorithm, whose approximation ratio matches the currently best offline algorithm. * When the space is more limited, namely, $n^{1-o(1)}$, we prove that no algorithm can even estimate the value of optimum HC tree to within an $o(\frac{\log{n}}{\log\log{n}})$ factor, even when allowed $\text{polylog}{\,{n}}$ passes over the input. * In the most stringent setting of $\text{polylog}\,{n}$ space, we rule out algorithms that can even distinguish between "highly"-vs-"poorly" clusterable graphs, namely, graphs that have an $n^{1/2-o(1)}$ factor gap between their HC objective value. * Finally, we prove that any single-pass streaming algorithm that computes an optimal HC tree requires to store almost the entire input even if allowed exponential time. Our algorithmic results establish a general structural result that proves that cut sparsifiers of input graph can preserve cost of "balanced" HC trees to within a constant factor. Our lower bound results include a new streaming lower bound for a novel problem "One-vs-Many-Expanders", which can be of independent interest.

preprint2022arXiv

Near-Optimal Decremental Hopsets with Applications

Given a weighted undirected graph $G=(V,E,w)$, a hopset $H$ of hopbound $β$ and stretch $(1+ε)$ is a set of edges such that for any pair of nodes $u, v \in V$, there is a path in $G \cup H$ of at most $β$ hops, whose length is within a $(1+ε)$ factor from the distance between $u$ and $v$ in $G$. We show the first efficient decremental algorithm for maintaining hopsets with a polylogarithmic hopbound. The update time of our algorithm matches the best known static algorithm up to polylogarithmic factors. All the previous decremental hopset constructions had a superpolylogarithmic (but subpolynomial) hopbound of $2^{\log^{Ω(1)} n}$ [Bernstein, FOCS'09; HKN, FOCS'14; Chechik, FOCS'18]. By applying our decremental hopset construction, we get improved or near optimal bounds for several distance problems. Most importantly, we show how to decrementally maintain $(2k-1)(1+ε)$-approximate all-pairs shortest paths (for any constant $k \geq 2)$, in $\tilde{O}(n^{1/k})$ amortized update time and $O(k)$ query time. This improves (by a polynomial factor) over the update-time of the best previously known decremental algorithm in the constant query time regime. Moreover, it improves over the result of [Chechik, FOCS'18] that has a query time of $O(\log \log(nW))$, where $W$ is the aspect ratio, and the amortized update time is $n^{1/k}\cdot(\frac{1}ε)^{\tilde{O}(\sqrt{\log n})}$. For sparse graphs our construction nearly matches the best known static running time / query time tradeoff. We also obtain near-optimal bounds for maintaining approximate multi-source shortest paths and distance sketches, and get improved bounds for approximate single-source shortest paths. Our algorithms are randomized and our bounds hold with high probability against an oblivious adversary.

preprint2020arXiv

Near-Optimal Massively Parallel Graph Connectivity

Identifying the connected components of a graph, apart from being a fundamental problem with countless applications, is a key primitive for many other algorithms. In this paper, we consider this problem in parallel settings. Particularly, we focus on the Massively Parallel Computations (MPC) model, which is the standard theoretical model for modern parallel frameworks such as MapReduce, Hadoop, or Spark. We consider the truly sublinear regime of MPC for graph problems where the space per machine is $n^δ$ for some desirably small constant $δ\in (0, 1)$. We present an algorithm that for graphs with diameter $D$ in the wide range $[\log^ε n, n]$, takes $O(\log D)$ rounds to identify the connected components and takes $O(\log \log n)$ rounds for all other graphs. The algorithm is randomized, succeeds with high probability, does not require prior knowledge of $D$, and uses an optimal total space of $O(m)$. We complement this by showing a conditional lower-bound based on the widely believed TwoCycle conjecture that $Ω(\log D)$ rounds are indeed necessary in this setting. Studying parallel connectivity algorithms received a resurgence of interest after the pioneering work of Andoni et al. [FOCS 2018] who presented an algorithm with $O(\log D \cdot \log \log n)$ round-complexity. Our algorithm improves this result for the whole range of values of $D$ and almost settles the problem due to the conditional lower-bound. Additionally, we show that with minimal adjustments, our algorithm can also be implemented in a variant of the (CRCW) PRAM in asymptotically the same number of rounds.

preprint2016arXiv

Optimal Dynamic Strings

In this paper we study the fundamental problem of maintaining a dynamic collection of strings under the following operations: concat - concatenates two strings, split - splits a string into two at a given position, compare - finds the lexicographical order (less, equal, greater) between two strings, LCP - calculates the longest common prefix of two strings. We present an efficient data structure for this problem, where an update requires only $O(\log n)$ worst-case time with high probability, with $n$ being the total length of all strings in the collection, and a query takes constant worst-case time. On the lower bound side, we prove that even if the only possible query is checking equality of two strings, either updates or queries take amortized $Ω(\log n)$ time; hence our implementation is optimal. Such operations can be used as a basic building block to solve other string problems. We provide two examples. First, we can augment our data structure to provide pattern matching queries that may locate occurrences of a specified pattern $p$ in the strings in our collection in optimal $O(|p|)$ time, at the expense of increasing update time to $O(\log^2 n)$. Second, we show how to maintain a history of an edited text, processing updates in $O(\log t \log \log t)$ time, where $t$ is the number of edits, and how to support pattern matching queries against the whole history in $O(|p| \log t \log \log t)$ time. Finally, we note that our data structure can be applied to test dynamic tree isomorphism and to compare strings generated by dynamic straight-line grammars.

preprint2016arXiv

The Power of Dynamic Distance Oracles: Efficient Dynamic Algorithms for the Steiner Tree

In this paper we study the Steiner tree problem over a dynamic set of terminals. We consider the model where we are given an $n$-vertex graph $G=(V,E,w)$ with positive real edge weights, and our goal is to maintain a tree which is a good approximation of the minimum Steiner tree spanning a terminal set $S \subseteq V$, which changes over time. The changes applied to the terminal set are either terminal additions (incremental scenario), terminal removals (decremental scenario), or both (fully dynamic scenario). Our task here is twofold. We want to support updates in sublinear $o(n)$ time, and keep the approximation factor of the algorithm as small as possible. We show that we can maintain a $(6+\varepsilon)$-approximate Steiner tree of a general graph in $\tilde{O}(\sqrt{n} \log D)$ time per terminal addition or removal. Here, $D$ denotes the stretch of the metric induced by $G$. For planar graphs we achieve the same running time and the approximation ratio of $(2+\varepsilon)$. Moreover, we show faster algorithms for incremental and decremental scenarios. Finally, we show that if we allow higher approximation ratio, even more efficient algorithms are possible. In particular we show a polylogarithmic time $(4+\varepsilon)$-approximate algorithm for planar graphs. One of the main building blocks of our algorithms are dynamic distance oracles for vertex-labeled graphs, which are of independent interest. We also improve and use the online algorithms for the Steiner tree problem.

preprint2015arXiv

Algorithmic Complexity of Power Law Networks

It was experimentally observed that the majority of real-world networks follow power law degree distribution. The aim of this paper is to study the algorithmic complexity of such "typical" networks. The contribution of this work is twofold. First, we define a deterministic condition for checking whether a graph has a power law degree distribution and experimentally validate it on real-world networks. This definition allows us to derive interesting properties of power law networks. We observe that for exponents of the degree distribution in the range $[1,2]$ such networks exhibit double power law phenomenon that was observed for several real-world networks. Our observation indicates that this phenomenon could be explained by just pure graph theoretical properties. The second aim of our work is to give a novel theoretical explanation why many algorithms run faster on real-world data than what is predicted by algorithmic worst-case analysis. We show how to exploit the power law degree distribution to design faster algorithms for a number of classical P-time problems including transitive closure, maximum matching, determinant, PageRank and matrix inverse. Moreover, we deal with the problems of counting triangles and finding maximum clique. Previously, it has been only shown that these problems can be solved very efficiently on power law graphs when these graphs are random, e.g., drawn at random from some distribution. However, it is unclear how to relate such a theoretical analysis to real-world graphs, which are fixed. Instead of that, we show that the randomness assumption can be replaced with a simple condition on the degrees of adjacent vertices, which can be used to obtain similar results. As a result, in some range of power law exponents, we are able to solve the maximum clique problem in polynomial time, although in general power law networks the problem is NP-complete.

preprint2015arXiv

Fast and simple connectivity in graph timelines

In this paper we study the problem of answering connectivity queries about a \emph{graph timeline}. A graph timeline is a sequence of undirected graphs $G_1,\ldots,G_t$ on a common set of vertices of size $n$ such that each graph is obtained from the previous one by an addition or a deletion of a single edge. We present data structures, which preprocess the timeline and can answer the following queries: - forall$(u,v,a,b)$ -- does the path $u\to v$ exist in each of $G_a,\ldots,G_b$? - exists$(u,v,a,b)$ -- does the path $u\to v$ exist in any of $G_a,\ldots,G_b$? - forall2$(u,v,a,b)$ -- do there exist two edge-disjoint paths connecting $u$ and $v$ in each of $G_a,\ldots,G_b$ We show data structures that can answer forall and forall2 queries in $O(\log n)$ time after preprocessing in $O(m+t\log n)$ time. Here by $m$ we denote the number of edges that remain unchanged in each graph of the timeline. For the case of exists queries, we show how to extend an existing data structure to obtain a preprocessing/query trade-off of $\langle O(m+\min(nt, t^{2-α})), O(t^α)\rangle$ and show a matching conditional lower bound.

preprint2014arXiv

Optimal decremental connectivity in planar graphs

We show an algorithm for dynamic maintenance of connectivity information in an undirected planar graph subject to edge deletions. Our algorithm may answer connectivity queries of the form `Are vertices $u$ and $v$ connected with a path?' in constant time. The queries can be intermixed with any sequence of edge deletions, and the algorithm handles all updates in $O(n)$ time. This results improves over previously known $O(n \log n)$ time algorithm.

preprint2013arXiv

Faster Algorithms for Markov Decision Processes with Low Treewidth

We consider two core algorithmic problems for probabilistic verification: the maximal end-component decomposition and the almost-sure reachability set computation for Markov decision processes (MDPs). For MDPs with treewidth $k$, we present two improved static algorithms for both the problems that run in time $O(n \cdot k^{2.38} \cdot 2^k)$ and $O(m \cdot \log n \cdot k)$, respectively, where $n$ is the number of states and $m$ is the number of edges, significantly improving the previous known $O(n\cdot k \cdot \sqrt{n\cdot k})$ bound for low treewidth. We also present decremental algorithms for both problems for MDPs with constant treewidth that run in amortized logarithmic time, which is a huge improvement over the previously known algorithms that require amortized linear time.

preprint2012arXiv

Single Source - All Sinks Max Flows in Planar Digraphs

Let G = (V,E) be a planar n-vertex digraph. Consider the problem of computing max st-flow values in G from a fixed source s to all sinks t in V\{s}. We show how to solve this problem in near-linear O(n log^3 n) time. Previously, no better solution was known than running a single-source single-sink max flow algorithm n-1 times, giving a total time bound of O(n^2 log n) with the algorithm of Borradaile and Klein. An important implication is that all-pairs max st-flow values in G can be computed in near-quadratic time. This is close to optimal as the output size is Theta(n^2). We give a quadratic lower bound on the number of distinct max flow values and an Omega(n^3) lower bound for the total size of all min cut-sets. This distinguishes the problem from the undirected case where the number of distinct max flow values is O(n). Previous to our result, no algorithm which could solve the all-pairs max flow values problem faster than the time of Theta(n^2) max-flow computations for every planar digraph was known. This result is accompanied with a data structure that reports min cut-sets. For fixed s and all t, after O(n^{3/2} log^{3/2} n) preprocessing time, it can report the set of arcs C crossing a min st-cut in time roughly proportional to the size of C.

preprint2011arXiv

Min-cuts and Shortest Cycles in Planar Graphs in O(n log log n) Time

We present a deterministic O(n log log n) time algorithm for finding shortest cycles and minimum cuts in planar graphs. The algorithm improves the previously known fastest algorithm by Italiano et al. in STOC'11 by a factor of log n. This speedup is obtained through the use of dense distance graphs combined with a divide-and-conquer approach.

Jakub Łącki

What is connected

Connect this record

See the researcher in context

Building this map preview

13 published item(s)

Constant Approximation for Normalized Modularity and Associations Clustering

Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth

Hierarchical Clustering in Graph Streams: Single-Pass Algorithms and Space Lower Bounds

Near-Optimal Decremental Hopsets with Applications

Near-Optimal Massively Parallel Graph Connectivity

Optimal Dynamic Strings

The Power of Dynamic Distance Oracles: Efficient Dynamic Algorithms for the Steiner Tree

Algorithmic Complexity of Power Law Networks

Fast and simple connectivity in graph timelines

Optimal decremental connectivity in planar graphs

Faster Algorithms for Markov Decision Processes with Low Treewidth

Single Source - All Sinks Max Flows in Planar Digraphs

Min-cuts and Shortest Cycles in Planar Graphs in O(n log log n) Time