Researcher profile

Paweł Gawrychowski

Paweł Gawrychowski contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
16 works
0 followers
6 topics
4 close collaborators

Published work

16 published items

preprint · 2022 · arXiv

Cut query algorithms with star contraction

We study the complexity of determining the edge connectivity of a simple graph with cut queries. We show that (i) there is a bounded-error randomized algorithm that computes edge connectivity with $O(n)$ cut queries, and (ii) there is a bounded-error quantum algorithm that computes edge connectivity with $\tilde{O}(\sqrt{n})$ cut queries. We prove these results using a new technique called "star contraction" to randomly contract edges of a graph while preserving non-trivial minimum cuts. In star contraction, vertices randomly contract an edge incident on a small set of randomly chosen vertices. In contrast to the related 2-out contraction technique of Ghaffari, Nowicki, and Thorup [SODA'20], star contraction only contracts vertex-disjoint star subgraphs, which allows it to be efficiently implemented via cut queries. The $O(n)$ bound from item (i) was not known even for the simpler problem of connectivity, and improves the $O(n\log^3 n)$ bound by Rubinstein, Schramm, and Weinberg [ITCS'18]. The bound is tight under the reasonable conjecture that the randomized communication complexity of connectivity is $Ω(n\log n)$, an open question since the seminal work of Babai, Frankl, and Simon [FOCS'86]. The bound also excludes using edge connectivity on simple graphs to prove a superlinear randomized query lower bound for minimizing a symmetric submodular function. Item (ii) gives a nearly-quadratic separation with the randomized complexity and addresses an open question of Lee, Santha, and Zhang [SODA'21]. The algorithm can also be viewed as making $\tilde{O}(\sqrt{n})$ matrix-vector multiplication queries to the adjacency matrix. Finally, we demonstrate the use of star contraction outside of the cut query setting by designing a one-pass semi-streaming algorithm for computing edge connectivity in the vertex arrival setting. This contrasts with the edge arrival setting where two passes are required.
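
To make the cut query model above concrete, here is a minimal Python sketch of a cut query oracle for a simple graph. It is not the star contraction algorithm from the paper; the class `CutOracle` and the helper `adjacent` are hypothetical names chosen for illustration. The helper relies on the standard identity that cut({u}) + cut({v}) - cut({u, v}) equals twice the number of edges between u and v.

```python
class CutOracle:
    """Hypothetical oracle: hides the edge list and answers cut queries only."""

    def __init__(self, edges):
        self._edges = [tuple(e) for e in edges]
        self.queries = 0  # number of cut queries asked so far

    def cut(self, subset):
        """Return the number of edges with exactly one endpoint in subset."""
        s = set(subset)
        self.queries += 1
        return sum((u in s) != (v in s) for u, v in self._edges)


def adjacent(oracle, u, v):
    """Decide whether u and v are adjacent in a simple graph using 3 cut queries."""
    # cut({u}) + cut({v}) counts each u-v edge twice; cut({u, v}) counts it zero times.
    return oracle.cut([u]) + oracle.cut([v]) - oracle.cut([u, v]) == 2
```

For example, on the graph with edges (0,1), (1,2), (2,0), (2,3), `adjacent(oracle, 0, 1)` holds while `adjacent(oracle, 0, 3)` does not, and each call costs three queries.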

preprint · 2022 · arXiv

Matching Patterns with Variables Under Edit Distance

A pattern $α$ is a string of variables and terminal letters. We say that $α$ matches a word $w$, consisting only of terminal letters, if $w$ can be obtained by replacing the variables of $α$ by terminal words. The matching problem, i.e., deciding whether a given pattern matches a given word, was heavily investigated: it is NP-complete in general, but can be solved efficiently for classes of patterns with restricted structure. If we are interested in what is the minimum Hamming distance between $w$ and any word $u$ obtained by replacing the variables of $α$ by terminal words (so matching under Hamming distance), one can devise efficient algorithms and matching conditional lower bounds for the class of regular patterns (in which no variable occurs twice), as well as for classes of patterns where we allow unbounded repetitions of variables, but restrict the structure of the pattern, i.e., the way the occurrences of different variables can be interleaved. Moreover, under Hamming distance, if a variable occurs more than once and its occurrences can be interleaved arbitrarily with those of other variables, even if each of these occurs just once, the matching problem is intractable. In this paper, we consider the problem of matching patterns with variables under edit distance. We still obtain efficient algorithms and matching conditional lower bounds for the class of regular patterns, but show that the problem becomes, in this case, intractable already for unary patterns, consisting of repeated occurrences of a single variable interleaved with terminals.

preprint · 2022 · arXiv

The Dynamic k-Mismatch Problem

The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length $m$ and all length-$m$ substrings of a given text of length $n\ge m$. We focus on the $k$-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold $k$. We assume $n\le 2m$ (in general, one can partition the text into overlapping blocks). In this work, we show data structures for the dynamic version of this problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, and a query, given an index $i$, returns the Hamming distance between the pattern and the text substring starting at position $i$, or reports that it exceeds $k$. First, we show a data structure with $\tilde{O}(1)$ update and $\tilde{O}(k)$ query time. Then we show that $\tilde{O}(k)$ update and $\tilde{O}(1)$ query time is also possible. These two provide an optimal trade-off for the dynamic $k$-mismatch problem with $k \le \sqrt{n}$: we prove that, conditioned on the strong 3SUM conjecture, one cannot simultaneously achieve $k^{1-Ω(1)}$ time for all operations. For $k\ge \sqrt{n}$, we give another lower bound, conditioned on the Online Matrix-Vector conjecture, that excludes algorithms taking $n^{1/2-Ω(1)}$ time per operation. This is tight for constant-sized alphabets: Clifford et al. (STACS 2018) achieved $\tilde{O}(\sqrt{n})$ time per operation in that case, but with $\tilde{O}(n^{3/4})$ time per operation for large alphabets. We improve and extend this result with an algorithm that, given $1\le x\le k$, achieves update time $\tilde{O}(\frac{n}{k} +\sqrt{\frac{nk}{x}})$ and query time $\tilde{O}(x)$. In particular, for $k\ge \sqrt{n}$, an appropriate choice of $x$ yields $\tilde{O}(\sqrt[3]{nk})$ time per operation, which is $\tilde{O}(n^{2/3})$ when no threshold $k$ is provided.
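
As a point of reference for the problem statement above, the following Python sketch computes the static $k$-mismatch answers by brute force. It is only a naive baseline taking $O(nm)$ time in the worst case, not any of the dynamic data structures from the paper; the function name `k_mismatch` and the use of `None` for "exceeds $k$" are assumptions made for this illustration.

```python
def k_mismatch(text, pattern, k):
    """For every alignment i, return the Hamming distance between pattern and
    text[i:i+m] if it is at most k, and None otherwise."""
    m = len(pattern)
    result = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # threshold exceeded, stop comparing this alignment
        result.append(mismatches if mismatches <= k else None)
    return result
```

A dynamic variant would need to support single-letter substitutions without recomputing every alignment, which is what the bounds in the abstract address.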

preprint · 2021 · arXiv

An Almost Optimal Edit Distance Oracle

We consider the problem of preprocessing two strings $S$ and $T$, of lengths $m$ and $n$, respectively, in order to be able to efficiently answer the following queries: Given positions $i,j$ in $S$ and positions $a,b$ in $T$, return the optimal alignment of $S[i \mathinner{.\,.} j]$ and $T[a \mathinner{.\,.} b]$. Let $N=mn$. We present an oracle with preprocessing time $N^{1+o(1)}$ and space $N^{1+o(1)}$ that answers queries in $\log^{2+o(1)}N$ time. In other words, we show that we can query the alignment of every two substrings in almost the same time it takes to compute just the alignment of $S$ and $T$. Our oracle uses ideas from our distance oracle for planar graphs [STOC 2019] and exploits the special structure of the alignment graph. Conditioned on popular hardness conjectures, this result is optimal up to subpolynomial factors. Our results apply to both edit distance and longest common subsequence (LCS). The best previously known oracle with construction time and size $\mathcal{O}(N)$ has slow $Ω(\sqrt{N})$ query time [Sakai, TCS 2019], and the one with size $N^{1+o(1)}$ and query time $\log^{2+o(1)}N$ (using a planar graph distance oracle) has slow $Ω(N^{3/2})$ construction time [Long & Pettie, SODA 2021]. We improve both approaches by roughly a $\sqrt N$ factor.

preprint · 2021 · arXiv

Conditional Lower Bounds for Variants of Dynamic LIS

In this note, we consider the complexity of maintaining the longest increasing subsequence (LIS) of an array under (i) inserting an element, and (ii) deleting an element of an array. We show that no algorithm can support queries and updates in time $\mathcal{O}(n^{1/2-ε})$ and $\mathcal{O}(n^{1/3-ε})$ for the dynamic LIS problem, for any constant $ε>0$, when the elements are weighted or the algorithm supports 1D-queries (on subarrays), respectively, assuming the All-Pairs Shortest Paths (APSP) conjecture or the Online Boolean Matrix-Vector Multiplication (OMv) conjecture. The main idea in our construction comes from the work of Abboud and Dahlgaard [FOCS 2016], who proved conditional lower bounds for dynamic planar graph algorithms. However, this needs to be appropriately adjusted and translated to obtain an instance of the dynamic LIS problem.

preprint · 2021 · arXiv

Fault-Tolerant Distance Labeling for Planar Graphs

In fault-tolerant distance labeling we wish to assign short labels to the vertices of a graph $G$ such that from the labels of any three vertices $u,v,f$ we can infer the $u$-to-$v$ distance in the graph $G\setminus \{f\}$. We show that any directed weighted planar graph (and in fact any graph in a graph family with $O(\sqrt{n})$-size separators, such as minor-free graphs) admits fault-tolerant distance labels of size $O(n^{2/3})$. We extend these labels in a way that allows us to also count the number of shortest paths, and provide additional upper and lower bounds for labels and oracles for counting shortest paths.

preprint · 2021 · arXiv

Strictly In-Place Algorithms for Permuting and Inverting Permutations

We revisit the problem of permuting an array of length $n$ according to a given permutation in place, that is, using only a small number of bits of extra storage. Fich, Munro and Poblete [FOCS 1990, SICOMP 1995] obtained an elegant $\mathcal{O}(n\log n)$-time algorithm using only $\mathcal{O}(\log^{2}n)$ bits of extra space for this basic problem by designing a procedure that scans the permutation and outputs exactly one element from each of its cycles. However, in the strict sense in place should be understood as using only an asymptotically optimal $\mathcal{O}(\log n)$ bits of extra space, or storing a constant number of indices. The problem of permuting in this version is, in fact, a well-known interview question, with the expected solution being a quadratic-time algorithm. Surprisingly, no faster algorithm seems to be known in the literature. Our first contribution is a strictly in-place generalisation of the method of Fich et al. that works in $\mathcal{O}_{\varepsilon}(n^{1+\varepsilon})$ time, for any $\varepsilon > 0$. Then, we build on this generalisation to obtain a strictly in-place algorithm for inverting a given permutation on $n$ elements working in the same complexity. This is a significant improvement on a recent result of Guśpiel [arXiv 2019], who designed an $\mathcal{O}(n^{1.5})$-time algorithm.
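
The quadratic-time baseline mentioned above can be sketched as follows. This is a minimal Python illustration of the folklore cycle-leader approach, not the faster algorithm from the paper. It assumes the convention that the element at index `i` must end up at index `pi[i]`; the function name `permute_in_place` is hypothetical. Besides the inputs, it keeps only two indices and one carried element, and it never modifies `pi`.

```python
def permute_in_place(a, pi):
    """Rearrange a so that the old a[i] ends up at index pi[i].
    Quadratic time in the worst case, a constant number of extra variables."""
    n = len(a)
    for i in range(n):
        # i is the leader of its cycle iff it is the smallest index on the cycle.
        j = pi[i]
        while j > i:
            j = pi[j]
        if j < i:
            continue  # a smaller index lies on this cycle, so it was already rotated
        # Rotate the cycle of i by carrying one element forward at a time.
        carry = a[i]
        j = pi[i]
        while j != i:
            a[j], carry = carry, a[j]
            j = pi[j]
        a[i] = carry
```

The leader test is what makes this quadratic: every index may walk a large part of its cycle before finding a smaller index.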

preprint · 2020 · arXiv

A Faster Subquadratic Algorithm for the Longest Common Increasing Subsequence Problem

The Longest Common Increasing Subsequence (LCIS) is a variant of the classical Longest Common Subsequence (LCS), in which we additionally require the common subsequence to be strictly increasing. While the well-known "Four Russians" technique can be used to find LCS in subquadratic time, it does not seem applicable to LCIS. Recently, Duraj [STACS 2020] used a completely different method based on the combinatorial properties of LCIS to design an $\mathcal{O}(n^2(\log\log n)^2/\log^{1/6}n)$ time algorithm. We show that an approach based on exploiting tabulation can be used to construct an asymptotically faster $\mathcal{O}(n^2 \log\log n/\sqrt{\log n})$ time algorithm. As our solution avoids using the specific combinatorial properties of LCIS, it can be also adapted for the Longest Common Weakly Increasing Subsequence (LCWIS).
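
For context, the classical quadratic-time dynamic program for LCIS, which the subquadratic algorithms above improve on, can be sketched in Python as below. This is the textbook baseline and not the tabulation-based algorithm from the paper; the function name `lcis_length` is chosen for illustration.

```python
def lcis_length(a, b):
    """Length of the longest common strictly increasing subsequence of a and b,
    computed in O(len(a) * len(b)) time."""
    # dp[j] = length of the best common increasing subsequence of the prefix of a
    # processed so far and b[:j+1] that ends exactly with b[j].
    dp = [0] * len(b)
    for x in a:
        best = 0  # max dp[j'] over earlier positions j' with b[j'] < x
        for j, y in enumerate(b):
            if y == x:
                dp[j] = max(dp[j], best + 1)
            elif y < x:
                best = max(best, dp[j])
    return max(dp, default=0)
```

Changing the comparison `y < x` to `y <= x` would give a baseline for the weakly increasing variant (LCWIS) under the same assumptions.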

preprint · 2020 · arXiv

A Note on a Recent Algorithm for Minimum Cut

Given an undirected edge-weighted graph $G=(V,E)$ with $m$ edges and $n$ vertices, the minimum cut problem asks to find a subset of vertices $S$ such that the total weight of all edges between $S$ and $V \setminus S$ is minimized. Karger's longstanding $O(m \log^3 n)$ time randomized algorithm for this problem was very recently improved in two independent works to $O(m \log^2 n)$ [ICALP'20] and to $O(m \log^2 n + n\log^5 n)$ [STOC'20]. These two algorithms use different approaches and techniques. In particular, while the former is faster, the latter has the advantage that it can be used to obtain efficient algorithms in the cut-query and in the streaming models of computation. In this paper, we show how to simplify and improve the algorithm of [STOC'20] to $O(m \log^2 n + n\log^3 n)$. We obtain this by replacing a randomized algorithm that, given a spanning tree $T$ of $G$, finds in $O(m \log n+n\log^4 n)$ time a minimum cut of $G$ that 2-respects (cuts two edges of) $T$ with a simple $O(m \log n+n\log^2 n)$ time deterministic algorithm for the same problem.

preprint · 2020 · arXiv

Efficient Labeling for Reachability in Digraphs

We consider labeling nodes of a directed graph for reachability queries. A reachability labeling scheme for such a graph assigns a binary string, called a label, to each node. Then, given the labels of nodes $u$ and $v$ and no other information about the underlying graph, it should be possible to determine whether there exists a directed path from $u$ to $v$. By a simple information theoretical argument and invoking the bound on the number of partial orders, in any scheme some labels need to consist of at least $n/4$ bits, where $n$ is the number of nodes. On the other hand, it is not hard to design a scheme with labels consisting of $n/2+O(\log n)$ bits. In the classical centralised setting, Munro and Nicholson designed a data structure for reachability queries consisting of $n^2/4+o(n^2)$ bits (which is optimal, up to the lower order term). We extend their approach to obtain a scheme with labels consisting of $n/3+o(n)$ bits.

preprint · 2020 · arXiv

Existential length universality

We study the following natural variation on the classical universality problem: given a language $L(M)$ represented by $M$ (e.g., a DFA/RE/NFA/PDA), does there exist an integer $\ell \geq 0$ such that $Σ^\ell \subseteq L(M)$? In the case of an NFA, we show that this problem is NEXPTIME-complete, and the smallest such $\ell$ can be doubly exponential in the number of states. This particular case was formulated as an open problem in 2009, and our solution uses a novel and involved construction. In the case of a PDA, we show that it is recursively unsolvable, while the smallest such $\ell$ is not bounded by any computable function of the number of states. In the case of a DFA, we show that the problem is NP-complete, and $e^{\sqrt{n \log n} (1+o(1))}$ is an asymptotically tight upper bound for the smallest such $\ell$, where $n$ is the number of states. Finally, we prove that in all these cases, the problem becomes computationally easier when the length $\ell$ is also given in binary in the input: it is polynomially solvable for a DFA, PSPACE-complete for an NFA, and co-NEXPTIME-complete for a PDA.
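
For the DFA case, the question can be illustrated with a brute-force search. The Python sketch below assumes a complete DFA given as a dictionary `delta` mapping `(state, letter)` to a state; the function name and input encoding are hypothetical. It uses the observation that $Σ^\ell \subseteq L(M)$ holds exactly when every state reachable by some word of length $\ell$ is accepting. Since the sequence of reachable-state sets is deterministic, the search can stop once a set repeats. This is not the paper's construction, and it may visit exponentially many sets, which is in line with the NP-completeness stated above.

```python
def smallest_universal_length(delta, start, accepting, alphabet):
    """Return the smallest l with Sigma^l contained in L(M), or None if no l exists.
    Assumes delta is total: delta[(q, c)] is defined for all states q and letters c."""
    accepting = frozenset(accepting)
    current = frozenset([start])  # states reachable by words of length exactly `length`
    seen = set()
    length = 0
    while current not in seen:
        if current <= accepting:
            return length
        seen.add(current)
        current = frozenset(delta[(q, c)] for q in current for c in alphabet)
        length += 1
    return None  # the sequence of sets cycles without ever being all-accepting
```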

preprint · 2020 · arXiv

Generalised Pattern Matching Revisited

In the problem of $\texttt{Generalised Pattern Matching}\ (\texttt{GPM})$ [STOC'94, Muthukrishnan and Palem], we are given a text $T$ of length $n$ over an alphabet $Σ_T$, a pattern $P$ of length $m$ over an alphabet $Σ_P$, and a matching relationship $\subseteq Σ_T \times Σ_P$, and must return all substrings of $T$ that match $P$ (reporting) or the number of mismatches between each substring of $T$ of length $m$ and $P$ (counting). In this work, we improve over all previously known algorithms for this problem for various parameters describing the input instance:
* $\mathcal{D}\,$ being the maximum number of characters that match a fixed character,
* $\mathcal{S}\,$ being the number of pairs of matching characters,
* $\mathcal{I}\,$ being the total number of disjoint intervals of characters that match the $m$ characters of the pattern $P$.
At the heart of our new deterministic upper bounds for $\mathcal{D}\,$ and $\mathcal{S}\,$ lies a faster construction of superimposed codes, which solves an open problem posed in [FOCS'97, Indyk] and can be of independent interest. To conclude, we demonstrate first lower bounds for $\texttt{GPM}$. We start by showing that any deterministic or Monte Carlo algorithm for $\texttt{GPM}$ must use $Ω(\mathcal{S})$ time, and then proceed to show higher lower bounds for combinatorial algorithms. These bounds show that our algorithms are almost optimal, unless a radically new approach is developed.

preprint · 2020 · arXiv

Minimum Cut in $O(m\log^2 n)$ Time

We give a randomized algorithm that finds a minimum cut in an undirected weighted $m$-edge $n$-vertex graph $G$ with high probability in $O(m \log^2 n)$ time. This is the first improvement to Karger's celebrated $O(m \log^3 n)$ time algorithm from 1996. Our main technical contribution is a deterministic $O(m \log n)$ time algorithm that, given a spanning tree $T$ of $G$, finds a minimum cut of $G$ that 2-respects (cuts two edges of) $T$.

preprint · 2020 · arXiv

On Two Measures of Distance between Fully-Labelled Trees

The last decade brought a significant increase in the amount of data and a variety of new inference methods for reconstructing the detailed evolutionary history of various cancers. This creates the need for efficient procedures for comparing rooted trees representing the evolution of mutations in tumor phylogenies. Motivated by this need, Bernardini et al. [CPM 2019] recently introduced the notion of the rearrangement distance for fully-labelled trees. This notion originates from two operations: one that permutes the labels of the nodes, the other that affects the topology of the tree. Each operation alone defines a distance that can be computed in polynomial time, while the actual rearrangement distance, which combines the two, was proven to be NP-hard. We answer two open questions left by the previous work. First, what is the complexity of computing the permutation distance? Second, is there a constant-factor approximation algorithm for estimating the rearrangement distance between two arbitrary trees? We answer the first one by showing, via a two-way reduction, that calculating the permutation distance between two trees on $n$ nodes is equivalent, up to polylogarithmic factors, to finding the largest cardinality matching in a sparse bipartite graph. In particular, by plugging in the algorithm of Liu and Sidford [ArXiv 2020], we obtain an $O(n^{4/3+o(1)})$ time algorithm for computing the permutation distance between two trees on $n$ nodes. Then we answer the second question positively, and design a linear-time constant-factor approximation algorithm that does not need any assumption on the trees.

preprint · 2020 · arXiv

Shorter Labels for Routing in Trees

A routing labeling scheme assigns a binary string, called a label, to each node in a network, and chooses a distinct port number from $\{1,\ldots,d\}$ for every edge outgoing from a node of degree $d$. Then, given the labels of $u$ and $w$ and no other information about the network, it should be possible to determine the port number corresponding to the first edge on the shortest path from $u$ to $w$. In their seminal paper, Thorup and Zwick [SPAA 2001] designed several routing methods for general weighted networks. An important technical ingredient in their paper that according to the authors "may be of independent practical and theoretical interest" is a routing labeling scheme for trees of arbitrary degrees. For a tree on $n$ nodes, their scheme constructs labels consisting of $(1+o(1))\log n$ bits such that the sought port number can be computed in constant time. Looking closer at their construction, the labels consist of $\log n + O(\log n\cdot \log\log\log n / \log\log n)$ bits. Given that the only known lower bound is $\log n+Ω(\log\log n)$, a natural question that has been asked for other labeling problems in trees is to determine the asymptotics of the smaller-order term. We make the first (and significant) progress in 19 years on determining the correct second-order term for the length of a label in a routing labeling scheme for trees on $n$ nodes. We design such a scheme with labels of length $\log n+O((\log\log n)^{2})$. Furthermore, we modify the scheme to allow for computing the port number in constant time at the expense of slightly increasing the length to $\log n+O((\log\log n)^{3})$.

preprint · 2020 · arXiv

Voronoi diagrams on planar graphs, and computing the diameter in deterministic $\tilde{O}(n^{5/3})$ time

We present an explicit and efficient construction of additively weighted Voronoi diagrams on planar graphs. Let $G$ be a planar graph with $n$ vertices and $b$ sites that lie on a constant number of faces. We show how to preprocess $G$ in $\tilde O(nb^2)$ time (the $\tilde O$ notation hides polylogarithmic factors) so that one can compute any additively weighted Voronoi diagram for these sites in $\tilde O(b)$ time. We use this construction to compute the diameter of a directed planar graph with real arc lengths in $\tilde{O}(n^{5/3})$ time. This improves the recent breakthrough result of Cabello (SODA'17), both by improving the running time (from $\tilde{O}(n^{11/6})$), and by providing a deterministic algorithm. It is in fact the first truly subquadratic deterministic algorithm for this problem. Our use of Voronoi diagrams to compute the diameter follows that of Cabello, but he used abstract Voronoi diagrams, which makes his diameter algorithm more involved, more expensive, and randomized. As in Cabello's work, our algorithm can compute, for every vertex $v$, both the farthest vertex from $v$ (i.e., the eccentricity of $v$), and the sum of distances from $v$ to all other vertices. Hence, our algorithm can also compute the radius, median, and Wiener index (sum of all pairwise distances) of a planar graph within the same time bounds. Our construction of Voronoi diagrams for planar graphs is of independent interest.