Source author record

Tomasz Kociumaka

Tomasz Kociumaka appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Discrete Mathematics Computational Complexity Formal Languages and Automata Theory

Catalog footprint

What is connected

39works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

An Improved Algorithm for The $k$-Dyck Edit Distance Problem

A Dyck sequence is a sequence of opening and closing parentheses (of various types) that is balanced. The Dyck edit distance of a given sequence of parentheses $S$ is the smallest number of edit operations (insertions, deletions, and substitutions) needed to transform $S$ into a Dyck sequence. We consider the threshold Dyck edit distance problem, where the input is a sequence of parentheses $S$ and a positive integer $k$, and the goal is to compute the Dyck edit distance of $S$ only if the distance is at most $k$, and otherwise report that the distance is larger than $k$. Backurs and Onak [PODS'16] showed that the threshold Dyck edit distance problem can be solved in $O(n+k^{16})$ time. In this work, we design new algorithms for the threshold Dyck edit distance problem which costs $O(n+k^{4.544184})$ time with high probability or $O(n+k^{4.853059})$ deterministically. Our algorithms combine several new structural properties of the Dyck edit distance problem, a refined algorithm for fast $(\min,+)$ matrix product, and a careful modification of ideas used in Valiant's parsing algorithm.

preprint2022arXiv

Computing Longest (Common) Lyndon Subsequences

Given a string $T$ with length $n$ whose characters are drawn from an ordered alphabet of size $σ$, its longest Lyndon subsequence is a longest subsequence of $T$ that is a Lyndon word. We propose algorithms for finding such a subsequence in $O(n^3)$ time with $O(n)$ space, or online in $O(n^3 σ)$ space and time. Our first result can be extended to find the longest common Lyndon subsequence of two strings of length $n$ in $O(n^4 σ)$ time using $O(n^3)$ space.

preprint2022arXiv

Dynamic Suffix Array with Polylogarithmic Queries and Updates

The suffix array $SA[1..n]$ of a text $T$ of length $n$ is a permutation of $\{1,\ldots,n\}$ describing the lexicographical ordering of suffixes of $T$, and it is considered to be among of the most important data structures in string algorithms, with dozens of applications in data compression, bioinformatics, and information retrieval. One of the biggest drawbacks of the suffix array is that it is very difficult to maintain under text updates: even a single character substitution can completely change the contents of the suffix array. Thus, the suffix array of a dynamic text is modelled using suffix array queries, which return the value $SA[i]$ given any $i\in[1..n]$. Prior to this work, the fastest dynamic suffix array implementations were by Amir and Boneh. At ISAAC 2020, they showed how to answer suffix array queries in $\tilde{O}(k)$ time, where $k\in[1..n]$ is a trade-off parameter, with $\tilde{O}(\frac{n}{k})$-time text updates. In a very recent preprint [2021], they also provided a solution with $O(\log^5 n)$-time queries and $\tilde{O}(n^{2/3})$-time updates. We propose the first data structure that supports both suffix array queries and text updates in $O({\rm polylog}\,n)$ time (achieving $O(\log^4 n)$ and $O(\log^{3+o(1)} n)$ time, respectively). Our data structure is deterministic and the running times for all operations are worst-case. In addition to the standard single-character edits (character insertions, deletions, and substitutions), we support (also in $O(\log^{3+o(1)} n)$ time) the "cut-paste" operation that moves any (arbitrarily long) substring of $T$ to any place in $T$. We complement our structure by a hardness result: unless the Online Matrix-Vector Multiplication (OMv) Conjecture fails, no data structure with $O({\rm polylog}\,n)$-time suffix array queries can support the "copy-paste" operation in $O(n^{1-ε})$ time for any $ε>0$.

preprint2022arXiv

Faster Pattern Matching under Edit Distance

We consider the approximate pattern matching problem under the edit distance. Given a text $T$ of length $n$, a pattern $P$ of length $m$, and a threshold $k$, the task is to find the starting positions of all substrings of $T$ that can be transformed to $P$ with at most $k$ edits. More than 20 years ago, Cole and Hariharan [SODA'98, J. Comput.'02] gave an $\mathcal{O}(n+k^4 \cdot n/ m)$-time algorithm for this classic problem, and this runtime has not been improved since. Here, we present an algorithm that runs in time $\mathcal{O}(n+k^{3.5} \sqrt{\log m \log k} \cdot n/m)$, thus breaking through this long-standing barrier. In the case where $n^{1/4+\varepsilon} \leq k \leq n^{2/5-\varepsilon}$ for some arbitrarily small positive constant $\varepsilon$, our algorithm improves over the state-of-the-art by polynomial factors: it is polynomially faster than both the algorithm of Cole and Hariharan and the classic $\mathcal{O}(kn)$-time algorithm of Landau and Vishkin [STOC'86, J. Algorithms'89]. We observe that the bottleneck case of the alternative $\mathcal{O}(n+k^4 \cdot n/m)$-time algorithm of Charalampopoulos, Kociumaka, and Wellnitz [FOCS'20] is when the text and the pattern are (almost) periodic. Our new algorithm reduces this case to a new dynamic problem (Dynamic Puzzle Matching), which we solve by building on tools developed by Tiskin [SODA'10, Algorithmica'15] for the so-called seaweed monoid of permutation matrices. Our algorithm relies only on a small set of primitive operations on strings and thus also applies to the fully-compressed setting (where text and pattern are given as straight-line programs) and to the dynamic setting (where we maintain a collection of strings under creation, splitting, and concatenation), improving over the state of the art.

preprint2022arXiv

The Dynamic k-Mismatch Problem

The text-to-pattern Hamming distances problem asks to compute the Hamming distances between a given pattern of length $m$ and all length-$m$ substrings of a given text of length $n\ge m$. We focus on the $k$-mismatch version of the problem, where a distance needs to be returned only if it does not exceed a threshold $k$. We assume $n\le 2m$ (in general, one can partition the text into overlapping blocks). In this work, we show data structures for the dynamic version of this problem supporting two operations: An update performs a single-letter substitution in the pattern or the text, and a query, given an index $i$, returns the Hamming distance between the pattern and the text substring starting at position $i$, or reports that it exceeds $k$. First, we show a data structure with $\tilde{O}(1)$ update and $\tilde{O}(k)$ query time. Then we show that $\tilde{O}(k)$ update and $\tilde{O}(1)$ query time is also possible. These two provide an optimal trade-off for the dynamic $k$-mismatch problem with $k \le \sqrt{n}$: we prove that, conditioned on the strong 3SUM conjecture, one cannot simultaneously achieve $k^{1-Ω(1)}$ time for all operations. For $k\ge \sqrt{n}$, we give another lower bound, conditioned on the Online Matrix-Vector conjecture, that excludes algorithms taking $n^{1/2-Ω(1)}$ time per operation. This is tight for constant-sized alphabets: Clifford et al. (STACS 2018) achieved $\tilde{O}(\sqrt{n})$ time per operation in that case, but with $\tilde{O}(n^{3/4})$ time per operation for large alphabets. We improve and extend this result with an algorithm that, given $1\le x\le k$, achieves update time $\tilde{O}(\frac{n}{k} +\sqrt{\frac{nk}{x}})$ and query time $\tilde{O}(x)$. In particular, for $k\ge \sqrt{n}$, an appropriate choice of $x$ yields $\tilde{O}(\sqrt[3]{nk})$ time per operation, which is $\tilde{O}(n^{2/3})$ when no threshold $k$ is provided.

preprint2021arXiv

Towards a Definitive Compressibility Measure for Repetitive Sequences

Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no such clear measure exists for the compressibility of repetitive sequences. Since statistical entropy does not capture repetitiveness, ad-hoc measures like the size $z$ of the Lempel--Ziv parse are frequently used to estimate it. The size $b \le z$ of the smallest bidirectional macro scheme captures better what can be achieved via copy-paste processes, though it is NP-complete to compute and it is not monotonic upon symbol appends. Recently, a more principled measure, the size $γ$ of the smallest string \emph{attractor}, was introduced. The measure $γ\le b$ lower bounds all the previous relevant ones, yet length-$n$ strings can be represented and efficiently indexed within space $O(γ\log\frac{n}γ)$, which also upper bounds most measures. While $γ$ is certainly a better measure of repetitiveness than $b$, it is also NP-complete to compute and not monotonic, and it is unknown if one can always represent a string in $o(γ\log n)$ space. In this paper, we study an even smaller measure, $δ\le γ$, which can be computed in linear time, is monotonic, and allows encoding every string in $O(δ\log\frac{n}δ)$ space because $z = O(δ\log\frac{n}δ)$. We show that $δ$ better captures the compressibility of repetitive strings. Concretely, we show that (1) $δ$ can be strictly smaller than $γ$, by up to a logarithmic factor; (2) there are string families needing $Ω(δ\log\frac{n}δ)$ space to be encoded, so this space is optimal for every $n$ and $δ$; (3) one can build run-length context-free grammars of size $O(δ\log\frac{n}δ)$, whereas the smallest (non-run-length) grammar can be up to $Θ(\log n/\log\log n)$ times larger; and (4) within $O(δ\log\frac{n}δ)$ space we can not only...

preprint2020arXiv

Approximating longest common substring with $k$ mismatches: Theory and practice

In the problem of the longest common substring with $k$ mismatches we are given two strings $X, Y$ and must find the maximal length $\ell$ such that there is a length-$\ell$ substring of $X$ and a length-$\ell$ substring of $Y$ that differ in at most $k$ positions. The length $\ell$ can be used as a robust measure of similarity between $X, Y$. In this work, we develop new approximation algorithms for computing $\ell$ that are significantly more efficient that previously known solutions from the theoretical point of view. Our approach is simple and practical, which we confirm via an experimental evaluation, and is probably close to optimal as we demonstrate via a conditional lower bound.

preprint2020arXiv

Approximating Text-to-Pattern Hamming Distances

We revisit a fundamental problem in string matching: given a pattern of length m and a text of length n, both over an alphabet of size $σ$, compute the Hamming distance between the pattern and the text at every location. Several $(1+ε)$-approximation algorithms have been proposed in the literature, with running time of the form $O(ε^{-O(1)}n\log n\log m)$, all using fast Fourier transform (FFT). We describe a simple $(1+ε)$-approximation algorithm that is faster and does not need FFT. Combining our approach with additional ideas leads to numerous new results: - We obtain the first linear-time approximation algorithm; the running time is $O(ε^{-2}n)$. - We obtain a faster exact algorithm computing all Hamming distances up to a given threshold k; its running time improves previous results by logarithmic factors and is linear if $k\le\sqrt m$. - We obtain approximation algorithms with better $ε$-dependence using rectangular matrix multiplication. The time-bound is $Õ(n)$ when the pattern is sufficiently long: $m\ge ε^{-28}$. Previous algorithms require $Õ(ε^{-1}n)$ time. - When k is not too small, we obtain a truly sublinear-time algorithm to find all locations with Hamming distance approximately (up to a constant factor) less than k, in $O((n/k^{Ω(1)}+occ)n^{o(1)})$ time, where occ is the output size. The algorithm leads to a property tester, returning true if an exact match exists and false if the Hamming distance is more than $δm$ at every location, running in $Õ(δ^{-1/3}n^{2/3}+δ^{-1}n/m)$ time. - We obtain a streaming algorithm to report all locations with Hamming distance approximately less than k, using $Õ(ε^{-2}\sqrt k)$ space. Previously, streaming algorithms were known for the exact problem with Õ(k) space or for the approximate problem with $Õ(ε^{-O(1)}\sqrt m)$ space.

preprint2020arXiv

Circular Pattern Matching with $k$ Mismatches

The $k$-mismatch problem consists in computing the Hamming distance between a pattern $P$ of length $m$ and every length-$m$ substring of a text $T$ of length $n$, if this distance is no more than $k$. In many real-world applications, any cyclic rotation of $P$ is a relevant pattern, and thus one is interested in computing the minimal distance of every length-$m$ substring of $T$ and any cyclic rotation of $P$. This is the circular pattern matching with $k$ mismatches ($k$-CPM) problem. A multitude of papers have been devoted to solving this problem but, to the best of our knowledge, only average-case upper bounds are known. In this paper, we present the first non-trivial worst-case upper bounds for the $k$-CPM problem. Specifically, we show an $O(nk)$-time algorithm and an $O(n+\frac{n}{m}\,k^4)$-time algorithm. The latter algorithm applies in an extended way a technique that was very recently developed for the $k$-mismatch problem [Bringmann et al., SODA 2019]. A preliminary version of this work appeared at FCT 2019. In this version we improve the time complexity of the main algorithm from $O(n+\frac{n}{m}\,k^5)$ to $O(n+\frac{n}{m}\,k^4)$.

preprint2020arXiv

Counting Distinct Patterns in Internal Dictionary Matching

We consider the problem of preprocessing a text $T$ of length $n$ and a dictionary $\mathcal{D}$ in order to be able to efficiently answer queries $CountDistinct(i,j)$, that is, given $i$ and $j$ return the number of patterns from $\mathcal{D}$ that occur in the fragment $T[i \mathinner{.\,.} j]$. The dictionary is internal in the sense that each pattern in $\mathcal{D}$ is given as a fragment of $T$. This way, the dictionary takes space proportional to the number of patterns $d=|\mathcal{D}|$ rather than their total length, which could be $Θ(n\cdot d)$. An $\tilde{\mathcal{O}}(n+d)$-size data structure that answers $CountDistinct(i,j)$ queries $\mathcal{O}(\log n)$-approximately in $\tilde{\mathcal{O}}(1)$ time was recently proposed in a work that introduced internal dictionary matching [ISAAC 2019]. Here we present an $\tilde{\mathcal{O}}(n+d)$-size data structure that answers $CountDistinct(i,j)$ queries $2$-approximately in $\tilde{\mathcal{O}}(1)$ time. Using range queries, for any $m$, we give an $\tilde{\mathcal{O}}(\min(nd/m,n^2/m^2)+d)$-size data structure that answers $CountDistinct(i,j)$ queries exactly in $\tilde{\mathcal{O}}(m)$ time. We also consider the special case when the dictionary consists of all square factors of the string. We design an $\mathcal{O}(n \log^2 n)$-size data structure that allows us to count distinct squares in a text fragment $T[i \mathinner{.\,.} j]$ in $\mathcal{O}(\log n)$ time.

preprint2020arXiv

Efficient Representation and Counting of Antipower Factors in Words

A $k$-antipower (for $k \ge 2$) is a concatenation of $k$ pairwise distinct words of the same length. The study of fragments of a word being antipowers was initiated by Fici et al. (ICALP 2016) and first algorithms for computing such fragments were presented by Badkobeh et al. (Inf. Process. Lett., 2018). We address two open problems posed by Badkobeh et al. We propose efficient algorithms for counting and reporting fragments of a word which are $k$-antipowers. They work in $\mathcal{O}(nk \log k)$ time and $\mathcal{O}(nk \log k + C)$ time, respectively, where $C$ is the number of reported fragments. For $k=o(\sqrt{n/\log n})$, this improves the time complexity of $\mathcal{O}(n^2/k)$ of the solution by Badkobeh et al. We also show that the number of different $k$-antipower factors of a word of length $n$ can be computed in $\mathcal{O}(nk^4 \log k \log n)$ time. Our main algorithmic tools are runs and gapped repeats. Finally we present an improved data structure that checks, for a given fragment of a word and an integer $k$, if the fragment is a $k$-antipower. This is a full and extended version of a paper from LATA 2019. In particular, all results about counting different antipowers factors are completely new compared with the LATA proceedings version.

preprint2020arXiv

Improved Circular $k$-Mismatch Sketches

The shift distance $\mathsf{sh}(S_1,S_2)$ between two strings $S_1$ and $S_2$ of the same length is defined as the minimum Hamming distance between $S_1$ and any rotation (cyclic shift) of $S_2$. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings $S_1$ and $S_2$ of length $n$ are given to two identical players (encoders), who independently compute sketches (summaries) $\mathtt{sk}(S_1)$ and $\mathtt{sk}(S_2)$, respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) $\mathsf{sh}(S_1,S_2)$ with high probability. This paper primarily focuses on the more general $k$-mismatch version of the problem, where the decoder is allowed to declare a failure if $\mathsf{sh}(S_1,S_2)>k$, where $k$ is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular $k$-mismatch sketches of size $\widetilde{O}(k+D(n))$, where $D(n)$ is the number of divisors of $n$. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches. We circumvent this lower bound by designing a (non-linear) exact circular $k$-mismatch sketch of size $\widetilde{O}(k)$; this size matches communication-complexity lower bounds. We also design $(1\pm \varepsilon)$-approximate circular $k$-mismatch sketch of size $\widetilde{O}(\min(\varepsilon^{-2}\sqrt{k}, \varepsilon^{-1.5}\sqrt{n}))$, which improves upon an $\widetilde{O}(\varepsilon^{-2}\sqrt{n})$-size sketch of Crouch and McGregor (APPROX'11).

preprint2020arXiv

The Streaming k-Mismatch Problem: Tradeoffs between Space and Total Time

We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$σ$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},σn\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space. We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac {nk^2}m,\frac{nk}{\sqrt s},\frac{σnm}s\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},σn\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$.

preprint2020arXiv

Time-Space Tradeoffs for Finding a Long Common Substring

We consider the problem of finding, given two documents of total length $n$, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic $O(n)$-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require $Θ(n)$ space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for $n^{2/3} \le s \le n^{1-o(1)}$, the LCS problem can be solved in $O(s)$ space and $O(\frac{n^2}{s})$ time. Kociumaka et al. (ESA 2014) generalized this tradeoff to $1 \leq s \leq n$, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length $L$ of the sought LCS is large. For $1 \leq s \leq n$, we show that the LCS problem can be solved in $O(s)$ space and $\tilde{O}(\frac{n^2}{L\cdot s}+n)$ time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.

preprint2016arXiv

Efficient Index for Weighted Sequences

The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to support efficient on-line pattern queries. We study this problem in the case where the text is weighted: for every position of the text and every letter of the alphabet a probability of occurrence of this letter at this position is given. Sequences of this type, also called position weight matrices, are commonly used to represent imprecise or uncertain data. A weighted sequence may represent many different strings, each with probability of occurrence equal to the product of probabilities of its letters at subsequent positions. Given a probability threshold $1/z$, we say that a pattern string $P$ matches a weighted text at position $i$ if the product of probabilities of the letters of $P$ at positions $i,\ldots,i+|P|-1$ in the text is at least $1/z$. In this article, we present an $O(nz)$-time construction of an $O(nz)$-sized index that can answer pattern matching queries in a weighted text in optimal time improving upon the state of the art by a factor of $z \log z$. Other applications of this data structure include an $O(nz)$-time construction of the weighted prefix table and an $O(nz)$-time computation of all covers of a weighted sequence, which improve upon the state of the art by the same factor.

preprint2016arXiv

Faster Longest Common Extension Queries in Strings over General Alphabets

Longest common extension queries (often called longest common prefix queries) constitute a fundamental building block in multiple string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of $q$ LCE queries for a string of size $n$ over a general ordered alphabet can be realized in $O(q \log \log n+n\log^*n)$ time making only $O(q+n)$ symbol comparisons. Consequently, all runs in a string over a general ordered alphabet can be computed in $O(n \log \log n)$ time making $O(n)$ symbol comparisons. Our results improve upon a solution by Kosolobov (Information Processing Letters, 2016), who gave an algorithm with $O(n \log^{2/3} n)$ running time and conjectured that $O(n)$ time is possible. We make a significant progress towards resolving this conjecture. Our techniques extend to the case of general unordered alphabets, when the time increases to $O(q\log n + n\log^*n)$. The main tools are difference covers and the disjoint-sets data structure.

preprint2016arXiv

Hardness of approximation for strip packing

Strip packing is a classical packing problem, where the goal is to pack a set of rectangular objects into a strip of a given width, while minimizing the total height of the packing. The problem has multiple applications, e.g. in scheduling and stock-cutting, and has been studied extensively. When the dimensions of objects are allowed to be exponential in the total input size, it is known that the problem cannot be approximated within a factor better than $3/2$, unless $\mathrm{P}=\mathrm{NP}$. However, there was no corresponding lower bound for polynomially bounded input data. In fact, Nadiradze and Wiese [SODA 2016] have recently proposed a $(1.4 + ε)$ approximation algorithm for this variant, thus showing that strip packing with polynomially bounded data can be approximated better than when exponentially large values in the input data are allowed. Their result has subsequently been improved to a $(4/3 + ε)$ approximation by two independent research groups [FSTTCS 2016, arXiv:1610.04430]. This raises a question whether strip packing with polynomially bounded input data admits a quasi-polynomial time approximation scheme, as is the case for related two-dimensional packing problems like maximum independent set of rectangles or two-dimensional knapsack. In this paper we answer this question in negative by proving that it is NP-hard to approximate strip packing within a factor better than $12/11$, even when admitting only polynomially bounded input data. In particular, this shows that the strip packing problem admits no quasi-polynomial time approximation scheme, unless $\mathrm{NP} \subseteq \mathrm{DTIME}(2^{\mathrm{polylog}(n)})$.

preprint2016arXiv

Maximum Number of Distinct and Nonequivalent Nonstandard Squares in a Word

The combinatorics of squares in a word depends on how the equivalence of halves of the square is defined. We consider Abelian squares, parameterized squares, and order-preserving squares. The word $uv$ is an Abelian (parameterized, order-preserving) square if $u$ and $v$ are equivalent in the Abelian (parameterized, order-preserving) sense. The maximum number of ordinary squares in a word is known to be asymptotically linear, but the exact bound is still investigated. We present several results on the maximum number of distinct squares for nonstandard subword equivalence relations. Let $\mathit{SQ}_{\mathrm{Abel}}(n,σ)$ and $\mathit{SQ}'_{\mathrm{Abel}}(n,σ)$ denote the maximum number of Abelian squares in a word of length $n$ over an alphabet of size $σ$, which are distinct as words and which are nonequivalent in the Abelian sense, respectively. For $σ\ge 2$ we prove that $\mathit{SQ}_{\mathrm{Abel}}(n,σ)=Θ(n^2)$, $\mathit{SQ}'_{\mathrm{Abel}}(n,σ)=Ω(n^{3/2})$ and $\mathit{SQ}'_{\mathrm{Abel}}(n,σ) = O(n^{11/6})$. We also give linear bounds for parameterized and order-preserving squares for alphabets of constant size: $\mathit{SQ}_{\mathrm{param}}(n,O(1))=Θ(n)$, $\mathit{SQ}_{\mathrm{op}}(n,O(1))=Θ(n)$. The upper bounds have quadratic dependence on the alphabet size for order-preserving squares and exponential dependence for parameterized squares. As a side result we construct infinite words over the smallest alphabet which avoid nontrivial order-preserving squares and nontrivial parameterized cubes (nontrivial parameterized squares cannot be avoided in an infinite word).

preprint2016arXiv

Minimal Suffix and Rotation of a Substring in Optimal Time

For a text given in advance, the substring minimal suffix queries ask to determine the lexicographically minimal non-empty suffix of a substring specified by the location of its occurrence in the text. We develop a data structure answering such queries optimally: in constant time after linear-time preprocessing. This improves upon the results of Babenko et al. (CPM 2014), whose trade-off solution is characterized by $Θ(n\log n)$ product of these time complexities. Next, we extend our queries to support concatenations of $O(1)$ substrings, for which the construction and query time is preserved. We apply these generalized queries to compute lexicographically minimal and maximal rotations of a given substring in constant time after linear-time preprocessing. Our data structures mainly rely on properties of Lyndon words and Lyndon factorizations. We combine them with further algorithmic and combinatorial tools, such as fusion trees and the notion of order isomorphism of strings.

preprint2016arXiv

Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries

Longest common extension queries (LCE queries) and runs are ubiquitous in algorithmic stringology. Linear-time algorithms computing runs and preprocessing for constant-time LCE queries have been known for over a decade. However, these algorithms assume a linearly-sortable integer alphabet. A recent breakthrough paper by Bannai et.\ al.\ (SODA 2015) showed a link between the two notions: all the runs in a string can be computed via a linear number of LCE queries. The first to consider these problems over a general ordered alphabet was Kosolobov (\emph{Inf.\ Process.\ Lett.}, 2016), who presented an $O(n (\log n)^{2/3})$-time algorithm for answering $O(n)$ LCE queries. This result was improved by Gawrychowski et.\ al.\ (accepted to CPM 2016) to $O(n \log \log n)$ time. In this work we note a special \emph{non-crossing} property of LCE queries asked in the runs computation. We show that any $n$ such non-crossing queries can be answered on-line in $O(n α(n))$ time, which yields an $O(n α(n))$-time algorithm for computing runs.

preprint2016arXiv

Optimal Dynamic Strings

In this paper we study the fundamental problem of maintaining a dynamic collection of strings under the following operations: concat - concatenates two strings, split - splits a string into two at a given position, compare - finds the lexicographical order (less, equal, greater) between two strings, LCP - calculates the longest common prefix of two strings. We present an efficient data structure for this problem, where an update requires only $O(\log n)$ worst-case time with high probability, with $n$ being the total length of all strings in the collection, and a query takes constant worst-case time. On the lower bound side, we prove that even if the only possible query is checking equality of two strings, either updates or queries take amortized $Ω(\log n)$ time; hence our implementation is optimal. Such operations can be used as a basic building block to solve other string problems. We provide two examples. First, we can augment our data structure to provide pattern matching queries that may locate occurrences of a specified pattern $p$ in the strings in our collection in optimal $O(|p|)$ time, at the expense of increasing update time to $O(\log^2 n)$. Second, we show how to maintain a history of an edited text, processing updates in $O(\log t \log \log t)$ time, where $t$ is the number of edits, and how to support pattern matching queries against the whole history in $O(|p| \log t \log \log t)$ time. Finally, we note that our data structure can be applied to test dynamic tree isomorphism and to compare strings generated by dynamic straight-line grammars.

preprint2016arXiv

Pattern Matching and Consensus Problems on Weighted Sequences and Profiles

We study pattern matching problems on two major representations of uncertain sequences used in molecular biology: weighted sequences (also known as position weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple version, in which only the pattern or only the text is uncertain, we obtain efficient algorithms with theoretically-provable running times using a variation of the lookahead scoring technique. We also consider a general variant of the pattern matching problems in which both the pattern and the text are uncertain. Central to our solution is a special case where the sequences have equal length, called the consensus problem. We propose algorithms for the consensus problem parameterized by the number of strings that match one of the sequences. As our basic approach, a careful adaptation of the classic meet-in-the-middle algorithm for the knapsack problem is used. On the lower bound side, we prove that our dependence on the parameter is optimal up to lower-order terms conditioned on the optimality of the original algorithm for the knapsack problem.

preprint2016arXiv

Sparse Suffix Tree Construction in Optimal Time and Space

Suffix tree (and the closely related suffix array) are fundamental structures capturing all substrings of a given text essentially by storing all its suffixes in the lexicographical order. In some applications, we work with a subset of $b$ interesting suffixes, which are stored in the so-called sparse suffix tree. Because the size of this structure is $Θ(b)$, it is natural to seek a construction algorithm using only $O(b)$ words of space assuming read-only random access to the text. We design a linear-time Monte Carlo algorithm for this problem, hence resolving an open question explicitly stated by Bille et al. [TALG 2016]. The best previously known algorithm by I et al. [STACS 2014] works in $O(n\log b)$ time. Our solution proceeds in $n/b$ rounds; in the $r$-th round, we consider all suffixes starting at positions congruent to $r$ modulo $n/b$. By maintaining rolling hashes, we lexicographically sort all interesting suffixes starting at such positions, and then we merge them with the already considered suffixes. For efficient merging, we also need to answer LCE queries in small space. By plugging in the structure of Bille et al. [CPM 2015] we obtain $O(n+b\log b)$ time complexity. We improve this structure, which implies a linear-time sparse suffix tree construction algorithm. We complement our Monte Carlo algorithm with a deterministic verification procedure. The verification takes $O(n\sqrt{\log b})$ time, which improves upon the bound of $O(n\log b)$ obtained by I et al. [STACS 2014]. This is obtained by first observing that the pruning done inside the previous solution has a rather clean description using the notion of graph spanners with small multiplicative stretch. Then, we are able to decrease the verification time by applying difference covers twice. Combined with the Monte Carlo algorithm, this gives us an $O(n\sqrt{\log b})$-time and $O(b)$-space Las Vegas algorithm.

preprint2015arXiv

Approximating LZ77 via Small-Space Multiple-Pattern Matching

We generalize Karp-Rabin string matching to handle multiple patterns in $\mathcal{O}(n \log n + m)$ time and $\mathcal{O}(s)$ space, where $n$ is the length of the text and $m$ is the total length of the $s$ patterns, returning correct answers with high probability. As a prime application of our algorithm, we show how to approximate the LZ77 parse of a string of length $n$. If the optimal parse consists of $z$ phrases, using only $\mathcal{O}(z)$ working space we can return a parse consisting of at most $(1+\varepsilon)z$ phrases in $\mathcal{O}(\varepsilon^{-1}n\log n)$ time, for any $\varepsilon\in (0,1]$. As previous quasilinear-time algorithms for LZ77 use $Ω(n/\textrm{polylog }n)$ space, but $z$ can be exponentially small in $n$, these improvements in space are substantial.

preprint2015arXiv

Fast Computation of Abelian Runs

Given a word $w$ and a Parikh vector $\mathcal{P}$, an abelian run of period $\mathcal{P}$ in $w$ is a maximal occurrence of a substring of $w$ having abelian period $\mathcal{P}$. Our main result is an online algorithm that, given a word $w$ of length $n$ over an alphabet of cardinality $σ$ and a Parikh vector $\mathcal{P}$, returns all the abelian runs of period $\mathcal{P}$ in $w$ in time $O(n)$ and space $O(σ+p)$, where $p$ is the norm of $\mathcal{P}$, i.e., the sum of its components. We also present an online algorithm that computes all the abelian runs with periods of norm $p$ in $w$ in time $O(np)$, for any given norm $p$. Finally, we give an $O(n^2)$-time offline randomized algorithm for computing all the abelian runs of $w$. Its deterministic counterpart runs in $O(n^2\logσ)$ time.

preprint2015arXiv

On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals

We study a variation of the classical Shortest Common Superstring (SCS) problem in which a shortest superstring of a finite set of strings $S$ is sought containing as a factor every string of $S$ or its reversal. We call this problem Shortest Common Superstring with Reversals (SCS-R). This problem has been introduced by Jiang et al., who designed a greedy-like algorithm with length approximation ratio $4$. In this paper, we show that a natural adaptation of the classical greedy algorithm for SCS has (optimal) compression ratio $\frac12$, i.e., the sum of the overlaps in the output string is at least half the sum of the overlaps in an optimal solution. We also provide a linear-time implementation of our algorithm.

preprint2015arXiv

Wavelet Trees Meet Suffix Trees

We present an improved wavelet tree construction algorithm and discuss its applications to a number of rank/select problems for integer keys and strings. Given a string of length n over an alphabet of size $σ\leq n$, our method builds the wavelet tree in $O(n \log σ/ \sqrt{\log{n}})$ time, improving upon the state-of-the-art algorithm by a factor of $\sqrt{\log n}$. As a consequence, given an array of n integers we can construct in $O(n \sqrt{\log n})$ time a data structure consisting of $O(n)$ machine words and capable of answering rank/select queries for the subranges of the array in $O(\log n / \log \log n)$ time. This is a $\log \log n$-factor improvement in query time compared to Chan and Pătraşcu and a $\sqrt{\log n}$-factor improvement in construction time compared to Brodal et al. Next, we switch to stringological context and propose a novel notion of wavelet suffix trees. For a string w of length n, this data structure occupies $O(n)$ words, takes $O(n \sqrt{\log n})$ time to construct, and simultaneously captures the combinatorial structure of substrings of w while enabling efficient top-down traversal and binary search. In particular, with a wavelet suffix tree we are able to answer in $O(\log |x|)$ time the following two natural analogues of rank/select queries for suffixes of substrings: for substrings x and y of w count the number of suffixes of x that are lexicographically smaller than y, and for a substring x of w and an integer k, find the k-th lexicographically smallest suffix of x. We further show that wavelet suffix trees allow to compute a run-length-encoded Burrows-Wheeler transform of a substring x of w in $O(s \log |x|)$ time, where s denotes the length of the resulting run-length encoding. This answers a question by Cormode and Muthukrishnan, who considered an analogous problem for Lempel-Ziv compression.

preprint2014arXiv

An LP-Rounding $2\sqrt{2}$ Approximation for Restricted Maximum Acyclic Subgraph

In the classical Maximum Acyclic Subgraph problem (MAS), given a directed-edge weighted graph, we are required to find an ordering of the nodes that maximizes the total weight of forward-directed edges. MAS admits a 2 approximation, and this approximation is optimal under the Unique Game Conjecture. In this paper we consider a generalization of MAS, the Restricted Maximum Acyclic Subgraph problem (RMAS), where each node is associated with a list of integer labels, and we have to find a labeling of the nodes so as to maximize the weight of edges whose head label is larger than the tail label. The best known (almost trivial) approximation for RMAS is 4. The interest of RMAS is mostly due to its connections with the Vertex Pricing problem (VP). In VP we are given an undirected graph with positive edge budgets. A feasible solution consists of an assignment of non-negative prices to the nodes. The profit for each edge $e$ is the sum of its endpoints prices if that sum is at most the budget of $e$, and zero otherwise. Our goal is to maximize the total profit. The best known approximation for VP, which works analogously to the mentioned approximation algorithm for RMAS, is 4. Improving on that is a challenging open problem. On the other hand, the best known 2 inapproximability result is due to a reduction from a special case of RMAS. In this paper we present an improved LP-rounding $2\sqrt{2}$ approximation for RMAS. Our result shows that, in order to prove a 4 hardness of approximation result for VP (if possible), one should consider reductions from harder problems. Alternatively, our approach might suggest a different way to design approximation algorithms for VP.

preprint2014arXiv

Approximating Upper Degree-Constrained Partial Orientations

In the Upper Degree-Constrained Partial Orientation problem we are given an undirected graph $G=(V,E)$, together with two degree constraint functions $d^-,d^+ : V \to \mathbb{N}$. The goal is to orient as many edges as possible, in such a way that for each vertex $v \in V$ the number of arcs entering $v$ is at most $d^-(v)$, whereas the number of arcs leaving $v$ is at most $d^+(v)$. This problem was introduced by Gabow [SODA'06], who proved it to be MAXSNP-hard (and thus APX-hard). In the same paper Gabow presented an LP-based iterative rounding $4/3$-approximation algorithm. Since the problem in question is a special case of the classic 3-Dimensional Matching, which in turn is a special case of the $k$-Set Packing problem, it is reasonable to ask whether recent improvements in approximation algorithms for the latter two problems [Cygan, FOCS'13; Sviridenko & Ward, ICALP'13] allow for an improved approximation for Upper Degree-Constrained Partial Orientation. We follow this line of reasoning and present a polynomial-time local search algorithm with approximation ratio $5/4+\varepsilon$. Our algorithm uses a combination of two types of rules: improving sets of bounded pathwidth from the recent $4/3+\varepsilon$-approximation algorithm for 3-Set Packing [Cygan, FOCS'13], and a simple rule tailor-made for the setting of partial orientations. In particular, we exploit the fact that one can check in polynomial time whether it is possible to orient all the edges of a given graph [Gyárfás & Frank, Combinatorics'76].

preprint2014arXiv

Constant Factor Approximation for Capacitated k-Center with Outliers

The $k$-center problem is a classic facility location problem, where given an edge-weighted graph $G = (V,E)$ one is to find a subset of $k$ vertices $S$, such that each vertex in $V$ is "close" to some vertex in $S$. The approximation status of this basic problem is well understood, as a simple 2-approximation algorithm is known to be tight. Consequently different extensions were studied. In the capacitated version of the problem each vertex is assigned a capacity, which is a strict upper bound on the number of clients a facility can serve, when located at this vertex. A constant factor approximation for the capacitated $k$-center was obtained last year by Cygan, Hajiaghayi and Khuller [FOCS'12], which was recently improved to a 9-approximation by An, Bhaskara and Svensson [arXiv'13]. In a different generalization of the problem some clients (denoted as outliers) may be disregarded. Here we are additionally given an integer $p$ and the goal is to serve exactly $p$ clients, which the algorithm is free to choose. In 2001 Charikar et al. [SODA'01] presented a 3-approximation for the $k$-center problem with outliers. In this paper we consider a common generalization of the two extensions previously studied separately, i.e. we work with the capacitated $k$-center with outliers. We present the first constant factor approximation algorithm with approximation ratio of 25 even for the case of non-uniform hard capacities.

preprint2014arXiv

Covering Problems for Partial Words and for Indeterminate Strings

We consider the problem of computing a shortest solid cover of an indeterminate string. An indeterminate string may contain non-solid symbols, each of which specifies a subset of the alphabet that could be present at the corresponding position. We also consider covering partial words, which are a special case of indeterminate strings where each non-solid symbol is a don't care symbol. We prove that indeterminate string covering problem and partial word covering problem are NP-complete for binary alphabet and show that both problems are fixed-parameter tractable with respect to $k$, the number of non-solid symbols. For the indeterminate string covering problem we obtain a $2^{O(k \log k)} + n k^{O(1)}$-time algorithm. For the partial word covering problem we obtain a $2^{O(\sqrt{k}\log k)} + nk^{O(1)}$-time algorithm. We prove that, unless the Exponential Time Hypothesis is false, no $2^{o(\sqrt{k})} n^{O(1)}$-time solution exists for either problem, which shows that our algorithm for this case is close to optimal. We also present an algorithm for both problems which is feasible in practice.

preprint2014arXiv

On the String Consensus Problem and the Manhattan Sequence Consensus Problem

In the Manhattan Sequence Consensus problem (MSC problem) we are given $k$ integer sequences, each of length $l$, and we are to find an integer sequence $x$ of length $l$ (called a consensus sequence), such that the maximum Manhattan distance of $x$ from each of the input sequences is minimized. For binary sequences Manhattan distance coincides with Hamming distance, hence in this case the string consensus problem (also called string center problem or closest string problem) is a special case of MSC. Our main result is a practically efficient $O(l)$-time algorithm solving MSC for $k\le 5$ sequences. Practicality of our algorithms has been verified experimentally. It improves upon the quadratic algorithm by Amir et al.\ (SPIRE 2012) for string consensus problem for $k=5$ binary strings. Similarly as in Amir's algorithm we use a column-based framework. We replace the implied general integer linear programming by its easy special cases, due to combinatorial properties of the MSC for $k\le 5$. We also show that for a general parameter $k$ any instance can be reduced in linear time to a kernel of size $k!$, so the problem is fixed-parameter tractable. Nevertheless, for $k\ge 4$ this is still too large for any naive solution to be feasible in practice.

preprint2014arXiv

Sublinear Space Algorithms for the Longest Common Substring Problem

Given $m$ documents of total length $n$, we consider the problem of finding a longest string common to at least $d \geq 2$ of the documents. This problem is known as the \emph{longest common substring (LCS) problem} and has a classic $O(n)$ space and $O(n)$ time solution (Weiner [FOCS'73], Hui [CPM'92]). However, the use of linear space is impractical in many applications. In this paper we show that for any trade-off parameter $1 \leq τ\leq n$, the LCS problem can be solved in $O(τ)$ space and $O(n^2/τ)$ time, thus providing the first smooth deterministic time-space trade-off from constant to linear space. The result uses a new and very simple algorithm, which computes a $τ$-additive approximation to the LCS in $O(n^2/τ)$ time and $O(1)$ space. We also show a time-space trade-off lower bound for deterministic branching programs, which implies that any deterministic RAM algorithm solving the LCS problem on documents from a sufficiently large alphabet in $O(τ)$ space must use $Ω(n\sqrt{\log(n/(τ\log n))/\log\log(n/(τ\log n)})$ time.

preprint2013arXiv

Fast Algorithm for Partial Covers in Words

A factor $u$ of a word $w$ is a cover of $w$ if every position in $w$ lies within some occurrence of $u$ in $w$. A word $w$ covered by $u$ thus generalizes the idea of a repetition, that is, a word composed of exact concatenations of $u$. In this article we introduce a new notion of $α$-partial cover, which can be viewed as a relaxed variant of cover, that is, a factor covering at least $α$ positions in $w$. We develop a data structure of $O(n)$ size (where $n=|w|$) that can be constructed in $O(n\log n)$ time which we apply to compute all shortest $α$-partial covers for a given $α$. We also employ it for an $O(n\log n)$-time algorithm computing a shortest $α$-partial cover for each $α=1,2,\ldots,n$.

preprint2013arXiv

Fast branching algorithm for Cluster Vertex Deletion

In the family of clustering problems, we are given a set of objects (vertices of the graph), together with some observed pairwise similarities (edges). The goal is to identify clusters of similar objects by slightly modifying the graph to obtain a cluster graph (disjoint union of cliques). Hueffner et al. [Theory Comput. Syst. 2010] initiated the parameterized study of Cluster Vertex Deletion, where the allowed modification is vertex deletion, and presented an elegant O(2^k * k^9 + n * m)-time fixed-parameter algorithm, parameterized by the solution size. In our work, we pick up this line of research and present an O(1.9102^k * (n + m))-time branching algorithm.

preprint2013arXiv

Faster deterministic Feedback Vertex Set

We present two new deterministic algorithms for the Feedback Vertex Set problem parameterized by the solution size. We begin with a simple algorithm, which runs in O*((2 + ϕ)^k) time, where ϕ< 1.619 is the golden ratio. It already surpasses the previously fastest O*((1+2sqrt(2))^k)-time deterministic algorithm due to Cao et al. [SWAT 2010]. In our developments we follow the approach of Cao et al., however, thanks to a new reduction rule, we obtain not only better dependency on the parameter in the running time, but also a solution with simple analysis and only a single branching rule. Then, we present a modification of the algorithm which, using a more involved set of branching rules, achieves O*(3.592^k) running time.

preprint2013arXiv

Order-Preserving Suffix Trees and Their Algorithmic Applications

Recently Kubica et al. (Inf. Process. Let., 2013) and Kim et al. (submitted to Theor. Comp. Sci.) introduced order-preserving pattern matching. In this problem we are looking for consecutive substrings of the text that have the same "shape" as a given pattern. These results include a linear-time order-preserving pattern matching algorithm for polynomially-bounded alphabet and an extension of this result to pattern matching with multiple patterns. We make one step forward in the analysis and give an $O(\frac{n\log{n}}{\log\log{n}})$ time randomized algorithm constructing suffix trees in the order-preserving setting. We show a number of applications of order-preserving suffix trees to identify patterns and repetitions in time series.

preprint2013arXiv

Substring Suffix Selection

We study the following substring suffix selection problem: given a substring of a string T of length n, compute its k-th lexicographically smallest suffix. This a natural generalization of the well-known question of computing the maximal suffix of a string, which is a basic ingredient in many other problems. We first revisit two special cases of the problem, introduced by Babenko, Kolesnichenko and Starikovskaya [CPM'13], in which we are asked to compute the minimal non-empty and the maximal suffixes of a substring. For the maximal suffixes problem, we give a linear-space structure with O(1) query time and linear preprocessing time, i.e., we manage to achieve optimal construction and optimal query time simultaneously. For the minimal suffix problem, we give a linear-space data structure with O(τ) query time and O(n log n / τ) preprocessing time, where 1 <= τ<= log n is a parameter of the data structure. As a sample application, we show that this data structure can be used to compute the Lyndon decomposition of any substring of T in O(k τ) time, where k is the number of distinct factors in the decomposition. Finally, we move to the general case of the substring suffix selection problem, where using any combinatorial properties seems more difficult. Nevertheless, we develop a linear-space data structure with O(log^{2+ε} n) query time.

preprint2012arXiv

A Note on Efficient Computation of All Abelian Periods in a String

We derive a simple efficient algorithm for Abelian periods knowing all Abelian squares in a string. An efficient algorithm for the latter problem was given by Cummings and Smyth in 1997. By the way we show an alternative algorithm for Abelian squares. We also obtain a linear time algorithm finding all `long' Abelian periods. The aim of the paper is a (new) reduction of the problem of all Abelian periods to that of (already solved) all Abelian squares which provides new insight into both connected problems.

Tomasz Kociumaka

What is connected

Connect this record

See the researcher in context

Building this map preview

39 published item(s)

An Improved Algorithm for The $k$-Dyck Edit Distance Problem

Computing Longest (Common) Lyndon Subsequences

Dynamic Suffix Array with Polylogarithmic Queries and Updates

Faster Pattern Matching under Edit Distance

The Dynamic k-Mismatch Problem

Towards a Definitive Compressibility Measure for Repetitive Sequences

Approximating longest common substring with $k$ mismatches: Theory and practice

Approximating Text-to-Pattern Hamming Distances

Circular Pattern Matching with $k$ Mismatches

Counting Distinct Patterns in Internal Dictionary Matching

Efficient Representation and Counting of Antipower Factors in Words

Improved Circular $k$-Mismatch Sketches

The Streaming k-Mismatch Problem: Tradeoffs between Space and Total Time

Time-Space Tradeoffs for Finding a Long Common Substring

Efficient Index for Weighted Sequences

Faster Longest Common Extension Queries in Strings over General Alphabets

Hardness of approximation for strip packing

Maximum Number of Distinct and Nonequivalent Nonstandard Squares in a Word

Minimal Suffix and Rotation of a Substring in Optimal Time

Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries

Optimal Dynamic Strings

Pattern Matching and Consensus Problems on Weighted Sequences and Profiles

Sparse Suffix Tree Construction in Optimal Time and Space

Approximating LZ77 via Small-Space Multiple-Pattern Matching

Fast Computation of Abelian Runs

On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals

Wavelet Trees Meet Suffix Trees

An LP-Rounding $2\sqrt{2}$ Approximation for Restricted Maximum Acyclic Subgraph

Approximating Upper Degree-Constrained Partial Orientations

Constant Factor Approximation for Capacitated k-Center with Outliers

Covering Problems for Partial Words and for Indeterminate Strings

On the String Consensus Problem and the Manhattan Sequence Consensus Problem

Sublinear Space Algorithms for the Longest Common Substring Problem

Fast Algorithm for Partial Covers in Words

Fast branching algorithm for Cluster Vertex Deletion

Faster deterministic Feedback Vertex Set

Order-Preserving Suffix Trees and Their Algorithmic Applications

Substring Suffix Selection

A Note on Efficient Computation of All Abelian Periods in a String