Source author record

Patrick Hagge Cording

Patrick Hagge Cording appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms

Catalog footprint

What is connected

6works

1topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2016arXiv

Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Given a static reference string $R$ and a source string $S$, a relative compression of $S$ with respect to $R$ is an encoding of $S$ as a sequence of references to substrings of $R$. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string $S$ is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.

preprint2016arXiv

Finger Search in Grammar-Compressed Strings

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. Given a grammar, the random access problem is to compactly represent the grammar while supporting random access, that is, given a position in the original uncompressed string report the character at that position. In this paper we study the random access problem with the finger search property, that is, the time for a random access query should depend on the distance between a specified index $f$, called the \emph{finger}, and the query index $i$. We consider both a static variant, where we first place a finger and subsequently access indices near the finger efficiently, and a dynamic variant where also moving the finger such that the time depends on the distance moved is supported. Let $n$ be the size the grammar, and let $N$ be the size of the string. For the static variant we give a linear space representation that supports placing the finger in $O(\log N)$ time and subsequently accessing in $O(\log D)$ time, where $D$ is the distance between the finger and the accessed index. For the dynamic variant we give a linear space representation that supports placing the finger in $O(\log N)$ time and accessing and moving the finger in $O(\log D + \log \log N)$ time. Compared to the best linear space solution to random access, we improve a $O(\log N)$ query bound to $O(\log D)$ for the static variant and to $O(\log D + \log \log N)$ for the dynamic variant, while maintaining linear space. As an application of our results we obtain an improved solution to the longest common extension problem in grammar compressed strings. To obtain our results, we introduce several new techniques of independent interest, including a novel van Emde Boas style decomposition of grammars.

preprint2015arXiv

Optimal Time Random Access to Grammar-Compressed Strings in Small Space

The random access problem for compressed strings is to build a data structure that efficiently supports accessing the character in position $i$ of a string given in compressed form. Given a grammar of size $n$ compressing a string of size $N$, we present a data structure using $O(nΔ\log_Δ\frac N n \log N)$ bits of space that supports accessing position $i$ in $O(\log_ΔN)$ time for $Δ\leq \log^{O(1)} N$. The query time is optimal for polynomially compressible strings, i.e., when $n=O(N^{1-ε})$.

preprint2014arXiv

Compact q-gram Profiling of Compressed Strings

We consider the problem of computing the q-gram profile of a string \str of size $N$ compressed by a context-free grammar with $n$ production rules. We present an algorithm that runs in $O(N-α)$ expected time and uses $O(n+q+\kq)$ space, where $N-α\leq qn$ is the exact number of characters decompressed by the algorithm and $\kq\leq N-α$ is the number of distinct q-grams in $\str$. This simultaneously matches the current best known time bound and improves the best known space bound. Our space bound is asymptotically optimal in the sense that any algorithm storing the grammar and the q-gram profile must use $Ω(n+q+\kq)$ space. To achieve this we introduce the q-gram graph that space-efficiently captures the structure of a string with respect to its q-grams, and show how to construct it from a grammar.

preprint2014arXiv

Compressed Subsequence Matching and Packed Tree Coloring

We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size $n$ compressing a string of size $N$ and a pattern string of size $m$ over an alphabet of size $σ$, our algorithm uses $O(n+\frac{nσ}{w})$ space and $O(n+\frac{nσ}{w}+m\log N\log w\cdot occ)$ or $O(n+\frac{nσ}{w}\log w+m\log N\cdot occ)$ time. Here $w$ is the word size and $occ$ is the number of occurrences of the pattern. Our algorithm uses less space than previous algorithms and is also faster for $occ=o(\frac{n}{\log N})$ occurrences. The algorithm uses a new data structure that allows us to efficiently find the next occurrence of a given character after a given position in a compressed string. This data structure in turn is based on a new data structure for the tree color problem, where the node colors are packed in bit strings.

preprint2013arXiv

Fingerprints in Compressed Strings

The Karp-Rabin fingerprint of a string is a type of hash value that due to its strong properties has been used in many string algorithms. In this paper we show how to construct a data structure for a string $S$ of size $N$ compressed by a context-free grammar of size $n$ that answers fingerprint queries. That is, given indices $i$ and $j$, the answer to a query is the fingerprint of the substring $S[i,j]$. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get $O(\log N)$ query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get $O(\log \log N)$ query time. Hence, our data structures has the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time $O(\log N \log \lce)$ and $O(\log \lce \log\log \lce + \log\log N)$ for SLPs and Linear SLPs, respectively. Here, $\lce$ denotes the length of the LCE.