Source author record

Dominik Köppl

Dominik Köppl appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Information Retrieval math.AP

Catalog footprint

What is connected

15works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Computing Longest (Common) Lyndon Subsequences

Given a string $T$ with length $n$ whose characters are drawn from an ordered alphabet of size $σ$, its longest Lyndon subsequence is a longest subsequence of $T$ that is a Lyndon word. We propose algorithms for finding such a subsequence in $O(n^3)$ time with $O(n)$ space, or online in $O(n^3 σ)$ space and time. Our first result can be extended to find the longest common Lyndon subsequence of two strings of length $n$ in $O(n^4 σ)$ time using $O(n^3)$ space.

preprint2022arXiv

Computing NP-hard Repetitiveness Measures via MAX-SAT

Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, the computation of some of these measures is NP-hard, and straight-forward computation is infeasible for datasets of even small sizes. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional macro scheme, and the smallest size of a straight-line program. While a vast variety of implementations for heuristically computing approximations exist, exact computation of these measures has received little to no attention. In this paper, we present MAX-SAT formulations that provide the first non-trivial implementations for exact computation of smallest string attractors, smallest bidirectional macro schemes, and smallest straight-line programs. Computational experiments show that our implementations work for texts of length up to a few hundred for straight-line programs and bidirectional macro schemes, and texts even over a million for string attractors.

preprint2022arXiv

Computing the Parameterized Burrows--Wheeler Transform Online

Parameterized strings are a generalization of strings in that their characters are drawn from two different alphabets, where one is considered to be the alphabet of static characters and the other to be the alphabet of parameter characters. Two parameterized strings are a parameterized match if there is a bijection over all characters such that the bijection transforms one string to the other while keeping the static characters (i.e., it behaves as the identity on the static alphabet). Ganguly et al. [SODA 2017] proposed the parameterized Burrows--Wheeler transform (pBWT) as a variant of the Burrows--Wheeler transform for space-efficient parameterized pattern matching. In this paper, we propose an algorithm for computing the pBWT online by reading the characters of a given input string one-by-one from right to left. Our algorithm works in $O(|Π| \log n / \log \log n)$ amortized time for each input character, where $n$ and $Π$ denote the size of the input string and the alphabet of the parameter characters, respectively.

preprint2022arXiv

Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices

As nowadays Machine Learning (ML) techniques are generating huge data collections, the problem of how to efficiently engineer their storage and operations is becoming of paramount importance. In this article we propose a new lossless compression scheme for real-valued matrices which achieves efficient performance in terms of compression ratio and time for linear-algebra operations. Experiments show that, as a compressor, our tool is clearly superior to gzip and it is usually within 20% of xz in terms of compression ratio. In addition, our compressed format supports matrix-vector multiplications in time and space proportional to the size of the compressed representation, unlike gzip and xz that require the full decompression of the compressed matrix. To our knowledge our lossless compressor is the first one achieving time and space complexities which match the theoretical limit expressed by the $k$-th order statistical entropy of the input. To achieve further time/space reductions, we propose column-reordering algorithms hinging on a novel column-similarity score. Our experiments on various data sets of ML matrices show that, with a modest preprocessing time, our column reordering can yield a further reduction of up to 16% in the peak memory usage during matrix-vector multiplication. Finally, we compare our proposal against the state-of-the-art Compressed Linear Algebra (CLA) approach showing that ours runs always at least twice faster (in a multi-thread setting) and achieves better compressed space occupancy for most of the tested data sets. This experimentally confirms the provably effective theoretical bounds we show for our compressed-matrix approach.

preprint2022arXiv

Longest (Sub-)Periodic Subsequence

We present an algorithm computing the longest periodic subsequence of a string of length $n$ in $O(n^7)$ time with $O(n^4)$ words of space. We obtain improvements when restricting the exponents or extending the search allowing the reported subsequence to be subperiodic down to $O(n^3)$ time and $O(n^2)$ words of space.

preprint2021arXiv

PHONI: Streamed Matching Statistics with Multi-Genome References

Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, first, we can compute the matching statistics of several long patterns (such as whole human chromosomes) in parallel while still using a reasonable amount of RAM; second, we can compute matching statistics online with low latency and thus quickly recognize when a pattern becomes incompressible relative to the database.

preprint2020arXiv

Dynamic Path-Decomposed Tries

A keyword dictionary is an associative array whose keys are strings. Recent applications handling massive keyword dictionaries in main memory have a need for a space-efficient implementation. When limited to static applications, there are a number of highly-compressed keyword dictionaries based on the advancements of practical succinct data structures. However, as most succinct data structures are only efficient in the static case, it is still difficult to implement a keyword dictionary that is space efficient and dynamic. In this article, we propose such a keyword dictionary. Our main idea is to embrace the path decomposition technique, which was proposed for constructing cache-friendly tries. To store the path-decomposed trie in small memory, we design data structures based on recent compact hash trie representations. Experiments on real-world datasets reveal that our dynamic keyword dictionary needs up to 68% less space than the existing smallest ones, while achieving a relevant space-time tradeoff.

preprint2020arXiv

Grammar-compressed Self-index with Lyndon Words

We introduce a new class of straight-line programs (SLPs), named the Lyndon SLP, inspired by the Lyndon trees (Barcelo, 1990). Based on this SLP, we propose a self-index data structure of $O(g)$ words of space that can be built from a string $T$ in $O(n \lg n)$ expected time, retrieving the starting positions of all occurrences of a pattern $P$ of length $m$ in $O(m + \lg m \lg n + occ \lg g)$ time, where $n$ is the length of $T$, $g$ is the size of the Lyndon SLP for $T$, and $occ$ is the number of occurrences of $P$ in $T$.

preprint2020arXiv

In-Place Bijective Burrows-Wheeler Transforms

One of the most well-known variants of the Burrows-Wheeler transform (BWT) [Burrows and Wheeler, 1994] is the bijective BWT (BBWT) [Gil and Scott, arXiv 2012], which applies the extended BWT (EBWT) [Mantaci et al., TCS 2007] to the multiset of Lyndon factors of a given text. Since the EBWT is invertible, the BBWT is a bijective transform in the sense that the inverse image of the EBWT restores this multiset of Lyndon factors such that the original text can be obtained by sorting these factors in non-increasing order. In this paper, we present algorithms constructing or inverting the BBWT in-place using quadratic time. We also present conversions from the BBWT to the BWT, or vice versa, either (a) in-place using quadratic time, or (b) in the run-length compressed setting using $O(n \lg r / \lg \lg r)$ time with $O(r \lg n)$ bits of words, where $r$ is the sum of character runs in the BWT and the BBWT.

preprint2020arXiv

Space-Efficient Algorithms for Computing Minimal/Shortest Unique Substrings

Given a string $T$ of length $n$, a substring $u = T[i..j]$ of $T$ is called a shortest unique substring (SUS) for an interval $[s,t]$ if (a) $u$ occurs exactly once in $T$, (b) $u$ contains the interval $[s,t]$ (i.e. $i \leq s \leq t \leq j$), and (c) every substring $v$ of $T$ with $|v| < |u|$ containing $[s,t]$ occurs at least twice in $T$. Given a query interval $[s, t] \subset [1, n]$, the interval SUS problem is to output all the SUSs for the interval $[s,t]$. In this article, we propose a $4n + o(n)$ bits data structure answering an interval SUS query in output-sensitive $O(\mathit{occ})$ time, where $\mathit{occ}$ is the number of returned SUSs. Additionally, we focus on the point SUS problem, which is the interval SUS problem for $s = t$. Here, we propose a $\lceil (\log_2{3} + 1)n \rceil + o(n)$ bits data structure answering a point SUS query in the same output-sensitive time. We also propose space-efficient algorithms for computing the minimal unique substrings of $T$.

preprint2016arXiv

Lempel-Ziv Computation In Compressed Space (LZ-CICS)

We show that both the Lempel Ziv 77- and the 78-factorization of a text of length $n$ on an integer alphabet of size $σ$ can be computed in $O(n \lg \lg σ)$ time (linear time if we allow randomization) using $O(n \lg σ)$ bits of working space. Given that a compressed representation of the suffix tree is loaded into RAM, we can compute both factorizations in $O(n)$ time using $z \lg n + O(n)$ bits of space, where $z$ is the number of factors.

preprint2016arXiv

On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with $p$ processors. Given a static text of length $n$, we first show how to compute the suffix array interval of a given pattern of length $m$ in $O(\frac{m}{p}+ \lg p + \lg\lg p\cdot\lg\lg n)$ time for $p \le m$. For approximate pattern matching with $k$ differences or mismatches, we show how to compute all occurrences of a given pattern in $O(\frac{m^kσ^k}{p}\max\left(k,\lg\lg n\right)\!+\!(1+\frac{m}{p}) \lg p\cdot \lg\lg n + \text{occ})$ time, where $σ$ is the size of the alphabet and $p \le σ^k m^k$. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns $P$ and $P'$, we present a data structure for computing the interval of $PP'$ in $O(\lg\lg n)$ sequential time, or in $O(1+\lg_p\lg n)$ parallel time. All our data structures are of size $O(n)$ bits (in addition to the suffix array).

preprint2015arXiv

Efficiently Finding All Maximal $α$-gapped Repeats

For $α\geq 1$, an $α$-gapped repeat in a word $w$ is a factor $uvu$ of $w$ such that $|uv|\leq α|u|$; the two factors $u$ in such a repeat are called arms, while the factor $v$ is called gap. Such a repeat is called maximal if its arms cannot be extended simultaneously with the same symbol to the right or, respectively, to the left. In this paper we show that the number of maximal $α$-gapped repeats that may occur in a word is upper bounded by $18αn$. This allows us to construct an algorithm finding all the maximal $α$-gapped repeats of a word in $O(αn)$; this is optimal, in the worst case, as there are words that have $Θ(αn)$ maximal $α$-gapped repeats. Our techniques can be extended to get comparable results in the case of $α$-gapped palindromes, i.e., factors $uvu^\mathrm{T}$ with $|uv|\leq α|u|$.

preprint2015arXiv

Lempel Ziv Computation In Small Space (LZ-CISS)

For both the Lempel Ziv 77- and 78-factorization we propose algorithms generating the respective factorization using $(1+ε) n \lg n + O(n)$ bits (for any positive constant $ε\le 1$) working space (including the space for the output) for any text of size \$n\$ over an integer alphabet in $O(n / ε^{2})$ time.

preprint2013arXiv

Pseudodifferentialoperatoren mit nichtglatten Koeffizienten auf Mannigfaltigkeiten

A generalization of differential operators are pseudodifferential operators which are used for reasoning about partial differential equations with variable coefficients. A lot of useful properties about classical pseudodifferential operators are already discovered by the works of H. Kumano-Go and L. Hörmander. In this work, we like to transfer their ideas on those operators which symbols are merely Hölder-continous. We focus on the composition of operators, symbol smoothing and operators on manifolds.

Dominik Köppl

What is connected

Connect this record

See the researcher in context

Building this map preview

15 published item(s)

Computing Longest (Common) Lyndon Subsequences

Computing NP-hard Repetitiveness Measures via MAX-SAT

Computing the Parameterized Burrows--Wheeler Transform Online

Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices

Longest (Sub-)Periodic Subsequence

PHONI: Streamed Matching Statistics with Multi-Genome References

Dynamic Path-Decomposed Tries

Grammar-compressed Self-index with Lyndon Words

In-Place Bijective Burrows-Wheeler Transforms

Space-Efficient Algorithms for Computing Minimal/Shortest Unique Substrings

Lempel-Ziv Computation In Compressed Space (LZ-CICS)

On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Efficiently Finding All Maximal $α$-gapped Repeats

Lempel Ziv Computation In Small Space (LZ-CISS)

Pseudodifferentialoperatoren mit nichtglatten Koeffizienten auf Mannigfaltigkeiten