Source author record

Hjalte Wedel Vildhøj

Hjalte Wedel Vildhøj appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Computational Complexity Discrete Mathematics math.CO

Catalog footprint

What is connected

10works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2016arXiv

Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Given a static reference string $R$ and a source string $S$, a relative compression of $S$ with respect to $R$ is an encoding of $S$ as a sequence of references to substrings of $R$. Relative compression schemes are a classic model of compression and have recently proved very successful for compressing highly-repetitive massive data sets such as genomes and web-data. We initiate the study of relative compression in a dynamic setting where the compressed source string $S$ is subject to edit operations. The goal is to maintain the compressed representation compactly, while supporting edits and allowing efficient random access to the (uncompressed) source string. We present new data structures that achieve optimal time for updates and queries while using space linear in the size of the optimal relative compression, for nearly all combinations of parameters. We also present solutions for restricted and extended sets of updates. To achieve these results, we revisit the dynamic partial sums problem and the substring concatenation problem. We present new optimal or near optimal bounds for these problems. Plugging in our new results we also immediately obtain new bounds for the string indexing for patterns with wildcards problem and the dynamic text and static pattern matching problem.

preprint2015arXiv

Longest Common Extensions in Sublinear Space

The longest common extension problem (LCE problem) is to construct a data structure for an input string $T$ of length $n$ that supports LCE$(i,j)$ queries. Such a query returns the length of the longest common prefix of the suffixes starting at positions $i$ and $j$ in $T$. This classic problem has a well-known solution that uses $O(n)$ space and $O(1)$ query time. In this paper we show that for any trade-off parameter $1 \leq τ\leq n$, the problem can be solved in $O(\frac{n}τ)$ space and $O(τ)$ query time. This significantly improves the previously best known time-space trade-offs, and almost matches the best known time-space product lower bound.

preprint2014arXiv

A Suffix Tree Or Not A Suffix Tree?

In this paper we study the structure of suffix trees. Given an unlabeled tree $τ$ on $n$ nodes and suffix links of its internal nodes, we ask the question "Is $τ$ a suffix tree?", i.e., is there a string $S$ whose suffix tree has the same topological structure as $τ$? We place no restrictions on $S$, in particular we do not require that $S$ ends with a unique symbol. This corresponds to considering the more general definition of implicit or extended suffix trees. Such general suffix trees have many applications and are for example needed to allow efficient updates when suffix trees are built online. We prove that $τ$ is a suffix tree if and only if it is realized by a string $S$ of length $n-1$, and we give a linear-time algorithm for inferring $S$ when the first letter on each edge is known. This generalizes the work of I et al. [Discrete Appl. Math. 163, 2014].

preprint2014arXiv

Sublinear Space Algorithms for the Longest Common Substring Problem

Given $m$ documents of total length $n$, we consider the problem of finding a longest string common to at least $d \geq 2$ of the documents. This problem is known as the \emph{longest common substring (LCS) problem} and has a classic $O(n)$ space and $O(n)$ time solution (Weiner [FOCS'73], Hui [CPM'92]). However, the use of linear space is impractical in many applications. In this paper we show that for any trade-off parameter $1 \leq τ\leq n$, the LCS problem can be solved in $O(τ)$ space and $O(n^2/τ)$ time, thus providing the first smooth deterministic time-space trade-off from constant to linear space. The result uses a new and very simple algorithm, which computes a $τ$-additive approximation to the LCS in $O(n^2/τ)$ time and $O(1)$ space. We also show a time-space trade-off lower bound for deterministic branching programs, which implies that any deterministic RAM algorithm solving the LCS problem on documents from a sufficiently large alphabet in $O(τ)$ space must use $Ω(n\sqrt{\log(n/(τ\log n))/\log\log(n/(τ\log n)})$ time.

preprint2013arXiv

Fingerprints in Compressed Strings

The Karp-Rabin fingerprint of a string is a type of hash value that due to its strong properties has been used in many string algorithms. In this paper we show how to construct a data structure for a string $S$ of size $N$ compressed by a context-free grammar of size $n$ that answers fingerprint queries. That is, given indices $i$ and $j$, the answer to a query is the fingerprint of the substring $S[i,j]$. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get $O(\log N)$ query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get $O(\log \log N)$ query time. Hence, our data structures has the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time $O(\log N \log \lce)$ and $O(\log \lce \log\log \lce + \log\log N)$ for SLPs and Linear SLPs, respectively. Here, $\lce$ denotes the length of the LCE.

preprint2013arXiv

The Hardness of the Functional Orientation 2-Color Problem

We consider the Functional Orientation 2-Color problem, which was introduced by Valiant in his seminal paper on holographic algorithms [SIAM J. Comput., 37(5), 2008]. For this decision problem, Valiant gave a polynomial time holographic algorithm for planar graphs of maximum degree 3, and showed that the problem is NP-complete for planar graphs of maximum degree 10. A recent result on defective graph coloring by Corrêa et al. [Australas. J. Combin., 43, 2009] implies that the problem is already hard for planar graphs of maximum degree 8. Together, these results leave open the hardness question for graphs of maximum degree between 4 and 7. We close this gap by showing that the answer is always yes for arbitrary graphs of maximum degree 5, and that the problem is NP-complete for planar graphs of maximum degree 6. Moreover, for graphs of maximum degree 5, we note that a linear time algorithm for finding a solution exists.

preprint2013arXiv

Time-Space Trade-Offs for Longest Common Extensions

We revisit the longest common extension (LCE) problem, that is, preprocess a string $T$ into a compact data structure that supports fast LCE queries. An LCE query takes a pair $(i,j)$ of indices in $T$ and returns the length of the longest common prefix of the suffixes of $T$ starting at positions $i$ and $j$. We study the time-space trade-offs for the problem, that is, the space used for the data structure vs. the worst-case time for answering an LCE query. Let $n$ be the length of $T$. Given a parameter $τ$, $1 \leq τ\leq n$, we show how to achieve either $O(\infrac{n}{\sqrtτ})$ space and $O(τ)$ query time, or $O(\infrac{n}τ)$ space and $O(τ\log({|\LCE(i,j)|}/τ))$ query time, where $|\LCE(i,j)|$ denotes the length of the LCE returned by the query. These bounds provide the first smooth trade-offs for the LCE problem and almost match the previously known bounds at the extremes when $τ=1$ or $τ=n$. We apply the result to obtain improved bounds for several applications where the LCE problem is the computational bottleneck, including approximate string matching and computing palindromes. We also present an efficient technique to reduce LCE queries on two strings to one string. Finally, we give a lower bound on the time-space product for LCE data structures in the non-uniform cell probe model showing that our second trade-off is nearly optimal.

preprint2012arXiv

Sparse Suffix Tree Construction with Small Space

We consider the problem of constructing a sparse suffix tree (or suffix array) for $b$ suffixes of a given text $T$ of size $n$, using only $O(b)$ words of space during construction time. Breaking the naive bound of $Ω(nb)$ time for this problem has occupied many algorithmic researchers since a different structure, the (evenly spaced) sparse suffix tree, was introduced by K{ä}rkk{ä}inen and Ukkonen in 1996. While in the evenly spaced sparse suffix tree the suffixes considered must be evenly spaced in $T$, here there is no constraint on the locations of the suffixes. We show that the sparse suffix tree can be constructed in $O(n\log^2b)$ time. To achieve this we develop a technique, which may be of independent interest, that allows to efficiently answer $b$ longest common prefix queries on suffixes of $T$, using only $O(b)$ space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Furthermore, additional tradeoffs between the space usage and the construction time are given.

preprint2012arXiv

String Indexing for Patterns with Wildcards

We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $σ$ the size of the alphabet. We obtain the following results. - A linear space index with query time $O(m+σ^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $Θ(jn)$ in the worst case. - An index with query time $O(m+j+occ)$ using space $O(σ^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time. - A time-space trade-off, generalizing the index by Cole et al. [STOC 2004]. We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.

preprint2011arXiv

String Matching with Variable Length Gaps

We consider string matching with variable length gaps. Given a string $T$ and a pattern $P$ consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in $T$ that match $P$. This problem is a basic primitive in computational biology applications. Let $m$ and $n$ be the lengths of $P$ and $T$, respectively, and let $k$ be the number of strings in $P$. We present a new algorithm achieving time $O(n\log k + m +α)$ and space $O(m + A)$, where $A$ is the sum of the lower bounds of the lengths of the gaps in $P$ and $α$ is the total number of occurrences of the strings in $P$ within $T$. Compared to the previous results this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of $m$, $n$, $k$, $A$, and $α$. Our algorithm is surprisingly simple and straightforward to implement. We also present algorithms for finding and encoding the positions of all strings in $P$ for every match of the pattern.

Hjalte Wedel Vildhøj

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Dynamic Relative Compression, Dynamic Partial Sums, and Substring Concatenation

Longest Common Extensions in Sublinear Space

A Suffix Tree Or Not A Suffix Tree?

Sublinear Space Algorithms for the Longest Common Substring Problem

Fingerprints in Compressed Strings

The Hardness of the Functional Orientation 2-Color Problem

Time-Space Trade-Offs for Longest Common Extensions

Sparse Suffix Tree Construction with Small Space

String Indexing for Patterns with Wildcards

String Matching with Variable Length Gaps