Researcher profile

Hjalte Wedel Vildhøj

Hjalte Wedel Vildhøj contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2014arXiv

Sublinear Space Algorithms for the Longest Common Substring Problem

Given $m$ documents of total length $n$, we consider the problem of finding a longest string common to at least $d \geq 2$ of the documents. This problem is known as the \emph{longest common substring (LCS) problem} and has a classic $O(n)$ space and $O(n)$ time solution (Weiner [FOCS'73], Hui [CPM'92]). However, the use of linear space is impractical in many applications. In this paper we show that for any trade-off parameter $1 \leq τ\leq n$, the LCS problem can be solved in $O(τ)$ space and $O(n^2/τ)$ time, thus providing the first smooth deterministic time-space trade-off from constant to linear space. The result uses a new and very simple algorithm, which computes a $τ$-additive approximation to the LCS in $O(n^2/τ)$ time and $O(1)$ space. We also show a time-space trade-off lower bound for deterministic branching programs, which implies that any deterministic RAM algorithm solving the LCS problem on documents from a sufficiently large alphabet in $O(τ)$ space must use $Ω(n\sqrt{\log(n/(τ\log n))/\log\log(n/(τ\log n)})$ time.

preprint2013arXiv

Fingerprints in Compressed Strings

The Karp-Rabin fingerprint of a string is a type of hash value that due to its strong properties has been used in many string algorithms. In this paper we show how to construct a data structure for a string $S$ of size $N$ compressed by a context-free grammar of size $n$ that answers fingerprint queries. That is, given indices $i$ and $j$, the answer to a query is the fingerprint of the substring $S[i,j]$. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLP) we get $O(\log N)$ query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get $O(\log \log N)$ query time. Hence, our data structures has the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension problem in query time $O(\log N \log \lce)$ and $O(\log \lce \log\log \lce + \log\log N)$ for SLPs and Linear SLPs, respectively. Here, $\lce$ denotes the length of the LCE.

preprint2013arXiv

The Hardness of the Functional Orientation 2-Color Problem

We consider the Functional Orientation 2-Color problem, which was introduced by Valiant in his seminal paper on holographic algorithms [SIAM J. Comput., 37(5), 2008]. For this decision problem, Valiant gave a polynomial time holographic algorithm for planar graphs of maximum degree 3, and showed that the problem is NP-complete for planar graphs of maximum degree 10. A recent result on defective graph coloring by Corrêa et al. [Australas. J. Combin., 43, 2009] implies that the problem is already hard for planar graphs of maximum degree 8. Together, these results leave open the hardness question for graphs of maximum degree between 4 and 7. We close this gap by showing that the answer is always yes for arbitrary graphs of maximum degree 5, and that the problem is NP-complete for planar graphs of maximum degree 6. Moreover, for graphs of maximum degree 5, we note that a linear time algorithm for finding a solution exists.

preprint2013arXiv

Time-Space Trade-Offs for Longest Common Extensions

We revisit the longest common extension (LCE) problem, that is, preprocess a string $T$ into a compact data structure that supports fast LCE queries. An LCE query takes a pair $(i,j)$ of indices in $T$ and returns the length of the longest common prefix of the suffixes of $T$ starting at positions $i$ and $j$. We study the time-space trade-offs for the problem, that is, the space used for the data structure vs. the worst-case time for answering an LCE query. Let $n$ be the length of $T$. Given a parameter $τ$, $1 \leq τ\leq n$, we show how to achieve either $O(\infrac{n}{\sqrtτ})$ space and $O(τ)$ query time, or $O(\infrac{n}τ)$ space and $O(τ\log({|\LCE(i,j)|}/τ))$ query time, where $|\LCE(i,j)|$ denotes the length of the LCE returned by the query. These bounds provide the first smooth trade-offs for the LCE problem and almost match the previously known bounds at the extremes when $τ=1$ or $τ=n$. We apply the result to obtain improved bounds for several applications where the LCE problem is the computational bottleneck, including approximate string matching and computing palindromes. We also present an efficient technique to reduce LCE queries on two strings to one string. Finally, we give a lower bound on the time-space product for LCE data structures in the non-uniform cell probe model showing that our second trade-off is nearly optimal.

preprint2012arXiv

Sparse Suffix Tree Construction with Small Space

We consider the problem of constructing a sparse suffix tree (or suffix array) for $b$ suffixes of a given text $T$ of size $n$, using only $O(b)$ words of space during construction time. Breaking the naive bound of $Ω(nb)$ time for this problem has occupied many algorithmic researchers since a different structure, the (evenly spaced) sparse suffix tree, was introduced by K{ä}rkk{ä}inen and Ukkonen in 1996. While in the evenly spaced sparse suffix tree the suffixes considered must be evenly spaced in $T$, here there is no constraint on the locations of the suffixes. We show that the sparse suffix tree can be constructed in $O(n\log^2b)$ time. To achieve this we develop a technique, which may be of independent interest, that allows to efficiently answer $b$ longest common prefix queries on suffixes of $T$, using only $O(b)$ space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Furthermore, additional tradeoffs between the space usage and the construction time are given.

preprint2012arXiv

String Indexing for Patterns with Wildcards

We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $σ$ the size of the alphabet. We obtain the following results. - A linear space index with query time $O(m+σ^j \log \log n + occ)$. This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time $Θ(jn)$ in the worst case. - An index with query time $O(m+j+occ)$ using space $O(σ^{k^2} n \log^k \log n)$, where $k$ is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time. - A time-space trade-off, generalizing the index by Cole et al. [STOC 2004]. We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.

preprint2011arXiv

String Matching with Variable Length Gaps

We consider string matching with variable length gaps. Given a string $T$ and a pattern $P$ consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in $T$ that match $P$. This problem is a basic primitive in computational biology applications. Let $m$ and $n$ be the lengths of $P$ and $T$, respectively, and let $k$ be the number of strings in $P$. We present a new algorithm achieving time $O(n\log k + m +α)$ and space $O(m + A)$, where $A$ is the sum of the lower bounds of the lengths of the gaps in $P$ and $α$ is the total number of occurrences of the strings in $P$ within $T$. Compared to the previous results this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of $m$, $n$, $k$, $A$, and $α$. Our algorithm is surprisingly simple and straightforward to implement. We also present algorithms for finding and encoding the positions of all strings in $P$ for every match of the pattern.