Source author record

Gad M. Landau

Gad M. Landau appears in the imported research catalog. Authorship, coauthor, and topic links are available even though the profile has not yet been claimed by its owner.

Researcher (unclaimed source record)

Data Structures and Algorithms

Catalog footprint

What is connected

8 works

1 topic

4 close collaborators


Preprint, 2016, arXiv

String Cadences

We say a string has a cadence if a certain character is repeated at regular intervals, possibly with intervening occurrences of that character. We call the cadence anchored if the first interval must be the same length as the others. We give a sub-quadratic algorithm for determining whether a string has any cadence consisting of at least three occurrences of a character, and a nearly linear algorithm for finding all anchored cadences.
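To make the definition concrete, here is a minimal brute-force sketch. It is the naive quadratic baseline, not the paper's sub-quadratic or nearly linear algorithm, and its boundary conditions are an assumption: a cadence with period d starts at some position i < d (0-indexed), uses every position i, i+d, i+2d, ... that fits in the string, and all of those positions hold the same character; "anchored" is read as i = d - 1, so the gap before the first occurrence equals the period. The names `is_cadence`, `has_cadence`, and `anchored_cadences` are hypothetical.

```python
# Naive O(n^2) cadence search: a baseline sketch, NOT the paper's algorithm.
# Assumed definition (0-indexed): a cadence of period d starting at i uses
# positions i, i+d, i+2d, ... < n, all holding the same character, with i < d
# (no full period fits before the first occurrence); range() stops once the
# next step would leave the string. "Anchored" is assumed to mean i == d - 1.


def is_cadence(s: str, start: int, d: int, min_count: int = 3) -> bool:
    """True if start, start+d, ... form a cadence with at least min_count occurrences."""
    if d < 1 or not 0 <= start < d:
        return False
    positions = range(start, len(s), d)
    if len(positions) < max(min_count, 1):
        return False
    c = s[start]
    return all(s[p] == c for p in positions)


def has_cadence(s: str, min_count: int = 3) -> bool:
    """Try every period d and start i < d: O(n) work per d, O(n^2) in total."""
    n = len(s)
    return any(is_cadence(s, i, d, min_count) for d in range(1, n + 1) for i in range(d))


def anchored_cadences(s: str, min_count: int = 3) -> list[tuple[int, str]]:
    """All (period, character) pairs whose cadence starts exactly at position d - 1."""
    return [(d, s[d - 1]) for d in range(1, len(s) + 1) if is_cadence(s, d - 1, d, min_count)]
```

For example, `"abcabcabc"` has cadences of period 3 for each of `a`, `b`, and `c`, but under the assumed definition only the one for `c` (positions 2, 5, 8) is anchored.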

Preprint, 2015, arXiv

Longest Common Extensions in Trees

The longest common extension (LCE) of two indices in a string is the length of the longest identical substrings starting at these two indices. The LCE problem asks to preprocess a string into a compact data structure that supports fast LCE queries. In this paper we generalize the LCE problem to trees and suggest a few applications of LCE in trees to tries and XML databases. Given a labeled and rooted tree $T$ of size $n$, the goal is to preprocess $T$ into a compact data structure that supports the following LCE queries between subpaths and subtrees in $T$. Let $v_1$, $v_2$, $w_1$, and $w_2$ be nodes of $T$ such that $w_1$ and $w_2$ are descendants of $v_1$ and $v_2$, respectively.

- $\mathrm{LCE}_{PP}(v_1, w_1, v_2, w_2)$ (path-path LCE): return the longest common prefix of the paths $v_1 \leadsto w_1$ and $v_2 \leadsto w_2$.
- $\mathrm{LCE}_{PT}(v_1, w_1, v_2)$ (path-tree LCE): return the maximal path-path LCE of the path $v_1 \leadsto w_1$ and any path from $v_2$ to a descendant leaf.
- $\mathrm{LCE}_{TT}(v_1, v_2)$ (tree-tree LCE): return the maximal path-path LCE of any pair of paths from $v_1$ and $v_2$ to descendant leaves.

We present the first non-trivial bounds for supporting these queries. For $\mathrm{LCE}_{PP}$ queries, we present a linear-space solution with $O(\log^{*} n)$ query time. For $\mathrm{LCE}_{PT}$ queries, we present a linear-space solution with $O((\log\log n)^{2})$ query time, and complement this with a lower bound showing that any path-tree LCE structure of size $O(n \operatorname{polylog} n)$ must necessarily use $\Omega(\log\log n)$ time to answer queries. For $\mathrm{LCE}_{TT}$ queries, we present a time-space trade-off that, given any parameter $\tau$ with $1 \leq \tau \leq n$, leads to a solution with $O(n\tau)$ space and $O(n/\tau)$ query time. This is complemented with a reduction from the set intersection problem, implying that a fast linear-space solution is not likely to exist.
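As a point of reference for the path-path query, here is a minimal naive sketch that answers $\mathrm{LCE}_{PP}$ in time proportional to the path lengths by materializing both label sequences. It is not the paper's $O(\log^{*} n)$ structure. Assumptions: labels sit on nodes, a path $v \leadsto w$ includes both endpoints, the tree is given as hypothetical `parent` and `label` dictionaries (root's parent is `None`), and the query returns the length of the common prefix rather than the prefix itself.

```python
# Naive path-path LCE on a node-labeled rooted tree: a baseline sketch only.
# Assumes node labels, paths that include both endpoints, and a parent map
# whose root maps to None. Returns the LENGTH of the longest common prefix.


def path_labels(parent: dict, label: dict, v, w) -> list:
    """Node labels on the downward path v ~> w, where w is v or a descendant of v."""
    labels = []
    x = w
    while x != v:
        labels.append(label[x])
        x = parent[x]
        if x is None:  # walked past the root without meeting v
            raise ValueError(f"{w!r} is not a descendant of {v!r}")
    labels.append(label[v])
    labels.reverse()  # we collected labels bottom-up; flip to top-down order
    return labels


def lce_pp(parent: dict, label: dict, v1, w1, v2, w2) -> int:
    """Length of the longest common prefix of the label sequences v1 ~> w1 and v2 ~> w2."""
    p = path_labels(parent, label, v1, w1)
    q = path_labels(parent, label, v2, w2)
    k = 0
    while k < min(len(p), len(q)) and p[k] == q[k]:
        k += 1
    return k
```

With a root labeled `a` whose two children are both labeled `b`, each leading to a `c` node and then to leaves labeled `e` and `d` respectively, the two root-to-leaf paths `a b c e` and `a b c d` have a path-path LCE of 3.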

Preprint, 2014, arXiv

Binary Jumbled Pattern Matching on Trees and Tree-Like Structures

Binary jumbled pattern matching asks to preprocess a binary string $S$ in order to answer queries $(i,j)$ that ask for a substring of $S$ of length $i$ with exactly $j$ 1-bits. This problem naturally generalizes to vertex-labeled trees and graphs by replacing "substring" with "connected subgraph". In this paper, we give an $O(n^2 / \log^2 n)$-time solution for trees, matching the currently best bound for (the simpler problem of) strings. We also give an $O(g^{2/3} n^{4/3}/(\log n)^{4/3})$-time solution for strings that are compressed by a grammar of size $g$. This solution improves the known bounds when the string is compressible under many popular compression schemes. Finally, we prove that the problem is fixed-parameter tractable with respect to the treewidth $w$ of the graph, thus improving the previous best $n^{O(w)}$ algorithm [ICALP'07].
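For strings, the standard starting point is the observation that sliding a length-$i$ window one step changes its number of 1-bits by at most one, so every count between the minimum and the maximum over length-$i$ windows is attained. A minimal sketch of the resulting quadratic-time index follows. It is the textbook baseline, not the paper's tree or grammar-compressed algorithm, and the function names are hypothetical.

```python
# Quadratic-time binary jumbled pattern matching index for strings (baseline sketch).
# For each length i we store the min and max number of 1-bits over all windows of
# length i; a query (i, j) is answered in O(1) by checking lo[i] <= j <= hi[i].


def build_jumbled_index(s: str) -> tuple[list[int], list[int]]:
    """Return (lo, hi) where lo[i] / hi[i] are the min / max 1-counts of length-i windows."""
    n = len(s)
    prefix = [0]  # prefix[k] = number of '1' characters in s[:k]
    for ch in s:
        prefix.append(prefix[-1] + (ch == "1"))
    lo, hi = [0] * (n + 1), [0] * (n + 1)
    for i in range(1, n + 1):
        counts = [prefix[k + i] - prefix[k] for k in range(n - i + 1)]
        lo[i], hi[i] = min(counts), max(counts)
    return lo, hi


def has_jumbled_match(index: tuple[list[int], list[int]], i: int, j: int) -> bool:
    """True if some substring of length i has exactly j 1-bits.

    Correct because adjacent windows differ by at most one 1-bit, so all counts
    between lo[i] and hi[i] occur.
    """
    lo, hi = index
    return 0 <= i < len(lo) and lo[i] <= j <= hi[i]
```

For `S = "0110100"`, the length-3 windows contain between one and two 1-bits, so the query $(3, 2)$ succeeds while $(3, 0)$ and $(3, 3)$ fail.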

Preprint, 2014, arXiv

Binary Jumbled Pattern Matching via All-Pairs Shortest Paths

In binary jumbled pattern matching we wish to preprocess a binary string $S$ in order to answer queries $(i,j)$ that ask for a substring of $S$ of size $i$ with exactly $j$ 1-bits. The problem naturally generalizes to node-labeled trees and graphs by replacing "substring" with "connected subgraph". In this paper, we give an $n^2/2^{\Omega((\log n/\log\log n)^{1/2})}$-time solution for both strings and trees. This odd-looking time complexity improves the state-of-the-art $O(n^2/\log^2 n)$ solutions by more than any polylogarithmic factor. It originates from the recent seminal algorithm of Williams for min-plus matrix multiplication. We obtain the result by giving a black-box reduction from trees to strings. This is then combined with a reduction from strings to min-plus matrix multiplication.
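The reduction above ends in min-plus matrix multiplication, where $C[i][j] = \min_k (A[i][k] + B[k][j])$. A minimal sketch of the naive cubic product is shown only to fix the definition; the paper relies on Williams's much faster algorithm, and the reduction from strings to this product is not reproduced here. The function name `min_plus` is hypothetical, and missing entries are modeled as `math.inf`.

```python
import math

# Naive O(n * m * p) min-plus (tropical) matrix product: C[i][j] = min_k A[i][k] + B[k][j].
# Illustrates the operation only; it is NOT Williams's sub-cubic algorithm.


def min_plus(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("inner dimensions do not match")
    C = [[math.inf] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            if a == math.inf:  # an infinite entry can never improve a minimum
                continue
            for j in range(p):
                v = a + B[k][j]
                if v < C[i][j]:
                    C[i][j] = v
    return C
```

For instance, multiplying `[[0, 3], [2, 0]]` by `[[0, 1], [5, 0]]` in the min-plus sense gives `[[0, 1], [2, 0]]`.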

Preprint, 2014, arXiv

Tree Compression with Top Trees

We introduce a new compression scheme for labeled trees based on top trees. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast navigational queries directly on the compressed representation. We show that the new compression scheme achieves close to optimal worst-case compression, can compress exponentially better than DAG compression, is never much worse than DAG compression, and supports navigational queries in logarithmic time.
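For contrast with top-tree compression, the classical DAG compression mentioned above can be sketched by hash-consing: identical rooted subtrees receive the same identifier, so each distinct subtree is stored once. This sketch shows only that baseline, not the top-tree scheme, and it assumes a hypothetical representation of an ordered labeled tree as a `(label, [children])` pair.

```python
# DAG compression of an ordered labeled tree via hash-consing (baseline sketch).
# A tree is assumed to be a (label, [children]) pair. Identical rooted subtrees
# map to one id, so len(table) is the number of nodes in the minimal DAG.


def dag_compress(tree) -> tuple[int, dict]:
    """Return (root_id, table) where table maps (label, child_ids) to a node id."""
    table: dict = {}

    def visit(node) -> int:
        label, children = node
        key = (label, tuple(visit(child) for child in children))
        if key not in table:
            table[key] = len(table)  # first time we see this subtree shape
        return table[key]

    return visit(tree), table
```

A tree `f(g(a), g(a))` has five nodes but only three distinct rooted subtrees, so its DAG has three nodes. Repeats that are not full rooted subtrees (for example, a repeated internal pattern with different children below it) are invisible to this scheme, which is the limitation the abstract refers to.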

Preprint, 2013, arXiv

Random Access to Grammar Compressed Strings

Grammar-based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string. Let $S$ be a string of length $N$ compressed into a context-free grammar $\mathcal{S}$ of size $n$. We present two representations of $\mathcal{S}$ achieving $O(\log N)$ random access time, and either $O(n\cdot \alpha_k(n))$ construction time and space on the pointer machine model, or $O(n)$ construction time and space on the RAM. Here, $\alpha_k(n)$ is the inverse of the $k$th row of Ackermann's function. Our representations also efficiently support decompression of any substring in $S$: we can decompress any substring of length $m$ in the same complexity as a single random access query plus $O(m)$ additional time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern $P$ with at most $k$ errors in time $O(n(\min\{|P|k, k^4 + |P|\} + \log N) + occ)$, where $occ$ is the number of occurrences of $P$ in $S$. Finally, we generalize our results to navigation and other operations on grammar-compressed ordered trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars.
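The basic idea behind random access without decompression can be sketched with a straight-line program in binary normal form: store the expansion length of every nonterminal, then descend from the start symbol, going left or right depending on where index $i$ falls. This simple descent takes time proportional to the grammar's height, which can be as large as $n$; the paper's representations are what bring this down to $O(\log N)$. The rule format (a one-character string for a terminal rule, a pair of symbols otherwise) and the function names are assumptions made for illustration.

```python
# Random access into a straight-line program (SLP) by top-down descent (baseline sketch).
# Assumed rule format: rules[X] is either a one-character string (terminal rule)
# or a pair (Y, Z) meaning X -> Y Z. Access costs O(height of the grammar).


def expansion_lengths(rules: dict) -> dict:
    """Length of the string each symbol expands to, computed with memoization."""
    lengths: dict = {}

    def length(x) -> int:
        if x not in lengths:
            r = rules[x]
            lengths[x] = 1 if isinstance(r, str) else length(r[0]) + length(r[1])
        return lengths[x]

    for symbol in rules:
        length(symbol)
    return lengths


def access(rules: dict, lengths: dict, root, i: int) -> str:
    """Return character i of the expansion of root without decompressing it."""
    if not 0 <= i < lengths[root]:
        raise IndexError(i)
    x = root
    while not isinstance(rules[x], str):
        left, right = rules[x]
        if i < lengths[left]:
            x = left
        else:
            i -= lengths[left]  # skip over the left part's characters
            x = right
    return rules[x]
```

With `X -> A B`, `Y -> X X`, `Z -> Y A` and terminal rules `A -> a`, `B -> b`, the start symbol `Z` expands to `"ababa"`, and `access` recovers each of its five characters by following a single root-to-leaf path.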

Preprint, 2012, arXiv

On Approximating String Selection Problems with Outliers

Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is, given a set S of same-length strings and a parameter d, to find a string x that maximizes the number of "non-outliers" within Hamming distance d of x. We prove this problem has no PTAS unless ZPP=NP, correcting a decade-old mistake. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of input strings so that the number of non-identical positions is at most k; we show it has no PTAS unless P=NP. We also observe that Closest to k Strings has no EPTAS unless W[1]=FPT. In sum, outliers help model problems associated with using biological data, but we show that finding approximate solutions to these problems is computationally difficult.
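To make the Close to Most Strings objective concrete, here is an exhaustive sketch that tries every candidate string over a given alphabet and counts how many inputs lie within Hamming distance d. It runs in time exponential in the string length, which is consistent with the hardness results above; it is purely illustrative, not an approximation algorithm from the paper, and the function names are hypothetical.

```python
from itertools import product

# Exhaustive search for Close to Most Strings (illustration of the objective only).
# Enumerates all |alphabet|^L candidates, so it is usable only for tiny inputs.


def hamming(a: str, b: str) -> int:
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def close_to_most_strings(strings: list[str], d: int, alphabet: str) -> tuple[str, int]:
    """Return (x, count): a string x maximizing #{s in strings : hamming(x, s) <= d}."""
    L = len(strings[0])
    if any(len(s) != L for s in strings):
        raise ValueError("all input strings must have the same length")
    best, best_count = "", -1
    for letters in product(alphabet, repeat=L):
        x = "".join(letters)
        count = sum(hamming(x, s) <= d for s in strings)
        if count > best_count:  # keep the first candidate achieving the best count
            best, best_count = x, count
    return best, best_count
```

For the inputs `0000`, `0001`, `0011`, `1111` with d = 1, no center can be within distance 1 of both `0000` and `1111`, so at most three strings are covered; the candidate `0001` achieves this and `1111` is treated as the outlier.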

Preprint, 2010, arXiv

Unified Compression-Based Acceleration of Edit-Distance Computation

The edit distance problem is a classical fundamental problem in computer science in general, and in combinatorial pattern matching in particular. The standard dynamic programming solution for this problem computes the edit distance between a pair of strings of total length $O(N)$ in $O(N^2)$ time. To date, this quadratic upper bound has never been substantially improved for general strings. However, there are known techniques for breaking this bound when the strings are known to compress well under a particular compression scheme. The basic idea is to first compress the strings, and then to compute the edit distance between the compressed strings. As it turns out, practically all known $o(N^2)$ edit-distance algorithms work, in some sense, under this paradigm. It is therefore natural to ask whether there is a single edit-distance algorithm that works for strings compressed under any compression scheme. A rephrasing of this question is to ask whether a single algorithm can exploit the compressibility properties of strings under any compression method, even if each string is compressed using a different compression. In this paper we set out to answer this question by using straight-line programs. These provide a generic platform for representing many popular compression schemes, including the LZ family, Run-Length Encoding, Byte-Pair Encoding, and dictionary methods. For two strings of total length $N$ having straight-line program representations of total size $n$, we present an algorithm running in $O(nN\log(N/n))$ time for computing the edit distance of these two strings under any rational scoring function, and an $O(n^{2/3}N^{4/3})$-time algorithm for arbitrary scoring functions. Our new result, while providing a significant speedup for highly compressible strings, does not surpass the quadratic time bound even in the worst-case scenario.
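The quadratic baseline that the compression-based algorithms aim to beat is the standard dynamic program over prefixes. A minimal sketch with unit costs for insertion, deletion, and substitution follows; unit costs are an assumption here, since the abstract covers general rational and arbitrary scoring functions. It keeps only two rows of the table, so it uses $O(N)$ space while still taking $O(N^2)$ time.

```python
# Standard O(|a| * |b|) edit-distance dynamic program with unit costs (baseline sketch).
# prev[j] holds the distance between the current prefix of a and b[:j].


def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to b[:j]
    for i, ca in enumerate(a, 1):
        cur = [i] + [0] * len(b)  # distance from a[:i] to the empty string is i
        for j, cb in enumerate(b, 1):
            cur[j] = min(
                prev[j] + 1,  # delete ca
                cur[j - 1] + 1,  # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            )
        prev = cur
    return prev[-1]
```

The classic example `edit_distance("kitten", "sitting")` evaluates to 3 (two substitutions and one insertion).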
