Source author record

J. Ian Munro

J. Ian Munro appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Computational Geometry Databases Discrete Mathematics

Catalog footprint

What is connected

19works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

On Huang and Wong's Algorithm for Generalized Binary Split Trees

Huang and Wong [1984] proposed a polynomial-time dynamic-programming algorithm for computing optimal generalized binary split trees. We show that their algorithm is incorrect. Thus, it remains open whether such trees can be computed in polynomial time. Spuler [1994] proposed modifying Huang and Wong's algorithm to obtain an algorithm for a different problem: computing optimal two-way-comparison search trees. We show that the dynamic program underlying Spuler's algorithm is not valid, in that it does not satisfy the necessary optimal-substructure property and its proposed recurrence relation is incorrect. It remains unknown whether the algorithm is guaranteed to compute a correct overall solution.

preprint2021arXiv

A Simple Algorithm for Optimal Search Trees with Two-Way Comparisons

We present a simple $O(n^4)$-time algorithm for computing optimal search trees with two-way comparisons. The only previous solution to this problem, by Anderson et al., has the same running time, but is significantly more complicated and is restricted to the variant where only successful queries are allowed. Our algorithm extends directly to solve the standard full variant of the problem, which also allows unsuccessful queries and for which no polynomial-time algorithm was previously known. The correctness proof of our algorithm relies on a new structural theorem for two-way-comparison search trees.

preprint2021arXiv

On the Cost of Unsuccessful Searches in Search Trees with Two-way Comparisons

Search trees are commonly used to implement access operations to a set of stored keys. If this set is static and the probabilities of membership queries are known in advance, then one can precompute an optimal search tree, namely one that minimizes the expected access cost. For a non-key query, a search tree can determine its approximate location by returning the inter-key interval containing the query. This is in contrast to other dictionary data structures, like hash tables, that only report a failed search. We address the question "what is the additional cost of determining approximate locations for non-key queries"? We prove that for two-way comparison trees this additional cost is at most 1. Our proof is based on a novel probabilistic argument that involves converting a search tree that does not identify non-key queries into a random tree that does.

preprint2021arXiv

Optimal Search Trees with 2-Way Comparisons

In 1971, Knuth gave an $O(n^2)$-time algorithm for the classic problem of finding an optimal binary search tree. Knuth's algorithm works only for search trees based on 3-way comparisons, while most modern computers support only 2-way comparisons (e.g., $<, \le, =, \ge$, and $>$). Until this paper, the problem of finding an optimal search tree using 2-way comparisons remained open -- poly-time algorithms were known only for restricted variants. We solve the general case, giving (i) an $O(n^4)$-time algorithm and (ii) an $O(n \log n)$-time additive-3 approximation algorithm. Also, for finding optimal binary split trees, we (iii) obtain a linear speedup and (iv) prove some previous work incorrect.

preprint2020arXiv

Space-Efficient Data Structures for Lattices

A lattice is a partially-ordered set in which every pair of elements has a unique meet (greatest lower bound) and join (least upper bound). We present new data structures for lattices that are simple, efficient, and nearly optimal in terms of space complexity. Our first data structure can answer partial order queries in constant time and find the meet or join of two elements in $O(n^{3/4})$ time, where $n$ is the number of elements in the lattice. It occupies $O(n^{3/2}\log n)$ bits of space, which is only a $Θ(\log n)$ factor from the $Θ(n^{3/2})$-bit lower bound for storing lattices. The preprocessing time is $O(n^2)$. This structure admits a simple space-time tradeoff so that, for any $c \in [\frac{1}{2}, 1]$, the data structure supports meet and join queries in $O(n^{1-c/2})$ time, occupies $O(n^{1+c}\log n)$ bits of space, and can be constructed in $O(n^2 + n^{1+3c/2})$ time. Our second data structure uses $O(n^{3/2}\log n)$ bits of space and supports meet and join in $O(d \frac{\log n}{\log d})$ time, where $d$ is the maximum degree of any element in the transitive reduction graph of the lattice. This structure is much faster for lattices with low-degree elements. This paper also identifies an error in a long-standing solution to the problem of representing lattices. We discuss the issue with this previous work.

preprint2016arXiv

Range Majorities and Minorities in Arrays

Karpinski and Nekrich (2008) introduced the problem of parameterized range majority, which asks us to preprocess a string of length $n$ such that, given the endpoints of a range, one can quickly find all the distinct elements whose relative frequencies in that range are more than a threshold $τ$. Subsequent authors have reduced their time and space bounds such that, when $τ$ is fixed at preprocessing time, we need either $O(n \log (1 / τ))$ space and optimal $O(1 / τ)$ query time or linear space and $O((1 / τ) \log \log σ)$ query time, where $σ$ is the alphabet size. In this paper we give the first linear-space solution with optimal $O(1 / τ)$ query time, even with variable $τ$ (i.e., specified with the query). For the case when $σ$ is polynomial on the computer word size, our space is optimally compressed according to the symbol frequencies in the string. Otherwise, either the compressed space is increased by an arbitrarily small constant factor or the time rises to any function in $(1/τ)\cdotω(1)$. We obtain the same results on the complementary problem of parameterized range minority introduced by Chan et al. (2015), who had achieved linear space and $O(1 / τ)$ query time with variable $τ$.

preprint2016arXiv

Space-Efficient Construction of Compressed Indexes in Deterministic Linear Time

We show that the compressed suffix array and the compressed suffix tree of a string $T$ can be built in $O(n)$ deterministic time using $O(n\logσ)$ bits of space, where $n$ is the string length and $σ$ is the alphabet size. Previously described deterministic algorithms either run in time that depends on the alphabet size or need $ω(n\log σ)$ bits of working space. Our result has immediate applications to other problems, such as yielding the first linear-time LZ77 and LZ78 parsing algorithms that use $O(n \logσ)$ bits.

preprint2015arXiv

Compressed Data Structures for Dynamic Sequences

We consider the problem of storing a dynamic string $S$ over an alphabet $Σ=\{\,1,\ldots,σ\,\}$ in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries: $\mathrm{access}(i,S)$ returns the $i$-th symbol in $S$, $\mathrm{rank}_a(i,S)$ counts how many times a symbol $a$ occurs among the first $i$ positions in $S$, and $\mathrm{select}_a(i,S)$ finds the position where a symbol $a$ occurs for the $i$-th time. We present the first fully-dynamic data structure for arbitrarily large alphabets that achieves optimal query times for all three operations and supports updates with worst-case time guarantees. Ours is also the first fully-dynamic data structure that needs only $nH_k+o(n\logσ)$ bits, where $H_k$ is the $k$-th order entropy and $n$ is the string length. Moreover our representation supports extraction of a substring $S[i..i+\ell]$ in optimal $O(\log n/\log\log n + \ell/\log_σn)$ time.

preprint2015arXiv

Dynamic Data Structures for Document Collections and Graphs

In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data structures that store and query the documents in compressed form. All previous compressed solutions to this problem rely on answering rank and select queries on a dynamic sequence of symbols. Because of the lower bound in [Fredman and Saks, 1989], answering rank queries presents a bottleneck in compressed dynamic indexing. In this paper we show how this lower bound can be circumvented using our new framework. We demonstrate that the gap between static and dynamic variants of the indexing problem can be almost closed. Our method is based on a novel framework for adding dynamism to static compressed data structures. Our framework also applies more generally to dynamizing other problems. We show, for example, how our framework can be applied to develop compressed representations of dynamic graphs and binary relations.

preprint2013arXiv

Dynamic Range Selection in Linear Space

Given a set $S$ of $n$ points in the plane, we consider the problem of answering range selection queries on $S$: that is, given an arbitrary $x$-range $Q$ and an integer $k > 0$, return the $k$-th smallest $y$-coordinate from the set of points that have $x$-coordinates in $Q$. We present a linear space data structure that maintains a dynamic set of $n$ points in the plane with real coordinates, and supports range selection queries in $O((\lg n / \lg \lg n)^2)$ time, as well as insertions and deletions in $O((\lg n / \lg \lg n)^2)$ amortized time. The space usage of this data structure is an $Θ(\lg n / \lg \lg n)$ factor improvement over the previous best result, while maintaining asymptotically matching query and update times. We also present a succinct data structure that supports range selection queries on a dynamic array of $n$ values drawn from a bounded universe.

preprint2013arXiv

Sorting under Partial Information (without the Ellipsoid Algorithm)

We revisit the well-known problem of sorting under partial information: sort a finite set given the outcomes of comparisons between some pairs of elements. The input is a partially ordered set P, and solving the problem amounts to discovering an unknown linear extension of P, using pairwise comparisons. The information-theoretic lower bound on the number of comparisons needed in the worst case is log e(P), the binary logarithm of the number of linear extensions of P. In a breakthrough paper, Jeff Kahn and Jeong Han Kim (J. Comput. System Sci. 51 (3), 390-399, 1995) showed that there exists a polynomial-time algorithm for the problem achieving this bound up to a constant factor. Their algorithm invokes the ellipsoid algorithm at each iteration for determining the next comparison, making it impractical. We develop efficient algorithms for sorting under partial information. Like Kahn and Kim, our approach relies on graph entropy. However, our algorithms differ in essential ways from theirs. Rather than resorting to convex programming for computing the entropy, we approximate the entropy, or make sure it is computed only once, in a restricted class of graphs, permitting the use of a simpler algorithm. Specifically, we present: - an O(n^2) algorithm performing O(log n log e(P)) comparisons; - an O(n^2.5) algorithm performing at most (1+ epsilon) log e(P) + O_epsilon (n) comparisons; - an O(n^2.5) algorithm performing O(log e(P)) comparisons. All our algorithms can be implemented in such a way that their computational bottleneck is confined in a preprocessing phase, while the sorting phase is completed in O(q) + O(n) time, where q denotes the number of comparisons performed.

preprint2013arXiv

Succinct data structures for representing equivalence classes

Given a partition of an n element set into equivalence classes, we consider time-space tradeoffs for representing it to support the query that asks whether two given elements are in the same equivalence class. This has various applications including for testing whether two vertices are in the same component in an undirected graph or in the same strongly connected component in a directed graph. We consider the problem in several models. -- Concerning labeling schemes where we assign labels to elements and the query is to be answered just by examining the labels of the queried elements (without any extra space): if each vertex is required to have a unique label, then we show that a label space of (\sum_{i=1}^n \lfloor {n \over i} \rfloor) is necessary and sufficient. In other words, \lg n + \lg \lg n + O(1) bits of space are necessary and sufficient for representing each of the labels. This slightly strengthens the known lower bound and is in contrast to the known necessary and sufficient bound of \lceil \lg n \rceil for the label length, if each vertex need not get a unique label. --Concerning succinct data structures for the problem when the n elements are to be uniquely assigned labels from label set {1, 2, ...n}, we first show that Θ(\sqrt n) bits are necessary and sufficient to represent the equivalence class information. This space includes the space for implicitly encoding the vertex labels. We can support the query in such a structure in O(\lg n) time in the standard word RAM model. We then develop structures resulting in one where the queries can be supported in constant time using O({\sqrt n} \lg n) bits of space. We also develop space efficient structures where union operation along with the equivalence query can be answered fast.

preprint2012arXiv

Dynamic Range Majority Data Structures

Given a set $P$ of coloured points on the real line, we study the problem of answering range $α$-majority (or "heavy hitter") queries on $P$. More specifically, for a query range $Q$, we want to return each colour that is assigned to more than an $α$-fraction of the points contained in $Q$. We present a new data structure for answering range $α$-majority queries on a dynamic set of points, where $α\in (0,1)$. Our data structure uses O(n) space, supports queries in $O((\lg n) / α)$ time, and updates in $O((\lg n) / α)$ amortized time. If the coordinates of the points are integers, then the query time can be improved to $O(\lg n / (α\lg \lg n) + (\lg(1/α))/α))$. For constant values of $α$, this improved query time matches an existing lower bound, for any data structure with polylogarithmic update time. We also generalize our data structure to handle sets of points in d-dimensions, for $d \ge 2$, as well as dynamic arrays, in which each entry is a colour.

preprint2012arXiv

Succinct Indices for Range Queries with applications to Orthogonal Range Maxima

We consider the problem of preprocessing $N$ points in 2D, each endowed with a priority, to answer the following queries: given a axis-parallel rectangle, determine the point with the largest priority in the rectangle. Using the ideas of the \emph{effective entropy} of range maxima queries and \emph{succinct indices} for range maxima queries, we obtain a structure that uses O(N) words and answers the above query in $O(\log N \log \log N)$ time. This is a direct improvement of Chazelle's result from FOCS 1985 for this problem -- Chazelle required $O(N/ε)$ words to answer queries in $O((\log N)^{1+ε})$ time for any constant $ε> 0$.

preprint2012arXiv

Succinct Posets

We describe an algorithm for compressing a partially ordered set, or \emph{poset}, so that it occupies space matching the information theory lower bound (to within lower order terms), in the worst case. Using this algorithm, we design a succinct data structure for representing a poset that, given two elements, can report whether one precedes the other in constant time. This is equivalent to succinctly representing the transitive closure graph of the poset, and we note that the same method can also be used to succinctly represent the transitive reduction graph. For an $n$ element poset, the data structure occupies $n^2/4 + o(n^2)$ bits, in the worst case, which is roughly half the space occupied by an upper triangular matrix. Furthermore, a slight extension to this data structure yields a succinct oracle for reachability in arbitrary directed graphs. Thus, using roughly a quarter of the space required to represent an arbitrary directed graph, reachability queries can be supported in constant time.

preprint2011arXiv

Succinct Representations of Permutations and Functions

We investigate the problem of succinctly representing an arbitrary permutation, π, on {0,...,n-1} so that π^k(i) can be computed quickly for any i and any (positive or negative) integer power k. A representation taking (1+ε) n lg n + O(1) bits suffices to compute arbitrary powers in constant time, for any positive constant ε<= 1. A representation taking the optimal \ceil{\lg n!} + o(n) bits can be used to compute arbitrary powers in O(lg n / lg lg n) time. We then consider the more general problem of succinctly representing an arbitrary function, f: [n] \rightarrow [n] so that f^k(i) can be computed quickly for any i and any integer power k. We give a representation that takes (1+ε) n lg n + O(1) bits, for any positive constant ε<= 1, and computes arbitrary positive powers in constant time. It can also be used to compute f^k(i), for any negative integer k, in optimal O(1+|f^k(i)|) time. We place emphasis on the redundancy, or the space beyond the information-theoretic lower bound that the data structure uses in order to support operations efficiently. A number of lower bounds have recently been shown on the redundancy of data structures. These lower bounds confirm the space-time optimality of some of our solutions. Furthermore, the redundancy of one of our structures "surpasses" a recent lower bound by Golynski [Golynski, SODA 2009], thus demonstrating the limitations of this lower bound.

preprint2010arXiv

Range Reporting for Moving Points on a Grid

In this paper we describe a new data structure that supports orthogonal range reporting queries on a set of points that move along linear trajectories on a $U\times U$ grid. The assumption that points lie on a $U\times U$ grid enables us to significantly decrease the query time in comparison to the standard kinetic model. Our data structure answers queries in $O(\sqrt{\log U/\log \log U}+k)$ time, where $k$ denotes the number of points in the answer. The above improves over the $Ω(\log n)$ lower bound that is valid in the infinite-precision kinetic model. The methods used in this paper could be also of independent interest.

preprint2010arXiv

Succinct Representations of Dynamic Strings

The rank and select operations over a string of length n from an alphabet of size $σ$ have been used widely in the design of succinct data structures. In many applications, the string itself need be maintained dynamically, allowing characters of the string to be inserted and deleted. Under the word RAM model with word size $w=Ω(\lg n)$, we design a succinct representation of dynamic strings using $nH_0 + o(n)\lgσ+ O(w)$ bits to support rank, select, insert and delete in $O(\frac{\lg n}{\lg\lg n}(\frac{\lg σ}{\lg\lg n}+1))$ time. When the alphabet size is small, i.e. when $σ= O(\polylog (n))$, including the case in which the string is a bit vector, these operations are supported in $O(\frac{\lg n}{\lg\lg n})$ time. Our data structures are more efficient than previous results on the same problem, and we have applied them to improve results on the design and construction of space-efficient text indexes.

preprint2009arXiv

An Efficient Algorithm for Partial Order Production

We consider the problem of partial order production: arrange the elements of an unknown totally ordered set T into a target partially ordered set S, by comparing a minimum number of pairs in T. Special cases include sorting by comparisons, selection, multiple selection, and heap construction. We give an algorithm performing ITLB + o(ITLB) + O(n) comparisons in the worst case. Here, n denotes the size of the ground sets, and ITLB denotes a natural information-theoretic lower bound on the number of comparisons needed to produce the target partial order. Our approach is to replace the target partial order by a weak order (that is, a partial order with a layered structure) extending it, without increasing the information theoretic lower bound too much. We then solve the problem by applying an efficient multiple selection algorithm. The overall complexity of our algorithm is polynomial. This answers a question of Yao (SIAM J. Comput. 18, 1989). We base our analysis on the entropy of the target partial order, a quantity that can be efficiently computed and provides a good estimate of the information-theoretic lower bound.

J. Ian Munro

What is connected

Connect this record

See the researcher in context

Building this map preview

19 published item(s)

On Huang and Wong's Algorithm for Generalized Binary Split Trees

A Simple Algorithm for Optimal Search Trees with Two-Way Comparisons

On the Cost of Unsuccessful Searches in Search Trees with Two-way Comparisons

Optimal Search Trees with 2-Way Comparisons

Space-Efficient Data Structures for Lattices

Range Majorities and Minorities in Arrays

Space-Efficient Construction of Compressed Indexes in Deterministic Linear Time

Compressed Data Structures for Dynamic Sequences

Dynamic Data Structures for Document Collections and Graphs

Dynamic Range Selection in Linear Space

Sorting under Partial Information (without the Ellipsoid Algorithm)

Succinct data structures for representing equivalence classes

Dynamic Range Majority Data Structures

Succinct Indices for Range Queries with applications to Orthogonal Range Maxima

Succinct Posets

Succinct Representations of Permutations and Functions

Range Reporting for Moving Points on a Grid

Succinct Representations of Dynamic Strings

An Efficient Algorithm for Partial Order Production