Source author record

Rajeev Raman

Rajeev Raman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Information Theory Logic in Computer Science math.IT Discrete Mathematics

Catalog footprint

What is connected

12works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2015arXiv

SEPIA: Search for Proofs Using Inferred Automata

This paper describes SEPIA, a tool for automated proof generation in Coq. SEPIA combines model inference with interactive theorem proving. Existing proof corpora are modelled using state-based models inferred from tactic sequences. These can then be traversed automatically to identify proofs. The SEPIA system is described and its performance evaluated on three Coq datasets. Our results show that SEPIA provides a useful complement to existing automated tactics in Coq.

preprint2015arXiv

Tree Compression with Top Trees Revisited

We revisit tree compression with top trees (Bille et al, ICALP'13) and present several improvements to the compressor and its analysis. By significantly reducing the amount of information stored and guiding the compression step using a RePair-inspired heuristic, we obtain a fast compressor achieving good compression ratios, addressing an open problem posed by Bille et al. We show how, with relatively small overhead, the compressed file can be converted into an in-memory representation that supports basic navigation operations in worst-case logarithmic time without decompression. We also show a much improved worst-case bound on the size of the output of top-tree compression (answering an open question posed in a talk on this algorithm by Weimann in 2012).

preprint2014arXiv

Mining State-Based Models from Proof Corpora

Interactive theorem provers have been used extensively to reason about various software/hardware systems and mathematical theorems. The key challenge when using an interactive prover is finding a suitable sequence of proof steps that will lead to a successful proof requires a significant amount of human intervention. This paper presents an automated technique that takes as input examples of successful proofs and infers an Extended Finite State Machine as output. This can in turn be used to generate proofs of new conjectures. Our preliminary experiments show that the inferred models are generally accurate (contain few false-positive sequences) and that representing existing proofs in such a way can be very useful when guiding new ones.

preprint2014arXiv

On Succinct Representations of Binary Trees

We observe that a standard transformation between \emph{ordinal} trees (arbitrary rooted trees with ordered children) and binary trees leads to interesting succinct binary tree representations. There are four symmetric versions of these transformations. Via these transformations we get four succinct representations of $n$-node binary trees that use $2n + n/(\log n)^{O(1)}$ bits and support (among other operations) navigation, inorder numbering, one of pre- or post-order numbering, subtree size and lowest common ancestor (LCA) queries. The ability to support inorder numbering is crucial for the well-known range-minimum query (RMQ) problem on an array $A$ of $n$ ordered values. While this functionality, and more, is also supported in $O(1)$ time using $2n + o(n)$ bits by Davoodi et al.'s (\emph{Phil. Trans. Royal Soc. A} \textbf{372} (2014)) extension of a representation by Farzan and Munro (\emph{Algorithmica} \textbf{6} (2014)), their \emph{redundancy}, or the $o(n)$ term, is much larger, and their approach may not be suitable for practical implementations. One of these transformations is related to the Zaks' sequence (S.~Zaks, \emph{Theor. Comput. Sci.} \textbf{10} (1980)) for encoding binary trees, and we thus provide the first succinct binary tree representation based on Zaks' sequence. Another of these transformations is equivalent to Fischer and Heun's (\emph{SIAM J. Comput.} \textbf{40} (2011)) \minheap\ structure for this problem. Yet another variant allows an encoding of the Cartesian tree of $A$ to be constructed from $A$ using only $O(\sqrt{n} \log n)$ bits of working space.

preprint2013arXiv

Encoding Range Minimum Queries

We consider the problem of encoding range minimum queries (RMQs): given an array A[1..n] of distinct totally ordered values, to pre-process A and create a data structure that can answer the query RMQ(i,j), which returns the index containing the smallest element in A[i..j], without access to the array A at query time. We give a data structure whose space usage is 2n + o(n) bits, which is asymptotically optimal for worst-case data, and answers RMQs in O(1) worst-case time. This matches the previous result of Fischer and Heun [SICOMP, 2011], but is obtained in a more natural way. Furthermore, our result can encode the RMQs of a random array A in 1.919n + o(n) bits in expectation, which is not known to hold for Fischer and Heun's result. We then generalize our result to the encoding range top-2 query (RT2Q) problem, which is like the encoding RMQ problem except that the query RT2Q(i,j) returns the indices of both the smallest and second-smallest elements of A[i..j]. We introduce a data structure using 3.272n+o(n) bits that answers RT2Qs in constant time, and also give lower bounds on the effective entropy} of RT2Q.

preprint2013arXiv

Random Access to Grammar Compressed Strings

Grammar based compression, where one replaces a long string by a small context-free grammar that generates the string, is a simple and powerful paradigm that captures many popular compression schemes. In this paper, we present a novel grammar representation that allows efficient random access to any character or substring without decompressing the string. Let $S$ be a string of length $N$ compressed into a context-free grammar $\mathcal{S}$ of size $n$. We present two representations of $\mathcal{S}$ achieving $O(\log N)$ random access time, and either $O(n\cdot α_k(n))$ construction time and space on the pointer machine model, or $O(n)$ construction time and space on the RAM. Here, $α_k(n)$ is the inverse of the $k^{th}$ row of Ackermann's function. Our representations also efficiently support decompression of any substring in $S$: we can decompress any substring of length $m$ in the same complexity as a single random access query and additional $O(m)$ time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern $P$ with at most $k$ errors in time $O(n(\min\{|P|k, k^4 + |P|\} + \log N) + occ)$, where $occ$ is the number of occurrences of $P$ in $S$. Finally, we generalize our results to navigation and other operations on grammar-compressed ordered trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy paths in grammars.

preprint2012arXiv

Encoding 2-D Range Maximum Queries

We consider the \emph{two-dimensional range maximum query (2D-RMQ)} problem: given an array $A$ of ordered values, to pre-process it so that we can find the position of the smallest element in the sub-matrix defined by a (user-specified) range of rows and range of columns. We focus on determining the \emph{effective} entropy of 2D-RMQ, i.e., how many bits are needed to encode $A$ so that 2D-RMQ queries can be answered \emph{without} access to $A$. We give tight upper and lower bounds on the expected effective entropy for the case when $A$ contains independent identically-distributed random values, and new upper and lower bounds for arbitrary $A$, for the case when $A$ contains few rows. The latter results improve upon previous upper and lower bounds by Brodal et al. (ESA 2010). In some cases we also give data structures whose space usage is close to the effective entropy and answer 2D-RMQ queries rapidly.

preprint2012arXiv

Succinct Indices for Range Queries with applications to Orthogonal Range Maxima

We consider the problem of preprocessing $N$ points in 2D, each endowed with a priority, to answer the following queries: given a axis-parallel rectangle, determine the point with the largest priority in the rectangle. Using the ideas of the \emph{effective entropy} of range maxima queries and \emph{succinct indices} for range maxima queries, we obtain a structure that uses O(N) words and answers the above query in $O(\log N \log \log N)$ time. This is a direct improvement of Chazelle's result from FOCS 1985 for this problem -- Chazelle required $O(N/ε)$ words to answer queries in $O((\log N)^{1+ε})$ time for any constant $ε> 0$.

preprint2011arXiv

Optimal Indexes for Sparse Bit Vectors

We consider the problem of supporting Rank() and Select() operations on a bit vector of length m with n 1 bits. The problem is considered in the succinct index model, where the bit vector is stored in "read-only" memory and an additional data structure, called the index, is created during pre-processing to help answer the above queries. We give asymptotically optimal density-sensitive trade-offs, involving both m and n, that relate the size of the index to the number of accesses to the bit vector (and processing time) needed to answer the above queries. The results are particularly interesting for the case where n = o(m).

preprint2011arXiv

Succinct Representations of Permutations and Functions

We investigate the problem of succinctly representing an arbitrary permutation, π, on {0,...,n-1} so that π^k(i) can be computed quickly for any i and any (positive or negative) integer power k. A representation taking (1+ε) n lg n + O(1) bits suffices to compute arbitrary powers in constant time, for any positive constant ε<= 1. A representation taking the optimal \ceil{\lg n!} + o(n) bits can be used to compute arbitrary powers in O(lg n / lg lg n) time. We then consider the more general problem of succinctly representing an arbitrary function, f: [n] \rightarrow [n] so that f^k(i) can be computed quickly for any i and any integer power k. We give a representation that takes (1+ε) n lg n + O(1) bits, for any positive constant ε<= 1, and computes arbitrary positive powers in constant time. It can also be used to compute f^k(i), for any negative integer k, in optimal O(1+|f^k(i)|) time. We place emphasis on the redundancy, or the space beyond the information-theoretic lower bound that the data structure uses in order to support operations efficiently. A number of lower bounds have recently been shown on the redundancy of data structures. These lower bounds confirm the space-time optimality of some of our solutions. Furthermore, the redundancy of one of our structures "surpasses" a recent lower bound by Golynski [Golynski, SODA 2009], thus demonstrating the limitations of this lower bound.

preprint2010arXiv

Optimal Trade-Off for Succinct String Indexes

Let s be a string whose symbols are solely available through access(i), a read-only operation that probes s and returns the symbol at position i in s. Many compressed data structures for strings, trees, and graphs, require two kinds of queries on s: select(c, j), returning the position in s containing the jth occurrence of c, and rank(c, p), counting how many occurrences of c are found in the first p positions of s. We give matching upper and lower bounds for this problem, improving the lower bounds given by Golynski [Theor. Comput. Sci. 387 (2007)] [PhD thesis] and the upper bounds of Barbay et al. [SODA 2007]. We also present new results in another model, improving on Barbay et al. [SODA 2007] and matching a lower bound of Golynski [SODA 2009]. The main contribution of this paper is to introduce a general technique for proving lower bounds on succinct data structures, that is based on the access patterns of the supported operations, abstracting from the particular operations at hand. For this, it may find application to other interesting problems on succinct data structures.

preprint2007arXiv

Succinct Indexable Dictionaries with Applications to Encoding $k$-ary Trees, Prefix Sums and Multisets

We consider the {\it indexable dictionary} problem, which consists of storing a set $S \subseteq \{0,...,m-1\}$ for some integer $m$, while supporting the operations of $\Rank(x)$, which returns the number of elements in $S$ that are less than $x$ if $x \in S$, and -1 otherwise; and $\Select(i)$ which returns the $i$-th smallest element in $S$. We give a data structure that supports both operations in O(1) time on the RAM model and requires ${\cal B}(n,m) + o(n) + O(\lg \lg m)$ bits to store a set of size $n$, where ${\cal B}(n,m) = \ceil{\lg {m \choose n}}$ is the minimum number of bits required to store any $n$-element subset from a universe of size $m$. Previous dictionaries taking this space only supported (yes/no) membership queries in O(1) time. In the cell probe model we can remove the $O(\lg \lg m)$ additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including: An information-theoretically optimal representation of a $k$-ary cardinal tree that supports standard operations in constant time, A representation of a multiset of size $n$ from $\{0,...,m-1\}$ in ${\cal B}(n,m+n) + o(n)$ bits that supports (appropriate generalizations of) $\Rank$ and $\Select$ operations in constant time, and A representation of a sequence of $n$ non-negative integers summing up to $m$ in ${\cal B}(n,m+n) + o(n)$ bits that supports prefix sum queries in constant time.

Rajeev Raman

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

SEPIA: Search for Proofs Using Inferred Automata

Tree Compression with Top Trees Revisited

Mining State-Based Models from Proof Corpora

On Succinct Representations of Binary Trees

Encoding Range Minimum Queries

Random Access to Grammar Compressed Strings

Encoding 2-D Range Maximum Queries

Succinct Indices for Range Queries with applications to Orthogonal Range Maxima

Optimal Indexes for Sparse Bit Vectors

Succinct Representations of Permutations and Functions

Optimal Trade-Off for Succinct String Indexes

Succinct Indexable Dictionaries with Applications to Encoding $k$-ary Trees, Prefix Sums and Multisets