Source author record

Marcin Raniszewski

Marcin Raniszewski appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Data Structures and Algorithms

Catalog footprint

What is connected

6 works

1 topic

4 close collaborators


Preprint, 2016, arXiv

Rank and select: Another lesson learned

Rank and select queries on bitmaps are essential building blocks of many compressed data structures, including text indexes, spatial data structures supporting membership and range queries, compressed graphs, and more. Studied theoretically as early as the 1980s, these primitives have also been the subject of active research into their practical implementations over the last decade. We present a few novel rank/select variants, focusing mostly on speed, that obtain competitive space-time results in the compressed setting. Our findings can be summarized as follows: (i) no single rank/select solution works best on every kind of data (ours are optimized for concatenated bit arrays obtained from wavelet trees for real text datasets), (ii) it pays to handle blocks consisting of all 0 or all 1 bits efficiently, (iii) compressed select does not have to be significantly slower than compressed rank at comparable memory use.
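For readers unfamiliar with the two primitives, the following is a minimal textbook sketch of rank and select over an uncompressed bitmap with one cumulative counter per block. It is not one of the variants from the paper; the class name `RankSelectBitmap`, the `block_size` parameter and the list-of-ints bit representation are assumptions made purely for illustration.

```python
from bisect import bisect_left


class RankSelectBitmap:
    """Uncompressed bitmap with one cumulative 1-bit counter per block (illustrative only)."""

    def __init__(self, bits, block_size=64):
        self.bits = list(bits)          # each entry is 0 or 1
        self.block_size = block_size
        # cum[k] = number of 1 bits in bits[0 : k * block_size]
        self.cum = [0]
        for start in range(0, len(self.bits), block_size):
            self.cum.append(self.cum[-1] + sum(self.bits[start:start + block_size]))

    def rank1(self, i):
        """Number of 1 bits in bits[0:i]."""
        block = i // self.block_size
        start = block * self.block_size
        # Sampled count for whole blocks, then a short scan inside the block.
        return self.cum[block] + sum(self.bits[start:i])

    def select1(self, j):
        """Position of the j-th 1 bit (j >= 1), or -1 if there are fewer than j."""
        if j < 1 or j > self.cum[-1]:
            return -1
        # Locate the block containing the j-th 1 bit by binary search on the counters.
        block = bisect_left(self.cum, j) - 1
        remaining = j - self.cum[block]
        pos = block * self.block_size
        while True:
            remaining -= self.bits[pos]
            if remaining == 0:
                return pos
            pos += 1


bitmap = RankSelectBitmap([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block_size=4)
print(bitmap.rank1(7), bitmap.select1(4))
```

A block whose counter equals that of its predecessor is all zeros, and one whose counter grows by `block_size` is all ones; such blocks could in principle be answered without scanning, which is the kind of special case finding (ii) refers to.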

Preprint, 2016, arXiv

Suffix arrays with a twist

The suffix array is a classic full-text index, combining effectiveness with simplicity. We discuss three approaches aiming to improve its efficiency even more: changes to the navigation, changes to the data layout, and adding extra data. In short, we show that (i) how we search for the right interval boundary significantly impacts the overall search speed, (ii) a B-tree data layout easily wins over the standard one, (iii) the well-known idea of a lookup table for the prefixes of the suffixes can be refined using compression, (iv) caching prefixes of the suffixes in a helper array can offer a(nother) practical space-time tradeoff.

Preprint, 2016, arXiv

Two simple full-text indexes based on the suffix array

We propose two suffix-array-inspired full-text indexes. One, called SA-hash, augments the suffix array with a hash table to speed up pattern searches by significantly narrowing the search interval before the binary search phase. The other, called FBCSA, is a compact data structure, similar to Mäkinen's compact suffix array, but working on fixed-size blocks. Experiments on the Pizza & Chili 200 MB datasets show that SA-hash is about 2-3 times faster in pattern searches (counts) than the standard suffix array, at the price of requiring $0.2n$ to $1.1n$ bytes of extra space, where $n$ is the text length, and of setting a minimum pattern length. FBCSA is relatively fast in single cell accesses (a few times faster than related indexes at about the same or better compression), but not competitive if many consecutive cells are to be extracted. Still, for the task of extracting, e.g., 10 successive cells, its time-space relation remains attractive.
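The narrowing idea behind SA-hash can be sketched roughly as follows: a table maps every length-k prefix to the suffix array interval of suffixes starting with it, and the binary search then runs only inside that interval. This is an illustration under assumptions, not the authors' implementation: a Python `dict` stands in for the hash table, the suffix array is built by naive sorting, and the function names and the parameter `k` (the minimum pattern length) are hypothetical.

```python
def build_sa_with_prefix_table(text, k):
    """Return a suffix array plus a dict from each k-symbol prefix to its SA interval."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive construction
    table = {}
    for rank, pos in enumerate(sa):
        prefix = text[pos:pos + k]
        if len(prefix) < k:
            continue  # suffixes shorter than k cannot match a pattern of length >= k
        first = table[prefix][0] if prefix in table else rank
        table[prefix] = (first, rank + 1)  # half-open interval [first, rank + 1)
    return sa, table


def count_occurrences(text, sa, table, pattern, k):
    """Count pattern occurrences; the table narrows the interval before binary search."""
    if len(pattern) < k:
        raise ValueError("pattern shorter than the minimum length k")
    start, end = table.get(pattern[:k], (0, 0))
    m = len(pattern)

    def boundary(strict):
        # First index in [start, end) whose m-symbol window is >= pattern
        # (strict=False) or > pattern (strict=True).
        lo, hi = start, end
        while lo < hi:
            mid = (lo + hi) // 2
            window = text[sa[mid]:sa[mid] + m]
            if window < pattern or (strict and window == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return boundary(True) - boundary(False)


text = "banana"
sa, table = build_sa_with_prefix_table(text, k=2)
print(count_occurrences(text, sa, table, "ana", k=2))
```

The minimum pattern length mentioned in the abstract shows up here as the `ValueError`: a pattern shorter than `k` has no table entry to start from.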

Preprint, 2015, arXiv

A Bloom filter based semi-index on $q$-grams

We present a simple $q$-gram based semi-index, which makes it possible to search for a pattern typically in only a small fraction of text blocks. Several space-time tradeoffs are presented. Experiments on Pizza & Chili datasets show that our solution is up to three orders of magnitude faster than the semi-index of Claude et al. [CNPSTjda10] at a comparable space usage.
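One way to picture such a semi-index is sketched below: the text is cut into blocks, each block gets a Bloom filter over its $q$-grams, and a pattern is verified only in blocks whose filter contains all of the pattern's $q$-grams. This is a rough sketch under assumptions rather than the paper's design: the class names, the SHA-256 based hashing, the default sizes, and the `max_pattern_len` overlap used to cover occurrences that cross block boundaries are all hypothetical choices.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int (illustrative, not tuned)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


class QGramSemiIndex:
    def __init__(self, text, q=3, block_size=16, max_pattern_len=8,
                 num_bits=256, num_hashes=3):
        self.text, self.q = text, q
        self.block_size, self.max_pattern_len = block_size, max_pattern_len
        self.filters = []
        for start in range(0, len(text), block_size):
            # Extend the block so every occurrence starting in it is fully covered.
            chunk = text[start:start + block_size + max_pattern_len - 1]
            bf = BloomFilter(num_bits, num_hashes)
            for i in range(len(chunk) - q + 1):
                bf.add(chunk[i:i + q])
            self.filters.append(bf)

    def find(self, pattern):
        """Return all start positions of pattern, scanning only candidate blocks."""
        m = len(pattern)
        if not self.q <= m <= self.max_pattern_len:
            raise ValueError("pattern length outside the supported range")
        grams = [pattern[i:i + self.q] for i in range(m - self.q + 1)]
        hits = []
        for b, bf in enumerate(self.filters):
            if not all(g in bf for g in grams):
                continue  # block certainly holds no occurrence starting in it
            start = b * self.block_size
            limit = start + self.block_size + m - 1  # keeps start positions in-block
            pos = self.text.find(pattern, start, limit)
            while pos != -1:
                hits.append(pos)
                pos = self.text.find(pattern, pos + 1, limit)
        return hits


index = QGramSemiIndex("the quick brown fox jumps over the lazy dog and the quick cat")
print(index.find("quick"))
```

Bloom filters can report false positives but never false negatives, so a skipped block is safe to skip, while a candidate block still has to be scanned to confirm a match.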

Preprint, 2015, arXiv

FM-index for dummies

The FM-index is a celebrated compressed data structure for full-text pattern searching. After the first wave of interest in its theoretical developments, we can observe a surge of interest in practical FM-index variants in the last few years. These enhancements are often related to a bit-vector representation, augmented with an efficient rank-handling data structure. In this work, we propose a new, cache-friendly implementation of the rank primitive and advocate a very simple architecture of the FM-index, which trades compression ratio for speed. Experimental results show that our variants are 2-3 times faster than the fastest known ones, at the price of typically using 1.5-5 times more space.
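To show where the rank primitive enters, here is a minimal sketch of the standard FM-index counting query (backward search) in its simplest, space-hungry form. It is the textbook procedure, not the cache-friendly layout proposed in the paper; the function names, the `"\0"` sentinel and the fully tabulated `occ` arrays are assumptions chosen for brevity.

```python
def build_fm_index(text):
    """Build the C array and full occurrence tables for text (naive construction)."""
    text += "\0"  # sentinel; assumed absent from text and smaller than every symbol
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # symbol preceding each sorted suffix

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1

    # C[c] = number of symbols in text that are strictly smaller than c
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # occ[c][i] = number of occurrences of c in bwt[:i], i.e. rank of c at i
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (c == ch)
    return C, occ, len(bwt)


def count(index, pattern):
    """Number of occurrences of pattern, found by backward search."""
    C, occ, n = index
    lo, hi = 0, n  # current suffix array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]  # two rank queries per pattern symbol
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("banana")
print(count(index, "ana"))
```

Each pattern symbol costs two rank queries, which is why the speed of the rank structure dominates query time; practical variants replace the full `occ` tables with sampled counters and bit vectors.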

Preprint, 2014, arXiv

Sampling the suffix array with minimizers

Sampling (evenly) the suffixes from the suffix array is an old idea trading the pattern search time for reduced index space. A few years ago Claude et al. showed an alphabet sampling scheme allowing for more efficient pattern searches compared to the sparse suffix array, for long enough patterns. A drawback of their approach is the requirement that sought patterns need to contain at least one character from the chosen subalphabet. In this work we propose an alternative suffix sampling approach with only a minimum pattern length as a requirement, which seems more convenient in practice. Experiments show that our algorithm achieves competitive time-space tradeoffs on most standard benchmark data.
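As an illustration of minimizer-based sampling, the sketch below picks, in every window of `w` consecutive `k`-mers, the lexicographically smallest one (leftmost on ties) and keeps only the suffixes starting at the picked positions. How the paper defines and uses its minimizers may differ in detail; the function names and the parameters `k` and `w` are assumptions for illustration.

```python
def minimizer_positions(text, k, w):
    """Start positions of the smallest k-mer in each window of w consecutive k-mers."""
    num_kmers = len(text) - k + 1
    picked = set()
    for start in range(num_kmers - w + 1):
        # Ties are broken by position, so the leftmost smallest k-mer wins.
        best = min(range(start, start + w), key=lambda i: (text[i:i + k], i))
        picked.add(best)
    return sorted(picked)


def sampled_suffix_array(text, k, w):
    """Suffix array restricted to suffixes that start at minimizer positions."""
    return sorted(minimizer_positions(text, k, w), key=lambda i: text[i:])


print(minimizer_positions("abracadabra", k=2, w=3))
print(sampled_suffix_array("abracadabra", k=2, w=3))
```

Any substring spanning at least `w + k - 1` symbols covers a full window and therefore contains at least one sampled position, which is the reason such a scheme needs only a minimum pattern length rather than a requirement on which characters the pattern contains.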
