Paper detail

Hash in a Flash: Hash Tables for Solid State Devices

In recent years, information retrieval algorithms have taken center stage for extracting important data in ever larger datasets. Advances in hardware technology have lead to the increasingly wide spread use of flash storage devices. Such devices have clear benefits over traditional hard drives in terms of latency of access, bandwidth and random access capabilities particularly when reading data. There are however some interesting trade-offs to consider when leveraging the advanced features of such devices. On a relative scale writing to such devices can be expensive. This is because typical flash devices (NAND technology) are updated in blocks. A minor update to a given block requires the entire block to be erased, followed by a re-writing of the block. On the other hand, sequential writes can be two orders of magnitude faster than random writes. In addition, random writes are degrading to the life of the flash drive, since each block can support only a limited number of erasures. TF-IDF can be implemented using a counting hash table. In general, hash tables are a particularly challenging case for the flash drive because this data structure is inherently dependent upon the randomness of the hash function, as opposed to the spatial locality of the data. This makes it difficult to avoid the random writes incurred during the construction of the counting hash table for TF-IDF. In this paper, we will study the design landscape for the development of a hash table for flash storage devices. We demonstrate how to effectively design a hash table with two related hash functions, one of which exhibits a data placement property with respect to the other. Specifically, we focus on three designs based on this general philosophy and evaluate the trade-offs among them along the axes of query performance, insert and update times and I/O time through an implementation of the TF-IDF algorithm.

preprint2012arXivOpen access

Signal facts

What is known right now

Open access5 authors3 topics

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.