Source author record

Yoav Etsion

Yoav Etsion appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Databases

Catalog footprint

What is connected

3works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2020arXiv

Semantic prefetching using forecast slices

Modern prefetchers identify memory access patterns in order to predict future accesses. However, many applications exhibit irregular access patterns that do not manifest spatio-temporal locality in the memory address space. Such applications usually do not fall under the scope of existing prefetching techniques, which observe only the stream of addresses dispatched by the memory unit but not the code flows that produce them. Similarly, temporal correlation prefetchers detect recurring relations between accesses, but do not track the chain of causality in program code that manifested the memory locality. Conversely, techniques that are code-aware are limited to the basic program functionality and are bounded by the machine depth. In this paper we show that contextual analysis of the code flows that generate memory accesses can detect recurring code patterns and expose their underlying semantics even for irregular access patterns. Moreover, program locality artifacts can be used to enhance the memory traversal code and predict future accesses. We present the semantic prefetcher that analyzes programs at run-time and learns their memory dependency chains and address calculation flows. The prefetcher then constructs forecast slices and injects them at key points to trigger timely prefetching of future contextually-related iterations. We show how this approach takes the best of both worlds, augmenting code injection with forecast functionality and relying on context-based temporal correlation of code slices. This combination allows us to overcome critical memory latencies that are currently not covered by any other prefetcher. Our evaluation of the semantic prefetcher using an industrial-grade, cycle-accurate x86 simulator shows that it improves performance by 24% on average over SPEC 2006 (outliers up to 3.7x), and 16% on average over SPEC 2017 (outliers up to 1.85x), using only ~6KB.

preprint2018arXiv

A neural network memory prefetcher using semantic locality

Accurate memory prefetching is paramount for processor performance, and modern processors employ various techniques to identify and prefetch different memory access patterns. While most modern prefetchers target spatio-temporal patterns by matching memory addresses that are accessed in close proximity (either in space or time), the recently proposed concept of semantic locality views locality as an artifact of the algorithmic level and searches for correlations between memory accesses and program state. While this approach was shown to be effective, capturing semantic locality requires significant associative learning capabilities. In this paper we utilize neural networks for this task. We leverage recent advances in machine learning to propose a neural network prefetcher. We show that by observing program context, this prefetcher can learn distinct memory access patterns that cannot be covered by other state-of-the-art prefetchers. We evaluate the neural network prefetcher over SPEC2006, Graph500, and several microbenchmarks. We show that the prefetcher can deliver an average speedup of 30% for SPEC2006 (up to 2.7x) and up to 4.6x over kernels. We also present a high-level design of our prefetcher, explore the power, energy and area limitations, and propose several optimizations for feasibility. We believe that this line of research can further improve the efficiency of such neural networks and allow harnessing them for additional micro-architectural predictions.

preprint2016arXiv

Flexible Caching in Trie Joins

Traditional algorithms for multiway join computation are based on rewriting the order of joins and combining results of intermediate subqueries. Recently, several approaches have been proposed for algorithms that are "worst-case optimal" wherein all relations are scanned simultaneously. An example is Veldhuizen's Leapfrog Trie Join (LFTJ). An important advantage of LFTJ is its small memory footprint, due to the fact that intermediate results are full tuples that can be dumped immediately. However, since the algorithm does not store intermediate results, recurring joins must be reconstructed from the source relations, resulting in excessive memory traffic. In this paper, we address this problem by incorporating caches into LFTJ. We do so by adopting recent developments on join optimization, tying variable ordering to tree decomposition. While the traditional usage of tree decomposition computes the result for each bag in advance, our proposed approach incorporates caching directly into LFTJ and can dynamically adjust the size of the cache. Consequently, our solution balances memory usage and repeated computation, as confirmed by our experiments over SNAP datasets.