Source author record

Abhishek Bhattacharjee

Abhishek Bhattacharjee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Hardware Architecture; Distributed, Parallel, and Cluster Computing; Operating Systems; Programming Languages

Catalog footprint: 3 works, 4 topics, 4 close collaborators

Preprint, 2023, arXiv

A Multi-Site Accelerator-Rich Processing Fabric for Scalable Brain-Computer Interfacing

Hull is an accelerator-rich distributed implantable Brain-Computer Interface (BCI) that reads biological neurons at data rates that are 2-3 orders of magnitude higher than the prior state of the art, while supporting many neuroscientific applications. Prior approaches have restricted brain interfacing to tens of megabits per second in order to meet two constraints necessary for effective operation and safe long-term implantation -- power dissipation under tens of milliwatts and response latencies in the tens of milliseconds. Hull also adheres to these constraints, but is able to interface with the brain at much higher data rates, thereby enabling, for the first time, BCI-driven research on and clinical treatment of brain-wide behaviors and diseases that require reading and stimulating many brain locations. Central to Hull's power efficiency is its realization as a distributed system of BCI nodes with accelerator-rich compute. Hull balances modular system layering with aggressive cross-layer hardware-software co-design to integrate compute, networking, and storage. The result is a lesson in designing networked distributed systems with hardware accelerators from the ground up.

Preprint, 2022, arXiv

Distill: Domain-Specific Compilation for Cognitive Models

This paper discusses our proposal and implementation of Distill, a domain-specific compilation tool based on LLVM to accelerate cognitive models. Cognitive models explain the process of cognitive function and offer a path to human-like artificial intelligence. However, cognitive modeling is laborious, requiring composition of many types of computational tasks, and suffers from poor performance as it relies on high-level languages like Python. In order to continue enjoying the flexibility of Python while achieving high performance, Distill uses domain-specific knowledge to compile Python-based cognitive models into LLVM IR, carefully stripping away features like dynamic typing and memory management that add overheads to the actual model. As we show, this permits significantly faster model execution. We also show that the code so generated enables using classical compiler data flow analysis passes to reveal properties about data flow in cognitive models that are useful to cognitive scientists. Distill is publicly available, is being used by researchers in cognitive science, and has led to patches that are currently being evaluated for integration into mainline LLVM.

Preprint, 2020, arXiv

SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

Virtual memory (VM) is critical to the usability and programmability of hardware accelerators. Unfortunately, implementing accelerator VM efficiently is challenging because the area and power constraints make it difficult to employ the large multi-level TLBs used in general-purpose CPUs. Recent research proposals advocate a number of restrictions on virtual-to-physical address mappings in order to reduce the TLB size or increase its reach. However, such restrictions are unattractive because they forgo many of the original benefits of traditional VM, such as demand paging and copy-on-write. We propose SPARTA, a divide and conquer approach to address translation. SPARTA splits the address translation into accelerator-side and memory-side parts. The accelerator-side translation hardware consists of a tiny TLB covering only the accelerator's cache hierarchy (if any), while the translation for main memory accesses is performed by shared memory-side TLBs. Performing the translation for memory accesses on the memory side allows SPARTA to overlap data fetch with translation, and avoids the replication of TLB entries for data shared among accelerators. To further improve the performance and efficiency of the memory-side translation, SPARTA logically partitions the memory space, delegating translation to small and efficient per-partition translation hardware. Our evaluation on index-traversal accelerators shows that SPARTA virtually eliminates translation overhead, reducing it by over 30x on average (up to 47x) and improving performance by 57%. At the same time, SPARTA requires minimal accelerator-side translation hardware, reduces the total number of TLB entries in the system, gracefully scales with memory size, and preserves all key VM functionalities.
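
The memory-side half of the split described in the SPARTA abstract can be pictured with a small software model. This is a rough sketch under stated assumptions, not the paper's design: the 4 KiB page size, the four partitions, the modulo rule in `partition_of`, the LRU replacement policy, and the dictionary standing in for a page-table walk are all hypothetical choices made for illustration. It shows only one idea from the abstract: each memory partition owns a small translation structure, so a page shared by several accelerators occupies a single entry in the system.

```python
from collections import OrderedDict

PAGE_BITS = 12       # assumption: 4 KiB pages
NUM_PARTITIONS = 4   # assumption: number of logical memory partitions


class SmallTLB:
    """Tiny LRU translation cache (hypothetical per-partition structure)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> ppn, oldest entry first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1
        ppn = page_table[vpn]  # dictionary lookup stands in for a page walk
        self.entries[vpn] = ppn
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return ppn


def partition_of(vpn):
    # Assumption: the partition is chosen from the virtual page number,
    # so a request can be routed before it has been translated.
    return vpn % NUM_PARTITIONS


class MemorySideTranslator:
    """Routes each request to the TLB of the partition that owns the page."""

    def __init__(self, page_table, entries_per_partition=8):
        self.page_table = page_table
        self.tlbs = [SmallTLB(entries_per_partition)
                     for _ in range(NUM_PARTITIONS)]

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        ppn = self.tlbs[partition_of(vpn)].lookup(vpn, self.page_table)
        return (ppn << PAGE_BITS) | offset


if __name__ == "__main__":
    page_table = {vpn: vpn + 100 for vpn in range(16)}  # made-up mappings
    mem = MemorySideTranslator(page_table)
    # Two accelerators touching the same page share one memory-side entry.
    for accelerator_id in range(2):
        mem.translate((5 << PAGE_BITS) | 0x2A)
    tlb = mem.tlbs[partition_of(5)]
    print(tlb.hits, tlb.misses)
```

The model leaves out the accelerator-side TLB and the overlap of data fetch with translation that the abstract mentions; both concern timing and cache coverage, which a functional sketch like this does not capture.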