Source author record

Kevin Callahan-Coray

Kevin Callahan-Coray appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Distributed, Parallel, and Cluster Computing Artificial Intelligence cond-mat.dis-nn Emerging Technologies Machine Learning

Catalog footprint

What is connected

2works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Probabilistic Computers for MIMO Detection: From Sparsification to 2D Parallel Tempering

Probabilistic computers built from p-bits offer a promising path for combinatorial optimization, but the dense connectivity required by real-world problems scales poorly in hardware. Here, we address this through graph sparsification with auxiliary copy variables and demonstrate a fully on-chip parallel tempering solver on an FPGA. Targeting MIMO detection, a dense, NP-hard problem central to wireless communications, we fit 15 temperature replicas of a 128-node sparsified system (1,920 p-bits) entirely on-chip and achieve bit error rates significantly below conventional linear detectors. We report complete end-to-end solution times of 4.7 ms per instance, with all loading, sampling, readout, and verification overheads included. ASIC projections in 7 nm technology indicate about 90 MHz operation with less than 200 mW power dissipation, suggesting that massive parallelism across multiple chips could approach the throughput demands of next-generation wireless systems. However, sparsification introduces sensitivity to the copy-constraint strength. Employing Two-Dimensional Parallel Tempering (2D-PT), which exchanges replicas across both temperature and constraint dimensions, we demonstrate over 10X faster convergence without manual parameter tuning. These results establish an on-chip p-bit architecture and a scalable algorithmic framework for dense combinatorial optimization.

preprint2026arXiv

Stochastic Sparse Attention for Memory-Bound Inference

Autoregressive decoding becomes bandwidth-limited at long contexts, as generating each token requires reading all $n_k$ key and value vectors from KV cache. We present Stochastic Additive No-mulT Attention (SANTA), a method that sparsifies value-cache access by sampling $S \ll n_k$ indices from the post-softmax distribution and aggregates only those value rows. This yields an unbiased estimator of the post-softmax value aggregation while replacing value-stage multiply-accumulates with gather-and-add. We introduce stratified sampling to design variance-reduced, GPU-friendly variants, demonstrating $1.5\times$ decode-step attention kernel speedup over FlashInfer and FlashDecoding on an NVIDIA RTX 6000 Ada while matching baseline accuracy at 32k-token contexts. Finally, we propose Bernoulli $qK^\mathsf{T}$ sampling as a complementary technique to sparsify the score stage, reducing key-feature access through stochastic ternary queries. Both methods are orthogonal to upstream techniques such as ternary quantization, low-rank projections, and KV-cache compression. Together, they point toward sparse, multiplier-free, and energy-efficient inference. We open-source our kernels at: https://github.com/OPUSLab/SANTA.git

Kevin Callahan-Coray

What is connected

Connect this record

See the researcher in context

Building this map preview

2 published item(s)

Probabilistic Computers for MIMO Detection: From Sparsification to 2D Parallel Tempering

Stochastic Sparse Attention for Memory-Bound Inference