Source author record

Yinhe Han

Yinhe Han appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Hardware Architecture Computer Vision Distributed, Parallel, and Cluster Computing Emerging Technologies Machine Learning

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A System Architecture for Low Latency Multiprogramming Quantum Computing

As quantum systems scale, Multiprogramming Quantum Computing (MPQC) becomes essential to improve device utilization and throughput. However, current MPQC pipelines rely on expensive online compilation to co-optimize concurrently running programs, because quantum executables are device-dependent, non-portable across qubit regions, and highly susceptible to noise and crosstalk. This online step dominates runtime and impedes low-latency deployments for practical, real-world workloads in the future, such as repeatedly invoked Quantum Neural Network (QNN) services. We present FLAMENCO, a fidelity-aware multi-version compilation system that enables independent offline compilation and low-latency, high-fidelity multiprogramming at runtime. At the architecture level, FLAMENCO abstracts devices into compute units to drastically shrink the search space of region allocation. At compile time, it generates diverse executable versions for each program -- each bound to a distinct qubit region -- allowing dynamic region selection at runtime and overcoming non-portability. At runtime, FLAMENCO employs a streamlined orchestrator that leverages post-compilation fidelity metrics to avoid conflicts and mitigate crosstalk, achieving reliable co-execution without online co-optimization. Comprehensive evaluations against state-of-the-art MPQC baselines show that FLAMENCO removes online compilation overhead, achieves over 5$\times$ runtime speedup, improves execution fidelity, and maintains high utilization as concurrency increases.

preprint2022arXiv

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

Processing-in-memory (PIM) architectures have demonstrated great potential in accelerating numerous deep learning tasks. Particularly, resistive random-access memory (RRAM) devices provide a promising hardware substrate to build PIM accelerators due to their abilities to realize efficient in-situ vector-matrix multiplications (VMMs). However, existing PIM accelerators suffer from frequent and energy-intensive analog-to-digital (A/D) conversions, severely limiting their performance. This paper presents a new PIM architecture to efficiently accelerate deep learning tasks by minimizing the required A/D conversions with analog accumulation and neural approximated peripheral circuits. We first characterize the different dataflows employed by existing PIM accelerators, based on which a new dataflow is proposed to remarkably reduce the required A/D conversions for VMMs by extending shift and add (S+A) operations into the analog domain before the final quantizations. We then leverage a neural approximation method to design both analog accumulation circuits (S+A) and quantization circuits (ADCs) with RRAM crossbar arrays in a highly-efficient manner. Finally, we apply them to build an RRAM-based PIM accelerator (i.e., \textbf{Neural-PIM}) upon the proposed analog dataflow and evaluate its system-level performance. Evaluations on different benchmarks demonstrate that Neural-PIM can improve energy efficiency by 5.36x (1.73x) and speed up throughput by 3.43x (1.59x) without losing accuracy, compared to the state-of-the-art RRAM-based PIM accelerators, i.e., ISAAC (CASCADE).

preprint2020arXiv

Communication Lower Bound in Convolution Accelerators

In current convolutional neural network (CNN) accelerators, communication (i.e., memory access) dominates the energy consumption. This work provides comprehensive analysis and methodologies to minimize the communication for CNN accelerators. For the off-chip communication, we derive the theoretical lower bound for any convolutional layer and propose a dataflow to reach the lower bound. This fundamental problem has never been solved by prior studies. The on-chip communication is minimized based on an elaborate workload and storage mapping scheme. We in addition design a communication-optimal CNN accelerator architecture. Evaluations based on the 65nm technology demonstrate that the proposed architecture nearly reaches the theoretical minimum communication in a three-level memory hierarchy and it is computation dominant. The gap between the energy efficiency of our accelerator and the theoretical best value is only 37-87%.

preprint2020arXiv

Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction

Video prediction is a pixel-wise dense prediction task to infer future frames based on past frames. Missing appearance details and motion blur are still two major problems for current predictive models, which lead to image distortion and temporal inconsistency. In this paper, we point out the necessity of exploring multi-frequency analysis to deal with the two problems. Inspired by the frequency band decomposition characteristic of Human Vision System (HVS), we propose a video prediction network based on multi-level wavelet analysis to deal with spatial and temporal information in a unified manner. Specifically, the multi-level spatial discrete wavelet transform decomposes each video frame into anisotropic sub-bands with multiple frequencies, helping to enrich structural information and reserve fine details. On the other hand, multi-level temporal discrete wavelet transform which operates on time axis decomposes the frame sequence into sub-band groups of different frequencies to accurately capture multi-frequency motions under a fixed frame rate. Extensive experiments on diverse datasets demonstrate that our model shows significant improvements on fidelity and temporal consistency over state-of-the-art works.

Yinhe Han

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

A System Architecture for Low Latency Multiprogramming Quantum Computing

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

Communication Lower Bound in Convolution Accelerators

Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction