Source author record

Yingtao Jiang

Yingtao Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Distributed, Parallel, and Cluster Computing Hardware Architecture Machine Learning Other Computer Science quant-ph

Catalog footprint

What is connected

4works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Spectral Complex Autoencoder Pruning: A Fidelity-Guided Criterion for Extreme Structured Channel Compression

We propose Spectral Complex Autoencoder Pruning (SCAP), a reconstruction-based criterion that measures functional redundancy at the level of individual output channels. For each convolutional layer, we construct a complex interaction field by pairing the full multi-channel input activation as the real part with a single output-channel activation (spatially aligned and broadcast across input channels) as the imaginary part. We transform this complex field to the frequency domain and train a low-capacity autoencoder to reconstruct normalized spectra. Channels whose spectra are reconstructed with high fidelity are interpreted as lying close to a low-dimensional manifold captured by the autoencoder and are therefore more compressible; conversely, channels with low fidelity are retained as they encode information that cannot be compactly represented by the learned manifold. This yields an importance score (optionally fused with the filter L1 norm) that supports simple threshold-based pruning and produces a structurally consistent pruned network. On VGG16 trained on CIFAR-10, at a fixed threshold of 0.6, we obtain 90.11% FLOP reduction and 96.30% parameter reduction with an absolute Top-1 accuracy drop of 1.67% from a 93.44% baseline after fine-tuning, demonstrating that spectral reconstruction fidelity of complex interaction fields is an effective proxy for channel-level redundancy under aggressive compression.

preprint2025arXiv

One-Shot Structured Pruning of Quantum Neural Networks via $q$-Group Engineering and Quantum Geometric Metrics

Quantum neural networks (QNNs) suffer from severe gate-level redundancy, which hinders their deployment on noisy intermediate-scale quantum (NISQ) devices. In this work, we propose q-iPrune, a one-shot structured pruning framework grounded in the algebraic structure of $q$-deformed groups and task-conditioned quantum geometry. Unlike prior heuristic or gradient-based pruning methods, q-iPrune formulates redundancy directly at the gate level. Each gate is compared within an algebraically consistent subgroup using a task-conditioned $q$-overlap distance, which measures functional similarity through state overlaps on a task-relevant ensemble. A gate is removed only when its replacement by a subgroup representative provably induces a bounded deviation on all task observables. We establish three rigorous theoretical guarantees. First, we prove completeness of redundancy pruning: no gate that violates the prescribed similarity threshold is removed. Second, we show that the pruned circuit is functionally equivalent up to an explicit, task-conditioned error bound, with a closed-form dependence on the redundancy tolerance and the number of replaced gates. Third, we prove that the pruning procedure is computationally feasible, requiring only polynomial-time comparisons and avoiding exponential enumeration over the Hilbert space. To adapt pruning decisions to hardware imperfections, we introduce a noise-calibrated deformation parameter $λ$ that modulates the $q$-geometry and redundancy tolerance. Experiments on standard quantum machine learning benchmarks demonstrate that q-iPrune achieves substantial gate reduction while maintaining bounded task performance degradation, consistent with our theoretical guarantees.

preprint2021arXiv

Data Streaming and Traffic Gathering in Mesh-based NoC for Deep Neural Network Acceleration

The increasing popularity of deep neural network (DNN) applications demands high computing power and efficient hardware accelerator architecture. DNN accelerators use a large number of processing elements (PEs) and on-chip memory for storing weights and other parameters. As the communication backbone of a DNN accelerator, networks-on-chip (NoC) play an important role in supporting various dataflow patterns and enabling processing with communication parallelism in a DNN accelerator. However, the widely used mesh-based NoC architectures inherently cannot support the efficient one-to-many and many-to-one traffic largely existing in DNN workloads. In this paper, we propose a modified mesh architecture with a one-way/two-way streaming bus to speedup one-to-many (multicast) traffic, and the use of gather packets to support many-to-one (gather) traffic. The analysis of the runtime latency of a convolutional layer shows that the two-way streaming architecture achieves better improvement than the one-way streaming architecture for an Output Stationary (OS) dataflow architecture. The simulation results demonstrate that the gather packets can help to reduce the runtime latency up to 1.8 times and network power consumption up to 1.7 times, compared with the repetitive unicast method on modified mesh architectures supporting two-way streaming.

preprint2014arXiv

Co-Emulation of Scan-Chain Based Designs Utilizing SCE-MI Infrastructure

As the complexity of the scan algorithm is dependent on the number of design registers, large SoC scan designs can no longer be verified in RTL simulation unless partitioned into smaller sub-blocks. This paper proposes a methodology to decrease scan-chain verification time utilizing SCE-MI, a widely used communication protocol for emulation, and an FPGA-based emulation platform. A high-level (SystemC) testbench and FPGA synthesizable hardware transactor models are developed for the scan-chain ISCAS89 S400 benchmark circuit for high-speed communication between the host CPU workstation and the FPGA emulator. The emulation results are compared to other verification methodologies (RTL Simulation, Simulation Acceleration, and Transaction-based emulation), and found to be 82% faster than regular RTL simulation. In addition, the emulation runs in the MHz speed range, allowing the incorporation of software applications, drivers, and operating systems, as opposed to the Hz range in RTL simulation or sub-megahertz range as accomplished in transaction-based emulation. In addition, the integration of scan testing and acceleration/emulation platforms allows more complex DFT methods to be developed and tested on a large scale system, decreasing the time to market for products.