Source author record

Yuxian Qiu

Yuxian Qiu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Distributed, Parallel, and Cluster Computing eess.SP Hardware Architecture Machine Learning

Catalog footprint

What is connected

3works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation

Quantization of deep neural networks (DNN) has been proven effective for compressing and accelerating DNN models. Data-free quantization (DFQ) is a promising approach without the original datasets under privacy-sensitive and confidential scenarios. However, current DFQ solutions degrade accuracy, need synthetic data to calibrate networks, and are time-consuming and costly. This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low computation and memory requirements. With the theoretical analysis of the second-order information of DNN task loss, we decompose and approximate the Hessian-based optimization objective into three diagonal sub-items, which have different areas corresponding to three dimensions of weight tensor: element-wise, kernel-wise, and output channel-wise. Then, we progressively compose sub-items and propose a novel data-free optimization objective in the discrete domain, minimizing Constrained Absolute Sum of Error (or CASE in short), which surprisingly does not need any dataset and is even not aware of network architecture. We also design an efficient algorithm without back-propagation to further reduce the computation complexity of the objective solver. Finally, without fine-tuning and synthetic datasets, SQuant accelerates the data-free quantization process to a sub-second level with >30% accuracy improvement over the existing data-free post-training quantization works, with the evaluated models under 4-bit quantization. We have open-sourced the SQuant framework at https://github.com/clevercool/SQuant.

preprint2020arXiv

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracies, sparse models often carry randomly-distributed weights, leading to irregular computations. Consequently, sparse models cannot achieve meaningful speedup on commodity hardware (e.g., GPU) built for dense matrix computations. As such, prior works usually modify or design completely new sparsity-optimized architectures for exploiting sparsity. We propose an algorithm-software co-designed pruning method that achieves latency speedups on existing dense architectures. Our work builds upon the insight that the matrix multiplication generally breaks the large matrix into multiple smaller tiles for parallel execution. We propose a tiling-friendly "tile-wise" sparsity pattern, which maintains a regular pattern at the tile level for efficient execution but allows for irregular, arbitrary pruning at the global scale to maintain the high accuracy. We implement and evaluate the sparsity pattern on GPU tensor core, achieving a 1.95x speedup over the dense model.

preprint2020arXiv

Ptolemy: Architecture Support for Robust Deep Learning

Deep learning is vulnerable to adversarial attacks, where carefully-crafted input perturbations could mislead a well-trained Deep Neural Network to produce incorrect results. Today's countermeasures to adversarial attacks either do not have capability to detect adversarial samples at inference time, or introduce prohibitively high overhead to be practical at inference time. We propose Ptolemy, an algorithm-architecture co-designed system that detects adversarial attacks at inference time with low overhead and high accuracy.We exploit the synergies between DNN inference and imperative program execution: an input to a DNN uniquely activates a set of neurons that contribute significantly to the inference output, analogous to the sequence of basic blocks exercised by an input in a conventional program. Critically, we observe that adversarial samples tend to activate distinctive paths from those of benign inputs. Leveraging this insight, we propose an adversarial sample detection framework, which uses canary paths generated from offline profiling to detect adversarial samples at runtime. The Ptolemy compiler along with the co-designed hardware enable efficient execution by exploiting the unique algorithmic characteristics. Extensive evaluations show that Ptolemy achieves higher or similar adversarial example detection accuracy than today's mechanisms with a much lower runtime (as low as 2%) overhead.

Yuxian Qiu

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

Ptolemy: Architecture Support for Robust Deep Learning