Source author record

Zhehui Wang

Zhehui Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

physics.ins-det Hardware Architecture Neural and Evolutionary Computing physics.plasm-ph Artificial Intelligence astro-ph.GA astro-ph.HE astro-ph.IM astro-ph.SR Distributed, Parallel, and Cluster Computing Emerging Technologies Machine Learning math.AP nucl-ex physics.data-an quant-ph

Catalog footprint

What is connected

10 works

16 topics

4 close collaborators


preprint · 2022 · arXiv

A Resource-efficient Spiking Neural Network Accelerator Supporting Emerging Neural Encoding

Spiking neural networks (SNNs) have recently gained momentum due to their low-power, multiplication-free computation and their closer resemblance to biological processes in the human nervous system. However, for large models SNNs require very long spike trains (up to 1000 time steps) to reach accuracy similar to their artificial neural network (ANN) counterparts, which offsets their efficiency and hinders their use in low-power systems for real-world applications. To alleviate this problem, emerging neural encoding schemes have been proposed that shorten the spike train while maintaining high accuracy. However, current SNN accelerators do not support these emerging encoding schemes well. In this work, we present a novel hardware architecture that efficiently supports SNNs with emerging neural encoding. Our implementation features energy- and area-efficient processing units with increased parallelism and reduced memory accesses. We verified the accelerator on an FPGA and achieved 25% and 90% improvements over previous work in power consumption and latency, respectively. At the same time, its high area efficiency allows it to scale to large neural network models. To the best of our knowledge, this is the first work to deploy the large neural network model VGG on physical FPGA-based neuromorphic hardware.
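To illustrate why the choice of encoding drives spike-train length, here is a minimal sketch that contrasts plain rate coding with a radix-2 (binary) temporal code. Both encoders and all function names are illustrative assumptions. The abstract does not say which emerging scheme the accelerator supports. With rate coding, an 8-bit value needs up to 255 time steps to be represented exactly. The radix code needs only 8, because each time step carries half the weight of the one before it.

```python
# Illustrative encoders (assumed for this sketch, not the paper's exact scheme).

def rate_encode(value: int, max_value: int, steps: int) -> list[int]:
    """Deterministic rate code: spike count proportional to value / max_value."""
    n_spikes = round(value / max_value * steps)
    return [1] * n_spikes + [0] * (steps - n_spikes)


def rate_decode(train: list[int], max_value: int) -> float:
    """Recover the value from the firing rate over the whole train."""
    return sum(train) / len(train) * max_value


def radix_encode(value: int, bits: int) -> list[int]:
    """Radix-2 temporal code: one time step per bit, most significant bit first."""
    if not 0 <= value < 2 ** bits:
        raise ValueError("value does not fit in the given number of bits")
    return [(value >> (bits - 1 - t)) & 1 for t in range(bits)]


def radix_decode(train: list[int]) -> int:
    """Each later time step carries half the weight of the previous one."""
    value = 0
    for spike in train:
        value = (value << 1) | spike
    return value


x = 200  # an 8-bit activation
print(len(radix_encode(x, 8)), radix_decode(radix_encode(x, 8)))  # 8 200
print(len(rate_encode(x, 255, 255)), round(rate_decode(rate_encode(x, 255, 255), 255)))  # 255 200
```

Shortening the rate-coded train (for example, to 16 steps) still works, but it quantizes the value coarsely. That is the accuracy-versus-length trade-off the abstract describes.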

preprint · 2022 · arXiv

Efficiency Studies of Fast Neutron Tracking using MCNP

Fast neutron identification and spectroscopy are of great interest to nuclear physics experiments. Using neutron elastic scattering, the fast neutron momentum can be measured. Wang and Morris (2013) introduced the theoretical concept that the initial fast neutron momentum can be derived from up to three consecutive elastic collisions between the neutron and the target. The derivation can use either the two consecutive recoil ion tracks together with the vertex position of the third collision, or two consecutive elastic collisions with timing information. Here we also include the additional possibility of measuring the energies deposited by the recoil ions. In this paper, we simulate neutron elastic scattering using the Monte Carlo N-Particle Transport Code (MCNP) and study the corresponding neutron detection and tracking efficiency. The efficiency and the scattering distances are simulated for different target materials, especially natural silicon (92.23$\%$ $^{28}$Si, 4.67$\%$ $^{29}$Si, and 3.1$\%$ $^{30}$Si) and helium-4 ($^4$He). The collision timing and the recoil ion energy, both important for detector design, are also investigated. We also calculate the ion travel range at different energies using the software "The Stopping and Range of Ions in Matter" (SRIM). The results show that the ion track can be observed most conveniently in $^4$He unless sub-micron spatial resolution can be achieved in silicon.
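The momentum reconstruction rests on two-body elastic kinematics. Here is a minimal sketch that assumes non-relativistic scattering and approximates the target-to-neutron mass ratio by the mass number A. Under these assumptions, the recoil energy at lab recoil angle θ_R (measured from the incident neutron direction) is E_R = E_n · 4A/(1+A)² · cos²θ_R. This relation can be inverted for E_n once the recoil energy and angle are known. The function names are illustrative, and the multi-collision procedure of Wang and Morris (2013) is not reproduced. In a real detector the incident neutron direction is not known in advance, which is one reason the concept relies on consecutive collisions.

```python
import math

# Sketch only: non-relativistic elastic scattering, target mass ratio approximated by A.


def max_energy_fraction(A: float) -> float:
    """Largest fraction of neutron energy transferred in one head-on elastic collision."""
    return 4 * A / (1 + A) ** 2


def recoil_energy(E_n: float, A: float, theta_r: float) -> float:
    """Recoil ion energy for lab recoil angle theta_r (radians, from the neutron direction)."""
    return E_n * max_energy_fraction(A) * math.cos(theta_r) ** 2


def neutron_energy(E_r: float, A: float, theta_r: float) -> float:
    """Invert recoil_energy: incident neutron energy from recoil energy and angle."""
    c2 = math.cos(theta_r) ** 2
    if c2 == 0:
        raise ValueError("a 90-degree recoil carries no energy; E_n is undetermined")
    return E_r / (max_energy_fraction(A) * c2)


for name, A in [("4He", 4), ("28Si", 28)]:
    print(name, round(max_energy_fraction(A), 3))
```

The loop prints 0.64 for $^4$He and about 0.133 for $^{28}$Si. This is one way to see why a light target takes a much larger share of the neutron energy per collision.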

preprint · 2022 · arXiv

Plasma Image Classification Using Cosine Similarity Constrained CNN

Plasma jets are widely investigated both in the laboratory and in nature. Astrophysical objects such as black holes, active galactic nuclei, and young stellar objects commonly emit plasma jets in various forms. Data are now available from plasma jet experiments that resemble astrophysical plasma jets. Classifying such data could aid not only the investigation of the underlying physics of the experiments but also the study of astrophysical jets. In this work, we use deep learning to process all of the laboratory plasma images from the Caltech Spheromak Experiment, spanning two decades. We found that cosine similarity can aid feature selection, can classify images by comparing feature vector directions, and can be used as a loss function for training AlexNet for plasma image classification. We also develop a simple vector direction comparison algorithm for binary and multi-class classification. Using our algorithm, we demonstrate 93% accurate binary classification distinguishing unstable columns from stable columns. We also demonstrate 92% accurate five-way classification of a small, labeled data set that includes three classes corresponding to varying levels of kink instability.
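Classification "through comparison of feature vector direction" can be sketched as a nearest-direction rule. In this sketch, each class is represented by the normalized mean of its unit-length training feature vectors, and a query goes to the class with the highest cosine similarity. The prototype construction, the function names and the toy data are illustrative assumptions. The abstract does not spell out the authors' exact algorithm, and in the paper the features come from AlexNet rather than hand-written vectors.

```python
import math

# Hypothetical prototype-direction classifier (a sketch, not the paper's code).


def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("a zero vector has no direction")
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def class_directions(features_by_class: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Average the unit-normalized feature vectors of each class into one direction."""
    directions = {}
    for label, vectors in features_by_class.items():
        units = [normalize(v) for v in vectors]
        mean = [sum(column) / len(units) for column in zip(*units)]
        directions[label] = normalize(mean)
    return directions


def classify(feature: list[float], directions: dict[str, list[float]]) -> str:
    """Pick the class whose direction is closest in angle to the feature vector."""
    return max(directions, key=lambda label: cosine_similarity(feature, directions[label]))


# Toy 3-component "feature vectors", invented for illustration.
train = {
    "stable": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "unstable": [[0.1, 1.0, 0.3], [0.0, 0.8, 0.5]],
}
directions = class_directions(train)
print(classify([0.8, 0.1, 0.05], directions))  # stable
print(classify([0.05, 0.9, 0.4], directions))  # unstable
```

Because only directions are compared, the rule ignores feature magnitude. The same rule extends from binary to multi-class classification by adding more prototype directions.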

preprint · 2022 · arXiv

Stability of Bernstein type theorem for the minimal surface equation

Let $\Omega \subsetneq \mathbf{R}^n$ ($n \geq 2$) be an unbounded convex domain. We study the minimal surface equation in $\Omega$ with boundary values given by the sum of a linear function and a bounded, uniformly continuous function on $\mathbf{R}^n$. If $\Omega$ is not a half-space, we prove that the solution is unique. If $\Omega$ is a half-space, we prove that the graphs of all solutions form a foliation of $\Omega \times \mathbf{R}$. This can be viewed as a stability-type theorem for the Bernstein-type theorem of Edelen and Wang [EW2021]. We also establish a comparison principle for the minimal surface equation in $\Omega$.
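For reference, the boundary value problem described above can be written as follows. Here $u$ is the unknown function on $\Omega$. The symbols $\ell$ (the linear part of the boundary data) and $\varphi$ (the bounded, uniformly continuous part) are notation introduced for this sketch and do not appear in the abstract.

```latex
\operatorname{div}\left(\frac{\nabla u}{\sqrt{1+|\nabla u|^{2}}}\right) = 0 \quad \text{in } \Omega,
\qquad
u = \ell + \varphi \quad \text{on } \partial\Omega .
```

In this notation, the first result says the problem has exactly one solution when $\Omega$ is not a half-space. When $\Omega$ is a half-space, uniqueness can fail, but the graphs of the solutions foliate $\Omega \times \mathbf{R}$.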

preprint · 2021 · arXiv

Asynchronous and Load-Balanced Union-Find for Distributed and Parallel Scientific Data Visualization and Analysis

We present a novel distributed union-find algorithm that features asynchronous parallelism and k-d tree based load balancing for scalable visualization and analysis of scientific data. Applications of union-find include level set extraction and critical point tracking, but distributed union-find can suffer from high synchronization costs and imbalanced workloads across parallel processes. In this study, we prove that global synchronizations in existing distributed union-find can be eliminated without changing the final results, allowing communication and computation to overlap for scalable processing. We also use a k-d tree decomposition to redistribute inputs and improve load balancing. We benchmark the scalability of our algorithm with up to 1,024 processes using both synthetic and application data. We demonstrate the use of our algorithm for critical point tracking in high-speed imaging experiments and for super-level set extraction in fusion plasma simulations.
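The building block here is ordinary union-find. The sketch below is a minimal sequential version with path halving and union by rank. It is applied to one of the named uses: labeling the connected components of a super-level set $\{i : f_i \geq t\}$ on a 1D grid. The paper's contributions (asynchronous distributed merging and k-d tree load balancing) are not reproduced. The class, the function name and the toy signal are illustrative assumptions.

```python
# Sequential union-find sketch (not the paper's distributed, asynchronous algorithm).


class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Path halving: point every other node on the path to its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def superlevel_components(values: list[float], threshold: float) -> list[list[int]]:
    """Hypothetical helper: connected components of {i : values[i] >= threshold} on a 1D grid."""
    n = len(values)
    uf = UnionFind(n)
    inside = [v >= threshold for v in values]
    for i in range(n - 1):
        if inside[i] and inside[i + 1]:
            uf.union(i, i + 1)
    groups: dict[int, list[int]] = {}
    for i in range(n):
        if inside[i]:
            groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())


print(superlevel_components([0, 3, 4, 1, 5, 6, 0], 2))  # [[1, 2], [4, 5]]
```

In a distributed setting, each process would run this locally on its block and then merge labels across block boundaries. The synchronization cost of that merge is what the paper targets.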

preprint · 2021 · arXiv

E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs

Compiler frameworks are crucial for the widespread use of FPGA-based deep learning accelerators. They allow researchers and developers who are not familiar with hardware engineering to harness the performance attained by domain-specific logic. A variety of frameworks exist for conventional artificial neural networks. However, little research effort has gone into creating frameworks optimized for spiking neural networks (SNNs). This new generation of neural networks is becoming increasingly attractive for deploying AI on edge devices, which have tight power and resource constraints. Our end-to-end framework, E3NE, automates the generation of efficient SNN inference logic for FPGAs. Based on a PyTorch model and user parameters, it applies various optimizations and assesses the trade-offs inherent to spike-based accelerators. Multiple levels of parallelism and the use of an emerging neural encoding scheme make E3NE more efficient than previous SNN hardware implementations. For a similar model, E3NE uses less than 50% of the hardware resources and 20% less power, while reducing latency by an order of magnitude. Furthermore, its scalability and generality enabled the deployment of the large-scale SNN models AlexNet and VGG.
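Much of the hardware saving in spike-based accelerators comes from a simple fact. Because inputs are binary spikes, a neuron only needs to add the weights of the inputs that fired in the current time step, so no multiplications are needed. The sketch below shows a generic layer of integrate-and-fire (IF) neurons with reset by subtraction. It is a textbook model chosen for illustration, not the logic E3NE generates. The function name, threshold and weights are assumptions.

```python
# Generic integrate-and-fire layer (illustrative; not E3NE's generated hardware logic).


def if_layer(spike_trains: list[list[int]], weights: list[list[float]],
             threshold: float = 1.0) -> list[list[int]]:
    """Simulate IF neurons over time.

    spike_trains: one binary input vector per time step.
    weights: weights[j][i] connects input i to output neuron j.
    Returns one binary output vector per time step.
    """
    n_out = len(weights)
    potential = [0.0] * n_out
    outputs = []
    for spikes in spike_trains:
        out = []
        for j in range(n_out):
            # Accumulate only: each spiking input adds its weight (no multiplications).
            for i, s in enumerate(spikes):
                if s:
                    potential[j] += weights[j][i]
            if potential[j] >= threshold:
                out.append(1)
                potential[j] -= threshold  # reset by subtraction keeps the residue
            else:
                out.append(0)
        outputs.append(out)
    return outputs


print(if_layer([[1, 0], [1, 1], [0, 1]], [[0.6, 0.5], [0.2, -0.1]]))  # [[0, 0], [1, 0], [1, 0]]
```

Hardware frameworks typically parallelize the inner accumulation across neurons and inputs. The number of time steps, set by the encoding scheme, multiplies the total work.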

preprint · 2021 · arXiv

Neural Network for 3D ICF Shell Reconstruction from Single Radiographs

In inertial confinement fusion (ICF), X-ray radiography is a critical diagnostic for measuring implosion dynamics, and the radiographs contain rich 3D information. Traditional methods for reconstructing 3D volumes from 2D radiographs, such as filtered backprojection, require radiographs from at least two different angles or lines of sight (LOS). In ICF experiments, space for diagnostics is limited, and cameras that can operate on the required fast timescales are expensive to implement. Both constraints limit the number of projections that can be acquired. Convolutional neural networks (CNNs) offer a way to improve imaging quality under this limitation. They have recently been shown to produce 3D models from visible-light images, or from medical X-ray images rendered by volumetric computed tomography, using a single line of sight (SLOS). We propose a CNN to reconstruct 3D ICF spherical shells from single radiographs. We also examine the sensitivity of the 3D reconstruction to different illumination models using preprocessing techniques such as pseudo-flat fielding. To address the lack of 3D supervision, we show that training the CNN on synthetic radiographs produced by known simulation methods allows reconstruction of experimental data, as long as the experimental data are similar to the synthetic data. We also show that the CNN allows 3D reconstruction of shells that possess low-mode asymmetries. Further comparisons of the 3D reconstructions with direct multiple-LOS measurements are warranted.
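Training on synthetic radiographs requires a forward model. The simplest one, shown here as a sketch, is a parallel-beam, monochromatic Beer-Lambert projection through a uniform spherical shell. Each pixel's transmission is $\exp(-\mu L(b))$, where $L(b)$ is the chord length through the shell at impact parameter $b$. The uniform density, single attenuation coefficient `mu`, and the function names are simplifying assumptions. The paper's "known simulation methods" are not specified in the abstract and are likely more detailed.

```python
import math

# Simplified forward model (assumed): uniform shell, parallel beam, single attenuation coefficient.


def shell_chord_length(b: float, r_inner: float, r_outer: float) -> float:
    """Length of a straight ray at impact parameter b through a spherical shell."""
    if not 0 <= r_inner < r_outer:
        raise ValueError("need 0 <= r_inner < r_outer")
    b = abs(b)
    if b >= r_outer:
        return 0.0
    outer = 2 * math.sqrt(r_outer ** 2 - b ** 2)
    if b >= r_inner:
        return outer
    # The ray crosses the hollow interior, so subtract the inner chord.
    return outer - 2 * math.sqrt(r_inner ** 2 - b ** 2)


def radiograph(size: int, pixel: float, r_inner: float, r_outer: float, mu: float) -> list[list[float]]:
    """Transmission image exp(-mu * L) of a shell centered in a size x size grid."""
    c = (size - 1) / 2
    image = []
    for row in range(size):
        line = []
        for col in range(size):
            b = math.hypot((row - c) * pixel, (col - c) * pixel)
            line.append(math.exp(-mu * shell_chord_length(b, r_inner, r_outer)))
        image.append(line)
    return image


image = radiograph(9, 0.5, r_inner=1.0, r_outer=2.0, mu=0.5)
print([round(v, 2) for v in image[4]])  # central row of the synthetic radiograph
```

The chord length grows from $b = 0$ up to $b = r_{\text{inner}}$ and then falls to zero at $r_{\text{outer}}$. The darkest ring therefore sits at the inner radius, which gives shell radiographs their characteristic limb.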

preprint · 2020 · arXiv

EDCompress: Energy-Aware Model Compression for Dataflows

Edge devices demand low energy consumption, low cost, and a small form factor. To deploy convolutional neural network (CNN) models efficiently on edge devices, energy-aware model compression becomes extremely important. However, existing work has not studied this problem well because it does not consider the diversity of dataflow types in hardware architectures. In this paper, we propose EDCompress, an Energy-aware model compression method for various Dataflows. It can effectively reduce the energy consumption of various edge devices with different dataflow types. Given the nature of model compression procedures, we recast the optimization process as a multi-step problem and solve it with reinforcement learning algorithms. Experiments show that EDCompress improves energy efficiency by 20x, 17x, and 37x on VGG-16, MobileNet, and LeNet-5, respectively, with negligible loss of accuracy. EDCompress can also find the optimal dataflow type for a specific neural network in terms of energy consumption, which can guide the deployment of CNN models on hardware systems.

preprint · 2020 · arXiv

Monte Carlo Modeling and Design of Photon Energy Attenuation Layers (PALs) for 10-30x Quantum Yield Enhancement in Si-based Hard X-ray Detectors

High-energy (>20 keV) X-ray photon detection with high quantum yield, high spatial resolution and short response time has long been an important area of study in physics. Scintillation is a prevalent method but has several limitations. Directly detecting high-energy X-ray photons remains challenging, mainly because of low photon-to-photoelectron conversion efficiencies. Commercially available state-of-the-art Si direct-detection products, such as the Si charge-coupled device (CCD), are inefficient for photons above 10 keV. Here, we present Monte Carlo simulation results and analyses introducing a simple yet highly effective high-energy X-ray detection concept with significantly enhanced photon-to-electron conversion efficiency. The detector is composed of two layers: a top high-Z photon energy attenuation layer (PAL) and a bottom Si detector. We use the principle of photon energy down-conversion, in which high-energy X-ray photons are attenuated via inelastic scattering to energies at or below 10 keV, where photoelectric absorption by Si is efficient. Our Monte Carlo simulation results demonstrate that a 10-30x increase in quantum yield can be achieved using a PbTe PAL on Si, potentially advancing high-resolution, high-efficiency X-ray detection with PAL-enhanced Si CMOS image sensors.
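One familiar inelastic process that lowers photon energy is Compton scattering. The standard formula gives the scattered energy as $E' = E / [1 + (E / m_e c^2)(1 - \cos\theta)]$. The sketch below evaluates it for a free electron at rest. It is only an illustration of energy loss per scattering event and is an assumption on our part. The abstract does not break the PAL down-conversion into specific processes, and the paper's Monte Carlo, not this formula, quantifies the effect.

```python
import math

# Single Compton scatter off a free electron at rest (illustrative physics, not the paper's MC).
ELECTRON_REST_ENERGY_KEV = 510.99895  # m_e c^2


def compton_scattered_energy(e_kev: float, theta_rad: float) -> float:
    """Photon energy after one Compton scatter at angle theta_rad."""
    return e_kev / (1 + (e_kev / ELECTRON_REST_ENERGY_KEV) * (1 - math.cos(theta_rad)))


for e in (30.0, 100.0):
    print(e, "keV ->", round(compton_scattered_energy(e, math.pi), 1), "keV after backscatter")
```

Even a full backscatter takes a 30 keV photon only to about 26.8 keV. Reaching 10 keV or below therefore takes more than one such event or other interaction processes in the attenuation layer.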

preprint · 2020 · arXiv

Strong scattering and parallel guiding of ultracold neutrons

For ultracold neutrons (UCN) with kinetic energies below 10 neV, strong scattering, characterized by $2\pi l_c / \lambda \leq 1$, can be obtained in metamaterials of C and $^7$Li. Here $l_c$ and $\lambda$ are the coherent scattering mean free path and the neutron wavelength, respectively. UCN interferometry and high-resolution spectroscopy (nano-electronvolt to pico-electronvolt resolution) in parallel waveguide arrays of neutronic metamaterials are given as examples of new experimental possibilities.
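To put the strong-scattering criterion in numbers, the non-relativistic de Broglie wavelength $\lambda = h / \sqrt{2 m_n E}$ can be evaluated at the 10 neV bound. Rearranging $2\pi l_c / \lambda \leq 1$ then gives the largest allowed coherent mean free path, $l_c \leq \lambda / 2\pi$. The function names are illustrative. The constants are CODATA values.

```python
import math

H = 6.62607015e-34        # Planck constant, J s
M_N = 1.67492749804e-27   # neutron mass, kg
EV = 1.602176634e-19      # joules per electronvolt


def neutron_wavelength(energy_ev: float) -> float:
    """Non-relativistic de Broglie wavelength (m) for a kinetic energy in eV."""
    return H / math.sqrt(2 * M_N * energy_ev * EV)


def max_coherent_mfp(energy_ev: float) -> float:
    """Largest l_c (m) satisfying the strong-scattering criterion 2*pi*l_c/lambda <= 1."""
    return neutron_wavelength(energy_ev) / (2 * math.pi)


E = 10e-9  # 10 neV
print(f"lambda = {neutron_wavelength(E) * 1e9:.0f} nm, l_c <= {max_coherent_mfp(E) * 1e9:.1f} nm")
```

At 10 neV this gives $\lambda \approx 286$ nm and $l_c \lesssim 45.5$ nm. Slower neutrons have longer wavelengths, so the criterion becomes easier to satisfy below 10 neV.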
