Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
41 works
0 followers
19 topics
4 close collaborators

Published work

41 published item(s)

preprint - 2026 - arXiv

A Unified Masked Jigsaw Puzzle Framework for Vision and Language Models

In federated learning, Transformer, as a popular architecture, faces critical challenges in defending against gradient attacks and improving model performance in both Computer Vision (CV) and Natural Language Processing (NLP) tasks. It has been revealed that the gradient of Position Embeddings (PEs) in Transformer contains sufficient information, which can be used to reconstruct the input data. To mitigate this issue, we introduce a Masked Jigsaw Puzzle (MJP) framework. MJP starts with random token shuffling to break the token order, and then a learnable unknown (unk) position embedding is used to mask out the PEs of the shuffled tokens. In this manner, the local spatial information which is encoded in the position embeddings is disrupted, and the models are forced to learn feature representations that are less reliant on the local spatial information. Notably, with the careful use of MJP, we can not only improve models' robustness against gradient attacks, but also boost their performance in both vision and text application scenarios, such as classification for images (e.g., ImageNet-1K) and sentiment analysis for text (e.g., Yelp and Amazon). Experimental results suggest that MJP is a unified framework for different Transformer-based models in both vision and language tasks. Code is publicly available via https://github.com/ywxsuperstar/transformerattack
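The two steps named in this abstract (shuffle a subset of tokens, then replace their position embeddings with a shared "unk" embedding) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `masked_jigsaw`, the `ratio` hyperparameter, and the plain-list data layout are all hypothetical, and in real training the unk embedding would be a learnable parameter.

```python
import random

def masked_jigsaw(tokens, pos_emb, unk_emb, ratio, rng):
    """Shuffle a random subset of tokens and mask their position embeddings.

    tokens:  list of n token embeddings (each a list of floats)
    pos_emb: list of n position embeddings, same shape as tokens
    unk_emb: one shared "unknown" position embedding (assumed learnable elsewhere)
    ratio:   fraction of tokens to disturb (hypothetical hyperparameter)
    """
    n = len(tokens)
    k = round(ratio * n)
    chosen = rng.sample(range(n), k)   # slots whose tokens get shuffled
    permuted = chosen[:]
    rng.shuffle(permuted)              # new source slot for each chosen slot

    out_tokens = list(tokens)
    for dst, src in zip(chosen, permuted):
        out_tokens[dst] = tokens[src]  # jigsaw step: move tokens among chosen slots

    out_pos = list(pos_emb)
    for i in chosen:
        out_pos[i] = unk_emb           # masking step: hide the true position

    # Model input is token embedding plus (possibly masked) position embedding.
    mixed = [[t + p for t, p in zip(tok, pos)]
             for tok, pos in zip(out_tokens, out_pos)]
    return mixed, sorted(chosen)
```

Untouched slots keep both their token and their true position embedding, so only the selected subset loses its spatial information.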

preprint - 2026 - arXiv

Residual Cross-Modal Fusion Networks for Audio-Visual Navigation

Audio-visual embodied navigation aims to enable an agent to autonomously localize and reach a sound source in unseen 3D environments by leveraging auditory cues. The key challenge of this task lies in effectively modeling the interaction between heterogeneous features during multimodal fusion, so as to avoid single-modality dominance or information degradation, particularly in cross-domain scenarios. To address this, we propose a Cross-Modal Residual Fusion Network (CRFN), which introduces bidirectional residual interactions between audio and visual streams to achieve complementary modeling and fine-grained alignment, while maintaining the independence of their representations. Unlike conventional methods that rely on simple concatenation or attention gating, CRFN explicitly models cross-modal interactions via residual connections and incorporates stabilization techniques to improve convergence and robustness. Experiments on the Replica and Matterport3D datasets demonstrate that CRFN significantly outperforms state-of-the-art fusion baselines and achieves stronger cross-domain generalization. Notably, our experiments also reveal that agents exhibit differentiated modality dependence across different datasets. The discovery of this phenomenon provides a new perspective for understanding the cross-modal collaboration mechanism of embodied agents.

preprint - 2022 - arXiv

ALMA Images the Eccentric HD 53143 Debris Disk

We present ALMA 1.3 mm observations of the HD 53143 debris disk - the first infrared or millimeter image produced of this ~1 Gyr-old solar-analogue. Previous HST STIS coronagraphic imaging did not detect flux along the minor axis of the disk which could suggest a face-on geometry with two 'clumps' of dust. These ALMA observations reveal a disk with a strikingly different structure. In order to fit models to the millimeter visibilities and constrain the uncertainties on the disk parameters, we adopt an MCMC approach. This is the most eccentric debris disk observed to date with a forced eccentricity of $0.21\pm0.02$, nearly twice that of the Fomalhaut debris disk, and also displays apocenter glow. Although this eccentric model fits the outer debris disk well, there are significant interior residuals remaining that may suggest a possible edge-on inner disk, which remains unresolved in these observations. Combined with the observed structure difference between HST and ALMA, these results suggest a potential previous scattering event or dynamical instability in this system. We also note that the stellar flux changes considerably over the course of our observations, suggesting flaring at millimeter wavelengths. Using simultaneous TESS observations, we determine the stellar rotation period to be $9.6\pm0.1$ days.

preprint - 2022 - arXiv

CoCoPIE XGen: A Full-Stack AI-Oriented Optimizing Framework

There is a growing demand for shifting the delivery of AI capability from data centers on the cloud to edge or end devices, exemplified by the fast emerging real-time AI-based apps running on smartphones, AR/VR devices, autonomous vehicles, and various IoT devices. The shift has however been seriously hampered by the large growing gap between DNN computing demands and the computing power on edge or end devices. This article presents the design of XGen, an optimizing framework for DNN designed to bridge the gap. XGen takes cross-cutting co-design as its first-order consideration. Its full-stack AI-oriented optimizations consist of a number of innovative optimizations at every layer of the DNN software stack, all designed in a cooperative manner. The unique technology makes XGen able to optimize various DNNs, including those with an extreme depth (e.g., BERT, GPT, other transformers), and generate code that runs several times faster than those from existing DNN frameworks, while delivering the same level of accuracy.

preprint - 2022 - arXiv

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution

Deep learning-based super-resolution (SR) has gained tremendous popularity in recent years because of its high image quality performance and wide application scenarios. However, prior methods typically suffer from large amounts of computations and huge power consumption, causing difficulties for real-time inference, especially on resource-limited platforms such as mobile devices. To mitigate this, we propose a compiler-aware SR neural architecture search (NAS) framework that conducts depth search and per-layer width search with adaptive SR blocks. The inference speed is directly taken into the optimization along with the SR loss to derive SR models with high image quality while satisfying the real-time inference requirement. Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence. With the proposed framework, we achieve real-time SR inference for implementing 720p resolution with competitive SR performance (in terms of PSNR and SSIM) on GPU/DSP of mobile platforms (Samsung Galaxy S21).

preprint - 2022 - arXiv

Disk Evolution Study Through Imaging of Nearby Young Stars (DESTINYS): A Panchromatic View of DO Tau's Complex Kilo-au Environment

While protoplanetary disks are often treated as isolated systems in planet formation models, observations increasingly suggest that vigorous interactions between Class II disks and their environments are not rare. DO Tau is a T Tauri star that has previously been hypothesized to have undergone a close encounter with the HV Tau system. As part of the DESTINYS ESO Large Programme, we present new VLT/SPHERE polarimetric observations of DO Tau and combine them with archival HST scattered light images and ALMA observations of CO isotopologues and CS to map a network of complex structures. The SPHERE and ALMA observations show that the circumstellar disk is connected to arms extending out to several hundred au. HST and ALMA also reveal stream-like structures northeast of DO Tau, some of which are at least several thousand au long. These streams appear not to be gravitationally bound to DO Tau, and comparisons with previous Herschel far-IR observations suggest that the streams are part of a bridge-like structure connecting DO Tau and HV Tau. We also detect a fainter redshifted counterpart to a previously known blueshifted CO outflow. While some of DO Tau's complex structures could be attributed to a recent disk-disk encounter, they might be explained alternatively by interactions with remnant material from the star formation process. These panchromatic observations of DO Tau highlight the need to contextualize the evolution of Class II disks by examining processes occurring over a wide range of size scales.

preprint - 2022 - arXiv

Mass Testing and Characterization of 20-inch PMTs for JUNO

The main goal of the JUNO experiment is to determine the neutrino mass ordering using a 20 kt liquid-scintillator detector. Its key feature is an excellent energy resolution of at least 3% at 1 MeV, for which its instruments need to meet a certain quality and thus have to be fully characterized. More than 20,000 20-inch PMTs have been received and assessed by JUNO after a detailed testing program which began in 2017 and lasted about four years. Based on this mass characterization and a set of specific requirements, a good quality of all accepted PMTs could be ascertained. This paper presents the performed testing procedure with the designed testing systems as well as the statistical characteristics of all 20-inch PMTs intended to be used in the JUNO experiment, covering more than fifteen performance parameters including the photocathode uniformity. This constitutes the largest sample of 20-inch PMTs ever produced and studied in detail to date, i.e., 15,000 of the newly developed 20-inch MCP-PMTs from Northern Night Vision Technology Co. (NNVT) and 5,000 dynode PMTs from Hamamatsu Photonics K.K. (HPK).

preprint - 2022 - arXiv

Real-Time Portrait Stylization on the Edge

In this work we demonstrate real-time portrait stylization, specifically, translating a self-portrait into cartoon or anime style on mobile devices. We propose a latency-driven differentiable architecture search method that maintains realistic generative quality. With our framework, we obtain a $10\times$ computation reduction on the generative model and achieve real-time video stylization on an off-the-shelf smartphone using mobile GPUs.

preprint - 2022 - arXiv

Speed-ANN: Low-Latency and High-Accuracy Nearest Neighbor Search via Intra-Query Parallelism

Nearest Neighbor Search (NNS) has recently drawn a rapid increase of interest due to its core role in managing high-dimensional vector data in data science and AI applications. The interest is fueled by the success of neural embedding, where deep learning models transform unstructured data into semantically correlated feature vectors for data analysis, e.g., recommending popular items. Among several categories of methods for fast NNS, similarity graphs are one of the most successful algorithmic trends. Several of the most popular and top-performing similarity graphs, such as NSG and HNSW, at their core employ best-first traversal along the underlying graph indices to search for near neighbors. Maximizing the performance of the search is essential for many tasks, especially in the large-scale and high-recall regime. In this work, we provide an in-depth examination of state-of-the-art similarity search algorithms, revealing their challenges in leveraging multi-core processors to speed up the search. We also explore whether similarity graph search is robust to deviation from maintaining strict order by allowing multiple walkers to simultaneously advance the search frontier. Based on our insights, we propose Speed-ANN, a parallel similarity search algorithm that exploits hidden intra-query parallelism and the memory hierarchy, allowing similarity search to take advantage of multiple CPU cores to significantly accelerate search speed while achieving high accuracy. We evaluate Speed-ANN on a wide range of datasets, ranging from million to billion data points, and show that it achieves shorter query latency than both NSG and HNSW. Besides, with multicore support, we show that our approach offers faster search latency than a highly optimized GPU implementation and scales well as the number of hardware resources (e.g., CPU cores) and graph sizes increase.
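For readers unfamiliar with the best-first traversal that this abstract attributes to graphs such as NSG and HNSW, the sketch below shows the sequential baseline only. It is not Speed-ANN's parallel algorithm; the function name, the adjacency-dict graph layout, squared Euclidean distance, and the `ef` queue-size parameter are illustrative assumptions.

```python
import heapq

def best_first_search(graph, vectors, query, entry, k, ef):
    """Sequential best-first search on a similarity graph.

    graph:   dict mapping node -> list of neighbor nodes
    vectors: dict mapping node -> tuple of floats
    ef:      number of best candidates kept during the search (assumed, ef >= k)
    """
    def dist(node):
        return sum((a - b) ** 2 for a, b in zip(vectors[node], query))

    visited = {entry}
    d0 = dist(entry)
    frontier = [(d0, entry)]   # min-heap: closest unexpanded candidate first
    best = [(-d0, entry)]      # max-heap via negation: the ef best results so far

    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break              # closest candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)   # drop the current worst result

    ranked = sorted((-neg_d, node) for neg_d, node in best)
    return [node for _, node in ranked][:k]
```

The strict ordering imposed by the `frontier` heap is the sequential dependency that the abstract proposes to relax by letting several walkers advance the frontier at once.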

preprint - 2022 - arXiv

Survey: Exploiting Data Redundancy for Optimization of Deep Learning

Data redundancy is ubiquitous in the inputs and intermediate results of Deep Neural Networks (DNN). It offers many significant opportunities for improving DNN performance and efficiency and has been explored in a large body of work. These studies are scattered across many venues and several years. The targets they focus on range from images to videos and texts, and the techniques they use to detect and exploit data redundancy also vary in many aspects. There is not yet a systematic examination and summary of the many efforts, making it difficult for researchers to get a comprehensive view of the prior work, the state of the art, differences and shared principles, and the areas and directions yet to explore. This article tries to fill the void. It surveys hundreds of recent papers on the topic, introduces a novel taxonomy to put the various techniques into a single categorization framework, offers a comprehensive description of the main methods used for exploiting data redundancy in improving multiple kinds of DNNs on data, and points out a set of research opportunities for future exploration.

preprint - 2021 - arXiv

A High-Performance Sparse Tensor Algebra Compiler in Multi-Level IR

Tensor algebra is widely used in many applications, such as scientific computing, machine learning, and data analytics. The tensors representing real-world data are usually large and sparse. There are tens of storage formats designed for sparse matrices and/or tensors, and the performance of sparse tensor operations depends on the particular architecture and/or selected sparse format, which makes it challenging to implement and optimize every tensor operation of interest and transfer the code from one architecture to another. We propose a tensor algebra domain-specific language (DSL) and compiler infrastructure, named COMET, to automatically generate kernels for mixed sparse-dense tensor algebra operations. The proposed DSL provides high-level programming abstractions that resemble the familiar Einstein notation to represent tensor algebra operations. The compiler performs code optimizations and transformations for efficient code generation while covering a wide range of tensor storage formats. The COMET compiler also leverages data reordering to improve spatial or temporal locality for better performance. Our results show that the automatically generated kernels outperform the state-of-the-art sparse tensor algebra compiler, with up to 20.92x, 6.39x, and 13.9x performance improvement for parallel SpMV, SpMM, and TTM over TACO, respectively.
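As a point of reference for the SpMV kernel mentioned in this abstract, here is a hand-written sparse matrix-vector product over the common CSR storage format. It is only a sketch of the kind of mixed sparse-dense loop nest such a compiler would generate; it is not COMET output, and the function name and argument layout are assumptions. In Einstein notation the operation is y_i = A_ij x_j.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    indptr:  row pointers, length n_rows + 1
    indices: column index of each stored nonzero
    data:    value of each stored nonzero
    x:       dense input vector
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Only the stored nonzeros of this row are visited.
        for p in range(indptr[row], indptr[row + 1]):
            acc += data[p] * x[indices[p]]
        y[row] = acc
    return y
```

For example, the matrix [[1, 0, 2], [0, 0, 3], [4, 5, 0]] is stored as `indptr = [0, 2, 3, 5]`, `indices = [0, 2, 2, 0, 1]`, `data = [1, 2, 3, 4, 5]`. Rows are independent of one another, which is what makes the outer loop a natural target for parallelization.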

preprint - 2021 - arXiv

Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device

3D object detection is an important task, especially in the autonomous driving application domain. However, it is challenging to support real-time performance with the limited computation and memory resources on edge-computing devices in self-driving cars. To achieve this, we propose a compiler-aware unified framework incorporating network enhancement and pruning search with reinforcement learning techniques, to enable real-time inference of 3D object detection on resource-limited edge-computing devices. Specifically, a generator Recurrent Neural Network (RNN) is employed to provide the unified scheme for both network enhancement and pruning search automatically, without human expertise and assistance. The evaluated performance of the unified schemes is fed back to train the generator RNN. The experimental results demonstrate that the proposed framework achieves, for the first time, real-time 3D object detection on mobile devices (Samsung Galaxy S20 phone) with competitive detection performance.

preprint - 2021 - arXiv

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction on certain types of DNN layers. In this paper, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme mapping methods, one search-based and the other rule-based, are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48$\times$ and 1.73$\times$ DNN inference acceleration on the CIFAR-10 and ImageNet datasets without accuracy loss.

preprint - 2021 - arXiv

COMET: A Domain-Specific Compilation of High-Performance Computational Chemistry

The computational power increases over the past decades have greatly enhanced the ability to simulate chemical reactions and understand ever more complex transformations. Tensor contractions are the fundamental computational building block of these simulations. These simulations have often been tied to one platform and restricted in generality by the interface provided to the user. The expanding prevalence of accelerators and researcher demands necessitate a more general approach which is not tied to specific hardware or requires contortion of algorithms to specific hardware platforms. In this paper we present COMET, a domain-specific programming language and compiler infrastructure for tensor contractions targeting heterogeneous accelerators. We present a system of progressive lowering through multiple layers of abstraction and optimization that achieves up to 1.98X speedup for 30 tensor contractions commonly used in computational chemistry and beyond.

preprint - 2021 - arXiv

Improving Planet Detection with Disk Modeling: Keck/NIRC2 Imaging of the HD 34282 Single-armed Protoplanetary Disk

Formed in protoplanetary disks around young stars, giant planets can leave observational features such as spirals and gaps in their natal disks through planet-disk interactions. Although such features can indicate the existence of giant planets, protoplanetary disk signals can overwhelm the innate luminosity of planets. Therefore, in order to image planets that are embedded in disks, it is necessary to remove the contamination from the disks to reveal the planets possibly hiding within their natal environments. We observe and directly model the detected disk in the Keck/NIRC2 vortex coronagraph $L'$-band observations of the single-armed protoplanetary disk around HD 34282. Despite a non-detection of companions for HD 34282, this direct disk modeling improves planet detection sensitivity by up to a factor of 2 in flux ratio and ${\sim}10 M_{\rm Jupiter}$ in mass. This suggests that performing disk modeling can improve directly imaged planet detection limits in systems with visible scattered light disks, and can help to better constrain the occurrence rates of self-luminous planets in these systems.

preprint - 2021 - arXiv

JUNO Physics and Detector

The Jiangmen Underground Neutrino Observatory (JUNO) is a 20 kton LS detector at 700-m underground. An excellent energy resolution and a large fiducial volume offer exciting opportunities for addressing many important topics in neutrino and astro-particle physics. With 6 years of data, the neutrino mass ordering can be determined at 3-4 sigma and three oscillation parameters can be measured to a precision of 0.6% or better by detecting reactor antineutrinos. With 10 years of data, DSNB could be observed at 3-sigma; a lower limit of the proton lifetime of 8.34e33 years (90% C.L.) can be set by searching for p->nu_bar K^+; detection of solar neutrinos would shed new light on the solar metallicity problem and examine the vacuum-matter transition region. A core-collapse supernova at 10 kpc would lead to ~5000 IBD and ~2000 (300) all-flavor neutrino-proton (electron) scattering events. Geo-neutrinos can be detected with a rate of ~400 events/year. We also summarize the final design of the JUNO detector and the key R&D achievements. All 20-inch PMTs have been tested. The average photon detection efficiency is 28.9% for the 15,000 MCP PMTs and 28.1% for the 5,000 dynode PMTs, higher than the JUNO requirement of 27%. Together with the >20 m attenuation length of LS, we expect a yield of 1345 p.e. per MeV and an effective energy resolution of $3.02\%/\sqrt{E\,(\mathrm{MeV})}$ in simulations. The underwater electronics is designed to have a loss rate <0.5% in 6 years. With degassing membranes and a micro-bubble system, the radon concentration in the 35-kton water pool could be lowered to <10 mBq/m^3. Acrylic panels of radiopurity <0.5 ppt U/Th are produced. The 20-kton LS will be purified onsite. Singles in the fiducial volume can be controlled to ~10 Hz. The JUNO experiment also features a double calorimeter system with 25,600 3-inch PMTs, a LS testing facility OSIRIS, and a near detector TAO.
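The resolution figure quoted in this abstract is easy to evaluate at other energies. The sketch below simply applies the stated 3.02%/sqrt(E) scaling and, for comparison, the textbook Poisson limit 1/sqrt(N) for the quoted yield of 1345 p.e. per MeV. Treating photoelectron counting as pure Poisson is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

def relative_resolution(energy_mev, a=0.0302):
    """Relative energy resolution sigma_E / E from the quoted a / sqrt(E) scaling."""
    return a / math.sqrt(energy_mev)

def photostatistics_limit(energy_mev, pe_per_mev=1345.0):
    """Poisson counting limit 1 / sqrt(N_pe), assuming N_pe = yield * E."""
    return 1.0 / math.sqrt(pe_per_mev * energy_mev)

for e in (1.0, 4.0):
    print(f"E = {e} MeV: quoted {relative_resolution(e):.4%}, "
          f"counting limit {photostatistics_limit(e):.4%}")
```

Under these assumptions the counting limit at 1 MeV is about 2.73%, slightly below the quoted effective value of 3.02%, which is consistent with the effective figure including contributions beyond photoelectron statistics.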

preprint - 2021 - arXiv

RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

Mobile devices are becoming an important carrier for deep learning tasks, as they are being equipped with powerful, high-end mobile CPUs and GPUs. However, it is still challenging to execute 3D Convolutional Neural Networks (CNNs) with real-time performance in addition to high inference accuracy. The reason is that the more complex model structure and higher model dimensionality overwhelm the available computation/storage resources on mobile devices. A natural approach is to turn to deep learning weight pruning techniques. However, the direct generalization of existing 2D CNN weight pruning methods to 3D CNNs is not ideal for fully exploiting mobile parallelism while achieving high inference accuracy. This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs, seamlessly integrating neural network weight pruning and compiler code generation techniques. We propose and investigate two structured sparsity schemes, i.e., the vanilla structured sparsity and kernel group structured (KGS) sparsity, that are mobile acceleration friendly. The vanilla sparsity removes whole kernel groups, while KGS sparsity is a more fine-grained structured sparsity that enjoys higher flexibility while exploiting full on-device parallelism. We propose a reweighted regularization pruning algorithm to achieve the proposed sparsity schemes. The inference time speedup due to sparsity approaches the pruning rate of the whole model FLOPs (floating point operations). RT3D demonstrates up to 29.1$\times$ speedup in end-to-end inference time compared with current mobile frameworks supporting 3D CNNs, with a moderate 1%-1.5% accuracy loss. The end-to-end inference time for 16 video frames could be within 150 ms when executing representative C3D and R(2+1)D models on a cellphone. For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.

preprint - 2021 - arXiv

Sensitivity of the Roman Coronagraph Instrument to Exozodiacal Dust

Exozodiacal dust, warm debris from comets and asteroids in and near the habitable zone of stellar systems, reveals the physical processes that shape planetary systems. Scattered light from this dust is also a source of background flux which must be overcome by future missions to image Earthlike planets. This study quantifies the sensitivity of the Nancy Grace Roman Space Telescope Coronagraph to light scattered by exozodi, the zodiacal dust around other stars. Using a sample of 149 nearby stars, previously selected for optimum detection of habitable exoplanets by space observatories, we find the maximum number of exozodiacal disks with observable inner habitable zone boundaries is six and the number of observable outer habitable boundaries is 74. One zodi was defined as the visible-light surface brightness of 22 $m_{\rm V}$ arcsec$^{-2}$ around a solar-mass star, approximating the scattered light brightness in visible light at the Earth-equivalent insolation. In the speckle limited case, where the signal-to-noise ratio is limited by speckle temporal stability rather than shot noise, the median $5\sigma$ sensitivity to habitable zone exozodi is 12 zodi per resolution element. This estimate is calculated at the inner-working angle of the coronagraph, for the current best estimate performance, neglecting margins on the uncertainty in instrument performance and including a post-processing speckle suppression factor. For a log-normal distribution of exozodi levels with a median exozodi of 3$\times$ the solar zodi, we find that the Roman Coronagraph would be able to make $5\sigma$ detections of exozodiacal disks in scattered light from 13 systems with a 95% confidence interval spanning 7-20 systems. This sensitivity allows the Roman Coronagraph to complement ground-based measurements of exozodiacal thermal emission and constrain dust albedos.

preprint - 2020 - arXiv

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework

Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices. However, previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data. To mitigate this concern, we propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset. At the algorithm level of the proposed framework, a systematic weight pruning technique based on the alternating direction method of multipliers (ADMM) is designed to iteratively solve the pattern-based pruning problem for each layer with randomly generated synthetic data. In addition, corresponding optimizations at the compiler level are leveraged for inference accelerations on devices. With the proposed framework, users could avoid the time-consuming pruning process for non-experts and directly benefit from compressed models. Experimental results show that the proposed framework outperforms three state-of-the-art end-to-end DNN frameworks, i.e., TensorFlow-Lite, TVM, and MNN, with speedup up to 4.2X, 2.5X, and 2.0X, respectively, with almost no accuracy loss, while preserving data privacy.

preprint - 2020 - arXiv

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

Weight pruning has been widely acknowledged as a straightforward and effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby achieving acceleration on various platforms. However, most pruning techniques are essentially trade-offs between model accuracy and regularity, which lead to impaired inference accuracy and limited on-device acceleration performance. To solve the problem, we introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly. With carefully designed patterns, the proposed pruning unprecedentedly and consistently achieves accuracy enhancement and better feature extraction ability on different DNN structures and datasets, and our pattern-aware pruning framework also achieves pattern library extraction, pattern selection, pattern and connectivity pruning and weight training simultaneously. Our approach to the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms. To the best of our knowledge, it is the first time that mobile devices achieve real-time inference for large-scale DNN models thanks to the unique spatial property of pattern-based sparsity and the help of the code generation capability of compilers.
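The pattern selection step named in this abstract can be pictured with a small example: each 3x3 kernel is assigned one binary mask from a fixed library, and weights outside the mask are zeroed. The sketch below is an assumption-laden illustration, not the paper's procedure: the selection rule (keep the mask that preserves the most squared weight magnitude), the function name, and the example masks are all hypothetical.

```python
def select_pattern(kernel, patterns):
    """Pick the mask from `patterns` that preserves the most kernel energy.

    kernel:   3x3 nested list of weights
    patterns: list of 3x3 nested lists of 0/1 (hypothetical pattern library)
    Returns (chosen_mask, pruned_kernel).
    """
    def kept_energy(mask):
        # Sum of squared weights that survive under this mask.
        return sum(w * w * m
                   for k_row, m_row in zip(kernel, mask)
                   for w, m in zip(k_row, m_row))

    best = max(patterns, key=kept_energy)
    pruned = [[w * m for w, m in zip(k_row, m_row)]
              for k_row, m_row in zip(kernel, best)]
    return best, pruned
```

Because every kernel ends up with one of a small number of fixed shapes, a compiler can specialize code per pattern, which is the hardware-friendliness the abstract refers to.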

preprint - 2020 - arXiv

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method

Accelerating DNN execution on various resource-limited computing platforms has been a long-standing problem. Prior works utilize l1-based group lasso or dynamic regularization such as ADMM to perform structured pruning on DNN models to leverage the parallel computing architectures. However, both of the pruning dimensions and pruning methods lack universality, which leads to degraded performance and limited applicability. To solve the problem, we propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method. Our framework is universal, which can be applied to both CNNs and RNNs, implying complete support for the two major kinds of computation-intensive layers (i.e., CONV and FC layers). To complete all aspects of the pruning-for-acceleration task, we also integrate compiler-based code optimization into our framework that can perform DNN inference in a real-time manner. To the best of our knowledge, it is the first time that the weight pruning framework achieves universal coverage for both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise.

preprint - 2020 - arXiv

CoCoPIE: Making Mobile AI Sweet As PIE -- Compression-Compilation Co-Design Goes a Long Way

Assuming hardware is the major constraint for enabling real-time mobile intelligence, the industry has mainly dedicated their efforts to developing specialized hardware accelerators for machine learning and inference. This article challenges the assumption. By drawing on a recent real-time AI optimization framework CoCoPIE, it maintains that with effective compression-compiler co-design, it is possible to enable real-time artificial intelligence on mainstream end devices without special hardware. CoCoPIE is a software framework that holds numerous records on mobile AI: the first framework that supports all main kinds of DNNs, from CNNs to RNNs, transformer, language models, and so on; the fastest DNN pruning and acceleration framework, up to 180X faster compared with current DNN pruning on other frameworks such as TensorFlow-Lite; making many representative AI applications able to run in real-time on off-the-shelf mobile devices that have been previously regarded possible only with special hardware support; making off-the-shelf mobile devices outperform a number of representative ASIC and FPGA solutions in terms of energy efficiency and/or performance.

preprint - 2020 - arXiv

Debris Disk Results from the Gemini Planet Imager Exoplanet Survey's Polarimetric Imaging Campaign

We report the results of a ${\sim}4$-year direct imaging survey of 104 stars to resolve and characterize circumstellar debris disks in scattered light as part of the Gemini Planet Imager Exoplanet Survey. We targeted nearby (${\lesssim}150$ pc), young (${\lesssim}500$ Myr) stars with high infrared excesses ($L_{\mathrm{IR}} / L_\star > 10^{-5}$), including 38 with previously resolved disks. Observations were made using the Gemini Planet Imager high-contrast integral field spectrograph in $H$-band (1.6 $\mu$m) coronagraphic polarimetry mode to measure both polarized and total intensities. We resolved 26 debris disks and three protoplanetary/transitional disks. Seven debris disks were resolved in scattered light for the first time, including newly presented HD 117214 and HD 156623, and we quantified basic morphologies of five of them using radiative transfer models. All of our detected debris disks but HD 156623 have dust-poor inner holes, and their scattered-light radii are generally larger than corresponding radii measured from resolved thermal emission and those inferred from spectral energy distributions. To assess sensitivity, we report contrasts and consider causes of non-detections. Detections were strongly correlated with high IR excess and high inclination, although polarimetry outperformed total intensity angular differential imaging for detecting low inclination disks (${\lesssim}70^\circ$). Based on post-survey statistics, we improved upon our pre-survey target prioritization metric predicting polarimetric disk detectability. We also examined scattered-light disks in the contexts of gas, far-IR, and millimeter detections. Comparing $H$-band and ALMA fluxes for two disks revealed tentative evidence for differing grain properties. Finally, we found no preference for debris disks to be detected in scattered light if wide-separation substellar companions were present.

preprint2020arXiv

Dynamical Evidence of a Spiral Arm--Driving Planet in the MWC 758 Protoplanetary Disk

More than a dozen young stars host spiral arms in their surrounding protoplanetary disks. The excitation mechanisms of such arms are under debate. The two leading hypotheses -- companion-disk interaction and gravitational instability (GI) -- predict distinct motion for spirals. By imaging the MWC 758 spiral arm system at two epochs spanning ${\sim}5$ yr using the SPHERE instrument on the Very Large Telescope (VLT), we test the two hypotheses for the first time. We find that the pattern speeds of the spirals are not consistent with the GI origin. Our measurements further evince the existence of a faint "missing planet" driving the disk arms. The average spiral pattern speed is $0.\!^\circ22\pm0.\!^\circ03$ yr$^{-1}$, pointing to a driver at $172_{-14}^{+18}$ au around a $1.9$ $M_\odot$ central star if it is on a circular orbit. In addition, we witness time-varying shadowing effects on a global scale that likely originate from an inner disk.
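The step from pattern speed to driver location can be checked with a short calculation. The sketch below assumes the spiral pattern co-rotates with a companion on a circular Keplerian orbit, so that Kepler's third law in solar units ($a^3 = M P^2$, with $a$ in au, $P$ in yr, $M$ in $M_\odot$) applies. The function name `driver_orbit_au` is hypothetical and is not taken from the paper.

```python
def driver_orbit_au(pattern_speed_deg_per_yr: float, stellar_mass_msun: float) -> float:
    """Semi-major axis (au) of a circular orbit whose angular speed equals the pattern speed.

    Assumes the arms co-rotate with their driver and that the companion mass
    is negligible next to the stellar mass.
    """
    # One full revolution (360 degrees) at the measured pattern speed.
    period_yr = 360.0 / pattern_speed_deg_per_yr
    # Kepler's third law in solar units: a^3 = M * P^2.
    return (stellar_mass_msun * period_yr ** 2) ** (1.0 / 3.0)


# Values quoted in the abstract: 0.22 +/- 0.03 deg/yr around a 1.9 Msun star.
central = driver_orbit_au(0.22, 1.9)
inner = driver_orbit_au(0.22 + 0.03, 1.9)  # faster pattern, closer driver
outer = driver_orbit_au(0.22 - 0.03, 1.9)  # slower pattern, more distant driver
print(f"{central:.0f} au (range {inner:.0f} to {outer:.0f} au)")
```

Under these assumptions the central value lands near the quoted 172 au, and the two bounding speeds bracket it roughly as the quoted asymmetric uncertainties do.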

preprint2020arXiv

Feasibility and physics potential of detecting $^8$B solar neutrinos at JUNO

The Jiangmen Underground Neutrino Observatory~(JUNO) features a 20~kt multi-purpose underground liquid scintillator sphere as its main detector. Some of JUNO's features make it an excellent experiment for $^8$B solar neutrino measurements, such as its low-energy threshold, its high energy resolution compared to water Cherenkov detectors, and its much larger target mass compared to previous liquid scintillator detectors. In this paper we present a comprehensive assessment of JUNO's potential for detecting $^8$B solar neutrinos via the neutrino-electron elastic scattering process. A reduced 2~MeV threshold on the recoil electron energy is found to be achievable assuming the intrinsic radioactive background $^{238}$U and $^{232}$Th in the liquid scintillator can be controlled to 10$^{-17}$~g/g. With ten years of data taking, about 60,000 signal and 30,000 background events are expected. This large sample will enable an examination of the distortion of the recoil electron spectrum that is dominated by the neutrino flavor transformation in the dense solar matter, which will shed new light on the tension between the measured electron spectra and the predictions of the standard three-flavor neutrino oscillation framework. If $\Delta m^{2}_{21}=4.8\times10^{-5}~(7.5\times10^{-5})$~eV$^{2}$, JUNO can provide evidence of neutrino oscillation in the Earth at about the 3$\sigma$~(2$\sigma$) level by measuring the non-zero signal rate variation with respect to the solar zenith angle. Moreover, JUNO can simultaneously measure $\Delta m^2_{21}$ using $^8$B solar neutrinos to a precision of 20\% or better depending on the central value and to sub-percent precision using reactor antineutrinos. A comparison of these two measurements from the same detector will help elucidate the current tension between the value of $\Delta m^2_{21}$ reported by solar neutrino experiments and the KamLAND experiment.

preprint2020arXiv

HD 165054: an astrometric calibration field for high-contrast imagers in Baade's Window

We present a study of the HD 165054 astrometric calibration field that has been periodically observed with the Gemini Planet Imager. HD 165054 is a bright star within Baade's Window, a region of the galactic plane with relatively low extinction from interstellar dust. HD 165054 was selected as a calibrator target due to the high number density of stars within this region ($\sim 3$ stars per square arcsecond with $H<22$), necessary because of the small field-of-view of the Gemini Planet Imager. Using nine epochs spanning over five years, we have fit a standard five-parameter astrometric model to the astrometry of seven background stars within close proximity to HD 165054 (angular separation $< 2$ arcsec). We achieved a proper motion precision of $\sim 0.3$ mas/yr, and constrained the parallax of each star to be $\lesssim 1$ mas. Our measured proper motions and parallax limits are consistent with the background stars being a part of the galactic bulge. Using these measurements we find no evidence of any systematic trend of either the plate scale or the north angle offset of GPI between 2014 and 2019. We compared our model describing the motions of the seven background stars to observations of the same field in 2014 and 2018 obtained with Keck/NIRC2, an instrument with an excellent astrometric calibration. We find that the predicted positions of the background sources are consistent with those measured by NIRC2, within the uncertainties of the calibration of the two instruments. In the future, we will use this field as a standard astrometric calibrator for the upgrade of GPI and potentially for other high-contrast imagers.

preprint2020arXiv

Keck/NIRC2 $L'$-Band Imaging of Jovian-Mass Accreting Protoplanets around PDS 70

We present $L'$-band imaging of the PDS 70 planetary system with Keck/NIRC2 using the new infrared pyramid wavefront sensor. We detected both PDS 70 b and c in our images, as well as the front rim of the circumstellar disk. After subtracting off a model of the disk, we measured the astrometry and photometry of both planets. Placing priors based on the dynamics of the system, we estimated PDS 70 b to have a semi-major axis of $20^{+3}_{-4}$~au and PDS 70 c to have a semi-major axis of $34^{+12}_{-6}$~au (95\% credible interval). We fit the spectral energy distribution (SED) of both planets. For PDS 70 b, we were able to place better constraints on the red half of its SED than previous studies and inferred the radius of the photosphere to be 2-3~$R_{Jup}$. The SED of PDS 70 c is less well constrained, with a range of total luminosities spanning an order of magnitude. With our inferred radii and luminosities, we used evolutionary models of accreting protoplanets to derive a mass of PDS 70 b between 2 and 4 $M_{\textrm{Jup}}$ and a mean mass accretion rate between $3 \times 10^{-7}$ and $8 \times 10^{-7}~M_{\textrm{Jup}}/\textrm{yr}$. For PDS 70 c, we computed a mass between 1 and 3 $M_{\textrm{Jup}}$ and mean mass accretion rate between $1 \times 10^{-7}$ and $5 \times~10^{-7} M_{\textrm{Jup}}/\textrm{yr}$. The mass accretion rates imply dust accretion timescales short enough to hide strong molecular absorption features in both planets' SEDs.

preprint2020arXiv

Multiband GPI Imaging of the HR 4796A Debris Disk

We have obtained Gemini Planet Imager (GPI) J-, H-, K1-, and K2-Spec observations of the iconic debris ring around the young, main-sequence star HR 4796A. We applied several point-spread function (PSF) subtraction techniques to the observations (Mask-and-Interpolate, RDI-NMF, RDI-KLIP, and ADI-KLIP) to measure the geometric parameters and the scattering phase function for the disk. To understand the systematic errors associated with PSF subtraction, we also forward-modeled the observations using a Markov Chain Monte Carlo framework and a simple model for the disk. We found that measurements of the disk geometric parameters were robust, with all of our analyses yielding consistent results; however, measurements of the scattering phase function were challenging to reconstruct from PSF-subtracted images, despite extensive testing. As a result, we estimated the scattering phase function using disk modeling. We searched for a dependence of the scattering phase function with respect to the GPI filters but found none. We compared the H-band scattering phase function with that measured by Hubble Space Telescope STIS at visual wavelengths and discovered a blue color at small scattering angles and a red color at large scattering angles, consistent with predictions and laboratory measurements of large grains. Finally, we successfully modeled the SPHERE H2 HR 4796A scattered phase function using a distribution of hollow spheres composed of silicates, carbon, and metallic iron.

preprint2020arXiv

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

With the emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability are being transferred to these devices. However, executing the inference of Deep Neural Networks (DNNs) is still challenging considering high computation and storage demands, specifically, if real-time performance with high accuracy is needed. Weight pruning of DNNs is proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained, accurate, but not hardware friendly; structured pruning is coarse-grained, hardware-efficient, but with higher accuracy loss. In this paper, we introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency. In other words, our method achieves the best of both worlds, and is desirable across theory/algorithm, compiler, and hardware levels. The proposed PatDNN is an end-to-end framework to efficiently execute DNN on mobile devices with the help of a novel model compression technique (pattern-based pruning based on extended ADMM solution framework) and a set of thorough architecture-aware compiler- and code generation-based optimizations (filter kernel reordering, compressed weight storage, register load redundancy elimination, and parameter auto-tuning). Evaluation results demonstrate that PatDNN outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow Lite, TVM, and Alibaba Mobile Neural Network with speedup up to 44.5x, 11.4x, and 7.1x, respectively, with no accuracy compromise. Real-time inference of representative large-scale DNNs (e.g., VGG-16, ResNet-50) can be achieved using mobile devices.

preprint2020arXiv

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices

Model compression techniques for Deep Neural Networks (DNNs) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method. There are currently two mainstream families of pruning methods representing two extremes of pruning regularity: non-structured, fine-grained pruning can achieve high sparsity and accuracy, but is not hardware friendly; structured, coarse-grained pruning exploits hardware-efficient structures in pruning, but suffers from accuracy drop when the pruning rate is high. In this paper, we introduce PCONV, comprising a new sparsity dimension -- fine-grained pruning patterns inside the coarse-grained structures. PCONV comprises two types of sparsity: Sparse Convolution Patterns (SCP), which are generated from intra-convolution kernel pruning, and connectivity sparsity, which is generated from inter-convolution kernel pruning. Essentially, SCP enhances accuracy due to its special vision properties, and connectivity sparsity increases pruning rate while maintaining balanced workload on filter computation. To deploy PCONV, we develop a novel compiler-assisted DNN inference framework and execute PCONV models in real-time without accuracy compromise, which cannot be achieved in prior work. Our experimental results show that PCONV outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow-Lite, TVM, and Alibaba Mobile Neural Network with speedup up to 39.2x, 11.4x, and 6.3x, respectively, with no accuracy loss. Mobile devices can achieve real-time inference on large-scale DNNs.

preprint2020arXiv

Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes

X-ray computed tomography is a commonly used technique for noninvasive imaging at synchrotron facilities. Iterative tomographic reconstruction algorithms are often preferred for recovering high quality 3D volumetric images from 2D X-ray images; however, their use has been limited to small/medium datasets due to their computational requirements. In this paper, we propose a high-performance iterative reconstruction system for terabyte(s)-scale 3D volumes. Our design involves three novel optimizations: (1) optimization of (back)projection operators by extending the 2D memory-centric approach to 3D; (2) performing hierarchical communications by exploiting "fat-node" architecture with many GPUs; (3) utilization of mixed-precision types while preserving convergence rate and quality. We extensively evaluate the proposed optimizations and scaling on the Summit supercomputer. Our largest reconstruction is a mouse brain volume with 9Kx11Kx11K voxels, where the total reconstruction time is under three minutes using 24,576 GPUs, reaching 65 PFLOPS: 34% of Summit's peak performance.

preprint2020arXiv

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition

Automatic speech recognition based on recurrent neural networks (RNNs) has become prevalent on mobile devices such as smartphones. However, previous RNN compression techniques either suffer from hardware performance overhead due to irregularity or significant accuracy loss due to the preserved regularity for hardware friendliness. In this work, we propose RTMobile, which leverages both a novel block-based pruning approach and compiler optimizations to accelerate RNN inference on mobile devices. Our proposed RTMobile is the first work that can achieve real-time RNN inference on mobile platforms. Experimental results demonstrate that RTMobile can significantly outperform existing RNN hardware acceleration methods in terms of inference accuracy and time. Compared with prior work on FPGA, RTMobile using Adreno 640 embedded GPU on GRU can improve the energy-efficiency by about 40$\times$ while maintaining the same inference time.
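To make the idea of block-based pruning concrete, here is a minimal sketch under stated assumptions. It is not the RTMobile algorithm: the abstract does not specify the block shape, the pruning criterion, or the training procedure. The sketch assumes a weight matrix split into horizontal blocks of rows, with the lowest-norm columns zeroed independently inside each block; the function name `block_column_prune` and its parameters are hypothetical.

```python
import math


def block_column_prune(weights, block_rows, keep_ratio):
    """Zero the lowest-L2-norm columns independently inside each block of rows.

    weights    : list of equal-length lists of floats (a dense matrix)
    block_rows : number of consecutive rows per block (assumed block shape)
    keep_ratio : fraction of columns kept in every block
    """
    n_cols = len(weights[0])
    n_keep = max(1, round(keep_ratio * n_cols))
    pruned = [row[:] for row in weights]  # copy so the input is left untouched
    for start in range(0, len(pruned), block_rows):
        block = pruned[start:start + block_rows]  # same row objects as in `pruned`
        # Column-wise L2 norm, computed over the rows of this block only.
        norms = [math.sqrt(sum(row[c] ** 2 for row in block)) for c in range(n_cols)]
        # Columns with the smallest norms are dropped in this block.
        drop = sorted(range(n_cols), key=lambda c: norms[c])[: n_cols - n_keep]
        for row in block:
            for c in drop:
                row[c] = 0.0
    return pruned


example = [
    [0.90, 0.10, -0.80, 0.05],
    [0.70, -0.20, 0.60, 0.00],
    [0.01, 1.20, 0.02, -1.10],
    [0.03, 0.90, -0.04, 1.00],
]
for row in block_column_prune(example, block_rows=2, keep_ratio=0.5):
    print(row)
```

Each block keeps a different set of columns, which is finer-grained than removing whole columns of the matrix, while the zeros inside a block still form a regular pattern that a compiler could plausibly exploit.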

preprint2020arXiv

TAO Conceptual Design Report: A Precision Measurement of the Reactor Antineutrino Spectrum with Sub-percent Energy Resolution

The Taishan Antineutrino Observatory (TAO, also known as JUNO-TAO) is a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO). A ton-level liquid scintillator detector will be placed at about 30 m from a core of the Taishan Nuclear Power Plant. The reactor antineutrino spectrum will be measured with sub-percent energy resolution, to provide a reference spectrum for future reactor neutrino experiments, and to provide a benchmark measurement to test nuclear databases. A spherical acrylic vessel containing 2.8 tons of gadolinium-doped liquid scintillator will be viewed by 10 m$^2$ of Silicon Photomultipliers (SiPMs) of >50% photon detection efficiency with almost full coverage. The photoelectron yield is about 4500 per MeV, an order of magnitude higher than that of any existing large-scale liquid scintillator detector. The detector operates at -50 °C to lower the dark noise of SiPMs to an acceptable level. The detector will measure about 2000 reactor antineutrinos per day, and is designed to be well shielded from cosmogenic backgrounds and ambient radioactivity to have about 10% background-to-signal ratio. The experiment is expected to start operation in 2022.

preprint2020arXiv

The Gemini Planet Imager view of the HD 32297 debris disk

We present new $H$-band scattered light images of the HD 32297 edge-on debris disk obtained with the Gemini Planet Imager (GPI). The disk is detected in total and polarized intensity down to a projected angular separation of 0.15", or 20 au. On the other hand, the large scale swept-back halo remains undetected, likely a consequence of its markedly blue color relative to the parent body belt. We analyze the curvature of the disk spine and estimate a radius of $\approx$100 au for the parent body belt, smaller than past scattered light studies but consistent with thermal emission maps of the system. We employ three different flux-preserving post-processing methods to suppress the residual starlight and evaluate the surface brightness and polarization profile along the disk spine. Unlike past studies of the system, our high fidelity images reveal the disk to be highly symmetric and devoid of morphological and surface brightness perturbations. We find the dust scattering properties of the system to be consistent with those observed in other debris disks, with the exception of HR 4796. Finally, we find no direct evidence for the presence of a planetary-mass object in the system.

preprint2020arXiv

Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization

High-end mobile platforms increasingly serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications. However, the constrained computation and storage resources on these devices still pose significant challenges for real-time DNN inference. To address this problem, we propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN execution on mobile devices. This demo shows that these optimizations can enable real-time mobile execution of multiple DNN applications, including style transfer, DNN coloring and super resolution.

preprint2020arXiv

Using Data Imputation for Signal Separation in High Contrast Imaging

To characterize circumstellar systems in high contrast imaging, the fundamental step is to construct a best point spread function (PSF) template for the non-circumstellar signals (i.e., star light and speckles) and separate it from the observation. With existing PSF construction methods, the circumstellar signals (e.g., planets, circumstellar disks) are unavoidably altered by over-fitting and/or self-subtraction, making forward modeling a necessity to recover these signals. We present a forward modeling--free solution to these problems with data imputation using sequential non-negative matrix factorization (DI-sNMF). DI-sNMF first converts this signal separation problem to a "missing data" problem in statistics by flagging the regions which host circumstellar signals as missing data, then attributes PSF signals to these regions. We mathematically prove it to have negligible alteration to circumstellar signals when the imputation region is relatively small, which thus enables precise measurement for these circumstellar objects. We apply it to simulated point source and circumstellar disk observations to demonstrate its proper recovery of them. We apply it to Gemini Planet Imager (GPI) K1-band observations of the debris disk surrounding HR 4796A, finding a tentative trend that the dust is more forward scattering as the wavelength increases. We expect DI-sNMF to be applicable to other general scenarios where the separation of signals is needed.

preprint2020arXiv

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design

The rapid development and wide utilization of object detection techniques have drawn attention to both the accuracy and speed of object detectors. However, the current state-of-the-art object detection works are either accuracy-oriented, using a large model but leading to high latency, or speed-oriented, using a lightweight model but sacrificing accuracy. In this work, we propose the YOLObile framework for real-time object detection on mobile devices via compression-compilation co-design. A novel block-punched pruning scheme is proposed for any kernel size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced compiler-assisted optimizations. Experimental results indicate that our pruning scheme achieves 14$\times$ compression rate of YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve 17 FPS inference speed using GPU on Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed is increased to 19.1 FPS, and outperforms the original YOLOv4 by 5$\times$ speedup. Source code is at: https://github.com/nightsnack/YOLObile.

preprint2019arXiv

An updated visual orbit of the directly-imaged exoplanet 51 Eridani b and prospects for a dynamical mass measurement with Gaia

We present a revision to the visual orbit of the young, directly-imaged exoplanet 51 Eridani b using four years of observations with the Gemini Planet Imager. The relative astrometry is consistent with an eccentric ($e=0.53_{-0.13}^{+0.09}$) orbit at an intermediate inclination ($i=136_{-11}^{+10}$\,deg), although circular orbits cannot be excluded due to the complex shape of the multidimensional posterior distribution. We find a semi-major axis of $11.1_{-1.3}^{+4.2}$\,au and a period of $28.1_{-4.9}^{+17.2}$\,yr, assuming a mass of 1.75\,M$_{\odot}$ for the host star. We find consistent values with a recent analysis of VLT/SPHERE data covering a similar baseline. We investigated the potential of using absolute astrometry of the host star to obtain a dynamical mass constraint for the planet. The astrometric acceleration of 51~Eri derived from a comparison of the {\it Hipparcos} and {\it Gaia} catalogues was found to be inconsistent at the 2--3$σ$ level with the predicted reflex motion induced by the orbiting planet. Potential sources of this inconsistency include a combination of random and systematic errors between the two astrometric catalogs or the signature of an additional companion within the system interior to current detection limits. We also explored the potential of using {\it Gaia} astrometry alone for a dynamical mass measurement of the planet by simulating {\it Gaia} measurements of the motion of the photocenter of the system over the course of the extended eight-year mission. We find that such a measurement is only possible ($>98$\% probability) given the most optimistic predictions for the {\it Gaia} scan astrometric uncertainties for bright stars, and a high mass for the planet ($\gtrsim3.6$\,M$_{\rm Jup}$).

preprint2019arXiv

Detection of a low-mass stellar companion to the accelerating A2IV star HR 1645

The $\sim500$\,Myr A2IV star HR 1645 has one of the most significant low-amplitude accelerations of nearby early-type stars measured from a comparison of the {\it Hipparcos} and {\it Gaia} astrometric catalogues. This signal is consistent with either a stellar companion with a moderate mass ratio ($q\sim0.5$) on a short period ($P<1$\,yr), or a substellar companion at a separation wide enough to be resolved with ground-based high contrast imaging instruments; long-period equal mass ratio stellar companions that are also consistent with the measured acceleration are excluded with previous imaging observations. The small but significant amplitude of the acceleration made HR 1645 a promising candidate for targeted searches for brown dwarf and planetary-mass companions around nearby, young stars. In this paper we explore the origin of the astrometric acceleration by modelling the signal induced by a wide-orbit M8 companion discovered with the Gemini Planet Imager, as well as the effects of an inner short-period spectroscopic companion discovered a century ago but not followed up since. We present the first constraints on the orbit of the inner companion, and demonstrate that it is a plausible cause of the astrometric acceleration. This result demonstrates the importance of vetting targets with measured astrometric acceleration for short-period stellar companions prior to conducting targeted direct imaging surveys for wide-orbit substellar companions.

preprint2019arXiv

First Resolved Scattered-Light Images of Four Debris Disks in Scorpius-Centaurus with the Gemini Planet Imager

We present the first spatially resolved scattered-light images of four debris disks around members of the Scorpius-Centaurus (Sco-Cen) OB Association with high-contrast imaging and polarimetry using the Gemini Planet Imager (GPI). All four disks are resolved for the first time in polarized light and one disk is also detected in total intensity. The three disks imaged around HD 111161, HD 143675, and HD 145560 are symmetric in both morphology and brightness distribution. The three systems span a range of inclinations and radial extents. The disk imaged around HD 98363 shows indications of asymmetries in morphology and brightness distribution, with some structural similarities to the HD 106906 planet-disk system. Uniquely, HD 98363 has a wide co-moving stellar companion Wray 15-788 with a recently resolved disk with very different morphological properties. HD 98363 A/B is the first binary debris disk system with two spatially resolved disks. All four targets have been observed with ALMA, and their continuum fluxes range from one non-detection to one of the brightest disks in the region. With the new results, a total of 15 A/F-stars in Sco-Cen have resolved scattered light debris disks, and approximately half of these systems exhibit some form of asymmetry. Combining the GPI disk structure results with information from the literature on millimeter fluxes and imaged planets reveals a diversity of disk properties in this young population. Overall, the four newly resolved disks contribute to the census of disk structures measured around A/F-stars at this important stage in the development of planetary systems.

preprint2019arXiv

Plasmon-enhanced Stimulated Raman Scattering Microscopy with Single-molecule Detection Sensitivity

Stimulated Raman scattering (SRS) microscopy allows for high-speed label-free chemical imaging of biomedical systems. The imaging sensitivity of SRS microscopy is limited to ~10 mM for endogenous biomolecules. Electronic pre-resonant SRS allows detection of sub-micromolar chromophores. However, label-free SRS detection of single biomolecules having extremely small Raman cross-sections ($\sim10^{-30}$ cm$^2$ sr$^{-1}$) remains unreachable. Here, we demonstrate plasmon-enhanced stimulated Raman scattering (PESRS) microscopy with single-molecule detection sensitivity. Incorporating pico-Joule laser excitation, background subtraction, and a denoising algorithm, we obtained robust single-pixel SRS spectra exhibiting the statistics of single-molecule events. Single-molecule detection was verified by using two isotopologues of adenine. We further demonstrated the capability of applying PESRS for biological applications and utilized PESRS to map adenine released from bacteria due to starvation stress. PESRS microscopy holds the promise for ultrasensitive detection of molecular events in chemical and biomedical systems.