Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
27works
0followers
25topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

27 published item(s)

preprint2026arXiv

PADE: A Predictor-Free Sparse Attention Accelerator via Unified Execution and Stage Fusion

Attention-based models have revolutionized AI, but the quadratic cost of self-attention incurs severe computational and memory overhead. Sparse attention methods alleviate this by skipping low-relevance token pairs. However, current approaches lack practicality due to the heavy expense of added sparsity predictor, which severely drops their hardware efficiency. This paper advances the state-of-the-art (SOTA) by proposing a bit-serial enable stage-fusion (BSF) mechanism, which eliminates the need for a separate predictor. However, it faces key challenges: 1) Inaccurate bit-sliced sparsity speculation leads to incorrect pruning; 2) Hardware under-utilization due to fine-grained and imbalanced bit-level workloads. 3) Tiling difficulty caused by the row-wise dependency in sparsity pruning criteria. We propose PADE, a predictor-free algorithm-hardware co-design for dynamic sparse attention acceleration. PADE features three key innovations: 1) Bit-wise uncertainty interval-enabled guard filtering (BUI-GF) strategy to accurately identify trivial tokens during each bit round; 2) Bidirectional sparsity-based out-of-order execution (BS-OOE) to improve hardware utilization; 3) Interleaving-based sparsity-tiled attention (ISTA) to reduce both I/O and computational complexity. These techniques, combined with custom accelerator designs, enable practical sparsity acceleration without relying on an added sparsity predictor. Extensive experiments on 22 benchmarks show that PADE achieves 7.43x speed up and 31.1x higher energy efficiency than Nvidia H100 GPU. Compared to SOTA accelerators, PADE achieves 5.1x, 4.3x and 3.4x energy saving than Sanger, DOTA and SOFA.

preprint2025arXiv

Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR

Soft actor-critic (SAC) is a popular algorithm for max-entropy reinforcement learning. In practice, the energy-based policies in SAC are often approximated using simple policy classes for efficiency, sacrificing the expressiveness and robustness. In this paper, we propose a variant of the SAC algorithm that parameterizes the policy with flow-based models, leveraging their rich expressiveness. In the algorithm, we evaluate the flow-based policy utilizing the instantaneous change-of-variable technique and update the policy with an online variant of flow matching developed in this paper. This online variant, termed importance sampling flow matching (ISFM), enables policy update with only samples from a user-specified sampling distribution rather than the unknown target distribution. We develop a theoretical analysis of ISFM, characterizing how different choices of sampling distributions affect the learning efficiency. Finally, we conduct a case study of our algorithm on the max-entropy linear quadratic regulator problems, demonstrating that the proposed algorithm learns the optimal action distribution.

preprint2022arXiv

Actively tuning anisotropic light-matter interaction in biaxial hyperbolic material $α$-MoO$_3$ using phase change material VO$_2$ and graphene

Anisotropic hyperbolic phonon polaritons (PhPs) in natural biaxial hyperbolic material MoO$_3$ has opened up new avenues for mid-infrared nanophotonics, while active tunability of $α$-MoO$_3$ PhPs is still an urgent problem needing to be solved.In this study, we present a theoretical demonstration of actively tuning $α$-MoO$_3$ PhPs using phase change material VO$_2$ and graphene. It is observed that $α$-MoO$_3$ PhPs are greatly depending on the propagation plane angle of PhPs. The metal-to-insulator phase transition of VO$_2$ has a significant effect on the hybridization PhPs of the $α$-MoO$_3$/VO$_2$ structure and allows to obtain an actively tunable $α$-MoO$_3$ PhPs, which is especially obvious when the propagation plane angle of PhPs is 90.

preprint2022arXiv

Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System

This paper is the first to provide a thorough system design overview along with the fusion methods selection criteria of a real-world cooperative autonomous driving system, named Infrastructure-Augmented Autonomous Driving or IAAD. We present an in-depth introduction of the IAAD hardware and software on both road-side and vehicle-side computing and communication platforms. We extensively characterize the IAAD system in the context of real-world deployment scenarios and observe that the network condition that fluctuates along the road is currently the main technical roadblock for cooperative autonomous driving. To address this challenge, we propose new fusion methods, dubbed "inter-frame fusion" and "planning fusion" to complement the current state-of-the-art "intra-frame fusion". We demonstrate that each fusion method has its own benefit and constraint.

preprint2022arXiv

Demystifying Arch-hints for Model Extraction: An Attack in Unified Memory System

The deep neural network (DNN) models are deemed confidential due to their unique value in expensive training efforts, privacy-sensitive training data, and proprietary network characteristics. Consequently, the model value raises incentive for adversary to steal the model for profits, such as the representative model extraction attack. Emerging attack can leverage timing-sensitive architecture-level events (i.e., Arch-hints) disclosed in hardware platforms to extract DNN model layer information accurately. In this paper, we take the first step to uncover the root cause of such Arch-hints and summarize the principles to identify them. We then apply these principles to emerging Unified Memory (UM) management system and identify three new Arch-hints caused by UM's unique data movement patterns. We then develop a new extraction attack, UMProbe. We also create the first DNN benchmark suite in UM and utilize the benchmark suite to evaluate UMProbe. Our evaluation shows that UMProbe can extract the layer sequence with an accuracy of 95% for almost all victim test models, which thus calls for more attention to the DNN security in UM system.

preprint2022arXiv

Enabling Efficient Deep Convolutional Neural Network-based Sensor Fusion for Autonomous Driving

Autonomous driving demands accurate perception and safe decision-making. To achieve this, automated vehicles are now equipped with multiple sensors (e.g., camera, Lidar, etc.), enabling them to exploit complementary environmental context by fusing data from different sensing modalities. With the success of Deep Convolutional Neural Network(DCNN), the fusion between DCNNs has been proved as a promising strategy to achieve satisfactory perception accuracy. However, mainstream existing DCNN fusion schemes conduct fusion by directly element-wisely adding feature maps extracted from different modalities together at various stages, failing to consider whether the features being fused are matched or not. Therefore, we first propose a feature disparity metric to quantitatively measure the degree of feature disparity between the feature maps being fused. We then propose Fusion-filter as a feature-matching techniques to tackle the feature-mismatching issue. We also propose a Layer-sharing technique in the deep layer that can achieve better accuracy with less computational overhead. Together with the help of the feature disparity to be an additional loss, our proposed technologies enable DCNN to learn corresponding feature maps with similar characteristics and complementary visual context from different modalities to achieve better accuracy. Experimental results demonstrate that our proposed fusion technique can achieve better accuracy on KITTI dataset with less computational resources demand.

preprint2022arXiv

Metastable complex vector bundles over complex projective spaces

We apply Weiss calculus to compute the number of topological complex vector bundles of rank $n-2$ with vanishing Chern classes over $\mathbb{C}P^n$ for $n>3$, as given by the list $1, 1, 12, 2, 1, 3, 2, 2, 3, 1, 4, 6, 1, 1, 6, 2, 1, 3, 4, 2, 3, 1, 2, 6$, where the $i$-th entry in this list is the number of such bundles whenever $n$ is congruent to $i$ modulo $24$, starting with $i = 0$. Similarly, the number of rank $n-1$ bundles with vanishing Chern classes over $\mathbb{C}P^n$ for $n>2$ is $2$ when $n$ is odd and $1$ when $n$ is even.

preprint2022arXiv

Negative differential thermal conductance between Weyl semimetals nanoparticles through vacuum

In this work, the near-field radiative heat transfer (NFRHT) between two Weyl semimetal (WSM) nanoparticles (NPs) is investigated. The numerical results show that negative differential thermal conductance (NDTC) effect can be obtained in this system, i.e., when the temperature of the emitter is fixed, the heat flux does not decrease monotonically with the increase of the temperature of the receiver. Specifically, when the temperature of the emitter is 300 K, the heat flux is identical when the temperature of the receiver is 50 K or 280 K. The NDTC effect is attributed to the fact that the permittivity of the WSMs changes with the temperature. The coupling effects of polarizability of two WSM NPs have been further identified at different temperature to reveal the physical mechanism of the NDTC effect. In addition, the NFRHT between two Weyl WSM NPs can be greatly enhanced by exciting the localized plasmon and circular modes. This work indicates that the WSMs maybe promising candidate materials for manipulating NFRHT.

preprint2022arXiv

NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR) observations, which might suffer from under- or over-exposed regions and different sources of noise. The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e. solutions can not exceed a given number of operations). In Track 2, participants are asked to minimize the complexity of their solutions while imposing a constraint on fidelity scores (i.e. solutions are required to obtain a higher fidelity score than the prescribed baseline). Both tracks use the same data and metrics: Fidelity is measured by means of PSNR with respect to a ground-truth HDR image (computed both directly and with a canonical tonemapping operation), while complexity metrics include the number of Multiply-Accumulate (MAC) operations and runtime (in seconds).

preprint2022arXiv

On the Sample Complexity of Stabilizing LTI Systems on a Single Trajectory

Stabilizing an unknown dynamical system is one of the central problems in control theory. In this paper, we study the sample complexity of the learn-to-stabilize problem in Linear Time-Invariant (LTI) systems on a single trajectory. Current state-of-the-art approaches require a sample complexity linear in $n$, the state dimension, which incurs a state norm that blows up exponentially in $n$. We propose a novel algorithm based on spectral decomposition that only needs to learn "a small part" of the dynamical matrix acting on its unstable subspace. We show that, under proper assumptions, our algorithm stabilizes an LTI system on a single trajectory with $\tilde{O}(k)$ samples, where $k$ is the instability index of the system. This represents the first sub-linear sample complexity result for the stabilization of LTI systems under the regime when $k = o(n)$.

preprint2022arXiv

RFMask: A Simple Baseline for Human Silhouette Segmentation with Radio Signals

Human silhouette segmentation, which is originally defined in computer vision, has achieved promising results for understanding human activities. However, the physical limitation makes existing systems based on optical cameras suffer from severe performance degradation under low illumination, smoke, and/or opaque obstruction conditions. To overcome such limitations, in this paper, we propose to utilize the radio signals, which can traverse obstacles and are unaffected by the lighting conditions to achieve silhouette segmentation. The proposed RFMask framework is composed of three modules. It first transforms RF signals captured by millimeter wave radar on two planes into spatial domain and suppress interference with the signal processing module. Then, it locates human reflections on RF frames and extract features from surrounding signals with human detection module. Finally, the extracted features from RF frames are aggregated with an attention based mask generation module. To verify our proposed framework, we collect a dataset containing 804,760 radio frames and 402,380 camera frames with human activities under various scenes. Experimental results show that the proposed framework can achieve impressive human silhouette segmentation even under the challenging scenarios(such as low light and occlusion scenarios) where traditional optical-camera-based methods fail. To the best of our knowledge, this is the first investigation towards segmenting human silhouette based on millimeter wave signals. We hope that our work can serve as a baseline and inspire further research that perform vision tasks with radio signals. The dataset and codes will be made in public.

preprint2022arXiv

SapientML: Synthesizing Machine Learning Pipelines by Learning from Human-Written Solutions

Automatic machine learning, or AutoML, holds the promise of truly democratizing the use of machine learning (ML), by substantially automating the work of data scientists. However, the huge combinatorial search space of candidate pipelines means that current AutoML techniques, generate sub-optimal pipelines, or none at all, especially on large, complex datasets. In this work we propose an AutoML technique SapientML, that can learn from a corpus of existing datasets and their human-written pipelines, and efficiently generate a high-quality pipeline for a predictive task on a new dataset. To combat the search space explosion of AutoML, SapientML employs a novel divide-and-conquer strategy realized as a three-stage program synthesis approach, that reasons on successively smaller search spaces. The first stage uses a machine-learned model to predict a set of plausible ML components to constitute a pipeline. In the second stage, this is then refined into a small pool of viable concrete pipelines using syntactic constraints derived from the corpus and the machine-learned model. Dynamically evaluating these few pipelines, in the third stage, provides the best solution. We instantiate SapientML as part of a fully automated tool-chain that creates a cleaned, labeled learning corpus by mining Kaggle, learns from it, and uses the learned models to then synthesize pipelines for new predictive tasks. We have created a training corpus of 1094 pipelines spanning 170 datasets, and evaluated SapientML on a set of 41 benchmark datasets, including 10 new, large, real-world datasets from Kaggle, and against 3 state-of-the-art AutoML tools and 2 baselines. Our evaluation shows that SapientML produces the best or comparable accuracy on 27 of the benchmarks while the second best tool fails to even produce a pipeline on 9 of the instances.

preprint2022arXiv

Towards a High-performance and Secure Memory System and Architecture for Emerging Applications

In this dissertation, we propose a memory and computing coordinated methodology to thoroughly exploit the characteristics and capabilities of the GPU-based heterogeneous system to effectively optimize applications' performance and privacy. Specifically, 1) we propose a task-aware and dynamic memory management mechanism to co-optimize applications' latency and memory footprint, especially in multitasking scenarios. 2) We propose a novel latency-aware memory management framework that analyzes the application characteristics and hardware features to reduce applications' initialization latency and response time. 3) We develop a new model extraction attack that explores the vulnerability of the GPU unified memory system to accurately steal private DNN models. 4) We propose a CPU/GPU Co-Encryption mechanism that can defend against a timing-correlation attack in an integrated CPU/GPU platform to provide a secure execution environment for the edge applications. This dissertation aims at developing a high-performance and secure memory system and architecture in GPU heterogeneous platforms to deploy emerging AI-enabled applications efficiently and safely.

preprint2022arXiv

Towards Efficient Architecture and Algorithms for Sensor Fusion

The safety of an automated vehicle hinges crucially upon the accuracy of perception and decision-making latency. Under these stringent requirements, future automated cars are usually equipped with multi-modal sensors such as cameras and LiDARs. The sensor fusion is adopted to provide a confident context of driving scenarios for better decision-making. A promising sensor fusion technique is middle fusion that combines the feature representations from intermediate layers that belong to different sensing modalities. However, achieving both the accuracy and latency efficiency is challenging for middle fusion, which is critical for driving automation applications. We present A3Fusion, a software-hardware system specialized for an adaptive, agile, and aligned fusion in driving automation. A3Fusion achieves a high efficiency for the middle fusion of multiple CNN-based modalities by proposing an adaptive multi-modal learning network architecture and a latency-aware, agile network architecture optimization algorithm that enhances semantic segmentation accuracy while taking the inference latency as a key trade-off. In addition, A3Fusion proposes a FPGA-based accelerator that captures unique data flow patterns of our middle fusion algorithm while reducing the overall compute overheads. We enable these contributions by co-designing the neural network, algorithm, and the accelerator architecture.

preprint2022arXiv

Two-Dimensional Weisfeiler-Lehman Graph Neural Networks for Link Prediction

Link prediction is one important application of graph neural networks (GNNs). Most existing GNNs for link prediction are based on one-dimensional Weisfeiler-Lehman (1-WL) test. 1-WL-GNNs first compute node representations by iteratively passing neighboring node features to the center, and then obtain link representations by aggregating the pairwise node representations. As pointed out by previous works, this two-step procedure results in low discriminating power, as 1-WL-GNNs by nature learn node-level representations instead of link-level. In this paper, we study a completely different approach which can directly obtain node pair (link) representations based on \textit{two-dimensional Weisfeiler-Lehman (2-WL) tests}. 2-WL tests directly use links (2-tuples) as message passing units instead of nodes, and thus can directly obtain link representations. We theoretically analyze the expressive power of 2-WL tests to discriminate non-isomorphic links, and prove their superior link discriminating power than 1-WL. Based on different 2-WL variants, we propose a series of novel 2-WL-GNN models for link prediction. Experiments on a wide range of real-world datasets demonstrate their competitive performance to state-of-the-art baselines and superiority over plain 1-WL-GNNs.

preprint2021arXiv

Cepstral Scanning Transmission Electron Microscopy Imaging of Severe Lattice Distortions

The development of four-dimensional (4D) scanning transmission electron microscopy (STEM) using fast detectors has opened-up new avenues for addressing some of long-standing challenges in electron imaging. One of these challenges is how to image severely distorted crystal lattices, such as at a dislocation core. Here we introduce a new 4D-STEM technique, called Cepstral STEM, for imaging disordered crystals using electron diffuse scattering. Local fluctuations of diffuse scattering are captured by scanning electron nanodiffraction (SEND) using a coherent probe. The harmonic signals in electron diffuse scattering are detected through Cepstral analysis and used for imaging. By integrating Cepstral analysis with 4D-STEM, we demonstrate that information about the distortive part of electron scattering potential can be separated and imaged at nm spatial resolution. We apply our technique to the analysis of a dislocation core in SiGe and lattice distortions in high entropy alloy.

preprint2021arXiv

Graph-based Visual-Semantic Entanglement Network for Zero-shot Image Recognition

Zero-shot learning uses semantic attributes to connect the search space of unseen objects. In recent years, although the deep convolutional network brings powerful visual modeling capabilities to the ZSL task, its visual features have severe pattern inertia and lack of representation of semantic relationships, which leads to severe bias and ambiguity. In response to this, we propose the Graph-based Visual-Semantic Entanglement Network to conduct graph modeling of visual features, which is mapped to semantic attributes by using a knowledge graph, it contains several novel designs: 1. it establishes a multi-path entangled network with the convolutional neural network (CNN) and the graph convolutional network (GCN), which input the visual features from CNN to GCN to model the implicit semantic relations, then GCN feedback the graph modeled information to CNN features; 2. it uses attribute word vectors as the target for the graph semantic modeling of GCN, which forms a self-consistent regression for graph modeling and supervise GCN to learn more personalized attribute relations; 3. it fuses and supplements the hierarchical visual-semantic features refined by graph modeling into visual embedding. Our method outperforms state-of-the-art approaches on multiple representative ZSL datasets: AwA2, CUB, and SUN by promoting the semantic linkage modelling of visual features.

preprint2021arXiv

Inner-Imaging Networks: Put Lenses into Convolutional Structure

Despite the tremendous success in computer vision, deep convolutional networks suffer from serious computation costs and redundancies. Although previous works address this issue by enhancing diversities of filters, they have not considered the complementarity and the completeness of the internal structure of the convolutional network. To deal with these problems, a novel Inner-Imaging architecture is proposed in this paper, which allows relationships between channels to meet the above requirement. Specifically, we organize the channel signal points in groups using convolutional kernels to model both the intra-group and inter-group relationships simultaneously. The convolutional filter is a powerful tool for modeling spatial relations and organizing grouped signals, so the proposed methods map the channel signals onto a pseudo-image, like putting a lens into convolution internal structure. Consequently, not only the diversity of channels is increased, but also the complementarity and completeness can be explicitly enhanced. The proposed architecture is lightweight and easy to be implemented. It provides an efficient self-organization strategy for convolutional networks so as to improve their efficiency and performance. Extensive experiments are conducted on multiple benchmark image recognition data sets including CIFAR, SVHN and ImageNet. Experimental results verify the effectiveness of the Inner-Imaging mechanism with the most popular convolutional networks as the backbones.

preprint2021arXiv

Q-VR: System-Level Design for Future Mobile Collaborative Virtual Reality

High Quality Mobile Virtual Reality (VR) is what the incoming graphics technology era demands: users around the world, regardless of their hardware and network conditions, can all enjoy the immersive virtual experience. However, the state-of-the-art software-based mobile VR designs cannot fully satisfy the realtime performance requirements due to the highly interactive nature of user's actions and complex environmental constraints during VR execution. Inspired by the unique human visual system effects and the strong correlation between VR motion features and realtime hardware-level information, we propose Q-VR, a novel dynamic collaborative rendering solution via software-hardware co-design for enabling future low-latency high-quality mobile VR. At software-level, Q-VR provides flexible high-level tuning interface to reduce network latency while maintaining user perception. At hardware-level, Q-VR accommodates a wide spectrum of hardware and network conditions across users by effectively leveraging the computing capability of the increasingly powerful VR hardware. Extensive evaluation on real-world games demonstrates that Q-VR can achieve an average end-to-end performance speedup of 3.4x (up to 6.7x) over the traditional local rendering design in commercial VR devices, and a 4.1x frame rate improvement over the state-of-the-art static collaborative rendering.

preprint2020arXiv

Co-Optimizing Performance and Memory FootprintVia Integrated CPU/GPU Memory Management, anImplementation on Autonomous Driving Platform

Cutting-edge embedded system applications, such as self-driving cars and unmanned drone software, are reliant on integrated CPU/GPU platforms for their DNNs-driven workload, such as perception and other highly parallel components. In this work, we set out to explore the hidden performance implication of GPU memory management methods of integrated CPU/GPU architecture. Through a series of experiments on micro-benchmarks and real-world workloads, we find that the performance under different memory management methods may vary according to application characteristics. Based on this observation, we develop a performance model that can predict system overhead for each memory management method based on application characteristics. Guided by the performance model, we further propose a runtime scheduler. By conducting per-task memory management policy switching and kernel overlapping, the scheduler can significantly relieve the system memory pressure and reduce the multitasking co-run response time. We have implemented and extensively evaluated our system prototype on the NVIDIA Jetson TX2, Drive PX2, and Xavier AGX platforms, using both Rodinia benchmark suite and two real-world case studies of drone software and autonomous driving software.

preprint2020arXiv

Data Augmentation Imbalance For Imbalanced Attribute Classification

Pedestrian attribute recognition is an important multi-label classification problem. Although the convolutional neural networks are prominent in learning discriminative features from images, the data imbalance in multi-label setting for fine-grained tasks remains an open problem. In this paper, we propose a new re-sampling algorithm called: data augmentation imbalance (DAI) to explicitly enhance the ability to discriminate the fewer attributes via increasing the proportion of labels accounting for a small part. Fundamentally, by applying over-sampling and under-sampling on the multi-label dataset at the same time, the thought of robbing the rich attributes and helping the poor makes a significant contribution to DAI. Extensive empirical evidence shows that our DAI algorithm achieves state-of-the-art results, based on pedestrian attribute datasets, i.e. standard PA-100K and PETA datasets.

preprint2020arXiv

Disconnection-mediated twin embryo growth in Mg

While deformation twinning in hexagonal close-packed metals has been widely studied due to its substantial impact on mechanical properties, an understanding of the detailed atomic processes associated with twin embryo growth is still lacking. Conducting molecular dynamics simulations on Mg, we show that the propagation of twinning disconnections emitted by basal-prismatic interfaces controls the twin boundary motion and is the rate-limiting mechanism during the initial growth of the twin embryo. The time needed for disconnection propagation is related to the distance between the twin tips, with widely spaced twin tips requiring more time for a unit twin boundary migration event to be completed. Thus, a phenomenological model, which unifies the two processes of disconnection and twin tip propagation, is proposed here to provide a quantitative analysis of twin embryo growth. The model fits the simulation data well, with two key parameters (twin tip velocity and twinning disconnection velocity) being extracted. In addition, a linear relationship between the ratio of twinning disconnection velocity to twin tip velocity and the applied shear stress is observed. Using an example of twin growth in a nanoscale single crystal from the recent literature, we find that our molecular dynamics simulations and analytical model are in good agreement with experimental data.

preprint2020arXiv

Multiple Attentional Pyramid Networks for Chinese Herbal Recognition

Chinese herbs play a critical role in Traditional Chinese Medicine. Due to different recognition granularity, they can be recognized accurately only by professionals with much experience. It is expected that they can be recognized automatically using new techniques like machine learning. However, there is no Chinese herbal image dataset available. Simultaneously, there is no machine learning method which can deal with Chinese herbal image recognition well. Therefore, this paper begins with building a new standard Chinese-Herbs dataset. Subsequently, a new Attentional Pyramid Networks (APN) for Chinese herbal recognition is proposed, where both novel competitive attention and spatial collaborative attention are proposed and then applied. APN can adaptively model Chinese herbal images with different feature scales. Finally, a new framework for Chinese herbal recognition is proposed as a new application of APN. Experiments are conducted on our constructed dataset and validate the effectiveness of our methods.

preprint2020arXiv

Neural Architecture Search For Fault Diagnosis

Data-driven methods have made great progress in fault diagnosis, especially deep learning method. Deep learning is suitable for processing big data, and has a strong feature extraction ability to realize end-to-end fault diagnosis systems. However, designing neural network architecture requires rich professional knowledge and debugging experience, and a lot of experiments are needed to screen models and hyperparameters, increasing the difficulty of developing deep learning models. Frortunately, neural architecture search (NAS) is developing rapidly, and is becoming one of the next directions for deep learning. In this paper, we proposed a NAS method for fault diagnosis using reinforcement learning. A recurrent neural network is used as an agent to generate network architecture. The accuracy of the generated network on the validation dataset is fed back to the agent as a reward, and the parameters of the agent are updated through the strategy gradient algorithm. We use PHM 2009 Data Challenge gearbox dataset to prove the effectiveness of proposed method, and obtain state-of-the-art results compared with other artificial designed network structures. To author's best knowledge, it's the first time that NAS has been applied in fault diagnosis.

preprint2020arXiv

Some theoretical results on the second-order conservative phase field equation

In this paper, a theoretical research on the second-order conservative phase field (SOCPF) equation is presented. The theoretical results include the following three aspects. First, three new derivation methods for the SOCPF equation are given. The SOCPF equation can be viewed as the gradient flow, the special diffusion equation and the diffuse interface form of a sharp interface formulation for the piecewise constant function, respectively. These derivation methods help us to understand the SOCPF equation at different perspectives. Second, the conservation's properties of the solution of SOCPF equation are studied. Compared with the Cahn-Hilliard equation and the Allen-Cahn equation, it is found that the solution of SOCPF equation satisfies more conservation laws. Third, the wetting boundary condition for the SOCPF equation is investigated. We find that the no-flux boundary condition is equivalent to the wetting boundary condition for two-component phase field model. Moreover, applying the no-flux boundary conditions for $N$-component phase field model, we give a set of wetting boundary conditions for $N$ phase field parameters.

preprint2019arXiv

Automatic construction of Chinese herbal prescription from tongue image via CNNs and auxiliary latent therapy topics

The tongue image provides important physical information of humans. It is of great importance for diagnoses and treatments in clinical medicine. Herbal prescriptions are simple, noninvasive and have low side effects. Thus, they are widely applied in China. Studies on the automatic construction technology of herbal prescriptions based on tongue images have great significance for deep learning to explore the relevance of tongue images for herbal prescriptions, it can be applied to healthcare services in mobile medical systems. In order to adapt to the tongue image in a variety of photographic environments and construct herbal prescriptions, a neural network framework for prescription construction is designed. It includes single/double convolution channels and fully connected layers. Furthermore, it proposes the auxiliary therapy topic loss mechanism to model the therapy of Chinese doctors and alleviate the interference of sparse output labels on the diversity of results. The experiment use the real world tongue images and the corresponding prescriptions and the results can generate prescriptions that are close to the real samples, which verifies the feasibility of the proposed method for the automatic construction of herbal prescriptions from tongue images. Also, it provides a reference for automatic herbal prescription construction from more physical information.

preprint2018arXiv

Non-Thermal Cosmic Rays During Big Bang Nucleosynthesis to Solve the Lithium Problem

The discrepancy between the theoretical prediction of primordial lithium abundances and astronomical observations is called the Lithium Problem. We find that extra contributions from non-thermal hydrogen and helium during Big Bang nucleosynthesis can explain the discrepancy, for both Li-7 and Li-6, and will change the deuterium abundance only little. The allowed parameter space of such an amount of non-thermal particles and the energy range is shown. The hypothesis is stable regardless of the cross-section uncertainty of relevant reactions and the explicit shape of the energy spectrum.