Source author record

Yang Hu

Yang Hu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Catalog footprint

What is connected

34 works

30 topics

4 close collaborators

preprint | 2026 | arXiv

PADE: A Predictor-Free Sparse Attention Accelerator via Unified Execution and Stage Fusion

Attention-based models have revolutionized AI, but the quadratic cost of self-attention incurs severe computational and memory overhead. Sparse attention methods alleviate this by skipping low-relevance token pairs. However, current approaches lack practicality due to the heavy expense of an added sparsity predictor, which severely reduces their hardware efficiency. This paper advances the state-of-the-art (SOTA) by proposing a bit-serial enabled stage-fusion (BSF) mechanism, which eliminates the need for a separate predictor. However, it faces three key challenges: 1) inaccurate bit-sliced sparsity speculation, which leads to incorrect pruning; 2) hardware under-utilization due to fine-grained and imbalanced bit-level workloads; 3) tiling difficulty caused by the row-wise dependency in sparsity pruning criteria. We propose PADE, a predictor-free algorithm-hardware co-design for dynamic sparse attention acceleration. PADE features three key innovations: 1) a bit-wise uncertainty interval-enabled guard filtering (BUI-GF) strategy to accurately identify trivial tokens during each bit round; 2) bidirectional sparsity-based out-of-order execution (BS-OOE) to improve hardware utilization; 3) interleaving-based sparsity-tiled attention (ISTA) to reduce both I/O and computational complexity. These techniques, combined with custom accelerator designs, enable practical sparsity acceleration without relying on an added sparsity predictor. Extensive experiments on 22 benchmarks show that PADE achieves a 7.43x speedup and 31.1x higher energy efficiency than the Nvidia H100 GPU. Compared to SOTA accelerators, PADE achieves 5.1x, 4.3x and 3.4x energy savings over Sanger, DOTA and SOFA, respectively.

preprint | 2025 | arXiv

Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR

Soft actor-critic (SAC) is a popular algorithm for max-entropy reinforcement learning. In practice, the energy-based policies in SAC are often approximated using simple policy classes for efficiency, sacrificing the expressiveness and robustness. In this paper, we propose a variant of the SAC algorithm that parameterizes the policy with flow-based models, leveraging their rich expressiveness. In the algorithm, we evaluate the flow-based policy utilizing the instantaneous change-of-variable technique and update the policy with an online variant of flow matching developed in this paper. This online variant, termed importance sampling flow matching (ISFM), enables policy update with only samples from a user-specified sampling distribution rather than the unknown target distribution. We develop a theoretical analysis of ISFM, characterizing how different choices of sampling distributions affect the learning efficiency. Finally, we conduct a case study of our algorithm on the max-entropy linear quadratic regulator problems, demonstrating that the proposed algorithm learns the optimal action distribution.

preprint | 2022 | arXiv

Actively tuning anisotropic light-matter interaction in biaxial hyperbolic material $α$-MoO$_3$ using phase change material VO$_2$ and graphene

Anisotropic hyperbolic phonon polaritons (PhPs) in the natural biaxial hyperbolic material $α$-MoO$_3$ have opened up new avenues for mid-infrared nanophotonics, while active tunability of $α$-MoO$_3$ PhPs remains an urgent open problem. In this study, we present a theoretical demonstration of actively tuning $α$-MoO$_3$ PhPs using the phase change material VO$_2$ and graphene. We observe that $α$-MoO$_3$ PhPs depend strongly on the propagation plane angle of the PhPs. The metal-to-insulator phase transition of VO$_2$ has a significant effect on the hybridized PhPs of the $α$-MoO$_3$/VO$_2$ structure and enables actively tunable $α$-MoO$_3$ PhPs, an effect that is especially pronounced when the propagation plane angle of the PhPs is 90°.

preprint | 2022 | arXiv

Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System

This paper is the first to provide a thorough system design overview along with the fusion methods selection criteria of a real-world cooperative autonomous driving system, named Infrastructure-Augmented Autonomous Driving or IAAD. We present an in-depth introduction of the IAAD hardware and software on both road-side and vehicle-side computing and communication platforms. We extensively characterize the IAAD system in the context of real-world deployment scenarios and observe that the network condition that fluctuates along the road is currently the main technical roadblock for cooperative autonomous driving. To address this challenge, we propose new fusion methods, dubbed "inter-frame fusion" and "planning fusion" to complement the current state-of-the-art "intra-frame fusion". We demonstrate that each fusion method has its own benefit and constraint.

preprint | 2022 | arXiv

Demystifying Arch-hints for Model Extraction: An Attack in Unified Memory System

Deep neural network (DNN) models are deemed confidential because of the value embodied in their expensive training efforts, privacy-sensitive training data, and proprietary network characteristics. Consequently, this value gives adversaries an incentive to steal the model for profit, for example through the representative model extraction attack. Emerging attacks can leverage timing-sensitive architecture-level events (i.e., Arch-hints) disclosed in hardware platforms to extract DNN model layer information accurately. In this paper, we take the first step to uncover the root cause of such Arch-hints and summarize the principles to identify them. We then apply these principles to the emerging Unified Memory (UM) management system and identify three new Arch-hints caused by UM's unique data movement patterns. We then develop a new extraction attack, UMProbe. We also create the first DNN benchmark suite in UM and use it to evaluate UMProbe. Our evaluation shows that UMProbe can extract the layer sequence with an accuracy of 95% for almost all victim test models, which calls for more attention to DNN security in UM systems.

preprint | 2022 | arXiv

Enabling Efficient Deep Convolutional Neural Network-based Sensor Fusion for Autonomous Driving

Autonomous driving demands accurate perception and safe decision-making. To achieve this, automated vehicles are now equipped with multiple sensors (e.g., camera, Lidar, etc.), enabling them to exploit complementary environmental context by fusing data from different sensing modalities. With the success of the Deep Convolutional Neural Network (DCNN), fusion between DCNNs has proven to be a promising strategy for achieving satisfactory perception accuracy. However, mainstream DCNN fusion schemes conduct fusion by directly adding, element-wise, the feature maps extracted from different modalities at various stages, without considering whether the features being fused are matched. Therefore, we first propose a feature disparity metric to quantitatively measure the degree of feature disparity between the feature maps being fused. We then propose Fusion-filter as a feature-matching technique to tackle the feature-mismatching issue. We also propose a Layer-sharing technique in the deep layers that can achieve better accuracy with less computational overhead. With the feature disparity used as an additional loss, our proposed techniques enable the DCNN to learn corresponding feature maps with similar characteristics and complementary visual context from different modalities to achieve better accuracy. Experimental results demonstrate that our proposed fusion technique achieves better accuracy on the KITTI dataset with lower computational resource demand.

preprint | 2022 | arXiv

Metastable complex vector bundles over complex projective spaces

We apply Weiss calculus to compute the number of topological complex vector bundles of rank $n-2$ with vanishing Chern classes over $\mathbb{C}P^n$ for $n>3$, as given by the list $1, 1, 12, 2, 1, 3, 2, 2, 3, 1, 4, 6, 1, 1, 6, 2, 1, 3, 4, 2, 3, 1, 2, 6$, where the $i$-th entry in this list is the number of such bundles whenever $n$ is congruent to $i$ modulo $24$, starting with $i = 0$. Similarly, the number of rank $n-1$ bundles with vanishing Chern classes over $\mathbb{C}P^n$ for $n>2$ is $2$ when $n$ is odd and $1$ when $n$ is even.
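The counts stated in this abstract reduce to a table lookup on $n$ modulo $24$ and a parity check on $n$. A minimal Python sketch of that lookup follows; the function and constant names are illustrative assumptions and do not come from the paper, while the table values are copied verbatim from the abstract.

```python
# Table quoted in the abstract: entry i applies when n is congruent to i modulo 24.
RANK_N_MINUS_2_COUNTS = [
    1, 1, 12, 2, 1, 3, 2, 2, 3, 1, 4, 6,
    1, 1, 6, 2, 1, 3, 4, 2, 3, 1, 2, 6,
]


def count_rank_n_minus_2(n: int) -> int:
    """Number of rank n-2 bundles with vanishing Chern classes over CP^n (n > 3)."""
    if n <= 3:
        raise ValueError("the abstract states this count only for n > 3")
    return RANK_N_MINUS_2_COUNTS[n % 24]


def count_rank_n_minus_1(n: int) -> int:
    """Number of rank n-1 bundles with vanishing Chern classes over CP^n (n > 2)."""
    if n <= 2:
        raise ValueError("the abstract states this count only for n > 2")
    # 2 when n is odd, 1 when n is even, as stated in the abstract.
    return 2 if n % 2 == 1 else 1
```

For example, `count_rank_n_minus_2(26)` reads entry 2 of the table, since 26 is congruent to 2 modulo 24.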

preprint | 2022 | arXiv

Negative differential thermal conductance between Weyl semimetals nanoparticles through vacuum

In this work, the near-field radiative heat transfer (NFRHT) between two Weyl semimetal (WSM) nanoparticles (NPs) is investigated. The numerical results show that a negative differential thermal conductance (NDTC) effect can be obtained in this system, i.e., when the temperature of the emitter is fixed, the heat flux does not decrease monotonically as the temperature of the receiver increases. Specifically, when the temperature of the emitter is 300 K, the heat flux is identical when the temperature of the receiver is 50 K or 280 K. The NDTC effect is attributed to the fact that the permittivity of the WSMs changes with temperature. The coupling effects of the polarizability of the two WSM NPs are further identified at different temperatures to reveal the physical mechanism of the NDTC effect. In addition, the NFRHT between two WSM NPs can be greatly enhanced by exciting the localized plasmon and circular modes. This work indicates that WSMs may be promising candidate materials for manipulating NFRHT.

preprint | 2022 | arXiv

NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

This paper reviews the challenge on constrained high dynamic range (HDR) imaging that was part of the New Trends in Image Restoration and Enhancement (NTIRE) workshop, held in conjunction with CVPR 2022. This manuscript focuses on the competition set-up, datasets, the proposed methods and their results. The challenge aims at estimating an HDR image from multiple respective low dynamic range (LDR) observations, which might suffer from under- or over-exposed regions and different sources of noise. The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints. In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i.e., solutions cannot exceed a given number of operations). In Track 2, participants are asked to minimize the complexity of their solutions while imposing a constraint on fidelity scores (i.e., solutions are required to obtain a higher fidelity score than the prescribed baseline). Both tracks use the same data and metrics: fidelity is measured by means of PSNR with respect to a ground-truth HDR image (computed both directly and with a canonical tonemapping operation), while complexity metrics include the number of Multiply-Accumulate (MAC) operations and runtime (in seconds).
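The fidelity metric mentioned here, PSNR computed both directly and after tonemapping, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: pixel values are normalized to [0, 1], and the `mu_law` curve with `mu = 5000` is a hypothetical stand-in, because the abstract does not say which tonemapping operation the challenge uses.

```python
import math
from typing import Callable, Optional, Sequence


def mu_law(x: float, mu: float = 5000.0) -> float:
    """Hypothetical tonemapping curve (an assumption, not specified in the abstract)."""
    return math.log(1.0 + mu * x) / math.log(1.0 + mu)


def psnr(
    pred: Sequence[float],
    target: Sequence[float],
    peak: float = 1.0,
    tonemap: Optional[Callable[[float], float]] = None,
) -> float:
    """Peak signal-to-noise ratio in dB, optionally computed after tonemapping."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if tonemap is not None:
        pred = [tonemap(p) for p in pred]
        target = [tonemap(t) for t in target]
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Calling `psnr(pred, target)` gives the direct score, and `psnr(pred, target, tonemap=mu_law)` gives the tonemapped variant under the assumed curve.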

preprint | 2022 | arXiv

On the Sample Complexity of Stabilizing LTI Systems on a Single Trajectory

Stabilizing an unknown dynamical system is one of the central problems in control theory. In this paper, we study the sample complexity of the learn-to-stabilize problem in Linear Time-Invariant (LTI) systems on a single trajectory. Current state-of-the-art approaches require a sample complexity linear in $n$, the state dimension, which incurs a state norm that blows up exponentially in $n$. We propose a novel algorithm based on spectral decomposition that only needs to learn "a small part" of the dynamical matrix acting on its unstable subspace. We show that, under proper assumptions, our algorithm stabilizes an LTI system on a single trajectory with $\tilde{O}(k)$ samples, where $k$ is the instability index of the system. This represents the first sub-linear sample complexity result for the stabilization of LTI systems under the regime when $k = o(n)$.

preprint | 2022 | arXiv

RFMask: A Simple Baseline for Human Silhouette Segmentation with Radio Signals

Human silhouette segmentation, which was originally defined in computer vision, has achieved promising results for understanding human activities. However, physical limitations cause existing systems based on optical cameras to suffer severe performance degradation under low illumination, smoke, and/or opaque obstruction. To overcome such limitations, in this paper we propose to use radio signals, which can traverse obstacles and are unaffected by lighting conditions, to achieve silhouette segmentation. The proposed RFMask framework is composed of three modules. It first transforms RF signals captured by millimeter wave radar on two planes into the spatial domain and suppresses interference with the signal processing module. Then, it locates human reflections on RF frames and extracts features from surrounding signals with the human detection module. Finally, the extracted features from RF frames are aggregated with an attention-based mask generation module. To verify our proposed framework, we collect a dataset containing 804,760 radio frames and 402,380 camera frames with human activities under various scenes. Experimental results show that the proposed framework can achieve impressive human silhouette segmentation even under challenging scenarios (such as low light and occlusion) where traditional optical-camera-based methods fail. To the best of our knowledge, this is the first investigation into segmenting human silhouettes based on millimeter wave signals. We hope that our work can serve as a baseline and inspire further research that performs vision tasks with radio signals. The dataset and code will be made public.

preprint | 2022 | arXiv

SapientML: Synthesizing Machine Learning Pipelines by Learning from Human-Written Solutions

Automatic machine learning, or AutoML, holds the promise of truly democratizing the use of machine learning (ML) by substantially automating the work of data scientists. However, the huge combinatorial search space of candidate pipelines means that current AutoML techniques generate sub-optimal pipelines, or none at all, especially on large, complex datasets. In this work we propose an AutoML technique, SapientML, that can learn from a corpus of existing datasets and their human-written pipelines, and efficiently generate a high-quality pipeline for a predictive task on a new dataset. To combat the search space explosion of AutoML, SapientML employs a novel divide-and-conquer strategy realized as a three-stage program synthesis approach that reasons on successively smaller search spaces. The first stage uses a machine-learned model to predict a set of plausible ML components to constitute a pipeline. In the second stage, this is then refined into a small pool of viable concrete pipelines using syntactic constraints derived from the corpus and the machine-learned model. Dynamically evaluating these few pipelines, in the third stage, provides the best solution. We instantiate SapientML as part of a fully automated tool-chain that creates a cleaned, labeled learning corpus by mining Kaggle, learns from it, and uses the learned models to then synthesize pipelines for new predictive tasks. We have created a training corpus of 1094 pipelines spanning 170 datasets, and evaluated SapientML on a set of 41 benchmark datasets, including 10 new, large, real-world datasets from Kaggle, and against 3 state-of-the-art AutoML tools and 2 baselines. Our evaluation shows that SapientML produces the best or comparable accuracy on 27 of the benchmarks, while the second-best tool fails to even produce a pipeline on 9 of the instances.

preprint | 2022 | arXiv

Towards a High-performance and Secure Memory System and Architecture for Emerging Applications

In this dissertation, we propose a memory and computing coordinated methodology to thoroughly exploit the characteristics and capabilities of the GPU-based heterogeneous system to effectively optimize applications' performance and privacy. Specifically, 1) we propose a task-aware and dynamic memory management mechanism to co-optimize applications' latency and memory footprint, especially in multitasking scenarios. 2) We propose a novel latency-aware memory management framework that analyzes the application characteristics and hardware features to reduce applications' initialization latency and response time. 3) We develop a new model extraction attack that explores the vulnerability of the GPU unified memory system to accurately steal private DNN models. 4) We propose a CPU/GPU Co-Encryption mechanism that can defend against a timing-correlation attack in an integrated CPU/GPU platform to provide a secure execution environment for the edge applications. This dissertation aims at developing a high-performance and secure memory system and architecture in GPU heterogeneous platforms to deploy emerging AI-enabled applications efficiently and safely.

preprint | 2022 | arXiv

Towards Efficient Architecture and Algorithms for Sensor Fusion

The safety of an automated vehicle hinges crucially upon the accuracy of perception and the latency of decision-making. Under these stringent requirements, future automated cars are usually equipped with multi-modal sensors such as cameras and LiDARs. Sensor fusion is adopted to provide a confident context of driving scenarios for better decision-making. A promising sensor fusion technique is middle fusion, which combines the feature representations from intermediate layers that belong to different sensing modalities. However, achieving both accuracy and latency efficiency, which is critical for driving automation applications, is challenging for middle fusion. We present A3Fusion, a software-hardware system specialized for adaptive, agile, and aligned fusion in driving automation. A3Fusion achieves high efficiency for the middle fusion of multiple CNN-based modalities by proposing an adaptive multi-modal learning network architecture and a latency-aware, agile network architecture optimization algorithm that enhances semantic segmentation accuracy while taking the inference latency as a key trade-off. In addition, A3Fusion proposes an FPGA-based accelerator that captures the unique data flow patterns of our middle fusion algorithm while reducing the overall compute overheads. We enable these contributions by co-designing the neural network, algorithm, and the accelerator architecture.

preprint | 2022 | arXiv

Two-Dimensional Weisfeiler-Lehman Graph Neural Networks for Link Prediction

Link prediction is one important application of graph neural networks (GNNs). Most existing GNNs for link prediction are based on the one-dimensional Weisfeiler-Lehman (1-WL) test. 1-WL-GNNs first compute node representations by iteratively passing neighboring node features to the center, and then obtain link representations by aggregating the pairwise node representations. As pointed out by previous works, this two-step procedure results in low discriminating power, as 1-WL-GNNs by nature learn node-level representations instead of link-level ones. In this paper, we study a completely different approach which can directly obtain node pair (link) representations based on two-dimensional Weisfeiler-Lehman (2-WL) tests. 2-WL tests directly use links (2-tuples) as message passing units instead of nodes, and thus can directly obtain link representations. We theoretically analyze the expressive power of 2-WL tests to discriminate non-isomorphic links, and prove that their link discriminating power is superior to that of 1-WL. Based on different 2-WL variants, we propose a series of novel 2-WL-GNN models for link prediction. Experiments on a wide range of real-world datasets demonstrate their competitive performance against state-of-the-art baselines and their superiority over plain 1-WL-GNNs.
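The limitation of the two-step 1-WL procedure described in this abstract can be seen on a small example. The Python sketch below runs standard 1-WL color refinement on a 6-cycle and then builds link representations by pairing node colors. It is an illustration only, not the paper's model: the function names and the unordered-pair aggregation are assumptions chosen for brevity.

```python
from typing import Dict, List, Tuple


def wl_colors(adj: Dict[int, List[int]], rounds: int = 3) -> Dict[int, int]:
    """1-WL color refinement: each node's color is updated from its neighbors' colors."""
    colors = {v: 0 for v in adj}  # uniform initial color (no node features)
    for _ in range(rounds):
        # A node's signature is its own color plus the sorted multiset of neighbor colors.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj
        }
        # Relabel distinct signatures with small integers.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors


def link_repr(colors: Dict[int, int], u: int, v: int) -> Tuple[int, int]:
    """Assumed aggregation: the unordered pair of the two endpoint colors."""
    return tuple(sorted((colors[u], colors[v])))


# A 6-cycle: every node looks the same to 1-WL.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
colors = wl_colors(cycle)

# (0, 1) is an existing edge and (0, 3) joins opposite nodes,
# yet both pairs receive the same representation.
same = link_repr(colors, 0, 1) == link_repr(colors, 0, 3)
```

Because all six nodes end up with one color, any aggregation of endpoint colors assigns the adjacent pair and the opposite pair the same representation, which is the node-level bottleneck that a link-level (2-tuple) scheme is meant to avoid.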

preprint | 2021 | arXiv

Cepstral Scanning Transmission Electron Microscopy Imaging of Severe Lattice Distortions

The development of four-dimensional (4D) scanning transmission electron microscopy (STEM) using fast detectors has opened up new avenues for addressing some of the long-standing challenges in electron imaging. One of these challenges is how to image severely distorted crystal lattices, such as at a dislocation core. Here we introduce a new 4D-STEM technique, called Cepstral STEM, for imaging disordered crystals using electron diffuse scattering. Local fluctuations of diffuse scattering are captured by scanning electron nanodiffraction (SEND) using a coherent probe. The harmonic signals in electron diffuse scattering are detected through Cepstral analysis and used for imaging. By integrating Cepstral analysis with 4D-STEM, we demonstrate that information about the distortive part of the electron scattering potential can be separated and imaged at nm spatial resolution. We apply our technique to the analysis of a dislocation core in SiGe and of lattice distortions in a high-entropy alloy.

preprint | 2021 | arXiv

Graph-based Visual-Semantic Entanglement Network for Zero-shot Image Recognition

Zero-shot learning (ZSL) uses semantic attributes to connect the search space of unseen objects. In recent years, although deep convolutional networks have brought powerful visual modeling capabilities to the ZSL task, their visual features suffer from severe pattern inertia and lack a representation of semantic relationships, which leads to severe bias and ambiguity. In response, we propose the Graph-based Visual-Semantic Entanglement Network to conduct graph modeling of visual features, mapped to semantic attributes by using a knowledge graph. It contains several novel designs: 1. it establishes a multi-path entangled network with the convolutional neural network (CNN) and the graph convolutional network (GCN), which feeds the visual features from the CNN into the GCN to model the implicit semantic relations, after which the GCN feeds the graph-modeled information back to the CNN features; 2. it uses attribute word vectors as the target for the graph semantic modeling of the GCN, which forms a self-consistent regression for graph modeling and supervises the GCN to learn more personalized attribute relations; 3. it fuses and supplements the hierarchical visual-semantic features refined by graph modeling into the visual embedding. Our method outperforms state-of-the-art approaches on multiple representative ZSL datasets (AwA2, CUB, and SUN) by promoting the semantic linkage modelling of visual features.

preprint | 2021 | arXiv

Inner-Imaging Networks: Put Lenses into Convolutional Structure

Despite their tremendous success in computer vision, deep convolutional networks suffer from serious computation costs and redundancies. Although previous works address this issue by enhancing the diversity of filters, they have not considered the complementarity and the completeness of the internal structure of the convolutional network. To deal with these problems, a novel Inner-Imaging architecture is proposed in this paper, which allows relationships between channels to meet the above requirements. Specifically, we organize the channel signal points in groups using convolutional kernels to model both the intra-group and inter-group relationships simultaneously. The convolutional filter is a powerful tool for modeling spatial relations and organizing grouped signals, so the proposed method maps the channel signals onto a pseudo-image, like putting a lens into the internal structure of the convolution. Consequently, not only is the diversity of channels increased, but the complementarity and completeness can also be explicitly enhanced. The proposed architecture is lightweight and easy to implement. It provides an efficient self-organization strategy for convolutional networks so as to improve their efficiency and performance. Extensive experiments are conducted on multiple benchmark image recognition data sets including CIFAR, SVHN and ImageNet. Experimental results verify the effectiveness of the Inner-Imaging mechanism with the most popular convolutional networks as the backbones.

preprint | 2021 | arXiv

Q-VR: System-Level Design for Future Mobile Collaborative Virtual Reality

High Quality Mobile Virtual Reality (VR) is what the incoming graphics technology era demands: users around the world, regardless of their hardware and network conditions, can all enjoy the immersive virtual experience. However, state-of-the-art software-based mobile VR designs cannot fully satisfy the realtime performance requirements due to the highly interactive nature of users' actions and complex environmental constraints during VR execution. Inspired by the unique human visual system effects and the strong correlation between VR motion features and realtime hardware-level information, we propose Q-VR, a novel dynamic collaborative rendering solution via software-hardware co-design for enabling future low-latency high-quality mobile VR. At the software level, Q-VR provides a flexible high-level tuning interface to reduce network latency while maintaining user perception. At the hardware level, Q-VR accommodates a wide spectrum of hardware and network conditions across users by effectively leveraging the computing capability of the increasingly powerful VR hardware. Extensive evaluation on real-world games demonstrates that Q-VR can achieve an average end-to-end performance speedup of 3.4x (up to 6.7x) over the traditional local rendering design in commercial VR devices, and a 4.1x frame rate improvement over the state-of-the-art static collaborative rendering.

preprint | 2020 | arXiv

Co-Optimizing Performance and Memory Footprint Via Integrated CPU/GPU Memory Management, an Implementation on Autonomous Driving Platform

Cutting-edge embedded system applications, such as self-driving cars and unmanned drone software, rely on integrated CPU/GPU platforms for their DNN-driven workloads, such as perception and other highly parallel components. In this work, we set out to explore the hidden performance implications of GPU memory management methods on integrated CPU/GPU architectures. Through a series of experiments on micro-benchmarks and real-world workloads, we find that the performance under different memory management methods may vary according to application characteristics. Based on this observation, we develop a performance model that can predict system overhead for each memory management method based on application characteristics. Guided by the performance model, we further propose a runtime scheduler. By conducting per-task memory management policy switching and kernel overlapping, the scheduler can significantly relieve the system memory pressure and reduce the multitasking co-run response time. We have implemented and extensively evaluated our system prototype on the NVIDIA Jetson TX2, Drive PX2, and Xavier AGX platforms, using both the Rodinia benchmark suite and two real-world case studies of drone software and autonomous driving software.

preprint | 2020 | arXiv

Data Augmentation Imbalance For Imbalanced Attribute Classification

Pedestrian attribute recognition is an important multi-label classification problem. Although convolutional neural networks are prominent in learning discriminative features from images, data imbalance in the multi-label setting for fine-grained tasks remains an open problem. In this paper, we propose a new re-sampling algorithm called data augmentation imbalance (DAI), which explicitly enhances the ability to discriminate the rarer attributes by increasing the proportion of labels that account for only a small part of the data. Fundamentally, DAI applies over-sampling and under-sampling to the multi-label dataset at the same time, following the idea of robbing the rich attributes to help the poor ones. Extensive empirical evidence shows that our DAI algorithm achieves state-of-the-art results on pedestrian attribute datasets, i.e., the standard PA-100K and PETA datasets.

preprint | 2020 | arXiv

Disconnection-mediated twin embryo growth in Mg

While deformation twinning in hexagonal close-packed metals has been widely studied due to its substantial impact on mechanical properties, an understanding of the detailed atomic processes associated with twin embryo growth is still lacking. Conducting molecular dynamics simulations on Mg, we show that the propagation of twinning disconnections emitted by basal-prismatic interfaces controls the twin boundary motion and is the rate-limiting mechanism during the initial growth of the twin embryo. The time needed for disconnection propagation is related to the distance between the twin tips, with widely spaced twin tips requiring more time for a unit twin boundary migration event to be completed. Thus, a phenomenological model, which unifies the two processes of disconnection and twin tip propagation, is proposed here to provide a quantitative analysis of twin embryo growth. The model fits the simulation data well, with two key parameters (twin tip velocity and twinning disconnection velocity) being extracted. In addition, a linear relationship between the ratio of twinning disconnection velocity to twin tip velocity and the applied shear stress is observed. Using an example of twin growth in a nanoscale single crystal from the recent literature, we find that our molecular dynamics simulations and analytical model are in good agreement with experimental data.

preprint | 2020 | arXiv

Multiple Attentional Pyramid Networks for Chinese Herbal Recognition

Chinese herbs play a critical role in Traditional Chinese Medicine. Due to differences in recognition granularity, they can be recognized accurately only by professionals with much experience. It is expected that they can be recognized automatically using new techniques such as machine learning. However, no Chinese herbal image dataset is available, and no machine learning method deals well with Chinese herbal image recognition. Therefore, this paper begins by building a new standard Chinese-Herbs dataset. Subsequently, new Attentional Pyramid Networks (APN) for Chinese herbal recognition are proposed, in which both a novel competitive attention and a spatial collaborative attention are introduced and applied. APN can adaptively model Chinese herbal images with different feature scales. Finally, a new framework for Chinese herbal recognition is proposed as a new application of APN. Experiments conducted on our constructed dataset validate the effectiveness of our methods.

preprint | 2020 | arXiv

Neural Architecture Search For Fault Diagnosis

Data-driven methods, especially deep learning methods, have made great progress in fault diagnosis. Deep learning is suitable for processing big data and has a strong feature extraction ability for realizing end-to-end fault diagnosis systems. However, designing a neural network architecture requires rich professional knowledge and debugging experience, and many experiments are needed to screen models and hyperparameters, which increases the difficulty of developing deep learning models. Fortunately, neural architecture search (NAS) is developing rapidly and is becoming one of the next directions for deep learning. In this paper, we propose a NAS method for fault diagnosis using reinforcement learning. A recurrent neural network is used as an agent to generate network architectures. The accuracy of the generated network on the validation dataset is fed back to the agent as a reward, and the parameters of the agent are updated through the policy gradient algorithm. We use the PHM 2009 Data Challenge gearbox dataset to demonstrate the effectiveness of the proposed method, and obtain state-of-the-art results compared with other manually designed network structures. To the best of the authors' knowledge, this is the first time that NAS has been applied to fault diagnosis.

preprint2020arXiv

Some theoretical results on the second-order conservative phase field equation

In this paper, a theoretical study of the second-order conservative phase field (SOCPF) equation is presented. The theoretical results cover the following three aspects. First, three new derivation methods for the SOCPF equation are given. The SOCPF equation can be viewed as a gradient flow, as a special diffusion equation, and as the diffuse interface form of a sharp interface formulation for a piecewise constant function, respectively. These derivation methods help us to understand the SOCPF equation from different perspectives. Second, the conservation properties of the solution of the SOCPF equation are studied. Compared with the Cahn-Hilliard equation and the Allen-Cahn equation, it is found that the solution of the SOCPF equation satisfies more conservation laws. Third, the wetting boundary condition for the SOCPF equation is investigated. We find that the no-flux boundary condition is equivalent to the wetting boundary condition for the two-component phase field model. Moreover, applying the no-flux boundary conditions to the $N$-component phase field model, we give a set of wetting boundary conditions for the $N$ phase field parameters.

preprint2019arXiv

Automatic construction of Chinese herbal prescription from tongue image via CNNs and auxiliary latent therapy topics

The tongue image provides important physical information about humans. It is of great importance for diagnoses and treatments in clinical medicine. Herbal prescriptions are simple, noninvasive and have low side effects. Thus, they are widely applied in China. Studies on the automatic construction of herbal prescriptions from tongue images are of great significance: they let deep learning explore the relevance of tongue images to herbal prescriptions, and the resulting technology can be applied to healthcare services in mobile medical systems. In order to adapt to tongue images taken in a variety of photographic environments and to construct herbal prescriptions, a neural network framework for prescription construction is designed. It includes single/double convolution channels and fully connected layers. Furthermore, an auxiliary therapy topic loss mechanism is proposed to model the therapy of Chinese doctors and to alleviate the interference of sparse output labels with the diversity of results. The experiments use real-world tongue images and the corresponding prescriptions, and the generated prescriptions are close to the real samples, which verifies the feasibility of the proposed method for the automatic construction of herbal prescriptions from tongue images. The method also provides a reference for automatic herbal prescription construction from more physical information.

preprint2018arXiv

Non-Thermal Cosmic Rays During Big Bang Nucleosynthesis to Solve the Lithium Problem

The discrepancy between the theoretical prediction of primordial lithium abundances and astronomical observations is called the Lithium Problem. We find that extra contributions from non-thermal hydrogen and helium during Big Bang nucleosynthesis can explain the discrepancy for both Li-7 and Li-6, while changing the deuterium abundance only slightly. The allowed parameter space for the amount of such non-thermal particles and for their energy range is shown. The hypothesis is stable regardless of the cross-section uncertainty of the relevant reactions and the explicit shape of the energy spectrum.

preprint2015arXiv

Person Re-identification by Local Maximal Occurrence Representation and Metric Learning

Person re-identification is an important technique towards automatic search of a person's presence in a surveillance video. Two fundamental problems are critical for person re-identification: feature representation and metric learning. An effective feature representation should be robust to illumination and viewpoint changes, and a discriminant metric should be learned to match various person images. In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA). The LOMO feature analyzes the horizontal occurrence of local features, and maximizes the occurrence to make a stable representation against viewpoint changes. Besides, to handle illumination variations, we apply the Retinex transform and a scale invariant texture operator. To learn a discriminant metric, we propose to learn a discriminant low dimensional subspace by cross-view quadratic discriminant analysis, and simultaneously, a QDA metric is learned on the derived subspace. We also present a practical computation method for XQDA, as well as its regularization. Experiments on four challenging person re-identification databases, VIPeR, QMUL GRID, CUHK Campus, and CUHK03, show that the proposed method improves the state-of-the-art rank-1 identification rates by 2.2%, 4.88%, 28.91%, and 31.55% on the four databases, respectively.
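The "maximize the horizontal occurrence" step described in the abstract can be illustrated with a minimal sketch. It assumes that local histograms have already been computed for patches at the same image height, and it pools them with an element-wise maximum. The function name `local_max_occurrence` and the input layout are hypothetical, and the Retinex preprocessing, texture operator and XQDA metric are not covered here.

```python
from typing import List


def local_max_occurrence(rows: List[List[List[float]]]) -> List[float]:
    """Pool local histograms along each horizontal strip.

    rows[r] holds the histograms of all patches located at height r
    (an assumed input layout, not the paper's exact data structure).
    For every strip, each histogram bin keeps its maximum value over
    the patches in that strip, and the pooled strips are concatenated.
    """
    feature: List[float] = []
    for patch_histograms in rows:
        # zip(*...) groups the values of one bin across all patches in the strip.
        pooled = [max(bin_values) for bin_values in zip(*patch_histograms)]
        feature.extend(pooled)
    return feature


# Two strips, two patches per strip, two bins per histogram.
example_rows = [
    [[1, 0], [0, 2]],
    [[3, 1], [2, 5]],
]
example_feature = local_max_occurrence(example_rows)
```

Because the maximum ignores where in the strip a pattern occurs, the pooled value does not change when a local pattern shifts horizontally, which is the intuition behind robustness to viewpoint changes.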

preprint2014arXiv

High Range Resolution Profiling in Missing Data Case: A New Approach

We propose a novel method for synthetic High Range Resolution (HRR) profiling under the condition of missing frequency-domain samples. The new approach estimates the autocovariance function (ACF) of the signal from valid sample pairs. The autocovariance matrix is then formed from the ACF estimates. Even with a large part of the data missing, the new approach produces robust profiling results. Simulations are presented to show an advantage over other approaches in the missing-data case. Moreover, a real radar experiment was conducted to validate the new approach.
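The idea of estimating the ACF from valid sample pairs can be sketched as follows. This is an illustration under assumptions, not the paper's estimator: missing samples are marked with `None`, each lag is averaged over the pairs in which both samples are present, and the matrix is built as a Hermitian Toeplitz matrix. The names `acf_from_valid_pairs` and `autocovariance_matrix` are hypothetical.

```python
from typing import List, Optional


def acf_from_valid_pairs(
    samples: List[Optional[complex]], max_lag: int
) -> List[Optional[complex]]:
    """Estimate r[k] = mean of x[n + k] * conj(x[n]) over pairs with both samples present."""
    acf: List[Optional[complex]] = []
    for lag in range(max_lag + 1):
        total = 0j
        count = 0
        for n in range(len(samples) - lag):
            a, b = samples[n + lag], samples[n]
            if a is not None and b is not None:
                total += a * b.conjugate()
                count += 1
        # A lag with no valid pair cannot be estimated.
        acf.append(total / count if count else None)
    return acf


def autocovariance_matrix(acf: List[Optional[complex]]) -> List[List[complex]]:
    """Build R[i][j] = r[i - j], using r[-k] = conj(r[k])."""
    if any(value is None for value in acf):
        raise ValueError("every lag needs at least one valid sample pair")
    size = len(acf)
    return [
        [acf[i - j] if i >= j else acf[j - i].conjugate() for j in range(size)]
        for i in range(size)
    ]


# A constant signal with two missing samples still yields r[k] = 1 for each lag.
observed = [1 + 0j, None, 1 + 0j, 1 + 0j, None, 1 + 0j]
estimated_acf = acf_from_valid_pairs(observed, max_lag=2)
matrix = autocovariance_matrix(estimated_acf)
```

Averaging per lag rather than over a fixed window means that each lag uses whatever pairs survive, so the estimate degrades gradually as more samples go missing.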

preprint2014arXiv

Open-set Person Re-identification

Person re-identification is becoming a hot research topic for developing both machine learning algorithms and video surveillance applications. The task of person re-identification is to determine which person in a gallery has the same identity as a probe image. This task basically assumes that the subject of the probe image belongs to the gallery, that is, the gallery contains this person. However, in practical applications such as searching for a suspect in a video, this assumption is usually not true. In this paper, we consider the open-set person re-identification problem, which includes two sub-tasks, detection and identification. The detection sub-task is to determine the presence of the probe subject in the gallery, and the identification sub-task is to determine which person in the gallery has the same identity as the accepted probe. We present a database collected from a video surveillance setting of 6 cameras, with 200 persons and 7,413 segmented images. Based on this database, we develop a benchmark protocol for evaluating the performance under the open-set person re-identification scenario. Several popular metric learning algorithms for person re-identification have been evaluated as baselines. From the baseline performance, we observe that the open-set person re-identification problem is still largely unresolved, and thus further attention and effort are needed.
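The two sub-tasks can be pictured with a small sketch: detection decides whether the probe is in the gallery at all, and identification names the best match for an accepted probe. The version below is an assumption-laden illustration, not the benchmark protocol: it uses Euclidean distance on feature vectors and a single hypothetical rejection threshold, and `open_set_reid` is an invented name.

```python
import math
from typing import Dict, Optional, Sequence


def open_set_reid(
    probe: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    reject_threshold: float,
) -> Optional[str]:
    """Return the gallery identity for the probe, or None if the probe is rejected."""
    best_id: Optional[str] = None
    best_distance = math.inf
    for person_id, feature in gallery.items():
        distance = math.dist(probe, feature)
        if distance < best_distance:
            best_id, best_distance = person_id, distance
    # Detection: a probe whose nearest gallery entry is too far is treated as unknown.
    if best_distance > reject_threshold:
        return None
    # Identification: the accepted probe takes the identity of its nearest entry.
    return best_id


gallery_features = {"person_a": (0.0, 0.0), "person_b": (3.0, 4.0)}
known = open_set_reid((0.0, 1.0), gallery_features, reject_threshold=2.0)
unknown = open_set_reid((10.0, 10.0), gallery_features, reject_threshold=2.0)
```

In a closed-set system the rejection step is absent, so every probe receives some identity even when the person was never enrolled.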

preprint2013arXiv

Fast Matching by 2 Lines of Code for Large Scale Face Recognition Systems

In this paper, we propose a method that applies the popular cascade classifier to face recognition in order to improve computational efficiency while keeping a high recognition rate. In large scale face recognition systems, because the probability that two feature templates come from different subjects is very high, most of the matching pairs will be rejected by the early stages of the cascade. Therefore, the cascade can improve the matching speed significantly. On the other hand, using the nested structure of the cascade, we can drop some stages at the end of the feature to reduce memory and bandwidth usage in resource-intensive systems without sacrificing too much performance. The cascade is learned in two steps. First, prepared features are grouped into several nested stages. Then, the threshold of each stage is learned to achieve a user-defined verification rate (VR). In the paper, we take a landmark-based Gabor+LDA face recognition system as the baseline to illustrate the process and advantages of the proposed method. However, the method is generic and not limited to face recognition; it can be easily generalized to other biometrics as a post-processing module. Experiments on the FERET database show the good performance of our baseline, and an experiment on a self-collected large scale database illustrates that the cascade can improve the matching speed significantly.
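The early-rejection behaviour of a nested cascade can be sketched as below. The sketch assumes a dot-product similarity accumulated stage by stage and hand-picked stage thresholds; in the paper the thresholds are learned to reach a target verification rate, which is not reproduced here. The name `cascade_match` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def cascade_match(
    probe: Sequence[float],
    template: Sequence[float],
    stage_ends: List[int],
    thresholds: List[float],
) -> Tuple[bool, float, int]:
    """Compare two feature vectors stage by stage, stopping at the first failed stage.

    stage_ends[i] is the feature index where stage i ends (nested stages),
    thresholds[i] is the minimum accumulated score required after stage i.
    Returns (accepted, accumulated score, number of feature elements used).
    """
    score = 0.0
    start = 0
    for end, threshold in zip(stage_ends, thresholds):
        score += sum(p * t for p, t in zip(probe[start:end], template[start:end]))
        if score < threshold:
            # Early rejection: later feature elements are never read.
            return False, score, end
        start = end
    return True, score, start


same = cascade_match([1, 0, 1, 0], [1, 0, 1, 0], stage_ends=[2, 4], thresholds=[0.5, 1.5])
different = cascade_match([1, 0, 1, 0], [0, 1, 0, 1], stage_ends=[2, 4], thresholds=[0.5, 1.5])
```

In this toy run the mismatched pair is rejected after reading only half of the feature, which is the source of the speed-up when most comparisons involve different subjects. Passing a shorter `stage_ends` list corresponds to dropping trailing stages to save memory and bandwidth.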

preprint2013arXiv

Vector cross product in n-dimensional vector space

The definition of the vector cross product (VCP) introduced by Eckmann exists only in the three- and seven-dimensional vector spaces. In this paper, based on orthogonal completeness, the magnitude of the basis vector cross product, and all combinations of the basis vectors $\hat{e}_i$, a generalized definition of the VCP in odd n-dimensional vector spaces is given by introducing a cross term $X_{AB}$. In addition, the definition is validated by reducing the generalized definition to the fundamental three- and seven-dimensional vector spaces.
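For orientation, the two properties usually required of a vector cross product, orthogonality to both factors and the magnitude identity $|a \times b|^2 = |a|^2 |b|^2 - (a \cdot b)^2$, can be checked numerically in the familiar three-dimensional case. The sketch below covers only that case; it does not implement the generalized definition or the cross term $X_{AB}$ from the paper, and the helper names are illustrative.

```python
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean inner product."""
    return sum(x * y for x, y in zip(a, b))


def cross3(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Standard cross product in three dimensions."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


a = (1, 2, 3)
b = (4, 5, 6)
c = cross3(a, b)

# Orthogonality: the product is perpendicular to both factors.
orthogonal = dot(a, c) == 0 and dot(b, c) == 0

# Magnitude identity: |a x b|^2 = |a|^2 |b|^2 - (a . b)^2.
magnitude_ok = dot(c, c) == dot(a, a) * dot(b, b) - dot(a, b) ** 2
```

Integer inputs keep the arithmetic exact, so both checks can use equality rather than a tolerance.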

preprint2012arXiv

Cosmic Rays during BBN as Origin of Lithium Problem

There may be non-thermal cosmic rays during the big-bang nucleosynthesis (BBN) epoch (dubbed BBNCRs). This paper investigates whether such BBNCRs can be the origin of the lithium problem. The BBNCR flux is expected to be small in order to preserve the success of standard BBN (SBBN). With favorable assumptions on the BBNCR spectrum between 0.09 and 4 MeV, our numerical calculation shows that extra contributions from BBNCRs can account for the $^7$Li abundance successfully. However, the $^6$Li abundance is lifted by only an order of magnitude, which is still much lower than the observed value. As the deuteron abundance is very sensitive to the spectrum choice of BBNCRs, the allowed parameter space for the spectrum is strictly constrained. We should emphasize that the acceleration mechanism for BBNCRs in the early universe is still an open question. For example, a strong turbulent magnetic field may be the solution to the problem. Whether such a mechanism can provide the required spectrum deserves further study.

preprint2010arXiv

Extended Range Profiling in Stepped-Frequency Radar with Sparse Recovery

The newly emerging theory of compressed sensing (CS) enables a sparse signal to be restored from an inadequate number of linear projections. Based on compressed sensing theory, a new algorithm for high-resolution range profiling is proposed for stepped-frequency (SF) radar suffering from missing pulses. The new algorithm recovers the target range profile over multiple coarse range bins, providing a wide range profiling capability. MATLAB simulation results are presented to verify the proposed method. Furthermore, we use data collected from a real SF radar to generate an extended target high-resolution range (HRR) profile. Results are compared with the 'stretch'-based least-squares method to demonstrate its applicability.

34 published item(s)

PADE: A Predictor-Free Sparse Attention Accelerator via Unified Execution and Stage Fusion

Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR

Actively tuning anisotropic light-matter interaction in biaxial hyperbolic material $α$-MoO$_3$ using phase change material VO$_2$ and graphene

Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System

Demystifying Arch-hints for Model Extraction: An Attack in Unified Memory System

Enabling Efficient Deep Convolutional Neural Network-based Sensor Fusion for Autonomous Driving

Metastable complex vector bundles over complex projective spaces

Negative differential thermal conductance between Weyl semimetals nanoparticles through vacuum

NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

On the Sample Complexity of Stabilizing LTI Systems on a Single Trajectory

RFMask: A Simple Baseline for Human Silhouette Segmentation with Radio Signals

SapientML: Synthesizing Machine Learning Pipelines by Learning from Human-Written Solutions

Towards a High-performance and Secure Memory System and Architecture for Emerging Applications

Towards Efficient Architecture and Algorithms for Sensor Fusion

Two-Dimensional Weisfeiler-Lehman Graph Neural Networks for Link Prediction

Cepstral Scanning Transmission Electron Microscopy Imaging of Severe Lattice Distortions

Graph-based Visual-Semantic Entanglement Network for Zero-shot Image Recognition

Inner-Imaging Networks: Put Lenses into Convolutional Structure

Q-VR: System-Level Design for Future Mobile Collaborative Virtual Reality

Co-Optimizing Performance and Memory Footprint Via Integrated CPU/GPU Memory Management, an Implementation on Autonomous Driving Platform

Data Augmentation Imbalance For Imbalanced Attribute Classification

Disconnection-mediated twin embryo growth in Mg

Multiple Attentional Pyramid Networks for Chinese Herbal Recognition

Neural Architecture Search For Fault Diagnosis

Some theoretical results on the second-order conservative phase field equation

Automatic construction of Chinese herbal prescription from tongue image via CNNs and auxiliary latent therapy topics

Non-Thermal Cosmic Rays During Big Bang Nucleosynthesis to Solve the Lithium Problem

Person Re-identification by Local Maximal Occurrence Representation and Metric Learning

High Range Resolution Profiling in Missing Data Case: A New Approach

Open-set Person Re-identification

Fast Matching by 2 Lines of Code for Large Scale Face Recognition Systems

Vector cross product in n-dimensional vector space

Cosmic Rays during BBN as Origin of Lithium Problem

Extended Range Profiling in Stepped-Frequency Radar with Sparse Recovery