Trust snapshot

Quick read

Trust 21 - Emerging, Verification L1, Unclaimed author
33 works
0 followers
25 topics
4 close collaborators

Published work

33 published item(s)

preprint, 2026, arXiv

DCFold: Efficient Protein Structure Generation with Single Forward Pass

AlphaFold3 introduces a diffusion-based architecture that elevates protein structure prediction to all-atom resolution with improved accuracy. This state-of-the-art performance has established AlphaFold3 as a foundation model for diverse generation and design tasks. However, its iterative design substantially increases inference time, limiting practical deployment in downstream settings such as virtual screening and protein design. We propose DCFold, a single-step generative model that attains AlphaFold3-level accuracy. Our Dual Consistency training framework, which incorporates a novel Temporal Geodesic Matching (TGM) scheduler, enables DCFold to achieve a 15x acceleration in inference while maintaining predictive fidelity. We validate its effectiveness across both structure prediction and binder design benchmarks.

preprint, 2025, arXiv

Transverse momentum asymmetry in the semi-inclusive electron positron annihilation process

Hadronization, a nonperturbative process, cannot be calculated from first principles. It can be investigated by using phenomenological models, by examining the behavior of produced hadrons, or through fragmentation functions. These fragmentation functions are nonperturbative quantities whose determination relies entirely on experimental data. However, higher-twist fragmentation functions are difficult to determine because of power suppression. In this paper, we propose an asymmetry to study twist-3 fragmentation functions. This asymmetry is defined as the transverse momentum asymmetry of the fragmenting quark and/or the produced jet with respect to the observed hadron direction within the semi-inclusive electron-positron annihilation process. As a twist-3 effect, this asymmetry is sensitive to the distribution of the jet relative to the produced hadron direction during hadronization. Furthermore, it is closely related to twist-3 transverse momentum dependent fragmentation functions and provides a set of measurable quantities for their determination.

preprint, 2022, arXiv

6G-enabled Edge AI for Metaverse: Challenges, Methods, and Future Research Directions

6G-enabled edge intelligence opens up a new era of Internet of Everything and makes it possible to interconnect people-devices-cloud anytime, anywhere. More and more next-generation wireless network smart service applications are changing our way of life and improving our quality of life. As the hottest new form of next-generation Internet applications, Metaverse is striving to connect billions of users and create a shared world where virtual and reality merge. However, limited by resources, computing power, and sensory devices, Metaverse is still far from realizing its full vision of immersion, materialization, and interoperability. To this end, this survey aims to realize this vision through the organic integration of 6G-enabled edge AI and Metaverse. Specifically, we first introduce three new types of edge-Metaverse architectures that use 6G-enabled edge AI to solve resource and computing constraints in Metaverse. Then we summarize technical challenges that these architectures face in Metaverse and the existing solutions. Furthermore, we explore how the edge-Metaverse architecture technology helps Metaverse to interact and share digital data. Finally, we discuss future research directions to realize the true vision of Metaverse with 6G-enabled edge AI.

preprint, 2022, arXiv

Efficient Federated Learning with Spike Neural Networks for Traffic Sign Recognition

With the gradual popularization of self-driving, it is becoming increasingly important for vehicles to smartly make the right driving decisions and autonomously obey traffic rules by correctly recognizing traffic signs. However, for machine learning-based traffic sign recognition on the Internet of Vehicles (IoV), a large amount of traffic sign data from distributed vehicles needs to be gathered on a centralized server for model training, which brings a serious risk of privacy leakage because traffic sign data contains a great deal of location privacy information. To address this issue, we first exploit privacy-preserving federated learning to perform collaborative training for accurate recognition models without sharing raw traffic sign data. Nevertheless, due to the limited computing and energy resources of most devices, it is hard for vehicles to continuously undertake complex artificial intelligence tasks. Therefore, we introduce powerful Spike Neural Networks (SNNs), the next generation of neural networks, into traffic sign recognition for energy-efficient and fast model training; they are practical and well suited to IoV scenarios. Furthermore, we design a novel encoding scheme for SNNs based on neuron receptive fields to extract information from the pixel and spatial dimensions of traffic signs to achieve high-accuracy training. Numerical results indicate that the proposed federated SNN outperforms traditional federated convolutional neural networks in terms of accuracy, noise immunity, and energy efficiency.

preprint, 2022, arXiv

Glassy crystals with colossal multi-baroresponsivities

As a nontrivial solid state of matter, the glassy-crystal state embraces physical features of both crystalline and amorphous solids, where a long-range ordered periodic structure formed by the mass centers of constituent molecules accommodates orientational glasses. Here, we discover and validate a glassy-crystal state in 2-amino-2-methyl-1,3-propanediol (AMP, C4H11NO2) by neutron scattering and complementary broadband dielectric spectroscopy (BDS) measurements. The freezing process of the dynamic orientational disorder is manifested in relaxation times well described by the Vogel-Fulcher-Tammann (VFT) law and in a strongly frequency-dependent freezing temperature ranging from around 225 K at 0.1 Hz to above room temperature in the GHz region. At room temperature, the supercooled state is extremely sensitive to pressure, such that a pressure of a few MPa can induce crystallization to the ordered crystal state, eventually leading to a temperature increase of 48 K within 20 s, a significant reduction of visible light transmittance from about 95% to a few percent, and a remarkable decrease of electrical conductivity by three orders of magnitude. These ultrasensitive baroresponsivities might find applications in low-grade waste heat recycling, pressure sensors and non-volatile memory devices. It is expected that glassy crystals will serve as an emerging platform for exploiting exotic states of matter and the associated fantastic applications.
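
For reference, the Vogel-Fulcher-Tammann law named in this abstract is commonly written in the form below. The symbols are the conventional ones and are not taken from the paper: $\tau_0$ is a limiting relaxation time at high temperature, $B$ is an empirical constant, and $T_0$ is the Vogel temperature at which the relaxation time diverges.

```latex
\tau(T) = \tau_0 \exp\!\left(\frac{B}{T - T_0}\right), \qquad T > T_0
```

In this form the relaxation time grows faster than an Arrhenius law as $T$ approaches $T_0$ from above, which is the usual signature of glass-like freezing.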

preprint, 2022, arXiv

Gridless Tomographic SAR Imaging Based on Accelerated Atomic Norm Minimization with Efficiency

Synthetic aperture radar (SAR) tomography (TomoSAR) enables the reconstruction and three-dimensional (3D) localization of targets based on multiple two-dimensional (2D) observations of the same scene. Resolving targets along the elevation direction can be treated as a line spectrum estimation problem. However, traditional super-resolution spectrum estimation algorithms require multiple snapshots and uncorrelated targets. Meanwhile, compressed sensing (CS) based methods, the most popular TomoSAR imaging methods in recent years, suffer from the gridding mismatch effect, which markedly degrades imaging performance. As a gridless CS approach, atomic norm minimization (ANM) can avoid the gridding effect but requires enormous computing resources. To address these issues, this paper proposes an improved fast ANM algorithm for TomoSAR elevation focusing, the IVDST-ANM algorithm. It replaces the conventional, time-consuming semidefinite programming (SDP) with an iterative Vandermonde decomposition and shrinkage-thresholding (IVDST) approach, which greatly reduces the computational complexity while retaining the benefits of ANM in terms of gridless imaging and single-snapshot recovery. We conducted experiments using simulated data to evaluate the performance of the proposed method, and reconstruction results of an urban area from the SARMV3D-Imaging 1.0 dataset are also presented.

preprint, 2022, arXiv

Interpretable Melody Generation from Lyrics with Discrete-Valued Adversarial Training

Generating melody from lyrics is an interesting yet challenging task in the area of artificial intelligence and music. However, the difficulty of keeping the input lyrics and the generated melody consistent limits the generation quality of previous works. We demonstrate an interpretable lyrics-to-melody generation system that lets users interact with it to understand the generation process and recreate the desired songs. To improve the reliability of melody generation that matches lyrics, mutual information is exploited to strengthen the consistency between lyrics and generated melodies. Gumbel-Softmax is exploited to solve the non-differentiability problem of generating discrete music attributes with Generative Adversarial Networks (GANs). Moreover, the predicted probabilities output by the generator are used to recommend music attributes. Interacting with our lyrics-to-melody generation system, users can listen to the generated AI song as well as recreate a new song by selecting from recommended music attributes.
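
The Gumbel-Softmax relaxation mentioned in this abstract can be illustrated with a minimal sketch. This is the generic technique only, not the authors' model: the function name `gumbel_softmax`, the example logits, and the temperature value are assumptions made for illustration.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed one-hot sample over len(logits) categories.

    Hypothetical helper: adds Gumbel(0, 1) noise to each logit and applies
    a temperature-scaled softmax, so the output is a smooth function of
    the logits instead of a hard categorical draw.
    """
    noisy = []
    for logit in logits:
        u = rng.random()
        # Clamp away from 0 and 1 so both logarithms are defined.
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        g = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + g) / tau)
    # Numerically stable softmax.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
sample = gumbel_softmax([1.0, 2.0, 0.5], tau=0.5)
```

Lower values of `tau` push the sample closer to a one-hot vector, while higher values give smoother outputs; in a GAN setting the smooth output is what allows gradients to reach the generator.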

preprint, 2022, arXiv

MD modeling of cracks in clay at the nanoscale

Cracks in clay are significant in geotechnical and geoenvironmental engineering (e.g., embankment erosion and stability of landfill cover systems). This article studies the mechanism of nucleation and growth of cracks in clay at the nanoscale through full-scale molecular dynamics simulations. The clay adopted is pyrophyllite, and the force field is CLAYFF. The crack formation in a pyrophyllite clay layer is evaluated under uniaxial tension and simple shear. The numerical results show that cracks in the nanoscale pyrophyllite clay layer are brittle and strain-rate dependent. A small strain rate results in low ultimate tensile/shear strength. As the strain rate increases, cracking shifts from a single-crack pattern to a multiple-crack one. The cracking mechanism is investigated through bond breakage analysis at the atomic scale. It is found that the first bond breakage occurs in the silicon-surface oxygen bond. As a crack propagates, the relative percentage of broken silicon-surface oxygen bonds is the smallest compared to other types of metal-oxygen interactions, demonstrating that the atomic interaction between silicon and surface oxygen atoms is the strongest. To understand the propagation of cracks, we also study the stress intensity factor and energy release rate of pyrophyllite and their size dependence at the atomic scale.

preprint, 2022, arXiv

Model-Free Statistical Inference on High-Dimensional Data

This paper aims to develop an effective model-free inference procedure for high-dimensional data. We first reformulate the hypothesis testing problem via a sufficient dimension reduction framework. With the aid of the new reformulation, we propose a new test statistic and show that its asymptotic distribution is a $\chi^2$ distribution whose degrees of freedom do not depend on the unknown population distribution. We further conduct power analysis under local alternative hypotheses. In addition, we study how to control the false discovery rate of the proposed $\chi^2$ tests, which are correlated, to identify important predictors under a model-free framework. To this end, we propose a multiple testing procedure and establish its theoretical guarantees. Monte Carlo simulation studies are conducted to assess the performance of the proposed tests, and an empirical analysis of a real-world data set is used to illustrate the proposed methodology.

preprint, 2022, arXiv

On joint training with interfaces for spoken language understanding

Spoken language understanding (SLU) systems extract both text transcripts and semantics associated with intents and slots from input speech utterances. SLU systems usually consist of (1) an automatic speech recognition (ASR) module, (2) an interface module that exposes relevant outputs from ASR, and (3) a natural language understanding (NLU) module. Interfaces in SLU systems carry information on text transcriptions or richer information like neural embeddings from ASR to NLU. In this paper, we study how interfaces affect joint-training for spoken language understanding. Most notably, we obtain the state-of-the-art results on the publicly available 50-hr SLURP dataset. We first leverage large-size pretrained ASR and NLU models that are connected by a text interface, and then jointly train both models via a sequence loss function. For scenarios where pretrained models are not utilized, the best results are obtained through a joint sequence loss training using richer neural interfaces. Finally, we show the overall diminishing impact of leveraging pretrained models with increased training data size.

preprint, 2022, arXiv

Optimal Algorithms for Convex Nested Stochastic Composite Optimization

Recently, convex nested stochastic composite optimization (NSCO) has received considerable attention for its applications in reinforcement learning and risk-averse optimization. The current NSCO algorithms have worse stochastic oracle complexities, by orders of magnitude, than those for simpler stochastic composite optimization problems (e.g., sum of smooth and nonsmooth functions) without the nested structure. Moreover, they require all outer-layer functions to be smooth, which is not satisfied by some important applications. These discrepancies prompt us to ask: "does the nested composition make stochastic optimization more difficult in terms of the order of oracle complexity?" In this paper, we answer the question by developing order-optimal algorithms for the convex NSCO problem constructed from an arbitrary composition of smooth, structured non-smooth and general non-smooth layer functions. When all outer-layer functions are smooth, we propose a stochastic sequential dual (SSD) method to achieve an oracle complexity of $\mathcal{O}(1/\epsilon^2)$ ($\mathcal{O}(1/\epsilon)$) when the problem is non-strongly (strongly) convex. When there exists some structured non-smooth or general non-smooth outer-layer function, we propose a nonsmooth stochastic sequential dual (nSSD) method to achieve an oracle complexity of $\mathcal{O}(1/\epsilon^2)$. We provide a lower complexity bound to show the latter $\mathcal{O}(1/\epsilon^2)$ complexity to be unimprovable even under a strongly convex setting. All these complexity results seem to be new in the literature and they indicate that the convex NSCO problem has the same order of oracle complexity as those without the nested composition in all but the strongly convex and outer-non-smooth problem.

preprint, 2022, arXiv

PERT: A New Solution to Pinyin to Character Conversion Task

The Pinyin-to-Character (P2C) conversion task is the key task of the Input Method Engine (IME) in commercial input software for Asian languages such as Chinese, Japanese, and Thai. It is usually treated as a sequence labelling task and solved with a language model, e.g., an n-gram model or an RNN. However, the low capacity of n-gram models and RNNs limits performance. This paper introduces a new solution named PERT, which stands for bidirectional Pinyin Encoder Representations from Transformers. It achieves a significant performance improvement over baselines. Furthermore, we combine PERT with an n-gram model under a Markov framework and improve performance further. Lastly, an external lexicon is incorporated into PERT to resolve the OOD issue of IME.

preprint, 2022, arXiv

Robust Semi-supervised Federated Learning for Images Automatic Recognition in Internet of Drones

Air access networks have been recognized as a significant driver of various Internet of Things (IoT) services and applications. In particular, the aerial computing network infrastructure centered on the Internet of Drones has set off a new revolution in automatic image recognition. This emerging technology relies on sharing ground truth labeled data between Unmanned Aerial Vehicle (UAV) swarms to train a high-quality automatic image recognition model. However, such an approach brings data privacy and data availability challenges. To address these issues, we first present a Semi-supervised Federated Learning (SSFL) framework for privacy-preserving UAV image recognition. Specifically, we propose a model parameter mixing strategy, referred to as Federated Mixing (FedMix), to improve the naive combination of FL and semi-supervised learning methods under two realistic scenarios (labels-at-client and labels-at-server). Furthermore, there are significant differences in the number, features, and distribution of local data collected by UAVs using different camera modules in different environments, i.e., statistical heterogeneity. To alleviate the statistical heterogeneity problem, we propose an aggregation rule based on the frequency of the client's participation in training, namely the FedFreq aggregation rule, which can adjust the weight of the corresponding local model according to its frequency. Numerical results demonstrate that the performance of our proposed method is significantly better than that of the current baseline and is robust to different non-IID levels of client data.

preprint, 2022, arXiv

Smoothing Advantage Learning

Advantage learning (AL) aims to improve the robustness of value-based reinforcement learning against estimation errors with action-gap-based regularization. Unfortunately, the method tends to be unstable in the case of function approximation. In this paper, we propose a simple variant of AL, named smoothing advantage learning (SAL), to alleviate this problem. The key to our method is to replace the original Bellman optimality operator in AL with a smooth one so as to obtain a more reliable estimate of the temporal difference target. We give a detailed account of the resulting action gap and the performance bound for approximate SAL. Further theoretical analysis reveals that the proposed value smoothing technique not only helps to stabilize the training procedure of AL by controlling the trade-off between convergence rate and the upper bound of the approximation errors, but also helps increase the action gap between the optimal and sub-optimal action values.

preprint, 2022, arXiv

TomoSAR-ALISTA: Efficient TomoSAR Imaging via Deep Unfolded Network

Synthetic aperture radar (SAR) tomography (TomoSAR) has attracted remarkable interest for its ability to achieve three-dimensional reconstruction along the elevation direction from multiple observations. In recent years, the compressed sensing (CS) technique has been introduced into TomoSAR for its super-resolution ability with limited samples. However, CS-based methods suffer from several drawbacks, including weak noise resistance, high computational complexity and complex parameter fine-tuning. Among the different CS algorithms, the iterative soft-thresholding algorithm (ISTA) is widely used as a robust reconstruction approach. However, the parameters in ISTA are manually chosen, which usually requires a time-consuming fine-tuning process to achieve the best performance. Aiming at efficient TomoSAR imaging, this paper proposes a novel sparse unfolding network named analytic learned ISTA (ALISTA) for the TomoSAR imaging problem. The key parameters of ISTA are learned from training data via deep learning, which avoids complex parameter fine-tuning and significantly relieves the training burden. In addition, experiments verify that it is feasible to use traditional CS algorithms as training labels, which provides a tangible supervised training method to achieve better 3D reconstruction performance even in the absence of labeled data in real applications.
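
For readers unfamiliar with ISTA, the classical iteration that this abstract builds on can be sketched as follows. This is a plain, real-valued textbook version with a hand-picked step size and threshold, which are exactly the parameters the abstract says are learned instead; it is not the ALISTA network, and the names `soft_threshold` and `ista` are assumptions made for illustration. TomoSAR data are complex-valued, which this sketch does not handle.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(A, y, lam, step, iters=100):
    """Approximately minimise 0.5 * ||A x - y||^2 + lam * ||x||_1.

    A is a list of rows, y a list of measurements. `step` should not
    exceed 1 / L, where L is the largest eigenvalue of A^T A.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - y
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        # Gradient of the smooth term: A^T r
        grad = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by element-wise soft-thresholding
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x

# With an identity sensing matrix the solution is soft_threshold(y, lam).
x_hat = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, -0.5], lam=1.0, step=1.0, iters=5)
```

Unfolded networks treat a fixed number of these iterations as layers and learn quantities such as `step` and the threshold from data instead of tuning them by hand.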

preprint, 2021, arXiv

Femtosecond dynamics of a polariton bosonic cascade at room temperature

Whispering gallery modes in a microwire are characterized by a nearly equidistant energy spectrum. In the strong exciton-photon coupling regime, this system represents a bosonic cascade: a ladder of discrete energy levels that sustains stimulated transitions between neighboring steps. In this work, using a femtosecond angle-resolved spectroscopic imaging technique, the ultrafast dynamics of polaritons in a bosonic cascade based on a one-dimensional ZnO whispering gallery microcavity is explicitly visualized. A clear ladder-form build-up process of the polariton condensates from higher to lower energy branches is observed, which is well reproduced by modeling using rate equations. Moreover, the polariton parametric scattering dynamics are distinguished on a timescale of hundreds of femtoseconds. Our understanding of the femtosecond condensation and scattering dynamics paves the way towards ultrafast coherent control of polaritons at room temperature, which is promising for high-speed all-optical integrated applications.

preprint, 2021, arXiv

Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis

Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making. Considering that manual diagnosis could be error-prone and time-consuming, many intelligent approaches based on clinical text mining have been proposed to perform automatic diagnosis. However, these methods may not achieve satisfactory results due to the following challenges. First, most of the diagnosis codes are rare, and the distribution is extremely unbalanced. Second, existing methods struggle to capture the correlation between diagnosis codes. Third, the lengthy clinical note leads to the excessive dispersion of key information related to codes. To tackle these challenges, we propose a novel framework that combines inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis. Specifically, we propose a hierarchical joint prediction strategy to address the challenge of unbalanced code distribution. Then, we utilize graph convolutional neural networks to obtain the correlation and semantic representations of medical ontology. Furthermore, we introduce multi-attention mechanisms to extract crucial information. Finally, extensive experiments on the MIMIC-III dataset clearly validate the effectiveness of our method.

preprint, 2021, arXiv

NEMD modeling of nanoscale hydrodynamics of clay-water system at elevated temperature

The engineering problems involving clay under non-isothermal conditions (e.g., geothermal energy harvesting, landfill cover systems, and nuclear waste disposal) are multiscale and multiphysics by nature. The nanoscale hydrodynamics of clay at elevated temperature is essential in developing a physics-based multiscale model for clay under non-isothermal conditions. Nonequilibrium molecular dynamics (NEMD) is a useful tool to study the nanoscale hydrodynamics of clay. This article presents NEMD modeling of the hydrodynamics of clay nanopores at elevated temperatures. Water flow confined in pyrophyllite and montmorillonite clay nanopores is investigated. The nonequilibrium state is maintained by uniformly exerting an external force on each water molecule. The NEMD simulations have provided a molecular-scale perspective of the temperature effect on clay-water density, water flow velocity, shear viscosity, clay-water slip length, hydraulic conductivity, and clay-water friction coefficient. The numerical results have shown a strong temperature dependence of fluid flow velocity, shear viscosity, clay-water slip length, and hydraulic conductivity at the nanoscale. We have validated the applicability of the cubic law in determining hydraulic conductivity at the nanopore scale at elevated temperature. It is found from our numerical results that the slip clay-water boundary condition is an essential factor in properly determining nanoscale fluid flow velocity. By numerical examples, we also study the impact of nanopore size and clay layer thickness on the hydrodynamics of the clay-water system.

preprint, 2021, arXiv

The solution space structure of planted constraint satisfaction problems with growing domains

Planting a solution into the random RB model, which is a prototype of a random constraint satisfaction problem (CSP) with growing domains, can generate very hard satisfiable CSP benchmarks. We study the solution space structure of the planted RB model. As the constraint density grows, we find that this model goes through four phase transitions. In the replica symmetric phase, what we call the independent phase transition occurs, after which the planted cluster (the cluster containing the planted solution) is separated from the giant cluster. Then the solutions other than those in the planted cluster go through the same clustering phase transition and the same satisfiability phase transition as the random RB model. The planted cluster goes through the isolated phase transition, after which the planted cluster contains only one solution. This phase diagram provides strong evidence that this model can generate very hard satisfiable CSP benchmarks. For over-constrained instances (where the constraint density is very large), we find that the configuration space has only a single energy valley, which makes the instances tractable. Experiments using Belief Propagation confirm the locations of the clustering, satisfiability (by configurations outside the planted cluster), and isolated phase transition points.

preprint, 2020, arXiv

3dDepthNet: Point Cloud Guided Depth Completion Network for Sparse Depth and Single Color Image

In this paper, we propose an end-to-end deep learning network named 3dDepthNet, which produces an accurate dense depth image from a single pair of sparse LiDAR depth and color images for robotics and autonomous driving tasks. Based on the dimensional nature of depth images, our network offers a novel 3D-to-2D coarse-to-fine dual densification design that is both accurate and lightweight. Depth densification is first performed in 3D space via point cloud completion, followed by a specially designed encoder-decoder structure that utilizes the projected dense depth from 3D completion and the original RGB-D images to perform 2D image completion. Experiments on the KITTI dataset show our network achieves state-of-the-art accuracy while being more efficient. Ablation and generalization tests show that each module in our network has a positive influence on the final results and that our network is resilient to even sparser depth.

preprint, 2020, arXiv

Exploring Erasure Coding Techniques for High Availability of Intermediate Data

Scientific computing workflows generate enormous amounts of distributed data that is short-lived, yet critical for job completion time. This class of data is called intermediate data. A common way to achieve high data availability is to replicate data. However, the increasing scale of intermediate data generated in modern scientific applications demands new storage techniques to improve storage efficiency. Erasure codes, as an alternative, can use less storage space while maintaining similar data availability. In this paper, we adopt erasure codes for storing intermediate data and compare their performance with replication. We also use the metric of Mean-Time-To-Data-Loss (MTTDL) to estimate the lifetime of intermediate data. We propose an algorithm to proactively relocate data redundancy from vulnerable machines to reliable ones to improve data availability with some extra network overhead. Furthermore, we propose an algorithm to assign redundancy units of data physically close to each other on the network to reduce the network bandwidth for reconstructing data when it is being accessed.
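
The storage argument for erasure codes over replication can be made concrete with a small calculation. This is a sketch under stated assumptions: the function names and the (6, 3) example configuration are hypothetical and not taken from the paper, and the failure-tolerance figure for erasure codes assumes a maximum distance separable code such as Reed-Solomon.

```python
def replication_overhead(copies):
    """Raw bytes stored per byte of data when keeping `copies` full replicas."""
    return float(copies)

def erasure_overhead(k, m):
    """Raw bytes stored per byte of data for a code with k data and m parity units."""
    return (k + m) / k

def tolerated_failures_replication(copies):
    """Data survives as long as at least one replica remains."""
    return copies - 1

def tolerated_failures_erasure(m):
    """For an MDS code, any k of the k + m units suffice to rebuild the data."""
    return m

# Hypothetical comparison: 3-way replication versus a (6 data, 3 parity) code.
rep = (replication_overhead(3), tolerated_failures_replication(3))
ec = (erasure_overhead(6, 3), tolerated_failures_erasure(3))
```

In this example, replication stores 3.0 bytes per data byte and survives two lost copies, while the (6, 3) code stores 1.5 bytes per data byte and survives three lost units. The price is that rebuilding a lost unit requires reading k surviving units over the network, which is the reconstruction bandwidth the abstract's placement algorithm targets.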

preprint, 2020, arXiv

Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach

We propose to estimate 3D human pose from multi-view images and a few IMUs attached to a person's limbs. Our approach operates by first detecting 2D poses from the two signals, and then lifting them to the 3D space. We present a geometric approach to reinforce the visual features of each pair of joints based on the IMUs. This notably improves 2D pose estimation accuracy, especially when one joint is occluded. We call this approach Orientation Regularized Network (ORN). Then we lift the multi-view 2D poses to the 3D space by an Orientation Regularized Pictorial Structure Model (ORPSM), which jointly minimizes the projection error between the 3D and 2D poses, along with the discrepancy between the 3D pose and IMU orientations. The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset. Our code will be released at https://github.com/CHUNYUWANG/imu-human-pose-pytorch.

preprint, 2020, arXiv

Multi-objective multi-generation Gaussian process optimizer for design optimization

We present a multi-objective evolutionary optimization algorithm that uses Gaussian process (GP) regression-based models to select trial solutions in a multi-generation iterative procedure. In each generation, a surrogate model is constructed for each objective function with the sample data. The models are used to evaluate solutions and to select the ones with a high potential before they are evaluated on the actual system. Since the trial solutions selected by the GP models tend to perform better than those produced by methods that rely only on random operations, the new algorithm explores the parameter space much more efficiently. Simulations with multiple test cases show that the new algorithm has a substantially higher convergence speed and stability than NSGA-II, MOPSO, and some other more recent algorithms.

preprint, 2020, arXiv

ROOT I/O compression improvements for HEP analysis

We give an overview of recent changes in the ROOT I/O system that increase performance, enhance the system, and improve its interaction with other data analysis ecosystems. The newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly improve experiments' software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically increased during the LHC era. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly: there are significant trade-offs between the increased CPU cost for reading and writing files and the reduced storage space.
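
The trade-off between CPU cost and storage space described in this abstract can be observed with any lossless compressor. The sketch below uses Python's standard `zlib` module purely as a stand-in; it does not use ROOT or its I/O API, and the function name `compression_tradeoff` and the sample payload are assumptions made for illustration.

```python
import time
import zlib

def compression_tradeoff(data, levels=(1, 6, 9)):
    """Return {level: (compressed_size_in_bytes, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Lossless: decompression must reproduce the input exactly.
        assert zlib.decompress(compressed) == data
        results[level] = (len(compressed), elapsed)
    return results

# Hypothetical, highly repetitive payload standing in for tabular event data.
payload = b"event,pt,eta,phi\n" * 10000
report = compression_tradeoff(payload)
```

Higher levels generally spend more CPU time for a smaller output, but the size of the gain depends heavily on the data, so the right setting has to be measured on representative files.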

preprint, 2020, arXiv

Simple and Lightweight Human Pose Estimation

Recent research on human pose estimation has achieved significant improvement. However, most existing methods tend to pursue higher scores on benchmark datasets using complex architectures or computationally expensive models, ignoring the deployment costs in practice. In this paper, we investigate the problem of simple and lightweight human pose estimation. We first redesign a lightweight bottleneck block with two non-novel concepts: depthwise convolution and attention mechanism. Then, based on the lightweight block, we present a Lightweight Pose Network (LPN) following the architecture design principles of SimpleBaseline. The model size (#Params) of our small network LPN-50 is only 9% of SimpleBaseline(ResNet50), and the computational complexity (FLOPs) is only 11%. To fully exploit the potential of our LPN and get more accurate predictions, we also propose an iterative training strategy and a model-agnostic post-processing function, Beta-Soft-Argmax. We empirically demonstrate the effectiveness and efficiency of our methods on the COCO keypoint detection benchmark dataset. In addition, we show the speed superiority of our lightweight network at inference time on a non-GPU platform. Specifically, our LPN-50 can achieve 68.7 in AP score on the COCO test-dev set, with only 2.7M parameters and 1.0 GFLOPs, while the inference speed is 17 FPS on an Intel i7-8700K CPU machine.
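
The abstract does not define Beta-Soft-Argmax, so the following sketch shows only the standard soft-argmax operation that such post-processing functions are generally based on. The function name `soft_argmax`, the 1D heatmap, and the sharpness parameter `beta` are assumptions made for illustration and should not be read as the paper's definition.

```python
import math

def soft_argmax(heatmap, beta=1.0):
    """Softmax-weighted average of indices: a smooth estimate of the peak position.

    Larger beta concentrates the weights on the maximum, approaching argmax.
    """
    m = max(heatmap)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(beta * (h - m)) for h in heatmap]
    total = sum(weights)
    return sum(i * w for i, w in enumerate(weights)) / total

# A symmetric heatmap peaked at index 1 yields a position of about 1.0.
center = soft_argmax([0.0, 1.0, 0.0], beta=5.0)
```

Unlike a hard argmax, the result can fall between grid cells, which allows sub-pixel keypoint coordinates from a low-resolution heatmap.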

preprint2020arXiv

Trua: Efficient Task Replication for Flexible User-defined Availability in Scientific Grids

Failure is inevitable in scientific computing. As scientific applications and facilities have increased in scale over recent decades, finding the root cause of a failure can be very complex or at times nearly impossible. Different scientific computing customers have varying availability demands as well as a varying willingness to pay for availability. In contrast to existing solutions that try to provide higher and higher availability in scientific grids, we propose a model called Task Replication for User-defined Availability (Trua). Trua provides flexible, user-defined availability in scientific grids, allowing customers to express their desired availability to computational providers. Trua differs from existing task replication approaches in two ways. First, it relies on the historic failure information collected from the virtual layer of the scientific grids. The reliability model for the failures can be represented with a bimodal Johnson distribution, which is different from any existing distribution. Second, it adopts an anomaly detector to filter out anomalous failures; it additionally adopts novel selection algorithms to mitigate the effects of temporal and spatial correlations of the failures without knowing the root cause of the failures. We apply Trua to real-world traces collected from the Open Science Grid (OSG). Our results show that Trua can successfully meet user-defined availability demands.

preprint2020arXiv

Varied fusion reaction probability induced by ion stopping modification in laser-driven plasma with different temperature

The dynamics of nuclear reactions in plasma is a fundamental issue in many areas of high-energy-density research, such as astrophysical reactions and inertial confinement fusion. The effective reaction cross-sections and ion stopping power in plasma need to be taken into account to analyze the reactivity. In this research, we have experimentally investigated D-D reactions from interactions between deuteron beams and deuterated polystyrene (CD) plasma, driven by two laser pulses respectively. The neutron yields, plasma density and deuteron energy loss in plasma have been measured, and the plasma temperature and deuteron stopping power have been analyzed from simulations. It is shown that, compared with a cold target, the reaction probability in plasma conditions can be enhanced or suppressed, which is ascribed to the deuteron stopping power modifications in plasma. In hotter CD plasma, the energy loss of moderately energetic deuterons is reduced, which leads to a higher D-D reaction probability, while the contrary happens in colder plasma. This work provides new understanding of fusion reactions in plasma environments.

preprint2019arXiv

Speeding HEP Analysis with ROOT Bulk I/O

Distinct HEP workflows have distinct I/O needs; while ROOT I/O excels at serializing complex C++ objects common to reconstruction, analysis workflows typically have simpler objects and can sustain higher event rates. To meet the needs of these workflows, we have developed a "bulk I/O" interface, allowing data from multiple events to be returned per library call. This reduces ROOT-related overheads and increases event rates: orders-of-magnitude improvements are shown in microbenchmarks. Unfortunately, this bulk interface is difficult to use, as it requires users to identify when it is applicable, and they still "think" in terms of events, not arrays of data. We have integrated the bulk I/O interface into the new RDataFrame analysis framework inside ROOT. As RDataFrame's interface can provide improved type information, the framework itself can determine what data is readable via the bulk I/O and automatically switch between interfaces. We demonstrate how this can improve event rates when reading analysis data formats, such as CMS's NanoAOD.

preprint2018arXiv

A Hybrid Neural Network Framework and Application to Radar Automatic Target Recognition

Deep neural networks (DNNs) have found applications in diverse signal processing (SP) problems. Most efforts either directly adopt the DNN as a black-box approach to perform certain SP tasks without taking into account any known properties of the signal models, or insert a pre-defined SP operator into a DNN as an add-on data processing stage. This paper presents a novel hybrid-NN framework in which one or more SP layers are inserted into the DNN architecture in a coherent manner to enhance the network capability and efficiency in feature extraction. These SP layers are properly designed to make good use of the available models and properties of the data. The network training algorithm of hybrid-NN is designed to actively involve the SP layers in the learning goal, by simultaneously optimizing both the weights of the DNN and the unknown tuning parameters of the SP operators. The proposed hybrid-NN is tested on a radar automatic target recognition (ATR) problem. It achieves a high validation accuracy of 96% with 5,000 training images in radar ATR. Compared with an ordinary DNN, the hybrid-NN can markedly reduce the required amount of training data and improve the learning performance.

preprint2018arXiv

ANM-PhaseLift: Structured Line Spectrum Estimation from Quadratic Measurements

PhaseLift is a noted convex optimization technique for phase retrieval that can recover a signal exactly from amplitude measurements only, with high probability. Conventional PhaseLift requires a relatively large number of samples that sometimes can be costly to acquire. This paper focuses on some practical applications where the signal of interest is composed of a few Vandermonde components, such as line spectra. A novel phase retrieval framework, namely ANM-PhaseLift, is developed that exploits the Vandermonde structure to alleviate the sampling requirements. Specifically, the atom set of amplitude-based quadratic measurements is identified, and atomic norm minimization (ANM) is introduced into PhaseLift to considerably reduce the number of measurements that are needed for accurate phase retrieval. The benefit of ANM-PhaseLift is particularly attractive in applications where the Vandermonde structure is present, such as massive MIMO and radar imaging.

preprint2018arXiv

Efficient Two-Dimensional Line Spectrum Estimation Based on Decoupled Atomic Norm Minimization

This paper presents an efficient optimization technique for gridless 2-D line spectrum estimation, named decoupled atomic norm minimization (D-ANM). The framework of atomic norm minimization (ANM) is considered, which has been successfully applied in 1-D problems to allow super-resolution frequency estimation for correlated sources even when the number of snapshots is highly limited. The state-of-the-art 2-D ANM approach vectorizes the 2-D measurements to their 1-D equivalent, which incurs huge computational cost and may become too costly for practical applications. We develop a novel decoupled approach of 2-D ANM via semi-definite programming (SDP), which introduces a new matrix-form atom set to naturally decouple the joint observations in both dimensions without loss of optimality. Accordingly, the original large-scale 2-D problem is equivalently reformulated via two decoupled one-level Toeplitz matrices, which can be solved by simple 1-D frequency estimation with pairing. Compared with the conventional vectorized approach, the proposed D-ANM technique reduces the computational complexity by several orders of magnitude with respect to the problem size. It also retains the benefits of ANM in terms of precise signal recovery, a small number of required measurements, and robustness to source correlation. The complexity benefits are particularly attractive for large-scale antenna systems such as massive MIMO, radar signal processing and radio astronomy.

preprint2018arXiv

Low-complexity optimization for Two-Dimensional Direction-of-arrival Estimation via Decoupled Atomic Norm Minimization

This paper presents an efficient optimization technique for super-resolution two-dimensional (2D) direction of arrival (DOA) estimation by introducing a new formulation of atomic norm minimization (ANM). ANM allows gridless angle estimation for correlated sources even when the number of snapshots is far less than the antenna size, yet it incurs huge computational cost in 2D processing. This paper introduces a novel formulation of ANM via semi-definite programming, which expresses the original high-dimensional problem in terms of two decoupled Toeplitz matrices in one dimension, followed by 1D angle estimation with automatic angle pairing. Compared with the state-of-the-art 2D ANM, the proposed technique reduces the computational complexity by several orders of magnitude with respect to the antenna size, while retaining the benefits of ANM in terms of super-resolution performance with a small number of measurements, and robustness to source correlation and noise. The complexity benefits are particularly attractive for large-scale antenna systems such as massive MIMO and radio astronomy.