Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
61 works
0 followers
34 topics
4 close collaborators

Published work

61 published items

Preprint · 2026 · arXiv

A quantum machine learning classifier to search for new physics

Due to the success of the Standard Model (SM), it is reasonable to anticipate that the signal of new physics (NP) beyond the SM is small. Consequently, future searches for NP and precision tests of the SM will require high-luminosity collider experiments. Moreover, as precision tests advance, rare processes with many final-state particles must be considered, which demands the analysis of a vast number of observables. High luminosity produces a large amount of experimental data spanning a large observable space, posing a significant data-processing challenge. In recent years, quantum machine learning has emerged as a promising approach for processing large amounts of complex data on a quantum computer. In this study, we propose quantum searching neighbor (QSN) and variational QSN (VQSN) algorithms to search for NP. The QSN is a classification algorithm; the VQSN introduces variation into the QSN so that it can process classical data. As applications, we apply the (V)QSN to phenomenological studies of NP at the Large Hadron Collider and at muon colliders. Examples are implemented on real quantum hardware, confirming reliable performance under noisy conditions. The results indicate that the VQSN is more efficient, in terms of computational complexity, than its classical counterpart, the k-nearest neighbor algorithm, even when dealing with classical data.
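For reference, the classical baseline named in the abstract, k-nearest neighbor (kNN) classification, can be sketched as below. This is a generic textbook kNN, not the authors' QSN or VQSN, and the toy "signal"/"background" events are hypothetical. Each query is compared against every training event, so the cost grows with the number of events and the feature dimension; that is the complexity the VQSN is reported to improve on.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    # Brute force: measure the distance from the query to every training point.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy events: "background" near (0, 0), "signal" near (1, 1).
train = [
    ((0.0, 0.1), "background"), ((0.1, 0.0), "background"), ((0.2, 0.2), "background"),
    ((1.0, 0.9), "signal"), ((0.9, 1.1), "signal"), ((1.1, 1.0), "signal"),
]
print(knn_classify(train, (0.95, 1.0)))  # signal
```

The brute-force sort keeps the sketch short; practical classical implementations often use spatial indexes (for example k-d trees) to reduce the per-query cost.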

Preprint · 2026 · arXiv

A Survey on Mapping Digital Systems with Bill of Materials: Development, Practices, and Challenges

Modern digital ecosystems, spanning software, hardware, learning models, datasets, and cryptographic products, continue to grow in complexity, making it difficult for organizations to understand and manage component dependencies. Bills of Materials (BOMs) have emerged as a structured way to document product components, their interrelationships, and key metadata, improving visibility and security across digital supply chains. This survey provides the first comprehensive cross-domain review of BOM developments and practices. We start by examining the evolution of BOM frameworks in three stages (i.e., pre-development, initial, and accelerated) and summarizing their core principles, key stakeholders, and standardization efforts for hardware, software, artificial intelligence (AI) models, datasets, and cryptographic assets. We then review industry practices for generating BOM data, evaluating its quality, and securely sharing it. Next, we review practical downstream uses of BOM data, including dependency modeling, compliance verification, operational risk assessment, and vulnerability tracking. We also discuss academic efforts to address limitations in current BOM frameworks through refinements, extensions, or new models tailored to emerging domains such as data ecosystems and AI supply chains. Finally, we identify four key gaps that limit the usability and reliability of today's BOM frameworks, motivating future research directions.

Preprint · 2026 · arXiv

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

The integration of large language models (LLMs) with external tools has significantly expanded the capabilities of AI agents. However, as the diversity of both LLMs and tools increases, selecting the optimal model-tool combination becomes a high-dimensional optimization problem. Existing approaches often rely on a single model or fixed tool-calling logic and fail to exploit the performance variations across heterogeneous model-tool pairs. In this paper, we present ATLAS (Adaptive Tool-LLM Alignment and Synergistic Invocation), a framework for dynamic tool usage in cross-domain complex reasoning. ATLAS operates via two paths: (1) training-free cluster-based routing, which exploits empirical priors for domain-specific alignment, and (2) RL-based multi-step routing, which explores autonomous trajectories for out-of-distribution generalization. Extensive experiments across 15 benchmarks show that our method outperforms closed-source models such as GPT-4o and surpasses existing routing methods on both in-distribution (+10.1%) and out-of-distribution (+13.1%) tasks. Furthermore, our framework shows significant gains in visual reasoning by orchestrating specialized multimodal tools.

Preprint · 2026 · arXiv

DualKV: Shared-Prompt Flash Attention for Efficient RL Training with Large Rollouts and Long Contexts

Modern RL post-training methods such as GRPO and DAPO train on $N$ response sequences of $R$ tokens sampled from a shared prompt of $P$ tokens, but standard FlashAttention replicates all $P$ prompt tokens $N$ times in both the forward and backward passes, duplicating compute and memory on identical hidden states. In large-rollout, long-context RL training ($N{\geq}16$, $P{\geq}8\text{K}$), this redundancy dominates the policy-update cost. We observe that in decoder-only models, causal masking makes prompt representations invariant across sequences at every layer, so all per-token operations (norms, projections, MLP) and attention can process the prompt once, a property not yet exploited at the kernel level for training. We propose DualKV, the first FlashAttention kernel variant that eliminates shared-prompt replication during RL training, via (1) fused CUDA forward and backward kernels that iterate over two disjoint KV regions (shared context and per-sequence response) in a single kernel launch, and (2) a data-pipeline redesign in veRL that repacks $N(P{+}R)$ tokens into $P{+}NR$ tokens per micro-batch, extending the token reduction from attention to the entire model by a factor $\rho = N(P{+}R)/(P{+}NR)$. DualKV is mathematically equivalent to standard attention and introduces no approximation. On Qwen3-8B GRPO training with 8$\times$H100 GPUs ($N{=}32$, 8K context), DualKV achieves a $1.63$ to $2.09\times$ policy-update speedup, enables $2\times$ larger micro-batches, and raises MFU from $36\%$ to $76\%$. Similar gains hold for DAPO ($2.47\times$ speedup, $77\%$ MFU). At 30B MoE scale on 16$\times$H100, DualKV achieves a $3.82\times$ policy-update speedup and a $3.38\times$ end-to-end step speedup over FlashAttention (which requires 4-way Ulysses sequence parallelism to avoid OOM).
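The token-reduction factor $\rho$ follows directly from counting tokens per micro-batch. A minimal sketch; the function name is illustrative, and the response length $R$ in the example is a hypothetical value because the abstract does not state it for the reported runs.

```python
def token_reduction(n: int, p: int, r: int) -> float:
    """rho = N(P + R) / (P + N R) for N responses of R tokens sharing a P-token prompt."""
    replicated = n * (p + r)  # standard packing: the prompt is copied into every sequence
    shared = p + n * r        # shared-prompt packing: the prompt is stored once
    return replicated / shared


# N = 32 and P = 8K as in the abstract; R = 1024 is a hypothetical response length.
print(token_reduction(32, 8192, 1024))  # 7.2
```

As $N$ grows, $\rho$ approaches $(P+R)/R$, so the benefit is largest when prompts are long relative to responses. This counts tokens only; the measured speedups above also depend on kernel efficiency and batch sizing.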

Preprint · 2026 · arXiv

MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

Large Language Models (LLMs) demonstrate strong capabilities in general coding tasks but face two key challenges when optimizing code: (i) writing optimized code (such as performant CUDA kernels and competition-level CPU code) requires expertise in systems, algorithms, and specific languages, and (ii) optimization requires interpreting performance metrics such as timing and device utilization, beyond binary correctness. In this work, we explore inference-time search algorithms that guide the LLM to discover better solutions through iterative refinement based on execution feedback. Our approach, called MaxCode, unifies existing search methods under a max-reward reinforcement learning framework, making the observation and action-value functions modular and easy to modify. To enhance the observation space, we integrate a natural-language critique model that converts raw execution feedback into diagnostic insights about errors and performance bottlenecks, and we also supply the best-discounted reward seen so far. Together, these provide richer input to the code proposal function. To improve exploration during search, we train a generative reward-to-go model using action values from rollouts to rerank potential solutions. Testing on the KernelBench (CUDA) and PIE (C++) optimization benchmarks shows that MaxCode improves optimized code performance compared to baselines, achieving 20.3% and 10.1% relative improvements in absolute speedup value and relative speedup ranking, respectively.

Preprint · 2026 · arXiv

Revisiting Privacy Preservation in Brain-Computer Interfaces: Conceptual Boundaries, Risk Pathways, and a Protection-Strength Grading Framework

Brain-computer interfaces (BCIs) are moving rapidly from laboratory research into clinical, edge, and real-world settings. Under ISO/IEC 8663:2025, a BCI is a direct communication link between central nervous system activity and external software or hardware systems. This link expands privacy risk beyond raw neural-signal leakage: neural data, derived representations, model assets, and decoded outputs can be re-associated with individuals across collection, transmission, storage, training, inference, and feedback, or used to infer information beyond what a task requires. Starting from the general BCI paradigm, this review defines privacy-protection boundaries, protection objects, and the relationship between user data privacy and model privacy within a shared risk pathway. It then proposes a three-dimensional framework (protection object, lifecycle stage, and dominant protection-strength level) to classify existing work into four levels of protection strength. Finally, mental privacy and neuroethical risks are treated as open issues, emphasizing that BCI privacy protection should not only obscure data but also disentangle task-irrelevant sensitive information while preserving downstream utility. Keywords: Brain-computer interface, Neural data privacy, User data privacy, Model privacy, Disentanglement of task-irrelevant sensitive information, Protection-strength grading, Neuroethical risks

Preprint · 2026 · arXiv

Superconductivity in Electron Liquids: Precision Many-Body Treatment of Coulomb Interaction

More than a century after its discovery, the theory of conventional superconductivity remains incomplete. While the importance of electron-phonon coupling is understood, a controlled first-principles treatment of the Coulomb interaction is lacking. Current ab initio calculations of superconductivity rely on a phenomenological downfolding approximation that replaces the Coulomb interaction with a repulsive pseudopotential μ*, and leaves unresolved ambiguities in electron-phonon coupling in the presence of dynamical Coulomb interactions. We address this via an effective field theory approach, integrating out high-energy electronic degrees of freedom using variational Diagrammatic Monte Carlo. Applied to the uniform electron gas, this approach establishes a microscopic procedure to implement downfolding, define the pseudopotential, and express dynamical Coulomb effects on electron-phonon coupling via the electron vertex function. We find that the bare pseudopotential is significantly larger than conventional values. This yields improved pseudopotential estimates in simple metals and tests the accuracy of density functional perturbation theory for the effective electron-phonon coupling. We present an ab initio workflow that computes the superconducting Tc from the precursory Cooper flow of the anomalous vertex. This infers Tc from normal-state calculations, enabling reliable estimates of very low Tc (including near quantum phase transitions) beyond conventional reach. Validating our approach on simple metals without empirical tuning, we resolve long-standing discrepancies and predict a pressure-induced transition in Al from superconducting to non-superconducting above ~60 GPa. We propose that Mg and Na at ambient pressure are proximal to a similar critical point. Our work establishes a controlled ab initio framework for electron-phonon superconductivity beyond the weak-correlation limit, paving the way for reliable Tc calculations and novel material design.

Preprint · 2026 · arXiv

UniD-Shift: Towards Unified Semantic Segmentation via Interpretable Share-Private Multimodal Decomposition

Semantic segmentation of large-scale 3D point clouds is crucial for applications such as autonomous driving and urban digital twins. However, the sparse sampling pattern of LiDAR and the view-dependent geometric distortion in image observations complicate cross-modal alignment and hinder stable fusion. Inspired by the fact that 2D images captured by cameras are representations of the 3D world, we recognize that the features learned from 2D and 3D segmentation share some common semantics, while other aspects remain modality-specific. This insight motivates a unified multimodal framework for joint 2D-3D semantic segmentation. We combine a SAM-based vision encoder with a SPTNet-based geometric encoder to extract complementary semantic and geometric representations. The resulting features from both modalities are explicitly decomposed into shared and private subspaces, where the shared components summarize semantic factors common to both domains, and the private components preserve properties that are unique to each modality. A lightweight attention-based fusion module aggregates the shared features into a consistent cross-modal representation, and a regularized training objective ensures both semantic alignment and subspace independence. Experiments on the SemanticKITTI and nuScenes benchmarks demonstrate consistent improvements in segmentation accuracy over representative multimodal baselines, accompanied by competitive computational efficiency. Cross-domain evaluation on nuScenes USA-Singapore shows stable performance under distribution shifts, demonstrating strong generalization. The implementation code is publicly available at: https://github.com/shuaizhang69/UniD-Shift.

Preprint · 2025 · arXiv

Comprehensive Study of Phonon Chirality under Symmetry Constraints

Phonons are quanta of lattice vibrations, and their modes (linear, circular, or stationary) are symmetry-determined. Circularly polarized phonons, possessing nonzero angular momentum (AM), have drawn widespread attention recently. Despite widespread use of pseudo-angular momentum (PAM) and circularly polarized light polarization flips to identify chiral phonons in Raman scattering, their reliability is debated due to symmetry dependence, and experimental verification standards remain lacking. Here, we systematically study phonon chirality and associated phenomena across magnetic point groups. We establish that the AM-PAM correlation is governed by both crystalline symmetry and Wyckoff positions, dictating conditions where nonzero AM manifests in PAM signatures. Crucially, phonons belonging to distinct irreducible representations exhibit distinct experimental benchmarks, enabling direct determination of crystalline chirality and symmetry classification. Furthermore, we report the discovery of a signature for symmetry-induced phenomena, notably a half-wave plate-analogous effect induced by mirror-odd phonons. Meanwhile, we conducted five experiments to validate our theory.

Preprint · 2023 · arXiv

Offline Imitation Learning with Variational Counterfactual Reasoning

In offline imitation learning (IL), an agent aims to learn an optimal expert behavior policy without additional online environment interactions. However, in many real-world scenarios, such as robotic manipulation, the offline dataset is collected from suboptimal behaviors without rewards. Because expert data are scarce, agents usually end up simply memorizing poor trajectories and are vulnerable to variations in the environment, lacking the capability to generalize to new environments. To automatically generate high-quality expert data and improve the generalization ability of the agent, we propose a framework named Offline Imitation Learning with Counterfactual data Augmentation (OILCA), which performs counterfactual inference. In particular, we leverage an identifiable variational autoencoder to generate counterfactual samples for expert data augmentation. We theoretically analyze the influence of the generated expert data and the resulting improvement in generalization. Moreover, we conduct extensive experiments demonstrating that our approach significantly outperforms various baselines on both the DeepMind Control Suite benchmark for in-distribution performance and the CausalWorld benchmark for out-of-distribution generalization. Our code is available at https://github.com/ZexuSun/OILCA-NeurIPS23.

Preprint · 2022 · arXiv

A Tight Three-parameter Correlation and Related Classification on Gamma-Ray Bursts

Gamma-ray bursts (GRBs) are widely believed to originate from massive collapsars and/or compact binary mergers, which would generate long and short GRBs, respectively. The details of this classification scheme have been under constant debate as more observational data become available. In this work, we apply a series of data mining methods to study the potential classification information contained in the prompt emission of GRBs detected by the Fermi Gamma-ray Burst Monitor. A tight global correlation is found between fluence ($f$), peak flux ($F$), and prompt duration ($T_{90}$), which takes the form $\log f = 0.75 \log T_{90} + 0.92 \log F - 7.14$. Based on this correlation, we can define a new parameter $L = 1.66 \log T_{90} + 0.84 \log f - 0.46 \log F + 3.24$ by linear discriminant analysis, which distinguishes between long and short GRBs with much less ambiguity than $T_{90}$. We also discuss the three-subclass scheme of GRB classification derived from cluster analysis based on a Gaussian mixture model, and suggest that, besides short GRBs (SGRBs), long GRBs (LGRBs) may be divided into long-bright gamma-ray bursts (LBGRBs) and long-faint gamma-ray bursts (LFGRBs). LBGRBs have statistically higher $f$ and $F$ than LFGRBs; further statistical analysis found that LBGRBs also have more GRB pulses than LFGRBs.
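Both fitted relations can be evaluated directly from the logarithms of the observables. A minimal sketch, assuming base-10 logarithms and inputs in whatever units the paper used for the fit (the abstract does not state them); the example burst is hypothetical, and because the abstract gives no threshold on $L$ for separating long from short bursts, none is applied here.

```python
def predicted_log_fluence(log_t90: float, log_peak_flux: float) -> float:
    """Three-parameter correlation: log f = 0.75 log T90 + 0.92 log F - 7.14."""
    return 0.75 * log_t90 + 0.92 * log_peak_flux - 7.14


def discriminant_l(log_t90: float, log_fluence: float, log_peak_flux: float) -> float:
    """LDA parameter: L = 1.66 log T90 + 0.84 log f - 0.46 log F + 3.24."""
    return 1.66 * log_t90 + 0.84 * log_fluence - 0.46 * log_peak_flux + 3.24


# Hypothetical burst with T90 = 10 and F = 10 (in the fit's units), so both logs are 1.
log_f = predicted_log_fluence(1.0, 1.0)
print(round(log_f, 2))  # -5.47
print(discriminant_l(1.0, log_f, 1.0))
```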

Preprint · 2022 · arXiv

Charge Carrier Mediation and Ferromagnetism induced in MnBi6Te10 Magnetic Topological Insulators by antimony doping

The MnBi2Te4 family, a new kind of intrinsic magnetic topological insulator (MTI), has shed light on the observation of novel topological quantum effects such as the quantum anomalous Hall effect (QAHE). However, the strong antiferromagnetic (AFM) coupling and high carrier concentration in the bulk hinder practical applications. In the closely related materials MnBi4Te7 and MnBi6Te10, the interlayer magnetic coupling is greatly suppressed by Bi2Te3 layer intercalation. However, AFM is still the ground state in these compounds. Here, through magnetic and transport measurements, we demonstrate that the Sb substitutional dopant plays a dual role in MnBi6Te10: it not only adjusts the charge carrier type and concentration but also drives the solid into a ferromagnetic (FM) ground state. An FM ground-state region that is also close to the charge neutral point can be found in the phase diagram of Mn(SbxBi1-x)6Te10 when x ~ 0.25. An intrinsic FM-MTI candidate is thus demonstrated, which may be a further step toward realizing high-quality, high-temperature QAHE and related topological quantum effects in the future.

Preprint · 2022 · arXiv

Chat-to-Design: AI Assisted Personalized Fashion Design

In this demo, we present Chat-to-Design, a new multimodal interaction system for personalized fashion design. Compared to classic systems that recommend apparel based on keywords, Chat-to-Design enables users to design clothes in two steps: 1) coarse-grained selection via conversation and 2) fine-grained editing via an interactive interface. It encompasses three subsystems to deliver an immersive user experience: a conversation system, powered by natural language understanding, that accepts users' requests and manages dialogs; a multimodal fashion retrieval system, powered by a large-scale pretrained language-image network, that retrieves the requested apparel; and a fashion design system, powered by emerging generative techniques, that edits attributes of the retrieved clothes.

Preprint · 2022 · arXiv

Divergence-aware Federated Self-Supervised Learning

Self-supervised learning (SSL) is capable of learning remarkable representations from centrally available data. Recent works further combine federated learning with SSL to learn from rapidly growing decentralized unlabeled images (e.g., from cameras and phones), which often remain decentralized because of privacy constraints. Extensive attention has been paid to SSL approaches based on Siamese networks. However, such efforts have not yet revealed deep insights into the fundamental building blocks of the federated self-supervised learning (FedSSL) architecture. We aim to fill this gap via an in-depth empirical study and propose a new method to tackle the non-independently and identically distributed (non-IID) data problem of decentralized data. First, we introduce a generalized FedSSL framework that embraces existing SSL methods based on Siamese networks and offers flexibility for future methods. In this framework, a server coordinates multiple clients to conduct SSL training and periodically updates the clients' local models with the aggregated global model. Using the framework, our study uncovers unique insights into FedSSL: 1) the stop-gradient operation, previously reported to be essential, is not always necessary in FedSSL; 2) retaining the clients' local knowledge in FedSSL is particularly beneficial for non-IID data. Inspired by these insights, we then propose a new approach for model update, Federated Divergence-aware Exponential Moving Average update (FedEMA). FedEMA updates clients' local models adaptively using an EMA of the global model, where the decay rate is dynamically measured by model divergence. Extensive experiments demonstrate that FedEMA outperforms existing methods by 3-4% on linear evaluation. We hope that this work will provide useful insights for future research.
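The divergence-aware update can be sketched as follows. The abstract only says the decay rate is measured by model divergence; the specific form $\mu = \min(\lambda \lVert w_g - w_k \rVert, 1)$, with a scaling hyperparameter $\lambda$ (here `scale`), is an assumption for illustration, and the weights are flattened into plain Python lists.

```python
import math


def fedema_update(local, global_, scale):
    """Divergence-aware EMA of one client's flattened weights (assumed form).

    mu = min(scale * ||global_ - local||_2, 1)
    new_local = mu * local + (1 - mu) * global_
    A client whose model has drifted further from the global model gets a larger
    mu and therefore keeps more of its own (local) weights.
    """
    divergence = math.dist(local, global_)
    mu = min(scale * divergence, 1.0)
    return [mu * w_l + (1.0 - mu) * w_g for w_l, w_g in zip(local, global_)]


print(fedema_update([1.0, 0.0], [0.0, 0.0], scale=0.5))  # [0.5, 0.0]
```

When the local and global models coincide, $\mu = 0$ and the client simply adopts the global model, which matches plain periodic synchronization; larger divergence shifts the update toward retaining local knowledge, in line with the insight reported above for non-IID data.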

Preprint · 2022 · arXiv

EasyFL: A Low-code Federated Learning Platform For Dummies

Academia and industry have developed several platforms to support a popular privacy-preserving distributed learning method: Federated Learning (FL). However, these platforms are complex to use and require a deep understanding of FL, which imposes high barriers to entry for beginners, limits the productivity of researchers, and compromises deployment efficiency. In this paper, we propose the first low-code FL platform, EasyFL, to enable users with various levels of expertise to experiment with and prototype FL applications with little coding. We achieve this goal while ensuring great flexibility and extensibility for customization by unifying simple API design, modular design, and granular training flow abstraction. With only a few lines of code, EasyFL gives users many out-of-the-box functionalities to accelerate experimentation and deployment. These practical functionalities are heterogeneity simulation, comprehensive tracking, distributed training optimization, and seamless deployment. They are proposed based on challenges identified in the proposed FL life cycle. Compared with other platforms, EasyFL not only requires just three lines of code (at least 10x less) to build a vanilla FL application but also incurs lower training overhead. In addition, our evaluations demonstrate that EasyFL expedites distributed training by 1.5x. It also improves deployment efficiency. We believe that EasyFL will increase the productivity of researchers and democratize FL to wider audiences.

Preprint · 2022 · arXiv

Federated Unsupervised Domain Adaptation for Face Recognition

Given labeled data in a source domain, unsupervised domain adaptation has been widely adopted to generalize models to unlabeled data in a target domain whose data distribution differs. However, existing works are inapplicable to face recognition under privacy constraints because they require sharing sensitive face images between domains. To address this problem, we propose FedFR, federated unsupervised domain adaptation for face recognition. FedFR jointly optimizes clustering-based domain adaptation and federated learning to elevate performance on the target domain. Specifically, for unlabeled data in the target domain, we enhance a clustering algorithm with a distance constraint to improve the quality of predicted pseudo labels. In addition, we propose a new domain constraint loss (DCL) to regularize source-domain training in federated learning. Extensive experiments on a newly constructed benchmark demonstrate that FedFR outperforms the baseline and classic methods on the target domain by 3% to 14% on different evaluation metrics.

Preprint · 2022 · arXiv

Giant and Reversible Electronic Structure Evolution in a Magnetic Topological Material EuCd2As2

The electronic structure and the physical properties of quantum materials can be significantly altered by charge carrier doping and magnetic state transitions. Here we report the discovery of a giant and reversible electronic structure evolution with doping in a magnetic topological material. By performing high-resolution angle-resolved photoemission measurements on EuCd2As2, we found that a huge amount of hole doping can be introduced into the sample surface due to surface absorption. The electronic structure exhibits a dramatic change with hole doping that cannot be described by a rigid band shift. Prominent band splitting is observed at high doping, which corresponds to a doping-induced magnetic transition at low temperature (below ~15 K) from an antiferromagnetic state to a ferromagnetic state. These results establish a detailed electronic phase diagram of EuCd2As2 in which the electronic structure and the magnetic structure change systematically and dramatically with the doping level. They further suggest that the transport, magnetic, and topological properties of EuCd2As2 can be greatly modified by doping. This work will stimulate further investigations exploring new phenomena and properties by doping this magnetic topological material.

Preprint · 2022 · arXiv

How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis

Self-training, a semi-supervised learning algorithm, leverages a large amount of unlabeled data to improve learning when labeled data are limited. Despite empirical successes, its theoretical characterization remains elusive. To the best of our knowledge, this work establishes the first theoretical analysis of the well-known iterative self-training paradigm and proves the benefits of unlabeled data in both training convergence and generalization ability. To make our theoretical analysis feasible, we focus on the case of one-hidden-layer neural networks. However, theoretical understanding of iterative self-training is non-trivial even for a shallow neural network. One of the key challenges is that existing neural network landscape analysis built upon supervised learning no longer holds in the (semi-supervised) self-training paradigm. We address this challenge and prove that iterative self-training converges linearly, with both the convergence rate and the generalization accuracy improved on the order of $1/\sqrt{M}$, where $M$ is the number of unlabeled samples. Experiments ranging from shallow to deep neural networks are also provided to support our theoretical insights on self-training.

Preprint · 2022 · arXiv

Large Exchange Bias Effect and Coverage-Dependent Interfacial Coupling in CrI3/MnBi2Te4 van der Waals Heterostructures

Igniting interface magnetic ordering of magnetic topological insulators by building a van der Waals heterostructure can help to reveal novel quantum states and design functional devices. Here, we observe an interesting exchange bias effect, indicating successful interfacial magnetic coupling, in CrI3/MnBi2Te4 ferromagnetic insulator/antiferromagnetic topological insulator (FMI/AFM-TI) heterostructure devices. The devices originally exhibit a negative exchange bias field, which decays with increasing temperature and is unaffected by the back-gate voltage. When we change the device configuration to be half-covered by CrI3, the exchange bias becomes positive with a very large exchange bias field exceeding 300 mT. Such sensitive manipulation is explained by the competition between the FM and AFM coupling at the interface of CrI3 and MnBi2Te4, pointing to coverage-dependent interfacial magnetic interactions. Our work will facilitate the development of topological and antiferromagnetic devices.

Preprint · 2022 · arXiv

Modelling graph dynamics in fraud detection with "Attention"

At online retail platforms, detecting fraudulent accounts and transactions is crucial to improve customer experience, minimize loss, and avoid unauthorized transactions. Despite the variety of models for deep learning on graphs, few approaches have been proposed for graphs that are both heterogeneous and dynamic. In this paper, we propose DyHGN (Dynamic Heterogeneous Graph Neural Network) and its variants to capture both temporal and heterogeneous information. We first construct dynamic heterogeneous graphs from registration and transaction data from eBay. Then, we build models with diachronic entity embedding and a heterogeneous graph transformer. We also use model explainability techniques to understand the behaviors of DyHGN-* models. Our findings reveal that modelling graph dynamics with heterogeneous inputs needs to be conducted with "attention" depending on the data structure, distribution, and computation cost.

Preprint · 2022 · arXiv

Nanoscale three-dimensional magnetic sensing with a probabilistic nanomagnet driven by spin-orbit torque

Detection of vector magnetic fields at nanoscale dimensions is critical in applications ranging from basic materials science to medical diagnostics. Meanwhile, all-electric operation is of great significance for achieving a simple and compact sensing system. Here, we propose and experimentally demonstrate a simple approach to sensing a vector magnetic field at nanoscale dimensions by monitoring a probabilistic nanomagnet's transition probability from a metastable state, excited by a driving current via spin-orbit torque (SOT), to a settled state. We achieve sensitivities for Hx, Hy, and Hz of 1.02%/Oe, 1.09%/Oe, and 3.43%/Oe, respectively, with a 200 × 200 nm² nanomagnet. The minimum detectable field depends on the number of driving pulse events N and is expected to be as low as 1 µT for N = 3 × 10^6.
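The dependence of the minimum detectable field on N can be illustrated with a simple shot-noise estimate. This is an assumed model, not the paper's analysis: each pulse is treated as an independent Bernoulli trial with transition probability p (p = 0.5 is a hypothetical operating point), so the estimated probability has standard error $\sqrt{p(1-p)/N}$, which is divided by the sensitivity to convert it into a field.

```python
import math


def min_detectable_field_ut(sensitivity_per_oe: float, n_events: int, p: float = 0.5) -> float:
    """Shot-noise-limited field resolution in microtesla (assumed binomial model).

    sensitivity_per_oe: change in transition probability per oersted (3.43 %/Oe -> 0.0343).
    n_events: number of driving pulses N used to estimate the probability.
    p: operating-point transition probability (hypothetical default 0.5).
    """
    sigma_p = math.sqrt(p * (1.0 - p) / n_events)  # standard error of the estimated probability
    sigma_oe = sigma_p / sensitivity_per_oe         # equivalent field uncertainty in Oe
    return sigma_oe * 100.0                         # 1 Oe corresponds to 100 uT in free space


print(round(min_detectable_field_ut(0.0343, 3_000_000), 2))  # 0.84
```

Under these assumptions, the Hz sensitivity and N = 3 × 10^6 give roughly 0.84 µT, the same order as the ~1 µT quoted above, and the resolution improves only as $1/\sqrt{N}$.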

Preprint · 2022 · arXiv

Optimizing Performance of Federated Person Re-identification: Benchmarking and Analysis

Increasingly stringent data privacy regulations limit the development of person re-identification (ReID), because person ReID training requires centralizing an enormous amount of data that contains sensitive personal information. To address this problem, we introduce federated person re-identification (FedReID), which applies federated learning, an emerging distributed training method, to person ReID. FedReID preserves data privacy by aggregating model updates, instead of raw data, from clients to a central server. Furthermore, we optimize the performance of FedReID under statistical heterogeneity via benchmark analysis. We first construct a benchmark with an enhanced algorithm, two architectures, and nine person ReID datasets with large variances to simulate real-world statistical heterogeneity. The benchmark results reveal insights into and bottlenecks of FedReID under statistical heterogeneity, including challenges in convergence and poor performance on datasets with large volumes. Based on these insights, we propose three optimization approaches: (1) we adopt knowledge distillation to facilitate the convergence of FedReID by better transferring knowledge from clients to the server; (2) we introduce client clustering to improve performance on large datasets by aggregating clients with similar data distributions; (3) we propose a cosine distance weight to elevate performance by dynamically updating the aggregation weights depending on how well models are trained on clients. Extensive experiments demonstrate that these approaches achieve satisfactory convergence with much better performance on all datasets. We believe that FedReID will shed light on implementing and optimizing federated learning in more computer vision applications.

preprint · 2022 · arXiv

Reducing language context confusion for end-to-end code-switching automatic speech recognition

Code-switching refers to alternating between languages within the same communication. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially challenging, because code-switching training data are typically insufficient to combat the increased multilingual context confusion caused by the presence of more than one language. Based on Equivalence Constraint (EC) Theory, we propose a language-related attention mechanism to reduce multilingual context confusion in the E2E code-switching ASR model. The theory requires that any monolingual fragment occurring in a code-switching sentence must also occur in one of the monolingual sentences, thereby establishing a bridge between monolingual data and code-switching data. We leverage this linguistic theory to design the code-switching E2E ASR model, which efficiently transfers language knowledge from rich monolingual data to improve code-switching ASR performance. We evaluate our model on the ASRU 2019 Mandarin-English code-switching challenge dataset. Compared with the baseline model, our proposed model achieves a 17.12% relative error reduction.

preprint · 2022 · arXiv

Secure two-way fiber-optic time transfer against sub-ns asymmetric delay attack

Two-way fiber-optic time transfer is a promising precise time synchronization technique with sub-nanosecond accuracy. However, the asymmetric delay attack is a serious threat that cannot be prevented by any encryption method. In this paper, a dynamic-model-based scheme is proposed to defend against sub-nanosecond asymmetric delay attacks. A detection threshold is set from the time difference estimated by a two-state clock model. Because this model excludes the contribution of the fixed frequency difference, the scheme can detect asymmetric delay attacks that are smaller than the time difference induced by the fixed frequency difference. Theoretical simulations and an experimental demonstration confirm the feasibility of the scheme. Experimentally, a two-way fiber-optic time transfer system achieves time stabilities of 24.5 ps, 3.98 ps, and 2.95 ps at averaging times of 1 s, 10 s, and 100 s under a sub-ns asymmetric delay attack. The proposed method offers a promising, secure sub-ns time synchronization technique that is robust against asymmetric delay attacks.
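Encryption cannot help here, as the standard two-way offset formula shows, and the detection idea can be sketched on top of that formula. The code below is a simplified, noise-free illustration under our own assumptions: a constant frequency difference estimated from the first two samples, a hand-picked threshold, and the hypothetical function names shown. It is not the paper's dynamic-model scheme, whose details the abstract does not give.

```python
from typing import List

def two_way_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Offset of clock B relative to clock A from one two-way exchange.

    t1: A transmits (A's clock)   t2: B receives (B's clock)
    t3: B transmits (B's clock)   t4: A receives (A's clock)
    Exact only if the A->B and B->A fiber delays are equal; a delay
    asymmetry d adds a bias of d / 2 that no encryption can reveal.
    """
    return ((t2 - t1) - (t4 - t3)) / 2.0

def flag_asymmetric_delay(offsets: List[float], tau: float, threshold: float) -> List[bool]:
    """Flag offsets that deviate from a two-state (time offset, frequency offset) clock model.

    The fixed frequency difference is estimated from the first two samples and
    removed from each prediction, so the threshold can be smaller than the drift
    it would otherwise accumulate. Simplified, noise-free illustration.
    """
    phase = offsets[1]
    freq = (offsets[1] - offsets[0]) / tau
    flags = [False, False]
    for measured in offsets[2:]:
        predicted = phase + freq * tau
        attacked = abs(measured - predicted) > threshold
        flags.append(attacked)
        # Accept a measurement only if it agrees with the model; otherwise coast on the prediction.
        phase = predicted if attacked else measured
    return flags

# Simulated offsets (ns): 0.05 ns/s frequency drift, plus a 0.15 ns bias
# (i.e. a 0.3 ns delay asymmetry) injected from the sixth sample on.
offsets = [0.05 * k + (0.15 if k >= 5 else 0.0) for k in range(10)]
print(flag_asymmetric_delay(offsets, tau=1.0, threshold=0.05))
```

Because the model keeps predicting from the last trusted state, a persistent attack stays flagged instead of being absorbed after one sample.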

preprint · 2022 · arXiv

SLAM-TKA: Real-time Intra-operative Measurement of Tibial Resection Plane in Conventional Total Knee Arthroplasty

Total knee arthroplasty (TKA) is a common orthopaedic surgery that replaces a damaged knee joint with artificial implants. Failing to achieve the planned implant position increases the risk of aseptic loosening of implant components, wear, and even joint revision. In conventional jig-based TKA (CON-TKA), these failures occur mostly on the tibial side. This study aims to precisely evaluate the accuracy of the proximal tibial resection plane intra-operatively and in real time, while changing the CON-TKA operative procedure as little as possible. The proposed simultaneous localisation and mapping (SLAM) system estimates the proximal tibial resection plane from two X-ray radiographs captured during the proximal tibial resection phase. It combines these radiographs with a pre-operative, patient-specific 3D tibia mesh model segmented from computed tomography (CT) scans and a 3D mesh model of a trocar pin. Validations on both simulated and in-vivo datasets demonstrate the robustness and potential clinical value of the proposed algorithm.

preprint · 2022 · arXiv

Smart Multi-tenant Federated Learning

Federated learning (FL) is an emerging distributed machine learning method that enables in-situ model training on decentralized edge devices. However, multiple simultaneous training activities could overload resource-constrained devices. In this work, we propose a smart multi-tenant FL system, MuFL, to effectively coordinate and execute simultaneous training activities. We first formalize the problem of multi-tenant FL, define multi-tenant FL scenarios, and introduce a vanilla multi-tenant FL system that trains activities sequentially to form baselines. We then propose two approaches to optimize multi-tenant FL: 1) activity consolidation merges training activities into one activity with a multi-task architecture; 2) after this consolidated activity has been trained for some rounds, activity splitting divides it into groups based on the affinities among activities, so that activities within a group have better synergy. Extensive experiments demonstrate that MuFL outperforms other methods while consuming 40% less energy. We hope this work will inspire the community to further study and optimize multi-tenant FL.

preprint · 2022 · arXiv

Topological States in Chevrel Phase Materials from First-principle Calculations

Chevrel phase materials form a family of ternary molybdenum chalcogenides with the general chemical formula $A_x{\rm Mo}_6X_8$ ($A$ = metal element, $X$ = chalcogen). The variety of possible $A$ atoms yields a large number of family members with many tunable physical properties, such as superconductivity, thermoelectricity, and ionic conductivity. In this work, using first-principles calculations, we further find various nontrivial band topological states in these materials. The compounds with time-reversal symmetry, such as ${\rm BaMo}_6{\rm S}_8$, ${\rm SrMo}_6{\rm S}_8$, and ${\rm Mo}_6{\rm S}_8$, are topological insulators in both the $R\bar{3}$ and $P\bar{1}$ phases, whereas ${\rm EuMo}_6{\rm S}_8$ in its ferromagnetic state is an axion insulator in the $R\bar{3}$ phase and a trivial insulator in the $P\bar{1}$ phase. This indicates that changing the $A$ ions can modify the chemical potential, lattice distortion, and magnetic order, which offers a unique way to influence the topological states and other properties. We hope this work will stimulate further studies of Chevrel phase materials in search of more intriguing phenomena, such as topological superconducting states and Majorana modes.

preprint · 2022 · arXiv

Two-dimensional Obstructed Atomic Insulators with Fractional Corner Charge in MA$_2$Z$_4$ Family

According to topological quantum chemistry, a class of electronic materials called obstructed atomic insulators (OAIs) has a portion of valence electrons whose centers are necessarily located at empty $\textit{Wyckoff}$ positions, i.e., positions not occupied by atoms in the lattice. The obstruction that prevents these electron centers from coinciding with their host atoms is nontrivial and results in metallic boundary states when the boundary is properly cut. Here, on the basis of first-principles calculations combined with topological quantum chemistry analysis, we propose that all members of the two-dimensional MA$_2$Z$_4$ monolayer family (M = Cr, Mo, and W; A = Si and Ge; Z = N, P, and As) are OAIs. A typical case is the recently synthesized MoSi$_2$N$_4$. It is a topologically trivial insulator, with occupied electronic states forming an integer combination of elementary band representations, yet it has valence electrons centered on empty $\textit{Wyckoff}$ positions. It exhibits unique OAI-induced metallic edge states along the (1$\bar{1}$0) edge of the MoSi$_2$N$_4$ monolayer, as well as in-gap corner states at three vertices of certain hexagonal nanodisk samples respecting C$_3$ rotation symmetry. The readily synthesized MoSi$_2$N$_4$ is quite stable and has a large bulk band gap of 1.94 eV, which makes these edge and corner states promising targets for experimental identification.

preprint · 2022 · arXiv

xFraud: Explainable Fraud Transaction Detection

At online retail platforms, it is crucial to actively detect the risks of transactions to improve customer experience and minimize financial loss. In this work, we propose xFraud, an explainable fraud transaction prediction framework composed mainly of a detector and an explainer. The xFraud detector can effectively and efficiently predict the legitimacy of incoming transactions. Specifically, it uses a heterogeneous graph neural network to learn expressive representations from the informative, heterogeneously typed entities in the transaction logs. The explainer in xFraud can generate meaningful, human-understandable explanations from graphs to facilitate further processing in the business unit. In our experiments on real transaction networks with up to 1.1 billion nodes and 3.7 billion edges, xFraud outperforms various baseline models on many evaluation metrics while remaining scalable in distributed settings. In addition, through both quantitative and qualitative evaluations, we show that the xFraud explainer can generate reasonable explanations that significantly assist business analysis.

preprint · 2021 · arXiv

Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters

Recent works have demonstrated reasonable success of representation learning in hypercomplex space. Specifically, "fully-connected layers with Quaternions" (4D hypercomplex numbers) replace real-valued matrix multiplications in fully-connected layers with Hamilton products of Quaternions. These layers both enjoy parameter savings, needing only 1/4 of the learnable parameters, and achieve comparable performance in various applications. However, one key caveat is that hypercomplex space exists only at very few predefined dimensions (4D, 8D, and 16D). This restricts the flexibility of models that leverage hypercomplex multiplications. To this end, we propose parameterizing hypercomplex multiplications, allowing models to learn multiplication rules from data regardless of whether such rules are predefined. As a result, our method not only subsumes the Hamilton product, but also learns to operate on any arbitrary nD hypercomplex space. This provides more architectural flexibility while using only $1/n$ of the learnable parameters, for arbitrary $n$, compared with the fully-connected layer counterpart. Experiments applying the approach to LSTM and Transformer models on natural language inference, machine translation, text style transfer, and subject-verb agreement demonstrate the architectural flexibility and effectiveness of the proposed approach.
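The $1/n$ parameter saving can be made concrete. In the parameterized hypercomplex multiplication (PHM) layer as we understand it, the weight matrix is a sum of Kronecker products, W = sum over i = 1..n of A_i ⊗ S_i. Each A_i is an n × n matrix encoding a learned multiplication rule, and each S_i is a (k/n) × (d/n) block. The pure-Python sketch below, with hypothetical helper names, builds W this way and counts its parameters. It is an illustration, not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]

def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two dense matrices given as lists of rows."""
    rb, cb = len(b), len(b[0])
    return [
        [a[i // rb][j // cb] * b[i % rb][j % cb] for j in range(len(a[0]) * cb)]
        for i in range(len(a) * rb)
    ]

def phm_weight(rules: List[Matrix], blocks: List[Matrix]) -> Matrix:
    """W = sum_i kron(A_i, S_i): A_i are n x n rule matrices, S_i are (k/n) x (d/n) blocks."""
    terms = [kron(a, s) for a, s in zip(rules, blocks)]
    rows, cols = len(terms[0]), len(terms[0][0])
    return [[sum(t[r][c] for t in terms) for c in range(cols)] for r in range(rows)]

def phm_param_count(n: int, k: int, d: int) -> int:
    """n rule matrices of n*n entries plus n blocks of (k/n)*(d/n) entries, i.e. n^3 + k*d/n."""
    assert k % n == 0 and d % n == 0, "k and d must be divisible by n"
    return n ** 3 + n * (k // n) * (d // n)

# n = 2 toy example: A_1 = identity, A_2 = swap, with 1 x 1 blocks S_1 = [2], S_2 = [3].
W = phm_weight([[[1, 0], [0, 1]], [[0, 1], [1, 0]]], [[[2]], [[3]]])
print(W)
# Parameter count for a 512 x 512 layer: dense vs. PHM with n = 4.
print(512 * 512, phm_param_count(4, 512, 512))
```

For n = 4 the count is about one quarter of the dense layer's 262,144 weights, matching the $1/n$ scaling. The $n^3$ term for the rule matrices is negligible for realistic layer sizes.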

preprint · 2021 · arXiv

Crystal and Electronic Structure of GaTa$_4$Se$_8$ From First-Principle Calculations

GaTa$_4$Se$_8$ belongs to the lacunar spinel family. Its crystal structure is still a puzzle, even though its novel properties, such as the Mott insulator phase and superconductivity under pressure, have been studied intensively. In this work, we investigate its phonon spectra through first-principles calculations and propose that it most probably undergoes a structural phase transition, which is consistent with several experimental observations. For the prototype lacunar spinel with the cubic symmetry of space group $F\bar{4}3m$, the phonon spectra have three soft modes across the whole Brillouin zone, indicating strong dynamical instability of this crystal structure. In searching for a dynamically stable crystal structure, further calculations identify two new structures of GaTa$_4$Se$_8$, with space groups $R3m$ and $P\bar{4}2_{1}m$. This confirms that at ambient pressure GaTa$_4$Se$_8$ does undergo a structural phase transition from $F\bar{4}3m$ to other structures as the temperature is lowered. We also performed electronic structure calculations for the $R3m$ and $P\bar{4}2_{1}m$ structures, showing that $P\bar{4}2_{1}m$ GaTa$_4$Se$_8$ is a band insulator. For the $R3m$ structure, DMFT calculations within a single-band Hubbard model yield a Mott insulator state when the interaction parameter U exceeds 0.40 eV, compared with a bandwidth of 0.25 eV. It is reasonable to assume that, as the temperature is lowered, $F\bar{4}3m$ GaTa$_4$Se$_8$ first transforms into the $R3m$ structure and then into the $P\bar{4}2_{1}m$ structure, because the symmetry of $P\bar{4}2_{1}m$ is lower than that of $R3m$ after Jahn-Teller distortion. This structural transition may explain the anomalous magnetic susceptibility at low temperature.

preprint · 2021 · arXiv

Experimental evidence on the dissipationless transport of chiral edge state of the high-field Chern insulator in MnBi2Te4 nanodevices

We demonstrate the dissipationless transport of the chiral edge state (CES) in nanodevices of the quantum anomalous Hall insulator candidate MnBi2Te4. The device presents a near-zero longitudinal resistance together with a quantized Hall plateau in excess of 0.97 h/e² over a range of temperatures from very low temperatures up to the Néel temperature of 22 K. All four-probe nonlocal measurements give near-zero resistance, and two-probe measurements exhibit a plateau of +1 h/e², while the results of three-probe nonlocal measurements depend on the magnetic field. This indicates both the dissipationless nature and the chirality of the edge state. The CES shows three regimes of temperature dependence: well-preserved dissipationless transport below 6 K, variable-range hopping as the temperature increases, and thermal activation above 22 K. Even at the lowest temperature, a current above 1.4 μA breaks the dissipationless transport. Together, these results form a complete set of evidence for the Chern insulator state in MnBi2Te4 systems.

preprint · 2021 · arXiv

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound because of the many random memory accesses needed to look up the embedding tables. The inference is also heavily latency-constrained, because a recommendation for a user must be produced within about tens of milliseconds. In this paper, we propose MicroRec, a high-performance inference engine for recommendation systems. MicroRec accelerates recommendation inference in two ways. First, it redesigns the data structures involved in the embeddings to reduce the number of lookups needed. Second, it takes advantage of the High-Bandwidth Memory (HBM) available in FPGA accelerators to tackle latency by enabling parallel lookups. We have implemented the resulting design on an FPGA board, including the embedding lookup step as well as the complete inference process. Compared with the optimized CPU baseline (16 vCPUs, AVX2-enabled), MicroRec achieves a 13.8 to 14.7x throughput speedup on embedding lookup alone and a 2.5 to 5.4x throughput speedup for the entire recommendation inference. As for latency, CPU-based engines need milliseconds to infer a recommendation while MicroRec takes only microseconds, a significant advantage in real-time recommendation systems.
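The abstract does not say exactly how the embedding data structures are redesigned. One technique that reduces lookups, assumed here for illustration, is to merge two small tables into their Cartesian product, so that a single access returns both embeddings already concatenated. The trade-off is memory: the merged table has len(A) × len(B) rows, so the idea only pays off for small tables. The function names below are hypothetical.

```python
from itertools import product
from typing import List

Vector = List[float]

def merge_tables(table_a: List[Vector], table_b: List[Vector]) -> List[Vector]:
    """Cartesian-product table: row i * len(table_b) + j stores table_a[i] + table_b[j]."""
    return [row_a + row_b for row_a, row_b in product(table_a, table_b)]

def lookup_pair(merged: List[Vector], len_b: int, i: int, j: int) -> Vector:
    """One memory access replaces two: fetch the concatenated embeddings for ids (i, j)."""
    return merged[i * len_b + j]

# Two small hypothetical embedding tables (e.g. features with 2 and 3 categories).
table_a = [[1.0], [2.0]]
table_b = [[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]]
merged = merge_tables(table_a, table_b)
print(len(merged), lookup_pair(merged, len(table_b), 1, 2))
```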

preprint · 2021 · arXiv

Mimicking the Kane-Mele type spin orbit interaction by spin-flexural phonon coupling in graphene devices

In efforts to enhance the spin-orbit interaction (SOI) of graphene for dissipationless quantum spin Hall devices, a purely Kane-Mele type SOI and high-mobility samples are desired. However, common external decoration often introduces an extrinsic Rashba-type SOI together with impurity scattering. Here we show that decorating graphene with EDTA-Dy molecules mimics the Kane-Mele type SOI while even improving the carrier mobility. This is evidenced by suppressed weak localization at equal carrier densities together with Elliott-Yafet spin relaxation. The extracted spin scattering time depends monotonically on the carrier elastic scattering time, and the Elliott-Yafet plot gives an interaction strength of 3.3 meV. Improved quantum Hall plateaus can even be seen after the decoration. This is attributed to spin-flexural phonon coupling induced by the enhanced graphene ripples, as revealed by in-plane magnetotransport measurements.

preprint · 2021 · arXiv

Nature of the bonded-to-atomic transition in liquid silica to TPa pressures

First-principles calculations and analysis of the thermodynamic, structural, and electronic properties of liquid SiO$_2$ characterize the bonded-to-atomic transition at 0.1--1.6 TPa and 10$^4$--10$^5$ K (1--7 eV), the high-energy-density regime relevant to understanding planetary interiors. We find strong ionic bonds that become short-lived because of rapid kinetics during the transition, and a transition temperature that is sensitive to pressure; our calculated Hugoniots agree with past experimental data. These results reconcile previous experimental and theoretical findings by clarifying the nature of the bond dissociation process in the early Earth and in the "rocky" (oxide) constituents of large planets.

preprint · 2021 · arXiv

Quantitative analysis of diffraction by liquids using a pink-spectrum X-ray source

We describe a new approach for performing quantitative structure-factor analysis and density measurements of liquids using x-ray diffraction with a pink-spectrum x-ray source. The methodology corrects for the pink-beam effect by performing a Taylor series expansion of the diffraction signal. The mean density, background scale factor, peak x-ray energy about which the expansion is performed, and cutoff radius for the density measurement are estimated using a derivative-free optimization scheme. The formalism is demonstrated on a simulated radial distribution function for tin. Finally, the proposed methodology is applied to experimental data on shock-compressed tin recorded at the Dynamic Compression Sector at the Advanced Photon Source, with derived densities comparing favorably with other experimental results and with the equations of state of tin.

preprint · 2021 · arXiv

Topology Aware Deep Learning for Wireless Network Optimization

Data-driven machine learning approaches have recently been proposed to facilitate wireless network optimization by learning latent knowledge from historical optimization instances. However, existing methods do not handle well the topology information that directly impacts the network optimization results. Operating directly on simple representations, e.g., adjacency matrices, results in poor generalization, because the learned results depend on the specific ordering of the network elements in the training data. To address this issue, we propose a two-stage topology-aware machine learning framework (TALF), which trains a graph embedding unit and a deep feed-forward network (DFN) jointly. By propagating and summarizing the underlying graph topological information, TALF encodes the topology in the vector representation of the optimization instance. The subsequent DFN then uses this representation to infer critical structures of an optimal or near-optimal solution. The proposed approach is evaluated on a canonical wireless network flow problem with diverse network topologies and flow deployments. We also conduct an in-depth study of the trade-off between the efficiency and effectiveness of the inference results. It shows that our approach is better at differentiating links, saving up to 60% of computation time while achieving over 90% solution quality.
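A toy graph shows both the ordering problem and why propagating and summarizing topology fixes it. The sketch below is not TALF: it uses fixed, untrained sum aggregation, whereas TALF's graph embedding unit is learned jointly with the DFN, and the function names are hypothetical. It only illustrates that such an embedding is invariant to node numbering, whereas the raw adjacency matrix is not.

```python
from typing import List, Tuple

Adjacency = List[List[int]]

def topology_embedding(adj: Adjacency, features: List[float], rounds: int = 2) -> Tuple[float, float]:
    """Untrained message passing: each node adds its neighbours' values, then a sum/max readout.

    Because aggregation and readout are symmetric functions, the result does not
    depend on how nodes are numbered, unlike the raw adjacency matrix.
    """
    n = len(adj)
    h = list(features)
    for _ in range(rounds):
        h = [h[v] + sum(h[u] for u in range(n) if adj[v][u]) for v in range(n)]
    return sum(h), max(h)

def relabel(adj: Adjacency, features: List[float], perm: List[int]) -> Tuple[Adjacency, List[float]]:
    """Renumber nodes so that new node i is old node perm[i]."""
    new_adj = [[adj[perm[i]][perm[j]] for j in range(len(perm))] for i in range(len(perm))]
    return new_adj, [features[p] for p in perm]

# Path graph 0-1-2 with per-node features, and the same graph with its nodes renumbered.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [1.0, 2.0, 3.0]
adj2, feats2 = relabel(adj, feats, [2, 0, 1])
print(adj == adj2, topology_embedding(adj, feats) == topology_embedding(adj2, feats2))
```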

preprint · 2021 · arXiv

TSNAT: Two-Step Non-Autoregressive Transformer Models for Speech Recognition

Autoregressive (AR) models, such as attention-based encoder-decoder models and the RNN-Transducer, have achieved great success in speech recognition. They predict the output sequence conditioned on the previous tokens and the acoustic encoded states, which is inefficient on GPUs. Non-autoregressive (NAR) models can remove the temporal dependency between output tokens and predict the entire output sequence in as few as one step. However, NAR models still face two major problems. On the one hand, there is still a large performance gap between NAR models and advanced AR models. On the other hand, most NAR models are difficult to train and converge. To address these two problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT). It improves the performance and accelerates the convergence of the NAR model by learning prior knowledge from a parameter-sharing AR model. Furthermore, we introduce the two-stage method into the inference process, which greatly improves model performance. All experiments are conducted on the public Chinese Mandarin dataset AISHELL-1. The results show that TSNAT achieves performance competitive with the AR model and outperforms many complicated NAR models.

preprint · 2020 · arXiv

A Gd@C82-based single molecular electret device with switchable electrical polarization

Single-molecule electrets exhibiting single-molecule electric polarization switching have long been desired as a platform for extremely small non-volatile storage devices, although their realization is controversial because of the poor stability of single-molecule electric dipoles. Here we study a single-molecule device of Gd@C82, in which the encapsulated Gd atom forms a charge center. We observe gate-controlled switching between two sets of single-electron transport stability diagrams. The switching operates in a hysteresis loop with a coercive gate field of around 0.5 V/nm. Theoretical calculations assign the two conductance diagrams to the energy levels of two states in which the Gd atom is trapped at two different sites of the C82 cage. These two states possess two different permanent electric dipole orientations. The two dipole states are stabilized by the anisotropic energy and separated by a transition energy barrier of 70 meV. The switching is therefore attributed to the electric-field-driven reorientation of an individual dipole overcoming the barrier under the coercive gate field, and demonstrates the creation of a single-molecule electret.

preprint · 2020 · arXiv

A Practical Chinese Dependency Parser Based on A Large-scale Dataset

Dependency parsing is a longstanding natural language processing task whose outputs are crucial to various downstream tasks. Recently, neural network based (NN-based) dependency parsing has made significant progress and obtained state-of-the-art results. However, NN-based approaches require massive amounts of labeled training data, which are very expensive to obtain because they require expert human annotation. As a result, few industrial-oriented dependency parsing tools are publicly available. In this report, we present Baidu Dependency Parser (DDParser), a new Chinese dependency parser trained on a large-scale manually labeled dataset called Baidu Chinese Treebank (DuCTB). DuCTB consists of about one million annotated sentences from multiple sources, including search logs, Chinese newswire, various forum discourses, and conversation programs. DDParser extends the graph-based biaffine parser to accommodate the characteristics of Chinese data. We conduct experiments on two test sets: a standard test set with the same distribution as the training set, and a random test set sampled from other sources. On these sets, DDParser achieves labeled attachment scores (LAS) of 92.9% and 86.9%, respectively. DDParser achieves state-of-the-art results and is released at https://github.com/baidu/DDParser.

preprint · 2020 · arXiv

Accelerating Auxiliary-Field Quantum Monte Carlo Simulations of Solids with Graphical Processing Unit

We outline how auxiliary-field quantum Monte Carlo (AFQMC) can leverage graphical processing units (GPUs) to accelerate the simulation of solid-state systems. By exploiting conservation of crystal momentum in the one- and two-electron integrals, we show how to formulate the algorithm to make the best use of current GPU architectures. We provide a detailed description of different optimization strategies and profile our implementation relative to standard approaches, demonstrating a 40-fold speedup over a CPU implementation. With this increase in computational power, we demonstrate the ability of AFQMC to systematically converge solid-state calculations with respect to basis set and system size by computing the cohesive energy of carbon in the diamond structure to within 0.02 eV of the experimental result.
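Momentum conservation is what shrinks the two-electron integral tensor. A two-electron integral between Bloch orbitals can be nonzero only when the four crystal momenta satisfy a conservation rule modulo a reciprocal lattice vector. Of the N_k^4 possible k-point quartets, only N_k^3 survive, because the fourth index is fixed by the other three. The sketch below checks this count on a 1D grid of integer k-indices, a simplification of real 3D k-meshes. The sign convention kp − kq + kr − ks ≡ 0 is one common choice and depends on how the integrals are written, and the function names are hypothetical.

```python
from itertools import product

def count_conserving_quartets(nk: int) -> int:
    """Brute-force count of k-point quartets with kp - kq + kr - ks = 0 (mod nk)."""
    return sum(
        1
        for kp, kq, kr, ks in product(range(nk), repeat=4)
        if (kp - kq + kr - ks) % nk == 0
    )

def conserving_quartets(nk: int):
    """Enumerate the same quartets directly: ks is fixed by the other three indices."""
    return [(kp, kq, kr, (kp - kq + kr) % nk) for kp, kq, kr in product(range(nk), repeat=3)]

for nk in (2, 4, 8):
    total = nk ** 4
    kept = count_conserving_quartets(nk)
    print(nk, kept, total, kept / total)
```

Only a fraction 1/N_k of the quartets is ever needed. Enumerating them directly, as `conserving_quartets` does, avoids even visiting the forbidden ones.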

preprint · 2020 · arXiv

Automated Radiological Report Generation For Chest X-Rays With Weakly-Supervised End-to-End Deep Learning

The chest X-ray (CXR) is one of the most common clinical exams used to diagnose thoracic diseases and abnormalities. Because hospitals generate a huge volume of CXR scans every day, an automated diagnosis system that saves doctors' effort is of great value. At present, applications of artificial intelligence in CXR diagnosis usually use pattern recognition to classify the scans. However, such methods rely on labeled databases, which are costly to build and usually have large error rates. In this work, we built a database containing more than 12,000 CXR scans and radiological reports. We then developed a model based on a deep convolutional neural network and a recurrent network with an attention mechanism. The model learns features directly from the CXR scans and the associated raw radiological reports; no additional labeling of the scans is needed. The model provides automated recognition of given scans and generation of reports. The quality of the generated reports was evaluated both with CIDEr scores and by radiologists. The CIDEr scores were around 5.8 on average on the testing dataset. Further blind evaluation suggested performance comparable to that of human radiologists.

preprint · 2020 · arXiv

Computational prediction of RNA tertiary structures using machine learning methods

RNAs play crucial and versatile roles in biological processes. Computational prediction approaches can help us understand RNA structures and their stabilizing factors, thereby providing information on their functions and facilitating the design of new RNAs. Machine learning (ML) techniques have made tremendous progress in many fields in the past few years. Although their use in protein-related fields has a long history, the use of ML methods to predict RNA tertiary structures is new and rare. Here, we review recent advances in using ML methods for RNA structure prediction and discuss the advantages, limitations, difficulties, and potential of these approaches when applied in the field.

preprint · 2020 · arXiv

DeGNN: Characterizing and Improving Graph Neural Networks with Graph Decomposition

Despite the wide application of Graph Convolutional Networks (GCNs), one major limitation is that they do not benefit from increasing depth and suffer from the oversmoothing problem. In this work, we first characterize this phenomenon from an information-theoretic perspective. We show that, under certain conditions, the mutual information between the output after $l$ layers and the input of a GCN converges to 0 exponentially with respect to $l$. We also show that graph decomposition can potentially weaken the condition for such a convergence rate, which enables our analysis of GraphCNN. Because different graph structures benefit only from their corresponding decompositions, we propose, for practical use, an automatic connectivity-aware graph decomposition algorithm, DeGNN, to improve the performance of general graph neural networks. Extensive experiments on widely adopted benchmark datasets demonstrate that DeGNN not only significantly boosts the performance of the corresponding GNNs but also achieves state-of-the-art performance.

preprint · 2020 · arXiv

Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case

Although graph neural networks (GNNs) have recently made great progress in learning from graph-structured data in practice, theoretical guarantees on their generalizability remain elusive in the literature. In this paper, we provide a theoretically grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems. Under the assumption that there exists a ground-truth GNN model (with zero generalization error), the objective of GNN learning is to estimate the ground-truth GNN parameters from the training data. To achieve this objective, we propose a learning algorithm built on tensor initialization and accelerated gradient descent. We then show that the proposed learning algorithm converges to the ground-truth GNN model for the regression problem, and to a model sufficiently close to the ground truth for the binary classification problem. Moreover, in both cases, the convergence rate of the proposed learning algorithm is proven to be linear and faster than that of vanilla gradient descent. We further explore the relationship between the sample complexity of GNNs and their underlying graph properties. Lastly, we provide numerical experiments to demonstrate the validity of our analysis and the effectiveness of the proposed learning algorithm for GNNs.

preprint · 2020 · arXiv

First-Principles Equation of State Database for Warm Dense Matter Computation

We assemble a first-principles equation of state (FPEOS) database for matter at extreme conditions by combining results from path integral Monte Carlo and density functional molecular dynamics simulations. The database covers the elements H, He, B, C, N, O, Ne, Na, Mg, Al, and Si, as well as the compounds LiF, B4C, BN, CH4, CH2, C2H3, CH, C2H, MgO, and MgSiO3. For all these materials, we provide the pressure and internal energy over a density range from ~0.5 to 50 g/cm³ and a temperature range from ~10^4 to 10^9 K, based on ~5000 different first-principles simulations. We compute isobars, adiabats, and shock Hugoniot curves in the regime of L- and K-shell ionization. Invoking the linear mixing approximation, we study the properties of mixtures at high density and temperature. We derive the Hugoniot curves for water and alumina as well as for carbon-oxygen, helium-neon, and CH-silicon mixtures. We predict the maximal shock compression ratios of H2O, H2O2, Al2O3, CO, and CO2 to be 4.61, 4.64, 4.64, 4.89, and 4.83, respectively. Finally, we use the FPEOS database to determine the points of maximum shock compression for all available binary mixtures. We identify mixtures that reach higher shock compression ratios than their endmembers. We discuss trends common to all mixtures in pressure-temperature and particle-shock velocity spaces. In the supplementary material, we provide all FPEOS tables as well as computer codes for interpolation, Hugoniot calculations, and plots of various thermodynamic functions.
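The Hugoniot curves mentioned above follow from the Rankine-Hugoniot energy condition applied to an equation of state. The minimal sketch below assumes the EOS is available as a function of density and temperature. It uses a monatomic ideal gas as a hypothetical stand-in for the FPEOS tables, which are not reproduced here, and finds the shocked density at a given temperature by bisection. The function names are illustrative, not those of the supplementary codes.

```python
from typing import Callable, Tuple

# An EOS maps (density, temperature) to (pressure, specific internal energy).
EOS = Callable[[float, float], Tuple[float, float]]

def ideal_gas_eos(rho: float, temp: float) -> Tuple[float, float]:
    """Hypothetical stand-in for a tabulated EOS: monatomic ideal gas in units with R = 1."""
    return rho * temp, 1.5 * temp

def hugoniot_density(eos: EOS, rho0: float, temp0: float, temp: float,
                     max_ratio: float = 10.0, tol: float = 1e-12) -> float:
    """Density on the principal Hugoniot at temperature `temp`, found by bisection.

    Solves the Rankine-Hugoniot energy condition
        (E - E0) + (P + P0) * (1/rho - 1/rho0) / 2 = 0.
    """
    p0, e0 = eos(rho0, temp0)

    def residual(rho: float) -> float:
        p, e = eos(rho, temp)
        return (e - e0) + 0.5 * (p + p0) * (1.0 / rho - 1.0 / rho0)

    lo, hi = rho0, max_ratio * rho0
    if residual(lo) * residual(hi) > 0:
        raise ValueError("Hugoniot root is not bracketed; widen max_ratio")
    while hi - lo > tol * rho0:
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for temp in (10.0, 1e3, 1e6):
    print(temp, hugoniot_density(ideal_gas_eos, 1.0, 1.0, temp))
```

For the ideal gas the compression ratio approaches the strong-shock limit of 4. The first-principles tables yield larger maxima, such as the 4.61 to 4.89 quoted above, because excitation and ionization absorb energy that an ideal gas cannot.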

preprint · 2020 · arXiv

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition

Although attention-based end-to-end models have achieved promising performance in speech recognition, the multi-pass forward computation in beam search increases inference time, which limits their practical applications. To address this issue, we propose a non-autoregressive end-to-end speech recognition system called LASO (listen attentively, and spell once). Because of its non-autoregressive property, LASO predicts each textual token in the sequence without depending on other tokens. Without beam search, one-pass propagation greatly reduces the inference time of LASO. Moreover, because the model is based on an attention-based feedforward structure, the computation can be efficiently parallelized. We conduct experiments on the publicly available Chinese dataset AISHELL-1. LASO achieves a character error rate of 6.4%, outperforming the state-of-the-art autoregressive transformer model (6.7%). The average inference latency is 21 ms, which is 1/50 of that of the autoregressive transformer model.

preprint · 2020 · arXiv

Magnesium Oxide at Extreme Temperatures and Pressures Studied with First-Principles Simulations

We combine two first-principles computer simulation techniques, path integral Monte Carlo and density functional theory molecular dynamics, to determine the equation of state of magnesium oxide in the regime of warm dense matter, with densities ranging from 0.35 to 71~g$\,$cm$^{-3}$ and temperatures from 10,000 K to $5\times10^8$~K. These conditions are relevant for the interiors of giant planets and stars as well as for shock wave compression measurements and inertial confinement fusion experiments. We study the electronic structure of MgO and the ionization mechanisms as a function of density and temperature. We show that the L-shell orbitals of magnesium and oxygen hybridize at high density. This results in a gradual ionization of the L shell with increasing density and temperature. In this regard, MgO behaves differently from pure oxygen, which is reflected in the shape of the MgO principal shock Hugoniot curve: the curve of oxygen shows two compression maxima, while that of MgO shows only one. We predict a maximum compression ratio of 4.66 to occur at a temperature of 6.73 $\times 10^7$ K. Finally, we study how multiple shocks and ramp waves can be used to cover a large range of densities and temperatures.

preprint · 2020 · arXiv

Phase transformation in boron under shock compression

Using first-principles molecular dynamics, we calculated the equation of state and shock Hugoniot of various boron phases. We find a large mismatch between Hugoniots based on existing knowledge of the equilibrium phase diagram and those measured in shock experiments, which could be reconciled if the $\alpha$-B$_{12}$/$\beta\rightarrow\gamma$-B$_{28}$ transition is significantly over-pressurized in boron under shock compression. Our results also indicate an anomaly and a negative Clapeyron slope along the melting curve of boron at 100 GPa and 1500–3000 K. These results enable an in-depth understanding of matter under shock compression, in particular the significance of the compression-rate dependence of phase transitions and of kinetic effects in experimental measurements.

preprint · 2020 · arXiv

QMCPACK: Advances in the development, efficiency, and application of auxiliary field and real-space variational and diffusion Quantum Monte Carlo

We review recent advances in the capabilities of the open-source ab initio Quantum Monte Carlo (QMC) package QMCPACK and the workflow tool Nexus, used for greater efficiency and reproducibility. The auxiliary field QMC (AFQMC) implementation has been greatly expanded to include k-point symmetries, tensor hypercontraction, and accelerated graphics processing unit (GPU) support. The resulting reductions in computational scaling and memory greatly increase the number of orbitals that can practically be included in AFQMC calculations, increasing accuracy. Advances in real-space methods include techniques for the accurate computation of band gaps and for systematically improving the nodal surface of ground-state wavefunctions. Results of these calculations can be used to validate the application of more approximate electronic structure methods, including GW and density-functional-based techniques. To provide an improved foundation for these calculations, we use a new set of correlation-consistent effective core potentials (pseudopotentials) that are more accurate than previous sets; these can also be applied in quantum-chemical and other many-body applications, not only QMC. These advances increase the efficiency, accuracy, and range of properties that can be studied in both molecules and materials with QMC and QMCPACK.

preprint · 2020 · arXiv

Quantifying the dynamics of protein self-organization using deep learning analysis of atomic force microscopy data

The dynamics of protein self-assembly on an inorganic surface and the resulting geometric patterns are visualized using high-speed atomic force microscopy. The time dynamics of classical macroscopic descriptors, such as two-dimensional fast Fourier transforms (FFT), correlation, and the pair distribution function, are explored using unsupervised linear unmixing, demonstrating the presence of static ordered and dynamic disordered phases and establishing their time dynamics. A deep learning (DL)-based workflow is developed to analyze particle dynamics in detail, particle by particle. Beyond the macroscopic descriptors, we use knowledge of local particle geometries and configurations to explore the evolution of local geometries and to reconstruct the interaction potential between the particles. Finally, we use machine learning-based feature extraction to define particle neighborhoods free of physics constraints. This approach allows us to separate the possible classes of particle behavior, identify the associated transition probabilities, and further extend this analysis to identify slow modes and the associated configurations, enabling systematic exploration and predictive modeling of the time dynamics of the system. Overall, this work establishes a DL-based workflow for analyzing self-organization processes in complex systems from observational data and provides insight into the fundamental mechanisms.

preprint · 2020 · arXiv

RNN-Transducer with language bias for end-to-end Mandarin-English code-switching speech recognition

Recently, language identity information has been used to improve the performance of end-to-end code-switching (CS) speech recognition. However, previous works use an additional language identification (LID) model as an auxiliary module, which makes the system complex. In this work, we propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate this problem. We use the language identities to bias the model to predict the CS points. This encourages the model to learn language identity information directly from the transcription, so no additional LID model is needed. We evaluate the approach on the Mandarin-English CS corpus SEAME. Compared to our RNN-T baseline, the proposed method achieves relative error reductions of 16.2% and 12.9% on the two test sets, respectively.
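One way to picture "learning language identity directly from the transcription" is to insert a language tag into the target token sequence wherever the language switches, so the transducer must predict the switch point as an ordinary output symbol. The sketch below assumes that scheme; the tag names `<zh>` and `<en>` and the CJK-range language test are hypothetical choices, not necessarily the paper's exact labeling.

```python
import re

# Hypothetical heuristic: any CJK unified ideograph marks a Mandarin token.
CJK = re.compile(r"[\u4e00-\u9fff]")

def token_language(token: str) -> str:
    """Return a hypothetical language tag for one transcription token."""
    return "<zh>" if CJK.search(token) else "<en>"

def insert_switch_tags(tokens: list[str]) -> list[str]:
    """Insert a language tag only at code-switching points (where the language changes)."""
    tagged, prev = [], None
    for tok in tokens:
        lang = token_language(tok)
        if prev is not None and lang != prev:
            tagged.append(lang)  # the model is trained to emit this tag at the CS point
        tagged.append(tok)
        prev = lang
    return tagged
```

For example, `insert_switch_tags(["我", "想", "book", "一个", "room"])` yields `["我", "想", "<en>", "book", "<zh>", "一个", "<en>", "room"]`. Tagging only at switch points, rather than every token, keeps the extra output symbols sparse.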

preprint · 2020 · arXiv

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition

Non-autoregressive transformer models have achieved extremely fast inference and performance comparable to autoregressive sequence-to-sequence models in neural machine translation. Most non-autoregressive transformers decode the target sequence from a mask sequence of predefined length. If the predefined length is too long, it causes a large amount of redundant computation. If it is shorter than the target sequence, it hurts the performance of the model. To address this problem and improve inference speed, we propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition, which introduces a CTC module to predict the length of the target sequence and accelerate convergence. All experiments are conducted on the public Mandarin Chinese dataset AISHELL-1. The results show that the proposed model can accurately predict the length of the target sequence and achieves performance competitive with advanced transformers. Moreover, the model achieves a real-time factor of 0.0056, which is faster than all mainstream speech recognition models.
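A common way to read a target length off a CTC module is to count its "spikes": take the best label per frame, collapse consecutive repeats, and count what is not blank. The sketch below assumes that greedy counting rule and blank index 0; the paper's exact triggering criterion may differ.

```python
import numpy as np

BLANK = 0  # assumed index of the CTC blank symbol

def ctc_predicted_length(frame_probs: np.ndarray, blank: int = BLANK) -> int:
    """Estimate target length by counting non-blank spikes in greedy CTC output.

    frame_probs: array of shape (T, vocab) with per-frame label probabilities.
    """
    best = frame_probs.argmax(axis=-1)  # best label for each frame
    # a frame starts a new token if its label differs from the previous frame's
    is_new = np.concatenate(([True], best[1:] != best[:-1]))
    return int(np.sum(is_new & (best != blank)))
```

The predicted length can then size the decoder's input sequence, avoiding both the redundant computation of an overly long mask sequence and the truncation caused by a short one.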

preprint · 2020 · arXiv

Synchronous Transformers for End-to-End Speech Recognition

In most attention-based sequence-to-sequence models, the decoder predicts the output sequence conditioned on the entire input sequence processed by the encoder. This asynchrony between encoding and decoding makes these models difficult to apply to online speech recognition. In this paper, we propose a model named the synchronous transformer to address this problem, which predicts the output sequence chunk by chunk. Once a fixed-length chunk of the input sequence has been processed by the encoder, the decoder begins to predict symbols immediately. During training, a forward-backward algorithm is introduced to optimize over all possible alignment paths. Our model is evaluated on the Mandarin dataset AISHELL-1. The experiments show that the synchronous transformer is able to perform encoding and decoding synchronously and achieves a character error rate of 8.91% on the test set.

preprint · 2020 · arXiv

TensorCoder: Dimension-Wise Attention via Tensor Representation for Natural Language Modeling

The Transformer has been widely used in many Natural Language Processing (NLP) tasks, and the scaled dot-product attention between tokens is its core module. This attention is token-wise, and its complexity is quadratic in the sequence length, which limits its potential for long-sequence tasks. In this paper, we propose a dimension-wise attention mechanism on which a novel language modeling approach, TensorCoder, is built. Dimension-wise attention reduces the attention complexity from the original $O(N^2d)$ to $O(Nd^2)$, where $N$ is the sequence length and $d$ is the dimensionality of each head. We evaluate TensorCoder on two tasks: masked language modeling and neural machine translation. Compared with the original Transformer, TensorCoder not only greatly reduces the computation of the original model but also improves performance on the masked language modeling task (on the PTB dataset) and achieves comparable performance on machine translation tasks.
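The complexity claim follows from which axis the score matrix spans. Token-wise attention forms an $N \times N$ matrix ($O(N^2 d)$); attending across feature dimensions instead forms a $d \times d$ matrix ($O(N d^2)$). The sketch below is a generic dimension-wise attention written to show that shape change; the $\sqrt{N}$ scaling and the exact placement of the softmax are assumptions, and TensorCoder's tensor-representation formulation is not reproduced here.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def token_wise_attention(Q, K, V):
    """Standard scaled dot-product attention: (N x N) scores, cost O(N^2 d)."""
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))  # (N, N): token-to-token weights
    return weights @ V                       # (N, d)

def dimension_wise_attention(Q, K, V):
    """Generic dimension-wise attention: (d x d) scores, cost O(N d^2)."""
    n = Q.shape[0]
    weights = softmax(Q.T @ K / np.sqrt(n))  # (d, d): dimension-to-dimension weights (assumed scaling)
    # each output dimension i is a convex combination of V's columns, weighted by row i
    return V @ weights.T                     # (N, d)
```

Both functions return an $N \times d$ output, so the dimension-wise variant can replace the token-wise one shape-for-shape; it is cheaper whenever $d < N$.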

preprint · 2020 · arXiv

Trained Rank Pruning for Efficient Deep Neural Networks

To accelerate DNN inference, low-rank approximation has been widely adopted because of its solid theoretical rationale and efficient implementations. Several previous works attempted to directly approximate a pre-trained model by low-rank decomposition; however, small approximation errors in the parameters can ripple into a large prediction loss. Evidently, it is not optimal to separate low-rank approximation from training. Unlike previous works, this paper integrates low-rank approximation and regularization into the training process. We propose Trained Rank Pruning (TRP), which alternates between low-rank approximation and training. TRP maintains the capacity of the original network while imposing low-rank constraints during training. A nuclear regularization optimized by stochastic sub-gradient descent is used to further promote low rank in TRP. Networks trained with TRP have an inherently low-rank structure and can be approximated with negligible performance loss, eliminating fine-tuning after low-rank approximation. The proposed method is comprehensively evaluated on CIFAR-10 and ImageNet, outperforming previous compression counterparts that use low-rank approximation. Our code is available at: https://github.com/yuhuixu1993/Trained-Rank-Pruning.
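The two ingredients described above, periodic low-rank approximation during training and a nuclear-norm regularizer optimized by sub-gradient descent, can be sketched on a single weight matrix. Everything here is an assumption for illustration: the energy-threshold rule for choosing the rank, the hypothetical `trp_step` update, and the values of `lr`, `lam`, and `period`. The authors' repository linked above contains the actual implementation, which operates on decomposed convolution layers.

```python
import numpy as np

def truncate_rank(W: np.ndarray, energy: float = 0.95) -> np.ndarray:
    """Project W onto a low-rank matrix with a truncated SVD.

    Keeps the smallest k whose top-k singular values hold at least `energy`
    of the total singular-value sum (a hypothetical rank-selection rule).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    cumulative = np.cumsum(s) / s.sum()
    k = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def nuclear_subgradient(W: np.ndarray) -> np.ndarray:
    """A subgradient of the nuclear norm ||W||_* (sum of singular values): U V^T."""
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

def trp_step(W, grad, step, lr=0.1, lam=1e-3, period=20, energy=0.95):
    """Hypothetical TRP-style update: loss gradient plus nuclear-norm subgradient,
    with a low-rank projection applied every `period` steps."""
    W = W - lr * (grad + lam * nuclear_subgradient(W))
    if step % period == 0:
        W = truncate_rank(W, energy)
    return W
```

Because training continues between projections, the network can recover from each truncation, which is what lets the final low-rank approximation skip a separate fine-tuning stage.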

preprint · 2020 · arXiv

Trained Rank Pruning for Efficient Deep Neural Networks

The performance of Deep Neural Networks (DNNs) has kept improving in recent years with increasing network depth and width. To enable DNNs on edge devices like mobile phones, researchers have proposed several network compression methods, including pruning, quantization, and factorization. Among the factorization-based approaches, low-rank approximation has been widely adopted because of its solid theoretical rationale and efficient implementations. Several previous works attempted to directly approximate a pre-trained model by low-rank decomposition; however, small approximation errors in the parameters can ripple into a large prediction loss. As a result, performance usually drops significantly, and sophisticated fine-tuning is required to recover accuracy. We argue that it is not optimal to separate low-rank approximation from training. Unlike previous works, this paper integrates low-rank approximation and regularization into the training process. We propose Trained Rank Pruning (TRP), which alternates between low-rank approximation and training. TRP maintains the capacity of the original network while imposing low-rank constraints during training. A nuclear regularization optimized by stochastic sub-gradient descent is used to further encourage low rank in TRP. A TRP-trained network has an inherently low-rank structure and can be approximated with negligible performance loss, eliminating fine-tuning after low-rank approximation. The method is comprehensively evaluated on CIFAR-10 and ImageNet, outperforming previous compression methods that use low-rank approximation. Code is available at: https://github.com/yuhuixu1993/Trained-Rank-Pruning

preprint · 2020 · arXiv

TRP: Trained Rank Pruning for Efficient Deep Neural Networks

To enable DNNs on edge devices like mobile phones, low-rank approximation has been widely adopted because of its solid theoretical rationale and efficient implementations. Several previous works attempted to directly approximate a pretrained model by low-rank decomposition; however, small approximation errors in the parameters can ripple into a large prediction loss. As a result, performance usually drops significantly, and sophisticated fine-tuning is required to recover accuracy. Evidently, it is not optimal to separate low-rank approximation from training. Unlike previous works, this paper integrates low-rank approximation and regularization into the training process. We propose Trained Rank Pruning (TRP), which alternates between low-rank approximation and training. TRP maintains the capacity of the original network while imposing low-rank constraints during training. A nuclear regularization optimized by stochastic sub-gradient descent is used to further promote low rank in TRP. A TRP-trained network inherently has a low-rank structure and can be approximated with negligible performance loss, eliminating the fine-tuning process after low-rank decomposition. The proposed method is comprehensively evaluated on CIFAR-10 and ImageNet, outperforming previous compression methods that use low-rank approximation.

preprint · 2019 · arXiv

Experimental observation of the gate-controlled reversal of the anomalous Hall effect in the intrinsic magnetic topological insulator MnBi2Te4 device

Here we report a reversed anomalous Hall effect (AHE) in a five-septuple-layer van der Waals device of the intrinsic magnetic topological insulator MnBi2Te4. By tuning the top and bottom gates, a negative AHE loop gradually decreases to zero and then changes to the reversed sign. The reversed AHE exhibits coercive fields and a temperature dependence distinct from those of the previously observed AHE. It reaches its maximum inside the gap of the Dirac cone. The newly observed reversed AHE is attributed to the competition between the intrinsic Berry curvature and the Dirac-gap-enhanced extrinsic skew scattering. Its gate-controlled switching offers a scheme for topological spin field-effect transistors.

preprint · 2019 · arXiv

Magneto-transport and Shubnikov-de Haas oscillations in the layered ternary telluride Ta3SiTe6 topological semimetal

Topological semimetals constitute a novel class of quantum materials hosting Dirac/Weyl fermions. Important features of topological fermions can be revealed by quantum oscillations. Here we report the magnetoresistance and the Shubnikov-de Haas (SdH) quantum oscillations of the longitudinal resistance in single crystals of the topological semimetal Ta3SiTe6 in magnetic fields up to 38 T. The periodicity of the oscillations carries information about the Fermi surface. The fast Fourier transform spectra show a single oscillation frequency. Analysis of the oscillations reveals a Fermi pocket with a cross-sectional area of 0.13 Å$^{-2}$. Combining magneto-transport measurements and first-principles calculations, we find that these oscillations come from the hole pocket. The Hall resistivity and the SdH oscillations indicate that Ta3SiTe6 is a hole-dominated system.
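The step from a single oscillation frequency to a Fermi-pocket cross-section is the standard Onsager relation, $F = \frac{\hbar}{2\pi e} A_F$. The sketch below converts between the two using CODATA constants; the circular-cross-section assumption in `fermi_wavevector` is an added simplification, not something stated in the abstract.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
E_CHARGE = 1.602176634e-19  # elementary charge, C

def onsager_frequency(area_per_A2: float) -> float:
    """SdH frequency F in tesla from an extremal Fermi-surface area A_F in Å^-2."""
    area_m2 = area_per_A2 * 1e20  # 1 Å^-2 = 1e20 m^-2
    return HBAR * area_m2 / (2 * math.pi * E_CHARGE)

def fermi_wavevector(area_per_A2: float) -> float:
    """k_F in Å^-1, assuming a circular cross-section A_F = pi k_F^2."""
    return math.sqrt(area_per_A2 / math.pi)
```

For the reported 0.13 Å$^{-2}$, `onsager_frequency(0.13)` is roughly $1.36\times10^3$ T and `fermi_wavevector(0.13)` is about 0.20 Å$^{-1}$.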

preprint · 2019 · arXiv

Quantum-critical phase out of frustrated magnetism in a strongly correlated metal

Strange-metal phenomena often develop at the border of antiferromagnetic order in strongly correlated metals. It is well established that they can originate from fluctuations anchored by the point of a continuous quantum phase transition out of the antiferromagnetic order, i.e., a quantum critical point. What has been unclear is how these phenomena can be associated with a potential new phase of matter at zero temperature. Here we show that magnetic frustration of the 4f local moments in the distorted Kagome intermetallic compound CePdAl gives rise to such a paramagnetic quantum-critical phase. Moreover, we demonstrate that this phase turns into a Fermi liquid through a Mott-like crossover; in a two-dimensional parameter space of pressure and magnetic field, this crossover is linked to a line of zero-temperature 4f-electron localization-delocalization phase transitions at low and moderate pressures. Our discovery motivates a new design principle for strongly correlated metallic states with unconventional excitations that may underlie the development of such effects as high-temperature superconductivity.