Source author record

Shuai Zhang

Shuai Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher

Unclaimed source record

Catalog footprint

What is connected

74 works

38 topics

4 close collaborators

preprint, 2026, arXiv

A quantum machine learning classifier to search for new physics

Due to the success of the Standard Model (SM), it is reasonable to anticipate that the signal of new physics (NP) beyond the SM is small. Consequently, future searches for NP and precision tests of the SM will require high-luminosity collider experiments. Moreover, as precision tests advance, rare processes with many final-state particles require consideration, which demands the analysis of a vast number of observables. The high luminosity produces a large amount of experimental data spanning a large observable space, posing a significant data-processing challenge. In recent years, quantum machine learning has emerged as a promising approach for processing large amounts of complex data on a quantum computer. In this study, we propose quantum searching neighbor (QSN) and variational QSN (VQSN) algorithms to search for NP. The QSN is a classification algorithm. The VQSN introduces variation to the QSN to process classical data. As applications, we apply the (V)QSN in the phenomenological study of NP at the Large Hadron Collider and muon colliders. Examples are implemented on real quantum hardware, which confirms reliable performance under noisy conditions. The results indicate that the VQSN is more efficient, in terms of computational complexity, than its classical counterpart, the k-nearest neighbor algorithm, even when dealing with classical data.
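For orientation, the classical baseline named in the abstract, k-nearest neighbor classification, can be sketched in a few lines. This is only a minimal brute-force illustration, not the authors' QSN or VQSN; the function name `knn_classify`, the toy two-feature events, and the "SM"/"NP" labels are assumptions made up for the example.

```python
from collections import Counter
import math


def knn_classify(train_points, train_labels, query, k=3):
    """Brute-force k-nearest-neighbor vote.

    Every query computes a distance to all n training points, so the cost
    per query grows linearly with the size of the training set.
    """
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy events with two observables each (not real collider data).
events = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.95), (0.85, 0.9)]
labels = ["SM", "SM", "SM", "NP", "NP", "NP"]

print(knn_classify(events, labels, (0.82, 0.88)))
```

The linear scan over all training events is the per-query cost that a quantum search over neighbors would aim to improve on.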

preprint, 2026, arXiv

A Survey on Mapping Digital Systems with Bill of Materials: Development, Practices, and Challenges

Modern digital ecosystems, spanning software, hardware, learning models, datasets, and cryptographic products, continue to grow in complexity, making it difficult for organizations to understand and manage component dependencies. Bills of Materials (BOMs) have emerged as a structured way to document product components, their interrelationships, and key metadata, improving visibility and security across digital supply chains. This survey provides the first comprehensive cross-domain review of BOM developments and practices. We start by examining the evolution of BOM frameworks in three stages (i.e., pre-development, initial, and accelerated) and summarizing their core principles, key stakeholders, and standardization efforts for hardware, software, artificial intelligence (AI) models, datasets, and cryptographic assets. We then review industry practices for generating BOM data, evaluating its quality, and securely sharing it. Next, we review practical downstream uses of BOM data, including dependency modeling, compliance verification, operational risk assessment, and vulnerability tracking. We also discuss academic efforts to address limitations in current BOM frameworks through refinements, extensions, or new models tailored to emerging domains such as data ecosystems and AI supply chains. Finally, we identify four key gaps that limit the usability and reliability of today's BOM frameworks, motivating future research directions.

preprint, 2026, arXiv

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

The integration of large language models (LLMs) with external tools has significantly expanded the capabilities of AI agents. However, as the diversity of both LLMs and tools increases, selecting the optimal model-tool combination becomes a high-dimensional optimization challenge. Existing approaches often rely on a single model or fixed tool-calling logic, failing to exploit the performance variations across heterogeneous model-tool pairs. In this paper, we present ATLAS (Adaptive Tool-LLM Alignment and Synergistic Invocation), a dual-path framework for dynamic tool usage in cross-domain complex reasoning. ATLAS operates via a dual-path approach: (1) training-free cluster-based routing that exploits empirical priors for domain-specific alignment, and (2) RL-based multi-step routing that explores autonomous trajectories for out-of-distribution generalization. Extensive experiments across 15 benchmarks demonstrate that our method outperforms closed-source models like GPT-4o, surpassing existing routing methods on both in-distribution (+10.1%) and out-of-distribution (+13.1%) tasks. Furthermore, our framework shows significant gains in visual reasoning by orchestrating specialized multi-modal tools.

preprint, 2026, arXiv

DualKV: Shared-Prompt Flash Attention for Efficient RL Training with Large Rollouts and Long Contexts

Modern RL post-training methods such as GRPO and DAPO train on $N$ response sequences of $R$ tokens sampled from a shared prompt of $P$ tokens, but standard FlashAttention replicates all $P$ prompt tokens $N$ times across both forward and backward passes -- duplicating compute and memory on identical hidden states. In large-rollout, long-context RL training ($N \geq 16$, $P \geq 8\text{K}$), this redundancy dominates the policy update cost. We observe that in decoder-only models, causal masking makes prompt representations invariant across sequences at every layer, so all per-token operations (norms, projections, MLP) and attention can process the prompt once -- a property not yet exploited at the kernel level for training. We propose DualKV, the first FlashAttention kernel variant that eliminates shared-prompt replication during RL training, via (1) fused CUDA forward and backward kernels that iterate over two disjoint KV regions -- shared context and per-sequence response -- in a single kernel launch, and (2) a data-pipeline redesign in veRL that repacks $N(P+R)$ tokens into $P+NR$ tokens per micro-batch, extending the token reduction from attention to the entire model by a factor $\rho = N(P+R)/(P+NR)$. DualKV is mathematically equivalent to standard attention and introduces no approximation. On Qwen3-8B GRPO training with 8$\times$H100 GPUs ($N=32$, 8K-context), DualKV achieves $1.63$--$2.09\times$ policy-update speedup, enables $2\times$ larger micro-batches, and raises MFU from $36\%$ to $76\%$. Similar gains hold for DAPO ($2.47\times$ speedup, $77\%$ MFU). At 30B MoE scale on 16$\times$H100, DualKV achieves $3.82\times$ policy-update and $3.38\times$ end-to-end step speedup over FlashAttention (which requires 4-way Ulysses sequence parallelism to avoid OOM).
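The token-reduction factor quoted in the abstract is simple arithmetic and can be checked directly. The sketch below only evaluates the stated ratio of replicated to repacked tokens; the function name is an assumption, and the response length of 1024 tokens is a hypothetical value chosen for illustration, since the abstract does not state $R$.

```python
def token_reduction_factor(n_responses, prompt_tokens, response_tokens):
    """Ratio of replicated to repacked tokens, N(P+R) / (P+NR)."""
    replicated = n_responses * (prompt_tokens + response_tokens)  # prompt copied N times
    repacked = prompt_tokens + n_responses * response_tokens      # prompt stored once
    return replicated / repacked


# N = 32 and an 8K prompt follow the abstract; R = 1024 is an assumed value.
rho = token_reduction_factor(n_responses=32, prompt_tokens=8192, response_tokens=1024)
print(f"{rho:.2f}")  # 7.20
```

The ratio approaches $N$ when responses are short relative to the prompt and approaches 1 when responses dominate. It counts tokens only and is not itself a wall-clock speedup.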

preprint, 2026, arXiv

MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

Large Language Models (LLMs) demonstrate strong capabilities in general coding tasks but encounter two key challenges when optimizing code: (i) writing optimized code (such as performant CUDA kernels and competition-level CPU code) is complex and requires expertise in systems, algorithms and specific languages, and (ii) it requires interpreting performance metrics such as timing and device utilization, beyond binary correctness. In this work, we explore inference-time search algorithms that guide the LLM to discover better solutions through iterative refinement based on execution feedback. Our approach, called MaxCode, unifies existing search methods under a max-reward reinforcement learning framework, making the observation and action-value functions modular for modification. To enhance the observation space, we integrate a natural language critique model that converts raw execution feedback into diagnostic insights about errors and performance bottlenecks, and we add the best discounted reward seen so far. Together, these provide richer input to the code proposal function. To improve exploration during search, we train a generative reward-to-go model using action values from rollouts to rerank potential solutions. Testing on the KernelBench (CUDA) and PIE (C++) optimization benchmarks shows that MaxCode improves optimized code performance compared to baselines, achieving 20.3% and 10.1% relative improvements in absolute speedup value and relative speedup ranking, respectively.

preprint, 2026, arXiv

Revisiting Privacy Preservation in Brain-Computer Interfaces: Conceptual Boundaries, Risk Pathways, and a Protection-Strength Grading Framework

Brain-computer interfaces (BCIs) are moving rapidly from laboratory research into clinical, edge, and real-world settings. Under ISO/IEC 8663:2025, a BCI is a direct communication link between central nervous system activity and external software or hardware systems. This link expands privacy risk beyond raw neural-signal leakage: neural data, derived representations, model assets, and decoded outputs can be re-associated with individuals across collection, transmission, storage, training, inference, and feedback, or used to infer information beyond what a task requires. Starting from the general BCI paradigm, this review defines privacy-protection boundaries, protection objects, and the relationship between user data privacy and model privacy within a shared risk pathway. It then proposes a three-dimensional framework - protection object, lifecycle stage, and dominant protection-strength level - to classify existing work into four levels of protection strength. Finally, mental privacy and neuroethical risks are treated as open issues, emphasizing that BCI privacy protection should not only obscure data but also disentangle task-irrelevant sensitive information while preserving downstream utility. Keywords: Brain-computer interface, Neural data privacy, User data privacy, Model privacy, Disentanglement of task-irrelevant sensitive information, Protection-strength grading, Neuroethical risks

preprint, 2026, arXiv

Superconductivity in Electron Liquids: Precision Many-Body Treatment of Coulomb Interaction

More than a century after discovery, the theory of conventional superconductivity remains incomplete. While the importance of electron-phonon coupling is understood, a controlled first-principles treatment of Coulomb interaction is lacking. Current ab initio calculations of superconductivity rely on a phenomenological downfolding approximation, replacing Coulomb interaction with a repulsive pseudopotential μ*, and leaving ambiguities in electron-phonon coupling with dynamical Coulomb interactions unresolved. We address this via an effective field theory approach, integrating out high-energy electronic degrees of freedom using variational Diagrammatic Monte Carlo. Applied to the uniform electron gas, this establishes a microscopic procedure to implement downfolding, define the pseudopotential, and express dynamical Coulomb effects on electron-phonon coupling via the electron vertex function. We find the bare pseudopotential significantly larger than conventional values. This yields improved pseudopotential estimates in simple metals and tests density functional perturbation theory accuracy for effective electron-phonon coupling. We present an ab initio workflow computing superconducting Tc from the anomalous vertex's precursory Cooper flow. This infers Tc from normal state calculations, enabling reliable estimates of very low Tc (including near quantum phase transitions) beyond conventional reach. Validating our approach on simple metals without empirical tuning, we resolve long-standing discrepancies and predict a pressure-induced transition in Al from superconducting to non-superconducting above ~60GPa. We propose ambient-pressure Mg and Na are proximal to a similar critical point. Our work establishes a controlled ab initio framework for electron-phonon superconductivity beyond the weak-correlation limit, paving the way for reliable Tc calculations and novel material design.

preprint, 2026, arXiv

UniD-Shift: Towards Unified Semantic Segmentation via Interpretable Share-Private Multimodal Decomposition

Semantic segmentation of large-scale 3D point clouds is crucial for applications such as autonomous driving and urban digital twins. However, the sparse sampling pattern of LiDAR and the view-dependent geometric distortion in image observations complicate cross-modal alignment and hinder stable fusion. Inspired by the fact that 2D images captured by cameras are representations of the 3D world, we recognize that the features learned from 2D and 3D segmentation share some common semantics, while other aspects remain modality-specific. This insight motivates a unified multimodal framework for joint 2D-3D semantic segmentation. We combine a SAM-based vision encoder with a SPTNet-based geometric encoder to extract complementary semantic and geometric representations. The resulting features from both modalities are explicitly decomposed into shared and private subspaces, where the shared components summarize semantic factors common to both domains, and the private components preserve properties that are unique to each modality. A lightweight attention-based fusion module aggregates the shared features into a consistent cross-modal representation, and a regularized training objective ensures both semantic alignment and subspace independence. Experiments on the SemanticKITTI and nuScenes benchmarks demonstrate consistent improvements in segmentation accuracy over representative multimodal baselines, accompanied by competitive computational efficiency. Cross-domain evaluation on nuScenes USA-Singapore shows stable performance under distribution shifts, demonstrating strong generalization. The implementation code is publicly available at: https://github.com/shuaizhang69/UniD-Shift.

preprint, 2025, arXiv

Comprehensive Study of Phonon Chirality under Symmetry Constraints

Phonons are quanta of lattice vibrations, and their modes (linear, circular, or stationary) are symmetry-determined. Circularly polarized phonons, possessing nonzero angular momentum (AM), have drawn widespread attention recently. Despite widespread use of pseudo-angular momentum (PAM) and circularly polarized light polarization flips to identify chiral phonons in Raman scattering, their reliability is debated due to symmetry dependence, and experimental verification standards remain lacking. Here, we systematically study phonon chirality and associated phenomena across magnetic point groups. We establish that the AM-PAM correlation is governed by both crystalline symmetry and Wyckoff positions, dictating conditions where nonzero AM manifests in PAM signatures. Crucially, phonons belonging to distinct irreducible representations exhibit distinct experimental benchmarks, enabling direct determination of crystalline chirality and symmetry classification. Furthermore, we report the discovery of a signature for symmetry-induced phenomena, notably a half-wave plate-analogous effect induced by mirror-odd phonons. Meanwhile, we conducted five experiments to validate our theory.

preprint, 2023, arXiv

Offline Imitation Learning with Variational Counterfactual Reasoning

In offline imitation learning (IL), an agent aims to learn an optimal expert behavior policy without additional online environment interactions. However, in many real-world scenarios, such as robotics manipulation, the offline dataset is collected from suboptimal behaviors without rewards. Due to the scarce expert data, the agents usually suffer from simply memorizing poor trajectories and are vulnerable to variations in the environments, lacking the capability of generalizing to new environments. To automatically generate high-quality expert data and improve the generalization ability of the agent, we propose a framework named Offline Imitation Learning with Counterfactual data Augmentation (OILCA) by doing counterfactual inference. In particular, we leverage an identifiable variational autoencoder to generate counterfactual samples for expert data augmentation. We theoretically analyze the influence of the generated expert data and the improvement of generalization. Moreover, we conduct extensive experiments to demonstrate that our approach significantly outperforms various baselines on both the DeepMind Control Suite benchmark for in-distribution performance and the CausalWorld benchmark for out-of-distribution generalization. Our code is available at https://github.com/ZexuSun/OILCA-NeurIPS23.

preprint, 2022, arXiv

A Tight Three-parameter Correlation and Related Classification on Gamma-Ray Bursts

Gamma-ray bursts (GRBs) are widely believed to originate from massive collapsars and/or compact binary mergers, which would generate long and short GRBs, respectively. The details of this classification scheme have been under constant debate as more and more observational data become available. In this work, we apply a series of data mining methods to study the potential classification information contained in the prompt emission of GRBs detected by the Fermi Gamma-ray Burst Monitor. A tight global correlation is found between fluence ($f$), peak flux ($F$) and prompt duration ($T_{90}$), which takes the form $\log f = 0.75 \log T_{90} + 0.92 \log F - 7.14$. Based on this correlation, we define a new parameter $L = 1.66 \log T_{90} + 0.84 \log f - 0.46 \log F + 3.24$ by linear discriminant analysis that distinguishes between long and short GRBs with much less ambiguity than $T_{90}$. We also discuss the three-subclass scheme of GRB classification derived from cluster analysis based on a Gaussian mixture model, and suggest that, besides SGRBs, LGRBs may be divided into long-bright gamma-ray bursts (LBGRBs) and long-faint gamma-ray bursts (LFGRBs). LBGRBs have statistically higher $f$ and $F$ than LFGRBs; further statistical analysis found that LBGRBs also have a higher number of GRB pulses than LFGRBs.
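The two relations quoted in the abstract can be evaluated numerically as follows. This is a sketch under stated assumptions: the function names are invented for illustration, the logarithms are taken to be base 10, and the inputs are assumed to be in whatever units the paper uses, which the abstract does not state. No long/short decision threshold on $L$ is given in the abstract, so none is applied here.

```python
import math


def predicted_log_fluence(t90, peak_flux):
    """Three-parameter correlation: log f = 0.75 log T90 + 0.92 log F - 7.14."""
    return 0.75 * math.log10(t90) + 0.92 * math.log10(peak_flux) - 7.14


def discriminant_L(t90, fluence, peak_flux):
    """Discriminant: L = 1.66 log T90 + 0.84 log f - 0.46 log F + 3.24."""
    return (
        1.66 * math.log10(t90)
        + 0.84 * math.log10(fluence)
        - 0.46 * math.log10(peak_flux)
        + 3.24
    )


# With every input equal to 1, all log terms vanish and only the constants remain.
print(predicted_log_fluence(1.0, 1.0))
print(discriminant_L(1.0, 1.0, 1.0))
```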

preprint, 2022, arXiv

Charge Carrier Mediation and Ferromagnetism induced in MnBi6Te10 Magnetic Topological Insulators by antimony doping

A new family of intrinsic magnetic topological insulators (MTIs), the MnBi2Te4 family, has shed light on the observation of novel topological quantum effects such as the quantum anomalous Hall effect (QAHE). However, the strong antiferromagnetic (AFM) coupling and high carrier concentration in the bulk hinder practical applications. In the closely related materials MnBi4Te7 and MnBi6Te10, the interlayer magnetic coupling is greatly suppressed by Bi2Te3 layer intercalation. However, AFM is still the ground state in these compounds. Here, by magnetic and transport measurements, we demonstrate that the Sb substitutional dopant plays a dual role in MnBi6Te10: it not only adjusts the charge carrier type and concentration, but also induces a ferromagnetic (FM) ground state in the solid. AFM ground state region which is also close to the charge neutral point can be found in the phase diagram of Mn(SbxBi1-x)6Te10 when x ~ 0.25. An intrinsic FM-MTI candidate is thus demonstrated, and it may bring the realization of high-quality, high-temperature QAHE and related topological quantum effects a step closer.

preprint, 2022, arXiv

Chat-to-Design: AI Assisted Personalized Fashion Design

In this demo, we present Chat-to-Design, a new multimodal interaction system for personalized fashion design. Compared to classic systems that recommend apparel based on keywords, Chat-to-Design enables users to design clothes in two steps: 1) coarse-grained selection via conversation and 2) fine-grained editing via an interactive interface. It encompasses three sub-systems to deliver an immersive user experience: a conversation system empowered by natural language understanding to accept users' requests and manage dialogs; a multimodal fashion retrieval system empowered by a large-scale pretrained language-image network to retrieve requested apparel; and a fashion design system empowered by emerging generative techniques to edit attributes of retrieved clothes.

preprint, 2022, arXiv

Divergence-aware Federated Self-Supervised Learning

Self-supervised learning (SSL) is capable of learning remarkable representations from centrally available data. Recent works further implement federated learning with SSL to learn from rapidly growing decentralized unlabeled images (e.g., from cameras and phones), often resulting from privacy constraints. Extensive attention has been paid to SSL approaches based on Siamese networks. However, such an effort has not yet revealed deep insights into various fundamental building blocks for the federated self-supervised learning (FedSSL) architecture. We aim to fill this gap via an in-depth empirical study and propose a new method to tackle the non-independently and identically distributed (non-IID) data problem of decentralized data. Firstly, we introduce a generalized FedSSL framework that embraces existing SSL methods based on Siamese networks and presents flexibility catering to future methods. In this framework, a server coordinates multiple clients to conduct SSL training and periodically updates local models of clients with the aggregated global model. Using the framework, our study uncovers unique insights into FedSSL: 1) the stop-gradient operation, previously reported to be essential, is not always necessary in FedSSL; 2) retaining local knowledge of clients in FedSSL is particularly beneficial for non-IID data. Inspired by these insights, we then propose a new approach for model update, Federated Divergence-aware Exponential Moving Average update (FedEMA). FedEMA updates local models of clients adaptively using EMA of the global model, where the decay rate is dynamically measured by model divergence. Extensive experiments demonstrate that FedEMA outperforms existing methods by 3-4% on linear evaluation. We hope that this work will provide useful insights for future research.
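The divergence-aware update described above can be illustrated with a small sketch. The abstract states only that the EMA decay rate is measured by model divergence; the specific mapping used here (the Euclidean distance between global and local weights, scaled by a hypothetical `scale` factor and capped at 1) is an assumption for illustration, as are the function name and the flat-list weight representation.

```python
import math


def divergence_aware_ema(local_weights, global_weights, scale=1.0):
    """Blend local and global weights with a divergence-dependent decay.

    decay = min(scale * ||global - local||, 1)   (assumed form)
    new_local = decay * local + (1 - decay) * global
    A large divergence keeps more local knowledge; a small one adopts
    the global model almost entirely.
    """
    divergence = math.sqrt(
        sum((g - l) ** 2 for g, l in zip(global_weights, local_weights))
    )
    decay = min(scale * divergence, 1.0)
    return [
        decay * l + (1.0 - decay) * g
        for l, g in zip(local_weights, global_weights)
    ]


# Divergence is 1.0, so with scale=0.5 the decay is 0.5 and the result is the midpoint.
print(divergence_aware_ema([1.0, 0.0], [0.0, 0.0], scale=0.5))
```

When the decay saturates at 1 the client keeps its local model unchanged, which matches the abstract's observation that retaining local knowledge helps on non-IID data.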

preprint, 2022, arXiv

EasyFL: A Low-code Federated Learning Platform For Dummies

Academia and industry have developed several platforms to support the popular privacy-preserving distributed learning method -- Federated Learning (FL). However, these platforms are complex to use and require a deep understanding of FL, which imposes high barriers to entry for beginners, limits the productivity of researchers, and compromises deployment efficiency. In this paper, we propose the first low-code FL platform, EasyFL, to enable users with various levels of expertise to experiment and prototype FL applications with little coding. We achieve this goal while ensuring great flexibility and extensibility for customization by unifying simple API design, modular design, and granular training flow abstraction. With only a few lines of code, EasyFL provides users with many out-of-the-box functionalities to accelerate experimentation and deployment. These practical functionalities are heterogeneity simulation, comprehensive tracking, distributed training optimization, and seamless deployment. They are proposed based on challenges identified in the proposed FL life cycle. Compared with other platforms, EasyFL not only requires just three lines of code (at least 10x fewer) to build a vanilla FL application but also incurs lower training overhead. Besides, our evaluations demonstrate that EasyFL expedites distributed training by 1.5x. It also improves the efficiency of deployment. We believe that EasyFL will increase the productivity of researchers and democratize FL to wider audiences.

preprint, 2022, arXiv

Federated Unsupervised Domain Adaptation for Face Recognition

Given labeled data in a source domain, unsupervised domain adaptation has been widely adopted to generalize models for unlabeled data in a target domain, whose data distributions are different. However, existing works are inapplicable to face recognition under privacy constraints because they require sharing of sensitive face images between domains. To address this problem, we propose federated unsupervised domain adaptation for face recognition, FedFR. FedFR jointly optimizes clustering-based domain adaptation and federated learning to elevate performance on the target domain. Specifically, for unlabeled data in the target domain, we enhance a clustering algorithm with a distance constraint to improve the quality of predicted pseudo labels. Besides, we propose a new domain constraint loss (DCL) to regularize source domain training in federated learning. Extensive experiments on a newly constructed benchmark demonstrate that FedFR outperforms the baseline and classic methods on the target domain by 3% to 14% on different evaluation metrics.

preprint, 2022, arXiv

Giant and Reversible Electronic Structure Evolution in a Magnetic Topological Material EuCd2As2

The electronic structure and the physical properties of quantum materials can be significantly altered by charge carrier doping and magnetic state transition. Here we report the discovery of a giant and reversible electronic structure evolution with doping in a magnetic topological material. By performing high-resolution angle-resolved photoemission measurements on EuCd2As2, we found that a huge amount of hole doping can be introduced into the sample surface due to surface absorption. The electronic structure exhibits a dramatic change with the hole doping which cannot be described by a rigid band shift. Prominent band splitting is observed at high doping, which corresponds to a doping-induced magnetic transition at low temperature (below ~15 K) from an antiferromagnetic state to a ferromagnetic state. These results have established a detailed electronic phase diagram of EuCd2As2 where the electronic structure and the magnetic structure change systematically and dramatically with the doping level. They further suggest that the transport, magnetic and topological properties of EuCd2As2 can be greatly modified by doping. This work will stimulate further investigations to explore new phenomena and properties in doping this magnetic topological material.

preprint, 2022, arXiv

How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis

Self-training, a semi-supervised learning algorithm, leverages a large amount of unlabeled data to improve learning when the labeled data are limited. Despite empirical successes, its theoretical characterization remains elusive. To the best of our knowledge, this work establishes the first theoretical analysis for the known iterative self-training paradigm and proves the benefits of unlabeled data in both training convergence and generalization ability. To make our theoretical analysis feasible, we focus on the case of one-hidden-layer neural networks. However, theoretical understanding of iterative self-training is non-trivial even for a shallow neural network. One of the key challenges is that existing neural network landscape analysis built upon supervised learning no longer holds in the (semi-supervised) self-training paradigm. We address this challenge and prove that iterative self-training converges linearly with both convergence rate and generalization accuracy improved in the order of $1/\sqrt{M}$, where $M$ is the number of unlabeled samples. Experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.

preprint, 2022, arXiv

Large Exchange Bias Effect and Coverage-Dependent Interfacial Coupling in CrI3/MnBi2Te4 van der Waals Heterostructures

Igniting interface magnetic ordering of magnetic topological insulators by building a van der Waals heterostructure can help to reveal novel quantum states and design functional devices. Here, we observe an interesting exchange bias effect, indicating successful interfacial magnetic coupling, in CrI3/MnBi2Te4 ferromagnetic insulator/antiferromagnetic topological insulator (FMI/AFM-TI) heterostructure devices. The devices originally exhibit a negative exchange bias field, which decays with increasing temperature and is unaffected by the back-gate voltage. When we change the device configuration to be half-covered by CrI3, the exchange bias becomes positive with a very large exchange bias field exceeding 300 mT. Such sensitive manipulation is explained by the competition between the FM and AFM coupling at the interface of CrI3 and MnBi2Te4, pointing to coverage-dependent interfacial magnetic interactions. Our work will facilitate the development of topological and antiferromagnetic devices.

preprint, 2022, arXiv

Modelling graph dynamics in fraud detection with "Attention"

At online retail platforms, detecting fraudulent accounts and transactions is crucial to improve customer experience, minimize loss, and avoid unauthorized transactions. Despite the variety of different models for deep learning on graphs, few approaches have been proposed for dealing with graphs that are both heterogeneous and dynamic. In this paper, we propose DyHGN (Dynamic Heterogeneous Graph Neural Network) and its variants to capture both temporal and heterogeneous information. We first construct dynamic heterogeneous graphs from registration and transaction data from eBay. Then, we build models with diachronic entity embedding and heterogeneous graph transformer. We also use model explainability techniques to understand the behaviors of DyHGN-* models. Our findings reveal that modelling graph dynamics with heterogeneous inputs needs to be conducted with "attention" depending on the data structure, distribution, and computation cost.

preprint, 2022, arXiv

Nanoscale three-dimensional magnetic sensing with a probabilistic nanomagnet driven by spin-orbit torque

Detection of vector magnetic fields at nanoscale dimensions is critical in applications ranging from basic materials science to medical diagnostics. Meanwhile, all-electric operation is of great significance for achieving a simple and compact sensing system. Here, we propose and experimentally demonstrate a simple approach to sensing a vector magnetic field at nanoscale dimensions, by monitoring a probabilistic nanomagnet's transition probability from a metastable state, excited by a driving current due to spin-orbit torque (SOT), to a settled state. We achieve sensitivities for Hx, Hy, and Hz of 1.02%/Oe, 1.09%/Oe and 3.43%/Oe, respectively, with a 200 x 200 nm^2 nanomagnet. The minimum detectable field depends on the number of driving pulse events N, and is expected to be as low as 1 μT if N = 3 x 10^6.

preprint, 2022, arXiv

Optimizing Performance of Federated Person Re-identification: Benchmarking and Analysis

The increasingly stringent data privacy regulations limit the development of person re-identification (ReID) because person ReID training requires centralizing an enormous amount of data that contains sensitive personal information. To address this problem, we introduce federated person re-identification (FedReID) -- implementing federated learning, an emerging distributed training method, to person ReID. FedReID preserves data privacy by aggregating model updates, instead of raw data, from clients to a central server. Furthermore, we optimize the performance of FedReID under statistical heterogeneity via benchmark analysis. We first construct a benchmark with an enhanced algorithm, two architectures, and nine person ReID datasets with large variances to simulate the real-world statistical heterogeneity. The benchmark results present insights and bottlenecks of FedReID under statistical heterogeneity, including challenges in convergence and poor performance on datasets with large volumes. Based on these insights, we propose three optimization approaches: (1) We adopt knowledge distillation to facilitate the convergence of FedReID by better transferring knowledge from clients to the server; (2) We introduce client clustering to improve the performance of large datasets by aggregating clients with similar data distributions; (3) We propose cosine distance weight to elevate performance by dynamically updating the weights for aggregation depending on how well models are trained in clients. Extensive experiments demonstrate that these approaches achieve satisfying convergence with much better performance on all datasets. We believe that FedReID will shed light on implementing and optimizing federated learning on more computer vision applications.

preprint2022arXiv

Reducing language context confusion for end-to-end code-switching automatic speech recognition

Code-switching is the alternation between languages within a single communication process. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially challenging, as code-switching training data are always insufficient to combat the increased multilingual context confusion caused by the presence of more than one language. We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model based on the Equivalence Constraint (EC) Theory. This linguistic theory requires that any monolingual fragment that occurs in a code-switching sentence must occur in one of the monolingual sentences. The theory establishes a bridge between monolingual data and code-switching data. We leverage this linguistic theory to design the code-switching E2E ASR model. The proposed model efficiently transfers language knowledge from rich monolingual data to improve the performance of the code-switching ASR model. We evaluate our model on the ASRU 2019 Mandarin-English code-switching challenge dataset. Compared to the baseline model, our proposed model achieves a 17.12% relative error reduction.

preprint2022arXiv

Secure two-way fiber-optic time transfer against sub-ns asymmetric delay attack

Two-way fiber-optic time transfer is a promising precise time synchronization technique with sub-nanosecond accuracy. However, the asymmetric delay attack is a serious threat that cannot be prevented by any encryption method. In this paper, a dynamic-model-based scheme is proposed to defend against the sub-nanosecond asymmetric delay attack. A threshold is set according to the time difference estimated by a two-state clock model, in which the fixed frequency difference is excluded from the time difference, so that an asymmetric delay attack smaller than the time difference induced by the fixed frequency difference can be detected. Theoretical simulation and experimental demonstration are carried out to prove the feasibility of the scheme. A two-way fiber-optic time transfer system with time stability of 24.5 ps, 3.98 ps, and 2.95 ps at 1 s, 10 s, and 100 s averaging time is demonstrated experimentally under a sub-ns asymmetric time delay attack. The proposed method provides a promising secure sub-ns precise time synchronization technique against asymmetric delay attacks.

preprint2022arXiv

SLAM-TKA: Real-time Intra-operative Measurement of Tibial Resection Plane in Conventional Total Knee Arthroplasty

Total knee arthroplasty (TKA) is a common orthopaedic surgery to replace a damaged knee joint with artificial implants. Inaccuracy in achieving the planned implant position can lead to aseptic loosening of implant components, wear, and even joint revision, and these failures mostly occur on the tibial side in conventional jig-based TKA (CON-TKA). This study aims to precisely evaluate the accuracy of the proximal tibial resection plane intra-operatively and in real time, such that the evaluation changes the CON-TKA operative procedure very little. Two X-ray radiographs captured during the proximal tibial resection phase, together with a pre-operative patient-specific tibia 3D mesh model segmented from computed tomography (CT) scans and a trocar pin 3D mesh model, are used in the proposed simultaneous localisation and mapping (SLAM) system to estimate the proximal tibial resection plane. Validations using both simulation and in-vivo datasets are performed to demonstrate the robustness and the potential clinical value of the proposed algorithm.

preprint2022arXiv

Smart Multi-tenant Federated Learning

Federated learning (FL) is an emerging distributed machine learning method that empowers in-situ model training on decentralized edge devices. However, multiple simultaneous training activities could overload resource-constrained devices. In this work, we propose a smart multi-tenant FL system, MuFL, to effectively coordinate and execute simultaneous training activities. We first formalize the problem of multi-tenant FL, define multi-tenant FL scenarios, and introduce a vanilla multi-tenant FL system that trains activities sequentially to form baselines. Then, we propose two approaches to optimize multi-tenant FL: 1) activity consolidation merges training activities into one activity with a multi-task architecture; 2) after training it for rounds, activity splitting divides it into groups by employing affinities among activities such that activities within a group have better synergy. Extensive experiments demonstrate that MuFL outperforms other methods while consuming 40% less energy. We hope this work will inspire the community to further study and optimize multi-tenant FL.

preprint2022arXiv

Topological States in Chevrel Phase Materials from First-principle Calculations

Chevrel phase materials form a family of ternary molybdenum chalcogenides with a general chemical formula $A_x{\rm Mo}_6X_8$ ($A$ = metal elements, $X$ = chalcogen). The variety of $A$ atoms gives rise to a large number of family members and leads to many tunable physical properties, such as superconductivity, thermoelectricity and ionic conductivity. In this work, we have further found various nontrivial band topological states in these materials by using first-principles calculations. The compounds having time-reversal symmetry, such as ${\rm BaMo}_6{\rm S}_8$, ${\rm SrMo}_6{\rm S}_8$, and ${\rm Mo}_6{\rm S}_8$, are topological insulators in both the $R\bar{3}$ and $P\bar{1}$ phases, whereas ${\rm EuMo}_6{\rm S}_8$ in the ferromagnetic state is an axion insulator in the $R\bar{3}$ phase and a trivial one in the $P\bar{1}$ phase. This indicates that changing the $A$ ions can modify the chemical potential, lattice distortion, and magnetic order, which offers a unique way to influence the topological states and other properties. We hope this work can stimulate further studies of Chevrel phase materials to find more intriguing phenomena, such as topological superconducting states and Majorana modes.

preprint2022arXiv

Two-dimensional Obstructed Atomic Insulators with Fractional Corner Charge in MA$_2$Z$_4$ Family

According to topological quantum chemistry, a class of electronic materials has been termed obstructed atomic insulators (OAIs), in which a portion of the valence electrons necessarily have their centers located at empty $\textit{Wyckoff}$ positions not occupied by atoms in the lattice. The obstruction to centering these electrons on their host atoms is nontrivial and results in metallic boundary states when the boundary is properly cut. Here, on the basis of first-principles calculations in combination with topological quantum chemistry analysis, we propose that the members of the two-dimensional MA$_2$Z$_4$ (M = Cr, Mo and W; A = Si and Ge; Z = N, P and As) monolayer family are all OAIs. A typical case is the recently synthesized MoSi$_2$N$_4$. Although it is a topologically trivial insulator whose occupied electronic states are an integer combination of elementary band representations, it has valence electrons centered at empty $\textit{Wyckoff}$ positions. It exhibits unique OAI-induced metallic edge states along the (1$\bar{1}$0) edge of the MoSi$_2$N$_4$ monolayer and in-gap corner states at three vertices of certain hexagonal nanodisk samples respecting C$_3$ rotation symmetry. The readily synthesized MoSi$_2$N$_4$ is quite stable and has a large bulk band gap of 1.94 eV, which makes experimental identification of these edge and corner states highly feasible.

preprint2022arXiv

xFraud: Explainable Fraud Transaction Detection

At online retail platforms, it is crucial to actively detect the risks of transactions to improve customer experience and minimize financial loss. In this work, we propose xFraud, an explainable fraud transaction prediction framework which is mainly composed of a detector and an explainer. The xFraud detector can effectively and efficiently predict the legitimacy of incoming transactions. Specifically, it utilizes a heterogeneous graph neural network to learn expressive representations from the informative heterogeneously typed entities in the transaction logs. The explainer in xFraud can generate meaningful and human-understandable explanations from graphs to facilitate further processes in the business unit. In our experiments with xFraud on real transaction networks with up to 1.1 billion nodes and 3.7 billion edges, xFraud is able to outperform various baseline models in many evaluation metrics while remaining scalable in distributed settings. In addition, we show that xFraud explainer can generate reasonable explanations to significantly assist the business analysis via both quantitative and qualitative evaluations.

preprint2021arXiv

Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters

Recent works have demonstrated reasonable success of representation learning in hypercomplex space. Specifically, "fully-connected layers with Quaternions" (4D hypercomplex numbers), which replace real-valued matrix multiplications in fully-connected layers with Hamilton products of Quaternions, both enjoy parameter savings with only 1/4 learnable parameters and achieve comparable performance in various applications. However, one key caveat is that hypercomplex space only exists at very few predefined dimensions (4D, 8D, and 16D). This restricts the flexibility of models that leverage hypercomplex multiplications. To this end, we propose parameterizing hypercomplex multiplications, allowing models to learn multiplication rules from data regardless of whether such rules are predefined. As a result, our method not only subsumes the Hamilton product, but also learns to operate on any arbitrary nD hypercomplex space, providing more architectural flexibility using arbitrarily $1/n$ learnable parameters compared with the fully-connected layer counterpart. Experiments of applications to the LSTM and Transformer models on natural language inference, machine translation, text style transfer, and subject verb agreement demonstrate architectural flexibility and effectiveness of the proposed approach.

preprint2021arXiv

Crystal and Electronic Structure of GaTa$_4$Se$_8$ From First-Principle Calculations

GaTa$_4$Se$_8$ belongs to the lacunar spinel family. Its crystal structure is still a puzzle, though there have been intensive studies of its novel properties, such as the Mott insulator phase and superconductivity under pressure. In this work, we investigate its phonon spectra through first-principles calculations and propose that it most probably undergoes a crystal structure phase transition, which is consistent with several experimental observations. For the prototype lacunar spinel with cubic symmetry of space group $F\bar{4}3m$, the phonon spectra have three soft modes in the whole Brillouin zone, indicating the strong dynamical instability of this crystal structure. In searching for a dynamically stable crystal structure, further calculations indicate two new structures of GaTa$_4$Se$_8$, corresponding to $R3m$ and $P\bar{4}2_{1}m$, verifying that at ambient pressure there does exist a structural phase transition of GaTa$_4$Se$_8$ from $F\bar{4}3m$ to other structures when the temperature is lowered. We also performed electronic structure calculations for the $R3m$ and $P\bar{4}2_{1}m$ structures, showing that GaTa$_4$Se$_8$ in the $P\bar{4}2_{1}m$ structure is a band insulator, and obtained a Mott insulator state for the $R3m$ structure by DMFT calculation under a single-band Hubbard model picture when the interaction parameter U is larger than 0.40 eV, compared with a band width of 0.25 eV. It is reasonable to assume that, as the temperature is lowered, GaTa$_4$Se$_8$ transforms from the $F\bar{4}3m$ structure first to the $R3m$ structure and then to the $P\bar{4}2_{1}m$ structure, because the symmetry of $P\bar{4}2_{1}m$ is lower than that of $R3m$ after Jahn-Teller distortion. The structural transition may explain the magnetic susceptibility anomaly at low temperature.

preprint2021arXiv

Experimental evidence on the dissipationless transport of chiral edge state of the high-field Chern insulator in MnBi2Te4 nanodevices

We demonstrate the dissipationless transport of the chiral edge state (CES) in nanodevices of the quantum anomalous Hall insulator candidate MnBi2Te4. The device presents a near-zero longitudinal resistance together with a quantized Hall plateau in excess of 0.97 h/e2 over a range of temperatures from very low up to the Neel temperature of 22 K. Each of the four-probe nonlocal measurements gives near-zero resistance and the two-probe measurements exhibit a plateau of +1 h/e2, while the results of the three-probe nonlocal measurements depend on the magnetic field. This indicates non-dissipation as well as the chirality of the edge state. The CES shows three regimes of temperature dependence, i.e., well-preserved dissipationless transport below 6 K, variable range hopping as the temperature increases, and thermal activation above 22 K. Even at the lowest temperature, a current of over 1.4 μA breaks the dissipationless transport. These form a complete set of evidence for the Chern insulator state in MnBi2Te4 systems.

preprint2021arXiv

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to look up the embedding tables. The inference is also heavily constrained in terms of latency because producing a recommendation for a user must be done within tens of milliseconds. In this paper, we propose MicroRec, a high-performance inference engine for recommendation systems. MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking advantage of the availability of High-Bandwidth Memory (HBM) in FPGA accelerators to tackle the latency by enabling parallel lookups. We have implemented the resulting design on an FPGA board, including the embedding lookup step as well as the complete inference process. Compared to the optimized CPU baseline (16 vCPU, AVX2-enabled), MicroRec achieves 13.8-14.7x speedup on embedding lookup alone and 2.5-5.4x speedup for the entire recommendation inference in terms of throughput. As for latency, CPU-based engines need milliseconds to infer a recommendation while MicroRec takes only microseconds, a significant advantage in real-time recommendation systems.

preprint2021arXiv

Mimicing the Kane-Mele type spin orbit interaction by spin-flexual phonon coupling in graphene devices

In efforts to enhance the spin-orbit interaction (SOI) of graphene in pursuit of dissipationless quantum spin Hall devices, a unique Kane-Mele type SOI and high-mobility samples are desired. However, common external decoration often introduces extrinsic Rashba-type SOI and simultaneous impurity scattering. Here we show that, by decorating graphene with EDTA-Dy molecules, the Kane-Mele type SOI is mimicked with even improved carrier mobility. This is evidenced by the suppressed weak localization at equal carrier densities and simultaneous Elliott-Yafet spin relaxation. The extracted spin scattering time depends monotonically on the carrier elastic scattering time, where the Elliott-Yafet plot gives an interaction strength of 3.3 meV. Improved quantum Hall plateaus can even be seen after the external operation. This is attributed to the spin-flexural phonon coupling induced by the enhanced graphene ripples, as revealed by the in-plane magnetotransport measurement.

preprint2021arXiv

Nature of the bonded-to-atomic transition in liquid silica to TPa pressures

First-principles calculations and analysis of the thermodynamic, structural, and electronic properties of liquid SiO$_2$ characterize the bonded-to-atomic transition at 0.1--1.6 TPa and 10$^4$--10$^5$ K (1--7 eV), the high-energy-density regime relevant to understanding planetary interiors. We find strong ionic bonds that become short-lived due to high kinetics during the transition, with sensitivity of the transition temperature to pressure, and our calculated Hugoniots agree with past experimental data. These results reconcile previous experimental and theoretical findings by clarifying the nature of the bond dissociation process in early Earth and "rocky" (oxide) constituents of large planets.

preprint2021arXiv

Quantitative analysis of diffraction by liquids using a pink-spectrum X-ray source

We describe a new approach for performing quantitative structure-factor analysis and density measurements of liquids using x-ray diffraction with a pink-spectrum x-ray source. The methodology corrects for the pink beam effect by performing a Taylor series expansion of the diffraction signal. The mean density, background scale factor, peak x-ray energy about which the expansion is performed, and the cutoff radius for density measurement are estimated using a derivative-free optimization scheme. The formalism is demonstrated for a simulated radial distribution function for tin. Finally, the proposed methodology is applied to experimental data on shock-compressed tin recorded at the Dynamic Compression Sector at the Advanced Photon Source, with derived densities comparing favorably to other experimental results and the equations of state of tin.

preprint2021arXiv

Topology Aware Deep Learning for Wireless Network Optimization

Data-driven machine learning approaches have recently been proposed to facilitate wireless network optimization by learning latent knowledge from historical optimization instances. However, existing methods do not handle well the topology information that directly impacts the network optimization results. Directly operating on simple representations, e.g., adjacency matrices, results in poor generalization performance, as the learned results depend on the specific ordering of the network elements in the training data. To address this issue, we propose a two-stage topology-aware machine learning framework (TALF), which trains a graph embedding unit and a deep feed-forward network (DFN) jointly. By propagating and summarizing the underlying graph topological information, TALF encodes the topology in the vector representation of the optimization instance, which is used by the later DFN to infer critical structures of an optimal or near-optimal solution. The proposed approach is evaluated on a canonical wireless network flow problem with diverse network topologies and flow deployments. An in-depth study of the trade-off between efficiency and effectiveness of the inference results is also conducted, and we show that our approach is better at differentiating links, saving up to 60% computation time at over 90% solution quality.

preprint2021arXiv

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition

Autoregressive (AR) models, such as attention-based encoder-decoder models and the RNN-Transducer, have achieved great success in speech recognition. They predict the output sequence conditioned on the previous tokens and acoustic encoded states, which is inefficient on GPUs. Non-autoregressive (NAR) models can get rid of the temporal dependency between the output tokens and predict the entire output tokens in at least one step. However, NAR models still face two major problems. On the one hand, there is still a large gap in performance between NAR models and advanced AR models. On the other hand, most NAR models are difficult to train and slow to converge. To address these two problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT), which improves the performance and accelerates the convergence of the NAR model by learning prior knowledge from a parameter-sharing AR model. Furthermore, we introduce a two-stage method into the inference process, which improves the model performance greatly. All the experiments are conducted on the public Mandarin Chinese dataset AISHELL-1. The results show that TSNAT can achieve performance competitive with the AR model and outperform many complicated NAR models.

74 published item(s)

A quantum machine learning classifier to search for new physics

A Survey on Mapping Digital Systems with Bill of Materials: Development, Practices, and Challenges

Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning

DualKV: Shared-Prompt Flash Attention for Efficient RL Training with Large Rollouts and Long Contexts

MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization

Revisiting Privacy Preservation in Brain-Computer Interfaces: Conceptual Boundaries, Risk Pathways, and a Protection-Strength Grading Framework

Superconductivity in Electron Liquids: Precision Many-Body Treatment of Coulomb Interaction

UniD-Shift: Towards Unified Semantic Segmentation via Interpretable Share-Private Multimodal Decomposition

Comprehensive Study of Phonon Chirality under Symmetry Constraints

Offline Imitation Learning with Variational Counterfactual Reasoning

A Tight Three-parameter Correlation and Related Classification on Gamma-Ray Bursts

Charge Carrier Mediation and Ferromagnetism induced in MnBi6Te10 Magnetic Topological Insulators by antimony doping

Chat-to-Design: AI Assisted Personalized Fashion Design

Divergence-aware Federated Self-Supervised Learning

EasyFL: A Low-code Federated Learning Platform For Dummies

Federated Unsupervised Domain Adaptation for Face Recognition

Giant and Reversible Electronic Structure Evolution in a Magnetic Topological Material EuCd2As2

How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis

Large Exchange Bias Effect and Coverage-Dependent Interfacial Coupling in CrI3/MnBi2Te4 van der Waals Heterostructures

Modelling graph dynamics in fraud detection with "Attention"

Nanoscale three-dimensional magnetic sensing with a probabilistic nanomagnet driven by spin-orbit torque

Optimizing Performance of Federated Person Re-identification: Benchmarking and Analysis

Reducing language context confusion for end-to-end code-switching automatic speech recognition

Secure two-way fiber-optic time transfer against sub-ns asymmetric delay attack

SLAM-TKA: Real-time Intra-operative Measurement of Tibial Resection Plane in Conventional Total Knee Arthroplasty

Smart Multi-tenant Federated Learning

Topological States in Chevrel Phase Materials from First-principle Calculations

Two-dimensional Obstructed Atomic Insulators with Fractional Corner Charge in MA$_2$Z$_4$ Family

xFraud: Explainable Fraud Transaction Detection

Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters

Crystal and Electronic Structure of GaTa$_4$Se$_8$ From First-Principle Calculations

Experimental evidence on the dissipationless transport of chiral edge state of the high-field Chern insulator in MnBi2Te4 nanodevices

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

Mimicing the Kane-Mele type spin orbit interaction by spin-flexual phonon coupling in graphene devices

Nature of the bonded-to-atomic transition in liquid silica to TPa pressures

Quantitative analysis of diffraction by liquids using a pink-spectrum X-ray source

Topology Aware Deep Learning for Wireless Network Optimization

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition

A Gd@C82-based single molecular electret device with switchable electrical polarization

A Practical Chinese Dependency Parser Based on A Large-scale Dataset

Accelerating Auxiliary-Field Quantum Monte Carlo Simulations of Solids with Graphical Processing Unit

Automated Radiological Report Generation For Chest X-Rays With Weakly-Supervised End-to-End Deep Learning

Computational prediction of RNA tertiary structures using machine learning methods

DeGNN: Characterizing and Improving Graph Neural Networks with Graph Decomposition

Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case

First-Principles Equation of State Database for Warm Dense Matter Computation

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition

Magnesium Oxide at Extreme Temperatures and Pressures Studied with First-Principles Simulations

Phase transformation in boron under shock compression

QMCPACK: Advances in the development, efficiency, and application of auxiliary field and real-space variational and diffusion Quantum Monte Carlo

Quantifying the dynamics of protein self-organization using deep learning analysis of atomic force microscopy data

Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition

Synchronous Transformers for End-to-End Speech Recognition

TensorCoder: Dimension-Wise Attention via Tensor Representation for Natural Language Modeling

Trained Rank Pruning for Efficient Deep Neural Networks

Trained Rank Pruning for Efficient Deep Neural Networks

TRP: Trained Rank Pruning for Efficient Deep Neural Networks

Experimental observation of the gate-controlled reversal of the anomalous Hall effect in the intrinsic magnetic topological insulator MnBi2Te4 device

Magneto-transport and Shubnikov-de Haas oscillations in the layered ternary telluride Ta3SiTe6 topological semimetal

Quantum-critical phase out of frustrated magnetism in a strongly correlated metal

First-Principles Equation of State and Electronic Properties of Warm Dense Oxygen

Minimization of Transformed $L_1$ Penalty: Closed Form Representation and Iterative Thresholding Algorithms

Study on temperature coefficient of CdTe detector used for X-rays detection

Transformed Schatten-1 Iterative Thresholding Algorithms for Low Rank Matrix Completion

Weak antilocalization in Cd3As2 thin films

High-pressure, temperature elasticity of Fe- and Al-bearing MgSiO3: implications for the Earth's lower mantle

Orbital and Pauli limiting effects in heavily doped Ba$_{1-x}$K$_x$Fe$_2$As$_2$

Robust linear magnetoresistance in WTe2

The long-lasting optical afterglow plateau of short burst GRB 130912A

The magnetization degree of the outflow powering the highly-polarized reverse shock emission of GRB 120308A

Controlling the Error Floor in LDPC Decoding

H4O and other hydrogen-oxygen compounds at giant-planet core pressures

On the Dynamics of the Error Floor Behavior in (Regular) LDPC Codes