Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
11 works
0 followers
19 topics
4 close collaborators

Published work

11 published items

preprint, 2026, arXiv

BubbleSpec: Turning Long-Tail Bubbles into Speculative Rollout Drafts for Synchronous Reinforcement Learning

Reinforcement Learning (RL) has become a cornerstone for improving the performance of Large Language Models (LLMs). However, its rollout phase constitutes a significant efficiency bottleneck, mainly arising from the long-tail bubbles across data parallel ranks, particularly in long-context scenarios where faster GPUs remain idle while waiting for stragglers. Existing solutions, such as partial rollout or asynchronous RL, mitigate these bubbles by compromising the algorithm's strict synchronous nature. In contrast, we propose BubbleSpec, a novel framework that accelerates RL rollouts while strictly preserving mathematical exactness. Rather than attempting to eliminate bubbles, BubbleSpec exploits them: it uses the idle time windows of faster ranks to pre-generate rollout results for subsequent steps, which then serve as drafts for speculative decoding. Unlike prior speculative methods that rely on historical epoch similarity and warm-ups, BubbleSpec is agnostic to dataset size and provides immediate acceleration from the onset of training. Extensive evaluations demonstrate that BubbleSpec reduces decoding steps by 50% and increases rollout throughput by up to 1.8x. Critically, BubbleSpec is seamlessly compatible with various RL frameworks and strategies because it preserves the strict synchronous property of RL algorithms.
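The abstract does not say how BubbleSpec verifies its drafts, so the following is only a generic sketch of greedy speculative verification, the kind of check that keeps output identical to plain decoding. The names `verify_draft` and `toy_next_token` are hypothetical, and the toy "model" is a stand-in for a real LLM.

```python
from typing import Callable, List, Sequence, Tuple


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    next_token: Callable[[List[int]], int],
) -> Tuple[List[int], int]:
    """Check a draft against the target model's greedy choices.

    Returns the newly emitted tokens and the number of draft tokens accepted.
    A real system would score all draft positions in one batched forward
    pass; the sequential calls here only emulate that check.
    """
    out = list(prefix)
    accepted = 0
    for tok in draft:
        expected = next_token(out)
        if tok != expected:
            # First mismatch: emit the target's own token and stop.
            out.append(expected)
            return out[len(prefix):], accepted
        out.append(tok)
        accepted += 1
    # Whole draft accepted: the target still contributes one extra token.
    out.append(next_token(out))
    return out[len(prefix):], accepted


def toy_next_token(tokens: List[int]) -> int:
    """Hypothetical deterministic target model: count upward modulo 10."""
    return (tokens[-1] + 1) % 10


new_tokens, n_accepted = verify_draft([1, 2], [3, 4, 9], toy_next_token)
```

Because every emitted token is the one the target would have chosen greedily, the result matches non-speculative decoding; a good draft only reduces the number of decoding steps.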

preprint, 2026, arXiv

CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training

As training scales grow, collective communication libraries (CCL) increasingly face anomalies arising from complex interactions among hardware, software, and environmental factors. These anomalies typically manifest as slow/hang communication, the most frequent and time-consuming category to diagnose. However, traditional diagnostic methods remain inaccurate and inefficient, frequently requiring hours or even days for root cause analysis. To address this, we propose CCL-D, a high-precision diagnostic system designed to detect and locate slow/hang anomalies in large-scale distributed training. CCL-D integrates a rank-level real-time probe with an intelligent decision analyzer. The probe measures cross-layer anomaly metrics using a lightweight distributed tracing framework to monitor communication traffic. The analyzer performs automated anomaly detection and root-cause location, precisely identifying the faulty GPU rank. Deployed on a 4,000-GPU cluster over one year, CCL-D achieved near-complete coverage of known slow/hang anomalies and pinpointed affected ranks within 6 minutes, substantially outperforming existing solutions.

preprint, 2026, arXiv

Improving the accuracy and generalizability of molecular property regression models with a substructure-substitution-rule-informed framework

Artificial Intelligence (AI)-aided drug discovery is an active research field, yet AI models often exhibit poor accuracy in regression tasks for molecular property prediction, and perform catastrophically poorly for out-of-distribution (OOD) molecules. Here, we present MolRuleLoss, a substructure-substitution-rule-informed framework that improves the accuracy and generalizability of multiple molecular property regression models (MPRMs) such as GEM and UniMol for diverse molecular property prediction tasks. MolRuleLoss incorporates partial derivative constraints for substructure substitution rules (SSRs) into an MPRM's loss function. When using GEM models for predicting lipophilicity, water solubility, and solvation free energy (using lipophilicity, ESOL, and freeSolv datasets from MoleculeNet), the root mean squared error (RMSE) values with and without MolRuleLoss were 0.587 vs. 0.660, 0.777 vs. 0.798, and 1.252 vs. 1.877, respectively, representing 2.6-33.3% performance improvements. We show that both the number and the quality of SSRs contribute to the magnitude of prediction accuracy gains obtained upon adding MolRuleLoss to an MPRM. MolRuleLoss improved the generalizability of MPRMs for "activity cliff" molecules in a lipophilicity prediction task and improved the generalizability of MPRMs for OOD molecules in a melting point prediction task. In a molecular weight prediction task for OOD molecules, MolRuleLoss reduced the RMSE value of a GEM model from 29.507 to 0.007. We also provide a formal demonstration that the upper bound of the variation for property change of SSRs is positively correlated with an MPRM's error. Together, we show that using the MolRuleLoss framework as a bolt-on boosts the prediction accuracy and generalizability of multiple MPRMs, supporting diverse applications in areas like cheminformatics and AI-aided drug discovery.
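The quoted 2.6-33.3% range can be checked directly from the RMSE pairs in the abstract, assuming "improvement" means the relative RMSE reduction against the model without MolRuleLoss. The function and dictionary names below are illustrative only.

```python
def relative_improvement(baseline_rmse: float, new_rmse: float) -> float:
    """Relative RMSE reduction, in percent, against the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse * 100.0


# (RMSE without MolRuleLoss, RMSE with MolRuleLoss), as quoted in the abstract.
rmse_pairs = {
    "lipophilicity": (0.660, 0.587),
    "ESOL": (0.798, 0.777),
    "freeSolv": (1.877, 1.252),
}

improvements = {
    name: relative_improvement(without, with_rule)
    for name, (without, with_rule) in rmse_pairs.items()
}

for name, pct in improvements.items():
    print(f"{name}: {pct:.1f}% lower RMSE")
```

Under that assumption the three datasets give roughly 11.1%, 2.6%, and 33.3%, which matches the endpoints of the range stated in the abstract.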

preprint, 2022, arXiv

A unified theory of information transfer and causal relation

Information transfer between coupled stochastic dynamics, measured by transfer entropy and information flow, is suggested as a physical process underlying the causal relation of systems. While information transfer analysis has booming applications in both science and engineering fields, critical mysteries about its foundations remain unsolved. Fundamental yet difficult questions concern how information transfer and causal relation originate, what they depend on, how they differ from each other, and whether they are created by a unified and general quantity. These questions essentially determine the validity of causal relation measurement via information transfer. Here we aim to lay a complete theoretical basis for information transfer and causal relation. Beyond the well-known relations between these concepts that conditionally hold, we demonstrate that information transfer and causal relation universally originate from specific information synergy and redundancy phenomena characterized by high-order mutual information. More importantly, our theory analytically explains the mechanisms by which information transfer and causal relation originate, vanish, and differ from each other. Moreover, our theory naturally defines the effect sizes of information transfer and causal relation based on high-dimensional coupling events. These results may provide a unified view of information, synergy, and causal relation to bridge Pearl's causal inference theory in computer science and information transfer analysis in physics.
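For readers unfamiliar with transfer entropy, here is a minimal plug-in estimator for discrete time series. It uses the standard definition with a history length of one step, which is an assumption made for brevity and not a detail taken from the paper; the function name and the toy series are illustrative.

```python
import random
from collections import Counter
from math import log2
from typing import Sequence


def transfer_entropy(source: Sequence[int], target: Sequence[int]) -> float:
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum p(t', t, s) * log2( p(t' | t, s) / p(t' | t) ),
    where t' is the next target value and (t, s) are the current values.
    """
    nxt, cur, src = target[1:], target[:-1], source[:-1]
    n = len(cur)
    triples = Counter(zip(nxt, cur, src))
    cur_src = Counter(zip(cur, src))
    nxt_cur = Counter(zip(nxt, cur))
    cur_only = Counter(cur)

    te = 0.0
    for (t_next, t_now, s_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / cur_src[(t_now, s_now)]
        p_given_self = nxt_cur[(t_next, t_now)] / cur_only[t_now]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Toy coupling: x copies y with a one-step delay, so y drives x but not vice versa.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(5000)]
x = [0] + y[:-1]

te_y_to_x = transfer_entropy(y, x)
te_x_to_y = transfer_entropy(x, y)
```

In this toy case the estimate from `y` to `x` is close to one bit while the reverse direction is close to zero, which is the asymmetry that motivates using transfer entropy as a directional measure.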

preprint, 2022, arXiv

Bridging the information and dynamics attributes of neural activities

The brain works as a dynamic system to process information. Various challenges remain in understanding the connection between information and dynamics attributes in the brain. The present research explores how the characteristics of neural information functions are linked to neural dynamics. We attempt to bridge dynamics (e.g., Kolmogorov-Sinai entropy) and information (e.g., mutual information and Fisher information) metrics on the stimulus-triggered stochastic dynamics in neural populations. On the one hand, our unified analysis identifies various essential features of the information-processing-related neural dynamics. We discover spatiotemporal differences in the dynamic randomness and chaotic degrees of neural dynamics during neural information processing. On the other hand, our framework reveals the fundamental role of neural dynamics in shaping neural information processing. The neural dynamics create oppositely directed variations in encoding and decoding properties under specific conditions, and they determine the neural representation of the stimulus distribution. Overall, our findings demonstrate a potential direction to explain the emergence of neural information processing from neural dynamics and help understand the intrinsic connections between the informational and the physical brain.

preprint, 2022, arXiv

Information Evolution in Complex Networks

Many biological phenomena and social events critically depend on how information evolves in complex networks. However, a general theory to characterize information evolution is still absent. Consequently, numerous unknowns remain about the mechanisms underlying information evolution. Among these unknowns, a fundamental problem, a seeming paradox, lies in the coexistence of local randomness, manifested as the stochastic distortion of information content during individual-individual diffusion, and global regularity, illustrated by specific non-random patterns of information content on the network scale. Here, we attempt to formalize information evolution and explain the coexistence of randomness and regularity in complex networks. Applying network dynamics and information theory, we discover that a certain amount of information, determined by the selectivity of networks to the input information, frequently survives random distortion. Other information will inevitably experience distortion or dissipation, whose speeds are shaped by the diversity of information selectivity in networks. The discovered laws exist irrespective of noise, but the noise accounts for the intensification. We further demonstrate the ubiquity of our discovered laws by analyzing the emergence of neural tuning properties in the primary visual and medial temporal cortices of animal brains and the emergence of extreme opinions in social networks.

preprint, 2022, arXiv

Resilience in Industrial Internet of Things Systems: A Communication Perspective

The Industrial Internet of Things (IIoT) is an ultra-large-scale system that is much more sophisticated and fragile than conventional industrial platforms. The effective management of such a system relies heavily on the resilience of the network, especially the communication part. Imperative as resilient communication is, it has received insufficient attention in the literature, and a standardized framework is still missing. With this in mind, this paper provides a systematic overview of resilience in IIoT from a communication perspective, aiming to answer the questions of why we need it, what it is, how to enhance it, and where it can be applied. Specifically, we emphasize the urgency of resilience studies by examining existing literature and analyzing malfunction data from a real satellite communication system. Resilience-related concepts and metrics, together with standardization efforts, are then summarized and discussed, presenting a basic framework for analyzing the resilience of the system before, during, and after disruptive events. On the basis of the framework, key resilience concerns associated with the design, deployment, and operation of IIoT are briefly described to shed light on the methods for resilience enhancement. Promising resilient applications in different IIoT sectors are also introduced to highlight the opportunities and challenges in practical implementations.

preprint, 2022, arXiv

Self-organized critical dynamics of RNA virus evolution

RNA viruses (e.g., SARS-CoV-2) evolve in a complex manner. Studying RNA virus evolution is vital for understanding molecular evolution and medicine development. Scientists lack, however, general frameworks to characterize the dynamics of RNA virus evolution directly from empirical data and identify potential physical laws. To fill this gap, we present a theory to characterize RNA virus evolution as a physical system with absorbing states and avalanche behaviors. This approach maps accessible biological data (e.g., phylogenetic tree and infection) to a general stochastic process of RNA virus infection and evolution, enabling researchers to verify potential self-organized criticality underlying RNA virus evolution. We apply our framework to SARS-CoV-2, the virus accounting for the global epidemic of COVID-19. We find that SARS-CoV-2 exhibits scale-invariant avalanches, as predicted by mean-field theory. The observed scaling relation, universal collapse, and slowly decaying auto-correlation suggest self-organized critical dynamics of SARS-CoV-2 evolution. Interestingly, the lineages that emerge from critical evolution processes coincidentally match threatening lineages of SARS-CoV-2 (e.g., the Delta virus). We anticipate our approach to be a general formalism to portray RNA virus evolution and help identify virus lineages of potential concern.
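To illustrate what "avalanches" with an absorbing state look like in the mean-field picture, the sketch below simulates a generic critical branching process. It is a textbook toy model, not the paper's mapping from phylogenetic or infection data, and every name and parameter here is hypothetical.

```python
import random
from typing import List


def avalanche_size(
    rng: random.Random,
    branching_ratio: float = 1.0,
    max_size: int = 10_000,
) -> int:
    """Total number of units activated in one avalanche.

    Each active unit has two potential offspring, each triggered with
    probability branching_ratio / 2, so the mean offspring count equals
    branching_ratio. A ratio of 1.0 is the critical point. The avalanche
    ends when no unit is active (the absorbing state) or once the running
    total reaches max_size, a cap that only bounds the runtime.
    """
    active, size = 1, 1
    while active and size < max_size:
        offspring = sum(
            1 for _ in range(2 * active) if rng.random() < branching_ratio / 2
        )
        size += offspring
        active = offspring
    return size


rng = random.Random(0)
sizes: List[int] = [avalanche_size(rng) for _ in range(200)]
largest = max(sizes)
```

At the critical ratio, mean-field theory predicts a heavy-tailed size distribution with exponent 3/2, so most avalanches stay small while a few grow very large. Lowering `branching_ratio` below 1.0 makes large avalanches exponentially rare.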

preprint, 2021, arXiv

Thermodynamics of Encoding and Encoders

Non-isolated systems have diverse coupling relations with the external environment. These relations generate complex thermodynamics and information transmission between the system and its environment. The framework presented in this research examines the critical role of the internal orders inside the non-isolated system in shaping the information thermodynamics coupling. We characterize the coupling as a generalized encoding process, where the system acts as an information thermodynamics encoder to encode the external information based on thermodynamics. We formalize the encoding process in the context of the nonequilibrium second law of thermodynamics, revealing an intrinsic difference in information thermodynamics characteristics between information thermodynamics encoders with and without internal correlations. During the information encoding process of an external source $\mathsf{Y}$, specific sub-systems in an encoder $\mathsf{X}$ with internal correlations can exceed the information thermodynamics bound on $\left(\mathsf{X},\mathsf{Y}\right)$ and encode more information than system $\mathsf{X}$ does when working as a whole. We computationally verify this theoretical finding in an Ising model with a random external field and a neural data set of the human brain during visual perception and recognition. Our analysis demonstrates that stronger internal correlation inside these systems implies a higher possibility for specific sub-systems to encode more information than the global one. These findings may suggest a new perspective in studying information thermodynamics in diverse physical and biological systems.

preprint, 2020, arXiv

A General Verification for Functional Completeness by Abstract Operators

An operator set is functionally incomplete if it cannot represent the full set $\lbrace \neg,\vee,\wedge,\rightarrow,\leftrightarrow\rbrace$. The verification of functional incompleteness relies heavily on constructive proofs. Judging a large untested operator set can be inefficient. Given the large number of potential operators proposed in various logic systems, a general verification method for their functional completeness is needed. This paper offers a universal verification for functional completeness. First, we propose two abstract operators $\widehat{R}$ and $\breve{R}$, both of which have no fixed form and are only defined by several weak constraints. Specifically, $\widehat{R}_{\geq}$ and $\breve{R}_{\geq}$ are the abstract operators defined with the total order relation $\geq$. Then, we prove that any operator set $\mathfrak{R}$ is functionally complete if and only if it can represent the composite operator $\widehat{R}_{\geq}\circ\breve{R}_{\geq}$ or $\breve{R}_{\geq}\circ\widehat{R}_{\geq}$. Otherwise $\mathfrak{R}$ is determined to be functionally incomplete. This theory can be generally applied to any untested operator set to determine whether it is functionally complete.
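For two-valued Boolean logic, functional completeness can also be checked by brute force, which gives a concrete baseline for the notion the abstract defines. The sketch below is not the paper's abstract-operator method: it simply closes the two input projections under composition and tests whether all 16 two-variable truth tables appear. The function and operator names are illustrative.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Operator = Tuple[int, Callable[..., int]]  # (arity, function on 0/1 values)

# The four assignments to (x, y), in a fixed order.
INPUTS = list(product((0, 1), repeat=2))


def is_functionally_complete(operators: Sequence[Operator]) -> bool:
    """Return True if the operators generate every two-variable Boolean function.

    A truth table is a 4-tuple of outputs over INPUTS. Starting from the
    projections x and y, operators are applied to known tables until no
    new table appears.
    """
    x = tuple(a for a, _ in INPUTS)
    y = tuple(b for _, b in INPUTS)
    known = {x, y}
    changed = True
    while changed:
        changed = False
        for arity, op in operators:
            # Snapshot the known tables so the set can grow during the loop.
            for args in product(tuple(known), repeat=arity):
                table = tuple(
                    int(op(*(arg[i] for arg in args))) for i in range(len(INPUTS))
                )
                if table not in known:
                    known.add(table)
                    changed = True
    return len(known) == 2 ** len(INPUTS)


NAND: Operator = (2, lambda a, b: 1 - (a & b))
AND: Operator = (2, lambda a, b: a & b)
OR: Operator = (2, lambda a, b: a | b)
NOT: Operator = (1, lambda a: 1 - a)

nand_complete = is_functionally_complete([NAND])
and_or_complete = is_functionally_complete([AND, OR])
not_and_complete = is_functionally_complete([NOT, AND])
```

Generating all 16 binary tables implies that $\neg$, $\vee$, $\wedge$, $\rightarrow$, and $\leftrightarrow$ are all representable. The search is only practical for small, concrete two-valued operator sets, which is the limitation a general verification method would address.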

preprint, 2020, arXiv

Comparing SNNs and RNNs on Neuromorphic Vision Datasets: Similarities and Differences

Neuromorphic data, recording frameless spike events, have attracted considerable attention for the spatiotemporal information components and the event-driven processing fashion. Spiking neural networks (SNNs) represent a family of event-driven models with spatiotemporal dynamics for neuromorphic computing, which are widely benchmarked on neuromorphic data. Interestingly, researchers in the machine learning community can argue that recurrent (artificial) neural networks (RNNs) also have the capability to extract spatiotemporal features although they are not event-driven. Thus, the question of "what will happen if we benchmark these two kinds of models together on neuromorphic data" arises but remains unanswered. In this work, we make a systematic study to compare SNNs and RNNs on neuromorphic data, taking the vision datasets as a case study. First, we identify the similarities and differences between SNNs and RNNs (including the vanilla RNNs and LSTM) from the modeling and learning perspectives. To improve comparability and fairness, we unify the supervised learning algorithm based on backpropagation through time (BPTT), the loss function exploiting the outputs at all timesteps, the network structure with stacked fully-connected or convolutional layers, and the hyper-parameters during training. In particular, given the mainstream loss function used in RNNs, we modify it, inspired by the rate coding scheme, to approach that of SNNs. Furthermore, we tune the temporal resolution of datasets to test model robustness and generalization. Finally, a series of contrast experiments are conducted on two types of neuromorphic datasets: DVS-converted (N-MNIST) and DVS-captured (DVS Gesture).