Source author record

He Bai

He Bai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Systems and Control; eess.SY; math.OC; cond-mat.mtrl-sci; Machine Learning; Computation and Language; Artificial Intelligence; cond-mat.mes-hall; Multiagent Systems; cond-mat.str-el; Distributed, Parallel, and Cluster Computing; eess.AS; Human-Computer Interaction; math.DS; math.ST; Sound; Statistics Theory

Catalog footprint

What is connected

18 works

17 topics

4 close collaborators


preprint, 2026, arXiv

TRACER: Verifiable Generative Provenance for Multimodal Tool-Using Agents

Multimodal large language models increasingly solve vision-centric tasks by calling external tools for visual inspection, OCR, retrieval, calculation, and multi-step reasoning. Current tool-using agents usually expose the executed tool trajectory and the final answer, but they rarely specify which tool observation supports each generated claim. We call this missing claim-level dependency structure the provenance gap. The gap makes tool use hard to verify and hard to optimize, because useful evidence, redundant exploration, and unsupported reasoning are mixed in the same trajectory. We introduce TRACER, a framework for verifiable generative provenance in multimodal tool-using agents. Instead of adding citations after generation, TRACER generates each answer sentence together with a structured provenance record that identifies the supporting tool turn, evidence unit, and semantic support relation. Its relation space contains Quotation, Compression, and Inference, covering direct reuse, faithful condensation, and grounded derivation. TRACER verifies each record through schema checking, tool-turn alignment, source authenticity, and relation rationality, and then converts verified provenance into traceability constraints and provenance-derived local credit for reinforcement learning. We further construct TRACE-Bench, a benchmark for sentence-level provenance reconstruction from coarse multimodal tool trajectories. On TRACE-Bench, simply adding tools often introduces noise. With Qwen3-VL-8B, TRACER reaches 78.23% answer accuracy and 95.72% summary accuracy, outperforming the strongest closed-source tool-augmented baseline by 23.80 percentage points. Compared with tool-only supervised fine-tuning, it also reduces total test-set tool calls from 4949 to 3486. These results show that reliable multimodal tool reasoning depends on provenance-aware use of observations, not on more tool calls alone.
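The provenance record described in this abstract can be pictured as a small data structure. The sketch below is illustrative only: the class name `ProvenanceRecord`, its field names, and the helper `passes_basic_checks` are hypothetical and not taken from the paper. It covers only the schema check and tool-turn alignment; source authenticity and relation rationality are not modelled.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The three support relations named in the abstract."""
    QUOTATION = "Quotation"      # direct reuse
    COMPRESSION = "Compression"  # faithful condensation
    INFERENCE = "Inference"      # grounded derivation


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical layout: one record per generated answer sentence."""
    sentence: str        # the generated claim
    tool_turn: int       # index of the supporting tool call in the trajectory
    evidence_unit: str   # the piece of the tool observation relied upon
    relation: Relation   # how the sentence depends on the evidence


def passes_basic_checks(record: ProvenanceRecord, num_tool_turns: int) -> bool:
    """Schema check plus tool-turn alignment (assumed, simplified forms)."""
    if not isinstance(record.relation, Relation):
        return False
    if not record.sentence.strip() or not record.evidence_unit.strip():
        return False
    # The cited turn must exist in the executed trajectory.
    return 0 <= record.tool_turn < num_tool_turns


ok = ProvenanceRecord("The sign reads EXIT.", 1, "OCR line 3: EXIT", Relation.QUOTATION)
dangling = ProvenanceRecord("The sign reads EXIT.", 5, "OCR line 3: EXIT", Relation.QUOTATION)
print(passes_basic_checks(ok, num_tool_turns=3))
print(passes_basic_checks(dangling, num_tool_turns=3))
```

A record that cites a tool turn outside the trajectory fails the alignment check, which is the kind of unsupported claim the abstract says verification is meant to catch.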

preprint, 2024, arXiv

Asynchronous Local Computations in Distributed Bayesian Learning

Due to the expanding scope of machine learning (ML) to the fields of sensor networking, cooperative robotics and many other multi-agent systems, distributed deployment of inference algorithms has received a lot of attention. These algorithms involve collaboratively learning unknown parameters from dispersed data collected by multiple agents. There are two competing aspects in such algorithms, namely, intra-agent computation and inter-agent communication. Traditionally, algorithms are designed to perform both synchronously. However, certain circumstances need frugal use of communication channels as they are either unreliable, time-consuming, or resource-expensive. In this paper, we propose gossip-based asynchronous communication to leverage fast computations and reduce communication overhead simultaneously. We analyze the effects of multiple (local) intra-agent computations by the active agents between successive inter-agent communications. For local computations, Bayesian sampling via the unadjusted Langevin algorithm (ULA) MCMC is utilized. The communication is assumed to be over a connected graph (e.g., as in decentralized learning); however, the results can be extended to coordinated communication where there is a central server (e.g., federated learning). We theoretically quantify the convergence rates in the process. To demonstrate the efficacy of the proposed algorithm, we present simulations on a toy problem as well as on real-world data sets to train ML models to perform classification tasks. We observe faster initial convergence and improved performance accuracy, especially in the low-data range. We achieve, on average, 78% and over 90% classification accuracy on the Gamma Telescope and mHealth data sets, respectively, from the UCI ML repository.
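For readers unfamiliar with ULA, the local sampling step mentioned in this abstract has the standard form x_{k+1} = x_k - h ∇U(x_k) + sqrt(2h) ξ_k with ξ_k standard normal. The sketch below shows only that single-agent update on a one-dimensional toy target; the gossip communication and the multi-agent setting of the paper are not modelled, and the function name `ula_sample`, the step size, and the target are assumptions chosen for illustration.

```python
import math
import random


def ula_sample(grad_u, x0, step, n_steps, rng):
    """Run the unadjusted Langevin algorithm for a scalar state.

    grad_u: gradient of the potential U, where the target density is
            proportional to exp(-U(x)).
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient step on U plus injected Gaussian noise; no accept/reject step.
        x = x - step * grad_u(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


# Toy target (assumed for illustration): standard normal, U(x) = x^2 / 2.
rng = random.Random(0)
samples = ula_sample(lambda x: x, x0=3.0, step=0.1, n_steps=20000, rng=rng)
kept = samples[2000:]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"mean={mean:.3f}, var={var:.3f}")
```

Because ULA omits the Metropolis correction, its stationary variance on this target is 1 / (1 - h/2) rather than exactly 1, so a small bias that shrinks with the step size h is expected.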

preprint, 2022, arXiv

A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing

Recently, speech representation learning has improved many speech-related tasks such as speech recognition, speech classification, and speech-to-text translation. However, all of the above tasks are in the direction of speech understanding; for the inverse direction, speech synthesis, the potential of representation learning is yet to be realized, due to the challenging nature of generating high-quality speech. To address this problem, we propose our framework, Alignment-Aware Acoustic-Text Pretraining (A$^3$T), which reconstructs masked acoustic signals with text input and acoustic-text alignment during training. In this way, the pretrained model can generate high-quality reconstructed spectrograms, which can be applied directly to speech editing and unseen-speaker TTS. Experiments show that A$^3$T outperforms SOTA models on speech editing and improves multi-speaker speech synthesis without an external speaker verification model.

preprint, 2022, arXiv

Better Language Model with Hypernym Class Prediction

Class-based language models (LMs) have long been devised to address context sparsity in $n$-gram LMs. In this study, we revisit this approach in the context of neural LMs. We hypothesize that class-based prediction leads to an implicit context aggregation for similar words and thus can improve generalization for rare words. We map words that have a common WordNet hypernym to the same class and train large neural LMs by gradually annealing from predicting the class to token prediction during training. Empirically, this curriculum learning strategy consistently improves perplexity over various large, highly performant state-of-the-art Transformer-based models on two datasets, WikiText-103 and Arxiv. Our analysis shows that the performance improvement is achieved without sacrificing performance on rare words. Finally, we document other attempts that failed to yield empirical gains, and discuss future directions for the adoption of class-based LMs on a larger scale.

preprint, 2022, arXiv

Braiding lateral morphotropic grain boundary in homogeneitic oxides

Interfaces formed by correlated oxides offer a critical avenue for discovering emergent phenomena and quantum states. However, the fabrication of oxide interfaces with variable crystallographic orientations and strain states integrated along a film plane is extremely challenging with conventional layer-by-layer stacking or self-assembly. Here, we report the creation of morphotropic grain boundaries (GBs) in laterally interconnected cobaltite homostructures. Single-crystalline substrates and suspended ultrathin freestanding membranes provide independent templates for coherent epitaxy and constraint on the growth orientation, resulting in seamless and atomically sharp GBs. Electronic states and magnetic behavior in hybrid structures are laterally modulated and isolated by GBs, enabling artificially engineered functionalities in the planar matrix. Our work offers a simple and scalable method for fabricating unprecedented interfaces through controlled synthesis routes and provides a platform for exploring potential applications in neuromorphics, solid-state batteries, and catalysis.

preprint, 2022, arXiv

Distributed Cooperative Multi-Agent Reinforcement Learning with Directed Coordination Graph

Existing distributed cooperative multi-agent reinforcement learning (MARL) frameworks usually assume undirected coordination graphs and communication graphs while estimating a global reward via consensus algorithms for policy evaluation. Such a framework may induce expensive communication costs and exhibit poor scalability due to the requirement of global consensus. In this work, we study MARL with directed coordination graphs, and propose a distributed RL algorithm where the local policy evaluations are based on local value functions. The local value function of each agent is obtained by local communication with its neighbors through a directed learning-induced communication graph, without using any consensus algorithm. A zeroth-order optimization (ZOO) approach based on parameter perturbation is employed to achieve gradient estimation. By comparing with existing ZOO-based RL algorithms, we show that our proposed distributed RL algorithm guarantees high scalability. A distributed resource allocation example is shown to illustrate the effectiveness of our algorithm.
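Zeroth-order gradient estimation by parameter perturbation can be illustrated with a generic two-point estimator: perturb the parameters along a random direction u, evaluate the objective at θ + δu and θ - δu, and scale u by the finite difference. The sketch below is a textbook form of this idea, not the specific estimator or the distributed scheme of the paper; the function name `zoo_gradient`, the quadratic objective, and all constants are assumptions.

```python
import random


def zoo_gradient(f, theta, delta, n_samples, rng):
    """Two-point zeroth-order gradient estimate using Gaussian directions."""
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = [t + delta * ui for t, ui in zip(theta, u)]
        minus = [t - delta * ui for t, ui in zip(theta, u)]
        # Finite difference along u approximates the directional derivative.
        slope = (f(plus) - f(minus)) / (2.0 * delta)
        for i in range(dim):
            grad[i] += slope * u[i] / n_samples
    return grad


def objective(theta):
    """Toy objective (assumed): f(theta) = sum of squares, gradient 2 * theta."""
    return sum(t * t for t in theta)


rng = random.Random(0)
estimate = zoo_gradient(objective, [1.0, -2.0], delta=1e-3, n_samples=5000, rng=rng)
print(estimate)  # should be close to the true gradient [2.0, -4.0]
```

Only function evaluations are needed, which is why this family of methods suits settings where an agent can observe returns but cannot differentiate through the environment.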

preprint, 2022, arXiv

Emergent magnetic states and tunable exchange bias at all 3d nitride heterointerfaces

Interfacial magnetism stimulates the discovery of giant magnetoresistance and spin-orbital coupling across the heterointerfaces, facilitating the intimate correlation between spin transport and complex magnetic structures. Over decades, functional heterointerfaces composed of nitrides have seldom been explored due to the difficulty in synthesizing high-quality nitride films with the correct composition. Here we report the fabrication of single-crystalline ferromagnetic Fe3N thin films with precisely controlled thickness. As the film thickness decreases, the magnetization deteriorates dramatically, and the electronic state transitions from metallic to insulating. Strikingly, the high-temperature ferromagnetism is maintained in a Fe3N layer with a thickness down to 2 u.c. (~8 Å). The magnetoresistance exhibits a strong in-plane anisotropy and meanwhile the anomalous Hall resistance reserves its sign when the Fe3N layer thickness exceeds 5 u.c. Furthermore, we observe a sizable exchange bias at the interfaces between a ferromagnetic Fe3N and an antiferromagnetic CrN. The exchange bias field and saturation moment strongly depend on the controllable bending curvature using the cylinder diameter engineering (CDE) technique, implying tunable magnetic states under lattice deformation. This work provides a guideline for exploring functional nitride films and applying their interfacial phenomena toward practical applications.

preprint, 2022, arXiv

Quantifying the Dzyaloshinskii-Moriya Interaction Induced by the Bulk Magnetic Asymmetry

A broken interfacial inversion symmetry in ultrathin ferromagnet/heavy metal (FM/HM) bilayers is generally believed to be a prerequisite for accommodating the Dzyaloshinskii-Moriya interaction (DMI) and for stabilizing chiral spin textures. In these bilayers, the strength of the DMI decays as the thickness of the FM layer increases and vanishes around a few nanometers. In the present study, through synthesizing relatively thick films of compositions CoPt or FePt, CoCu or FeCu, FeGd and FeNi, contributions to DMI from the composition gradient induced bulk magnetic asymmetry (BMA) and spin-orbit coupling (SOC) are systematically examined. Using Brillouin light scattering spectroscopy, both the sign and amplitude of DMI in films with controllable direction and strength of BMA, in the presence and absence of SOC are experimentally studied. In particular, we show that a sizable amplitude of DMI (0.15 mJ/m^2) can be realized in CoPt or FePt films with BMA and strong SOC, whereas negligible DMI strengths are observed in other thick films with BMA but without significant SOC. The pivotal roles of BMA and SOC are further examined based on the three-site Fert-Levy model and first-principles calculations. It is expected that our findings may help to further understand the origin of chiral magnetism and to design novel non-collinear spin textures.

preprint, 2021, arXiv

A Decentralized Approach to Bayesian Learning

Motivated by decentralized approaches to machine learning, we propose a collaborative Bayesian learning algorithm taking the form of decentralized Langevin dynamics in a non-convex setting. Our analysis shows that the initial KL-divergence between the Markov Chain and the target posterior distribution is exponentially decreasing while the error contributions to the overall KL-divergence from the additive noise are decreasing in polynomial time. We further show that the polynomial term experiences a speed-up with the number of agents and provide sufficient conditions on the time-varying step-sizes to guarantee convergence to the desired distribution. The performance of the proposed algorithm is evaluated on a wide variety of machine learning tasks. The empirical results show that the performance of individual agents with locally available data is on par with the centralized setting with considerable improvement in the convergence rate.

preprint, 2021, arXiv

Decomposability and Parallel Computation of Multi-Agent LQR

Individual agents in a multi-agent system (MAS) may have decoupled open-loop dynamics, but a cooperative control objective usually results in coupled closed-loop dynamics, thereby making the control design computationally expensive. The computation time becomes even higher when a learning strategy such as reinforcement learning (RL) needs to be applied to deal with the situation when the agents' dynamics are not known. To resolve this problem, we propose a parallel RL scheme for a linear quadratic regulator (LQR) design in a continuous-time linear MAS. The idea is to exploit the structural properties of two graphs embedded in the $Q$ and $R$ weighting matrices in the LQR objective to define an orthogonal transformation that can convert the original LQR design to multiple decoupled smaller-sized LQR designs. We show that if the MAS is homogeneous then this decomposition retains closed-loop optimality. Conditions for decomposability, an algorithm for constructing the transformation matrix, a parallel RL algorithm, and robustness analysis when the design is applied to non-homogeneous MAS are presented. Simulations show that the proposed approach can guarantee significant speed-up in learning without any loss in the cumulative value of the LQR cost.

preprint, 2021, arXiv

Dynamic Control Allocation between Onboard and Delayed Remote Control for Unmanned Aircraft System Detect-and-Avoid

This paper develops and evaluates the performance of an allocation agent to be potentially integrated into the onboard Detect and Avoid (DAA) computer of an Unmanned Aircraft System (UAS). We consider a UAS that can be fully controlled by the onboard DAA system and by a remote human pilot. With a communication channel prone to latency, we consider a mixed initiative interaction environment, where the control authority of the UAS is dynamically allocated by the allocation agent. In an encounter with a dynamic intruder, the probability of collision may increase in the absence of pilot commands in the presence of latency. Moreover, a delayed pilot command may not result in safe resolution of the current scenario and may need to be improvised. We design an optimization algorithm to reduce collision risk and refine delayed pilot commands. Towards this end, a Markov Decision Process (MDP) and its solution are employed to create a wait time map. The map consists of estimated times that the UAS can wait for the remote pilot commands at each state. A command blending algorithm is designed to select an avoidance maneuver that prioritizes the pilot intention extracted from the pilot commands. The wait time map and the command blending algorithm are implemented and integrated into a closed-loop simulator. We conduct ten thousand fast-time Monte Carlo simulations and compare the performance of the integrated setup with a standalone DAA setup. The simulation results show that the allocation agent enables the UAS to wait without inducing any near mid-air collision (NMAC) or severe loss of well clear (LoWC) while improving pilot involvement in the encounter resolution.

preprint, 2021, arXiv

Learning Distributed Stabilizing Controllers for Multi-Agent Systems

We address the problem of model-free distributed stabilization of heterogeneous multi-agent systems using reinforcement learning (RL). Two algorithms are developed. The first algorithm solves a centralized linear quadratic regulator (LQR) problem without knowing any initial stabilizing gain in advance. The second algorithm builds upon the results of the first algorithm, and extends it to distributed stabilization of multi-agent systems with predefined interaction graphs. Rigorous proofs are provided to show that the proposed algorithms achieve guaranteed convergence if specific conditions hold. A simulation example is presented to demonstrate the theoretical results.
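As background for the LQR problem named in this abstract, the optimal gain for a scalar continuous-time system x' = a x + b u with cost ∫ (q x² + r u²) dt follows from the algebraic Riccati equation 2aP - (b²/r)P² + q = 0. The sketch below solves it in closed form using a known model, which is exactly what the paper's model-free RL algorithms avoid; it is included only to show the target such algorithms converge to. The function name `scalar_lqr` and the numerical values are assumptions.

```python
import math


def scalar_lqr(a, b, q, r):
    """Closed-form LQR for the scalar system x' = a*x + b*u.

    Returns (P, k): the stabilizing Riccati solution and the gain in u = -k*x.
    """
    # Positive root of 2*a*P - (b**2 / r) * P**2 + q = 0.
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k


# Unstable open-loop plant (a > 0), chosen for illustration.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
p, k = scalar_lqr(a, b, q, r)
residual = 2 * a * p - (b * b / r) * p * p + q
closed_loop_pole = a - b * k
print(f"P={p:.4f}, k={k:.4f}, closed-loop pole={closed_loop_pole:.4f}")
```

For these values the closed-loop pole is -sqrt(a² + b²q/r), which is negative for any a, so the gain stabilizes the plant even though the open-loop system is unstable.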

preprint, 2021, arXiv

Room-temperature ferromagnetism at an oxide/nitride interface

Heterointerfaces have led to the discovery of novel electronic and magnetic states because of their strongly entangled electronic degrees of freedom. Single-phase chromium compounds always exhibit antiferromagnetism following the prediction of Goodenough-Kanamori rules. So far, exchange coupling between chromium ions via hetero-anions has not been explored and the associated quantum states are unknown. Here we report the successful epitaxial synthesis and characterization of chromium oxide (Cr2O3)-chromium nitride (CrN) superlattices. Room-temperature ferromagnetic spin ordering is achieved at the interfaces between these two antiferromagnets, and the magnitude of the effect decays with increasing layer thickness. First-principles calculations indicate that robust ferromagnetic spin interaction between Cr3+ ions via anion-hybridizations across the interface yields the lowest total energy. This work opens the door to fundamental understanding of the unexpected and exceptional properties of oxide-nitride interfaces and provides access to hidden phases at low-dimensional quantum heterostructures.

preprint, 2021, arXiv

Variance Reduction of Quadcopter Trajectory Tracking in Turbulent Wind

We consider a quadcopter operating in a turbulent windy environment. The turbulent environment may be imposed on a quadcopter by structures, landscapes, terrains and most importantly by the unique physical phenomena in the lower atmosphere. Turbulence can negatively impact a quadcopter's performance and operations. Modeling turbulence as a stochastic random input, we investigate control designs that can reduce the turbulence effects on the quadcopter's motion. In particular, we design a minimum cost variance (MCV) controller aiming to minimize the cost in terms of its weighted sum of mean and variance. We linearize the quadcopter dynamics and examine the MCV controller derived from a set of coupled algebraic Riccati equations (CARE) with full-state feedback. Our preliminary simulation results show a reduction in variance and in mean trajectory tracking error compared to a traditional linear quadratic regulator (LQR).

preprint, 2020, arXiv

Hierarchical Control of Multi-Agent Systems using Online Reinforcement Learning

We propose a new reinforcement learning based approach to designing hierarchical linear quadratic regulator (LQR) controllers for heterogeneous linear multi-agent systems with unknown state-space models and separated control objectives. The separation arises from grouping the agents into multiple non-overlapping groups, and defining the control goal as two distinct objectives. The first objective aims to minimize a group-wise block-decentralized LQR function that models the group-level mission. The second objective, on the other hand, tries to minimize an LQR function between the average states (centroids) of the groups. Exploiting this separation, we redefine the weighting matrices of the LQR functions in a way that allows us to decouple their respective algebraic Riccati equations. Thereafter, we develop a reinforcement learning strategy that uses online measurements of the agent states and the average states to learn the respective controllers based on the approximate Riccati equations. Since the first controller is block-decentralized and can therefore be learned in parallel, while the second controller is reduced-dimensional due to averaging, the overall design enjoys a significantly reduced learning time compared to centralized reinforcement learning.

preprint, 2020, arXiv

Reduced-Dimensional Reinforcement Learning Control using Singular Perturbation Approximations

We present a set of model-free, reduced-dimensional reinforcement learning (RL) based optimal control designs for linear time-invariant singularly perturbed (SP) systems. We first present a state-feedback and output-feedback based RL control design for a generic SP system with unknown state and input matrices. We take advantage of the underlying time-scale separation property of the plant to learn a linear quadratic regulator (LQR) for only its slow dynamics, thereby saving a significant amount of learning time compared to the conventional full-dimensional RL controller. We analyze the sub-optimality of the design using SP approximation theorems and provide sufficient conditions for closed-loop stability. Thereafter, we extend both designs to clustered multi-agent consensus networks, where the SP property reflects through clustering. We develop both centralized and cluster-wise block-decentralized RL controllers for such networks, in reduced dimensions. We demonstrate the details of the implementation of these controllers using simulations of relevant numerical examples and compare them with conventional RL designs to show the computational benefits of our approach.

preprint, 2015, arXiv

GPS-Denied Relative Motion Estimation For Fixed-Wing UAV Using the Variational Pose Estimator

Relative pose estimation between fixed-wing unmanned aerial vehicles (UAVs) is treated using a stable and robust estimation scheme. The motivating application of this scheme is that of "handoff" of an object being tracked from one fixed-wing UAV to another in a team of UAVs, using onboard sensors in a GPS-denied environment. This estimation scheme uses optical measurements from cameras onboard a vehicle, to estimate both the relative pose and relative velocities of another vehicle or target object. It is obtained by applying the Lagrange-d'Alembert principle to a Lagrangian constructed from measurement residuals using only the optical measurements. This nonlinear pose estimation scheme is discretized for computer implementation using the discrete Lagrange-d'Alembert principle, with a discrete-time linear filter for obtaining relative velocity estimates from optical measurements. Computer simulations depict the stability and robustness of this estimator to noisy measurements and uncertainties in initial relative pose and velocities.

preprint, 2013, arXiv

Output Synchronization of Nonlinear Systems under Input Disturbances

We study synchronization of nonlinear systems that satisfy an incremental passivity property. We consider the case where the control input is subject to a class of disturbances, including constant and sinusoidal disturbances with unknown phases and magnitudes and known frequencies. We design a distributed control law that recovers the synchronization of the nonlinear systems in the presence of the disturbances. Simulation results of Goodwin oscillators illustrate the effectiveness of the control law. Finally, we highlight the connection of the proposed control law to the dynamic average consensus estimator developed in [1].
