Source author record

Haibo Yang

Haibo Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Machine Learning; physics.ins-det; Distributed, Parallel, and Cluster Computing; Artificial Intelligence; eess.SP; hep-ex; Information Theory; math-ph; math.AP; math.IT; math.MP; nucl-ex

Catalog footprint

What is connected

10 works

12 topics

4 close collaborators


preprint, 2026, arXiv

Learning Dynamics of Zeroth-Order Optimization: A Kernel Perspective

Classical optimization theory establishes that zeroth-order (ZO) algorithms suffer from a dimension-dependent slowdown, with convergence rates typically scaling with the model dimension compared to first-order methods. However, in contrast to these theoretical expectations, a growing body of recent work demonstrates the successful application of ZO methods to fine-tuning Large Language Models (LLMs) with billions of parameters. To explain this paradox, we derive the one-step learning dynamics of ZO SGD, where the empirical Neural Tangent Kernel (eNTK) naturally emerges as the key term governing the learning behavior. Inspection of the eNTK produced by ZO SGD reveals that each element corresponds to the inner product of neural tangent vectors projected onto a random low-dimensional subspace. Thus, by invoking the Johnson-Lindenstrauss Lemma, our analysis shows that the fidelity of the ZO eNTK is governed primarily by the number of perturbations. Crucially, the approximation error depends on the model output size rather than the massive parameter dimension. This dimension-free property provides a theoretical justification for the scalability of ZO methods to LLM fine-tuning tasks. We believe that this kernel-based framework offers a novel perspective for understanding ZO methods within the context of learning dynamics.
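The projection argument in this abstract can be illustrated with a small numerical sketch. It is not the paper's estimator: the Gaussian perturbation directions, the 1/sqrt(q) scaling, and the names `projected_inner_products` and `rel_error` are assumptions made for illustration. The sketch compares exact inner products of stand-in "tangent vectors" with inner products computed after projecting them onto q random directions, and shows the error shrinking as q grows.

```python
import numpy as np

def projected_inner_products(G, q, rng):
    """Estimate G @ G.T from q random directions.

    G is an (n, d) array of stand-in tangent vectors. Each vector is
    projected onto q Gaussian directions (an assumed perturbation model),
    and inner products are taken in the q-dimensional projected space.
    """
    d = G.shape[1]
    U = rng.standard_normal((d, q))   # q random perturbation directions
    P = G @ U / np.sqrt(q)            # (n, q) projected vectors
    return P @ P.T

rng = np.random.default_rng(0)
G = rng.standard_normal((5, 2000))    # 5 vectors in a 2000-dimensional space
exact = G @ G.T

def rel_error(q):
    """Relative Frobenius error of the projected kernel for a given q."""
    approx = projected_inner_products(G, q, rng)
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)

err_few = rel_error(8)       # few perturbations: coarse approximation
err_many = rel_error(2048)   # many perturbations: much closer to exact
print(f"q=8: {err_few:.3f}, q=2048: {err_many:.3f}")
```

In this toy setting the error is controlled by q, the number of random directions, which mirrors the role the abstract assigns to the number of perturbations.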

preprint, 2024, arXiv

Federated Multi-Objective Learning

In recent years, multi-objective optimization (MOO) has emerged as a foundational problem underpinning many multi-agent multi-task learning applications. However, existing algorithms in the MOO literature remain limited to centralized learning settings, which do not satisfy the distributed nature and data privacy needs of such multi-agent multi-task learning applications. This motivates us to propose a new federated multi-objective learning (FMOL) framework with multiple clients distributively and collaboratively solving an MOO problem while keeping their training data private. Notably, our FMOL framework allows a different set of objective functions across different clients to support a wide range of applications, which advances and generalizes the MOO formulation to the federated learning paradigm for the first time. For this FMOL framework, we propose two new federated multi-objective optimization (FMOO) algorithms called federated multi-gradient descent averaging (FMGDA) and federated stochastic multi-gradient descent averaging (FSMGDA). Both algorithms allow local updates to significantly reduce communication costs, while achieving the same convergence rates as those of their algorithmic counterparts in single-objective federated learning. Our extensive experiments also corroborate the efficacy of our proposed FMOO algorithms.

preprint, 2022, arXiv

Anarchic Federated Learning

Present-day federated learning (FL) systems deployed over edge networks consist of a large number of workers with high degrees of heterogeneity in data and/or computing capabilities, which calls for flexible worker participation in terms of timing, effort, data heterogeneity, etc. To satisfy the need for flexible worker participation, we consider a new FL paradigm called "Anarchic Federated Learning" (AFL) in this paper. In stark contrast to conventional FL models, each worker in AFL has the freedom to choose i) when to participate in FL, and ii) the number of local steps to perform in each round based on its current situation (e.g., battery level, communication channels, privacy concerns). However, such chaotic worker behaviors in AFL raise many new open questions in algorithm design. In particular, it remains unclear whether one could develop convergent AFL training algorithms, and if so, under what conditions and how fast they converge. Toward this end, we propose two Anarchic Federated Averaging (AFA) algorithms with two-sided learning rates for both cross-device and cross-silo settings, which are named AFA-CD and AFA-CS, respectively. Somewhat surprisingly, we show that, under mild anarchic assumptions, both AFL algorithms achieve the same best-known convergence rate as state-of-the-art algorithms for conventional FL. Moreover, they retain the highly desirable linear speedup effect with respect to both the number of workers and local steps in the new AFL paradigm. We validate the proposed algorithms with extensive experiments on real-world datasets.
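The participation model described in this abstract can be sketched on a toy problem. This is not AFA-CD or AFA-CS: the quadratic worker objectives, the random participation rule, and the names `anarchic_round`, `p_join` and `max_steps` are all assumptions chosen for illustration. The sketch only shows the two freedoms the abstract names (when to join, how many local steps to run) together with a local and a server learning rate.

```python
import numpy as np

def anarchic_round(x, centers, rng, local_lr=0.1, server_lr=1.0,
                   max_steps=5, p_join=0.5):
    """One toy round with worker-chosen participation and local effort.

    Worker i holds the objective 0.5 * ||x - centers[i]||^2 (an assumed
    toy objective). Each worker joins with probability p_join and runs
    its own number of local gradient steps.
    """
    joined = np.flatnonzero(rng.random(len(centers)) < p_join)
    if joined.size == 0:                        # nobody joined this round
        return x
    deltas = []
    for i in joined:
        steps = rng.integers(1, max_steps + 1)  # worker-chosen local steps
        x_local = x.copy()
        for _ in range(steps):
            grad = x_local - centers[i]         # gradient of the toy objective
            x_local -= local_lr * grad          # local learning rate
        deltas.append(x_local - x)
    # The server applies its own learning rate to the averaged difference.
    return x + server_lr * np.mean(deltas, axis=0)

rng = np.random.default_rng(1)
centers = rng.standard_normal((10, 3))          # one toy objective per worker
target = centers.mean(axis=0)                   # minimizer of the average objective
x = np.full(3, 10.0)
start_dist = np.linalg.norm(x - target)
for _ in range(200):
    x = anarchic_round(x, centers, rng)
print(start_dist, np.linalg.norm(x - target))
```

Because workers run different numbers of local steps, the iterate in this naive sketch hovers near the minimizer of the average objective rather than converging to it exactly; the abstract's convergence guarantees concern the authors' algorithms, not this simplification.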

preprint, 2022, arXiv

CHARLES: Channel-Quality-Adaptive Over-the-Air Federated Learning over Wireless Networks

Over-the-air federated learning (OTA-FL) has emerged as an efficient mechanism that exploits the superposition property of the wireless medium and performs model aggregation for federated learning in the air. OTA-FL is naturally sensitive to wireless channel fading, which could significantly diminish its learning accuracy. To address this challenge, in this paper, we propose an OTA-FL algorithm called CHARLES (channel-quality-aware over-the-air local estimating and scaling). Our CHARLES algorithm performs channel state information (CSI) estimation and adaptive scaling to mitigate the impacts of wireless channel fading. We establish the theoretical convergence rate performance of CHARLES and analyze the impacts of CSI error on the convergence of CHARLES. We show that the adaptive channel inversion scaling scheme in CHARLES is robust under imperfect CSI scenarios. We also demonstrate through numerical results that CHARLES outperforms existing OTA-FL algorithms with heterogeneous data under imperfect CSI.
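The aggregation mechanism described in this abstract can be sketched as follows. This is a simplified illustration, not the CHARLES algorithm: real-valued channel gains, additive Gaussian receiver noise, the absence of any power constraint, and the name `ota_aggregate` are all assumptions. Each device pre-scales its update by the inverse of its estimated channel gain, the channel sums the transmitted signals, and the server divides by the number of devices.

```python
import numpy as np

def ota_aggregate(updates, true_gains, est_gains, noise_std, rng):
    """Toy over-the-air averaging with channel-inversion pre-scaling.

    updates:    (m, d) array, one model update per device
    true_gains: (m,) actual channel gains (assumed real and positive)
    est_gains:  (m,) gains the devices believe they have (CSI estimates)
    """
    updates = np.asarray(updates, dtype=float)
    h = np.asarray(true_gains, dtype=float)
    h_hat = np.asarray(est_gains, dtype=float)
    tx = updates / h_hat[:, None]               # invert the estimated channel
    # Superposition: the channel adds up all faded signals, plus noise.
    rx = (h[:, None] * tx).sum(axis=0)
    rx += noise_std * rng.standard_normal(updates.shape[1])
    return rx / len(h)                          # server-side averaging

rng = np.random.default_rng(2)
updates = rng.standard_normal((4, 6))
gains = np.array([0.5, 1.0, 1.5, 2.0])

# Perfect CSI and no noise: the aggregate equals the plain average.
ideal = ota_aggregate(updates, gains, gains, 0.0, rng)

# Imperfect CSI: a 10% estimation error distorts each device's contribution.
noisy_csi = ota_aggregate(updates, gains, 1.1 * gains, 0.0, rng)
print(np.abs(ideal - updates.mean(axis=0)).max())
print(np.abs(noisy_csi - updates.mean(axis=0)).max())
```

With mismatched estimates, each device's contribution is weighted by the ratio of true to estimated gain, which is one simple way to see why CSI error enters the convergence analysis.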

preprint, 2022, arXiv

NET-FLEET: Achieving Linear Convergence Speedup for Fully Decentralized Federated Learning with Heterogeneous Data

Federated learning (FL) has received a surge of interest in recent years thanks to its benefits in data privacy protection, efficient communication, and parallel data processing. Also, with appropriate algorithmic designs, one could achieve the desirable linear speedup effect for convergence in FL. However, most existing works on FL are limited to systems with i.i.d. data and centralized parameter servers, and results on decentralized FL with heterogeneous datasets remain limited. Moreover, whether or not the linear speedup for convergence is achievable under fully decentralized FL with data heterogeneity remains an open question. In this paper, we address these challenges by proposing a new algorithm, called NET-FLEET, for fully decentralized FL systems with data heterogeneity. The key idea of our algorithm is to enhance the local update scheme in FL (originally intended for communication efficiency) by incorporating a recursive gradient correction technique to handle heterogeneous datasets. We show that, under appropriate parameter settings, the proposed NET-FLEET algorithm achieves a linear speedup for convergence. We further conduct extensive numerical experiments to evaluate the performance of the proposed NET-FLEET algorithm and verify our theoretical findings.

preprint, 2022, arXiv

Over-the-Air Federated Learning with Joint Adaptive Computation and Power Control

This paper considers over-the-air federated learning (OTA-FL). OTA-FL exploits the superposition property of the wireless medium, and performs model aggregation over the air for free. Thus, it can greatly reduce the communication cost incurred in communicating model updates from the edge devices. In order to fully utilize this advantage while providing comparable learning performance to conventional federated learning that presumes model aggregation via noiseless channels, we consider the joint design of transmission scaling and the number of local iterations at each round, given the power constraint at each edge device. We first characterize the training error due to such channel noise in OTA-FL by establishing a fundamental lower bound for general functions with Lipschitz-continuous gradients. Then, by introducing an adaptive transceiver power scaling scheme, we propose an over-the-air federated learning algorithm with joint adaptive computation and power control (ACPC-OTA-FL). We provide the convergence analysis for ACPC-OTA-FL in training with non-convex objective functions and heterogeneous data. We show that the convergence rate of ACPC-OTA-FL matches that of FL with noise-free communications.

preprint, 2022, arXiv

Topmetal-M: a novel pixel sensor for compact tracking applications

The Topmetal-M is a large-area pixel sensor (18 mm × 23 mm) prototype fabricated in a new 130 nm high-resistivity CMOS process in 2019. It contains 400 rows × 512 columns of square pixels with a pitch of 40 μm. In Topmetal-M, a novel charge collection method combining the Monolithic Active Pixel Sensor (MAPS) and the Topmetal sensor is proposed for the first time. Both the ionized charge deposited by the particle in the sensor and the charge along the track over the sensor can be collected. The in-pixel circuit mainly consists of a low-noise charge-sensitive amplifier to establish the signal for energy reconstruction, and a discriminator with a Time-to-Amplitude Converter (TAC) for the Time of Arrival (TOA) measurement. With this mechanism, the trajectory, hit position, energy and arrival time of the particle can be measured. The analog signal from each pixel is accessible through time-shared multiplexing over the entire pixel array. This paper discusses the design and preliminary test results of the Topmetal-M sensor.

preprint, 2020, arXiv

Development of readout electronics a novel beam monitoring system for ion research facility accelerator

This article presents the readout electronics of a novel beam monitoring system for an ion research facility accelerator. The readout electronics are divided into a Front-end Card (FEC) and a Readout Control Unit (RCU). The FEC uses Topmetal II minus to process the energy of the hitting particles and convert it into a voltage signal. The main function of the RCU is to digitize the analog output signal of the FEC and format the raw data. The RCU also processes the control commands from the host and distributes them according to the mapping. The readout electronics have been characterized and calibrated in the laboratory and installed with the detector. The implementation and testing of the readout electronics are discussed.

preprint, 2020, arXiv

Ill-posedness of Naiver-Stokes equations and critical Besov-Morrey spaces

This paper proves the blow-up phenomenon in the first step of the classical Picard scheme. For certain initial spaces, Bourgain-Pavlović and Yoneda proved the ill-posedness of the Navier-Stokes equations by showing norm inflation in certain solution spaces. However, Chemin and Gallagher noted that the space $\dot{B}^{-1,\infty}_{\infty}$ seems to be optimal for some suitably chosen solution spaces. In this paper, we consider more general initial spaces than Bourgain-Pavlović and Yoneda did and establish an ill-posedness result independent of the choice of solution space. Our result complements the previous ill-posedness results on the Navier-Stokes equations.

preprint, 2014, arXiv

Application of the DRS4 Chip for GHz Waveform Digitizing Circuit

Fast waveform digitizing circuits are increasingly employed in modern physics experiments to process the signals from array detectors. This paper presents a new fast waveform sampling and digitizing circuit that we developed. Unlike traditional waveform digitizing circuits built from analog-to-digital converters (ADCs) or time-to-digital converters (TDCs), it is based on the domino ring sampler (DRS), a switched capacitor array (SCA) chip. A DRS4 chip is used as the core device in our circuit, with a sampling rate of up to five giga-samples per second (GSPS). The circuit offers high resolution, low cost, low power dissipation, high channel density and small size. Preliminary performance tests of the circuit board gave satisfactory results. One board provides eight channels, each with a 1-volt input dynamic range. The circuit linearity is better than 0.1%, the noise is less than 0.5 mV (root mean square, RMS), and the time resolution is about 50 ps. Several boards can be cascaded to construct a multi-board system. This performance makes the circuit board suitable not only for physics experiments, but also for other applications.
