Researcher profile

Tao Yao

Tao Yao contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified
Verification L1
Unclaimed author
4 works
0 followers
7 topics
4 close collaborators

Published work

4 published items

preprint, 2026, arXiv

MISA: Memory-Efficient LLMs Optimization with Module-wise Importance Sampling

The substantial memory demands of pre-training and fine-tuning large language models (LLMs) require memory-efficient optimization algorithms. One promising approach is layer-wise optimization, which treats each transformer block as a single layer and optimizes it sequentially, while freezing the other layers to save optimizer states and activations. Although effective, these methods ignore the varying importance of the modules within each layer, leading to suboptimal performance. Moreover, layer-wise sampling provides only limited memory savings, as at least one full layer must remain active during optimization. To overcome these limitations, we propose Module-wise Importance SAmpling (MISA), a novel method that divides each layer into smaller modules and assigns importance scores to each module. MISA uses a weighted random sampling mechanism to activate modules, provably reducing gradient variance compared to layer-wise sampling. Additionally, we establish an $\mathcal{O}(1/\sqrt{K})$ convergence rate under non-convex and stochastic conditions, where $K$ is the total number of block updates, and provide a detailed memory analysis showcasing MISA's superiority over existing baseline methods. Experiments on diverse learning tasks validate the effectiveness of MISA. Source code is available at https://github.com/pkumelon/MISA.
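The abstract describes activating modules by weighted random sampling over importance scores. The following is a minimal Python sketch of that general idea only, not the authors' implementation: the module names, the score values, the proportional-to-score probabilities, and the helper functions `sampling_probabilities` and `sample_active_modules` are all assumptions made for illustration. The abstract does not say how MISA computes scores or turns them into sampling probabilities.

```python
import random


def sampling_probabilities(scores):
    """Normalize positive importance scores into a probability distribution.

    Assumption: probability is proportional to score. The abstract does not
    specify the actual mapping used by MISA.
    """
    if any(s <= 0 for s in scores.values()):
        raise ValueError("all importance scores must be positive")
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


def sample_active_modules(scores, k, rng):
    """Draw up to k distinct module names, weighted by score, without replacement."""
    if any(s <= 0 for s in scores.values()):
        raise ValueError("all importance scores must be positive")
    remaining = dict(scores)
    chosen = []
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        weights = [remaining[n] for n in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # a module is activated at most once per draw
    return chosen


# Hypothetical modules inside one transformer block, with made-up scores.
scores = {
    "attn.q": 0.5,
    "attn.k": 0.5,
    "attn.v": 2.0,
    "attn.o": 1.0,
    "mlp.up": 3.0,
    "mlp.down": 3.0,
}

rng = random.Random(0)  # seeded so repeated runs draw the same modules
probs = sampling_probabilities(scores)
active = sample_active_modules(scores, k=2, rng=rng)

# Everything not sampled stays frozen: in a real training loop these modules
# would keep no optimizer state and receive no update in this step.
frozen = [name for name in scores if name not in active]

print("sampling probabilities:", probs)
print("active modules:", active)
print("frozen modules:", frozen)
```

Sampling at module granularity means fewer parameters than a full layer can be active at once, which is the memory argument the abstract makes against layer-wise sampling.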

preprint, 2020, arXiv

Spectrum and rearrangement decays of tetraquark states with four different flavors

We have systematically investigated the mass spectrum and rearrangement decay properties of the exotic tetraquark states with four different flavors using a color-magnetic interaction model. Their masses are estimated by assuming that the $X(4140)$ is a $cs\bar{c}\bar{s}$ tetraquark state, and their decay widths are obtained by assuming that the Hamiltonian for decay is a constant. According to the adopted method, we find that the most stable states are probably the isoscalar $bs\bar{u}\bar{d}$ and $cs\bar{u}\bar{d}$ with $J^P=0^+$ and $1^+$. The widths of most unstable tetraquarks are tens of MeV, but those of the unstable $cu\bar{s}\bar{d}$ and $cs\bar{u}\bar{d}$ can be around 100 MeV. For the $X(5568)$, our method cannot give a consistent mass and width if it is a $bu\bar{s}\bar{d}$ tetraquark state. For the $I(J^P)=0(0^+),0(1^+)$ double-heavy $T_{bc}=bc\bar{u}\bar{d}$ states, the widths can be several MeV.

preprint, 2019, arXiv

Exclusive Production Ratio of Neutral over Charged Kaon Pair in $e^+e^-$ Annihilation Continuum via `Straton Model'

A completely relativistic quark model in the Bethe-Salpeter framework is employed to calculate the exclusive production ratio of the neutral over charged Kaon pair in the $e^+e^-$ annihilation continuum region for center-of-mass energies smaller than the $J/\psi$ mass. The valence quark charge plays the key rôle. The cancellation of the diagrams for the same charge case (in $K_S + K_L$) and the non-cancellation of the diagrams for the different charge case (in $K^-+K^+$) lead to a ratio of $(m_s-m_d)^2/M_{Kaon}^2 \sim 1/10$.

preprint, 2010, arXiv

Unitarity and Entropy Change in Exclusive Quark Combination Models

Entropy change in exclusive quark combination models is not an isolated problem. Rather than adding and tuning parameters in the relevant model(s) to fix the entropy, we show that the problem relates to the most general principles. Unitarity of the combination model is demonstrated to play the central rôle, guaranteeing that the entropy does not decrease in the exclusive combination process.