Researcher profile

Zhibin Zhang

Zhibin Zhang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified · Verification L1 · Unclaimed author
3 works
0 followers
2 topics
4 close collaborators

Published work

3 published items

Preprint · 2026 · arXiv

EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer

Large language models (LLMs) have achieved remarkable performance across diverse domains, yet their enormous computational and memory requirements hinder deployment in resource-constrained environments. Knowledge distillation offers a promising solution by transferring knowledge from a large teacher model to a smaller student model. However, existing distillation methods typically treat all tokens equally, ignoring the fact that different tokens contribute unequally to model decisions. This can lead to inefficient knowledge transfer and reduced learning effectiveness. To address this limitation, we propose an entropy-based adaptive distillation strategy that dynamically adjusts the training process at the token level. Our method leverages the teacher's output entropy to guide three aspects of distillation. Specifically, we introduce a token-level curriculum by dynamically shifting focus from low- to high-entropy tokens during training. We further adjust the distillation temperature based on token entropy to better capture teacher confidence patterns. Moreover, we employ a dual-branch architecture for efficient logits-only distillation on easy tokens and deeper feature-based distillation on difficult tokens. Extensive experiments validate the soundness and effectiveness of our method.
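
The abstract names three entropy-driven mechanisms but gives no formulas, so the following is only a minimal sketch of the first two (a token-level curriculum and an entropy-dependent temperature) for a single token. The linear temperature rule, the linear curriculum schedule, the `t_min`/`t_max` defaults and all function names are assumptions made for illustration, not the paper's definitions. The dual-branch feature distillation is not shown.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Shannon entropy divided by log(vocabulary size), so it lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def token_temperature(entropy, t_min=1.0, t_max=4.0):
    """Assumed rule: interpolate the temperature linearly with teacher entropy."""
    return t_min + (t_max - t_min) * entropy

def curriculum_weight(entropy, progress):
    """Assumed schedule: progress=0 favors low-entropy tokens,
    progress=1 favors high-entropy tokens."""
    return (1.0 - progress) * (1.0 - entropy) + progress * entropy

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def token_distill_loss(teacher_logits, student_logits, progress):
    """Entropy-weighted, entropy-tempered KL loss for one token position."""
    entropy = normalized_entropy(softmax(teacher_logits))
    temp = token_temperature(entropy)
    p = softmax(teacher_logits, temp)
    q = softmax(student_logits, temp)
    return curriculum_weight(entropy, progress) * kl_divergence(p, q)
```

Here `progress` is a hypothetical training-progress fraction in [0, 1]. Early in training a confident (low-entropy) teacher token receives most of the weight, and the weight shifts toward uncertain tokens as `progress` approaches 1.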

Preprint · 2026 · arXiv

MI-PRUN: Optimize Large Language Model Pruning via Mutual Information

Large Language Models (LLMs) have become indispensable across various domains, but this comes at the cost of substantial computational and memory resources. Model pruning addresses this by removing redundant components from models. In particular, block pruning can achieve significant compression and inference acceleration. However, existing block pruning methods are often unstable and struggle to attain globally optimal solutions. In this paper, we propose MI-PRUN, a mutual-information-based pruning method for LLMs. Specifically, we leverage mutual information to identify redundant blocks by evaluating transitions in hidden states. Additionally, we incorporate the Data Processing Inequality (DPI) to reveal the relationship between the importance of entire contiguous blocks and that of individual blocks. Moreover, we develop the Fast-Block-Select algorithm, which iteratively updates block combinations to achieve a globally optimal solution while significantly improving efficiency. Extensive experiments across various models and datasets demonstrate the stability and effectiveness of our method.
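
As a rough illustration of scoring blocks by mutual information between hidden states, the sketch below uses a plug-in estimator on discretized states. It assumes the hidden states have already been quantized into discrete symbols, and that a block whose output retains nearly all the information in its input is the more redundant one. Both assumptions, and the function names, are hypothetical. This is not the paper's estimator, and the Fast-Block-Select algorithm is not reproduced here.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) )
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi

def rank_blocks_by_redundancy(states):
    """states[i] holds the discretized hidden states entering block i;
    states[i + 1] holds the states leaving it.
    Returns block indices, most redundant first (assumed: highest MI first)."""
    scores = [
        mutual_information(states[i], states[i + 1])
        for i in range(len(states) - 1)
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With real-valued activations, the discretization step (binning, clustering or a continuous estimator) has a large effect on the estimate and would need to be chosen with care.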

Preprint · 2022 · arXiv

The Statistical Similarity of Repeating and Non-Repeating Fast Radio Bursts

In this paper, we present a sample of 21 repeating fast radio bursts (FRBs) detected by different radio instruments before September 2021. Using the Anderson--Darling test, we compared the distributions of extra-Galactic dispersion measure ($DM_{\rm E}$) of non-repeating FRBs, repeating FRBs and all FRBs. It was found that the $DM_{\rm E}$ values of the three sub-samples are log-normally distributed. The $DM_{\rm E}$ values of repeaters and non-repeaters were drawn from different distributions on the basis of the Mann--Whitney--Wilcoxon test. In addition, assuming that the non-repeating FRBs identified currently may potentially be repeaters, i.e., that the repeating FRBs are universal and representative, one can utilize the averaged fluence of repeating FRBs as an indication from which to derive an apparent intensity distribution function (IDF) with a power-law index of $a_1 = 1.10 \pm 0.14$ ($a_2 = 1.01 \pm 0.16$, with the observed fluence as a statistical variant), which is in good agreement with the previous IDF of 16 non-repeating FRBs found by Li et al. Based on the above statistics of repeating and non-repeating FRBs, we propose that the two types of FRBs may have different cosmological origins, spatial distributions and circum-burst environments. Interestingly, the differential luminosity distributions of repeating and non-repeating FRBs can also be well described by a broken power-law function with the same power-law index of $-1.4$.
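
For readers unfamiliar with the Mann--Whitney--Wilcoxon test mentioned above, here is a minimal standard-library sketch of the two-sample U statistic with a normal-approximation p-value. It applies no tie or continuity correction, so it is only reasonable for moderately sized samples with few ties. The function name is hypothetical, and this is a generic illustration of the test rather than the authors' analysis pipeline. No FRB data are included.

```python
import math
from statistics import NormalDist

def mann_whitney_u(x, y):
    """Return (U, p) for samples x and y.

    U counts the pairs (xi, yj) with xi > yj, counting ties as one half.
    p is a two-sided p-value from the normal approximation, without tie
    or continuity correction.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    n, m = len(x), len(y)
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u, p
```

A small p-value suggests that the two samples come from different distributions. Because the test depends only on ranks, it gives the same result on $DM_{\rm E}$ values as on their logarithms.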