Researcher profile

Fangzhou Wu

Fangzhou Wu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 17 - UnverifiedVerification L1Unclaimed author
4works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

4 published item(s)

preprint2026arXiv

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

Query clustering organizes queries into groups that reflect shared latent capability demands, enabling capability-aware LLM evaluation. Existing clustering methods, which primarily rely on semantic taxonomies or embeddings, often fail to capture such latent capability requirements due to a misalignment between surface-level semantics and actual model performance. We propose ECC, an algorithm that calibrates prior semantic embeddings using limited posterior model comparisons to bridge the gap between surface-level semantics and latent capability requirements. ECC characterizes each cluster through a capability profile parameterized by a Bradley-Terry model and uses trainable mixture weights to accommodate queries with mixed capability demands, jointly learning a flexible, capability-aware clustering structure that supports query-specific inference of LLM capabilities. Extensive quantitative and qualitative evaluations demonstrate that ECC significantly improves LLM capability ranking quality, outperforming human-labeled and embedding-based baselines by an average of 17.64 and 18.02 percentage points, respectively, and proves effective in downstream tasks such as query routing.

preprint2026arXiv

DynMuon: A Dynamic Spectral Shaping View of Muon

In recent years, Muon has emerged as the dominant method for training large language models, and transformers more broadly. The essential difference, when compared to standard gradient descent methods, is to replace the usual update matrix $M=UΣV^\top$ with its polar factor $UV^\top$. In this work, we consider a class of Muon-like updates, where we replace the update $M$ with $UΣ^p V^\top$ for some parameter $p$. We call this a "spectral-shaping" operation, and develop a theory of how to pick $p$ which depends on (a) local curvature of the loss function, (b) noise stemming from stochastic gradients and label noise, and (c) training stage. Our theory and experimentation reveal a previously overlooked behavior: positive $p$ helps early by emphasizing high-curvature directions and accelerating signal contraction, while mildly negative $p$ helps later by reallocating update strength toward low-curvature directions that still contain useful training signals. Building on the insight, we propose DynMuon, an efficient dynamic spectral shaping method that schedules $p$ from positive to mildly negative over training. Extensive experiments across model sizes, architectures, and training settings show that DynMuon consistently achieves lower validation loss than Muon, while requiring 10.6-26.5% fewer steps to reach the same target loss.

preprint2022arXiv

Reliability and Latency Analysis of Sliding Network Coding With Re-Transmission

Future networks are expected to support various ultra-reliable low-latency communications via wireless links. To avoid the loss of packets and keep the low latency, sliding network coding (SNC) is an emerging technology by generating redundant packets that are the linear combination of the original data packets from the current block and some previous blocks. However, how to take the advantage of re-transmission for SNC is still an open problem since higher reliability could be achieved at the expense of large latency caused by round-trip time (RTT). To deal with this issue, in this paper, we consider the idea of adjusting the transmission phase and the number of the redundant packets for SNC with re-transmission. Specifically, If RTT is large, most of the redundant packets are sent at the first transmission, otherwise, re-transmission will be used. We first derive a concise and tight lower bound of the block error probability of SNC without re-transmission. Then, based on the bound, the theoretical expressions of the proposed re-transmission schemes are derived regarding the block error probability, the average code length, and the average packet latency. Results show that the proposed SNC with re-transmission improves block error probability and keeps the low latency.

preprint2022arXiv

Towards Efficient Data-Centric Robust Machine Learning with Noise-based Augmentation

The data-centric machine learning aims to find effective ways to build appropriate datasets which can improve the performance of AI models. In this paper, we mainly focus on designing an efficient data-centric scheme to improve robustness for models towards unforeseen malicious inputs in the black-box test settings. Specifically, we introduce a noised-based data augmentation method which is composed of Gaussian Noise, Salt-and-Pepper noise, and the PGD adversarial perturbations. The proposed method is built on lightweight algorithms and proved highly effective based on comprehensive evaluations, showing good efficiency on computation cost and robustness enhancement. In addition, we share our insights about the data-centric robust machine learning gained from our experiments.