Researcher profile

Huimin Xu

Huimin Xu contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust: 17 (Unverified)
Verification: L1
Unclaimed author
4 works
0 followers
5 topics
4 close collaborators

Published work

4 published item(s)

Preprint, 2026, arXiv

Understanding and Preventing Entropy Collapse in RLVR with On-Policy Entropy Flow Optimization

Reinforcement learning with verifiable rewards (RLVR) has become an effective paradigm for improving the reasoning ability of large language models. However, widely used RLVR algorithms, such as GRPO, often suffer from entropy collapse, leading to premature determinism and unstable optimization. Existing remedies, including entropy regularization and ratio-based clipping heuristics, either control entropy in a coarse-grained manner or rely on approximate on-policy training. In this paper, we revisit entropy collapse from a token-level entropy flow perspective. Our analysis reveals that entropy-decreasing tokens consistently outweigh entropy-increasing ones, resulting in a severely imbalanced entropy flow. This perspective provides a unified explanation of entropy collapse in existing RLVR algorithms and highlights the importance of balancing entropy dynamics. Motivated by this analysis, we propose On-Policy Entropy Flow Optimization (OPEFO), an adaptive entropy flow balancing mechanism that rescales entropy-increasing and entropy-decreasing updates according to their contributions to entropy change, while remaining strictly on-policy. Experiments on six mathematical reasoning benchmarks demonstrate that OPEFO improves training stability and final performance. We will release the code and models upon publication.
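
To make the token-level view in this abstract concrete, the following is a minimal sketch, not the OPEFO algorithm itself. It computes the Shannon entropy of a next-token distribution and tallies how much entropy rose or fell across tokens between two policy snapshots. The function names `token_entropy` and `entropy_flow`, and the toy distributions, are assumptions made for illustration only.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def entropy_flow(before, after):
    """Sum entropy increases and decreases separately across token positions.

    `before` and `after` are lists of next-token distributions for the same
    positions, taken before and after a policy update (hypothetical inputs).
    Returns (total_increase, total_decrease); the decrease is <= 0.
    """
    increase, decrease = 0.0, 0.0
    for p_old, p_new in zip(before, after):
        delta = token_entropy(p_new) - token_entropy(p_old)
        if delta > 0:
            increase += delta
        else:
            decrease += delta
    return increase, decrease

# Toy example: the first position sharpens, the second flattens.
before = [[0.25, 0.25, 0.25, 0.25], [0.9, 0.1]]
after = [[0.70, 0.10, 0.10, 0.10], [0.6, 0.4]]
inc, dec = entropy_flow(before, after)
print(f"increase={inc:.3f} decrease={dec:.3f} net={inc + dec:.3f}")
```

If the decrease regularly outweighs the increase over many updates, the net entropy shrinks, which is the imbalance the abstract describes.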

Preprint, 2022, arXiv

Team Power Dynamics and Team Impact: New Perspectives on Scientific Collaboration using Career Age as a Proxy for Team Power

Power dynamics influence every aspect of scientific collaboration. Team power dynamics can be measured by team power level and team power hierarchy. Team power level is conceptualized as the average level of possession of resources, expertise, or decision-making authority in a team. Team power hierarchy represents the vertical differences in the possession of resources in a team. In Science of Science, few studies have looked at scientific collaboration from the perspective of team power dynamics. To fill this gap, this research examines how team power dynamics affect team impact. In this research, all co-authors of one publication are treated as one team. Team power level and team power hierarchy of one team are measured by the mean and Gini index of the career age of co-authors in this team. Team impact is quantified by citations of a paper authored by this team. By analyzing over 7.7 million teams from Science (e.g., Computer Science, Physics), Social Sciences (e.g., Sociology, Library & Information Science), and Arts & Humanities (e.g., Art), we find that a flat team structure is associated with higher team impact, especially when teams have a high team power level. These findings are replicated in all five disciplines except Art, and are consistent across various types of teams from Computer Science, including teams from industry or academia, teams with different gender groups, teams with geographical contrast, and teams of different sizes.
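
The two measures named in this abstract, the mean and the Gini index of co-authors' career ages, can be sketched as follows. This is an illustrative computation using the standard Gini formula for sorted data; the helper names `gini` and `team_power` and the sample career ages are assumptions, and the paper's exact definition of career age is not given here.

```python
from statistics import mean

def gini(values):
    """Gini index of non-negative values: 0 means all values are equal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form for sorted data: sum_i (2i - n - 1) * x_i / (n * sum(x))
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

def team_power(career_ages):
    """Return (power level, power hierarchy) for one team of co-authors."""
    return mean(career_ages), gini(career_ages)

# Hypothetical teams: career ages in years for each co-author.
flat_team = [12, 12, 12]
steep_team = [2, 10, 30]
print(team_power(flat_team))   # hierarchy of 0.0: a perfectly flat team
print(team_power(steep_team))  # larger hierarchy: seniority is concentrated
```

A lower Gini value corresponds to the flat structure that the abstract associates with higher team impact.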

Preprint, 2020, arXiv

Relational Reflection Entity Alignment

Entity alignment aims to identify equivalent entity pairs from different Knowledge Graphs (KGs), which is essential in integrating multi-source KGs. Recently, with the introduction of GNNs into entity alignment, the architectures of recent models have become increasingly complicated. We even find two counter-intuitive phenomena within these methods: (1) The standard linear transformation in GNNs does not work well. (2) Many advanced KG embedding models designed for the link prediction task perform poorly in entity alignment. In this paper, we abstract existing entity alignment methods into a unified framework, Shape-Builder & Alignment, which not only successfully explains the above phenomena but also derives two key criteria for an ideal transformation operation. Furthermore, we propose a novel GNN-based method, Relational Reflection Entity Alignment (RREA). RREA leverages Relational Reflection Transformation to obtain relation-specific embeddings for each entity in a more efficient way. The experimental results on real-world datasets show that our model significantly outperforms the state-of-the-art methods, exceeding them by 5.8%-10.9% on Hits@1.
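
The Hits@1 metric cited in this abstract is the fraction of source entities whose top-ranked candidate is the correct counterpart. The sketch below shows one common way to compute Hits@k from ranked candidate lists. The function name `hits_at_k` and the toy entity identifiers are assumptions for illustration and do not come from the paper's code.

```python
def hits_at_k(ranked_candidates, gold, k=1):
    """Fraction of source entities whose gold match is in the top-k candidates.

    ranked_candidates: dict mapping a source entity to its candidate target
        entities, ordered from most to least similar.
    gold: dict mapping each source entity to its true counterpart.
    """
    hits = sum(
        1 for src, tgt in gold.items()
        if tgt in ranked_candidates.get(src, [])[:k]
    )
    return hits / len(gold)

# Hypothetical alignment output for three entities across two KGs.
ranked = {
    "kg1:Paris": ["kg2:Paris", "kg2:Lyon"],
    "kg1:Berlin": ["kg2:Bonn", "kg2:Berlin"],
    "kg1:Rome": ["kg2:Rome", "kg2:Milan"],
}
gold = {
    "kg1:Paris": "kg2:Paris",
    "kg1:Berlin": "kg2:Berlin",
    "kg1:Rome": "kg2:Rome",
}
print(hits_at_k(ranked, gold, k=1))
print(hits_at_k(ranked, gold, k=2))
```

In this toy case two of three entities are matched at rank 1, while all three are matched within the top 2.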