Researcher profile

Zhenpeng Su

Zhenpeng Su contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2026arXiv

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

We present GoLongRL, a fully open-source, capability-oriented post-training recipe for long-context reinforcement learning with verifiable rewards (RLVR). Existing long-context RL methods often treat data construction as a matter of designing increasingly complex retrieval paths, leading to homogeneous task coverage and reward formulations that inadequately reflect practical long-context requirements. Our work offers two contributions. (1) Capability-oriented data construction with full open release. We openly release a dataset of 23K RLVR samples, the complete construction pipeline, and all training code. Guided by a taxonomy of long-context capabilities, the dataset spans 9 task types, each paired with its natural evaluation metric. It comprises curated open-source samples from established corpora and synthetic samples whose QA pairs are generated from real source documents such as books, academic papers, and multi-turn dialogues. Under the same vanilla GRPO setup, our dataset alone outperforms the closed-source QwenLong-L1.5 dataset. Moreover, our Qwen3-30B-A3B model trained on this data delivers long-context performance comparable to DeepSeek-R1-0528 and Qwen3-235B-A22B-Thinking-2507, suggesting that broader coverage and greater reward diversity substantially benefit long-context capability improvement. (2) TMN-Reweight for heterogeneous multitask optimization. To address optimization challenges from heterogeneous rewards, we propose TMN-Reweight, which combines task-level mean normalization for cross-task reward scale alignment with difficulty-adaptive weighting for more reliable advantage estimation. TMN-Reweight further improves average performance over vanilla GRPO, with general capabilities preserved or improved across reported evaluations.

preprint2026arXiv

MiLe Loss: a New Entropy-Weighed Loss for Mitigating the Bias of Learning Difficulties in Large Language Models

Generative language models are usually pretrained on large text corpus via predicting the next token (i.e., sub-word/word/phrase) given the previous ones. Recent works have demonstrated the impressive performance of large generative language models on downstream tasks. However, existing generative language models generally neglect an inherent challenge in text corpus during training, i.e., the imbalance between frequent tokens and infrequent ones. It can lead a language model to be dominated by common and easy-to-learn tokens, thereby overlooking the infrequent and difficult-to-learn ones. To alleviate that, we propose a MiLe Loss function for mitigating the bias of learning difficulties with tokens. During training, it can dynamically assess the learning difficulty of a to-be-learned token, according to the information entropy of the corresponding predicted probability distribution over the vocabulary. Then it scales the training loss adaptively, trying to lead the model to focus more on the difficult-to-learn tokens. On the Pile dataset, we train generative language models at different scales of 468M, 1.2B, and 6.7B parameters. Experiments reveal that models incorporating the proposed MiLe Loss can gain consistent performance improvement on downstream benchmarks.

preprint2023arXiv

The Mars Orbiter Magnetometer of Tianwen-1: In-flight Performance and First Science Results

Mars Orbiter MAGnetometer (MOMAG) is a scientifc instrument onboard the orbiter of China's first mission for Mars -- Tianwen-1. It started to routinely measure the magnetic field from the solar wind to magnetic pile-up region surrounding Mars since November 13, 2021. Here we present its in-flight performance and first science results based on the first one and a half months' data. By comparing with the magnetic field data in the solar wind from the Mars Atmosphere and Volatile EvolutioN (MAVEN), the magnetic field by MOMAG is at the same level in magnitude, and the same magnetic structures with the similar variations in three components could be found in MOMAG data. In the first one and a half months, we recognize 158 clear bow shock (BS) crossings from MOMAG data, whose locations statistically match well with the modeled average BS. We also identify 5 pairs of simultaneous BS crossings of the Tianwen-1's orbiter and MAVEN. These BS crossings confirm the global shape of modeled BS as well as the south-north asymmetry of the Martian BS. Two presented cases in this paper suggest that the BS is probably more dynamic at flank than near the nose. So far, MOMAG performs well, and provides accurate magnetic field vectors. MOMAG is continuously scanning the magnetic field surrounding Mars. These measurements complemented by observations from MAVEN will undoubtedly advance our understanding of the plasma environment of Mars.