Source author record

Zhenpeng Su

Zhenpeng Su appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Computation and Language, astro-ph.EP, astro-ph.SR, physics.space-ph

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

Preprint, 2026, arXiv

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

We present GoLongRL, a fully open-source, capability-oriented post-training recipe for long-context reinforcement learning with verifiable rewards (RLVR). Existing long-context RL methods often treat data construction as a matter of designing increasingly complex retrieval paths, leading to homogeneous task coverage and reward formulations that inadequately reflect practical long-context requirements. Our work offers two contributions. (1) Capability-oriented data construction with full open release. We openly release a dataset of 23K RLVR samples, the complete construction pipeline, and all training code. Guided by a taxonomy of long-context capabilities, the dataset spans 9 task types, each paired with its natural evaluation metric. It comprises curated open-source samples from established corpora and synthetic samples whose QA pairs are generated from real source documents such as books, academic papers, and multi-turn dialogues. Under the same vanilla GRPO setup, our dataset alone outperforms the closed-source QwenLong-L1.5 dataset. Moreover, our Qwen3-30B-A3B model trained on this data delivers long-context performance comparable to DeepSeek-R1-0528 and Qwen3-235B-A22B-Thinking-2507, suggesting that broader coverage and greater reward diversity substantially benefit long-context capability improvement. (2) TMN-Reweight for heterogeneous multitask optimization. To address optimization challenges from heterogeneous rewards, we propose TMN-Reweight, which combines task-level mean normalization for cross-task reward scale alignment with difficulty-adaptive weighting for more reliable advantage estimation. TMN-Reweight further improves average performance over vanilla GRPO, with general capabilities preserved or improved across reported evaluations.

Preprint, 2026, arXiv

MiLe Loss: a New Entropy-Weighed Loss for Mitigating the Bias of Learning Difficulties in Large Language Models

Generative language models are usually pretrained on large text corpora by predicting the next token (i.e., sub-word/word/phrase) given the previous ones. Recent works have demonstrated the impressive performance of large generative language models on downstream tasks. However, existing generative language models generally neglect an inherent challenge in text corpora during training, i.e., the imbalance between frequent tokens and infrequent ones. This imbalance can lead a language model to be dominated by common and easy-to-learn tokens, thereby overlooking the infrequent and difficult-to-learn ones. To alleviate this, we propose the MiLe Loss function for mitigating the bias of learning difficulties across tokens. During training, it dynamically assesses the learning difficulty of a to-be-learned token according to the information entropy of the corresponding predicted probability distribution over the vocabulary. It then scales the training loss adaptively, leading the model to focus more on the difficult-to-learn tokens. On the Pile dataset, we train generative language models at scales of 468M, 1.2B, and 6.7B parameters. Experiments reveal that models incorporating the proposed MiLe Loss achieve consistent performance improvements on downstream benchmarks.
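
The abstract describes the mechanism only in words, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It weights each token's cross-entropy by the entropy of the predicted distribution raised to a power. The function names, the exponent `gamma`, and the exact form of the weight are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weighted_loss(logits_per_token, targets, gamma=1.0):
    """Mean cross-entropy where each token's loss is scaled by entropy ** gamma.

    ASSUMPTION: the weight form `entropy ** gamma` is illustrative only; the
    abstract states that the loss is scaled according to the entropy of the
    predicted distribution but does not give the formula.
    With gamma = 0 every weight is 1 and this reduces to plain cross-entropy.
    """
    weighted = []
    for logits, target in zip(logits_per_token, targets):
        probs = softmax(logits)
        ce = -math.log(probs[target])      # standard next-token loss
        weight = entropy(probs) ** gamma   # high entropy = treated as hard to learn
        weighted.append(weight * ce)
    return sum(weighted) / len(weighted)


# A confidently predicted token contributes less than an uncertain one.
confident = entropy_weighted_loss([[10.0, 0.0, 0.0, 0.0]], [0])
uncertain = entropy_weighted_loss([[0.0, 0.0, 0.0, 0.0]], [0])
print(confident < uncertain)
```

In this sketch a token whose predicted distribution is sharply peaked receives a weight near zero, while a token with a near-uniform prediction receives a weight close to the logarithm of the vocabulary size, which is the sense in which training is steered toward difficult-to-learn tokens.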

Preprint, 2023, arXiv

The Mars Orbiter Magnetometer of Tianwen-1: In-flight Performance and First Science Results

The Mars Orbiter MAGnetometer (MOMAG) is a scientific instrument onboard the orbiter of China's first mission to Mars, Tianwen-1. It has routinely measured the magnetic field from the solar wind to the magnetic pile-up region surrounding Mars since November 13, 2021. Here we present its in-flight performance and first science results based on the first one and a half months of data. Compared with the solar wind magnetic field data from the Mars Atmosphere and Volatile EvolutioN (MAVEN) mission, the magnetic field measured by MOMAG is at the same level in magnitude, and the same magnetic structures with similar variations in the three components can be found in the MOMAG data. In the first one and a half months, we recognize 158 clear bow shock (BS) crossings in the MOMAG data, whose locations statistically match the modeled average BS well. We also identify 5 pairs of simultaneous BS crossings by the Tianwen-1 orbiter and MAVEN. These BS crossings confirm the global shape of the modeled BS as well as the south-north asymmetry of the Martian BS. Two cases presented in this paper suggest that the BS is probably more dynamic at the flank than near the nose. So far, MOMAG performs well and provides accurate magnetic field vectors. MOMAG is continuously scanning the magnetic field surrounding Mars. These measurements, complemented by observations from MAVEN, will undoubtedly advance our understanding of the plasma environment of Mars.

Preprint, 2012, arXiv

Slow Magneto-acoustic Waves Observed above Quiet-Sun Region in a Dark Cavity

Waves play a crucial role in diagnosing the plasma properties of various structures in the solar corona and in coronal heating. Slow magneto-acoustic (MA) waves are one of the important types of magnetohydrodynamic waves. In past decades, numerous slow MA waves were detected above active regions and coronal holes, but they were rarely found elsewhere. Here, we investigate a 'tornado'-like structure consisting of quasi-periodic streaks within a dark cavity at about 40–110 Mm above the quiet-Sun region on 2011 September 25. Our analysis reveals that these streaks are actually slow MA wave trains. The properties of these wave trains, including the phase speed, compression ratio, and kinetic energy density, are similar to those of previously reported slow MA waves, except that the period of these waves is about 50 s, much shorter than the typical reported values (3–5 minutes).