Researcher profile

Peiyang Song

Peiyang Song contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

Interactive Evaluation Requires a Design Science

AI evaluation is undergoing a structural change. Large language models (LLMs) are increasingly deployed as systems that act over time through tools, environments, users, and other agents, while many evaluation practices still inherit assumptions from response-centered benchmarks (e.g., fixed inputs, isolated outputs, and outcome judgments that can be made from a single response). The field has begun to build interactive benchmarks, but the resulting landscape is fragmented: benchmarks differ in what interaction artifacts they admit, how trajectories are scored, and what claims their results support. This position paper argues that interactive evaluation should be treated as a principled evaluation paradigm, not merely a new family of agent benchmarks. Simply adopting previous evaluation paradigms does not suffice. We define evaluation as an autonomous mapping from evidence to judgments, and show that interactive evaluation changes both sides of this mapping: the evidence becomes interaction-generated trajectories, while the evaluation procedure must assess process, recoverability, coordination, robustness, and system-level performance. Building on this definition, we propose a two-axis taxonomy, derive design principles and reporting standards, examine representative scenarios, and analyze how longstanding evaluation challenges reappear at the trajectory level.

preprint2026arXiv

LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction

Mathematical reasoning remains a significant challenge for Large Language Models (LLMs) due to hallucinations. When combined with formal proof assistants like Lean, these hallucinations can be eliminated through rigorous verification, making theorem proving reliable. However, even with formal verification, LLMs still struggle with long proofs and complex mathematical formalizations. While Lean with LLMs offers valuable assistance with retrieving lemmas, generating tactics, or even complete proofs, it lacks a crucial capability: providing a sense of proof progress. This limitation particularly impacts the overall development efficiency in large formalization projects. We introduce LeanProgress, a method that predicts the progress in the proof. Training and evaluating our models made on a large corpus of Lean proofs from Lean Workbook Plus and Mathlib4 and how many steps remain to complete it, we employ data preprocessing and balancing techniques to handle the skewed distribution of proof lengths. Our experiments show that LeanProgress achieves an overall prediction accuracy of 75.8% in predicting the amount of progress and, hence, the remaining number of steps. When integrated into a best-first search framework using Reprover, our method shows a 3.8% improvement on Mathlib4 compared to baseline performances of 41.4%, particularly for longer proofs. These results demonstrate how proof progress prediction can enhance both automated and interactive theorem proving, enabling users to make more informed decisions about proof strategies. Our code is merged in this library here https://github.com/lean-dojo/LeanDojo-v2.

preprint2025arXiv

How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns

Large Language Models (LLMs) display strikingly different generalization behaviors: supervised fine-tuning (SFT) often narrows capability, whereas reinforcement-learning (RL) tuning tends to preserve it. The reasons behind this divergence remain unclear, as prior studies have largely relied on coarse accuracy metrics. We address this gap by introducing a novel benchmark that decomposes reasoning into atomic core skills such as calculation, fact retrieval, simulation, enumeration, and diagnostic, providing a concrete framework for addressing the fundamental question of what constitutes reasoning in LLMs. By isolating and measuring these core skills, the benchmark offers a more granular view of how specific cognitive abilities emerge, transfer, and sometimes collapse during post-training. Combined with analyses of low-level statistical patterns such as distributional divergence and parameter statistics, it enables a fine-grained study of how generalization evolves under SFT and RL across mathematical, scientific reasoning, and non-reasoning tasks. Our meta-probing framework tracks model behavior at different training stages and reveals that RL-tuned models maintain more stable behavioral profiles and resist collapse in reasoning skills, whereas SFT models exhibit sharper drift and overfit to surface patterns. This work provides new insights into the nature of reasoning in LLMs and points toward principles for designing training strategies that foster broad, robust generalization.

preprint2023arXiv

For Intelligent and Higher Spectrum Efficiency: A Variable Packing Ratio Transmission System Based on Faster-than-Nyquist and Deep Learning

With the rapid development of various services in wireless communications, spectrum resource has become increasingly valuable. Faster than Nyquist (FTN) signaling, proposed in the 1970s, is a promising paradigm for improving spectrum utilization. This paper proposes intelligent variable-packing-ratio (VPR)-based transmissions for high spectrum efficiency (SE) and security, respectively. Aided by deep learning (DL)-based estimation, the proposed scheme for high SE can achieve a higher capacity with negligible modification to existing communication paradigms (e.g., spectrum allocation or frame structure). Also, for VPR-based secure transmission, a dynamic generation scheme is proposed to produce randomly distributed positions to switch the packing ratio, which can effectively avoid detections and attacks. In addition, we propose a simplified DL-based packing ratio estimation for both of these two scenarios so that the receiver can estimate the packing ratio without any in-band or out-band control messages. Simulation results show that the proposed simplified estimation achieves nearly the same accuracy and convergence speed as the original multi-branch fully-connected structure with a complexity reduction of 20 folds. Finally, we derive the closed-form SE of the proposed VPR transmission under different channels. The numerical results validate the correctness of the derivation and demonstrate the SE gains of the VPR scheme beyond conventional Nyquist transmission.

preprint2020arXiv

Receiver Design for Faster-than-Nyquist Signaling: Deep-learning-based Architectures

Faster-than-Nyquist (FTN) is a promising paradigm to improve bandwidth utilization at the expense of additional intersymbol interference (ISI). In this paper, we apply state-of-the-art deep learning (DL) technology into receiver design for FTN signaling and propose two DL-based new architectures. Firstly, we propose an FTN signal detection based on DL and connect it with the successive interference cancellation (SIC) to replace traditional detection algorithms. Simulation results show that this architecture can achieve near-optimal performance in both uncoded and coded scenarios. Additionally, we propose a DL-based joint signal detection and decoding for FTN signaling to replace the complete baseband part in traditional FTN receivers. The performance of this new architecture has also been illustrated by simulation results. Finally, both the proposed DL-based receiver architecture has the robustness to signal to noise ratio (SNR). In a nutshell, DL has been proved to be a powerful tool for the FTN receiver design.