Researcher profile

Zhengyu Chen

Zhengyu Chen contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
7 works
0 followers
5 topics
4 close collaborators

Published work

7 published items

preprint · 2026 · arXiv

HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness

Recent advances in agentic harnesses, in which orchestration frameworks coordinate multiple agents with memory, skills, and tool use, have achieved remarkable success in complex reasoning tasks. However, the underlying mechanism that truly drives performance remains obscured behind intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as a minimal execution unit in an orchestration harness but also as an inner skill internalized within the model's parameters that drives the orchestrator to solve complex tasks. We identify this skill as a two-stage pipeline, i.e., parallel reasoning followed by summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be further scaled via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.

preprint · 2026 · arXiv

PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning

Tool-integrated reasoning (TIR) enables large language models (LLMs) to enhance their capabilities by interacting with external tools, such as code interpreters (CI). Most recent studies focus on exploring various methods to equip LLMs with the ability to use tools. However, how to further boost the reasoning ability of already tool-capable LLMs at inference time remains underexplored. Improving reasoning at inference time requires no additional training and can help LLMs better leverage tools to solve problems. We observe that, during tool-capable LLM inference, both the number and the proportion of erroneous tool calls are negatively correlated with answer correctness. Moreover, erroneous tool calls are typically resolved successfully within a few subsequent turns. If not, LLMs often struggle to resolve such errors even with many additional turns. Building on these observations, we propose PruneTIR, an effective yet efficient framework that enhances tool-integrated reasoning at inference time. During LLM inference, PruneTIR prunes trajectories, resamples tool calls, and suspends tool usage through three components: Success-Triggered Pruning, Stuck-Triggered Pruning and Resampling, and Retry-Triggered Tool Suspension. These three components enable PruneTIR to mitigate the negative impact of erroneous tool calls and prevent LLMs from getting stuck in repeated failed resolution attempts, thereby improving overall LLM performance. Extensive experimental results demonstrate the effectiveness of PruneTIR, which significantly improves Pass@1 and efficiency while reducing the working context length for tool-capable LLMs.

preprint · 2026 · arXiv

Reinforcement Learning for Tool-Integrated Interleaved Thinking towards Cross-Domain Generalization

Recent advances in large language models (LLMs) have demonstrated remarkable capabilities in reasoning and tool utilization. However, the generalization of tool-augmented reinforcement learning (RL) across diverse domains remains a significant challenge. Standard paradigms often treat tool usage as a linear or isolated event, which becomes brittle when transferring skills from restricted domains (e.g., mathematics) to open-ended tasks. In this work, we investigate the cross-domain generalization of an LLM agent trained exclusively on mathematical problem-solving. To facilitate robust skill transfer, we propose Reinforcement Learning for Interleaved Tool Execution (RITE). Unlike traditional methods, RITE enforces a continuous "Plan-Action-Reflection" cycle, allowing the model to ground its reasoning in intermediate tool outputs and self-correct during long-horizon tasks. To effectively train this complex interleaved policy, we introduce Dr. GRPO, a robust optimization objective that utilizes token-level loss aggregation with importance sampling to mitigate reward sparsity and high-variance credit assignment. Furthermore, we employ a dual-component reward system and dynamic curriculum via online rollout filtering to ensure structural integrity and sample efficiency. Extensive experiments reveal that our approach, despite being trained solely on math tasks, achieves state-of-the-art performance across diverse reasoning domains, demonstrating high token efficiency and strong generalization capabilities.

preprint · 2026 · arXiv

Rethinking Supply Chain Planning: A Generative Paradigm

Supply chain planning is the critical process of anticipating future demand and coordinating operational activities across the logistics network. However, within the context of contemporary e-commerce, traditional planning paradigms, typically characterized by fragmented processes and static optimization, prove inadequate in addressing dynamic demand, organizational silos, and the complexity of multi-stage coordination. To address these challenges, this study proposes a fundamental rethinking of supply chain planning, redefining it not merely as a computational task, but as an interactive, integrated, and automated cognitive process. This new paradigm emphasizes the organic unification of human strategic intent with adaptive execution, shifting the focus from rigid control to continuous, intelligent orchestration. To operationalize this conceptual shift, we introduce a Generative AI-powered agentic framework. Functioning as an intelligent cognitive interface, this framework bridges the gap between unstructured business contexts and structured analytical workflows, enabling the system to comprehend complex semantics and coordinate decisions across organizational boundaries. We demonstrate the empirical validity of this approach within JD.com's large-scale operations. The deployment confirms the efficacy of this cognitive paradigm, yielding an approximate 22% improvement in planning accuracy and a 2% increase in in-stock rates, thereby validating the transformation of planning into an adaptive, knowledge-driven capability.

preprint · 2022 · arXiv

Kinetic theory of overpopulated gluon systems with inelastic processes

In this work, the role of inelastic processes in the formation of a transient Bose-Einstein condensation (BEC) is investigated based on kinetic theory. We calculate the condensation rate for an overpopulated gluon system that is assumed to be in thermal equilibrium and in the presence of a BEC. The matrix elements of the inelastic processes are chosen to be isotropic, and the gluons are considered to have a finite mass. Our calculations indicate that the inelastic processes can hinder the formation of a BEC, since the negatively infinite net condensation rate can destroy any BEC instantly.

preprint · 2022 · arXiv

Knowledge Distillation of Transformer-based Language Models Revisited

In the past few years, transformer-based pre-trained language models have achieved astounding success in both industry and academia. However, the large model size and high run-time latency are serious impediments to applying them in practice, especially on mobile phones and Internet of Things (IoT) devices. To compress the model, a considerable literature has recently grown up around the theme of knowledge distillation (KD). Nevertheless, how KD works in transformer-based models is still unclear. We tease apart the components of KD and propose a unified KD framework. Through the framework, systematic and extensive experiments consuming over 23,000 GPU hours yield a comprehensive analysis from the perspectives of knowledge types, matching strategies, width-depth trade-off, initialization, model size, etc. Our empirical results shed light on distillation in pre-trained language models and show relatively significant improvements over the previous state of the art (SOTA). Finally, we provide a best-practice guideline for KD in transformer-based models.

preprint · 2019 · arXiv

Calculation of anisotropic transport coefficients for an ultrarelativistic Boltzmann gas in a magnetic field within a kinetic approach

Using the Kubo formulas, we employ the (3+1)-d parton cascade, Boltzmann approach of multiparton scatterings (BAMPS), to calculate the anisotropic transport coefficients (shear viscosity and electric conductivity) for an ultrarelativistic Boltzmann gas in the presence of a magnetic field. The results are compared with those recently obtained using Grad's approximation. We find good agreement between the two sets of results, which confirms the general use of the derived Kubo formulas for calculating the anisotropic transport coefficients of quark-gluon plasma in a magnetic field.