Researcher profile

Yuchuan Wu

Yuchuan Wu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified · Verification L1 · Unclaimed author
5 works
0 followers
3 topics
4 close collaborators

Published work

5 published items

preprint · 2026 · arXiv

Act-Adaptive Margin: Dynamically Calibrating Reward Models for Subjective Ambiguity

Currently, most reinforcement learning tasks focus on domains like mathematics and programming, where verification is relatively straightforward. However, in subjective tasks such as role-playing, alignment techniques struggle to make progress, primarily because subjective reward modeling using the Bradley-Terry model faces significant challenges when dealing with ambiguous preferences. To improve reward modeling in subjective tasks, this paper proposes AAM (Act-Adaptive Margin), which enhances reward modeling by dynamically calibrating preference margins using the model's internal parameter knowledge. We design two versions of AAM that efficiently generate contextually appropriate preference gaps without additional human annotation. This approach fundamentally improves how reward models handle subjective rewards by better integrating generative understanding with preference scoring. To validate AAM's effectiveness in subjective reward modeling, we conduct evaluations on RewardBench, JudgeBench, and challenging role-playing tasks. Results show that AAM significantly improves subjective reward modeling performance, enhancing Bradley-Terry reward models by 2.95% in general tasks and 4.85% in subjective role-playing tasks. Furthermore, reward models trained with AAM can help downstream alignment tasks achieve better results. Our test results show that applying rewards generated by the AAM-augmented RM to preference learning techniques (e.g., GRPO) achieves state-of-the-art results on CharacterEval and Charm. Code and dataset are available at https://github.com/calubkk/AAM.
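As a rough illustration of what a preference margin does in a Bradley-Terry reward model, the sketch below computes the standard pairwise loss with an additive margin term. It is not the AAM method itself: the abstract does not say how AAM derives its margins, so here `margin` is simply passed in by the caller, and the function name `bt_margin_loss` is a hypothetical one chosen for this example.

```python
import math


def bt_margin_loss(r_chosen: float, r_rejected: float, margin: float = 0.0) -> float:
    """Pairwise Bradley-Terry loss with an additive margin (generic form).

    loss = -log(sigmoid(r_chosen - r_rejected - margin))

    A larger margin asks the reward model to separate the chosen and
    rejected responses by a wider gap before the loss becomes small.
    How the margin is chosen per example is left to the caller; that is
    an assumption of this sketch, not something taken from the paper.
    """
    z = r_chosen - r_rejected - margin
    # -log(sigmoid(z)) equals log(1 + exp(-z)); branch on the sign of z
    # so that exp() is never called with a large positive argument.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Same reward gap, evaluated without and with a margin of 1.0.
    print(bt_margin_loss(2.0, 0.0, margin=0.0))
    print(bt_margin_loss(2.0, 0.0, margin=1.0))
```

With a fixed reward gap, raising the margin raises the loss, so a clear-cut preference pair can be given a large margin and an ambiguous pair a small one.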

preprint · 2022 · arXiv

Broadening and redward asymmetry of Hα line profiles observed by LAMOST during a stellar flare on an M-type star

Stellar flares are characterized by sudden enhancement of electromagnetic radiation in stellar atmospheres. So far much of our understanding of stellar flares comes from photometric observations, from which plasma motions in flare regions could not be detected. From the spectroscopic data of LAMOST DR7, we have found one stellar flare that is characterized by an impulsive increase followed by a gradual decrease in the Hα line intensity on an M4-type star, and the total energy radiated through Hα is estimated to be on the order of $10^{33}$ erg. The Hα line appears to have a Voigt profile during the flare, which is likely caused by Stark pressure broadening due to the dramatic increase of electron density and/or opacity broadening due to the occurrence of strong non-thermal heating. Obvious enhancement has been identified at the red wing of the Hα line profile after the impulsive increase of the Hα line intensity. The red wing enhancement corresponds to plasma moving away from the Earth at a velocity of 100-200 km s$^{-1}$. According to the current knowledge of solar flares, this red wing enhancement may originate from: (1) flare-driven coronal rain, (2) chromospheric condensation, or (3) a filament/prominence eruption with either a non-radial backward propagation or strong magnetic suppression. The total mass of the moving plasma is estimated to be on the order of $10^{15}$ kg.
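The link between a red wing enhancement and a line-of-sight velocity is the Doppler relation, sketched below in its non-relativistic form. The rest wavelength of 6562.8 Å (the Hα air wavelength) and the function names are assumptions made for this illustration; the abstract does not say which wavelength scale or fitting procedure the authors used.

```python
C_KM_S = 299_792.458  # speed of light in km/s

# Assumed Halpha rest wavelength in air, in angstroms (illustrative choice).
H_ALPHA_REST_A = 6562.8


def doppler_velocity(observed_a: float, rest_a: float = H_ALPHA_REST_A) -> float:
    """Line-of-sight velocity in km/s from an observed wavelength.

    Non-relativistic approximation: v = c * (lambda_obs - lambda_rest) / lambda_rest.
    Positive values mean a redshift, i.e. motion away from the observer.
    """
    return C_KM_S * (observed_a - rest_a) / rest_a


def doppler_shift(velocity_km_s: float, rest_a: float = H_ALPHA_REST_A) -> float:
    """Wavelength shift in angstroms produced by a given line-of-sight velocity."""
    return rest_a * velocity_km_s / C_KM_S


if __name__ == "__main__":
    # Shifts corresponding to the 100-200 km/s range quoted in the abstract.
    for v in (100.0, 200.0):
        print(f"{v:.0f} km/s -> {doppler_shift(v):.2f} angstrom redward of line center")
```

Under these assumptions, velocities of 100 and 200 km/s correspond to shifts of roughly 2.2 Å and 4.4 Å from the Hα line center.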

preprint · 2022 · arXiv

Detection of Flare-induced Plasma Flows in the Corona of EV Lac with X-ray Spectroscopy

Stellar flares are characterized by sudden enhancement of electromagnetic radiation from the atmospheres of stars. Compared to their solar counterparts, our knowledge of the coronal plasma dynamics of stellar flares and their connection to coronal mass ejections (CMEs) remains very limited. With time-resolved high-resolution spectroscopic observations from the Chandra X-ray observatory, we detected noticeable coronal plasma flows during several stellar flares on a nearby dMe star EV Lac. In the observed spectra of O VIII (3 MK), Fe XVII (6 MK), Mg XII (10 MK), and Si XIV (16 MK) lines, these flare-induced upflows/downflows appear as significant Doppler shifts of several tens to 130 km s$^{-1}$, and the upflow velocity generally increases with temperature. Variable line ratios of the Si XIII triplet reveal that these plasma flows in most flares are accompanied by an increase of the coronal plasma density and temperature. We interpret these results as X-ray evidence for chromospheric evaporation on EV Lac. In two successive flares, the plasma flow pattern and a sharp increase of the measured coronal density are highly suggestive of explosive evaporation. The transition from redshifts to blueshifts in such an explosive evaporation occurs at a temperature of at least 10 MK, much higher than that observed in solar flares (~1 MK). However, in one flare the cool and warm upflows appear to be accompanied by a decreasing plasma density, which might be explained by a stellar filament/prominence eruption coupled to this flare. These results provide important clues for understanding the coronal plasma dynamics during flares on M dwarfs.

preprint · 2022 · arXiv

Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue Systems

In this paper, we present Duplex Conversation, a multi-turn, multimodal spoken dialogue system that enables telephone-based agents to interact with customers like a human. We use the concept of full-duplex in telecommunication to demonstrate what a human-like interactive experience should be and how to achieve smooth turn-taking through three subtasks: user state detection, backchannel selection, and barge-in detection. In addition, we propose semi-supervised learning with multimodal data augmentation to leverage unlabeled data to increase model generalization. Experimental results on the three subtasks show that the proposed method achieves consistent improvements compared with baselines. We deploy Duplex Conversation to Alibaba intelligent customer service and share lessons learned in production. Online A/B experiments show that the proposed system can significantly reduce response latency by 50%.

preprint · 2022 · arXiv

GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection

Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ2.0 and MultiWOZ2.1, improving their end-to-end combined scores by 2.5, 5.3 and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings.