Researcher profile

Zheng Chu

Zheng Chu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified
Verification L1
Unclaimed author
5 works
0 followers
4 topics
4 close collaborators

Published work

5 published items

Preprint, 2026, arXiv

Learning from Mistakes: Negative Reasoning Samples Enhance Out-of-Domain Generalization

Supervised fine-tuning (SFT) on chain-of-thought (CoT) trajectory demonstrations is a common approach for enabling reasoning in large language models. Standard practice typically retains only trajectories with correct final answers (positives) and ignores the rest (negatives). We argue that this paradigm discards substantial supervision and exacerbates overfitting, limiting out-of-domain (OOD) generalization. Specifically, we find, surprisingly, that incorporating negative trajectories into SFT yields substantial OOD generalization gains over positive-only training, as these trajectories often retain valid intermediate reasoning despite incorrect final answers. To understand this effect in depth, we systematically analyze data, training dynamics, and inference behavior, identifying 22 recurring patterns in negative chains that serve a dual role: they moderate loss descent to mitigate overfitting during training and boost policy entropy by 35.67% during inference to facilitate exploration. Motivated by these observations, we further propose Gain-based LOss Weighting (GLOW), an adaptive, sample-aware scheme that exploits these distinctive training dynamics by rescaling per-sample loss based on inter-epoch progress. Empirically, GLOW efficiently leverages unfiltered trajectories, yielding a 5.51% OOD gain over positive-only SFT on Qwen2.5-7B and boosting MMLU from 72.82% to 76.47% as an RL initialization.

Preprint, 2022, arXiv

HIT at SemEval-2022 Task 2: Pre-trained Language Model for Idioms Detection

The same multi-word expression may have different meanings in different sentences. These meanings fall mainly into two categories: literal and idiomatic. Non-contextual methods perform poorly on this problem, and contextual embeddings are needed to understand the idiomatic meaning of multi-word expressions correctly. We use a pre-trained language model, which provides a context-aware sentence embedding, to detect whether a multi-word expression in a sentence is used idiomatically.

Preprint, 2022, arXiv

Machine Learning-based Beamforming Design for Millimeter Wave IRS Communications with Discrete Phase Shifters

In this paper, we investigate an intelligent reflecting surface (IRS)-assisted millimeter-wave multiple-input single-output downlink wireless communication system. By jointly calculating the active beamforming at the base station and the passive beamforming at the IRS, we aim to minimize the transmit power under the constraint of each user's signal-to-interference-plus-noise ratio. To solve this problem, we propose a low-complexity machine learning-based cross-entropy (CE) algorithm to alternately optimize the active beamforming and the passive beamforming. Specifically, in the alternating iteration process, the zero-forcing (ZF) method and the CE algorithm are applied to acquire the active beamforming and the passive beamforming, respectively. The CE algorithm starts with random sampling and, following the idea of distribution focusing, namely shifting the distribution towards a desired one by minimizing CE, obtains near-optimal reflection coefficients with adequately high probability. In addition, we extend the original one-bit phase shift at the IRS to the common case with high-resolution phase shift to enhance the effectiveness of the algorithms. Simulation results verify that the proposed algorithm can obtain a near-optimal solution with lower computational complexity.
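The distribution-focusing idea in this abstract can be illustrated with a minimal sketch of the generic cross-entropy method. This is not the paper's algorithm: the abstract gives no update rules, so the toy objective (maximizing the single-user gain |sum_n theta_n c_n|^2 over one-bit phases), the function name `cross_entropy_one_bit`, and all parameter values are assumptions chosen for illustration. The paper's own objective is transmit-power minimization with ZF active beamforming.

```python
import random


def cross_entropy_one_bit(c, n_samples=200, n_elite=20, n_iters=30,
                          smoothing=0.7, seed=0):
    """Generic cross-entropy search over one-bit phases theta_n in {+1, -1}.

    c is a list of complex cascaded channel coefficients (hypothetical toy
    input). The toy objective is the gain |sum_n theta_n * c_n|^2.
    """
    rng = random.Random(seed)
    n = len(c)
    p = [0.5] * n  # Bernoulli probability that theta_n = +1
    best_theta, best_gain = None, -1.0

    def gain(theta):
        return abs(sum(t * x for t, x in zip(theta, c))) ** 2

    for _ in range(n_iters):
        # 1) Random sampling from the current distribution.
        samples = [[1 if rng.random() < p_n else -1 for p_n in p]
                   for _ in range(n_samples)]
        # 2) Keep the elite samples with the highest gain.
        samples.sort(key=gain, reverse=True)
        elite = samples[:n_elite]
        if gain(elite[0]) > best_gain:
            best_theta, best_gain = elite[0], gain(elite[0])
        # 3) Shift the distribution towards the elite set (smoothed update).
        for i in range(n):
            freq = sum(1 for th in elite if th[i] == 1) / n_elite
            p[i] = smoothing * freq + (1 - smoothing) * p[i]
    return best_theta, best_gain
```

Calling `cross_entropy_one_bit(c)` with a short list of complex coefficients returns the best phase vector found and its gain. For small `n` the result can be compared against an exhaustive search over all 2^n phase vectors.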

Preprint, 2022, arXiv

The Far-/Near-Field Beam Squint and Solutions for THz Intelligent Reflecting Surface Communications

Terahertz (THz) communication and intelligent reflecting surfaces (IRS) have been regarded as two promising technologies to improve the capacity and coverage of future 6G networks. An IRS is usually equipped with a large number of elements when implemented at THz frequencies. In this case, the near-field model and beam squint should be considered. Therefore, in this paper, we investigate the far-field and near-field beam squint problems in THz IRS communications for the first time. The far-field and near-field channel models are constructed based on the different electromagnetic radiation characteristics. We first analyze the far-field beam squint and its effect on the beam gain based on the cascaded base station (BS)-IRS-user channel model, and then study the near-field case. To overcome the far-field and near-field beam squint effects, we propose to apply a delay adjustable metasurface (DAM) to the IRS, and develop a scheme for optimizing the reflecting phase shifts and time delays of the IRS elements, which effectively eliminates the beam gain loss caused by beam squint. Finally, simulations are conducted to demonstrate the effectiveness of our proposed schemes in combating near-field and far-field beam squint.
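The far-field beam squint effect can be illustrated with a simplified textbook model. This is a sketch under assumptions, not the paper's cascaded BS-IRS-user channel model: it uses a uniform linear array with half-wavelength spacing at the center frequency, and the function name `beam_gain` and the numeric values in the usage note are hypothetical. Phase shifters tuned at the center frequency `f_c` lose gain at other frequencies, while per-element time delays keep the beam aligned across the band.

```python
import cmath
import math


def beam_gain(n_elements, angle_deg, f, f_c, true_time_delay=False):
    """Normalized far-field beam gain of a uniform linear array at frequency f.

    Assumes half-wavelength spacing at the center frequency f_c and a beam
    steered to angle_deg (toy model, not the paper's IRS channel).
    """
    s = math.sin(math.radians(angle_deg))
    total = 0j
    for n in range(n_elements):
        # Phase of the array response at frequency f for element n.
        channel_phase = math.pi * n * (f / f_c) * s
        if true_time_delay:
            # A time delay yields a phase that scales with frequency.
            weight_phase = -math.pi * n * (f / f_c) * s
        else:
            # A frequency-flat phase shifter is matched only at f_c.
            weight_phase = -math.pi * n * s
        total += cmath.exp(1j * (channel_phase + weight_phase))
    return abs(total) / n_elements
```

For example, `beam_gain(256, 60, 105e9, 100e9)` falls far below the value at `f = f_c`, whereas passing `true_time_delay=True` restores the full gain. This mirrors, in a much simpler setting, the motivation for adding adjustable delays to the reflecting elements.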

Preprint, 2020, arXiv

Robust Design for Intelligent Reflecting Surface Assisted MIMO-OFDMA Terahertz Communications

Recently, terahertz (THz) communication has drawn considerable attention as one of the promising technologies for future wireless communications owing to its ultra-wide bandwidth. Nonetheless, one major obstacle that prevents the actual deployment of THz lies in its inherently large attenuation. Intelligent reflecting surface (IRS) and multiple-input multiple-output (MIMO) represent two effective solutions for compensating the large pathloss in THz systems. In this paper, we consider an IRS-aided multi-user THz MIMO system with orthogonal frequency division multiple access, where the sparse radio frequency chain antenna structure is adopted to reduce power consumption. The objective is to maximize the weighted sum rate by jointly optimizing the hybrid analog/digital beamforming at the base station and the reflection matrix at the IRS. Since the analog beamforming and reflection matrix need to cater to all users and subcarriers, it is difficult to solve the formulated problem directly, and thus an alternating iterative optimization algorithm is proposed. Specifically, the analog beamforming is designed by solving a MIMO capacity maximization problem, while the digital beamforming and reflection matrix optimization are both tackled using the semidefinite relaxation technique. Considering that obtaining perfect channel state information (CSI) is a challenging task in IRS-based systems, we further explore the case of imperfect CSI for the channels from the IRS to the users. Under this setup, we propose a robust beamforming and reflection matrix design scheme for the originally formulated non-convex optimization problem. Finally, simulation results are presented to demonstrate the effectiveness of the proposed algorithms.