Researcher profile

Zhou Liu

Zhou Liu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified
Verification L1
Unclaimed author
5 works
0 followers
9 topics
4 close collaborators

Published work

5 published items

Preprint, 2026, arXiv

Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax

Extending large language models (LLMs) to low-resource languages often incurs an "alignment tax": improvements in the target language come at the cost of catastrophic forgetting in general capabilities. We argue that this trade-off arises from the rigidity of supervised fine-tuning (SFT), which enforces token-level surface imitation on narrow and biased data distributions. To address this limitation, we propose a semantic-space alignment paradigm powered by Group Relative Policy Optimization (GRPO), where the model is optimized using embedding-level semantic rewards rather than likelihood maximization. This objective encourages meaning preservation through flexible realizations, enabling controlled updates that reduce destructive interference with pretrained knowledge. We evaluate our approach on Tibetan-Chinese machine translation and Tibetan headline generation. Experiments show that our method acquires low-resource capabilities while markedly mitigating alignment tax, preserving general competence more effectively than SFT. Despite producing less rigid surface overlap, semantic RL yields higher semantic quality and preference in open-ended generation, and few-shot transfer results indicate that it learns more transferable and robust representations under limited supervision. Overall, our study demonstrates that reinforcement learning with semantic rewards provides a safer and more reliable pathway for inclusive low-resource language expansion.
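
The abstract above describes optimizing with embedding-level semantic rewards under GRPO. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the reward is the cosine similarity between a candidate's embedding and a reference embedding, and it uses the usual group-relative normalization (reward minus group mean, divided by group standard deviation). The toy two-dimensional embeddings and all function names are hypothetical; the page does not say which embedding model or reward definition the paper uses.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_rewards(candidate_embeddings, reference_embedding):
    """Assumed reward: similarity of each sampled output to the reference in embedding space."""
    return [cosine_similarity(c, reference_embedding) for c in candidate_embeddings]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical toy embeddings for one prompt: a reference and three sampled outputs.
reference = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

rewards = semantic_rewards(candidates, reference)
advantages = group_relative_advantages(rewards)
```

In this sketch a candidate is rewarded for being close in meaning to the reference rather than for matching it token by token, and the advantages only rank candidates relative to the others sampled for the same prompt.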

Preprint, 2022, arXiv

Learning to Prove Trigonometric Identities

Automatic theorem proving with deep learning methods has attracted attention recently. In this paper, we construct an automatic proof system for trigonometric identities. We define the normalized form of trigonometric identities, design a set of rules for the proof, and put forward a method that can generate a theoretically infinite number of trigonometric identities. Our goal is not only to complete the proof, but to complete it in as few steps as possible. For this reason, we design a model to learn proof data generated by random BFS (rBFS), and we show theoretically and experimentally that the model can outperform rBFS after simple imitation learning. After further improvement through reinforcement learning, we obtain AutoTrig, which can give proofs of identities in almost as few steps as BFS (the theoretically shortest method), at only one-thousandth of the time cost. In addition, AutoTrig beats SymPy, MATLAB, and humans on the synthetic dataset, and performs well in many generalization tasks.

Preprint, 2021, arXiv

Observation of the Orbital Rashba-Edelstein Magnetoresistance

We report the observation of magnetoresistance (MR) originating from orbital angular momentum (OAM) transport in a Permalloy (Py) / oxidized Cu (Cu*) heterostructure: the orbital Rashba-Edelstein magnetoresistance. The MR depends on the relative angle between the induced OAM and the magnetization in a fashion similar to the spin Hall magnetoresistance (SMR). Despite the absence of elements with large spin-orbit coupling, we find a sizable MR ratio, in contrast to the conventional SMR, which requires heavy elements. By varying the thickness of the Cu* layer, we confirm that the interface is responsible for the MR, suggesting that the orbital Rashba-Edelstein effect generates the OAM. Through Py thickness-dependence studies, we find that the effective values for the spin diffusion and spin dephasing lengths of Py are significantly larger than the values measured in Py / Pt bilayers, by factors of approximately 2 and 4, respectively. This implies that a mechanism beyond the conventional spin-based scenario is responsible for the MR observed in Py / Cu* structures, and that this MR originates in a sizeable transport of OAM. Our findings not only unambiguously demonstrate current-induced torque via the OAM channel without using any heavy element, but also provide an important clue towards the microscopic understanding of the role that OAM transport can play in magnetization dynamics.

Preprint, 2020, arXiv

Covariant propagator and chiral power counting

Some one-loop diagrams with one and two external baryons/nucleons are revisited using covariant baryon propagators in chiral effective theory. We show that, to recover chiral power counting, it is enough to separate and subtract all the local terms that violate it; no extra operations need to be introduced. The structures of the leading chiral corrections and of IR enhancement or threshold effects are 'stable', i.e. they persist, as long as covariant propagators are employed for all particles.

Preprint, 2020, arXiv

Mixed Noise Removal with Pareto Prior

Denoising images contaminated by a mixture of additive white Gaussian noise (AWGN) and impulse noise (IN) is an essential but challenging problem. The presence of impulsive disturbances inevitably affects the noise distribution and thus largely degrades the performance of traditional AWGN denoisers. Existing methods aim to compensate for the effects of IN by introducing a weighting matrix, which, however, lacks a proper prior and is thus hard to estimate accurately. To address this problem, we exploit the Pareto distribution as the prior of the weighting matrix, based on which an accurate and robust weight estimator is proposed for mixed noise removal. In particular, a relatively small portion of pixels is assumed to be contaminated with IN; these pixels should have small weights and thus be penalized out. This phenomenon can be properly described by the Pareto distribution of type 1. Therefore, armed with the Pareto distribution, we formulate the problem of mixed noise removal in a Bayesian framework, where a nonlocal self-similarity prior is further exploited by adopting nonlocal low-rank approximation. Compared to existing methods, the proposed method can estimate the weighting matrix adaptively, accurately, and robustly for different noise levels, and can thus boost denoising performance. Experimental results on widely used image datasets demonstrate the superiority of our proposed method over the state of the art.
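
For reference, the Pareto distribution of type 1 named in the abstract above has the standard density below. The symbols are the textbook ones, not taken from the paper: $x_m > 0$ is the scale (minimum value) and $\alpha > 0$ is the shape. How the paper maps pixel weights onto this distribution is not stated on this page, so only the generic form is given.

```latex
p(x \mid x_m, \alpha) =
\begin{cases}
  \dfrac{\alpha \, x_m^{\alpha}}{x^{\alpha + 1}}, & x \ge x_m, \\[1ex]
  0, & x < x_m.
\end{cases}
```

Most of the probability mass sits near $x_m$, with a heavy tail that decays polynomially; a larger $\alpha$ concentrates the mass more tightly near $x_m$.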