Source author record

Zhou Liu

Zhou Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · Artificial Intelligence · Computation and Language · Computer Vision · cond-mat.mtrl-sci · eess.IV · hep-ph · hep-th · nucl-th

Catalog footprint

What is connected

5 works

9 topics

4 close collaborators


Preprint · 2026 · arXiv

Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax

Extending large language models (LLMs) to low-resource languages often incurs an "alignment tax": improvements in the target language come at the cost of catastrophic forgetting in general capabilities. We argue that this trade-off arises from the rigidity of supervised fine-tuning (SFT), which enforces token-level surface imitation on narrow and biased data distributions. To address this limitation, we propose a semantic-space alignment paradigm powered by Group Relative Policy Optimization (GRPO), where the model is optimized using embedding-level semantic rewards rather than likelihood maximization. This objective encourages meaning preservation through flexible realizations, enabling controlled updates that reduce destructive interference with pretrained knowledge. We evaluate our approach on Tibetan-Chinese machine translation and Tibetan headline generation. Experiments show that our method acquires low-resource capabilities while markedly mitigating alignment tax, preserving general competence more effectively than SFT. Despite producing less rigid surface overlap, semantic RL yields higher semantic quality and preference in open-ended generation, and few-shot transfer results indicate that it learns more transferable and robust representations under limited supervision. Overall, our study demonstrates that reinforcement learning with semantic rewards provides a safer and more reliable pathway for inclusive low-resource language expansion.
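The abstract describes optimizing with embedding-level semantic rewards under Group Relative Policy Optimization (GRPO) instead of token-level likelihood. The sketch below illustrates only the reward and advantage side of that idea, under stated assumptions: the reward is taken to be cosine similarity between a sampled output's embedding and the reference embedding (the paper's exact reward and embedding model are not given here), and the toy vectors stand in for a real sentence encoder. Each sampled output's advantage is its reward normalized against the other samples in the same group, which is the group-relative baseline GRPO uses in place of a learned value function.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def semantic_rewards(candidates: List[Sequence[float]],
                     reference: Sequence[float]) -> List[float]:
    """Assumed reward: embedding similarity of each sampled output to the reference."""
    return [cosine_similarity(c, reference) for c in candidates]


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and (population) std of its own group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical toy embeddings standing in for a real sentence-embedding model.
reference = [1.0, 0.0, 0.0]
group = [
    [1.0, 0.0, 0.0],  # same meaning as the reference
    [0.8, 0.6, 0.0],  # paraphrase with different surface form
    [0.0, 1.0, 0.0],  # unrelated output
]
rewards = semantic_rewards(group, reference)
advantages = group_relative_advantages(rewards)
print([round(r, 3) for r in rewards])
print([round(a, 3) for a in advantages])
```

Because the reward lives in embedding space, a paraphrase with little token overlap can still earn a high reward, which is the "flexible realizations" point the abstract makes. Implementations differ on details such as sample versus population standard deviation; this sketch uses the population form.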

Preprint · 2022 · arXiv

Learning to Prove Trigonometric Identities

Automatic theorem proving with deep learning methods has attracted attention recently. In this paper, we construct an automatic proof system for trigonometric identities. We define the normalized form of trigonometric identities, design a set of rules for the proof, and put forward a method that can generate a theoretically unlimited number of trigonometric identities. Our goal is not only to complete the proof, but to complete it in as few steps as possible. For this reason, we design a model that learns from proof data generated by random BFS (rBFS), and we show theoretically and experimentally that the model can outperform rBFS after simple imitation learning. After further improvement through reinforcement learning, we obtain AutoTrig, which produces proofs with almost as few steps as BFS (the theoretically shortest method) at only one-thousandth of the time cost. In addition, AutoTrig beats SymPy, MATLAB and humans on the synthetic dataset, and performs well on many generalization tasks.
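The abstract uses BFS as the reference for the shortest possible proof. A minimal sketch of why BFS gives that guarantee, assuming a toy setting: expressions are plain strings, and each rule is a hypothetical string rewrite applied at one occurrence. This is not AutoTrig's normalized form or rule set. Because every rewrite costs one step and BFS expands states in order of depth, the first time it reaches the target it has found a minimal-length rule sequence.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, str]  # (pattern, replacement); hypothetical string-rewrite rule


def rewrites(expr: str, rules: Dict[str, Rule]) -> List[Tuple[str, str]]:
    """All (rule_name, new_expr) pairs from applying one rule at one occurrence."""
    results = []
    for name, (pattern, replacement) in rules.items():
        start = expr.find(pattern)
        while start != -1:
            new_expr = expr[:start] + replacement + expr[start + len(pattern):]
            results.append((name, new_expr))
            start = expr.find(pattern, start + 1)
    return results


def shortest_proof(lhs: str, rhs: str, rules: Dict[str, Rule],
                   max_depth: int = 10) -> Optional[List[str]]:
    """Breadth-first search: returns a minimal-length rule sequence, or None."""
    queue = deque([(lhs, [])])
    seen = {lhs}
    while queue:
        expr, steps = queue.popleft()
        if expr == rhs:
            return steps
        if len(steps) >= max_depth:
            continue
        for name, nxt in rewrites(expr, rules):
            if nxt not in seen:  # never revisit a state reached at equal or lower depth
                seen.add(nxt)
                queue.append((nxt, steps + [name]))
    return None


# Hypothetical toy rules, written as literal string rewrites.
RULES = {
    "tan_def": ("tan(x)", "sin(x)/cos(x)"),
    "cancel_cos": ("sin(x)/cos(x)*cos(x)", "sin(x)"),
    "pythagoras": ("sin(x)^2+cos(x)^2", "1"),
}

print(shortest_proof("tan(x)*cos(x)", "sin(x)", RULES))  # -> ['tan_def', 'cancel_cos']
```

The cost the abstract attributes to BFS is visible here: the frontier grows with the number of applicable rewrites at each depth, which is what a learned policy that proposes one step at a time avoids.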

Preprint · 2021 · arXiv

Observation of the Orbital Rashba-Edelstein Magnetoresistance

We report the observation of magnetoresistance (MR) originating from orbital angular momentum (OAM) transport in a Permalloy (Py) / oxidized Cu (Cu*) heterostructure: the orbital Rashba-Edelstein magnetoresistance. The MR depends on the relative angle between the induced OAM and the magnetization, in a fashion similar to the spin Hall magnetoresistance (SMR). Despite the absence of elements with large spin-orbit coupling, we find a sizable MR ratio, in contrast to the conventional SMR, which requires heavy elements. By varying the thickness of the Cu* layer, we confirm that the interface is responsible for the MR, suggesting that the orbital Rashba-Edelstein effect generates the OAM. Through Py thickness-dependence studies, we find that the effective spin diffusion and spin dephasing lengths of Py are significantly larger than the values measured in Py / Pt bilayers, by factors of approximately 2 and 4, respectively. This implies that a mechanism beyond the conventional spin-based scenario, originating in a sizable transport of OAM, is responsible for the MR observed in Py / Cu* structures. Our findings not only unambiguously demonstrate current-induced torque via the OAM channel without any heavy element, but also provide an important clue toward a microscopic understanding of the role that OAM transport can play in magnetization dynamics.
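The abstract says the MR follows an SMR-like angular dependence but gives no formula. A minimal sketch, assuming the standard SMR phenomenology: current along x, the accumulated (here orbital) angular momentum polarized along y, and the magnetization rotated in the yz-plane by an angle beta from z, so that m_y = sin(beta) and R(beta) = R0 - dR * sin^2(beta). The geometry, the ratio convention dR/R0, and all numbers are assumptions for illustration, not values from the paper. The fit is ordinary least squares of R against sin^2(beta).

```python
import math
from typing import Sequence, Tuple


def smr_like_resistance(beta: float, r0: float, delta_r: float) -> float:
    """Assumed SMR-like model: R = R0 - dR * m_y**2 with m_y = sin(beta)."""
    return r0 - delta_r * math.sin(beta) ** 2


def fit_mr(betas: Sequence[float],
           resistances: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of R = R0 - dR*sin^2(beta); returns (R0, dR, dR/R0)."""
    xs = [math.sin(b) ** 2 for b in betas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(resistances) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, resistances))
    slope = sxy / sxx
    r0 = mean_y - slope * mean_x  # intercept: resistance at m_y = 0
    delta_r = -slope              # resistance drop when m is parallel to y
    return r0, delta_r, delta_r / r0


# Synthetic, noise-free angular scan with made-up parameters (not data from the paper).
betas = [math.radians(d) for d in range(0, 361, 15)]
data = [smr_like_resistance(b, r0=100.0, delta_r=0.05) for b in betas]
r0, delta_r, ratio = fit_mr(betas, data)
print(f"R0={r0:.4f}  dR={delta_r:.4f}  MR ratio={ratio:.2e}")
```

Fitting against sin^2(beta) rather than beta itself keeps the problem linear, so no iterative optimizer is needed for this model.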

Preprint · 2020 · arXiv

Covariant propagator and chiral power counting

Some one-loop diagrams with one and two external baryons/nucleons are revisited using covariant baryon propagators in chiral effective theory. We show that it is sufficient to separate and subtract all the local terms that violate chiral power counting in order to recover it; no extra operations need to be introduced. The structures of the leading chiral corrections and of IR enhancement or threshold effects are 'stable', i.e. they persist as long as covariant propagators are employed for all particles.
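For reference, the chiral power counting the abstract refers to assigns a diagram the order D = 4L - 2 I_pi - I_N + sum_k k V_k, where L is the number of loops, I_pi and I_N count internal pion and nucleon lines, and V_k counts vertices of chiral order k. With one nucleon line running through the diagram this reduces to D = 1 + 2L + sum_k (k - 2) V_k(pi) + sum_k (k - 1) V_k(N). With covariant baryon propagators, loop integrals can also produce lower-order terms than D, and these are the power-counting-violating pieces the abstract subtracts. The sketch below only evaluates both forms of D for given diagram counts. It does not compute loop integrals, and the function names are illustrative.

```python
from typing import Dict


def chiral_order(loops: int, internal_pions: int, internal_nucleons: int,
                 vertices_by_order: Dict[int, int]) -> int:
    """General counting D = 4L - 2*I_pi - I_N + sum_k k * V_k."""
    return (4 * loops - 2 * internal_pions - internal_nucleons
            + sum(k * n for k, n in vertices_by_order.items()))


def chiral_order_one_nucleon(loops: int, pion_vertices: Dict[int, int],
                             nucleon_vertices: Dict[int, int]) -> int:
    """Single-nucleon form D = 1 + 2L + sum (k-2) V_k(pi) + sum (k-1) V_k(N)."""
    return (1 + 2 * loops
            + sum((k - 2) * n for k, n in pion_vertices.items())
            + sum((k - 1) * n for k, n in nucleon_vertices.items()))


# One-loop nucleon self-energy with two leading-order (k = 1) pion-nucleon vertices:
# L = 1, one internal pion line, one internal nucleon line.
print(chiral_order(1, 1, 1, {1: 2}))            # -> 3
print(chiral_order_one_nucleon(1, {}, {1: 2}))  # -> 3
```

The two forms agree because, for a single nucleon line, I_N = V_N - 1 and the loop relation L = I_pi + I_N - V + 1 gives I_pi = L + V_pi.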

Preprint · 2020 · arXiv

Mixed Noise Removal with Pareto Prior

Denoising images contaminated by a mixture of additive white Gaussian noise (AWGN) and impulse noise (IN) is an essential but challenging problem. The presence of impulsive disturbances inevitably changes the noise distribution and thus largely degrades the performance of traditional AWGN denoisers. Existing methods attempt to compensate for the effects of IN by introducing a weighting matrix, which, however, lacks a proper prior and is therefore hard to estimate accurately. To address this problem, we use the Pareto distribution as the prior of the weighting matrix, based on which an accurate and robust weight estimator is proposed for mixed noise removal. In particular, a relatively small portion of pixels is assumed to be contaminated with IN; these pixels should receive small weights and thus be penalized out. This behavior is properly described by the type I Pareto distribution. Therefore, armed with the Pareto distribution, we formulate mixed noise removal in a Bayesian framework, where a nonlocal self-similarity prior is further exploited by adopting nonlocal low-rank approximation. Compared to existing methods, the proposed method estimates the weighting matrix adaptively, accurately and robustly across different noise levels, and can thus boost denoising performance. Experimental results on widely used image datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
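The abstract's motivation for a weighting matrix can be made concrete with a small experiment. This is a minimal sketch under stated assumptions, not the paper's Bayesian estimator. Impulse noise is modeled as salt-and-pepper values on a random subset of pixels, although IN can also be random-valued. Ideal 0/1 weights are used in place of the estimated ones. The type I Pareto density the abstract names is included only as a standalone function; how the paper ties it to the weights is not reconstructed here. The comparison shows that an unweighted squared-error fidelity term is dominated by the few impulse pixels, while down-weighting them leaves roughly the Gaussian variance.

```python
import random
from typing import List, Tuple


def pareto_type1_pdf(w: float, alpha: float, w_min: float) -> float:
    """Type I Pareto density: alpha * w_min**alpha / w**(alpha + 1) for w >= w_min."""
    if w < w_min:
        return 0.0
    return alpha * w_min ** alpha / w ** (alpha + 1)


def add_mixed_noise(pixels: List[float], sigma: float, impulse_fraction: float,
                    seed: int = 0) -> Tuple[List[float], List[int]]:
    """AWGN on every pixel, then salt-and-pepper impulses on a random subset."""
    rng = random.Random(seed)
    noisy = [p + rng.gauss(0.0, sigma) for p in pixels]
    n_impulse = round(impulse_fraction * len(pixels))
    corrupted = rng.sample(range(len(pixels)), n_impulse)
    for i in corrupted:
        noisy[i] = rng.choice([0.0, 255.0])  # pepper (0) or salt (255)
    return noisy, sorted(corrupted)


def weighted_mse(clean: List[float], noisy: List[float], weights: List[float]) -> float:
    """Weighted squared-error fidelity, normalized by the total weight."""
    num = sum(w * (y - x) ** 2 for x, y, w in zip(clean, noisy, weights))
    return num / sum(weights)


clean = [128.0] * 1000
noisy, bad = add_mixed_noise(clean, sigma=5.0, impulse_fraction=0.1, seed=42)
bad_set = set(bad)
ideal_weights = [0.0 if i in bad_set else 1.0 for i in range(len(noisy))]
plain_mse = weighted_mse(clean, noisy, [1.0] * len(noisy))
robust_mse = weighted_mse(clean, noisy, ideal_weights)
print(f"unweighted: {plain_mse:.1f}  weighted: {robust_mse:.1f}")
```

In practice the corrupted positions are unknown, which is why the weights must be estimated and why a prior that favors only a small fraction of heavily penalized pixels is useful.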
