Researcher profile

Xiaoze Liu

Xiaoze Liu contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust score 19 (Unverified) · Verification L1 · Unclaimed author
5 works
0 followers
6 topics
4 close collaborators

Published work

5 published items

Preprint · 2026 · arXiv

Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models

We introduce Mutual Reinforcement Learning, a framework for concurrent RL post-training in which heterogeneous LLM policies exchange typed experience while keeping separate parameters, objectives, and tokenizers. The framework combines a Shared Experience Exchange (SEE), Multi-Worker Resource Allocation (MWRA), and a Tokenizer Heterogeneity Layer (THL) that retokenizes text and aligns token-level traces across incompatible vocabularies. This substrate makes the experience-sharing design question operational across model families. We instantiate three controlled probes on top of GRPO: data-level rollout sharing via Peer Rollout Pooling (PRP), value-level advantage sharing via Cross-Policy GRPO Advantage Sharing (XGRPO), and outcome-level success transfer via Success-Gated Transfer (SGT). A contextual-bandit analysis characterizes their structural positions on a stability-support trade-off: PRP pays density-ratio variance and THL residual costs, XGRPO preserves on-policy actor support while changing scalar baselines, and SGT supplies a rescue-set score direction toward verified peer successes. In the evaluated regime, outcome-level sharing occupies the favorable point of this trade-off.
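
The three probes above are built on GRPO, whose baseline for each rollout is computed from the other rollouts sampled for the same prompt. As a rough illustration of that scalar baseline (the quantity XGRPO is described as changing), here is a minimal sketch of the generic GRPO group-normalized advantage, assuming one scalar verifier reward per rollout. It is not the paper's XGRPO, PRP, or SGT method; the function name `group_relative_advantages` and the `eps` value are hypothetical, and it uses the population standard deviation, which may differ from a given implementation's choice.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Generic GRPO-style advantage for one prompt's rollout group.

    Each reward is centered on the group mean and scaled by the group
    standard deviation; eps avoids division by zero when every rollout
    in the group received the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (an assumption here)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one prompt, two of which a verifier marked correct.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Successful rollouts receive positive advantages and failed ones negative advantages. When all rollouts in a group score the same, every advantage is zero, so that group contributes no learning signal.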

Preprint · 2026 · arXiv

Multi-Rollout On-Policy Distillation via Peer Successes and Failures

Large language models are often post-trained with sparse verifier rewards, which indicate whether a sampled trajectory succeeds but provide limited guidance about where reasoning succeeds or fails. On-policy distillation (OPD) offers denser token-level supervision by training on student-generated trajectories, yet existing methods typically distill each rollout independently and ignore the other attempts sampled for the same prompt. We introduce Multi-Rollout On-Policy Distillation (MOPD), a peer-conditioned distillation framework that uses the student's local rollout group to construct more informative teacher signals. MOPD conditions the teacher on both successful and failed peer rollouts: successes provide positive evidence for valid reasoning patterns, while failures provide structured negative evidence about plausible mistakes to avoid. We study two peer-context constructions: positive peer imitation and contrastive success-failure conditioning. Experiments on competitive programming, mathematical reasoning, scientific question answering, and tool-use benchmarks show that MOPD consistently improves over standard on-policy baselines. Further teacher-signal analysis shows that mixed success-failure contexts better align teacher scores with verifier rewards, indicating that the gains arise from more faithful, instance-adaptive supervision. These results indicate that effective on-policy distillation should exploit the student's multi-rollout trial-and-error behavior rather than treating rollouts as isolated samples.

Preprint · 2022 · arXiv

ClusterEA: Scalable Entity Alignment with Stochastic Training and Normalized Mini-batch Similarities

Entity alignment (EA) aims at finding equivalent entities in different knowledge graphs (KGs). Embedding-based approaches have dominated the EA task in recent years. Those methods face problems that come from the geometric properties of embedding vectors, including hubness and isolation. To solve these geometric problems, many normalization approaches have been adopted for EA. However, the increasing scale of KGs renders it hard for EA models to adopt the normalization processes, thus limiting their usage in real-world applications. To tackle this challenge, we present ClusterEA, a general framework that is capable of scaling up EA models and enhancing their results by leveraging normalization methods on mini-batches with a high entity equivalent rate. ClusterEA contains three components to align entities between large-scale KGs, including stochastic training, ClusterSampler, and SparseFusion. It first trains a large-scale Siamese GNN for EA in a stochastic fashion to produce entity embeddings. Based on the embeddings, a novel ClusterSampler strategy is proposed for sampling highly overlapped mini-batches. Finally, ClusterEA incorporates SparseFusion, which normalizes local and global similarity and then fuses all similarity matrices to obtain the final similarity matrix. Extensive experiments with real-life datasets on EA benchmarks offer insight into the proposed framework, and suggest that it is capable of outperforming the state-of-the-art scalable EA framework by up to 8 times in terms of Hits@1.
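
ClusterEA reports its gains in Hits@1, the standard entity-alignment metric: the fraction of source entities whose highest-similarity target entity is the gold counterpart. The following minimal sketch computes Hits@k from a dense similarity matrix. It assumes row `i` holds the scores of source entity `i` against every target and that `gold[i]` is the index of its true counterpart. The function name and toy data are hypothetical, and the sketch does not model ClusterEA's SparseFusion or mini-batch normalization.

```python
def hits_at_k(similarity, gold, k=1):
    """Fraction of source entities whose gold target ranks in the top k."""
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate target indices by descending similarity score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if gold[i] in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Hypothetical toy similarities: rows are source entities, columns are targets.
sim = [
    [0.9, 0.1, 0.3],  # source 0: top target is 0
    [0.2, 0.4, 0.8],  # source 1: top target is 2
    [0.1, 0.7, 0.6],  # source 2: top target is 1
]
gold = [0, 1, 2]  # assumed gold alignment: source i <-> target i
print(hits_at_k(sim, gold))       # only source 0 is correct at k=1
print(hits_at_k(sim, gold, k=2))  # all gold targets appear in the top 2
```

Hubness, one of the geometric problems the abstract mentions, shows up here as a single target column that scores highly for many rows. That target then becomes the top match for many sources at once, which lowers Hits@1.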

Preprint · 2022 · arXiv

Electrically pumped polarized exciton-polaritons in a halide perovskite microcavity

Exciton polaritons, hybrid quasiparticles with part-light part-matter nature in semiconductor microcavities, are extensively investigated for striking phenomena such as polariton condensation and quantum emulation. These phenomena have recently been discovered in emerging lead halide perovskites at elevated temperatures up to room temperature. For advancing these discoveries into practical applications, one critical requirement is the realization of electrically pumped exciton-polaritons. However, electrically pumped polariton light-emitting devices with perovskites have not yet been achieved experimentally. Here, we devise a new method to combine the device with the microcavity and report the first halide perovskite polariton light-emitting device. Specifically, the device is based on a CsPbBr3 capacitive structure, which can inject the electrons and holes from the same electrode, conducive to the formation of excitons and simultaneously maintaining the high quality of the microcavity. In addition, highly polarization-selective polariton emissions have been demonstrated due to the optical birefringence in the CsPbBr3 microplate. This work paves the way for realizing practical polaritonic devices such as high-speed light-emitting devices for information communications and inversionless electrically pumped lasers based on perovskites.

Preprint · 2019 · arXiv

Observation of Rydberg exciton polaritons and their condensate in a perovskite cavity

The condensation of half-light half-matter exciton polaritons in semiconductor optical cavities is a striking example of macroscopic quantum coherence in a solid-state platform. Quantum coherence is possible only when there are strong interactions between the exciton polaritons provided by their excitonic constituents. Rydberg excitons with a high principal quantum number exhibit strong dipole-dipole interactions in cold atoms. However, polaritons with the excitonic constituent that is an excited state, namely Rydberg exciton polaritons (REPs), have not yet been experimentally observed. Here, for the first time, we observe the formation of REPs in a single crystal CsPbBr3 perovskite cavity without any external fields. These polaritons exhibit strong nonlinear behavior that leads to a coherent polariton condensate with a prominent blue shift. Furthermore, the REPs in CsPbBr3 are highly anisotropic and have a large extinction ratio, arising from the perovskite's orthorhombic crystal structure. Our observation not only sheds light on the importance of many-body physics in coherent polariton systems involving higher-order excited states, but also paves the way for exploring these coherent interactions for solid-state quantum optical information processing.