Researcher profile

Zijun Wei

Zijun Wei contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 17 (unverified) · Verification L1 · Unclaimed author
4 works
0 followers
5 topics
4 close collaborators

Published work

4 published items

Preprint · 2026 · arXiv

Diagnosing Training Inference Mismatch in LLM Reinforcement Learning

Modern LLM RL systems separate rollout generation from policy optimization. These two stages are expected to produce token probabilities that match exactly. However, implementation differences can make them assign different values to the same sequence under the same model weights, inducing Training-Inference Mismatch (TIM). TIM is difficult to inspect because it is entangled with off-policy drift and common stabilization mechanisms. In this work, we isolate TIM in a zero-mismatch diagnostic setting (VeXact), and show that small token-level numerical disagreements can independently cause training collapse. We further show that TIM changes the effective optimization problem, and identify a set of remedies that could mitigate TIM. Our results suggest that TIM is not benign numerical noise, but a systems-level perturbation that should be treated as a first-order factor in analyzing LLM RL stability.
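To make token-level mismatch concrete, here is a minimal, self-contained sketch. It is not the paper's VeXact setup. It assumes a toy setting where logits are available directly and mimics a second numerical path (for example, a rollout engine using a lower-precision dtype) by rounding the logits to 8 significant bits, which is roughly what bfloat16 keeps. It then compares per-token log-probabilities for the same sampled tokens and sums the per-token log-ratios. Because a sequence probability is a product of token probabilities, small per-token gaps compound over long sequences. All function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence


def round_to_bits(x: float, bits: int) -> float:
    """Round x to `bits` significant binary digits.

    A crude stand-in for a lower-precision dtype (bfloat16 keeps 8
    significant bits). Python's round() is round-half-to-even.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2**bits) / 2**bits, e)


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return [z - lse for z in logits]


def token_logprobs(step_logits: Sequence[Sequence[float]],
                   tokens: Sequence[int],
                   bits: Optional[int] = None) -> List[float]:
    """Log-probability of each sampled token under one numerical path.

    If `bits` is given, logits are rounded first to simulate a different
    implementation evaluating the same weights.
    """
    out = []
    for logits, tok in zip(step_logits, tokens):
        if bits is not None:
            logits = [round_to_bits(z, bits) for z in logits]
        out.append(log_softmax(logits)[tok])
    return out


def mismatch_report(logp_train: Sequence[float],
                    logp_infer: Sequence[float]) -> Dict[str, float]:
    """Summarize per-token log-ratios log pi_train(y_t) - log pi_infer(y_t)."""
    diffs = [a - b for a, b in zip(logp_train, logp_infer)]
    seq_log_ratio = sum(diffs)  # log of the sequence-level probability ratio
    return {
        "max_abs_token_log_ratio": max(abs(d) for d in diffs),
        "sequence_log_ratio": seq_log_ratio,
        "sequence_ratio": math.exp(seq_log_ratio),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    vocab, length = 32, 256
    # Hypothetical logits for one sequence (no real model involved);
    # tokens are drawn uniformly for simplicity.
    step_logits = [[rng.gauss(0.0, 3.0) for _ in range(vocab)]
                   for _ in range(length)]
    tokens = [rng.randrange(vocab) for _ in range(length)]

    train = token_logprobs(step_logits, tokens)          # full precision
    infer = token_logprobs(step_logits, tokens, bits=8)  # simulated low precision
    print(mismatch_report(train, infer))
```

Tracking the sequence-level ratio alongside the worst per-token gap is one simple way to see whether individually tiny disagreements add up to a large effective off-policy correction.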

Preprint · 2022 · arXiv

Interactive Portrait Harmonization

Current image harmonization methods use the entire background as guidance for harmonization. However, this may limit the user's ability to choose a specific object or person in the background to guide the harmonization. To enable flexible interaction between the user and the harmonization process, we introduce interactive harmonization, a new setting where harmonization is performed with respect to a selected region in the reference image instead of the entire background. We propose a new flexible framework that allows users to pick certain regions of the background image and use them to guide the harmonization. Inspired by professional portrait harmonization users, we also introduce a new luminance matching loss to optimally match the color/luminance conditions between the composite foreground and the selected reference region. This framework gives more control over the image harmonization pipeline and achieves visually pleasing portrait edits. Furthermore, we introduce a new dataset carefully curated for validating portrait harmonization. Extensive experiments on both synthetic and real-world datasets show that the proposed approach is efficient and robust compared to previous harmonization baselines, especially for portraits. Project webpage: https://jeya-maria-jose.github.io/IPH-web/
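The abstract does not spell out the luminance matching loss, so the following is only an illustrative stand-in under stated assumptions. It treats the user's selection as a boolean mask over the background and computes relative luminance with the Rec. 709 weights (0.2126, 0.7152, 0.0722) on linear RGB values in [0, 1]. It then penalizes differences in the mean and standard deviation of luminance between the composite foreground and the selected reference region. The helper names are hypothetical, and the paper's exact formulation may differ.

```python
import math
from typing import Iterable, List, Sequence, Tuple

RGB = Tuple[float, float, float]  # assumed: linear RGB in [0, 1]


def luminance(px: RGB) -> float:
    """Relative luminance using Rec. 709 weights."""
    r, g, b = px
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def select_region(image: Sequence[Sequence[RGB]],
                  mask: Sequence[Sequence[bool]]) -> List[RGB]:
    """Collect the pixels the user selected as the reference region."""
    return [px for row, mrow in zip(image, mask)
            for px, keep in zip(row, mrow) if keep]


def mean_std(values: Iterable[float]) -> Tuple[float, float]:
    """Population mean and standard deviation."""
    vals = list(values)
    if not vals:
        raise ValueError("empty pixel set")
    mu = sum(vals) / len(vals)
    var = sum((v - mu) ** 2 for v in vals) / len(vals)
    return mu, math.sqrt(var)


def luminance_matching_loss(fg_pixels: Sequence[RGB],
                            ref_pixels: Sequence[RGB]) -> float:
    """L1 gap between luminance statistics of foreground and reference region."""
    mu_f, sd_f = mean_std(luminance(p) for p in fg_pixels)
    mu_r, sd_r = mean_std(luminance(p) for p in ref_pixels)
    return abs(mu_f - mu_r) + abs(sd_f - sd_r)


if __name__ == "__main__":
    background = [[(0.2, 0.2, 0.2), (0.9, 0.9, 0.9)],
                  [(0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]]
    mask = [[True, False], [True, False]]  # user picks the darker left column
    foreground = [(0.5, 0.5, 0.5)] * 4
    ref = select_region(background, mask)
    print(luminance_matching_loss(foreground, ref))
```

Selecting a different region in the mask changes the target statistics. That is the kind of user control the interactive setting is meant to provide.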

Preprint · 2021 · arXiv

Structure and magnetic properties of melilite-type compounds RE2Be2GeO7 (RE = Pr, Nd, Gd-Yb) with Rare-Earth ions on Shastry-Sutherland lattice

Rare-earth (RE) based frustrated magnets are typical systems that combine strong spin-orbit coupling, geometric frustration and anisotropic exchange interactions, and they can give rise to diverse exotic magnetic ground states such as quantum spin liquids (QSL). The discovery of new RE-based frustrated materials is crucial for exploring these exotic magnetic phases. Herein, we report the synthesis, structure and magnetic properties of a family of melilite-type RE2Be2GeO7 (RE = Pr, Nd, Gd-Yb) compounds that crystallize in a tetragonal structure. In this structure, the magnetic RE3+ ions are arranged on a Shastry-Sutherland lattice (SSL) within the ab-plane and are well separated along the c-axis by nonmagnetic GeBe2O7 polyhedra. Temperature-dependent susceptibility and isothermal magnetization M(H) measurements reveal that most RE2Be2GeO7 compounds (except RE = Tb) show no magnetic ordering down to 2 K despite dominant antiferromagnetic (AFM) interactions. Tb2Be2GeO7, by contrast, undergoes an AFM transition with a Néel temperature T_N ≈ 2.5 K and shows field-induced spin-flop behavior below T_N. In addition, the magnetic entropy change calculated from the isothermal M(H) curves reveals a viable magnetocaloric effect (MCE) for RE2Be2GeO7 (RE = Gd, Dy) in the liquid-helium temperature range. Gd2Be2GeO7 shows a maximum |ΔS_M| of up to 54.8 J kg⁻¹ K⁻¹ at H = 7 T, and Dy2Be2GeO7 has the largest value in this family at H = 2 T, with |ΔS_M| = 16.1 J kg⁻¹ K⁻¹. More excitingly, the rich diversity of RE ions in this family provides an archetype for exploring exotic quantum magnetic phenomena, with widely variable spins located on the SSL.
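The magnetic entropy change mentioned above is conventionally obtained from isothermal M(H) curves through the Maxwell relation ΔS_M(T, H) = ∫₀^H (∂M/∂T)_H d(μ₀H′). The sketch below is a generic numerical version of that standard procedure and is not the authors' analysis code. It assumes magnetization in A m² kg⁻¹ (numerically equal to emu g⁻¹) and field μ₀H in tesla, so the result comes out in J kg⁻¹ K⁻¹. It approximates ∂M/∂T by a finite difference between two neighbouring isotherms and integrates over field with the trapezoidal rule. The function name and the toy paramagnet used for checking are hypothetical.

```python
from typing import Sequence


def entropy_change(fields_T: Sequence[float],
                   m_low: Sequence[float],
                   m_high: Sequence[float],
                   t_low: float,
                   t_high: float) -> float:
    """Magnetic entropy change at T_mid = (t_low + t_high) / 2.

    fields_T      : increasing mu0*H values in tesla, starting at 0
    m_low, m_high : magnetization (A m^2 kg^-1) on the isotherms t_low, t_high (K)
    Returns Delta S_M in J kg^-1 K^-1 from the Maxwell relation
    Delta S_M = integral_0^H (dM/dT)_H d(mu0 H).
    """
    if not (len(fields_T) == len(m_low) == len(m_high)):
        raise ValueError("field and magnetization arrays must have equal length")
    dT = t_high - t_low
    # Finite-difference estimate of (dM/dT)_H at every field point.
    dMdT = [(mh - ml) / dT for ml, mh in zip(m_low, m_high)]
    # Trapezoidal integration over the field.
    total = 0.0
    for i in range(1, len(fields_T)):
        dH = fields_T[i] - fields_T[i - 1]
        total += 0.5 * (dMdT[i] + dMdT[i - 1]) * dH
    return total


if __name__ == "__main__":
    # Toy Curie paramagnet M = C * H / T (hypothetical C, not fitted to any data).
    C = 1.0
    H = [0.0, 0.5, 1.0, 1.5, 2.0]
    t1, t2 = 2.0, 4.0
    m1 = [C * h / t1 for h in H]
    m2 = [C * h / t2 for h in H]
    print(entropy_change(H, m1, m2, t1, t2))  # prints -0.25
```

For the toy paramagnet, the result matches the analytic value −C H²/(2 T₁ T₂) = −0.25. The sign is negative because magnetization falls with temperature, which is why MCE results are usually quoted as the magnitude (or −ΔS_M).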

Preprint · 2020 · arXiv

Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning

Being able to predict human gaze behavior has obvious importance for behavioral vision and for computer vision applications. Most models have mainly focused on predicting free-viewing behavior using saliency maps, but these predictions do not generalize to goal-directed behavior, such as when a person searches for a visual target object. We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search. The viewer's internal belief states were modeled as dynamic contextual belief maps of object locations. These maps were learned by IRL and then used to predict behavioral scanpaths for multiple target categories. To train and evaluate our IRL model we created COCO-Search18, which is now the largest dataset of high-quality search fixations in existence. COCO-Search18 has 10 participants searching for each of 18 target-object categories in 6202 images, making about 300,000 goal-directed fixations. When trained and evaluated on COCO-Search18, the IRL model outperformed baseline models in predicting search fixation scanpaths, both in terms of similarity to human search behavior and search efficiency. Finally, reward maps recovered by the IRL model reveal distinctive target-dependent patterns of object prioritization, which we interpret as a learned object context.
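In the paper, both the reward map and the search policy are learned from human fixations with IRL. The toy sketch below does not implement that IRL procedure. It only illustrates, under simple assumptions, how a fixed reward map over image locations can be turned into a scanpath. The approach is a classic heuristic from saliency-based scanpath models: repeatedly fixate the highest-reward cell (winner-take-all), then suppress its neighbourhood (inhibition of return). The map, the grid size and `ior_radius` are hypothetical.

```python
from typing import List, Sequence, Tuple


def greedy_scanpath(reward: Sequence[Sequence[float]],
                    n_fixations: int,
                    ior_radius: int = 1) -> List[Tuple[int, int]]:
    """Turn a 2-D reward (priority) map into a fixation sequence.

    Winner-take-all: fixate the highest-reward cell, then apply inhibition
    of return by suppressing a square neighbourhood of radius `ior_radius`.
    """
    rmap = [list(row) for row in reward]  # work on a copy; input is untouched
    h, w = len(rmap), len(rmap[0])
    path: List[Tuple[int, int]] = []
    for _ in range(n_fixations):
        r, c = max(((i, j) for i in range(h) for j in range(w)),
                   key=lambda ij: rmap[ij[0]][ij[1]])
        if rmap[r][c] == float("-inf"):
            break  # every location has already been inhibited
        path.append((r, c))
        for i in range(max(0, r - ior_radius), min(h, r + ior_radius + 1)):
            for j in range(max(0, c - ior_radius), min(w, c + ior_radius + 1)):
                rmap[i][j] = float("-inf")
    return path


if __name__ == "__main__":
    # Hypothetical 5x5 reward map: a strongly prioritized object at the top-left,
    # a weaker one at the bottom-right, and a faint one in the centre.
    reward = [[0.0] * 5 for _ in range(5)]
    reward[0][0], reward[4][4], reward[2][2] = 3.0, 2.0, 1.0
    print(greedy_scanpath(reward, n_fixations=3))  # prints [(0, 0), (4, 4), (2, 2)]
```

A target-dependent reward map, like the ones the abstract describes, would reorder these fixations for each search target even on the same image.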