Researcher profile

Haoran He

Haoran He contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2025arXiv

GARDO: Reinforcing Diffusion Models without Reward Hacking

Fine-tuning diffusion models via online reinforcement learning (RL) has shown great potential for enhancing text-to-image alignment. However, since precisely specifying a ground-truth objective for visual tasks remains challenging, the models are often optimized using a proxy reward that only partially captures the true goal. This mismatch often leads to reward hacking, where proxy scores increase while real image quality deteriorates and generation diversity collapses. While common solutions add regularization against the reference policy to prevent reward hacking, they compromise sample efficiency and impede the exploration of novel, high-reward regions, as the reference policy is usually sub-optimal. To address the competing demands of sample efficiency, effective exploration, and mitigation of reward hacking, we propose Gated and Adaptive Regularization with Diversity-aware Optimization (GARDO), a versatile framework compatible with various RL algorithms. Our key insight is that regularization need not be applied universally; instead, it is highly effective to selectively penalize a subset of samples that exhibit high uncertainty. To address the exploration challenge, GARDO introduces an adaptive regularization mechanism wherein the reference model is periodically updated to match the capabilities of the online policy, ensuring a relevant regularization target. To address the mode collapse issue in RL, GARDO amplifies the rewards for high-quality samples that also exhibit high diversity, encouraging mode coverage without destabilizing the optimization process. Extensive experiments across diverse proxy rewards and hold-out unseen metrics consistently show that GARDO mitigates reward hacking and enhances generation diversity without sacrificing sample efficiency or exploration, highlighting its effectiveness and robustness.

preprint2020arXiv

Chiral symmetry breaking for deterministic switching of perpendicular magnetization by spin-orbit torque

Symmetry breaking is a characteristic to determine which branch of a bifurcation system follows upon crossing a critical point. Specifically, in spin-orbit torque (SOT) devices, a fundamental question arises: how to break the symmetry of the perpendicular magnetic moment by the in-plane spin polarization? Here, we show that the chiral symmetry breaking by the DMI can induce the deterministic SOT switching of the perpendicular magnetization. By introducing a gradient of saturation magnetization or magnetic anisotropy, non-collinear spin textures are formed by the gradient of effective SOT strength, and thus the chiral symmetry of the SOT-induced spin textures is broken by the DMI, resulting in the deterministic magnetization switching. We introduce a strategy to induce an out-of-plane (z) gradient of magnetic properties, as a practical solution for the wafer-scale manufacture of SOT devices.