Source author record

Haoyuan Sun

Haoyuan Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision; Distributed, Parallel, and Cluster Computing; eess.SY; Multiagent Systems; Systems and Control

Catalog footprint

What is connected

2 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

Power Reinforcement Post-Training of Text-to-Image Models with Super-Linear Advantage Shaping

Recently, post-training methods based on reinforcement learning, with a particular focus on Group Relative Policy Optimization (GRPO), have emerged as a robust paradigm for further advancing text-to-image (T2I) models. However, these methods are often prone to reward hacking, wherein models exploit biases in imperfect reward functions rather than yielding genuine performance gains. In this work, we identify that normalization can lead to miscalibration, and that directly removing the prompt-level standard deviation term yields an optimal policy ascent direction that is linear in the advantage but still limits the separation of genuine signals from noise. To mitigate these issues, we propose Super-Linear Advantage Shaping (SLAS) by revisiting the functional update from an information geometry perspective. By extending the Fisher-Rao information metric with advantage-dependent weighting, SLAS introduces a non-linear geometric structure that reshapes the local policy space. This design relaxes constraints along high-advantage directions to amplify informative updates, while tightening those in low-advantage regions to suppress illusory gradients. In addition, batch-level normalization is applied to stabilize training under varying reward scales. Extensive evaluations demonstrate that SLAS consistently surpasses the DanceGRPO baseline across multiple backbones and benchmarks. In particular, it yields faster training dynamics, improved out-of-domain performance on GenEval and UniGenBench++, and enhanced robustness to model scaling, while mitigating reward hacking and preserving semantic and compositional fidelity in generations.
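The abstract's point about normalization can be illustrated with a small sketch. This is not the paper's SLAS method, whose shaping function is not given here. It only contrasts the usual group-relative advantage (centered and divided by the prompt-level standard deviation) with a variant that drops the prompt-level term and scales by a batch-level standard deviation instead. The function names, the `eps` constant, the use of the population standard deviation, and the toy rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def prompt_normalized_advantages(rewards, eps=1e-8):
    """Group-relative advantages for one prompt: center, then divide by
    that prompt's own standard deviation (assumed form of the baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def batch_normalized_advantages(groups, eps=1e-8):
    """Center each prompt's rewards by its own mean, then divide every
    advantage by one standard deviation computed over the whole batch."""
    centered = [[r - mean(g) for r in g] for g in groups]
    flat = [a for g in centered for a in g]
    sigma = pstdev(flat)
    return [[a / (sigma + eps) for a in g] for g in centered]


# Hypothetical rewards: prompt 0 differs only by small noise,
# prompt 1 has clearly separated good and bad samples.
groups = [[0.50, 0.51, 0.49, 0.50], [0.0, 1.0, 0.0, 1.0]]

per_prompt = [prompt_normalized_advantages(g) for g in groups]
per_batch = batch_normalized_advantages(groups)

for name, advs in (("prompt-level", per_prompt), ("batch-level", per_batch)):
    peaks = [max(abs(a) for a in g) for g in advs]
    print(name, [round(p, 3) for p in peaks])
```

With these toy numbers, prompt-level scaling gives the near-noise prompt a peak advantage at least as large as the well-separated prompt, which is one way miscalibration could arise. Batch-level scaling keeps the noisy prompt's advantages small relative to the informative one.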

preprint · 2020 · arXiv

Distributed Submodular Maximization with Parallel Execution

The submodular maximization problem is widely applicable in many engineering problems where objectives exhibit diminishing returns. While this problem is known to be NP-hard for certain subclasses of objective functions, there is a greedy algorithm that guarantees a solution whose value is at least 1/2 of the optimal. This greedy algorithm can be implemented with a set of agents, each making a decision sequentially based on the choices of all prior agents. In this paper, we consider a generalization of the greedy algorithm in which agents can make decisions in parallel, rather than strictly in sequence. In particular, we are interested in partitioning the agents, where a set of agents in the partition all make a decision simultaneously based on the choices of prior agents, so that the algorithm terminates in a limited number of iterations. We provide bounds on the performance of this parallelized version of the greedy algorithm and show that dividing the agents evenly among the sets in the partition yields an optimal structure. We additionally show that this optimal structure is still near-optimal when the objective function exhibits a certain monotone property. Lastly, we show that the same performance guarantees can be achieved in the parallelized greedy algorithm even when agents can only observe the decisions of a subset of prior agents.
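The partitioned greedy scheme described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the objective is a simple coverage count (a standard submodular function), each agent has its own small set of candidate actions, and the names `coverage`, `partitioned_greedy`, `action_sets`, and `blocks` are hypothetical. Agents in the same block choose simultaneously and see only the choices of earlier blocks. One agent per block recovers the sequential greedy algorithm.

```python
def coverage(chosen):
    """Submodular objective: number of distinct elements covered."""
    covered = set()
    for s in chosen:
        covered |= s
    return len(covered)


def partitioned_greedy(action_sets, blocks, objective=coverage):
    """Run greedy block by block. Every agent in a block picks the action
    with the best objective value given only the earlier blocks' choices."""
    choices = {}
    for block in blocks:
        visible = list(choices.values())  # decisions from earlier blocks only
        new_choices = {}
        for agent in block:
            new_choices[agent] = max(
                action_sets[agent],
                key=lambda action: objective(visible + [action]),
            )
        choices.update(new_choices)  # revealed only after the block finishes
    return choices


# Hypothetical instance: every agent can cover {1, 2} or one private element.
action_sets = {
    0: [frozenset({1, 2}), frozenset({3})],
    1: [frozenset({1, 2}), frozenset({4})],
    2: [frozenset({1, 2}), frozenset({5})],
    3: [frozenset({1, 2}), frozenset({6})],
}

sequential = partitioned_greedy(action_sets, [[0], [1], [2], [3]])
two_rounds = partitioned_greedy(action_sets, [[0, 1], [2, 3]])
one_round = partitioned_greedy(action_sets, [[0, 1, 2, 3]])

for name, result in (("sequential", sequential),
                     ("two rounds", two_rounds),
                     ("one round", one_round)):
    print(name, coverage(result.values()))
```

On this toy instance the objective value drops as more agents decide at once, because agents in the same block cannot see that they are duplicating each other's coverage. That is the trade-off between iteration count and solution quality that the abstract's bounds address.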