Researcher profile

Jingyuan Zhu

Jingyuan Zhu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified
Verification L1
Unclaimed author
4 works
0 followers
4 topics
4 close collaborators

Published work

4 published items

Preprint, 2026, arXiv

Diffusion-APO: Trajectory-Aware Direct Preference Alignment for Video Diffusion Transformers

Efficiently aligning large-scale video diffusion models with human intent requires a scalable and trajectory-aware pathway that bridges the inherent discrepancy between training noise distributions and practical inference trajectories. While existing paradigms such as Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO) attempt to address this, they are often hindered by either reliance on bias-prone, complex reward models or suboptimal timestep sampling. In this paper, we propose Diffusion-APO (Aligned Preference Optimization), a trajectory-aware algorithm that resolves this misalignment by synchronizing training noise with inference-time denoising paths to maximize gradient signal efficacy. To translate this algorithmic innovation into a practical solution, we introduce a unified and modular RLHF framework that integrates online ranking, half-online anchoring, offline refinement, and distillation-aware drift correction. This framework enables flexible, multi-stage preference alignment across diverse data and computational constraints without relying on scalar-reward-based policy gradients. Through extensive experiments, we demonstrate that Diffusion-APO consistently outperforms standard baselines in visual quality and instruction following, while effectively preserving generative fidelity during model acceleration, providing a robust, end-to-end pathway for scalable video diffusion alignment.

Preprint, 2026, arXiv

Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs

Current Vision-Language Models (VLMs) struggle with fine-grained spatial reasoning, particularly when multi-step logic and precise spatial alignment are required. In this work, we introduce SpatialReasoner-R1, a vision-language reasoning model designed to address these limitations. To construct high-quality supervision for spatial reasoning, we design a Multi-Model Monte Carlo Tree Search (M3CTS) method that generates diverse, logically consistent Long Chain-of-Thought (LongCOT) reasoning trajectories. In addition, we propose a fine-grained Direct Preference Optimization (fDPO) method that introduces segment-specific preference granularity for descriptive grounding and logical reasoning, guided by a spatial reward mechanism that evaluates candidate responses based on visual consistency, spatial grounding, and logical coherence. Experimental results demonstrate that fDPO achieves relative performance gains of 4.1% and 9.0% over standard DPO on spatial qualitative and quantitative tasks, respectively. SpatialReasoner-R1, trained with fDPO, sets a new SoTA on SpatialRGPT-Bench, outperforming the strongest baseline by 9.4% in average accuracy, while maintaining competitive performance on general vision-language tasks.
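
For context on the "standard DPO" baseline named in this abstract, the widely used DPO loss for a single preference pair can be sketched as below. This is only the baseline objective, not the fDPO method of the paper; the function name, argument names, and the default `beta` value are assumptions made for illustration.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are sequence log-probabilities under the trained policy and
    under a frozen reference policy. The names and the beta default are
    illustrative assumptions, not values taken from the paper.
    """
    # Implicit reward margin: how much more the policy favours the chosen
    # response over the rejected one, relative to the reference policy.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Loss is -log(sigmoid(margin)), evaluated in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference, the margin is zero and the loss is log 2; the loss falls as the policy raises the chosen response relative to the rejected one.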

Preprint, 2022, arXiv

Solving the 3-SAT problem using network-based biocomputation

The 3-Satisfiability Problem (3-SAT) is a demanding combinatorial problem, of central importance among the non-deterministic polynomial (NP) complete problems, with applications in circuit design, artificial intelligence and logistics. Even with optimized algorithms, the solution space that needs to be explored grows exponentially with increasing size of 3-SAT instances. Thus, large 3-SAT instances require excessive amounts of energy to solve with serial electronic computers. Network-based biocomputation (NBC) is a multidisciplinary parallel computation approach with drastically reduced energy consumption. NBC uses biomolecular motors to propel cytoskeletal filaments through nanofabricated networks that encode the mathematical problems. By stochastically exploring possible paths through the networks, the cytoskeletal filaments find possible solutions to the encoded problem instance. Here we first report a novel algorithm that converts 3-SAT into NBC-compatible network format. We demonstrate that this algorithm works in practice, by experimentally solving four small 3-SAT instances (with up to 3 variables and 5 clauses) using the actin-myosin biomolecular motor system. This is a key step towards the broad general applicability of NBC because polynomial conversions to 3-SAT exist for a wide set of important NP-complete problems.
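
To make the size of the search space concrete, here is a minimal sketch of a conventional serial brute-force check for an instance of the same scale as those in the abstract (3 variables, 5 clauses). It is not the NBC network encoding described in the paper, and the clause set is a hypothetical example, not one of the four instances solved experimentally.

```python
from itertools import product

def solve_3sat(num_vars, clauses):
    """Return every satisfying assignment, found by exhaustive search.

    A literal k > 0 means variable k is true; k < 0 means variable |k|
    is false. The loop visits all 2**num_vars assignments, which is the
    exponential growth the abstract refers to.
    """
    solutions = []
    for bits in product([False, True], repeat=num_vars):
        # A clause holds if at least one of its literals is satisfied.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            solutions.append(bits)
    return solutions

# Hypothetical instance: 3 variables, 5 clauses (illustration only).
clauses = [(1, 2, 3), (-1, 2, 3), (1, -2, 3), (1, 2, -3), (-1, -2, -3)]
solutions = solve_3sat(3, clauses)
```

Each clause here rules out exactly one of the 8 assignments, so 3 remain. Every added variable doubles the number of assignments a serial search must consider in the worst case.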

Preprint, 2021, arXiv

Physical requirements for scaling up network-based biocomputation

The high energy consumption of electronic data processors, together with physical challenges limiting their further improvement, has triggered intensive interest in alternative computation paradigms. Here we focus on network-based biocomputation (NBC), a massively parallel approach that benefits from the energy efficiency of biological agents, such as molecular motors or bacteria, and their availability in large numbers. We analyse and define the fundamental requirements that need to be fulfilled to scale up NBC computers to become a viable technology that can solve large NP-complete problem instances faster or with less energy consumption than electronic computers. Our work can serve as a guide for further efforts to contribute to elements of future NBC devices, and as the theoretical basis for a detailed NBC roadmap.