Researcher profile

Jiaming Hu

Jiaming Hu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified
Verification L1
Unclaimed author
4 works
0 followers
5 topics
4 close collaborators

Published work

4 published items

preprint, 2026, arXiv

Ab initio study of carrier mobility in Bi$_2$O$_2$Se

Bi$_2$O$_2$Se is an emerging high-performance layered semiconductor with excellent stability. While experimental studies have explored carrier transport across various doping levels for both $n$-type and $p$-type conduction, a comprehensive theoretical understanding remains incomplete. In this work, we present parameter-free first-principles calculations of the electron and hole mobilities in Bi$_2$O$_2$Se, based on an iterative solution of the Boltzmann transport equation that includes electron-phonon scattering and ionized impurity scattering on an equal footing. Intriguingly, we find that Bi$_2$O$_2$Se exhibits high electron mobilities in both the in-plane and out-of-plane directions, whereas the hole mobilities are only significant in the in-plane direction, displaying a unique combination of three-dimensional (3D) electron transport and two-dimensional (2D) hole transport. At 300~K, the calculated intrinsic electron and hole mobilities along the in-plane direction are 447~$\mathrm{cm^2\,V^{-1}\,s^{-1}}$ and 29~$\mathrm{cm^2\,V^{-1}\,s^{-1}}$, respectively, and both are limited primarily by Fröhlich electron-phonon interactions. Due to its large static dielectric permittivity, Bi$_2$O$_2$Se exhibits exceptionally high low-temperature electron mobilities above $1.0\times10^5~\mathrm{cm^2\,V^{-1}\,s^{-1}}$, and its electron mobility above 50~K is robust against ionized impurity scattering over a wide range of impurity concentrations. By incorporating the Hall effect into our analysis, we predict an in-plane electron Hall mobility of 517~$\mathrm{cm^2\,V^{-1}\,s^{-1}}$ at 300~K, in excellent agreement with experimental data. These results provide valuable insights into the carrier transport mechanisms in Bi$_2$O$_2$Se and offer predictive benchmarks for future theoretical and experimental investigations.

preprint, 2026, arXiv

BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning

Image captioning is one of the most fundamental tasks in computer vision. Owing to its open-ended nature, it has received significant attention in the era of multimodal large language models (MLLMs). In pursuit of ever more detailed and accurate captions, recent work has increasingly turned to reinforcement learning (RL). However, existing captioning-RL methods and evaluation metrics often emphasize a narrow notion of caption quality, inducing trade-offs across core dimensions of captioning. For example, utility-oriented objectives can encourage noisy, hallucinated, or overlong captions that improve downstream question answering while harming fluency, whereas arena-style objectives can favor fluent but generic descriptions with limited usefulness. To address this, we propose a more balanced RL framework that jointly optimizes utility-aware correctness, reference coverage, and linguistic quality. In order to effectively optimize the resulting continuous multi-objective reward formulation, we apply GDPO-style reward-decoupled normalization to continuous-valued captioning rewards and show that it improves performance over vanilla GRPO. Additionally, we introduce length-conditional reward masking, yielding a more suitable length penalty for captioning. Across LLaVA-1.5-7B and Qwen2.5-VL 3B and 7B base models, our method consistently improves caption quality, with peak gains of +13.6 DCScore, +9.0 CaptionQA, and +29.0 CapArena across different models.
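
The contrast between vanilla GRPO and reward-decoupled normalization mentioned in this abstract can be illustrated with a minimal sketch. It is one plausible reading of the idea, not the paper's implementation: the function names, the unweighted sum of components, and the epsilon value are all assumptions made here for illustration. Vanilla GRPO is taken to standardize the summed reward within a group of sampled captions, while the decoupled variant standardizes each reward component separately before summing, so a component with a small numeric range is not drowned out by one with a large range.

```python
from statistics import fmean, pstdev

EPS = 1e-6  # assumed stabilizer to avoid division by zero


def _standardize(values):
    """Zero-mean, unit-variance scaling within one group of samples."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / (std + EPS) for v in values]


def grpo_advantages(rewards):
    """Sum the reward components per sample, then standardize the totals.

    rewards: list of per-sample lists [r_1, ..., r_K] for a single group.
    """
    totals = [sum(sample) for sample in rewards]
    return _standardize(totals)


def decoupled_advantages(rewards):
    """Standardize each reward component on its own, then sum per sample.

    Hypothetical helper: equal weighting of components is an assumption.
    """
    n_components = len(rewards[0])
    per_component = [
        _standardize([sample[k] for sample in rewards])
        for k in range(n_components)
    ]
    return [
        sum(column[i] for column in per_component)
        for i in range(len(rewards))
    ]


# Toy group of four captions with two continuous rewards on very
# different scales (for example, a utility score and a fluency score).
group = [[10.0, 0.1], [0.0, 0.2], [10.0, 0.2], [0.0, 0.1]]
coupled = grpo_advantages(group)
decoupled = decoupled_advantages(group)
```

In this toy group the coupled advantages are driven almost entirely by the first component, whereas the decoupled advantages rank the third caption (high on both rewards) above the first (high on only one).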

preprint, 2026, arXiv

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

AI agents may soon become capable of autonomously completing valuable, long-horizon tasks in diverse domains. Current benchmarks either do not measure real-world tasks or are not sufficiently difficult to meaningfully measure frontier models. To this end, we present Terminal-Bench 2.0: a carefully curated hard benchmark composed of 89 tasks in computer terminal environments inspired by problems from real workflows. Each task features a unique environment, a human-written solution, and comprehensive tests for verification. We show that frontier models and agents score less than 65\% on the benchmark, and we conduct an error analysis to identify areas for model and agent improvement. To assist developers and researchers in future work, we publish the dataset and evaluation harness at https://www.tbench.ai/.

preprint, 2026, arXiv

Towards General Preference Alignment: Diffusion Models at Nash Equilibrium

Reinforcement learning from human feedback (RLHF) has been popular for aligning text-to-image (T2I) diffusion models with human preferences. As a mainstream branch of RLHF, Direct Preference Optimization (DPO) offers a computationally efficient alternative that avoids explicit reward modeling and has been widely adopted in diffusion alignment. However, existing preference-based methods for diffusion alignment still rely on reward-induced preference signals and typically assume that human preferences can be adequately modeled by the Bradley--Terry (BT) model, which may fail to capture the full complexity of human preferences. In this paper, we formulate diffusion alignment from a game-theoretic perspective. We propose Diffusion Nash Preference Optimization (Diff.-NPO), an intuitive general preference framework for diffusion alignment. Diff.-NPO encourages the current policy to play against itself to achieve self-improvement, leading to better alignment. Empirically, we demonstrate the effectiveness of Diff.-NPO on the text-to-image generation task via various metrics. Diff.-NPO consistently outperforms existing preference-based diffusion alignment methods.