Researcher profile

Jian Dong

Jian Dong contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 17 (Unverified) · Verification L1 · Unclaimed author
4 works
0 followers
4 topics
4 close collaborators

Published work

4 published items

Preprint · 2026 · arXiv

Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory

The evaluation of large language models (LLMs) via benchmarks is widespread, yet inconsistencies between different leaderboards and poor separability among top models raise concerns about the ability of benchmarks to accurately reflect authentic model capabilities. This paper provides a critical analysis of benchmark effectiveness, examining prominent mainstream LLM benchmarks using results from diverse models. We first propose Pseudo-Siamese Network for Item Response Theory (PSN-IRT), an enhanced Item Response Theory framework that incorporates a rich set of item parameters within an IRT-grounded architecture. PSN-IRT can be used to estimate item characteristics and model abilities accurately and reliably. Based on PSN-IRT, we conduct an extensive analysis of 11 LLM benchmarks comprising 41,871 items, revealing significant and varied shortcomings in their measurement quality. Furthermore, we demonstrate that PSN-IRT can be leveraged to construct smaller benchmarks that maintain stronger alignment with human preference.
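PSN-IRT itself is not reproduced here. As a minimal sketch of the standard IRT machinery it builds on, assuming the classic two-parameter logistic (2PL) model rather than the paper's richer item parameterization, the code below computes the probability that a model of ability theta answers an item correctly, the item's Fisher information, and a simple information-based rule for picking a smaller item subset. The function names, the selection rule, and the item parameters are all hypothetical illustrations.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT: probability that a model with ability theta answers an item
    with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_items(items, theta_grid, k):
    """Hypothetical selection rule: keep the k items whose information,
    summed over a grid of abilities of interest, is highest."""
    scored = [
        (sum(item_information(t, a, b) for t in theta_grid), name)
        for name, (a, b) in items.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical item parameters (a, b); not taken from the paper.
items = {
    "q1": (2.0, 0.0),   # sharply discriminating, difficulty near theta = 0
    "q2": (0.3, 0.0),   # weakly discriminating item
    "q3": (1.5, 1.0),   # harder, fairly discriminating item
    "q4": (1.0, -3.0),  # very easy item
}
print(select_items(items, theta_grid=[-0.5, 0.0, 0.5, 1.0], k=2))
```

Items that are too easy or barely discriminating carry little information near the abilities of interest, which is one standard way an IRT fit can motivate a smaller benchmark.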

Preprint · 2025 · arXiv

TIM-PRM: Verifying multimodal reasoning with Tool-Integrated PRM

Multimodal Large Language Models (MLLMs) have achieved impressive performance in mathematical reasoning, yet they remain vulnerable to visual hallucinations and logical inconsistencies that standard outcome-based supervision fails to mitigate. While Process Reward Models (PRMs) promise step-by-step verification, current approaches typically operate as scalar scorers or generative critics that suffer from sycophancy, blindly validating flawed hypotheses rather than grounding them in visual reality. To bridge this gap, we introduce TIM-PRM (Tool-Integrated Multimodal PRM), a novel agentic framework that transforms verification from a passive classification task into an active, tool-augmented investigation. TIM-PRM is trained to explicitly plan verification strategies and uses an Independent Question Asking mechanism to query evidence via external tools, effectively decoupling verification from the reasoning context to eliminate confirmation bias. We instantiate this method by curating a high-quality dataset of tool-integrated verification trajectories. Extensive experiments on VisualProcessBench demonstrate that our 8B-parameter model surpasses existing open-source multimodal PRMs, significantly outperforming much larger models such as Qwen2.5-72B and InternVL-78B, while offering interpretable insights into the verification process.
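TIM-PRM is a trained 8B model, not a rule-based checker, so the following is only a toy illustration of the idea behind Independent Question Asking. It assumes, hypothetically, that each reasoning step reduces to a (question, claimed answer) pair and that a tool returns ground-truth evidence for a question. The key point it shows is that the tool sees only the question, never the claimed answer or earlier steps, so it cannot simply echo the solver. `verify_steps`, `lookup_tool`, and the `facts` table are invented for this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical step model: (question about the image, answer the solver claimed).
Step = Tuple[str, str]

def verify_steps(
    steps: List[Step],
    tool: Callable[[str], str],
) -> Tuple[List[bool], Optional[int]]:
    """Label each step by querying the tool with the question alone.

    Returns per-step labels and the index of the first incorrect step
    (None if every step checks out).
    """
    labels: List[bool] = []
    first_error: Optional[int] = None
    for i, (question, claimed) in enumerate(steps):
        evidence = tool(question)   # independent query: no claim, no context
        ok = evidence == claimed    # compare gathered evidence with the claim
        labels.append(ok)
        if not ok and first_error is None:
            first_error = i
    return labels, first_error

# Hypothetical "image tool": ground-truth facts about a bar chart.
facts: Dict[str, str] = {
    "number of bars": "5",
    "height of tallest bar": "12",
}

def lookup_tool(question: str) -> str:
    return facts.get(question, "unknown")

solution = [
    ("number of bars", "5"),
    ("height of tallest bar", "15"),  # hallucinated value
]
print(verify_steps(solution, lookup_tool))
```

A step-level label list plus the first-error index is the kind of output a process reward model is typically evaluated on; how TIM-PRM actually formulates questions and chooses tools is learned, not hard-coded as here.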

Preprint · 2020 · arXiv

DR 21 South Filament: a Parsec-sized Dense Gas Accretion Flow onto the DR 21 Massive Young Cluster

DR21 south filament (DR21SF) is a unique component of the giant network of filamentary molecular clouds in the northern region of the Cygnus X complex. Unlike the highly fragmented, actively star-forming environment in which it resides, DR21SF exhibits a coherent profile in the column density map with very few star formation signposts, even though the previously reported linear density of the filament is an order of magnitude higher than the thermal stability threshold. We derive the size (3.6 pc × 0.13 pc), temperature (10 to 15 K), and mass (1048 M☉) of DR21SF from Shanghai 65 m TianMa Radio Telescope (TMRT) observations of the NH₃ (1, 1) and (2, 2) inversion lines, in conjunction with the column density map from our previous work. Star-forming sites are identified along the filament where the gas temperature is elevated. We find clear gradients in radial velocity and intrinsic line width along the spine of the filament. The gradients can be well interpreted as an accretion flow feeding DR 21 at a mass transfer rate of 1.1 × 10⁻³ M☉ yr⁻¹. Based on the analysis of its kinetic temperature, intrinsic line width, and mass distribution, we conclude that DR21SF is in an overall trans-critical state, which indicates an early evolutionary stage.
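The abstract's numbers support a quick back-of-envelope check. The sketch below divides the quoted mass by the quoted length to get a line mass and compares it with the thermal critical line mass of an isothermal cylinder, 2 c_s² / G, at 10 K and 15 K. It assumes a mean molecular weight of 2.33 and rounded SI constants; it ignores the non-thermal support from the intrinsic line width that the paper's "trans-critical" conclusion accounts for, so it is not the authors' analysis. The final mass-over-rate figure is only a naive characteristic timescale for a constant accretion rate, not a result stated in the abstract.

```python
# Physical constants (SI, rounded)
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
K_B = 1.380649e-23     # Boltzmann constant, J/K
M_H = 1.6735e-27       # hydrogen-atom mass, kg
M_SUN = 1.989e30       # solar mass, kg
PC = 3.0857e16         # parsec, m
MU = 2.33              # assumed mean molecular weight per free particle

def thermal_critical_line_mass(T: float) -> float:
    """Critical line mass 2 c_s^2 / G of an isothermal cylinder, in M_sun/pc."""
    c_s2 = K_B * T / (MU * M_H)        # isothermal sound speed squared, m^2/s^2
    return 2.0 * c_s2 / G * PC / M_SUN

mass = 1048.0        # M_sun, from the abstract
length = 3.6         # pc, from the abstract
rate = 1.1e-3        # M_sun / yr, from the abstract

line_mass = mass / length
print(f"observed line mass ~ {line_mass:.0f} M_sun/pc")
for T in (10.0, 15.0):
    crit = thermal_critical_line_mass(T)
    print(f"T = {T:.0f} K: thermal critical ~ {crit:.1f} M_sun/pc, "
          f"ratio ~ {line_mass / crit:.1f}")
print(f"mass / rate ~ {mass / rate:.2e} yr")
```

Under these assumptions the line mass exceeds the purely thermal threshold by roughly an order of magnitude, consistent with the abstract's remark; including turbulent support raises the effective threshold, which is why the filament can still be near critical.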

Preprint · 2019 · arXiv

Very Long Natural Scenery Image Prediction by Outpainting

Compared with image inpainting, image outpainting has received less attention because of two challenges. The first is how to keep spatial and content consistency between the generated images and the original input. The second is how to maintain high quality in the generated results, especially for multi-step generation, in which the generated regions lie spatially far from the initial input. To solve these two problems, we devise two new modules, named Skip Horizontal Connection and Recurrent Content Transfer, and integrate them into our encoder-decoder architecture. With this design, our network can generate highly realistic outpainting predictions effectively and efficiently. In addition, our method can generate very long new images while keeping the same style and semantic content as the given input. To test the effectiveness of the proposed architecture, we collected a new scenery dataset with diverse, complicated natural scenes. The experimental results on this dataset demonstrate the efficacy of our proposed network. The code and dataset are available from https://github.com/z-x-yang/NS-Outpainting.
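The Skip Horizontal Connection and Recurrent Content Transfer modules are trained neural components and are not reproduced here (the linked repository holds the authors' code). As a minimal sketch of only the multi-step generation setting the abstract describes, assuming a predictor that maps the latest strip of the canvas to the next one, the loop below extends a toy image to the right over several steps. Each new strip is conditioned on previously generated content rather than the original input, which is why quality can drift as generation moves far from the input. `outpaint_right` and `mirror_predictor` are hypothetical; the mirror is a trivial stand-in for a trained generator.

```python
from typing import Callable, List

Image = List[List[int]]  # rows of pixel values (toy grayscale image)

def outpaint_right(
    image: Image,
    predict_strip: Callable[[Image], Image],
    steps: int,
) -> Image:
    """Extend an image to the right by `steps` strips of the input's width.

    Each step sees only the right-most strip of the current canvas, so later
    strips are generated from earlier generated content.
    """
    width = len(image[0])
    canvas = [row[:] for row in image]
    for _ in range(steps):
        context = [row[-width:] for row in canvas]   # latest strip only
        new_strip = predict_strip(context)           # generator call (placeholder)
        canvas = [old + new for old, new in zip(canvas, new_strip)]
    return canvas

def mirror_predictor(strip: Image) -> Image:
    """Hypothetical stand-in for a trained generator: mirror the strip."""
    return [row[::-1] for row in strip]

toy = [[1, 2, 3],
       [4, 5, 6]]
result = outpaint_right(toy, mirror_predictor, steps=2)
print(result)
```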