Researcher profile

Qing Jiang

Qing Jiang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified
Verification L1
Unclaimed author
4 works
0 followers
3 topics
4 close collaborators

Published work

4 published items

Preprint, 2026, arXiv

Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models

In this paper, we propose GTA-VLA (Guide, Think, Act), an interactive Vision-Language-Action (VLA) framework that enables spatially steerable embodied reasoning by allowing users to guide robot policies with explicit visual cues. Existing VLA models learn a direct "Sense-to-Act" mapping from multimodal observations to robot actions. While effective within the training distribution, such tightly coupled policies are brittle under out-of-domain (OOD) shifts and difficult to correct when failures occur. Although recent embodied Chain-of-Thought (CoT) approaches expose intermediate reasoning, they still lack a mechanism for incorporating human spatial guidance, limiting their ability to resolve visual ambiguities or recover from mistakes. To address this gap, our framework allows users to optionally guide the policy with spatial priors, such as affordance points, boxes, and traces, on which the subsequent reasoning process can directly condition. Based on these inputs, the model generates a unified spatial-visual Chain-of-Thought that integrates external guidance with internal task planning, aligning human visual intent with autonomous decision-making. For practical deployment, we further couple the reasoning module with a lightweight reactive action head for efficient action execution. Extensive experiments demonstrate the effectiveness of our approach. On the in-domain SimplerEnv WidowX benchmark, our framework achieves a state-of-the-art 81.2% success rate. Under OOD visual shifts and spatial ambiguities, a single visual interaction substantially improves task success over existing methods, highlighting the value of interactive reasoning for failure recovery in embodied control. Project details are available at https://signalispupupu.github.io/GTA-VLA_ProjPage/

Preprint, 2026, arXiv

SceneParser: Hierarchical Scene Parsing for Visual Semantics Understanding

General scene perception has progressed from object recognition toward open-vocabulary grounding, part localization, and affordance prediction. Yet these capabilities are often realized as isolated predictions that localize objects, parts, or interaction points without capturing the structured dependencies needed for interaction-oriented scene understanding. To address this gap, we introduce Hierarchical Scene Parsing, an interaction-oriented parsing task that represents physical scenes as explicit scene -> object -> part -> affordance hierarchies with cross-level bindings. We instantiate this task with SceneParser, a VLM-based parser trained for unified hierarchical generation with structural-completion pseudo labels and curriculum learning. To support training and evaluation, we construct SceneParser-Bench, a large-scale benchmark built with a scalable hierarchical data engine, containing 110K training images, a 5K validation split, 777K objects, 1.14M parts, 1.74M affordance annotations, and 1.74M valid object-part-affordance chain instances. We further introduce Level-1 to Level-3 conditional metrics and ParseRate to evaluate localization, cross-level binding, and hierarchical completeness. Experiments show that existing MLLMs and perception-stitching pipelines struggle with hierarchical parsing on SceneParser-Bench, while SceneParser achieves stronger structure-aware performance. In addition, ablations, evaluations on COCO and AGD20K, and a downstream planning probe demonstrate that SceneParser is compatible with conventional tasks and provides an actionable representation for visual understanding.

Preprint, 2022, arXiv

Determinants of local chemical environments and magnetic moments of high-entropy alloys

High-entropy alloys (HEAs) such as CrMnFeCoNi exhibit unconventional mechanical properties due to their compositional disorder. However, it remains a formidable challenge to estimate the local chemical-environment and magnetic effects of HEAs. Herein we identify the state-associated cohesive energy and band filling, derived from the tight-binding and Friedel models, as descriptors to quantify the site-to-site chemical bonding and magnetic moments of HEAs. We find that the s-state cohesive energy is indispensable in determining the bonding-strength trend of CrMnFeCoNi, which differs from the bonding characteristics of precious and refractory HEAs, while the s-band filling is effective in determining the magnetic moments. This unusual behavior stems from the unique chemical and magnetic nature of Cr atoms and is essentially due to the localized and transferred itinerant electrons. Our study establishes a fundamental physical picture of chemical bonding and magnetic interactions of HEAs and provides rational guidance for designing advanced structural alloys.

Preprint, 2020, arXiv

Correlating surface energy with adsorption energy by means of intrinsic characteristics of substrates

Surface energy, like adsorption energy, is fundamental in controlling surface properties and surface-driven processes such as heterogeneous catalysis. It is thus crucial to establish an effective scheme to determine surface energy and its relation with adsorption energy. Herein, we propose a model to quantify the effects of the intrinsic characteristics of materials on the material-dependent property and anisotropy of surface energy, based on the period number and group number of bulk atoms, and the valence-electron number, electronegativity and coordination of surface atoms. Our scheme holds for elemental crystals in both solid and liquid phases, body-centered-tetragonal intermetallics, fluorite-structure intermetallics, face-centered-cubic intermetallics, Mg-based surface alloys and semiconductor compounds. It further identifies a quantitative relation between surface energy and adsorption energy and rationalizes the material-dependent error of first-principles methods in calculating the two quantities. This model is predictive with easily accessible parameters and thus allows the rapid screening of materials for targeted properties.