Source author record

Qing Jiang

Qing Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, cond-mat.mtrl-sci, Networking and Internet Architecture, Robotics, Social and Information Networks

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators

preprint, 2026, arXiv

Guide, Think, Act: Interactive Embodied Reasoning in Vision-Language-Action Models

In this paper, we propose GTA-VLA (Guide, Think, Act), an interactive Vision-Language-Action (VLA) framework that enables spatially steerable embodied reasoning by allowing users to guide robot policies with explicit visual cues. Existing VLA models learn a direct "Sense-to-Act" mapping from multimodal observations to robot actions. While effective within the training distribution, such tightly coupled policies are brittle under out-of-domain (OOD) shifts and difficult to correct when failures occur. Although recent embodied Chain-of-Thought (CoT) approaches expose intermediate reasoning, they still lack a mechanism for incorporating human spatial guidance, limiting their ability to resolve visual ambiguities or recover from mistakes. To address this gap, our framework allows users to optionally guide the policy with spatial priors, such as affordance points, boxes, and traces, which the subsequent reasoning process can directly condition on. Based on these inputs, the model generates a unified spatial-visual Chain-of-Thought that integrates external guidance with internal task planning, aligning human visual intent with autonomous decision-making. For practical deployment, we further couple the reasoning module with a lightweight reactive action head for efficient action execution. Extensive experiments demonstrate the effectiveness of our approach. On the in-domain SimplerEnv WidowX benchmark, our framework achieves a state-of-the-art 81.2% success rate. Under OOD visual shifts and spatial ambiguities, a single visual interaction substantially improves task success over existing methods, highlighting the value of interactive reasoning for failure recovery in embodied control. Details of the project can be found here: https://signalispupupu.github.io/GTA-VLA_ProjPage/

preprint, 2026, arXiv

SceneParser: Hierarchical Scene Parsing for Visual Semantics Understanding

General scene perception has progressed from object recognition toward open-vocabulary grounding, part localization, and affordance prediction. Yet these capabilities are often realized as isolated predictions that localize objects, parts, or interaction points without capturing the structured dependencies needed for interaction-oriented scene understanding. To address this gap, we introduce Hierarchical Scene Parsing, an interaction-oriented parsing task that represents physical scenes as explicit scene -> object -> part -> affordance hierarchies with cross-level bindings. We instantiate this task with SceneParser, a VLM-based parser trained for unified hierarchical generation with structural-completion pseudo labels and curriculum learning. To support training and evaluation, we construct SceneParser-Bench, a large-scale benchmark built with a scalable hierarchical data engine, containing 110K training images, a 5K validation split, 777K objects, 1.14M parts, 1.74M affordance annotations, and 1.74M valid object-part-affordance chain instances. We further introduce Level-1 to Level-3 conditional metrics and ParseRate to evaluate localization, cross-level binding, and hierarchical completeness. Experiments show that existing MLLMs and perception-stitching pipelines struggle with hierarchical parsing on our SceneParser-Bench, while SceneParser achieves stronger structure-aware performance. Besides, ablations, evaluations on COCO and AGD20K, and a downstream planning probe demonstrate that our SceneParser is compatible with conventional tasks and provides an actionable representation for visual understanding.
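
As an illustration of the scene -> object -> part -> affordance hierarchy described in the abstract above, the following is a minimal Python sketch of such a nested structure. The class names, field names, the `chains` helper and the sample labels are all assumptions made for illustration; they are not SceneParser's actual schema or output format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Hypothetical schema: names and fields are illustrative only.
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Affordance:
    label: str
    point: Tuple[float, float]  # interaction point in image coordinates


@dataclass
class Part:
    label: str
    box: Box
    affordances: List[Affordance] = field(default_factory=list)


@dataclass
class SceneObject:
    label: str
    box: Box
    parts: List[Part] = field(default_factory=list)


@dataclass
class Scene:
    label: str
    objects: List[SceneObject] = field(default_factory=list)


def chains(scene: Scene) -> Iterator[Tuple[str, str, str]]:
    """Yield every complete object -> part -> affordance chain in the scene."""
    for obj in scene.objects:
        for part in obj.parts:
            for aff in part.affordances:
                yield (obj.label, part.label, aff.label)


scene = Scene("kitchen", [
    SceneObject("mug", (10.0, 10.0, 60.0, 80.0), [
        Part("handle", (50.0, 30.0, 60.0, 60.0), [Affordance("grasp", (55.0, 45.0))]),
        Part("rim", (10.0, 10.0, 50.0, 15.0)),  # no affordance, so no chain
    ]),
])
```

In this sketch a part without any affordance contributes no chain, which is one plausible reading of a "valid object-part-affordance chain instance"; the benchmark's exact validity criteria are not stated in the abstract.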

preprint, 2022, arXiv

Determinants of local chemical environments and magnetic moments of high-entropy alloys

High-entropy alloys (HEAs) such as CrMnFeCoNi exhibit unconventional mechanical properties due to their compositional disorder. However, it remains a formidable challenge to estimate the local chemical-environment and magnetic effects of HEAs. Herein we identify the state-associated cohesive energy and band filling, originating from the tight-binding and Friedel models, as descriptors to quantify the site-to-site chemical bonding and magnetic moments of HEAs. We find that the s-state cohesive energy is indispensable in determining the bonding-strength trend of CrMnFeCoNi, which differs from the bonding characteristics of precious and refractory HEAs, while the s-band filling is effective in determining the magnetic moments. This unusual behavior stems from the unique chemical and magnetic nature of Cr atoms and is essentially due to the localized and transferred itinerant electrons. Our study establishes a fundamental physical picture of chemical bonding and magnetic interactions of HEAs and provides rational guidance for designing advanced structural alloys.

preprint, 2020, arXiv

Correlating surface energy with adsorption energy by means of intrinsic characteristics of substrates

Like adsorption energy, surface energy is fundamental in controlling surface properties and surface-driven processes such as heterogeneous catalysis. It is thus crucial to establish an effective scheme to determine surface energy and its relation to adsorption energy. Herein, we propose a model to quantify the effects of the intrinsic characteristics of materials on the material-dependent property and anisotropy of surface energy, based on the period number and group number of bulk atoms, and the valence-electron number, electronegativity and coordination of surface atoms. Our scheme holds for elemental crystals in both solid and liquid phases, body-centered-tetragonal intermetallics, fluorite-structure intermetallics, face-centered-cubic intermetallics, Mg-based surface alloys and semiconductor compounds. It further identifies a quantitative relation between surface energy and adsorption energy and rationalizes the material-dependent error of first-principles methods in calculating the two quantities. This model is predictive with easily accessible parameters and thus allows the rapid screening of materials for targeted properties.

preprint, 2015, arXiv

A Novel Methodology of Router-To-AS Mapping inspired by Community Discovery

In the last decade, much work has been done on the Internet topology at the router or autonomous system (AS) level. Routers are the essential components of ASes, while ASes dominate the behavior of their routers, so identifying the affiliation between routers and ASes undoubtedly gives a deeper understanding of the topology. However, existing methods assign a router to an AS based only on the origin ASes of its IP addresses, which does not make full use of the available information. In this paper, we propose a methodology that assigns routers to their owner ASes based on community discovery techniques. First, we use origin-AS information along with router-pair similarities to construct a weighted router-level topology. Second, for the enormous topology data (more than 2M nodes and 19M edges) from the CAIDA ITDK project, we propose a fast hierarchical clustering algorithm, with time and space complexity both linear, to perform AS community discovery. Finally, we perform router-to-AS mapping based on these AS communities. Experiments show that, combined with the AS communities our methodology discovers, the best accuracy of router-to-AS mapping reaches 82.62%, which is substantially higher than prior works that stagnate at 65.44%.
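
For context, the baseline that the abstract contrasts against (assigning a router using only the origin ASes of its IP addresses) could look roughly like the Python sketch below. The majority-vote rule, the tie-break, the function name `assign_by_origin_as` and the sample data are assumptions for illustration; the abstract does not specify how prior methods combine the origin ASes, and this is not the paper's community-discovery method. The AS numbers used are from the range reserved for documentation.

```python
from collections import Counter
from typing import Dict, List, Optional


def assign_by_origin_as(
    router_interfaces: Dict[str, List[Optional[int]]],
) -> Dict[str, Optional[int]]:
    """Assign each router to the most common origin AS among its interface IPs.

    `router_interfaces` maps a router ID to the origin AS of each of its
    interface addresses (None when the origin AS is unknown).
    """
    mapping: Dict[str, Optional[int]] = {}
    for router, origin_ases in router_interfaces.items():
        votes = Counter(asn for asn in origin_ases if asn is not None)
        if not votes:
            mapping[router] = None  # no usable origin-AS information
            continue
        top = max(votes.values())
        # Assumed tie-break: pick the smallest AS number among the leaders.
        mapping[router] = min(asn for asn, n in votes.items() if n == top)
    return mapping


routers = {
    "r1": [64496, 64496, 64497],  # two interfaces in AS64496, one in AS64497
    "r2": [None, 64497],
    "r3": [None],
}
result = assign_by_origin_as(routers)
```

A rule like this looks at each router in isolation, which is the limitation the abstract points to: it ignores the router-pair similarities that the proposed weighted topology brings in.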

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint