Source author record

Hao Shi

Hao Shi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Robotics, cond-mat.mes-hall, cond-mat.str-el

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

preprint, 2026, arXiv

Electromagnetic responses of bilayer excitonic insulators: from exciton London equations to dipole and inverse dipole Hall effects

We develop a microscopic theory of the linear electromagnetic response of bilayer excitonic insulators relevant to electron-hole double-layer systems. Using a self-consistent Hartree-Fock description of the excitonic ground state and time-dependent Hartree-Fock for its dynamics, we compute the collective mode spectrum and the full first-order response to layer-symmetric (charge) and layer-antisymmetric (exciton) gauge fields. At zero magnetic field, we find that two gapped plasmon modes dominate the long-wavelength charge response, while the exciton channel is governed by a linearly dispersing phase (Goldstone) mode. From the Goldstone-dominated kernel we derive a London-like equation for the exciton condensate, demonstrating non-dissipative acceleration under a layer-antisymmetric electric field, which we identify as direct evidence of exciton superfluidity; in contrast, a normal exciton fluid shows a Drude-like, dissipative response. In a perpendicular magnetic field, the Goldstone mode develops a magnetic-roton minimum that signals an instability toward a finite-momentum stripe-ordered excitonic insulator. In addition, charge and exciton motions become coupled under the field, giving rise to dipole and inverse dipole Hall effects in which a charge (exciton) bias induces a transverse exciton (charge) current. As a manifestation of exciton superfluidity, these mixed Hall responses remain finite even in the DC limit. Our findings provide concrete targets for microwave and transport probes of bilayer exciton superfluidity.

preprint, 2026, arXiv

Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

The rapid advancement of autonomous systems, including self-driving vehicles and drones, has intensified the need to forge true Spatial Intelligence from multi-modal onboard sensor data. While foundation models excel in single-modal contexts, integrating their capabilities across diverse sensors like cameras and LiDAR to create a unified understanding remains a formidable challenge. This paper presents a comprehensive framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal. We dissect the interplay between foundational sensor characteristics and learning strategies, evaluating the role of platform-specific datasets in enabling these advancements. Our central contribution is the formulation of a unified taxonomy for pre-training paradigms: ranging from single-modality baselines to sophisticated unified frameworks that learn holistic representations for advanced tasks like 3D object detection and semantic occupancy prediction. Furthermore, we investigate the integration of textual inputs and occupancy representations to facilitate open-world perception and planning. Finally, we identify critical bottlenecks, such as computational efficiency and model scalability, and propose a roadmap toward general-purpose multi-modal foundation models capable of achieving robust Spatial Intelligence for real-world deployment.

preprint, 2026, arXiv

Seeing Together: Multi-Robot Cooperative Egocentric Spatial Reasoning with Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) have made substantial progress in egocentric video understanding, but their ability to reason cooperatively from multiple embodied viewpoints remains largely unexplored. We study this problem through multi-robot cooperative dynamic spatial reasoning, where a model must answer spatial, temporal, visibility, and coordination questions by integrating synchronized egocentric videos from a team of moving robots. To support this setting, we introduce CoopSR, the first benchmark for this task, together with EgoTeam, a multi-robot egocentric QA dataset. EgoTeam contains 114,227 QA pairs spanning 19 question types, four difficulty tiers, and three team sizes in Habitat and iGibson, along with a real-world test set of around 2,326 QAs collected using two quadruped robots. We further propose SP-CoR (Spectral and Physics-Informed Cooperative Reasoner), an MLLM framework for fine-grained cooperative spatial reasoning. SP-CoR combines dynamics-aware multi-robot frame sampling, spectral- and physics-guided view fusion, and physics-aligned prompt distillation, enabling the model to benefit from privileged robot-pose supervision during training while requiring only egocentric videos at test time. Across 22 MLLM baselines, SP-CoR consistently improves cooperative reasoning, outperforming the strongest fine-tuned baseline by +3.87% on Habitat and +7.12% on iGibson. It also shows stronger generalization to unseen team sizes and real-world robot tests. Code can be found at https://github.com/KPeng9510/seeing-together.git.

preprint, 2026, arXiv

SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation

Robotic manipulation requires precise spatial understanding to interact with objects in the real world. Point-based methods suffer from sparse sampling, leading to the loss of fine-grained semantics. Image-based methods typically feed RGB and depth into 2D backbones pre-trained on 3D auxiliary tasks, but their entangled semantics and geometry are sensitive to the depth noise inherent in real-world settings, which disrupts semantic understanding. Moreover, these methods focus on high-level geometry while overlooking low-level spatial cues essential for precise interaction. We propose SpatialActor, a disentangled framework for robust robotic manipulation that explicitly decouples semantics and geometry. The Semantic-guided Geometric Module adaptively fuses two complementary geometric representations from noisy depth and semantic-guided expert priors. In addition, a Spatial Transformer leverages low-level spatial cues for accurate 2D-3D mapping and enables interaction among spatial features. We evaluate SpatialActor on multiple simulation and real-world scenarios across 50+ tasks. It achieves state-of-the-art performance with 87.4% on RLBench and improves by 13.9% to 19.4% under varying noise conditions, showing strong robustness. Moreover, it significantly enhances few-shot generalization to new tasks and maintains robustness under various spatial perturbations. Project Page: https://shihao1895.github.io/SpatialActor
