Source author record

Jie Ma

Jie Ma appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher Unclaimed source record

Artificial Intelligence Computer Vision astro-ph.EP cond-mat.str-el cond-mat.supr-con eess.SP

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators

preprint 2026 arXiv

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across roles, tools, and environments. Multi-agent systems address this through structured collaboration among specialized agents, but tighter coordination also amplifies a less explored risk: errors can propagate across agents and interaction rounds, producing failures that are difficult to diagnose and rarely translate into structural self-improvement. Existing surveys cover individual agent capabilities, multi-agent collaboration, or agent self-evolution separately, leaving the causal dependencies among them unexamined. This survey provides a unified review organized around four causally linked stages, which we term the LIFE progression: Lay the capability foundation, Integrate agents through collaboration, Find faults through attribution, and Evolve through autonomous self-improvement. For each stage, we provide systematic taxonomies and formally characterize the dependencies between adjacent stages, revealing how each stage both depends on and constrains the next. Beyond synthesizing existing work, we identify open challenges at stage boundaries and propose a cross-stage research agenda for closed-loop multi-agent systems capable of continuously diagnosing failures, reorganizing structures, and refining agent behaviors, extending current coordination frameworks toward more self-organizing forms of collective intelligence. By bridging these previously fragmented research threads, this survey aims to offer both a systematic reference and a conceptual roadmap toward autonomous, self-improving multi-agent intelligence.

preprint 2026 arXiv

MoCha: End-to-End Video Character Replacement without Structural Guidance

Controllable video character replacement with a user-provided identity remains a challenging problem due to the lack of paired video data. Prior works have predominantly relied on a reconstruction-based paradigm that requires per-frame segmentation masks and explicit structural guidance (e.g., skeleton, depth). This reliance, however, severely limits their generalizability in complex scenarios involving occlusions, character-object interactions, unusual poses, or challenging illumination, often leading to visual artifacts and temporal inconsistencies. In this paper, we propose MoCha, a pioneering framework that bypasses these limitations by requiring only a single arbitrary frame mask. To effectively adapt the multi-modal input condition and enhance facial identity, we introduce a condition-aware RoPE and employ an RL-based post-training stage. Furthermore, to overcome the scarcity of qualified paired-training data, we propose a comprehensive data construction pipeline. Specifically, we design three specialized datasets: a high-fidelity rendered dataset built with Unreal Engine 5 (UE5), an expression-driven dataset synthesized by current portrait animation techniques, and an augmented dataset derived from existing video-mask pairs. Extensive experiments demonstrate that our method substantially outperforms existing state-of-the-art approaches. We will release the code to facilitate further research. Please refer to our project page for more details: orange-3dv-team.github.io/MoCha

preprint 2026 arXiv

Positioning-Aided Channel Estimation for Multi-LEO Satellite Cooperative Beamforming

We investigate a multi-low Earth orbit (LEO) satellite system that simultaneously provides positioning and communication services to terrestrial user terminals. To address the challenges of accurately acquiring channel state information in LEO satellite systems, we propose a novel two-timescale positioning-aided channel estimation framework, exploiting the distinct variation rates of position-related parameters and channel gains inherent in LEO satellite channels. Using the misspecified Cramér-Rao bound (MCRB) theory, we systematically analyze positioning performance under practical imperfections, such as inter-satellite clock bias and carrier frequency offset. Furthermore, we theoretically demonstrate how position information derived from downlink positioning can enhance uplink channel estimation accuracy, even in the presence of positioning errors, through an MCRB-based analysis. To address the limited link budgets and communication rates of single-satellite communication, we develop a multi-LEO cooperative beamforming strategy for downlink transmission that leverages cluster-wise satellite cooperation while maintaining reduced complexity. Theoretical analyses and numerical results confirm the effectiveness of the proposed framework in facilitating high-precision downlink positioning under practical imperfections, facilitating uplink channel estimation, and enabling efficient downlink communication.

preprint 2026 arXiv

SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning

In autonomous driving, Vision Language Models (VLMs) excel at high-level reasoning, whereas semantic occupancy provides fine-grained details. Despite significant progress in individual fields, there is still no method that can effectively integrate both paradigms. Conventional VLMs struggle with token explosion and limited spatiotemporal reasoning, while semantic occupancy provides a unified, explicit spatial representation but is too dense to integrate efficiently with VLMs. To address these challenges and bridge the gap between VLMs and occupancy, we propose SparseOccVLA, a novel vision-language-action model that unifies scene understanding, occupancy forecasting, and trajectory planning powered by sparse occupancy queries. Starting with a lightweight Sparse Occupancy Encoder, SparseOccVLA generates compact yet highly informative sparse occupancy queries that serve as the single bridge between vision and language. These queries are aligned into the language space and reasoned over by the LLM for unified scene understanding and future occupancy forecasting. Furthermore, we introduce an LLM-guided Anchor-Diffusion Planner featuring decoupled anchor scoring and denoising, as well as cross-model trajectory-condition fusion. SparseOccVLA achieves a 7% relative improvement in CIDEr over the state of the art on OmniDrive-nuScenes, a 0.5 increase in mIoU score on Occ3D-nuScenes, and sets a new state of the art in open-loop planning on the nuScenes benchmark, demonstrating its strong holistic capability.

preprint 2025 arXiv

Origins of spontaneous magnetic fields in Sr$_2$RuO$_4$

The nature of the broken time reversal symmetry (BTRS) state in Sr$_2$RuO$_4$ remains elusive, and its relation to superconductivity remains controversial. There are various universal predictions for the BTRS state when it is associated with a multicomponent superconducting order parameter. In particular, in the BTRS superconducting state, spontaneous fields appear around crystalline defects, impurities, superconducting domain walls and sample surfaces. However, this phenomenon has not yet been experimentally demonstrated for any BTRS superconductor. Here, we aimed to verify these predictions for Sr$_2$RuO$_4$ by performing muon spin relaxation ($μ$SR) measurements on Sr$_{2-y}$La$_{y}$RuO$_4$ single crystals at ambient pressure and stoichiometric Sr$_2$RuO$_4$ under hydrostatic pressure. The study allowed us to conclude that spontaneous fields in the BTRS superconducting state of Sr$_2$RuO$_4$ appear around non-magnetic inhomogeneities and, at the same time, decrease with the suppression of $T_{\rm c}$. The observed behaviour is consistent with the prediction for multicomponent BTRS superconductivity in Sr$_2$RuO$_4$. The results of the work are relevant to understanding BTRS superconductivity in general, as they demonstrate, for the first time, the relationship among the superconducting order parameter, the BTRS transition, and crystal-structure inhomogeneities.

preprint 2025 arXiv

The Flying Saucer edge-on disc's Near Infrared silhouette revealed by the JWST JEDIce program

Edge-on discs offer a unique opportunity to probe radial and vertical dust and gas distributions in the protoplanetary phase. This study aims to investigate the distribution of micron-sized dust particles in the Flying Saucer (BKLT J162813-243139) in Rho Ophiuchi, leveraging the unique observational conditions of a bright infrared background that enables the edge-on disc to be seen in both silhouette and scattered light at specific wavelengths. As part of the JWST Edge-on Disc Ice program ('JEDIce'), we use NIRSpec IFU observations of the Flying Saucer, serendipitously observed against a PAH-emitting background, to constrain the dust distribution and grain sizes through radiative transfer modelling. Observation of the Flying Saucer in silhouette at 3.29 microns reveals that the midplane radial extent of small dust grains is ~235 au, larger than the large-grain disc extent previously determined to be 190 au from millimetre data. The scattered light observed in emission probes micron-sized icy grains at large vertical distances above the midplane. The vertical extent of the disc silhouette is similar at visible, near-IR, and mid-IR wavelengths, corroborating the conclusion that dust settling is inefficient for grains as large as tens of microns, vertically and radially.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint