Researcher profile

Yiming Sun

Yiming Sun contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
7 works
0 followers
9 topics
4 close collaborators

Published work

7 published items

Preprint | 2026 | arXiv

CtrlFuse: Mask-Prompt Guided Controllable Infrared and Visible Image Fusion

Infrared and visible image fusion generates all-weather, perception-capable images by combining complementary modalities, enhancing environmental awareness for intelligent unmanned systems. Existing methods either focus on pixel-level fusion while overlooking adaptability to downstream tasks, or implicitly learn rigid semantics through cascaded detection/segmentation models, and so cannot interactively address diverse semantic target-perception needs. We propose CtrlFuse, a controllable image fusion framework that enables interactive, dynamic fusion guided by mask prompts. The model integrates a multi-modal feature extractor, a reference prompt encoder (RPE), and a prompt-semantic fusion module (PSFM). The RPE dynamically encodes task-specific semantic prompts by fine-tuning pre-trained segmentation models with input mask guidance, while the PSFM explicitly injects these semantics into the fusion features. Through synergistic optimization of parallel segmentation and fusion branches, our method achieves mutual enhancement between task performance and fusion quality. Experiments demonstrate state-of-the-art results in both fusion controllability and segmentation accuracy, with the adapted task branch even outperforming the original segmentation model.

Preprint | 2026 | arXiv

DIFF-MF: A Difference-Driven Channel-Spatial State Space Model for Multi-Modal Image Fusion

Multi-modal image fusion aims to integrate complementary information from multiple source images to produce high-quality fused images with enriched content. Although existing approaches based on state space models have achieved satisfactory performance with high computational efficiency, they tend either to over-prioritize infrared intensity at the cost of visible details or, conversely, to preserve visible structure while diminishing thermal target salience. To overcome these challenges, we propose DIFF-MF, a novel difference-driven channel-spatial state space model for multi-modal image fusion. Our approach leverages feature discrepancy maps between modalities to guide feature extraction, followed by a fusion process across both the channel and spatial dimensions. In the channel dimension, a channel-exchange module enhances channel-wise interaction through cross-attention dual state space modeling, enabling adaptive feature reweighting. In the spatial dimension, a spatial-exchange module employs cross-modal state space scanning to achieve comprehensive spatial fusion. By efficiently capturing global dependencies while maintaining linear computational complexity, DIFF-MF effectively integrates complementary multi-modal features. Experimental results on driving-scenario and low-altitude UAV datasets demonstrate that our method outperforms existing approaches in both visual quality and quantitative evaluation.
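The linear complexity mentioned above comes from state space scanning: each element of a sequence is visited once by a recurrence. The following is a minimal sketch of a generic scalar state space scan with zero-order-hold discretization, the basic building block used by S4/Mamba-style models. It is not DIFF-MF's channel-exchange or spatial-exchange module. The function names and the values of `a`, `b`, `c` and `delta` are illustrative assumptions.

```python
import math


def zoh_discretize(a: float, b: float, delta: float) -> tuple[float, float]:
    """Zero-order-hold discretization of the scalar SSM h'(t) = a*h(t) + b*x(t).

    Returns (a_bar, b_bar) with a_bar = exp(delta*a) and
    b_bar = (exp(delta*a) - 1) / a * b. Requires a != 0.
    """
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def ssm_scan(xs, a=-1.0, b=1.0, c=1.0, delta=0.1):
    """Run h_t = a_bar*h_{t-1} + b_bar*x_t, y_t = c*h_t over a 1D sequence.

    A single pass over the input, so the cost is O(len(xs)).
    """
    a_bar, b_bar = zoh_discretize(a, b, delta)
    h = 0.0
    ys = []
    for x in xs:
        h = a_bar * h + b_bar * x
        ys.append(c * h)
    return ys


# Hypothetical use: scan a cross-modal discrepancy sequence |ir - vis|.
ir = [0.9, 0.8, 0.1, 0.1]
vis = [0.2, 0.7, 0.1, 0.6]
smoothed_discrepancy = ssm_scan([abs(i - v) for i, v in zip(ir, vis)])
```

With `a < 0` the recurrence is stable. For a constant input it settles at `c * (-b / a)`. In learned models such as Mamba, `delta` (and in some variants `b` and `c`) depend on the input, which makes the scan "selective".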

Preprint | 2026 | arXiv

Orbit equivalence of Cantor minimal systems

In this paper we study the descriptive complexity of the topological orbit equivalence relation for some Borel classes of Cantor minimal systems. Specifically, we study the Borel class of all Cantor minimal systems with only finitely many ergodic measures, and show that the orbit equivalence for this class is Borel bireducible with the equivalence relation $=^+$. We prove the same for the subclass of regular $\{0, 1\}$-Toeplitz subshifts and for the subclass of uniquely ergodic minimal subshifts. We also study the orbit equivalence for the Borel class of minimal subshifts of finite topological rank. Denote by $R_n$ the orbit equivalence for minimal subshifts of topological rank $n\geq 2$. We prove that for any $n\geq 2$, $R_n$ is virtually countable, i.e., Borel reducible to a countable Borel equivalence relation. Moreover, $R_2$ is virtually amenable. On the other hand, $R_n$ is not smooth when $n\geq 2$, is not virtually hyperfinite when $n\geq 4$, and is not virtually treeable when $n\geq 5$. For any $n\geq 2$, our constructions yield uniquely ergodic minimal subshifts of topological rank exactly $n$.
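The abstract relies on several standard notions from invariant descriptive set theory. For readers outside the area, they are recalled below as they are commonly defined. Here $E$ and $F$ are equivalence relations on standard Borel spaces $X$ and $Y$. Taking $2^{\mathbb{N}}$ as the underlying space for $=^+$ and for smoothness is a conventional choice and may differ from the paper's exact setup.

$$
E \le_B F \iff \text{there is a Borel } f : X \to Y \text{ with } x \mathrel{E} x' \Leftrightarrow f(x) \mathrel{F} f(x') \text{ for all } x, x' \in X
$$

$$
E \sim_B F \ (\text{Borel bireducible}) \iff E \le_B F \text{ and } F \le_B E
$$

$$
(x_n)_{n \in \mathbb{N}} =^+ (y_n)_{n \in \mathbb{N}} \iff \{x_n : n \in \mathbb{N}\} = \{y_n : n \in \mathbb{N}\}, \qquad x_n, y_n \in 2^{\mathbb{N}}
$$

$$
E \text{ is smooth} \iff E \le_B {=_{2^{\mathbb{N}}}}
$$

Read this way, "$R_n$ is not smooth" means that minimal subshifts of topological rank $n$ cannot be classified up to orbit equivalence by Borel-assigned real numbers as complete invariants.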

Preprint | 2025 | arXiv

GCRank: A Generative Contextual Comprehension Paradigm for Takeout Ranking Model

The ranking stage serves as the central optimization and allocation hub in advertising systems, governing economic value distribution through eCPM and orchestrating the user-centric blending of organic and advertising content. Prevailing ranking models often rely on fragmented modules and hand-crafted features, limiting their ability to interpret complex user intent. This challenge is further amplified in location-based services such as food delivery, where user decisions are shaped by dynamic spatial, temporal, and individual contexts. To address these limitations, we propose a novel generative framework that reframes ranking as a context comprehension task, modeling heterogeneous signals in a unified architecture. Our architecture consists of two core components: the Generative Contextual Encoder (GCE) and the Generative Contextual Fusion (GCF). The GCE comprises three specialized modules: a Personalized Context Enhancer (PCE) for user-specific modeling, a Collective Context Enhancer (CCE) for group-level patterns, and a Dynamic Context Enhancer (DCE) for real-time situational adaptation. The GCF module then seamlessly integrates these contextual representations through low-rank adaptation. Extensive experiments confirm that our method achieves significant gains in critical business metrics, including click-through rate and platform revenue. We have successfully deployed our method on a large-scale food delivery advertising platform, demonstrating its substantial practical impact. This work pioneers a new perspective on generative recommendation and highlights its practical potential in industrial advertising systems.
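As background on the eCPM (effective cost per mille, i.e. expected revenue per thousand impressions) mentioned above: a common textbook form for cost-per-click ads is eCPM = bid × pCTR × 1000. In this form, the ranking model's main contribution is the predicted click-through rate. The following is a minimal sketch assuming CPC billing and a plain descending sort. `Candidate`, `ecpm` and `rank_by_ecpm` are hypothetical names. GCRank's own allocation and organic/ad blending logic is not specified in the abstract and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ad_id: str
    bid_per_click: float  # advertiser's cost-per-click bid (hypothetical units)
    pctr: float           # predicted click-through rate from the ranking model


def ecpm(c: Candidate) -> float:
    """Expected revenue per 1,000 impressions for a CPC-billed ad."""
    return c.bid_per_click * c.pctr * 1000.0


def rank_by_ecpm(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by descending eCPM; ties keep their input order."""
    return sorted(candidates, key=ecpm, reverse=True)


ads = [Candidate("a", 2.0, 0.01), Candidate("b", 0.5, 0.05), Candidate("c", 1.0, 0.015)]
print([c.ad_id for c in rank_by_ecpm(ads)])  # ['b', 'a', 'c']
```

Ad "b" wins despite the lowest bid because its higher predicted CTR yields the largest expected revenue (25 versus 20 and 15). This is why better CTR prediction translates directly into platform revenue.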

Preprint | 2025 | arXiv

Observation of robust one-dimensional edge channels in a three-dimensional quantum spin Hall insulator

Topologically protected edge channels show promise for quantum devices. They have been found experimentally in two-dimensional (2D) quantum spin Hall insulators (QSHIs), weak topological insulators and higher-order topological insulators (HOTIs), but the number of materials realizing these topologies is still quite limited. Here, we provide evidence for topological edge states within a novel topology named three-dimensional (3D) QSHIs. Its topology originates solely from a nonzero $S_z$ spin Chern number for each $k_z$ plane of the crystal and is realized in bulk $\alpha$-Bi$_4$I$_4$ with trivial symmetry indicators, as we show by density functional theory calculations. We experimentally observe the related edge states at each type of monolayer and bilayer step of this material by scanning tunneling microscopy. Consistently, the edge states are neither interrupted nor backscattered by defects at the step edges, corroborating their helical character as expected from the nontrivial topology. Furthermore, two individual edge channels are directly observed at bilayer steps without any visible interaction gap opening, demonstrating the robustness of these edge modes against vertical stacking. Our results establish $\alpha$-Bi$_4$I$_4$ as the first material realization of a 3D QSHI whose definition goes beyond the scope of topological symmetry indicators, and provide a pathway for realizing nearly quantized spin Hall conductivity per unit cell in a bulk crystal.
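For orientation, the invariant named above can be written in one common convention (normalizations differ across the literature). This assumes the simplest case, in which the occupied bands of each fixed-$k_z$ slice split into $S_z = \uparrow, \downarrow$ sectors with Berry curvatures $\Omega_\uparrow$ and $\Omega_\downarrow$:

$$
C_\sigma(k_z) = \frac{1}{2\pi} \int_{\mathrm{BZ}_{2D}} \Omega_\sigma(k_x, k_y; k_z)\, dk_x\, dk_y, \qquad \sigma \in \{\uparrow, \downarrow\}
$$

$$
C_s(k_z) = \frac{C_\uparrow(k_z) - C_\downarrow(k_z)}{2}
$$

With time-reversal symmetry, $C_\downarrow = -C_\uparrow$, so $C_s = C_\uparrow$. In a single 2D QSHI, $C_s \bmod 2$ reproduces the $\mathbb{Z}_2$ index. The 3D QSHI described in the abstract corresponds to $C_s(k_z) \neq 0$ on every $k_z$ plane, a condition that symmetry indicators need not detect.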

Preprint | 2022 | arXiv

Online Active Regression

Active regression considers a linear regression problem in which the learner receives a large number of data points but can observe only a small number of labels. Since online algorithms can handle incrementally arriving training data at low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and immediately decides whether to collect the corresponding label. The goal is to efficiently maintain the regression over the received data points with a small budget of label queries. We propose novel algorithms for this problem under the $\ell_p$ loss with $p\in[1,2]$. To achieve a $(1+\varepsilon)$-approximate solution, our proposed algorithms require only $\tilde{\mathcal{O}}(\varepsilon^{-1} d \log(n\kappa))$ label queries, where $d$ is the data dimension, $n$ is the number of data points and $\kappa$ is a quantity of the data points called the condition number. The numerical results verify our theoretical results and show that our methods perform comparably to offline active regression algorithms.
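A standard way to make the query-or-skip decision online, for the $\ell_2$ case ($p = 2$) only, is online ridge leverage-score sampling. Each arriving point is queried with probability proportional to its leverage relative to the points seen so far, and every queried point is reweighted by the inverse of that probability. The following is a sketch of that generic technique, not the paper's algorithm. The oversampling constant `c`, the ridge term `lam`, and the names `OnlineActiveRegressor` and `_sherman_morrison` are illustrative assumptions, and the code as written carries no $(1+\varepsilon)$ guarantee.

```python
import random


def _matvec(m, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def _sherman_morrison(m_inv, x, w):
    """Return (M + w x x^T)^{-1} given M^{-1}, for symmetric M."""
    mx = _matvec(m_inv, x)
    denom = 1.0 + w * sum(xi * mxi for xi, mxi in zip(x, mx))
    d = len(x)
    return [[m_inv[i][j] - w * mx[i] * mx[j] / denom for j in range(d)] for i in range(d)]


class OnlineActiveRegressor:
    """Online leverage-score sampling for least squares (hypothetical sketch)."""

    def __init__(self, d, c=2.0, lam=1e-3, seed=0):
        self.c = c
        self.rng = random.Random(seed)
        # Inverse of lam*I + sum of x x^T over ALL points seen (needs no labels).
        self.all_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
        # Inverse of lam*I + sum of w x x^T over QUERIED points, plus sum of w*y*x.
        self.fit_inv = [row[:] for row in self.all_inv]
        self.xty = [0.0] * d
        self.queries = 0

    def observe(self, x, label_oracle):
        """Process one point; call label_oracle() only if we decide to query."""
        s = sum(xi * mi for xi, mi in zip(x, _matvec(self.all_inv, x)))
        tau = s / (1.0 + s)  # online ridge leverage score of x, in [0, 1)
        p = min(1.0, self.c * tau)
        if self.rng.random() < p:
            y = label_oracle()
            w = 1.0 / p  # inverse-probability weight keeps the sampled loss unbiased
            self.fit_inv = _sherman_morrison(self.fit_inv, x, w)
            self.xty = [b + w * y * xi for b, xi in zip(self.xty, x)]
            self.queries += 1
        self.all_inv = _sherman_morrison(self.all_inv, x, 1.0)

    def coef(self):
        """Current weighted ridge solution over the queried points."""
        return _matvec(self.fit_inv, self.xty)
```

Because the online leverage scores sum to at most $\log\det(I + A^\top A/\lambda)$, the expected number of queries grows roughly like $d$ times a logarithm of $n$ and the data scale, rather than linearly in $n$. This mirrors the shape of the query bound stated above.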

Preprint | 2020 | arXiv

Multi-Drone based Single Object Tracking with Agent Sharing Network

Drones equipped with cameras can dynamically track a target in the air from a broader view than static cameras or moving sensors on the ground. However, accurately tracking the target with a single drone remains challenging due to factors such as appearance variations and severe occlusions. In this paper, we collect a new Multi-Drone single Object Tracking (MDOT) dataset that consists of 92 groups of video clips with 113,918 high-resolution frames taken by two drones and 63 groups of video clips with 145,875 high-resolution frames taken by three drones. In addition, two evaluation metrics are specially designed for multi-drone single object tracking, namely the automatic fusion score (AFS) and the ideal fusion score (IFS). Moreover, we propose an agent sharing network (ASNet) that performs self-supervised template sharing and view-aware fusion of the target from multiple drones, which improves tracking accuracy significantly compared with single-drone tracking. Extensive experiments on MDOT show that our ASNet significantly outperforms recent state-of-the-art trackers.