Source author record

Haonan Zhao

Haonan Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Artificial Intelligence, Computer Vision, Machine Learning, cond-mat.mtrl-sci, cond-mat.str-el

Catalog footprint

What is connected

4 works

5 topics

4 close collaborators


preprint, 2026, arXiv

Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking

Existing RGB-Event visual object tracking approaches primarily rely on conventional feature-level fusion, failing to fully exploit the unique advantages of event cameras. In particular, the high dynamic range and motion-sensitive nature of event cameras are often overlooked, while low-information regions are processed uniformly, leading to unnecessary computational overhead for the backbone network. To address these issues, we propose a novel tracking framework that performs early fusion in the frequency domain, enabling effective aggregation of high-frequency information from the event modality. Specifically, the RGB and event modalities are transformed from the spatial domain to the frequency domain via the Fast Fourier Transform, with their amplitude and phase components decoupled. High-frequency event information is selectively fused into the RGB modality through amplitude and phase attention, enhancing feature representation while substantially reducing backbone computation. In addition, a motion-guided spatial sparsification module leverages the motion-sensitive nature of event cameras to capture the relationship between target motion cues and the spatial probability distribution, filtering out low-information regions and enhancing target-relevant features. Finally, a sparse set of target-relevant features is fed into the backbone network for learning, and the tracking head predicts the final target position. Extensive experiments on three widely used RGB-Event tracking benchmark datasets, FE108, FELT, and COESOT, demonstrate the high performance and efficiency of our method. The source code of this paper will be released at https://github.com/Event-AHU/OpenEvTracking
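The amplitude and phase decoupling described in this abstract can be illustrated with a small toy sketch. It is not the authors' implementation: it works on a single 1D row of samples instead of 2D images, uses a naive DFT in place of a real FFT, and replaces the learned amplitude and phase attention with a fixed scalar `weight`. The names `fuse_high_frequency`, `cutoff`, and `weight` are hypothetical and chosen only for illustration.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform; a stand-in for a real FFT."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns complex samples."""
    n = len(spectrum)
    return [
        sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)) / n
        for t in range(n)
    ]


def decouple(spectrum):
    """Split a complex spectrum into separate amplitude and phase lists."""
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def fuse_high_frequency(rgb_row, event_row, cutoff, weight=0.5):
    """Blend event amplitude and phase into the RGB spectrum at or above `cutoff`.

    `weight` is a fixed scalar standing in for learned attention (assumption).
    """
    n = len(rgb_row)
    rgb_amp, rgb_pha = decouple(dft(rgb_row))
    evt_amp, evt_pha = decouple(dft(event_row))
    fused = []
    for k in range(n):
        amp, pha = rgb_amp[k], rgb_pha[k]
        # Bins k and n - k are treated alike so the inverse transform stays real.
        if min(k, n - k) >= cutoff:
            amp = (1 - weight) * rgb_amp[k] + weight * evt_amp[k]
            # Blend phases as unit phasors to avoid wrap-around at +/- pi.
            mix = ((1 - weight) * cmath.rect(1.0, rgb_pha[k])
                   + weight * cmath.rect(1.0, evt_pha[k]))
            pha = cmath.phase(mix)
        fused.append(cmath.rect(amp, pha))
    return [sample.real for sample in idft(fused)]


# Toy inputs: a smooth "RGB" row and a spiky "event" row (made-up values).
rgb_row = [0.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 0.0]
event_row = [0.0, 0.0, 1.0, -1.0, 1.0, -1.0, 0.0, 0.0]
fused_row = fuse_high_frequency(rgb_row, event_row, cutoff=3)
```

With `weight=0.0` the function returns the RGB row unchanged, and with `cutoff=0` and `weight=1.0` it returns the event row, which makes the two extremes easy to check by hand.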

preprint, 2026, arXiv

DriveCtrl: Conditioned Sim-to-Real Driving Video Generation

Large-scale labelled driving video data is essential for training autonomous driving systems. Although simulation offers scalable and fully annotated data, the domain gap between synthetic and real-world driving videos significantly limits its utility for downstream deployment. Existing video generation methods are not well-suited for this task, as they fail to simultaneously preserve scene structure, object dynamics, temporal consistency, and visual realism, all of which are critical for maintaining annotation validity in generated data. In this paper, we present DriveCtrl, a depth-conditioned controllable sim-to-real video generation framework for realistic driving video synthesis. Built upon a pretrained video foundation model, DriveCtrl introduces a structure-aware adapter that enables depth-guided generation while preserving the scene layout and motion patterns of the source simulation, producing temporally coherent driving videos that remain aligned with the original simulated sequences. We further introduce a scalable data generation pipeline that transforms simulator videos into realistic driving footage matching the visual style of a target real-world dataset. The pipeline supports three conditioning signals: structural depth, reference-dataset style, and text prompts, while preserving frame-level annotations for downstream perception tasks. To better assess this task, we propose a driving-domain-specific knowledge-informed evaluation metric called Driving Video Realism Score (DVRS) that assesses the realism of generated videos. Experiments demonstrate that DriveCtrl consistently outperforms the base model and competing alternatives in realism, temporal quality, and perception task performance, substantially narrowing the sim-to-real gap for driving video generation.

preprint, 2022, arXiv

A three-stage magnetic phase transition revealed in ultrahigh-quality van der Waals magnet CrSBr

Van der Waals (vdW) magnets are receiving ever-growing attention due to their significance in both fundamental research on low-dimensional magnetism and potential applications in spintronic devices. High crystalline quality of vdW magnets is key for maintaining intrinsic magnetic and electronic properties, especially when exfoliated down to the 2D limit. Here, ultrahigh-quality air-stable vdW CrSBr crystals are synthesized using the direct vapor-solid synthesis method. The high single crystallinity and spatial homogeneity have been thoroughly evidenced at length scales from sub-mm to atomic resolution by X-ray diffraction, second harmonic generation, and scanning transmission electron microscopy. More importantly, specific heat measurements of these ultrahigh-quality CrSBr crystals show three thermodynamic anomalies at 185 K, 156 K, and 132 K, revealing a stage-by-stage development of the magnetic order upon cooling, which is also corroborated by the magnetization and transport results. Our ultrahigh-quality CrSBr can further be exfoliated down to monolayers and bilayers easily, paving the way to integrate them into heterostructures for spintronic and magneto-optoelectronic applications.

preprint, 2022, arXiv

Lifelong Personal Context Recognition

We focus on the development of AIs which live in lifelong symbiosis with a human. The key prerequisite for this task is that the AI understands, at any moment in time, the personal situational context that the human is in. We outline the key challenges that this task brings forth, namely (i) handling the human-like and ego-centric nature of the user's context, necessary for understanding and providing useful suggestions, (ii) performing lifelong context recognition using machine learning in a way that is robust to change, and (iii) maintaining alignment between the AI's and human's representations of the world through continual bidirectional interaction. In this short paper, we summarize our recent attempts at tackling these challenges, discuss the lessons learned, and highlight directions of future research. The main take-away message is that pursuing this project requires research which lies at the intersection of knowledge representation and machine learning. Neither technology can achieve this goal without the other.
