Source author record

Siyuan Song

Siyuan Song appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, eess.IV, eess.SP, physics.optics

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators


Preprint, 2026, arXiv

DeTrack: A Benchmark and Altitude-Aware Dual World Model for Drone-embodied Tracking

Aerial object tracking has broad applications in public safety, emergency rescue, wildlife monitoring, and related fields. However, existing aerial tracking benchmarks are mainly based on passive 2D video sequences captured from fixed camera locations or predefined flight paths, where drones are treated as passive cameras rather than embodied agents that actively perceive, interact, and control their motion in dynamic 3D scenes. In this paper, we define a new drone-embodied tracking task, termed DeTrack, which requires a drone to track a target in interactive 3D environments using online egocentric observations and active flight control in a closed loop. We build a large-scale benchmark containing 11,368 target trajectories across diverse scenes, rendering conditions, semantic regions, and moving distractors, together with evaluation metrics for target visibility, tracking accuracy, and trajectory success. We further propose AaDWorlds, an altitude-aware dual world model framework for drone-embodied tracking. AaDWorlds consists of an altitude-aware perception module and dual world models that imagine future states under both high- and low-altitude regimes. By combining pseudo altitude-aware observations and imagined future states, AaDWorlds alleviates the intrinsic altitude-mediated contradiction between target visibility and flight safety. Experiments on the DeTrack benchmark demonstrate that AaDWorlds improves closed-loop tracking performance across all evaluation metrics.

Preprint, 2022, arXiv

Optimizing Ghost Imaging via Analysis and Design of Speckle Patterns

We study how the speckle size of the light source influences ghost imaging, and propose a new type of speckle pattern to improve image quality. The results show that image quality first increases and then decreases as the speckle size grows, so there is an optimal speckle size for a specific object. Moreover, by exploiting the random distribution of speckle positions, we design a new type of displacement speckle pattern whose imaging quality is better than that of random speckle patterns. These results are significant for finding the speckle patterns best suited to detecting targets, which further promotes the practical application of ghost imaging.
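For readers unfamiliar with ghost imaging, the following is a minimal sketch of the standard correlation reconstruction, in which an object is recovered from single-pixel "bucket" measurements and the known illumination patterns. It is not the authors' method: the function name `ghost_image`, the 1D object, the uniformly random patterns with one-pixel speckles, and the pattern count are all assumptions chosen for illustration.

```python
import random


def ghost_image(obj, n_patterns=4000, seed=0):
    """Estimate a 1D transmission object as the covariance between
    bucket signals and per-pixel pattern intensities.

    Hypothetical helper for illustration; patterns are independent
    uniform random values per pixel (speckle size of one pixel).
    """
    rng = random.Random(seed)
    n = len(obj)
    sum_i = [0.0] * n    # running sum of pattern intensity per pixel
    sum_bi = [0.0] * n   # running sum of bucket * intensity per pixel
    sum_b = 0.0          # running sum of bucket signals

    for _ in range(n_patterns):
        pattern = [rng.random() for _ in range(n)]
        # The bucket detector records only the total transmitted light.
        bucket = sum(p * t for p, t in zip(pattern, obj))
        sum_b += bucket
        for k in range(n):
            sum_i[k] += pattern[k]
            sum_bi[k] += bucket * pattern[k]

    mean_b = sum_b / n_patterns
    # <B * I_k> - <B> * <I_k> for every pixel k
    return [sum_bi[k] / n_patterns - mean_b * sum_i[k] / n_patterns
            for k in range(n)]


obj = [0, 0, 1, 1, 0, 0, 1, 0]   # 1 = transmitting pixel, 0 = opaque
image = ghost_image(obj)
```

In this sketch, pixels where the object transmits light show a clearly positive correlation while opaque pixels stay near zero. Studying the effect of speckle size, as the abstract describes, would require patterns whose neighboring pixels are correlated, which this example does not model.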

Preprint, 2020, arXiv

A Software-Based Approach for Acoustical Modeling of Construction Job Sites with Multiple Operational Machines

Several studies have been conducted to automatically recognize activities of construction equipment using their generated sound patterns. Most of these studies focus on single-machine scenarios under controlled environments. However, real construction job sites are more complex and often consist of several types of equipment with different orientations, directions, and locations working simultaneously. The current state of research for recognizing activities of multiple machines on a job site is hardware-oriented, relying on microphone arrays (i.e., several single microphones installed on a board in a specific geometric layout) and beamforming principles to classify sound directions for each machine. While effective, this common hardware approach has limitations, and using microphone arrays is not always a feasible option at ordinary job sites. In this paper, the authors propose a software-oriented approach using Deep Neural Networks (DNNs) and Time-Frequency Masks (TFMs) to address this issue. The proposed method requires only single microphones, as the sound sources can be differentiated by training a DNN. The presented approach has been tested and validated under simulated job site conditions where two machines operated simultaneously. Results show that the average accuracy for soft TFM is 38% higher than for binary TFM.
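To illustrate the difference between binary and soft time-frequency masks, here is a minimal sketch that computes both from known source magnitudes. It is an assumption-laden illustration, not the paper's pipeline: in the described approach a DNN estimates the mask from the mixture, whereas this sketch builds oracle masks directly. The function names, the power-ratio form of the soft mask, and the toy spectrogram values are all hypothetical.

```python
def binary_mask(target, interferer):
    """Hard mask: 1.0 where the target magnitude strictly exceeds the
    interferer in a time-frequency bin, else 0.0."""
    return [[1.0 if t > i else 0.0 for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(target, interferer)]


def soft_mask(target, interferer, eps=1e-12):
    """Soft mask: fraction of each bin's power attributed to the target
    (a power-ratio mask; eps guards against division by zero)."""
    return [[(t * t) / (t * t + i * i + eps) for t, i in zip(t_row, i_row)]
            for t_row, i_row in zip(target, interferer)]


def apply_mask(mask, mixture):
    """Scale every mixture bin by its mask value."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture)]


# Toy magnitude spectrograms (rows = time frames, columns = frequency bins).
machine_a = [[3.0, 1.0], [0.0, 2.0]]
machine_b = [[1.0, 3.0], [2.0, 2.0]]
# Assumed additive mixture of magnitudes, for illustration only.
mixture = [[a + b for a, b in zip(a_row, b_row)]
           for a_row, b_row in zip(machine_a, machine_b)]

hard = binary_mask(machine_a, machine_b)
soft = soft_mask(machine_a, machine_b)
estimate_hard = apply_mask(hard, mixture)
estimate_soft = apply_mask(soft, mixture)
```

The binary mask assigns each bin entirely to one machine, while the soft mask splits a bin's energy between machines in proportion to their power, which preserves information in bins where both machines are active.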