Source author record

Haoyang Zhang

Haoyang Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision Artificial Intelligence Computation and Language Robotics astro-ph.HE eess.SY Human-Computer Interaction Multimedia physics.app-ph physics.optics Systems and Control

Catalog footprint

What is connected

14 works

11 topics

4 close collaborators

preprint, 2026, arXiv

Analysis of Full Order Observer Based Control for Spacecraft Orbit Maneuver Trajectory Under Solar Radiation Pressure

This study investigates the application of modern control theory to improve the precision of spacecraft orbit maneuvers in low Earth orbit under the influence of solar radiation pressure. A full order observer based feedback control framework is developed to estimate system states and compensate for external disturbances during the trajectory correction phase following main engine cut off. The maneuver trajectory is generated using Lambert guidance, while the observer based controller ensures accurate tracking of the target orbit despite SRP perturbations. The effectiveness of the proposed design is assessed through stability, observability, and controllability analyses. Stability is validated by step-response simulations and eigenvalue distributions of the system dynamics. Observability is demonstrated through state matrix rank analysis, confirming complete state estimation. Controllability is verified using state feedback rank conditions and corresponding control performance plots. Comparative simulations highlight that, in contrast to uncontrolled or conventional control cases, the observer based controller achieves improved trajectory accuracy and robust disturbance rejection with moderate control effort. These findings indicate that observer-based feedback control offers a reliable and scalable solution for precision orbital maneuvering in LEO missions subject to environmental disturbances.
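The rank tests and eigenvalue checks named in this abstract can be sketched on a toy system. The example below is a minimal illustration under stated assumptions: the double-integrator plant, the gains `K` and `L`, and the helper names are hypothetical and are not taken from the paper, which presumably uses an orbital dynamics model with SRP disturbances.

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^(n-1) B] column-wise."""
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])

def observability_matrix(A, C):
    """Stack [C; CA; ...; C A^(n-1)] row-wise."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

# Hypothetical toy plant (not the paper's model): position and velocity
# driven by one thrust input, with only position measured.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

rank_c = np.linalg.matrix_rank(controllability_matrix(A, B))
rank_o = np.linalg.matrix_rank(observability_matrix(A, C))

# Hand-picked illustrative gains: K shapes the state-feedback dynamics
# A - BK, and L shapes the full-order observer error dynamics A - LC.
K = np.array([[2.0, 3.0]])
L = np.array([[7.0], [12.0]])

controller_poles = np.linalg.eigvals(A - B @ K)
observer_poles = np.linalg.eigvals(A - L @ C)

print("controllable:", rank_c == A.shape[0])
print("observable:", rank_o == A.shape[0])
print("controller poles:", controller_poles)
print("observer poles:", observer_poles)
```

Full rank in both matrices means every state can be driven and reconstructed, and poles in the left half-plane indicate that both the tracking error and the estimation error decay. The observer poles here are placed further left than the controller poles, a common design habit rather than something the abstract specifies.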

preprint, 2026, arXiv

Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation

Omni-modal language models are intended to jointly understand audio, visual inputs, and language, but benchmark gains can be inflated when visual evidence alone is enough to answer a query. We study whether current omni-modal benchmarks separate visual shortcuts from genuine audio-visual-language evidence integration, and how post-training behaves under a visually debiased evaluation setting. We audit nine omni-modal benchmarks with visual-only probing, remove visually solvable queries, and retain full subsets when filtering is undefined or would make comparisons unstable. This yields OmniClean, a cleaned evaluation view with 8,551 retained queries from 16,968 audited queries. On OmniClean, we evaluate OmniBoost, a three-stage post-training recipe based on Qwen2.5-Omni-3B: mixed bi-modal SFT, mixed-modality RLVR, and SFT on self-distilled data. Balanced bi-modal SFT gives limited and uneven gains, RLVR provides the first broad improvement, and self-distillation reshapes the benchmark profile. After SFT on self-distilled data, the 3B model reaches performance comparable to, and in aggregate slightly above, Qwen3-Omni-30B-A3B-Instruct without using a stronger omni-modal teacher. These results show that omni-modal progress is easier to interpret when evaluation controls visual leakage, and that small omni-modal models can benefit from staged post-training with self-distilled omni-query supervision. Project page: https://cheliu-computation.github.io/omni/

preprint, 2026, arXiv

DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection

Speech is a scalable and non-invasive biomarker for early mental health screening. However, widely used depression datasets like DAIC-WOZ exhibit strong coupling between linguistic sentiment and diagnostic labels, encouraging models to learn semantic shortcuts. As a result, model robustness may be compromised in real-world scenarios, such as Camouflaged Depression, where individuals maintain socially positive or neutral language despite underlying depressive states. To mitigate this semantic bias, we propose DepFlow, a three-stage depression-conditioned text-to-speech framework. First, a Depression Acoustic Encoder learns speaker- and content-invariant depression embeddings through adversarial training, achieving effective disentanglement while preserving depression discriminability (ROC-AUC: 0.693). Second, a flow-matching TTS model with FiLM modulation injects these embeddings into synthesis, enabling control over depressive severity while preserving content and speaker identity. Third, a prototype-based severity mapping mechanism provides smooth and interpretable manipulation across the depression continuum. Using DepFlow, we construct a Camouflage Depression-oriented Augmentation (CDoA) dataset that pairs depressed acoustic patterns with positive/neutral content from a sentiment-stratified text bank, creating acoustic-semantic mismatches underrepresented in natural data. Evaluated across three depression detection architectures, CDoA improves macro-F1 by 9%, 12%, and 5%, respectively, consistently outperforming conventional augmentation strategies in depression detection. Beyond enhancing robustness, DepFlow provides a controllable synthesis platform for conversational systems and simulation-based evaluation, where real clinical data remains limited by ethical and coverage constraints.
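FiLM modulation, mentioned in this abstract as the way embeddings are injected into synthesis, applies a per-channel scale and shift computed from a conditioning vector. The sketch below shows the basic form only. The array shapes, the linear projections `W_gamma` and `W_beta`, and the function name are assumptions for illustration, and DepFlow's actual layer may differ (for example by using `1 + gamma` as the scale).

```python
import numpy as np

def film(features, cond, W_gamma, W_beta):
    """Basic FiLM: per-channel affine transform conditioned on `cond`.

    features: (frames, channels) hidden activations
    cond:     (cond_dim,) conditioning embedding
    W_gamma, W_beta: (cond_dim, channels) hypothetical linear projections
    """
    gamma = cond @ W_gamma   # per-channel scale
    beta = cond @ W_beta     # per-channel shift
    return gamma * features + beta

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))   # 5 frames, 4 channels
cond = rng.normal(size=3)            # stand-in for a depression embedding
W_gamma = rng.normal(size=(3, 4))
W_beta = rng.normal(size=(3, 4))

modulated = film(features, cond, W_gamma, W_beta)
print(modulated.shape)
```

Because the scale and shift are shared across frames, the conditioning changes how every frame is rendered without touching the frame-level content sequence itself.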

preprint, 2026, arXiv

Detection of a puzzling dual-superorbital hard X-ray modulation in the X-ray binary GX 301-2

The superorbital modulations (SMs) observed in wind-fed X-ray binaries remain a puzzling phenomenon in astrophysics. To investigate this behavior observationally, we analyzed the long-term hard X-ray light curve from the Swift/BAT 157-Month Hard X-ray Survey in X-ray binary GX 301-2. Using three timing analysis methods (the Lomb-Scargle periodogram, the weighted wavelet Z-transform, and Gaussian processes), we identify a rare dual-SM behavior in this source: the 115-day modulation exceeds the 5σ global significance level, whereas the 65-day signal only marginally reaches the 4σ level. Because the 115-day period is more consistent with the previously reported linear relation between orbital and superorbital periods, we interpret 115 days as the actual superorbital period, while the weaker and less stable 65-day period is its beat modulation with the orbital period. By assessing the applicability of different physical scenarios to our results, we suggest that this dual-SM behavior is most plausibly associated with corotating interaction regions (CIRs) in the stellar wind. This framework can also account for the observed linear orbital-superorbital relation, despite the unclear physical mechanism that sets the apparent ratio between the CIR and orbital periods across sources. Further long-term monitoring of this system, together with continued theoretical development of the CIR scenario, will be essential for clarifying the origin of wind-fed SMs.
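The beat interpretation in this abstract can be checked with one line of arithmetic: two periodic signals with periods P1 < P2 beat at a frequency equal to the difference of their frequencies. The sketch below assumes an orbital period of about 41.5 days for GX 301-2. That value is not stated in the abstract and is supplied here as an outside assumption, and the function name is hypothetical.

```python
def beat_period(p_short, p_long):
    """Period of the beat between two periodic signals (same units as inputs)."""
    return 1.0 / (1.0 / p_short - 1.0 / p_long)

P_ORB = 41.5    # days; assumed orbital period of GX 301-2, not given in the abstract
P_SUP = 115.0   # days; superorbital period reported in the abstract

beat = beat_period(P_ORB, P_SUP)
print(f"beat period: {beat:.1f} days")
```

Under that assumption the beat comes out at roughly 65 days, which is consistent with reading the weaker signal as a beat between the 115-day modulation and the orbit.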

preprint, 2026, arXiv

UST-Hand: An Uncertainty-aware Spatiotemporal Point Cloud Interaction Network for 3D Self-supervised Hand Pose Estimation

Manually annotating accurate 3D hand poses is extremely time-consuming and labor-intensive. Existing self-supervised hand pose estimation methods leverage the discrepancy between input images and rendered outputs, or multi-view consistency constraints, as the driving force to optimize networks and progressively refine pose accuracy. However, these methods are highly susceptible to noisy pseudo-labels and overlook the importance of fully exploiting fine-grained spatial correlations, which undermines the stability of model training. To address these issues, we propose UST-Hand, a self-supervised learning framework that estimates the uncertainty distribution of hand pose and constructs a probabilistic point cloud feature space, which enables complex spatiotemporal relationship modeling. UST-Hand employs a conditional normalizing flow model to capture hand pose distributions and sample diverse hypotheses, facilitating robust learning under noisy pseudo-label supervision with enhanced stability. These multiple hypotheses are mapped to a unified probabilistic 3D point cloud space for multi-view and temporal feature interaction, comprehensively exploring hand motion patterns and fine-grained spatial correlations. Extensive experiments on three challenging datasets demonstrate that UST-Hand achieves state-of-the-art performance, outperforming existing self-supervised methods by up to 37.8% in Mean Per Vertex Position Error (MPVPE).
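MPVPE, the metric quoted in this abstract, is the Euclidean distance between predicted and ground-truth mesh vertices, averaged over vertices. A minimal sketch follows. The four-vertex toy mesh is made up for illustration, and the paper's evaluation protocol (units, alignment, vertex count) is not described in the abstract.

```python
import numpy as np

def mpvpe(pred, gt):
    """Mean Per Vertex Position Error: mean Euclidean distance over vertices.

    pred, gt: (..., num_vertices, 3) arrays in the same units.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy example: every predicted vertex is offset by (3, 4, 0) from ground truth,
# so each per-vertex error is 5 and the mean is 5 as well.
gt = np.zeros((4, 3))
pred = gt + np.array([3.0, 4.0, 0.0])

print(mpvpe(pred, gt))
```

Lower is better, so a 37.8% improvement means the mean vertex distance shrinks by that fraction relative to the compared method.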

preprint, 2023, arXiv

InsPro: Propagating Instance Query and Proposal for Online Video Instance Segmentation

Video instance segmentation (VIS) aims at segmenting and tracking objects in videos. Prior methods typically generate frame-level or clip-level object instances first and then associate them by either additional tracking heads or complex instance matching algorithms. This explicit instance association approach increases system complexity and fails to fully exploit temporal cues in videos. In this paper, we design a simple, fast and yet effective query-based framework for online VIS. Relying on an instance query and proposal propagation mechanism with several specially developed components, this framework can perform accurate instance association implicitly. Specifically, we generate frame-level object instances based on a set of instance query-proposal pairs propagated from previous frames. This instance query-proposal pair is learned to bind with one specific object across frames through conscientiously developed strategies. When using such a pair to predict an object instance on the current frame, not only is the generated instance automatically associated with its precursors on previous frames, but the model also gets a good prior for predicting the same object. In this way, we naturally achieve implicit instance association in parallel with segmentation and elegantly take advantage of temporal clues in videos. To show the effectiveness of our method InsPro, we evaluate it on two popular VIS benchmarks, i.e., YouTube-VIS 2019 and YouTube-VIS 2021. Without bells and whistles, our InsPro with ResNet-50 backbone achieves 43.2 AP and 37.6 AP on these two benchmarks respectively, outperforming all other online VIS methods.

preprint, 2022, arXiv

PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation

This paper presents a unified framework for depth-aware panoptic segmentation (DPS), which aims to reconstruct a 3D scene with instance-level semantics from a single image. Prior works address this problem by simply adding a dense depth regression head to panoptic segmentation (PS) networks, resulting in two independent task branches. This neglects the mutually beneficial relations between these two tasks, thus failing to exploit handy instance-level semantic cues to boost depth accuracy while also producing sub-optimal depth maps. To overcome these limitations, we propose a unified framework for the DPS task by applying a dynamic convolution technique to both the PS and depth prediction tasks. Specifically, instead of predicting depth for all pixels at a time, we generate instance-specific kernels to predict depth and segmentation masks for each instance. Moreover, leveraging the instance-wise depth estimation scheme, we add instance-level depth cues to assist with supervising the depth learning via a new depth loss. Extensive experiments on Cityscapes-DPS and SemKITTI-DPS show the effectiveness and promise of our method. We hope our unified solution to DPS can lead a new paradigm in this area. Code is available at https://github.com/NaiyuGao/PanopticDepth.

preprint, 2021, arXiv

IFoodCloud: A Platform for Real-time Sentiment Analysis of Public Opinion about Food Safety in China

The Internet contains a wealth of public opinion on food safety, including views on food adulteration, food-borne diseases, agricultural pollution, irregular food distribution, and food production issues. In order to systematically collect and analyse public opinion on food safety, we developed IFoodCloud, a platform for the real-time sentiment analysis of public opinion on food safety in China. It collects data from more than 3,100 public sources that can be used to explore public opinion trends, public sentiment, and regional attention differences of food safety incidents. At the same time, we constructed a sentiment classification model using multiple lexicon-based and deep learning-based algorithms integrated with IFoodCloud, which provides an unprecedentedly rapid means of understanding the public sentiment toward specific food safety incidents. Our best model achieved an F1-score of 0.9737. Further, three real-world cases are presented to demonstrate the application and robustness. IFoodCloud could be considered a valuable tool for promoting the scientisation of food safety supervision and risk communication.

preprint, 2021, arXiv

VarifocalNet: An IoU-aware Dense Object Detector

Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. Prior work uses the classification score or a combination of classification and predicted localization scores to rank candidates. However, neither option results in a reliable ranking, thus degrading detection performance. In this paper, we propose to learn an IoU-aware Classification Score (IACS) as a joint representation of object presence confidence and localization accuracy. We show that dense object detectors can achieve a more accurate ranking of candidate detections based on the IACS. We design a new loss function, named Varifocal Loss, to train a dense object detector to predict the IACS, and propose a new star-shaped bounding box feature representation for IACS prediction and bounding box refinement. Combining these two new components and a bounding box refinement branch, we build an IoU-aware dense object detector based on the FCOS+ATSS architecture, that we call VarifocalNet or VFNet for short. Extensive experiments on MS COCO show that our VFNet consistently surpasses the strong baseline by ~2.0 AP with different backbones. Our best model VFNet-X-1200 with Res2Net-101-DCN achieves a single-model single-scale AP of 55.1 on COCO test-dev, which is state-of-the-art among various object detectors. Code is available at https://github.com/hyz-xmaster/VarifocalNet.
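The abstract names Varifocal Loss without giving its form. The sketch below follows the formulation as commonly cited from the VarifocalNet paper: positives (target IoU q > 0) use a binary cross-entropy weighted by q, and negatives (q = 0) use a focal-style term that down-weights easy examples. The default values `alpha=0.75` and `gamma=2.0`, the function name, and the NumPy implementation are assumptions for illustration, not the released training code.

```python
import numpy as np

def varifocal_loss(p, q, alpha=0.75, gamma=2.0, eps=1e-12):
    """Element-wise Varifocal Loss sketch.

    p: predicted IoU-aware classification score in (0, 1)
    q: target score; IoU with the ground-truth box for positives, 0 for negatives
    """
    p = np.clip(p, eps, 1.0 - eps)
    # Positives: BCE against the soft target q, weighted by q itself so that
    # high-IoU examples contribute more.
    positive = -q * (q * np.log(p) + (1.0 - q) * np.log(1.0 - p))
    # Negatives: scaled by p**gamma so confident false alarms dominate.
    negative = -alpha * p**gamma * np.log(1.0 - p)
    return np.where(q > 0, positive, negative)

scores = np.array([0.9, 0.1, 0.8, 0.5])
targets = np.array([0.0, 0.0, 0.8, 1.0])
losses = varifocal_loss(scores, targets)
print(losses)
```

The asymmetry is the point of the design: only negatives are down-weighted, so a confident false positive (score 0.9, target 0) is penalised far more than an easy negative (score 0.1, target 0), while positives keep their full gradient.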

preprint, 2020, arXiv

BenchBot: Evaluating Robotics Research in Photorealistic 3D Simulation and on Real Robots

We introduce BenchBot, a novel software suite for benchmarking the performance of robotics research across both photorealistic 3D simulations and real robot platforms. BenchBot provides a simple interface to the sensorimotor capabilities of a robot when solving robotics research problems; an interface that is consistent regardless of whether the target platform is simulated or a real robot. In this paper we outline the BenchBot system architecture, and explore the parallels between its user-centric design and an ideal research development process devoid of tangential robot engineering challenges. The paper describes the research benefits of using the BenchBot system, including: enhanced capacity to focus solely on research problems, direct quantitative feedback to inform research development, tools for deriving comprehensive performance characteristics, and submission formats which promote sharability and repeatability of research outcomes. BenchBot is publicly available (http://benchbot.org), and we encourage its use in the research community for comprehensively evaluating the simulated and real world performance of novel robotic algorithms.

preprint, 2020, arXiv

Enhanced Light-Matter Interactions in Dielectric Nanostructures via Machine Learning Approach

A key concept underlying the specific functionalities of metasurfaces, i.e. arrays of subwavelength nanoparticles, is the use of constituent components to shape the wavefront of the light on demand. Metasurfaces are versatile and novel platforms to manipulate the scattering, colour, phase or the intensity of the light. Currently, one of the typical approaches for designing a metasurface is to optimize one or two variables, among a vast number of fixed parameters, such as various materials' properties and coupling effects, as well as the geometrical parameters. Ideally, it would require a multi-dimensional space optimization through direct numerical simulations. Recently, an alternative, deep-learning-assisted approach has become popular because it reduces the computational cost significantly. In this paper, we utilize a deep-learning approach for obtaining high-quality factor (high-Q) resonances with desired characteristics, such as linewidth, amplitude and spectral position. We exploit such high-Q resonances for the enhanced light-matter interaction in nonlinear optical metasurfaces and optomechanical vibrations, simultaneously. We demonstrate that optimized metasurfaces lead to a more than 400-fold enhancement of the third harmonic generation (THG); at the same time, they also contribute to a more than 100-fold enhancement in optomechanical vibrations. This approach can be further used to realize structures with unconventional scattering responses.

preprint, 2020, arXiv

Probabilistic Object Detection: Definition and Evaluation

We introduce Probabilistic Object Detection, the task of detecting objects in images and accurately quantifying the spatial and semantic uncertainties of the detections. Given the lack of methods capable of assessing such probabilistic object detections, we present the new Probability-based Detection Quality measure (PDQ). Unlike AP-based measures, PDQ has no arbitrary thresholds and rewards spatial and label quality, and foreground/background separation quality while explicitly penalising false positive and false negative detections. We contrast PDQ with existing mAP and moLRP measures by evaluating state-of-the-art detectors and a Bayesian object detector based on Monte Carlo Dropout. Our experiments indicate that conventional object detectors tend to be spatially overconfident and thus perform poorly on the task of probabilistic object detection. Our paper aims to encourage the development of new object detection approaches that provide detections with accurately estimated spatial and label uncertainties and are of critical importance for deployment on robots and embodied AI systems in the real world.

preprint, 2020, arXiv

Stochastic Computing Implemented by Skyrmionic Logic Devices

The magnetic skyrmion, a topologically non-trivial spin texture, has been considered a promising information carrier in future electronic devices because of its nanoscale size, low depinning current density and high motion velocity. Despite the broad interest in skyrmion racetrack memory, researchers have recently been exploiting logic functions enabled by the particle-like behaviors of skyrmions. These functions can be applied to unconventional computing, such as stochastic computing (SC), which treats data as probabilities and is superior to binary computing due to its simplicity of logic operation. In this work, we demonstrate SC implemented by skyrmionic logic devices. We propose a skyrmionic AND-OR logic device as a multiplier in the stochastic domain and two skyrmionic multiplexer (MUX) logic devices as stochastic adders. With the assistance of voltage controlled magnetic anisotropy (VCMA), the precise control of skyrmion collisions is not required in the skyrmionic AND-OR logic device, thus improving the operation robustness. In the two MUX logic devices, skyrmions can be driven by Zhang-Li torque or spin orbit torque (SOT). Particularly, we can flexibly regulate the skyrmion motion by VCMA or voltage controlled Dzyaloshinskii-Moriya Interaction (VCDMI) in the SOT case. Furthermore, a 3-bit stochastic multiplier and adder are demonstrated by micromagnetic simulations. In addition, simulations in synthetic antiferromagnets (SAF) show that the performance of our skyrmionic logic gates can be optimized through advanced materials. Our work opens up a perspective for implementing SC using skyrmionic logic devices.
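The claim that an AND gate multiplies and a MUX adds in the stochastic domain can be illustrated in software. In unipolar stochastic computing a value p in [0, 1] is encoded as a random bitstream whose bits are 1 with probability p. The sketch below is a plain NumPy simulation under assumptions: the stream length, the input values, and the helper name `encode` are arbitrary choices, and it models only the logic, not the skyrmionic devices or the 3-bit circuits of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000  # stream length; longer streams give lower estimation noise

def encode(p, n, rng):
    """Unipolar encoding: each bit is 1 with probability p, independently."""
    return rng.random(n) < p

a = encode(0.6, N, rng)
b = encode(0.5, N, rng)
select = encode(0.5, N, rng)  # MUX select line with P(select = 1) = 0.5

# AND of two independent streams: P(a AND b) = P(a) * P(b).
product = np.logical_and(a, b).mean()

# MUX picks a when select is 1 and b otherwise, giving the scaled sum
# 0.5 * P(a) + 0.5 * P(b).
scaled_sum = np.where(select, a, b).mean()

print(f"product    ~ {product:.3f} (ideal 0.300)")
print(f"scaled sum ~ {scaled_sum:.3f} (ideal 0.550)")
```

The adder is scaled by the select probability so that the result stays inside [0, 1]. Accuracy depends on stream length and on the input streams being statistically independent.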

preprint, 2020, arXiv

The Robotic Vision Scene Understanding Challenge

Being able to explore an environment and understand the location and type of all objects therein is important for indoor robotic platforms that must interact closely with humans. However, it is difficult to evaluate progress in this area because standardized testing is lacking, limited by the need for active robot agency and perfect object ground-truth. To help provide a standard for testing scene understanding systems, we present a new robot vision scene understanding challenge using simulation to enable repeatable experiments with active robot agency. We provide two challenging task types, three difficulty levels, five simulated environments and a new evaluation measure for evaluating 3D cuboid object maps. Our aim is to drive state-of-the-art research in scene understanding through enabling evaluation and comparison of active robotic vision systems.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2601.00303:author:5:haoyang-zhang

Imported May 21, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.12034:author:5:haoyang-zhang

Imported May 20, 2026. Synced May 20, 2026.

arxiv, confidence 95%

external id: arxiv:2605.17742:author:2:haoyang-zhang

Imported May 20, 2026. Synced May 20, 2026.

Close collaborators

Feras Dayoub, 4 works

Niko Sünderhauf, 4 works

David Hall, 3 works

Ben Talbot, 2 works
