Researcher profile

Yongsheng Gao

Yongsheng Gao contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

OneViewAll: Semantic Prior Guided One-View 6D Pose Estimation for Novel Objects

In many practical 6D object pose estimation scenarios, we often have access to only a single real-world RGB-D reference view per object, typically without CAD models. Existing methods largely rely on explicit 3D models or multi-view data, which limits their scalability. To address this challenging single-reference model-free setting, we propose \textbf{OneViewAll}, a semantic-prior-guided framework that performs pose estimation via a novel Project-and-Compare paradigm. Instead of relying on computationally expensive CAD-based rendering, our method directly aligns reference and query observations within a projection-equivariant space. OneViewAll progressively integrates hierarchical semantic priors across three levels: (1) \textit{category- and scene-level} priors for efficient hypothesis initialization; (2) \textit{object-level symmetry} priors for geometry completion via mirror fusion; and (3) \textit{patch-level} priors for discriminative refinement. Extensive experiments demonstrate that OneViewAll achieves \textbf{92.5\%} ADD-0.1 accuracy on the LINEMOD dataset using only one real reference view -- significantly outperforming the CVPR 2025 baseline One2Any (52.6\%). It also yields consistent improvements on YCB-V, Real275, and Toyota-Light while maintaining low inference latency. Our results underscore the efficacy of symmetry-aware projection in handling symmetric, texture-less, and occluded objects.

preprint2026arXiv

Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection

Video anomaly detection (VAD) systems often prioritize accuracy while overlooking privacy concerns, limiting their suitability for real-world deployment. We propose the Orthogonal Projection Layer (OPL), a lightweight module that removes task-irrelevant variations to produce representations focused on anomaly-relevant cues. To address privacy risks in human-centered scenarios, we introduce Guided OPL (G-OPL), which suppresses facial attributes using weak supervision from face-presence signals while preserving non-identifying features such as pose and motion. A cosine alignment objective enforces consistent capture and removal of facial information without identity labels or adversarial training. We further present a privacy-aware evaluation framework that jointly assesses detection performance and privacy preservation, and enables analysis of how sensitive information is filtered. Experiments show that embedding privacy constraints into model design reduces sensitive information while maintaining or improving detection accuracy, supporting projection-based architectures as a principled approach for privacy-aware VAD.

preprint2026arXiv

SpectralKAN: Weighted Activation Distribution Kolmogorov-Arnold Network for Hyperspectral Image Change Detection

Kolmogorov-Arnold networks (KANs) represent data features by learning the activation functions and demonstrate superior accuracy with fewer parameters, FLOPs, GPU memory usage (Memory), shorter training time (TraT), and testing time (TesT) when handling low-dimensional data. However, when applied to high-dimensional data, which contains significant redundant information, the current activation mechanism of KANs leads to unnecessary computations, thereby reducing computational efficiency. KANs require reshaping high-dimensional data into a one-dimensional tensor as input, which inevitably results in the loss of dimensional information. To address these limitations, we propose weighted activation distribution KANs (WKANs), which reduce the frequency of activations per node and distribute node information into different output nodes through weights to avoid extracting redundant information. Furthermore, we introduce a multilevel tensor splitting framework (MTSF), which decomposes high-dimensional data to extract features from each dimension independently and leverages tensor-parallel computation to significantly improve the computational efficiency of WKANs on high-dimensional data. In this paper, we design SpectralKAN for hyperspectral image change detection using the proposed MTSF. SpectralKAN demonstrates outstanding performance across five datasets, achieving an overall accuracy (OA) of 0.9801 and a Kappa coefficient (K) of 0.9514 on the Farmland dataset, with only 8 k parameters, 0.07 M FLOPs, 911 MB Memory, 13.26 S TraT, and 2.52 S TesT, underscoring its superior accuracy-efficiency trade-off. The source code is publicly available at https://github.com/yanhengwang-heu/SpectralKAN.

preprint2022arXiv

Feature Fusion Vision Transformer for Fine-Grained Visual Categorization

The core for tackling the fine-grained visual categorization (FGVC) is to learn subtle yet discriminative features. Most previous works achieve this by explicitly selecting the discriminative parts or integrating the attention mechanism via CNN-based approaches.However, these methods enhance the computational complexity and make the modeldominated by the regions containing the most of the objects. Recently, vision trans-former (ViT) has achieved SOTA performance on general image recognition tasks. Theself-attention mechanism aggregates and weights the information from all patches to the classification token, making it perfectly suitable for FGVC. Nonetheless, the classifi-cation token in the deep layer pays more attention to the global information, lacking the local and low-level features that are essential for FGVC. In this work, we proposea novel pure transformer-based framework Feature Fusion Vision Transformer (FFVT)where we aggregate the important tokens from each transformer layer to compensate thelocal, low-level and middle-level information. We design a novel token selection mod-ule called mutual attention weight selection (MAWS) to guide the network effectively and efficiently towards selecting discriminative tokens without introducing extra param-eters. We verify the effectiveness of FFVT on three benchmarks where FFVT achieves the state-of-the-art performance.

preprint2022arXiv

Image Feature Information Extraction for Interest Point Detection: A Review

Interest point detection is one of the most fundamental and critical problems in computer vision and image processing. In this paper, we carry out a comprehensive review on image feature information (IFI) extraction techniques for interest point detection. To systematically introduce how the existing interest point detection methods extract IFI from an input image, we propose a taxonomy of the IFI extraction techniques for interest point detection. According to this taxonomy, we discuss different types of IFI extraction techniques for interest point detection. Furthermore, we identify the main unresolved issues related to the existing IFI extraction techniques for interest point detection and any interest point detection methods that have not been discussed before. The existing popular datasets and evaluation standards are provided and the performances for eighteen state-of-the-art approaches are evaluated and discussed. Moreover, future research directions on IFI extraction techniques for interest point detection are elaborated.

preprint2022arXiv

Imitation of Manipulation Skills Using Multiple Geometries

Daily manipulation tasks are characterized by geometric primitives related to actions and object shapes. Such geometric descriptors are poorly represented by only using Cartesian coordinate systems. In this paper, we propose a learning approach to extract the optimal representation from a dictionary of coordinate systems to encode an observed movement/behavior. This is achieved by using an extension of Gaussian distributions on Riemannian manifolds, which is used to analyse a set of user demonstrations statistically, by considering multiple geometries as candidate representations of the task. We formulate the reproduction problem as a general optimal control problem based on an iterative linear quadratic regulator (iLQR), where the Gaussian distribution in the extracted coordinate systems are used to define the cost function. We apply our approach to object grasping and box opening tasks in simulation and on a 7-axis Franka Emika robot. The results show that the robot can exploit several geometries to execute the manipulation task and generalize it to new situations, by maintaining the invariant characteristics of the task in the coordinate system(s) of interest.

preprint2022arXiv

Region Aware Video Object Segmentation with Deep Motion Modeling

Current semi-supervised video object segmentation (VOS) methods usually leverage the entire features of one frame to predict object masks and update memory. This introduces significant redundant computations. To reduce redundancy, we present a Region Aware Video Object Segmentation (RAVOS) approach that predicts regions of interest (ROIs) for efficient object segmentation and memory storage. RAVOS includes a fast object motion tracker to predict their ROIs in the next frame. For efficient segmentation, object features are extracted according to the ROIs, and an object decoder is designed for object-level segmentation. For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects between two frames. Besides RAVOS, we also propose a large-scale dataset, dubbed OVOS, to benchmark the performance of VOS models under occlusions. Evaluation on DAVIS and YouTube-VOS benchmarks and our new OVOS dataset show that our method achieves state-of-the-art performance with significantly faster inference time, e.g., 86.1 J&F at 42 FPS on DAVIS and 84.4 J&F at 23 FPS on YouTube-VOS.

preprint2021arXiv

Extracting Optimal Explanations for Ensemble Trees via Logical Reasoning

Ensemble trees are a popular machine learning model which often yields high prediction performance when analysing structured data. Although individual small decision trees are deemed explainable by nature, an ensemble of large trees is often difficult to understand. In this work, we propose an approach called optimised explanation (OptExplain) that faithfully extracts global explanations of ensemble trees using a combination of logical reasoning, sampling and optimisation. Building on top of this, we propose a method called the profile of equivalent classes (ProClass), which uses MAX-SAT to simplify the explanation even further. Our experimental study on several datasets shows that our approach can provide high-quality explanations to large ensemble trees models, and it betters recent top-performers.

preprint2020arXiv

A Generalized Kernel Risk Sensitive Loss for Robust Two-Dimensional Singular Value Decomposition

Two-dimensional singular decomposition (2DSVD) has been widely used for image processing tasks, such as image reconstruction, classification, and clustering. However, traditional 2DSVD algorithm is based on the mean square error (MSE) loss, which is sensitive to outliers. To overcome this problem, we propose a robust 2DSVD framework based on a generalized kernel risk sensitive loss (GKRSL-2DSVD) which is more robust to noise and and outliers. Since the proposed objective function is non-convex, a majorization-minimization algorithm is developed to efficiently solve it with guaranteed convergence. The proposed framework has inherent properties of processing non-centered data, rotational invariant, being easily extended to higher order spaces. Experimental results on public databases demonstrate that the performance of the proposed method on different applications significantly outperforms that of all the benchmarks.

preprint2020arXiv

A Robust Matching Pursuit Algorithm Using Information Theoretic Learning

Current orthogonal matching pursuit (OMP) algorithms calculate the correlation between two vectors using the inner product operation and minimize the mean square error, which are both suboptimal when there are non-Gaussian noises or outliers in the observation data. To overcome these problems, a new OMP algorithm is developed based on the information theoretic learning (ITL), which is built on the following new techniques: (1) an ITL-based correlation (ITL-Correlation) is developed as a new similarity measure which can better exploit higher-order statistics of the data, and is robust against many different types of noise and outliers in a sparse representation framework; (2) a non-second order statistic measurement and minimization method is developed to improve the robustness of OMP by overcoming the limitation of Gaussianity inherent in cost function based on second-order moments. The experimental results on both simulated and real-world data consistently demonstrate the superiority of the proposed OMP algorithm in data recovery, image reconstruction, and classification.

preprint2020arXiv

Deep Residual-Dense Lattice Network for Speech Enhancement

Convolutional neural networks (CNNs) with residual links (ResNets) and causal dilated convolutional units have been the network of choice for deep learning approaches to speech enhancement. While residual links improve gradient flow during training, feature diminution of shallow layer outputs can occur due to repetitive summations with deeper layer outputs. One strategy to improve feature re-usage is to fuse both ResNets and densely connected CNNs (DenseNets). DenseNets, however, over-allocate parameters for feature re-usage. Motivated by this, we propose the residual-dense lattice network (RDL-Net), which is a new CNN for speech enhancement that employs both residual and dense aggregations without over-allocating parameters for feature re-usage. This is managed through the topology of the RDL blocks, which limit the number of outputs used for dense aggregations. Our extensive experimental investigation shows that RDL-Nets are able to achieve a higher speech enhancement performance than CNNs that employ residual and/or dense aggregations. RDL-Nets also use substantially fewer parameters and have a lower computational requirement. Furthermore, we demonstrate that RDL-Nets outperform many state-of-the-art deep learning approaches to speech enhancement.