Researcher profile

Weimin Wang

Weimin Wang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

Learning with Semantic Priors: Stabilizing Point-Supervised Infrared Small Target Detection via Hierarchical Knowledge Distillation

Single-frame Infrared Small Target Detection (ISTD) aims to localize weak targets under heavy background clutter, yet dense pixel-wise annotations are expensive. Point supervision with online label evolution reduces annotation cost; however, lightweight CNN detectors often lack sufficient semantics, leading to noisy pseudo-masks and unstable optimization. To address this, we propose a hierarchical VFM-driven knowledge distillation framework that uses a frozen Vision Foundation Model (VFM) during training. We formulate point-supervised learning as a bilevel optimization process: the inner loop adapts a VFM-embedded teacher on reweighted training samples, while the outer loop transfers validation-guided knowledge to a lightweight student to mitigate pseudo-label noise and training-set bias. We further introduce Semantic-Conditioned Affine Modulation (SCAM) to inject VFM semantics into CNN features at multiple layers. In addition, a dynamic collaborative learning strategy with cluster-level sample reweighting enhances robustness to imperfect pseudo-masks. Experiments on diverse challenging cases across multiple ISTD backbones demonstrate consistent improvements in detection accuracy and training stability. Our code is available at https://github.com/yuanhang-yao/semantic-prior.

preprint2023arXiv

Symmetry breaking and mechanical filter make a pseudo-gimbal-less two-dimensional MEMS scanning mirror with multiple scanning modes

Miniaturized two-dimensional scanning mirror based on microelectromechanical systems (MEMS) technology has great potential in automotive industry, consumer electronics, and biomedicine, etc. Due to its high frequency and large angle, resonant scanning is the mainstream in all MEMS actuation mechanisms, such as harmonic resonant electromagnetic scanner and parametric resonant electrostatic scanner. Although electrostatic scanner has the advantages of low power consumption and IC process compatibility, some shortcomings of parametric resonance, including double frequency of driver electronics and additional feedback control or frequency stabilization system, limit its further application. The symmetry of coplanar electrostatic comb actuator is broken in this paper, and harmonic resonant electrostatic scanner with excellent performance is realized. Further, through adopting mechanical filter, two-dimensional scanning can be achieved through one set of actuators, which avoids the problem that two sets (each for one dimension) of electrostatic actuators must be insulated each other through complicated and expensive processes. A two-dimensional MEMS scanner based on symmetry breaking and mechanical filter was proposed and demonstrated. Multiple scanning modes can be achieved through selective control of a set of four identical actuators.

preprint2022arXiv

Enhancing Local Feature Learning for 3D Point Cloud Processing using Unary-Pairwise Attention

We present a simple but effective attention named the unary-pairwise attention (UPA) for modeling the relationship between 3D point clouds. Our idea is motivated by the analysis that the standard self-attention (SA) that operates globally tends to produce almost the same attention maps for different query positions, revealing difficulties for learning query-independent and query-dependent information jointly. Therefore, we reformulate the SA and propose query-independent (Unary) and query-dependent (Pairwise) components to facilitate the learning of both terms. In contrast to the SA, the UPA ensures query dependence via operating locally. Extensive experiments show that the UPA outperforms the SA consistently on various point cloud understanding tasks including shape classification, part segmentation, and scene segmentation. Moreover, simply equipping the popular PointNet++ method with the UPA even outperforms or is on par with the state-of-the-art attention-based approaches. In addition, the UPA systematically boosts the performance of both standard and modern networks when it is integrated into them as a compositional module.

preprint2022arXiv

Enhancing Local Feature Learning Using Diffusion for 3D Point Cloud Understanding

Learning point clouds is challenging due to the lack of connectivity information, i.e., edges. Although existing edge-aware methods can improve the performance by modeling edges, how edges contribute to the improvement is unclear. In this study, we propose a method that automatically learns to enhance/suppress edges while keeping the its working mechanism clear. First, we theoretically figure out how edge enhancement/suppression works. Second, we experimentally verify the edge enhancement/suppression behavior. Third, we empirically show that this behavior improves performance. In general, we observe that the proposed method achieves competitive performance in point cloud classification and segmentation tasks.

preprint2022arXiv

Enhancing Local Geometry Learning for 3D Point Cloud via Decoupling Convolution

Modeling the local surface geometry is challenging in 3D point cloud understanding due to the lack of connectivity information. Most prior works model local geometry using various convolution operations. We observe that the convolution can be equivalently decomposed as a weighted combination of a local and a global component. With this observation, we explicitly decouple these two components so that the local one can be enhanced and facilitate the learning of local surface geometry. Specifically, we propose Laplacian Unit (LU), a simple yet effective architectural unit that can enhance the learning of local geometry. Extensive experiments demonstrate that networks equipped with LUs achieve competitive or superior performance on typical point cloud understanding tasks. Moreover, through establishing connections between the mean curvature flow, a further investigation of LU based on curvatures is made to interpret the adaptive smoothing and sharpening effect of LU. The code will be available.

preprint2022arXiv

Surgical Skill Assessment via Video Semantic Aggregation

Automated video-based assessment of surgical skills is a promising task in assisting young surgical trainees, especially in poor-resource areas. Existing works often resort to a CNN-LSTM joint framework that models long-term relationships by LSTMs on spatially pooled short-term CNN features. However, this practice would inevitably neglect the difference among semantic concepts such as tools, tissues, and background in the spatial dimension, impeding the subsequent temporal relationship modeling. In this paper, we propose a novel skill assessment framework, Video Semantic Aggregation (ViSA), which discovers different semantic parts and aggregates them across spatiotemporal dimensions. The explicit discovery of semantic parts provides an explanatory visualization that helps understand the neural network's decisions. It also enables us to further incorporate auxiliary information such as the kinematic data to improve representation learning and performance. The experiments on two datasets show the competitiveness of ViSA compared to state-of-the-art methods. Source code is available at: bit.ly/MICCAI2022ViSA.

preprint2022arXiv

Weakly Supervised Silhouette-based Semantic Scene Change Detection

This paper presents a novel semantic scene change detection scheme with only weak supervision. A straightforward approach for this task is to train a semantic change detection network directly from a large-scale dataset in an end-to-end manner. However, a specific dataset for this task, which is usually labor-intensive and time-consuming, becomes indispensable. To avoid this problem, we propose to train this kind of network from existing datasets by dividing this task into change detection and semantic extraction. On the other hand, the difference in camera viewpoints, for example, images of the same scene captured from a vehicle-mounted camera at different time points, usually brings a challenge to the change detection task. To address this challenge, we propose a new siamese network structure with the introduction of correlation layer. In addition, we collect and annotate a publicly available dataset for semantic change detection to evaluate the proposed method. The experimental results verified both the robustness to viewpoint difference in change detection task and the effectiveness for semantic change detection of the proposed networks. Our code and dataset are available at https://kensakurada.github.io/pscd.

preprint2021arXiv

Weighted boxes fusion: Ensembling boxes from different object detection models

In this work, we present a novel method for combining predictions of object detection models: weighted boxes fusion. Our algorithm utilizes confidence scores of all proposed bounding boxes to constructs the averaged boxes. We tested method on several datasets and evaluated it in the context of the Open Images and COCO Object Detection tracks, achieving top results in these challenges. The source code is publicly available at https://github.com/ZFTurbo/Weighted-Boxes-Fusion

preprint2020arXiv

3D Object Detection Method Based on YOLO and K-Means for Image and Point Clouds

Lidar based 3D object detection and classification tasks are essential for autonomous driving(AD). A lidar sensor can provide the 3D point cloud data reconstruction of the surrounding environment. However, real time detection in 3D point clouds still needs a strong algorithmic. This paper proposes a 3D object detection method based on point cloud and image which consists of there parts.(1)Lidar-camera calibration and undistorted image transformation. (2)YOLO-based detection and PointCloud extraction, (3)K-means based point cloud segmentation and detection experiment test and evaluation in depth image. In our research, camera can capture the image to make the Real-time 2D object detection by using YOLO, we transfer the bounding box to node whose function is making 3d object detection on point cloud data from Lidar. By comparing whether 2D coordinate transferred from the 3D point is in the object bounding box or not can achieve High-speed 3D object recognition function in GPU. The accuracy and precision get imporved after k-means clustering in point cloud. The speed of our detection method is a advantage faster than PointNet.

preprint2020arXiv

SOIC: Semantic Online Initialization and Calibration for LiDAR and Camera

This paper presents a novel semantic-based online extrinsic calibration approach, SOIC (so, I see), for Light Detection and Ranging (LiDAR) and camera sensors. Previous online calibration methods usually need prior knowledge of rough initial values for optimization. The proposed approach removes this limitation by converting the initialization problem to a Perspective-n-Point (PnP) problem with the introduction of semantic centroids (SCs). The closed-form solution of this PnP problem has been well researched and can be found with existing PnP methods. Since the semantic centroid of the point cloud usually does not accurately match with that of the corresponding image, the accuracy of parameters are not improved even after a nonlinear refinement process. Thus, a cost function based on the constraint of the correspondence between semantic elements from both point cloud and image data is formulated. Subsequently, optimal extrinsic parameters are estimated by minimizing the cost function. We evaluate the proposed method either with GT or predicted semantics on KITTI dataset. Experimental results and comparisons with the baseline method verify the feasibility of the initialization strategy and the accuracy of the calibration approach. In addition, we release the source code at https://github.com/--/SOIC.

preprint2020arXiv

YOLO and K-Means Based 3D Object Detection Method on Image and Point Cloud

Lidar based 3D object detection and classification tasks are essential for automated driving(AD). A Lidar sensor can provide the 3D point coud data reconstruction of the surrounding environment. But the detection in 3D point cloud still needs a strong algorithmic challenge. This paper consists of three parts.(1)Lidar-camera calib. (2)YOLO, based detection and PointCloud extraction, (3) k-means based point cloud segmentation. In our research, Camera can capture the image to make the Real-time 2D Object Detection by using YOLO, I transfer the bounding box to node whose function is making 3d object detection on point cloud data from Lidar. By comparing whether 2D coordinate transferred from the 3D point is in the object bounding box or not, and doing a k-means clustering can achieve High-speed 3D object recognition function in GPU.