Trust snapshot

Quick read

Trust 21 (Emerging) · Verification L1 · Unclaimed author
21 works
0 followers
17 topics
4 close collaborators

Published work

21 published items

preprint · 2026 · arXiv

VoxAfford: Multi-Scale Voxel-Token Fusion for Open-Vocabulary 3D Affordance Detection

Open-vocabulary 3D affordance detection requires localizing interaction regions on point clouds given novel affordance descriptions. Recent methods extend multimodal large language models (MLLMs) with special output tokens that are decoded into segmentation masks. However, these tokens are produced through autoregressive generation, which models sequential dependencies rather than spatial neighborhood relations, leaving them semantically rich but spatially impoverished for 3D localization. We propose Voxel-enhanced Affordance detection (VoxAfford), which bypasses this bottleneck by injecting multi-scale geometric features from a frozen pre-trained 3D VQVAE encoder into the output tokens after generation. Each output token uses its affordance semantics as a query to retrieve relevant geometric patterns from its paired voxel scale via cross-attention, with a learned compatibility gate controlling the injection strength. The enhanced tokens are then aggregated into a spatially-aware affordance prompt through semantic-conditioned attention and propagated alongside per-point features to generate the final mask. Experiments on open-vocabulary affordance detection tasks show that VoxAfford achieves state-of-the-art performance with approximately an 8% improvement in mIoU, and real robot experiments confirm zero-shot transfer to novel objects.
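
The token-enhancement step above (an output token querying voxel features through cross-attention, scaled by a compatibility gate) can be illustrated with a toy single-head version in plain Python. This is a minimal sketch under stated assumptions, not VoxAfford's implementation: the function names, the residual form token + gate * attended, the sigmoid gate over the token/context product, and the omission of learned query/key/value projections are all hypothetical choices.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def gated_cross_attention(token: Vector, voxels: List[Vector], gate_weights: Vector) -> Vector:
    """Hypothetical single-head cross-attention with a sigmoid compatibility gate.

    token:        query vector (stands in for an MLLM output token)
    voxels:       key/value vectors (stand in for voxel features at one scale)
    gate_weights: hypothetical learned vector scoring token/context compatibility
    """
    d = len(token)
    # Scaled dot-product attention weights over the voxel features.
    weights = softmax([dot(token, v) / math.sqrt(d) for v in voxels])
    # Attention-weighted sum of voxel features (keys and values are shared here).
    attended = [sum(w * v[i] for w, v in zip(weights, voxels)) for i in range(d)]
    # Gate in (0, 1) computed from the elementwise product of token and context.
    gate = 1.0 / (1.0 + math.exp(-dot(gate_weights, [t * a for t, a in zip(token, attended)])))
    # Residual injection: the gate controls how much geometry is added to the token.
    return [t + gate * a for t, a in zip(token, attended)]
```

Setting gate_weights to zeros makes the gate exactly 0.5, a convenient sanity check; in this sketch a gate near 0 leaves the token unchanged and a gate near 1 adds the full attended context.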

preprint · 2025 · arXiv

Algebraic skin effect in two-dimensional non-Hermitian metamaterials

Metamaterials have unlocked unprecedented control over light by leveraging novel mechanisms to expand their functionality. Non-Hermitian physics further enhances the tunability of non-Hermitian metamaterials (NHMs) through phenomena such as the non-Hermitian skin effect (NHSE), enabling applications like directional amplification. The higher-dimensional NHSE manifests unique effects, including the algebraic skin effect (ASE), which features power-law decay instead of exponential localization, allowing for quasi-long-range interactions. In this work, we establish clear criteria for achieving ASE in two-dimensional reciprocal NHMs with anisotropic and complex dielectric tensors. By demonstrating ASE numerically and theoretically through mismatched optical axes and geometric structures, we reveal that ASE is governed by a generalized Fermi surface whose dimensionality exceeds that of the Fermi surface. We further propose and validate a realistic, experimentally accessible photonic crystal design for ASE. Our recipe for ASE provides a versatile pathway toward broader generalizations, including three-dimensional structures, synthetic dimensions, and other classical wave systems, paving the way for advancements in non-Hermitian photonics.

preprint · 2022 · arXiv

Closed-Loop Data Transcription to an LDR via Minimaxing Rate Reduction

This work proposes a new computational framework for learning a structured generative model for real-world datasets. In particular, we propose to learn a closed-loop transcription between a multi-class, multi-dimensional data distribution and a linear discriminative representation (LDR) in the feature space that consists of multiple independent multi-dimensional linear subspaces. We argue that the optimal encoding and decoding mappings can be formulated as the equilibrium point of a two-player minimax game between the encoder and decoder. A natural utility function for this game is the so-called rate reduction, a simple information-theoretic measure of distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback in control systems and avoids the expensive evaluation and minimization of approximated distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of auto-encoding and GANs and naturally extends them to learning a representation that is both discriminative and generative for multi-class, multi-dimensional real-world data. Our extensive experiments on many benchmark image datasets demonstrate the tremendous potential of this closed-loop formulation: under fair comparison, the visual quality of the learned decoder and the classification performance of the encoder are competitive with, and often better than, existing methods based on GANs, VAEs, or a combination of both. Unlike existing generative models, the features learned for the multiple classes are structured: different classes are explicitly mapped onto corresponding independent principal subspaces in the feature space. Source code can be found at https://github.com/Delay-Xili/LDR.
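
The rate-reduction utility mentioned above can be made concrete. The sketch below uses the coding-rate expressions from the maximal coding rate reduction (MCR²) line of work, R(Z, eps) = 1/2 logdet(I + d/(m eps^2) Z Z^T) for m features of dimension d, and Delta R = R(Z) - sum_j (m_j/m) R(Z_j). The exact constants used in the LDR paper, and the function names here, should be treated as assumptions. It is dependency-free Python with a small Cholesky routine for the log-determinant.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def logdet_spd(a: Matrix) -> float:
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    # det(A) = prod(L_ii)^2, so log det(A) = 2 * sum(log L_ii).
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def coding_rate(features: Sequence[Sequence[float]], eps: float) -> float:
    """R(Z, eps) = 1/2 logdet(I + d/(m eps^2) Z Z^T) for m feature vectors of dim d."""
    m, d = len(features), len(features[0])
    scale = d / (m * eps ** 2)
    gram = [[(1.0 if i == j else 0.0) + scale * sum(z[i] * z[j] for z in features)
             for j in range(d)] for i in range(d)]
    return 0.5 * logdet_spd(gram)


def rate_reduction(features: Sequence[Sequence[float]], labels: Sequence[int],
                   eps: float = 0.5) -> float:
    """Delta R = R(Z) - sum_j (m_j / m) * R(Z_j): whole-set rate minus class-weighted rates."""
    m = len(features)
    per_class = 0.0
    for c in set(labels):
        group = [z for z, y in zip(features, labels) if y == c]
        per_class += len(group) / m * coding_rate(group, eps)
    return coding_rate(features, eps) - per_class
```

Two classes spanning orthogonal directions give a positive Delta R, while two classes that share a single direction give Delta R = 0, matching the intuition that rate reduction rewards classes occupying different subspaces.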

preprint · 2022 · arXiv

DuMLP-Pin: A Dual-MLP-dot-product Permutation-invariant Network for Set Feature Extraction

Existing permutation-invariant methods can be divided into two categories according to the aggregation scope, i.e., global aggregation and local aggregation. Although global aggregation methods, e.g., PointNet and Deep Sets, have simpler structures, their performance is poorer than that of local aggregation methods such as PointNet++ and Point Transformer. It remains an open problem whether there exists a global aggregation method with a simple structure, competitive performance, and far fewer parameters. In this paper, we propose a novel global aggregation permutation-invariant network based on a dual MLP dot-product, called DuMLP-Pin, which can be employed to extract features from set inputs, including unordered or unstructured pixel, attribute, and point cloud data sets. We rigorously prove that any permutation-invariant function implemented by DuMLP-Pin can be decomposed into two or more permutation-equivariant ones in a dot-product way when the cardinality of the given input set is greater than a threshold. We also show that DuMLP-Pin can be viewed as Deep Sets with strong constraints under certain conditions. The performance of DuMLP-Pin is evaluated on several different tasks with diverse data sets. The experimental results demonstrate that DuMLP-Pin achieves the best results on the two classification problems for pixel sets and attribute sets. On both point cloud classification and part segmentation, the accuracy of DuMLP-Pin is very close to that of the best-performing local aggregation method so far, with only a 1-2% difference, while the number of required parameters is reduced by more than 85% in classification and 69% in segmentation. The code is publicly available at https://github.com/JaronTHU/DuMLP-Pin.
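
The permutation-invariance argument can be checked numerically with one reading of the dual dot-product construction: apply two per-element maps phi and psi to every set element, then contract over the set axis, giving Phi(X)^T Psi(X). This is a sketch, not the DuMLP-Pin architecture; the one-layer "MLPs", their weights, and the function names are hypothetical.

```python
import itertools
from typing import Callable, List, Sequence

Vector = List[float]


def make_layer(weights: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], Vector]:
    """Toy per-element 'MLP' (one linear map + ReLU) with hypothetical fixed weights."""
    def layer(x: Sequence[float]) -> Vector:
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return layer


def dual_dot_product(points, phi, psi) -> List[Vector]:
    """Return Phi(X)^T Psi(X), i.e. the sum over set elements of phi(x_i) psi(x_i)^T.

    phi and psi act on each element independently (permutation-equivariant);
    summing the outer products over the set axis makes the output permutation-invariant.
    """
    a_rows = [phi(x) for x in points]
    b_rows = [psi(x) for x in points]
    p, q = len(a_rows[0]), len(b_rows[0])
    return [[sum(a[i] * b[j] for a, b in zip(a_rows, b_rows)) for j in range(q)]
            for i in range(p)]


if __name__ == "__main__":
    phi = make_layer([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    psi = make_layer([[1.0, -1.0], [0.5, 2.0]])
    pts = [[1.0, 2.0], [3.0, -1.0], [0.0, 4.0]]
    ref = dual_dot_product(pts, phi, psi)
    # Every ordering of the same set should yield the same 3x2 feature matrix.
    print(all(dual_dot_product(list(p), phi, psi) == ref for p in itertools.permutations(pts)))
```

The output size (here 3 x 2) depends only on the widths of phi and psi, not on the set's cardinality, which is what lets such a layer act as a fixed-size global descriptor.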

preprint · 2022 · arXiv

MirrorAlign: A Super Lightweight Unsupervised Word Alignment Model via Cross-Lingual Contrastive Learning

Word alignment is essential for downstream cross-lingual language understanding and generation tasks. Recently, the performance of neural word alignment models has exceeded that of statistical models. However, they rely heavily on sophisticated translation models. In this study, we propose a super lightweight unsupervised word alignment model named MirrorAlign, which introduces bidirectional symmetric attention trained with a contrastive learning objective and employs an agreement loss to bind the attention maps, so that the alignments follow a mirror-like symmetry hypothesis. Experimental results on several public benchmarks demonstrate that our model achieves performance competitive with, if not better than, the state of the art in word alignment while significantly reducing training and decoding time on average. Further ablation analysis and case studies show the superiority of MirrorAlign. Notably, we consider our model a pioneering attempt to unify bilingual word embedding and word alignment. Encouragingly, our approach achieves a 16.4× speedup over GIZA++ and 50× parameter compression compared with Transformer-based alignment methods. We release our code to facilitate the community: https://github.com/moore3930/MirrorAlign.
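
A toy version of bidirectional attention with an agreement term may help make the mirror-symmetry idea concrete. In this sketch (not MirrorAlign's code), both directions are row-softmaxes of one shared dot-product score matrix, the agreement loss is the mean squared gap between the source-to-target map and the transpose of the target-to-source map, and an alignment is kept only when both directions exceed a threshold. The embeddings, the threshold, and the function names are hypothetical.

```python
import math
from typing import List, Set, Tuple

Matrix = List[List[float]]


def row_softmax(m: Matrix) -> Matrix:
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def bidirectional_attention(src: Matrix, tgt: Matrix) -> Tuple[Matrix, Matrix]:
    """Source-to-target and target-to-source attention from shared dot-product scores."""
    scores = [[sum(a * b for a, b in zip(s, t)) for t in tgt] for s in src]
    return row_softmax(scores), row_softmax(transpose(scores))


def agreement_loss(a_st: Matrix, a_ts: Matrix) -> float:
    """Mean squared gap between A_st and the transpose of A_ts (0 when perfectly mirrored)."""
    diffs = [(x - y) ** 2 for r1, r2 in zip(a_st, transpose(a_ts)) for x, y in zip(r1, r2)]
    return sum(diffs) / len(diffs)


def extract_alignments(a_st: Matrix, a_ts: Matrix, threshold: float = 0.5) -> Set[Tuple[int, int]]:
    """Keep (i, j) only when both directions put more than `threshold` mass on the pair."""
    return {(i, j) for i, row in enumerate(a_st) for j, p in enumerate(row)
            if p > threshold and a_ts[j][i] > threshold}
```

Requiring both directions to agree is a simple intersection heuristic; in training, the agreement loss would push the two maps toward that mirrored state rather than enforcing it at decoding time only.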

preprint · 2022 · arXiv

SuperLine3D: Self-supervised Line Segmentation and Description for LiDAR Point Cloud

Poles and building edges are frequently observed objects on urban roads and convey reliable hints for various computer vision tasks. To repeatably extract them as features and associate them across discrete LiDAR frames for registration, we propose the first learning-based feature segmentation and description model for 3D lines in LiDAR point clouds. To train our model without a time-consuming and tedious data labeling process, we first generate synthetic primitives for the basic appearance of target lines, and then build an iterative line auto-labeling process to gradually refine line labels on real LiDAR scans. Our segmentation model can extract lines under arbitrary scale perturbations, and we use shared EdgeConv encoder layers to train the segmentation and descriptor heads jointly. Based on the model, we build a highly available global registration module for point cloud registration that requires no initial transformation hints. Experiments demonstrate that our line-based registration method is highly competitive with state-of-the-art point-based approaches. Our code is available at https://github.com/zxrzju/SuperLine3D.git.

preprint · 2022 · arXiv

The Visual-Inertial-Dynamical Multirotor Dataset

Recently, the community has seen numerous datasets built for developing and testing state estimators. However, for some applications such as aerial transportation or search-and-rescue, contact forces or other disturbances must be perceived for robust planning and control, which is beyond the capacity of these datasets. This paper introduces a Visual-Inertial-Dynamical (VID) dataset that not only focuses on traditional six-degrees-of-freedom (6-DOF) pose estimation but also provides the dynamical characteristics of the flight platform for external force perception or dynamics-aided estimation. The VID dataset contains hardware-synchronized imagery and inertial measurements, with accurate ground-truth trajectories for evaluating common visual-inertial estimators. Moreover, the dataset highlights rotor speed and motor current measurements, control inputs, and ground-truth 6-axis force data for evaluating external force estimation. To the best of our knowledge, the VID dataset is the first public real-world dataset containing both visual-inertial and complete dynamical information for pose and external force evaluation. The dataset (https://github.com/ZJU-FAST-Lab/VID-Dataset) and related files (https://github.com/ZJU-FAST-Lab/VID-Flight-Platform) are open-sourced.

preprint · 2022 · arXiv

Translation Invariant Global Estimation of Heading Angle Using Sinogram of LiDAR Point Cloud

Global point cloud registration is an essential module for localization, and its main difficulty lies in estimating the rotation globally without an initial value. With the aid of gravity alignment, point cloud registration can be reduced to 4-DoF, in which only the heading angle is required for rotation estimation. In this paper, we propose a fast and accurate global heading angle estimation method for gravity-aligned point clouds. Our key idea is to generate a translation-invariant representation based on the Radon transform, which allows us to solve for the decoupled heading angle globally with circular cross-correlation. In addition, for heading angle estimation between point clouds with different distributions, we implement this heading angle estimator as a differentiable module to train a feature extraction network end-to-end. The experimental results validate the effectiveness of the proposed method in heading angle estimation and show better performance than other methods.
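
The final step, recovering a heading offset with circular cross-correlation, can be sketched on its own. The snippet assumes the two clouds have already been reduced to periodic 1-D signatures sampled over heading bins (for example, one value per Radon projection angle); the signatures, the bin size, and the function names are hypothetical, and neither the Radon transform nor the learned features are shown.

```python
from typing import List, Sequence


def circular_cross_correlation(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """c[k] = sum_i a[i] * b[(i + k) % n] for every circular shift k (brute force, O(n^2))."""
    n = len(a)
    return [sum(a[i] * b[(i + k) % n] for i in range(n)) for k in range(n)]


def estimate_heading(a: Sequence[float], b: Sequence[float], bin_deg: float) -> float:
    """Heading offset in degrees: the circular shift of b that best matches a."""
    corr = circular_cross_correlation(a, b)
    best = max(range(len(corr)), key=corr.__getitem__)
    return best * bin_deg


if __name__ == "__main__":
    sig = [0.0, 1.0, 3.0, 7.0, 2.0, 0.0, 0.0, 0.0]  # hypothetical 8-bin signature
    rotated = sig[-3:] + sig[:-3]                   # the same signature shifted by 3 bins
    print(estimate_heading(sig, rotated, bin_deg=45.0))
```

The brute-force loop costs O(n²) for n bins; an FFT-based correlation would reduce this to O(n log n). Either way, the estimate's resolution is one angular bin.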

preprint · 2021 · arXiv

A Latent Survival Analysis Enabled Simulation Platform For Nursing Home Staffing Strategy Evaluation

Nursing homes are critical facilities for caring for frail older adults with round-the-clock formal care and personal assistance. To ensure quality care for nursing home residents, an adequate staffing level is of great importance. Current nursing home staffing practice is mainly based on experience and regulation. The objective of this paper is to investigate the viability of experience-based and regulation-based strategies, as well as alternative staffing strategies that minimize labor costs subject to the heterogeneous service demand of nursing home residents under various census scenarios. We propose a data-driven analysis framework to model the heterogeneous service demand of nursing home residents and to identify appropriate staffing strategies by combining survival modeling, computer simulation techniques, and domain knowledge. Specifically, we develop an agent-based simulation tool consisting of four main modules: an individual length-of-stay predictor, an individual daily staff time generator, a facility-level staffing strategy evaluator, and a graphical user interface. We use real nursing home data to validate the proposed model and demonstrate that the identified staffing strategy significantly reduces the total labor cost of certified nursing assistants compared to the benchmark strategies. Additionally, the proposed length-of-stay prediction model, which considers multiple discharge dispositions, exhibits superior accuracy and yields better staffing decisions than models without this consideration. Further, we construct different census scenarios of nursing home residents to demonstrate how the proposed framework can help nursing home administrators adjust staffing decisions in various realistic settings.
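
The length-of-stay predictor with multiple discharge dispositions and the daily staff-time generator can be illustrated by a small competing-risks simulation. This sketch is not the paper's model: it assumes constant (exponential) daily hazards for three hypothetical dispositions, a fixed cohort with no new admissions, and hypothetical per-resident daily care minutes. All names and numbers are placeholders.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical constant daily hazard rates for competing discharge dispositions.
HAZARDS: Dict[str, float] = {"home": 0.02, "hospital": 0.01, "death": 0.005}


def sample_stay(rng: random.Random, hazards: Dict[str, float]) -> Tuple[float, str]:
    """Competing risks: draw a latent time per disposition; the earliest one ends the stay."""
    draws = {k: rng.expovariate(rate) for k, rate in hazards.items()}
    disposition = min(draws, key=draws.get)
    return draws[disposition], disposition


def simulate_cna_hours(minutes_per_day: List[float], days: int, seed: int = 0) -> List[float]:
    """Daily CNA hours for a cohort admitted on day 0 (no new admissions: a simplification).

    minutes_per_day[i] is resident i's hypothetical daily CNA care time while in residence.
    """
    rng = random.Random(seed)
    stays = [sample_stay(rng, HAZARDS)[0] for _ in minutes_per_day]
    return [sum(m for m, los in zip(minutes_per_day, stays) if los > day) / 60.0
            for day in range(days)]
```

With constant hazards, the probability that a given disposition occurs first is its hazard divided by the total hazard (about 0.57 for "home" here), which gives a quick check on the sampler.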

preprint · 2021 · arXiv

An Analytics-based Decision Support System for Resource Planning under Heterogeneous Service Demand of Nursing Home Residents

Nursing homes (NHs) are critical healthcare infrastructures for caring for frail older adults with 24/7 formal care and personal assistance. Adequate NH resource planning is essential to ensure the desired quality of care and resident outcomes, yet it is challenging. The challenge lies in the heterogeneous service demand of NH residents, due to varied individual characteristics, diverse lengths of stay with multiple competing discharge dispositions, and diverse service needs. The existing healthcare staffing literature often assumes a homogeneous population of NH residents and neglects the complexity of service demand heterogeneity. This work proposes an analytics-based modeling framework with a user-friendly decision support platform for NH resource planning. The proposed framework characterizes the heterogeneous service demand of NH residents through a novel integration of advanced statistical modeling, computer simulation, and optimization techniques. We further provide a case study using real data from our industrial collaborator to demonstrate the effectiveness and superior performance of the proposed work. The impacts of service utilization heterogeneity and service need heterogeneity on resource planning decisions are also investigated.

preprint · 2020 · arXiv

A Lightweight and Accurate Localization Algorithm Using Multiple Inertial Measurement Units

This paper proposes a novel inertial-aided localization approach that fuses information from multiple inertial measurement units (IMUs) and exteroceptive sensors. An IMU is a low-cost motion sensor that provides measurements of the angular velocity and gravity-compensated linear acceleration of a moving platform, and it is widely used in modern localization systems. To date, most existing inertial-aided localization methods exploit only a single IMU. While single-IMU localization yields acceptable accuracy and robustness for different use cases, the overall performance can be further improved by using multiple IMUs. To this end, we propose a lightweight and accurate algorithm for fusing measurements from multiple IMUs and exteroceptive sensors, which obtains a noticeable performance gain without incurring additional computational cost. To achieve this, we first probabilistically map measurements from all IMUs onto a virtual IMU. This step is performed by stochastic estimation with least-squares estimators and probabilistic marginalization of inter-IMU rotational accelerations. Subsequently, we derive the propagation model for both the state and the error state of the virtual IMU, which enables the use of classical filter-based or optimization-based sensor fusion algorithms for localization. Finally, results from both simulation and real-world tests demonstrate that the proposed algorithm outperforms competing algorithms by noticeable margins.
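
The mapping onto a virtual IMU rests on rigid-body kinematics: the acceleration at lever arm r is a = a0 + alpha x r + omega x (omega x r), where a0 is the acceleration at the virtual IMU's origin, omega the angular velocity, and alpha the angular acceleration. The sketch below is a deliberate simplification of the paper's least-squares and marginalization step: it assumes axis-aligned IMUs with equal noise, known lever arms, and a symmetric layout with sum(r_i) = 0, so the unknown alpha x r_i terms cancel in the average. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = List[float]


def cross(a: Sequence[float], b: Sequence[float]) -> Vec3:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def mean(vectors: Sequence[Sequence[float]]) -> Vec3:
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(3)]


def virtual_imu(gyros: Sequence[Sequence[float]], accels: Sequence[Sequence[float]],
                lever_arms: Sequence[Sequence[float]]) -> Tuple[Vec3, Vec3]:
    """Fuse N axis-aligned IMUs on a rigid body into one virtual IMU at the origin.

    Assumes equal noise, known lever arms r_i, and sum(r_i) = 0, so the unknown
    angular-acceleration term alpha x r_i cancels when the IMUs are averaged.
    """
    omega = mean(gyros)  # every point of a rigid body shares the same angular velocity
    compensated = []
    for a_i, r_i in zip(accels, lever_arms):
        centripetal = cross(omega, cross(omega, r_i))  # omega x (omega x r_i)
        compensated.append([a - c for a, c in zip(a_i, centripetal)])
    return omega, mean(compensated)
```

Without the symmetric-layout assumption, alpha would have to be estimated jointly with a0 (for example by least squares over all IMUs), which is closer to what the abstract describes.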

preprint · 2020 · arXiv

Accelerating Neural Network Inference by Overflow Aware Quantization

The inherently heavy computation of deep neural networks hinders their widespread application. A widely used method for accelerating model inference is quantization, which replaces the input operands of a network with fixed-point values. The majority of the computational cost then lies in integer matrix multiply-accumulate operations. However, a high-bit accumulator partially wastes computation, while a low-bit one typically suffers from numerical overflow. To address this problem, we propose an overflow-aware quantization method that designs a trainable adaptive fixed-point representation to optimize the number of bits for each input tensor while preventing numeric overflow during computation. With the proposed method, we are able to fully utilize the computing power to minimize the quantization loss and obtain optimized inference performance. To verify the effectiveness of our method, we evaluate it on image classification, object detection, and semantic segmentation using ImageNet, Pascal VOC, and COCO, respectively. Experimental results demonstrate that the proposed method achieves performance comparable to state-of-the-art quantization methods while accelerating inference by about 2 times.
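
One way to see the accumulator trade-off is a worst-case bound: with integer weights w_q and inputs quantized with f fractional bits, |sum w_q * x_q| <= sum |w_q| * max |x_q|. The sketch picks the largest f that keeps both the input and that bound in range. It is a static stand-in, not the paper's trainable adaptive representation; the function names, the 8-bit input and 16-bit accumulator defaults, and the exhaustive search over f are assumptions.

```python
from typing import Sequence


def quantize(x: float, frac_bits: int, bits: int = 8) -> int:
    """Signed fixed-point value round(x * 2**frac_bits), saturated to the `bits`-bit range."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(lo, min(hi, round(x * 2 ** frac_bits)))


def choose_frac_bits(weights_q: Sequence[int], max_abs_input: float,
                     acc_bits: int = 16, bits: int = 8, max_frac: int = 15) -> int:
    """Largest fractional bit count f such that
    (1) the largest input still fits in `bits` bits without saturating, and
    (2) the worst-case dot product sum |w_q| * |x_q| fits in the accumulator.
    Returns -1 if no f in [0, max_frac] satisfies both (a hypothetical convention).
    """
    in_limit = 2 ** (bits - 1) - 1
    acc_limit = 2 ** (acc_bits - 1) - 1
    l1 = sum(abs(w) for w in weights_q)
    best = -1
    for f in range(max_frac + 1):
        x_max = round(max_abs_input * 2 ** f)
        if x_max <= in_limit and l1 * x_max <= acc_limit:
            best = f
    return best
```

For 20 weights of magnitude 100 and inputs bounded by 1.0, a 16-bit accumulator allows only 4 fractional bits, while with a 32-bit accumulator the 8-bit input range becomes the binding limit (6 fractional bits). That gap is the precision a wider accumulator buys at the cost of wider arithmetic.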

preprint · 2020 · arXiv

From A Glance to "Gotcha": Interactive Facial Image Retrieval with Progressive Relevance Feedback

Facial image retrieval plays a significant role in forensic investigations, where an untrained witness tries to identify a suspect from a massive pool of images. However, because human facial appearance is difficult to describe verbally and directly, people naturally tend to describe a face by referring to well-known existing images and comparing specific facial areas with them, and it is also challenging to provide a complete comparison in each round. Therefore, we propose an end-to-end framework that retrieves facial images with relevance feedback progressively provided by the witness, exploiting history information across multiple rounds in an interactive and iterative approach to retrieving the mental image. Requiring no extra annotations, our model can be applied at the cost of a little response effort. We experiment on CelebA, evaluate performance by ranking percentile, and achieve 99% under the best setting. Since, to the best of our knowledge, this topic remains little explored, we hope our work can serve as a stepping stone for further research.
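
Progressive relevance feedback can be illustrated with the classic Rocchio update, a swapped-in textbook technique and not the paper's end-to-end model: the query embedding moves toward items the witness marks as similar and away from dissimilar ones, and the target's ranking percentile is recomputed. The embeddings, the alpha/beta/gamma defaults, and the percentile definition (the share of other gallery items ranked below the target) are assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rocchio(query: Sequence[float], liked: Sequence[Sequence[float]],
            disliked: Sequence[Sequence[float]],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.25) -> Vector:
    """Rocchio update: move toward the centroid of liked items, away from disliked ones."""
    dim = len(query)

    def centroid(vectors):
        if not vectors:
            return [0.0] * dim
        return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

    pos, neg = centroid(liked), centroid(disliked)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


def ranking_percentile(query: Sequence[float], gallery: Sequence[Sequence[float]],
                       target_idx: int) -> float:
    """Share (in %) of the other gallery items ranked below the target; 100 means ranked first."""
    scores = [cosine(query, g) for g in gallery]
    rank = 1 + sum(1 for i, s in enumerate(scores) if i != target_idx and s > scores[target_idx])
    return 100.0 * (len(gallery) - rank) / (len(gallery) - 1)
```

Each feedback round only needs the witness to mark a few gallery items as closer or farther, which mirrors the low annotation burden the abstract emphasizes.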

preprint · 2020 · arXiv

Interpretable Foreground Object Search As Knowledge Distillation

This paper proposes a knowledge distillation method for foreground object search (FoS). Given a background and a rectangle specifying the foreground location and scale, FoS retrieves compatible foregrounds in a certain category for later image composition. Foregrounds within the same category can be grouped into a small number of patterns. Instances within each pattern are interchangeably compatible with any query input; these instances are referred to as interchangeable foregrounds. We first present a pipeline to build a pattern-level FoS dataset containing labels of interchangeable foregrounds. Following this pipeline, we then establish a benchmark dataset for further training and testing. As for the proposed method, we first train a foreground encoder to learn representations of interchangeable foregrounds. We then train a query encoder to learn query-foreground compatibility following a knowledge distillation framework, which aims to transfer knowledge from interchangeable foregrounds to supervise the representation learning of compatibility. The query feature representation is projected into the same latent space as the interchangeable foregrounds, enabling very efficient and interpretable instance-level search. Furthermore, pattern-level search can retrieve more controllable, reasonable, and diverse foregrounds. The proposed method outperforms the previous state of the art by 10.42% in absolute difference and 24.06% in relative improvement, as measured by mean average precision (mAP). Extensive experimental results also demonstrate its efficacy from various aspects. The benchmark dataset and code will be released shortly.
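
Since the headline result is reported in mAP, a small reference implementation of non-interpolated average precision may help when reading the numbers. This is one common definition, assumed rather than taken from the paper; benchmarks differ in details such as interpolation and cut-off depth, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def average_precision(ranked_relevance: Sequence[bool], num_relevant: int) -> float:
    """Non-interpolated AP: mean of precision@k over the ranks k that hold a relevant item."""
    hits, total = 0, 0.0
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / k
    return total / num_relevant if num_relevant else 0.0


def mean_average_precision(runs: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """mAP over queries; each run is (ranked relevance flags, number of relevant items)."""
    return sum(average_precision(flags, n) for flags, n in runs) / len(runs)
```

Dividing by the total number of relevant items, rather than by the number retrieved, penalizes a system that never surfaces some compatible foregrounds at all.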

preprint · 2020 · arXiv

LE-HGR: A Lightweight and Efficient RGB-based Online Gesture Recognition Network for Embedded AR Devices

Online hand gesture recognition (HGR) techniques are essential in augmented reality (AR) applications for enabling natural human-computer interaction and communication. In recent years, the consumer market for low-cost AR devices has been growing rapidly, while technology maturity in this domain is still limited. These devices typically have low prices, limited memory, and resource-constrained computational units, which makes online HGR a challenging problem. To tackle this problem, we propose a lightweight and computationally efficient HGR framework, namely LE-HGR, to enable real-time gesture recognition on embedded devices with low computing power. We also show that the proposed method is highly accurate and robust, reaching high-end performance in a variety of complicated interaction environments. To achieve our goal, we first propose a cascaded multi-task convolutional neural network (CNN) to simultaneously predict hand detection probabilities and regress hand keypoint locations online. We show that, with the proposed cascaded architecture, false-positive estimates can be largely eliminated. Additionally, an associated mapping approach is introduced to track the hand trace via the predicted locations, which addresses interference from multiple hands. Subsequently, we propose a trace sequence neural network (TraceSeqNN) to recognize the hand gesture by exploiting the motion features of the tracked trace. Finally, we provide a variety of experimental results showing that the proposed framework achieves state-of-the-art accuracy with significantly reduced computational cost, which are the key properties for enabling real-time applications on low-cost commercial devices such as mobile devices and AR/VR headsets.

preprint · 2020 · arXiv

MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships

Monocular 3D object detection is an essential component of autonomous driving, yet it is challenging to solve, especially for occluded samples that are only partially visible. Most detectors consider each 3D object as an independent training target, inevitably resulting in a lack of useful information for occluded samples. To this end, we propose a novel method to improve monocular 3D object detection by considering the relationship between paired samples. This allows us to encode spatial constraints for partially occluded objects from their adjacent neighbors. Specifically, the proposed detector computes uncertainty-aware predictions for object locations and for 3D distances between adjacent object pairs, which are subsequently jointly optimized by nonlinear least squares. Finally, the one-stage uncertainty-aware prediction structure and the post-optimization module are carefully integrated to ensure run-time efficiency. Experiments demonstrate that our method yields the best performance on the KITTI 3D detection benchmark, outperforming state-of-the-art competitors by wide margins, especially for hard samples.
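
The joint optimization of object locations and pairwise distances can be illustrated in a stripped-down form. The paper optimizes 3D locations with nonlinear least squares; this sketch assumes one dimension, one object pair, and a linear offset constraint p2 - p1 = d, so the uncertainty-weighted (inverse-variance) cost has a closed-form 2 x 2 solution. All names and numbers are hypothetical.

```python
from typing import Tuple


def fuse_pair(z1: float, s1: float, z2: float, s2: float,
              d: float, sd: float) -> Tuple[float, float]:
    """Minimize (p1-z1)^2/s1^2 + (p2-z2)^2/s2^2 + (p2-p1-d)^2/sd^2 over p1, p2.

    z1, z2: per-object position predictions with standard deviations s1, s2.
    d, sd:  predicted pairwise offset p2 - p1 and its standard deviation.
    """
    w1, w2, wd = 1.0 / s1 ** 2, 1.0 / s2 ** 2, 1.0 / sd ** 2
    # Normal equations A p = b of the quadratic cost (A is symmetric).
    a11, a12, a22 = w1 + wd, -wd, w2 + wd
    b1, b2 = w1 * z1 - wd * d, w2 * z2 + wd * d
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the 2 x 2 system.
    p1 = (b1 * a22 - a12 * b2) / det
    p2 = (a11 * b2 - a12 * b1) / det
    return p1, p2
```

With a confident first object, a confident pairwise offset, and a very uncertain (for example, occluded) second object, the solution places the second object near p1 + d rather than at its own noisy estimate, which is the effect the pairwise constraint is meant to provide.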

preprint · 2020 · arXiv

SEKD: Self-Evolving Keypoint Detection and Description

Researchers have attempted to use deep neural networks (DNNs) to learn novel local features from images, inspired by their recent successes on a variety of vision tasks. However, existing DNN-based algorithms have not achieved comparably remarkable progress, which can be partly attributed to insufficient use of the interactive characteristics between the local feature detector and descriptor. To alleviate these difficulties, we emphasize two desired properties, i.e., repeatability and reliability, to simultaneously summarize the inherent and interactive characteristics of the local feature detector and descriptor. Guided by these properties, a self-supervised framework, namely self-evolving keypoint detection and description (SEKD), is proposed to learn an advanced local feature model from unlabeled natural images. Additionally, to provide performance guarantees, novel training strategies have been carefully designed to minimize the gap between the learned feature and its properties. We benchmark the proposed method on homography estimation, relative pose estimation, and structure-from-motion tasks. Extensive experimental results demonstrate that the proposed method outperforms popular hand-crafted and DNN-based methods by remarkable margins. Ablation studies also verify the effectiveness of each critical training strategy. We will release our code along with the trained model publicly.
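
Repeatability, one of the two properties SEKD emphasizes, is often measured on image pairs related by a known homography. The sketch uses one common convention (the share of keypoints from image A that, after warping by H, fall within eps pixels of some keypoint in image B); the normalization, the 3-pixel default, and the function names are assumptions rather than SEKD's exact protocol.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def apply_homography(h: Sequence[Sequence[float]], p: Point) -> Point:
    """Map (x, y) through a 3x3 homography using homogeneous coordinates."""
    x, y = p
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


def repeatability(kps_a: List[Point], kps_b: List[Point],
                  h: Sequence[Sequence[float]], eps: float = 3.0) -> float:
    """Fraction of A's keypoints that, warped into B by h, land within eps px of a B keypoint."""
    if not kps_a:
        return 0.0
    hits = 0
    for p in kps_a:
        q = apply_homography(h, p)
        if any(math.dist(q, b) <= eps for b in kps_b):
            hits += 1
    return hits / len(kps_a)
```

Protocols often also restrict both keypoint sets to the region visible in both images and normalize by the smaller set; those refinements are omitted here for brevity.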

preprint · 2020 · arXiv

Semi-supervised deep learning for high-dimensional uncertainty quantification

Conventional uncertainty quantification methods usually lack the capability to deal with high-dimensional problems due to the curse of dimensionality. This paper presents a semi-supervised learning framework for dimension reduction and reliability analysis. An autoencoder is first adopted to map the high-dimensional space into a low-dimensional latent space, which contains a distinguishable failure surface. Then a deep feedforward neural network (DFN) is used to learn the mapping relationship and reconstruct the latent space, while Gaussian process (GP) modeling is used to build a surrogate model of the transformed limit state function. During DFN training, the discrepancy between the actual and reconstructed latent spaces is minimized through semi-supervised learning to ensure accuracy. Both labeled and unlabeled samples are used to define the DFN loss function. An evolutionary algorithm is adopted to train the DFN, and Monte Carlo simulation is then used for uncertainty quantification and reliability analysis based on the proposed framework. The effectiveness is demonstrated through a mathematical example.
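
The final Monte Carlo step estimates the failure probability P[g(X) <= 0], where g is the limit state function. In the paper's framework g would be the GP surrogate evaluated in the latent space; in this sketch it is any Python callable, the inputs are assumed standard normal, and the function name and sample size are hypothetical.

```python
import random
from typing import Callable, Sequence


def failure_probability(limit_state: Callable[[Sequence[float]], float], dim: int,
                        n_samples: int = 200_000, seed: int = 0) -> float:
    """Crude Monte Carlo estimate of P[g(X) <= 0] for X ~ N(0, I_dim)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if limit_state(x) <= 0.0:  # g <= 0 marks the failure domain
            failures += 1
    return failures / n_samples
```

For g(x) = 3 - x1 the exact answer is 1 - Φ(3) ≈ 0.00135, so roughly 270 of 200,000 samples fail; the small count illustrates why crude Monte Carlo needs a cheap surrogate for g when failures are rare.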

preprint · 2020 · arXiv

Sequential Selection for Accelerated Life Testing via Approximate Bayesian Inference

Accelerated life testing (ALT) is typically used to assess the reliability of a material's lifetime under desired stress levels. Recent advances in material engineering have made a variety of material alternatives readily available. To identify the most reliable material setting with an efficient experimental design, a sequential test planning strategy is preferred. To guarantee a tractable statistical mechanism for information collection and updating, we develop explicit model parameter update formulas via approximate Bayesian inference. Theoretical results show that our explicit update formulas give consistent parameter estimates. A simulation study and a case study show that the proposed sequential selection approach can significantly improve the probability of identifying the material alternative with the best reliability performance compared with other design approaches.
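
Explicit parameter updates are what make sequential selection tractable. As a simplified stand-in (not the paper's approximate Bayesian inference), the sketch uses the exact conjugate case: exponential lifetimes with a Gamma prior on the failure rate, where each failure adds 1 to the shape and every tested unit, failed or right-censored, adds its time on test to the rate. It ignores stress levels and acceleration models; the class, the function names, and the prior values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GammaPosterior:
    shape: float  # prior shape + number of observed failures
    rate: float   # prior rate + total time on test

    def update(self, time: float, failed: bool) -> None:
        """Exponential lifetime, Gamma prior: a failure adds 1 to the shape; every unit adds its time."""
        if failed:
            self.shape += 1.0
        self.rate += time

    def mean_failure_rate(self) -> float:
        """Posterior mean of the failure rate lambda."""
        return self.shape / self.rate


def most_reliable(posteriors: Dict[str, GammaPosterior]) -> str:
    """Material with the lowest posterior mean failure rate."""
    return min(posteriors, key=lambda k: posteriors[k].mean_failure_rate())
```

Because each update is a pair of additions, the posterior can be refreshed after every test and used to decide which material to test next.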

preprint · 2020 · arXiv

Studying Politeness across Cultures Using English Twitter and Mandarin Weibo

Modeling politeness across cultures helps to improve intercultural communication by uncovering what is considered appropriate and polite. We study the linguistic features associated with politeness across US English and Mandarin Chinese. First, we annotate 5,300 Twitter posts from the US and 5,300 Sina Weibo posts from China for politeness scores. Next, we develop an English and Chinese politeness feature set, PoliteLex. Combining it with validated psycholinguistic dictionaries, we then study the correlations between linguistic features and perceived politeness across cultures. We find that on Mandarin Weibo, future-focused conversations, identifying with a group affiliation, and gratitude are considered more polite than on English Twitter. Death-related taboo topics, lack of or poor choice of pronouns, and informal language are associated with higher impoliteness on Mandarin Weibo than on English Twitter. Finally, we build language-based machine learning models to predict politeness with an F1 score of 0.886 on Mandarin Weibo and 0.774 on English Twitter.

preprint · 2019 · arXiv

Sem-LSD: A Learning-based Semantic Line Segment Detector

In this paper, we introduce a new type of line-shaped image representation, named semantic line segment (Sem-LS), and focus on solving its detection problem. Sem-LS contains high-level semantics and is a compact scene representation in which only visually salient line segments with stable semantics are preserved. Thanks to its high-level semantics, Sem-LS is more robust in cluttered environments than existing line-shaped representations. The compactness of Sem-LS facilitates its use in large-scale applications, such as city-scale SLAM (simultaneous localization and mapping) and LCD (loop closure detection). Sem-LS detection is a challenging task because its appearance differs significantly from existing learning-based image representations such as wireframes and objects. For further investigation, we first label Sem-LS on two well-known datasets, KITTI and KAIST URBAN, as new benchmarks. Then, we propose a learning-based Sem-LS detector (Sem-LSD) and devise new modules and metrics to address the unique challenges in Sem-LS detection. Experimental results show both the efficacy and efficiency of Sem-LSD. Finally, the effectiveness of the proposed Sem-LS is supported by two experiments on detector repeatability and a city-scale LCD problem. Labeled datasets and code will be released shortly.