Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
20works
0followers
20topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

20 published item(s)

preprint2026arXiv

AGPO: Asymmetric Group Policy Optimization for Verifiable Reasoning and Search Ads Relevance at JD

Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated notable success in enhancing the reasoning performance of large language models (LLMs). However, recent studies reveal that while current RLVR methods improve sampling efficiency towards correct paths, they do not elicit fundamentally new reasoning patterns. Instead, the reasoning capability boundary of trained models often narrows compared to their base models, with base models achieving higher coverage at large sample sizes. In this work, we propose Asymmetric Group Policy Optimization (AGPO) to counteract this boundary shrinkage. AGPO adopts a negative-dominant reinforcement strategy to suppress incorrect reasoning paths, maintaining the base model's exploration capacity. For positive reinforcement, AGPO adopts a group advantage mechanism, which scales positive updates based on intra-group variance, allowing the model to focus on rare correct paths while suppressing updates from trivial paths. Our experiments on five mathematical benchmarks demonstrate that AGPO achieves state-of-the-art accuracy while consistently improving pass@$k$ performance at scale. In a large-scale industrial application for search ads relevance optimization, AGPO effectively enhances the quality of the data annotation, leading to substantial performance gains in downstream student models.

preprint2026arXiv

MetaToolAgent: Towards Generalizable Tool Usage in LLMs through Meta-Learning

Tool learning is increasingly important for large language models (LLMs) to effectively coordinate and utilize a diverse set of tools in order to solve complex real-world tasks. By selecting and integrating appropriate tools, LLMs extend their capabilities beyond pure language understanding to perform specialized functions. However, existing methods for tool selection often focus on limited tool sets and struggle to generalize to novel tools encountered in practical deployments. To address these challenges, we introduce a comprehensive dataset spanning 7 domains, containing 155 tools and 9,377 question-answer pairs, which simulates realistic integration scenarios. Additionally, we propose MetaToolAgent (MTA), a meta-learning approach designed to improve cross-tool generalization. Experimental results show that MTA significantly outperforms baseline methods on unseen tools, demonstrating its promise for building flexible and scalable systems that require dynamic tool coordination.

preprint2026arXiv

Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely throughout optimization. In this work, we investigate the necessity of such dense optimization by analyzing the structure of token-aligned gradients in ALMs. We find that gradient energy is highly non-uniform across audio tokens, indicating that only a small subset of token-aligned audio regions dominates the optimization signal. Motivated by this observation, we propose Token-Aware Gradient Optimization (TAGO), which enables sparse jailbreak optimization by retaining only waveform gradients aligned with audio tokens that have high gradient energy, while masking the remaining gradients at each iteration. Across three ALMs, TAGO outperforms baselines, and substantial sparsification preserves strong attack success rates (e.g. on Qwen3-Omni, $\mathrm{ASR}_{l}$ remains at 86% with a token retention ratio of 0.25, compared to 87% with full token retention). These results demonstrate that dense waveform updates are largely redundant, and we advocate that future audio jailbreak and safety alignment research should further leverage this heterogeneous token-level gradient structure.

preprint2025arXiv

Spatio-Temporal Representation Decoupling and Enhancement for Federated Instrument Segmentation in Surgical Videos

Surgical instrument segmentation under Federated Learning (FL) is a promising direction, which enables multiple surgical sites to collaboratively train the model without centralizing datasets. However, there exist very limited FL works in surgical data science, and FL methods for other modalities do not consider inherent characteristics in surgical domain: i) different scenarios show diverse anatomical backgrounds while highly similar instrument representation; ii) there exist surgical simulators which promote large-scale synthetic data generation with minimal efforts. In this paper, we propose a novel Personalized FL scheme, Spatio-Temporal Representation Decoupling and Enhancement (FedST), which wisely leverages surgical domain knowledge during both local-site and global-server training to boost segmentation. Concretely, our model embraces a Representation Separation and Cooperation (RSC) mechanism in local-site training, which decouples the query embedding layer to be trained privately, to encode respective backgrounds. Meanwhile, other parameters are optimized globally to capture the consistent representations of instruments, including the temporal layer to capture similar motion patterns. A textual-guided channel selection is further designed to highlight site-specific features, facilitating model adapta tion to each site. Moreover, in global-server training, we propose Synthesis-based Explicit Representation Quantification (SERQ), which defines an explicit representation target based on synthetic data to synchronize the model convergence during fusion for improving model generalization.

preprint2022arXiv

A New Adjacency Matrix Configuration in GCN-based Models for Skeleton-based Action Recognition

Human skeleton data has received increasing attention in action recognition due to its background robustness and high efficiency. In skeleton-based action recognition, graph convolutional network (GCN) has become the mainstream method. This paper analyzes the fundamental factor for GCN-based models -- the adjacency matrix. We notice that most GCN-based methods conduct their adjacency matrix based on the human natural skeleton structure. Based on our former work and analysis, we propose that the human natural skeleton structure adjacency matrix is not proper for skeleton-based action recognition. We propose a new adjacency matrix that abandons all rigid neighbor connections but lets the model adaptively learn the relationships of joints. We conduct extensive experiments and analysis with a validation model on two skeleton-based action recognition datasets (NTURGBD60 and FineGYM). Comprehensive experimental results and analysis reveals that 1) the most widely used human natural skeleton structure adjacency matrix is unsuitable in skeleton-based action recognition; 2) The proposed adjacency matrix is superior in model performance, noise robustness and transferability.

preprint2022arXiv

A novel design of industrial real-time CT system based on sparse-view reconstruction and deep-learning image enhancement

Industrial CT is useful for defect detection, dimensional inspection and geometric analysis. While it does not meet the needs of industrial mass production, because of its time-consuming imaging procedure. This article proposes a novel stationary real-time CT system with multiple X-ray sources and detectors, which is able to refresh the CT reconstructed slices to the detector frame frequency. This kind of structure avoids the movement of the X-ray sources and detectors. Projections from different angles can be acquired with the objects' translation, which makes it easier to be integrated into pipeline. All the detectors are arranged along the conveyor, and observe the objects in different angle of view. With the translation of objects, their X-ray projections are obtained for CT reconstruction. To decrease the mechanical size and reduce the number of X-ray sources and detectors, the FBP reconstruction algorithm was combined with deep-learning image enhancement. Medical CT images were applied to train the deep-learning network for its quantity advantage in comparison with industrial ones. It is the first time to adopt this source-detector layout strategy. Data augmentation and regularization were used to elevate the generalization of the network. Time consumption of the CT imaging process was also calculated to prove its high efficiency. It is an innovative design for the 4th industrial revolution, providing an intelligent quality inspection solution for digital production.

preprint2022arXiv

Blockchain Meets COVID-19: A Framework for Contact Information Sharing and Risk Notification System

COVID-19 is a severe global epidemic in human history. Even though there are particular medications and vaccines to curb the epidemic, tracing and isolating the infection source is the best option to slow the virus spread and reduce infection and death rates. There are three disadvantages to the existing contact tracing system: 1. User data is stored in a centralized database that could be stolen and tampered with, 2. User's confidential personal identity may be revealed to a third party or organization, 3. Existing contact tracing systems only focus on information sharing from one dimension, such as location-based tracing, which significantly limits the effectiveness of such systems. We propose a global COVID-19 information sharing and risk notification system that utilizes the Blockchain, Smart Contract, and Bluetooth. To protect user privacy, we design a novel Blockchain-based platform that can share consistent and non-tampered contact tracing information from multiple dimensions, such as location-based for indirect contact and Bluetooth-based for direct contact. Hierarchical smart contract architecture is also designed to achieve global agreements from users about how to process and utilize user data, thereby enhancing the data usage transparency. Furthermore, we propose a mechanism to protect user identity privacy from multiple aspects. More importantly, our system can notify the users about the exposure risk via smart contracts. We implement a prototype system to conduct extensive measurements to demonstrate the feasibility and effectiveness of our system.

preprint2022arXiv

Disentangled Learning of Stance and Aspect Topics for Vaccine Attitude Detection in Social Media

Building models to detect vaccine attitudes on social media is challenging because of the composite, often intricate aspects involved, and the limited availability of annotated data. Existing approaches have relied heavily on supervised training that requires abundant annotations and pre-defined aspect categories. Instead, with the aim of leveraging the large amount of unannotated data now available on vaccination, we propose a novel semi-supervised approach for vaccine attitude detection, called VADet. A variational autoencoding architecture based on language models is employed to learn from unlabelled data the topical information of the domain. Then, the model is fine-tuned with a few manually annotated examples of user attitudes. We validate the effectiveness of VADet on our annotated data and also on an existing vaccination corpus annotated with opinions on vaccines. Our results show that VADet is able to learn disentangled stance and aspect topics, and outperforms existing aspect-based sentiment analysis models on both stance detection and tweet clustering.

preprint2022arXiv

Image Augmentation Based Momentum Memory Intrinsic Reward for Sparse Reward Visual Scenes

Many scenes in real life can be abstracted to the sparse reward visual scenes, where it is difficult for an agent to tackle the task under the condition of only accepting images and sparse rewards. We propose to decompose this problem into two sub-problems: the visual representation and the sparse reward. To address them, a novel framework IAMMIR combining the self-supervised representation learning with the intrinsic motivation is presented. For visual representation, a representation driven by a combination of the imageaugmented forward dynamics and the reward is acquired. For sparse rewards, a new type of intrinsic reward is designed, the Momentum Memory Intrinsic Reward (MMIR). It utilizes the difference of the outputs from the current model (online network) and the historical model (target network) to present the agent's state familiarity. Our method is evaluated on the visual navigation task with sparse rewards in Vizdoom. Experiments demonstrate that our method achieves the state of the art performance in sample efficiency, at least 2 times faster than the existing methods reaching 100% success rate.

preprint2022arXiv

Iota: A Framework for Analyzing System-Level Security of IoTs

Most IoT systems involve IoT devices, communication protocols, remote cloud, IoT applications, mobile apps, and the physical environment. However, existing IoT security analyses only focus on a subset of all the essential components, such as device firmware, and ignore IoT systems' interactive nature, resulting in limited attack detection capabilities. In this work, we propose Iota, a logic programming-based framework to perform system-level security analysis for IoT systems. Iota generates attack graphs for IoT systems, showing all of the system resources that can be compromised and enumerating potential attack traces. In building Iota, we design novel techniques to scan IoT systems for individual vulnerabilities and further create generic exploit models for IoT vulnerabilities. We also identify and model physical dependencies between different devices as they are unique to IoT systems and are employed by adversaries to launch complicated attacks. In addition, we utilize NLP techniques to extract IoT app semantics based on app descriptions. To evaluate vulnerabilities' system-wide impact, we propose two metrics based on the attack graph, which provide guidance on fortifying IoT systems. Evaluation on 127 IoT CVEs (Common Vulnerabilities and Exposures) shows that Iota's exploit modeling module achieves over 80% accuracy in predicting vulnerabilities' preconditions and effects. We apply Iota to 37 synthetic smart home IoT systems based on real-world IoT apps and devices. Experimental results show that our framework is effective and highly efficient. Among 27 shortest attack traces revealed by the attack graphs, 62.8% are not anticipated by the system administrator. It only takes 1.2 seconds to generate and analyze the attack graph for an IoT system consisting of 50 devices.

preprint2022arXiv

Minimum Message Length Autoregressive Moving Average Model Order Selection

This paper derives a Minimum Message Length (MML) criterion for the model selection of the Autoregressive Moving Average (ARMA) time series model. The MML87 performances on the ARMA model compared with other well known model selection criteria, Akaike Information Criterion (AIC), Corrected AIC (AICc), Bayesian Information Criterion (BIC), and Hannan Quinn (HQ). The experimental results show that the MML87 is outperformed the other model selection criteria as it select most of the models with lower prediction errors and the models selected by MML87 to have a lower mean squared error in different in-sample and out-sample sizes.

preprint2022arXiv

Real-time 3D Single Object Tracking with Transformer

LiDAR-based 3D single object tracking is a challenging issue in robotics and autonomous driving. Currently, existing approaches usually suffer from the problem that objects at long distance often have very sparse or partially-occluded point clouds, which makes the features extracted by the model ambiguous. Ambiguous features will make it hard to locate the target object and finally lead to bad tracking results. To solve this problem, we utilize the powerful Transformer architecture and propose a Point-Track-Transformer (PTT) module for point cloud-based 3D single object tracking task. Specifically, PTT module generates fine-tuned attention features by computing attention weights, which guides the tracker focusing on the important features of the target and improves the tracking ability in complex scenarios. To evaluate our PTT module, we embed PTT into the dominant method and construct a novel 3D SOT tracker named PTT-Net. In PTT-Net, we embed PTT into the voting stage and proposal generation stage, respectively. PTT module in the voting stage could model the interactions among point patches, which learns context-dependent features. Meanwhile, PTT module in the proposal generation stage could capture the contextual information between object and background. We evaluate our PTT-Net on KITTI and NuScenes datasets. Experimental results demonstrate the effectiveness of PTT module and the superiority of PTT-Net, which surpasses the baseline by a noticeable margin, ~10% in the Car category. Meanwhile, our method also has a significant performance improvement in sparse scenarios. In general, the combination of transformer and tracking pipeline enables our PTT-Net to achieve state-of-the-art performance on both two datasets. Additionally, PTT-Net could run in real-time at 40FPS on NVIDIA 1080Ti GPU. Our code is open-sourced for the research community at https://github.com/shanjiayao/PTT.

preprint2021arXiv

A Query-Driven Topic Model

Topic modeling is an unsupervised method for revealing the hidden semantic structure of a corpus. It has been increasingly widely adopted as a tool in the social sciences, including political science, digital humanities and sociological research in general. One desirable property of topic models is to allow users to find topics describing a specific aspect of the corpus. A possible solution is to incorporate domain-specific knowledge into topic modeling, but this requires a specification from domain experts. We propose a novel query-driven topic model that allows users to specify a simple query in words or phrases and return query-related topics, thus avoiding tedious work from domain experts. Our proposed approach is particularly attractive when the user-specified query has a low occurrence in a text corpus, making it difficult for traditional topic models built on word cooccurrence patterns to identify relevant topics. Experimental results demonstrate the effectiveness of our model in comparison with both classical topic models and neural topic models.

preprint2021arXiv

Improved Signed Distance Function for 2D Real-time SLAM and Accurate Localization

Accurate mapping and localization are very important for many industrial robotics applications. In this paper, we propose an improved Signed Distance Function (SDF) for both 2D SLAM and pure localization to improve the accuracy of mapping and localization. To achieve this goal, firstly we improved the back-end mapping to build a more accurate SDF map by extending the update range and building free space, etc. Secondly, to get more accurate pose estimation for the front-end, we proposed a new iterative registration method to align the current scan to the SDF submap by removing random outliers of laser scanners. Thirdly, we merged all the SDF submaps to produce an integrated SDF map for highly accurate pure localization. Experimental results show that based on the merged SDF map, a localization accuracy of a few millimeters (5mm) can be achieved globally within the map. We believe that this method is important for mobile robots working in scenarios where high localization accuracy matters.

preprint2020arXiv

A Robust Laser-Inertial Odometry and Mapping Method for Large-Scale Highway Environments

In this paper, we propose a novel laser-inertial odometry and mapping method to achieve real-time, low-drift and robust pose estimation in large-scale highway environments. The proposed method is mainly composed of four sequential modules, namely scan pre-processing module, dynamic object detection module, laser-inertial odometry module and laser mapping module. Scan pre-processing module uses inertial measurements to compensate the motion distortion of each laser scan. Then, the dynamic object detection module is used to detect and remove dynamic objects from each laser scan by applying CNN segmentation network. After obtaining the undistorted point cloud without moving objects, the laser inertial odometry module uses an Error State Kalman Filter to fuse the data of laser and IMU and output the coarse pose estimation at high frequency. Finally, the laser mapping module performs a fine processing step and the "Frame-to-Model" scan matching strategy is used to create a static global map. We compare the performance of our method with two state-ofthe-art methods, LOAM and SuMa, using KITTI dataset and real highway scene dataset. Experiment results show that our method performs better than the state-of-the-art methods in real highway environments and achieves competitive accuracy on the KITTI dataset.

preprint2020arXiv

Clinical Micro-CT Empowered by Interior Tomography, Robotic Scanning, and Deep Learning

While micro-CT systems are instrumental in preclinical research, clinical micro-CT imaging has long been desired with cochlear implantation as a primary example. The structural details of the cochlear implant and the temporal bone require a significantly higher image resolution than that (about 0.2 mm) provided by current medical CT scanners. In this paper, we propose a clinical micro-CT (CMCT) system design integrating conventional spiral cone-beam CT, contemporary interior tomography, deep learning techniques, and technologies of micro-focus X-ray source, photon-counting detector (PCD), and robotic arms for ultrahigh resolution localized tomography of a freely-selected volume of interest (VOI) at a minimized radiation dose level. The whole system consists of a standard CT scanner for a clinical CT exam and VOI specification, and a robotic-arm based micro-CT scanner for a local scan at much higher spatial and spectral resolution as well as much reduced radiation dose. The prior information from global scan is also fully utilized for background compensation to improve interior tomography from local data for accurate and stable VOI reconstruction. Our results and analysis show that the proposed hybrid reconstruction algorithm delivers superior local reconstruction, being insensitive to the misalignment of the isocenter position and initial view angle in the data/image registration while the attenuation error caused by scale mismatch can be effectively addressed with bias correction. These findings demonstrate the feasibility of our system design. We envision that deep learning techniques can be leveraged for optimized imaging performance. With high resolution imaging, high dose efficiency and low system cost synergistically, our proposed CMCT system has great potentials in temporal bone imaging as well as various other clinical applications.

preprint2020arXiv

Deep Patch-based Human Segmentation

3D human segmentation has seen noticeable progress in re-cent years. It, however, still remains a challenge to date. In this paper, weintroduce a deep patch-based method for 3D human segmentation. Wefirst extract a local surface patch for each vertex and then parameterizeit into a 2D grid (or image). We then embed identified shape descriptorsinto the 2D grids which are further fed into the powerful 2D Convolu-tional Neural Network for regressing corresponding semantic labels (e.g.,head, torso). Experiments demonstrate that our method is effective inhuman segmentation, and achieves state-of-the-art accuracy.

preprint2020arXiv

IoTGaze: IoT Security Enforcement via Wireless Context Analysis

Internet of Things (IoT) has become the most promising technology for service automation, monitoring, and interconnection, etc. However, the security and privacy issues caused by IoT arouse concerns. Recent research focuses on addressing security issues by looking inside platform and apps. In this work, we creatively change the angle to consider security problems from a wireless context perspective. We propose a novel framework called IoTGaze, which can discover potential anomalies and vulnerabilities in the IoT system via wireless traffic analysis. By sniffing the encrypted wireless traffic, IoTGaze can automatically identify the sequential interaction of events between apps and devices. We discover the temporal event dependencies and generate the Wireless Context for the IoT system. Meanwhile, we extract the IoT Context, which reflects user's expectation, from IoT apps' descriptions and user interfaces. If the wireless context does not match the expected IoT context, IoTGaze reports an anomaly. Furthermore, IoTGaze can discover the vulnerabilities caused by the inter-app interaction via hidden channels, such as temperature and illuminance. We provide a proof-of-concept implementation and evaluation of our framework on the Samsung SmartThings platform. The evaluation shows that IoTGaze can effectively discover anomalies and vulnerabilities, thereby greatly enhancing the security of IoT systems.

preprint2020arXiv

Precision annealing Monte Carlo methods for statistical data assimilation and machine learning

In statistical data assimilation (SDA) and supervised machine learning (ML), we wish to transfer information from observations to a model of the processes underlying those observations. For SDA, the model consists of a set of differential equations that describe the dynamics of a physical system. For ML, the model is usually constructed using other strategies. In this paper, we develop a systematic formulation based on Monte Carlo sampling to achieve such information transfer. Following the derivation of an appropriate target distribution, we present the formulation based on the standard Metropolis-Hasting (MH) procedure and the Hamiltonian Monte Carlo (HMC) method for performing the high dimensional integrals that appear. To the extensive literature on MH and HMC, we add (1) an annealing method using a hyperparameter that governs the precision of the model to identify and explore the highest probability regions of phase space dominating those integrals, and (2) a strategy for initializing the state space search. The efficacy of the proposed formulation is demonstrated using a nonlinear dynamical model with chaotic solutions widely used in geophysics.

preprint2020arXiv

RGB-D SLAM in Dynamic Environments Using Point Correlations

In this paper, a simultaneous localization and mapping (SLAM) method that eliminates the influence of moving objects in dynamic environments is proposed. This method utilizes the correlation between map points to separate points that are part of the static scene and points that are part of different moving objects into different groups. A sparse graph is first created using Delaunay triangulation from all map points. In this graph, the vertices represent map points, and each edge represents the correlation between adjacent points. If the relative position between two points remains consistent over time, there is correlation between them, and they are considered to be moving together rigidly. If not, they are considered to have no correlation and to be in separate groups. After the edges between the uncorrelated points are removed during point-correlation optimization, the remaining graph separates the map points of the moving objects from the map points of the static scene. The largest group is assumed to be the group of reliable static map points. Finally, motion estimation is performed using only these points. The proposed method was implemented for RGB-D sensors, evaluated with a public RGB-D benchmark, and tested in several additional challenging environments. The experimental results demonstrate that robust and accurate performance can be achieved by the proposed SLAM method in both slightly and highly dynamic environments. Compared with other state-of-the-art methods, the proposed method can provide competitive accuracy with good real-time performance.