Source author record

Rongtao Xu

Rongtao Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Computer Vision, eess.IV, eess.SP, eess.SY, Information Theory, Machine Learning, math.IT, Robotics, Systems and Control

Catalog footprint

What is connected

5 works

10 topics

4 close collaborators

preprint · 2026 · arXiv

RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models

Vision-Language-Action (VLA) models remain brittle in long-horizon, contact-rich manipulation because success-only imitation provides little supervision for execution drift, while failed rollouts are often discarded. We introduce RePO-VLA, a recovery-driven policy optimization framework that assigns distinct roles to success, recovery, and failure trajectories. RePO-VLA first applies Recovery-Aware Initialization (RAI), slicing recovery segments and resetting history so corrective actions depend on the current adverse state rather than the preceding failure. It then learns a Progress-Aware Semantic Value Function (PAS-VF), aligning spatiotemporal trajectory features with instructions and successful references. The resulting labels salvage useful failure prefixes via reliability decay, while low-value labels mark drift and terminal breakdowns, teaching differences among nominal, failed, and corrective actions. The data engine turns adverse states into planner-generated or human-collected corrective rollouts, teaching recovery to the success manifold. Value-Conditioned Refinement (VCR) trains the policy to prefer high-progress actions. At deployment, a fixed high value ($v=1.0$) biases actions toward the learned success manifold without online failure detectors or heuristic retries. We introduce FRBench, with standardized error injection and recovery-focused evaluation. Across simulated and real-world bimanual tasks, RePO-VLA improves robustness, raising adversarial success from 20% to 75% on average and up to 80% in scaled real-world trials.

preprint · 2022 · arXiv

Accurate Lung Nodules Segmentation with Detailed Representation Transfer and Soft Mask Supervision

Accurate lung lesion segmentation from Computed Tomography (CT) images is crucial to the analysis and diagnosis of lung diseases such as COVID-19 and lung cancer. However, the small size and variety of lung nodules, together with the lack of high-quality labeling, make accurate lung nodule segmentation difficult. To address these issues, we first introduce a novel segmentation mask named Soft Mask, which describes edge details more richly and accurately and visualizes better, and we develop a universal automatic Soft Mask annotation pipeline that adapts to different datasets. Then, a novel network with detailed representation transfer and Soft Mask supervision (DSNet) is proposed to process low-resolution input images of lung nodules into high-quality segmentation results. Our DSNet contains a special Detail Representation Transfer Module (DRTM), which reconstructs the detailed representation to compensate for the small size of lung nodule images, and an adversarial training framework with Soft Mask that further improves segmentation accuracy. Extensive experiments validate that our DSNet outperforms other state-of-the-art methods for accurate lung nodule segmentation and generalizes well to other accurate medical segmentation tasks, with competitive results. In addition, we provide a new, challenging lung nodule segmentation dataset for further studies.

preprint · 2022 · arXiv

MTLDesc: Looking Wider to Describe Better

Limited by the locality of convolutional neural networks, most existing local feature description methods learn local descriptors from local information only and lack awareness of global and surrounding spatial context. In this work, we focus on making local descriptors "look wider to describe better" by learning local Descriptors with More Than just Local information (MTLDesc). Specifically, we resort to context augmentation and spatial attention mechanisms to give MTLDesc non-local awareness. First, an Adaptive Global Context Augmented Module and a Diverse Local Context Augmented Module are proposed to construct robust local descriptors with context information from global to local. Second, a Consistent Attention Weighted Triplet Loss is designed to integrate spatial attention awareness into both the optimization and matching stages of local descriptor learning. Third, Local Features Detection with Feature Pyramid is introduced to obtain more stable and accurate keypoint localization. With the above innovations, the performance of our MTLDesc significantly surpasses the prior state-of-the-art local descriptors on the HPatches, Aachen Day-Night localization and InLoc indoor localization benchmarks.

preprint · 2020 · arXiv

Design and Implementation of a High-Accuracy Positioning System Using RTK on Smartphones

In recent years, with the development of the Global Navigation Satellite System (GNSS), satellite navigation technology has played a crucial role in smartphone navigation. To address the low accuracy of GNSS-based positioning on smartphones, this paper proposes applying the real-time kinematic (RTK) carrier-phase differential technique on smartphones, and a real-time RTK-based positioning system for smartphones is implemented. This paper presents the implementation and experimental results of this system. The system is mainly composed of the GNSS reference station, the NTRIP system and the smartphones. The experimental results show that the system effectively improves the positioning accuracy of smartphones.

preprint · 2011 · arXiv

Low Complexity Kolmogorov-Smirnov Modulation Classification

The Kolmogorov-Smirnov (K-S) test, a non-parametric method for measuring goodness of fit, is applied to automatic modulation classification (AMC) in this paper. The basic procedure involves computing the empirical cumulative distribution function (ECDF) of some decision statistic derived from the received signal, and comparing it with the CDFs of the signal under each candidate modulation format. The K-S-based modulation classifier is first developed for the AWGN channel, then it is applied to OFDM-SDMA systems to cancel multiuser interference. Regarding the complexity of K-S modulation classification, we propose a low-complexity method based on the robustness of the K-S classifier. Extensive simulation results demonstrate that, compared with traditional cumulant-based classifiers, the proposed K-S classifier offers superior classification performance and requires fewer signal samples (and is thus fast).
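
The basic procedure in this abstract (compute the ECDF of a decision statistic, compare it with the CDF under each candidate modulation, pick the closest) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the decision statistic is the in-phase component of each received symbol, the candidate set is QPSK and 16-QAM with unit average energy, the per-dimension AWGN standard deviation `sigma` is assumed known, and all function names are hypothetical. The paper's low-complexity variant and the OFDM-SDMA extension are not reproduced.

```python
import math
import random

# Assumed candidate set: in-phase amplitude levels of unit-average-energy constellations.
CANDIDATES = {
    "QPSK": [-1 / math.sqrt(2), 1 / math.sqrt(2)],
    "16-QAM": [a / math.sqrt(10) for a in (-3, -1, 1, 3)],
}

def normal_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def model_cdf(x, levels, sigma):
    """CDF of the in-phase component in AWGN: an equal-weight Gaussian mixture
    centred on the constellation's in-phase levels."""
    return sum(normal_cdf((x - a) / sigma) for a in levels) / len(levels)

def ks_distance(samples, levels, sigma):
    """Two-sided K-S statistic: the largest gap between the ECDF of the samples
    and the model CDF, checked on both sides of each ECDF step."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = model_cdf(x, levels, sigma)
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

def classify(samples, sigma):
    """Return the candidate whose model CDF is closest to the ECDF."""
    return min(CANDIDATES, key=lambda name: ks_distance(samples, CANDIDATES[name], sigma))

def simulate(name, n, sigma, rng):
    """Draw n noisy in-phase samples for the named modulation (toy signal model)."""
    levels = CANDIDATES[name]
    return [rng.choice(levels) + rng.gauss(0.0, sigma) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    received = simulate("16-QAM", 500, 0.1, rng)
    print(classify(received, 0.1))
```

In this toy setting the decision is a minimum-distance rule over the K-S statistics, so no per-modulation threshold is needed. A real receiver would also have to estimate the noise level and handle phase and gain offsets, which the sketch ignores.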