Researcher profile

Lihua Xie

Lihua Xie contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
38 works
0 followers
12 topics
4 close collaborators

Published work

38 published items

preprint | 2026 | arXiv

What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models

In this paper, we provide a comprehensive overview of existing scene representation methods for robotics, covering traditional representations such as point clouds, voxels, signed distance functions (SDF), and scene graphs, as well as more recent neural representations like Neural Radiance Fields (NeRF), 3D Gaussian Splatting (3DGS), and the emerging Foundation Models. While current SLAM and localization systems predominantly rely on sparse representations like point clouds and voxels, dense scene representations are expected to play a critical role in downstream tasks such as navigation and obstacle avoidance. Moreover, neural representations such as NeRF, 3DGS, and foundation models are well-suited for integrating high-level semantic features and language-based priors, enabling more comprehensive 3D scene understanding and embodied intelligence. In this paper, we categorized the core modules of robotics into five parts (Perception, Mapping, Localization, Navigation, Manipulation). We start by presenting the standard formulation of different scene representation methods and comparing the advantages and disadvantages of scene representation across different modules. This survey is centered around the question: What is the best 3D scene representation for robotics? We then discuss the future development trends of 3D scene representations, with a particular focus on how the 3D Foundation Model could replace current methods as the unified solution for future robotic applications. The remaining challenges in fully realizing this model are also explored. We aim to offer a valuable resource for both newcomers and experienced researchers to explore the future of 3D scene representations and their application in robotics. We have published an open-source project on GitHub and will continue to add new works and technologies to this project.

preprint | 2025 | arXiv

SplatSSC: Decoupled Depth-Guided Gaussian Splatting for Semantic Scene Completion

Monocular 3D Semantic Scene Completion (SSC) is a challenging yet promising task that aims to infer dense geometric and semantic descriptions of a scene from a single image. While recent object-centric paradigms significantly improve efficiency by leveraging flexible 3D Gaussian primitives, they still rely heavily on a large number of randomly initialized primitives, which inevitably leads to 1) inefficient primitive initialization and 2) outlier primitives that introduce erroneous artifacts. In this paper, we propose SplatSSC, a novel framework that resolves these limitations with a depth-guided initialization strategy and a principled Gaussian aggregator. Instead of random initialization, SplatSSC utilizes a dedicated depth branch composed of a Group-wise Multi-scale Fusion (GMF) module, which integrates multi-scale image and depth features to generate a sparse yet representative set of initial Gaussian primitives. To mitigate noise from outlier primitives, we develop the Decoupled Gaussian Aggregator (DGA), which enhances robustness by decomposing geometric and semantic predictions during the Gaussian-to-voxel splatting process. Complemented with a specialized Probability Scale Loss, our method achieves state-of-the-art performance on the Occ-ScanNet dataset, outperforming prior approaches by over 6.3% in IoU and 4.1% in mIoU, while reducing both latency and memory cost by more than 9.3%.

preprint | 2022 | arXiv

Computer Vision for Road Imaging and Pothole Detection: A State-of-the-Art Review of Systems and Algorithms

Computer vision algorithms have been prevalently utilized for 3-D road imaging and pothole detection for over two decades. Nonetheless, there is a lack of systematic survey articles on state-of-the-art (SoTA) computer vision techniques, especially deep learning models, developed to tackle these problems. This article first introduces the sensing systems employed for 2-D and 3-D road data acquisition, including camera(s), laser scanners, and Microsoft Kinect. Afterward, it thoroughly and comprehensively reviews the SoTA computer vision algorithms, including (1) classical 2-D image processing, (2) 3-D point cloud modeling and segmentation, and (3) machine/deep learning, developed for road pothole detection. This article also discusses the existing challenges and future development trends of computer vision-based road pothole detection approaches: classical 2-D image processing-based and 3-D point cloud modeling and segmentation-based approaches have already become history; and Convolutional neural networks (CNNs) have demonstrated compelling road pothole detection results and are promising to break the bottleneck with the future advances in self/un-supervised learning for multi-modal semantic segmentation. We believe that this survey can serve as practical guidance for developing the next-generation road condition assessment systems.

preprint | 2022 | arXiv

Continuous-Time and Event-Triggered Online Optimization for Linear Multi-Agent Systems

This paper studies the decentralized online convex optimization problem for heterogeneous linear multi-agent systems. Agents have access to their time-varying local cost functions related to their own outputs, and there are also time-varying coupling inequality constraints among them. The goal of each agent is to minimize the global cost function by selecting appropriate local actions only through communication between neighbors. We design a distributed controller based on the saddle-point method which achieves constant regret bound and sublinear fit bound. In addition, to reduce the communication overhead, we propose an event-triggered communication scheme and show that the constant regret bound and sublinear fit bound are still achieved in the case of discrete communications with no Zeno behavior. A numerical example is provided to verify the proposed algorithms.

preprint | 2022 | arXiv

Distributed stochastic projection-free solver for constrained optimization

This paper proposes a distributed stochastic projection-free algorithm for large-scale constrained finite-sum optimization whose constraint set is complicated such that the projection onto the constraint set can be expensive. The global cost function is allocated to multiple agents, each of which computes its local stochastic gradients and communicates with its neighbors to solve the global problem. Stochastic gradient methods enable low computational cost, while they are hard and slow to converge due to the variance caused by random sampling. To construct a convergent distributed stochastic projection-free algorithm, this paper incorporates a variance reduction technique and gradient tracking technique in the Frank-Wolfe update. We develop a sampling rule for the variance reduction technique to reduce the variance introduced by stochastic gradients. Complete and rigorous proofs show that the proposed distributed projection-free algorithm converges with a sublinear convergence rate and enjoys superior complexity guarantees for both convex and non-convex objective functions. By comparative simulations, we demonstrate the convergence and computational efficiency of the proposed algorithm.

preprint | 2022 | arXiv

Divide to Adapt: Mitigating Confirmation Bias for Domain Adaptation of Black-Box Predictors

Domain Adaptation of Black-box Predictors (DABP) aims to learn a model on an unlabeled target domain supervised by a black-box predictor trained on a source domain. It does not require access to both the source-domain data and the predictor parameters, thus addressing the data privacy and portability issues of standard domain adaptation. Existing DABP approaches mostly rely on model distillation from the black-box predictor, i.e., training the model with its noisy target-domain predictions, which however inevitably introduces the confirmation bias accumulated from the prediction noises. To mitigate such bias, we propose a new method, named BETA, to incorporate knowledge distillation and noisy label learning into one coherent framework. This is enabled by a new divide-to-adapt strategy. BETA divides the target domain into an easy-to-adapt subdomain with less noise and a hard-to-adapt subdomain. Then it deploys mutually-teaching twin networks to filter the predictor errors for each other and improve them progressively, from the easy to hard subdomains. As such, BETA effectively purifies the noisy labels and reduces error accumulation. We theoretically show that the target error of BETA is minimized by decreasing the noise ratio of the subdomains. Extensive experiments demonstrate BETA outperforms existing methods on all DABP benchmarks, and is even comparable with the standard domain adaptation methods that use the source-domain data.

preprint | 2022 | arXiv

EfficientFi: Towards Large-Scale Lightweight WiFi Sensing via CSI Compression

WiFi technology has been applied to various places due to the increasing requirement of high-speed Internet access. Recently, besides network services, WiFi sensing is appealing in smart homes since it is device-free, cost-effective and privacy-preserving. Though numerous WiFi sensing methods have been developed, most of them only consider a single smart home scenario. Without the connection of a powerful cloud server and massive users, large-scale WiFi sensing is still difficult. In this paper, we first analyze and summarize these obstacles, and propose an efficient large-scale WiFi sensing framework, namely EfficientFi. EfficientFi works with edge computing at WiFi APs and cloud computing at center servers. It consists of a novel deep neural network that can compress fine-grained WiFi Channel State Information (CSI) at the edge, restore CSI at the cloud, and perform sensing tasks simultaneously. A quantized auto-encoder and a joint classifier are designed to achieve these goals in an end-to-end fashion. To the best of our knowledge, EfficientFi is the first IoT-cloud-enabled WiFi sensing framework that significantly reduces communication overhead while realizing sensing tasks accurately. We use human activity recognition and identification via WiFi sensing as two case studies, and conduct extensive experiments to evaluate EfficientFi. The results show that it compresses CSI data from 1.368Mb/s to 0.768Kb/s with extremely low error of data reconstruction and achieves over 98% accuracy for human activity recognition.

preprint | 2022 | arXiv

GaitFi: Robust Device-Free Human Identification via WiFi and Vision Multimodal Learning

As an important biomarker for human identification, human gait can be collected at a distance by passive sensors without subject cooperation, which plays an essential role in crime prevention, security detection and other human identification applications. At present, most research works are based on cameras and computer vision techniques to perform gait recognition. However, vision-based methods are not reliable when confronting poor illuminations, leading to degrading performances. In this paper, we propose a novel multimodal gait recognition method, namely GaitFi, which leverages WiFi signals and videos for human identification. In GaitFi, Channel State Information (CSI) that reflects the multi-path propagation of WiFi is collected to capture human gaits, while videos are captured by cameras. To learn robust gait information, we propose a Lightweight Residual Convolution Network (LRCN) as the backbone network, and further propose the two-stream GaitFi by integrating WiFi and vision features for the gait retrieval task. The GaitFi is trained by the triplet loss and classification loss on different levels of features. Extensive experiments are conducted in the real world, which demonstrates that the GaitFi outperforms state-of-the-art gait recognition methods based on single WiFi or camera, achieving 94.2% for human identification tasks of 12 subjects.

preprint | 2022 | arXiv

MetaFi: Device-Free Pose Estimation via Commodity WiFi for Metaverse Avatar Simulation

Avatar refers to a representative of a physical user in the virtual world that can engage in different activities and interact with other objects in metaverse. Simulating the avatar requires accurate human pose estimation. Though camera-based solutions yield remarkable performance, they encounter the privacy issue and degraded performance caused by varying illumination, especially in smart home. In this paper, we propose a WiFi-based IoT-enabled human pose estimation scheme for metaverse avatar simulation, namely MetaFi. Specifically, a deep neural network is designed with customized convolutional layers and residual blocks to map the channel state information to human pose landmarks. It is enforced to learn the annotations from the accurate computer vision model, thus achieving cross-modal supervision. WiFi is ubiquitous and robust to illumination, making it a feasible solution for avatar applications in smart home. The experiments are conducted in the real world, and the results show that the MetaFi achieves very high performance with a PCK@50 of 95.23%.

preprint | 2022 | arXiv

Multi-modal Semantic SLAM for Complex Dynamic Environments

Simultaneous Localization and Mapping (SLAM) is one of the most essential techniques in many real-world robotic applications. The assumption of static environments is common in most SLAM algorithms, which, however, is not the case for most applications. Recent work on semantic SLAM aims to understand the objects in an environment and distinguish dynamic information from a scene context by performing image-based segmentation. However, the segmentation results are often imperfect or incomplete, which can subsequently reduce the quality of mapping and the accuracy of localization. In this paper, we present a robust multi-modal semantic framework to solve the SLAM problem in complex and highly dynamic environments. We propose to learn a more powerful object feature representation and deploy the mechanism of looking and thinking twice to the backbone network, which leads to a better recognition result for our baseline instance segmentation model. Moreover, both geometric-only clustering and visual semantic information are combined to reduce the effect of segmentation error due to small-scale objects, occlusion and motion blur. Thorough experiments have been conducted to evaluate the performance of the proposed method. The results show that our method can precisely identify dynamic objects under recognition imperfection and motion blur. Moreover, the proposed SLAM framework is able to efficiently build a static dense map at a processing rate of more than 10 Hz, which can be implemented in many practical applications. Both training data and the proposed method are open sourced at https://github.com/wh200720041/MMS_SLAM.

preprint | 2022 | arXiv

NTU VIRAL: A Visual-Inertial-Ranging-Lidar Dataset, From an Aerial Vehicle Viewpoint

In recent years, autonomous robots have become ubiquitous in research and daily life. Among many factors, public datasets play an important role in the progress of this field, as they waive the tall order of initial investment in hardware and manpower. However, for research on autonomous aerial systems, there appears to be a relative lack of public datasets on par with those used for autonomous driving and ground robots. Thus, to fill in this gap, we conduct a data collection exercise on an aerial platform equipped with an extensive and unique set of sensors: two 3D lidars, two hardware-synchronized global-shutter cameras, multiple Inertial Measurement Units (IMUs), and especially, multiple Ultra-wideband (UWB) ranging units. The comprehensive sensor suite resembles that of an autonomous driving car, but features distinct and challenging characteristics of aerial operations. We record multiple datasets in several challenging indoor and outdoor conditions. Calibration results and ground truth from a high-accuracy laser tracker are also included in each package. All resources can be accessed via our webpage https://ntu-aris.github.io/ntu_viral_dataset.

preprint | 2022 | arXiv

Quantized Consensus under Data-Rate Constraints and DoS Attacks: A Zooming-In and Holding Approach

This paper is concerned with the quantized consensus problem for uncertain nonlinear multi-agent systems under data-rate constraints and Denial-of-Service (DoS) attacks. The agents are modeled in strict-feedback form with unknown nonlinear dynamics and external disturbance. Extended state observers (ESOs) are leveraged to estimate agents' total uncertainties along with their states. To mitigate the effects of DoS attacks, a novel dynamic quantization with zooming-in and holding capabilities is proposed. The idea is to zoom-in and hold the variable to be quantized if the system is in the absence and presence of DoS attacks, respectively. The control protocol is given in terms of the outputs of the ESOs and the dynamic-quantization-based encoders and decoders. We show that, for a connected undirected network, the developed control protocol is capable of handling any DoS attacks inducing bounded consecutive packet losses with merely 3-level quantization. The application of the zooming-in and holding approach to known linear multi-agent systems is also discussed.

preprint | 2022 | arXiv

SPINS: Structure Priors aided Inertial Navigation System

Although Simultaneous Localization and Mapping (SLAM) has been an active research topic for decades, current state-of-the-art methods still suffer from instability or inaccuracy due to feature insufficiency or its inherent estimation drift, in many civilian environments. To resolve these issues, we propose a navigation system combining the SLAM and prior-map-based localization. Specifically, we consider additional integration of line and plane features, which are ubiquitous and more structurally salient in civilian environments, into the SLAM to ensure feature sufficiency and localization robustness. More importantly, we incorporate general prior map information into the SLAM to restrain its drift and improve the accuracy. To avoid rigorous association between prior information and local observations, we parameterize the prior knowledge as low dimensional structural priors defined as relative distances/angles between different geometric primitives. The localization is formulated as a graph-based optimization problem that contains sliding-window-based variables and factors, including IMU, heterogeneous features, and structure priors. We also derive the analytical expressions of Jacobians of different factors to avoid the automatic differentiation overhead. To further alleviate the computation burden of incorporating structural prior factors, a selection mechanism is adopted based on the so-called information gain to incorporate only the most effective structure priors in the graph optimization. Finally, the proposed framework is extensively tested on synthetic data, public datasets, and, more importantly, on the real UAV flight data obtained from a building inspection task. The results show that the proposed scheme can effectively improve the accuracy and robustness of localization for autonomous robots in civilian applications.

preprint | 2021 | arXiv

Exponential convergence of distributed optimization for heterogeneous linear multi-agent systems

In this work we study a distributed optimal output consensus problem for heterogeneous linear multi-agent systems where the agents aim to reach consensus with the purpose of minimizing the sum of private convex costs. Based on output feedback, a fully distributed control law is proposed by using the proportional-integral (PI) control technique. For strongly convex cost functions with Lipschitz gradients, the designed controller can achieve convergence exponentially in an undirected and connected network. Furthermore, to remove the requirement of continuous communications, the proposed control law is then extended to periodic and event-triggered communication schemes, which also achieve convergence exponentially. Two simulation examples are given to verify the proposed control algorithms.

preprint | 2021 | arXiv

F-LOAM: Fast LiDAR Odometry And Mapping

Simultaneous Localization and Mapping (SLAM) has wide robotic applications such as autonomous driving and unmanned aerial vehicles. Both computational efficiency and localization accuracy are of great importance towards a good SLAM system. Existing works on LiDAR based SLAM often formulate the problem as two modules: scan-to-scan match and scan-to-map refinement. Both modules are solved by iterative calculation which are computationally expensive. In this paper, we propose a general solution that aims to provide a computationally efficient and accurate framework for LiDAR based SLAM. Specifically, we adopt a non-iterative two-stage distortion compensation method to reduce the computational cost. For each scan input, the edge and planar features are extracted and matched to a local edge map and a local plane map separately, where the local smoothness is also considered for iterative pose optimization. Thorough experiments are performed to evaluate its performance in challenging scenarios, including localization for a warehouse Automated Guided Vehicle (AGV) and a public dataset on autonomous driving. The proposed method achieves a competitive localization accuracy with a processing rate of more than 10 Hz in the public dataset evaluation, which provides a good trade-off between performance and computational cost for practical applications.

preprint | 2021 | arXiv

Fast Loop Closure Detection via Binary Content

Loop closure detection plays an important role in reducing localization drift in Simultaneous Localization And Mapping (SLAM). It aims to find repetitive scenes from historical data to reset localization. To tackle the loop closure problem, existing methods often leverage on the matching of visual features, which achieve good accuracy but require high computational resources. However, feature point based methods ignore the patterns of image, i.e., the shape of the objects as well as the distribution of objects in an image. It is believed that this information is usually unique for a scene and can be utilized to improve the performance of traditional loop closure detection methods. In this paper we leverage and compress the information into a binary image to accelerate an existing fast loop closure detection method via binary content. The proposed method can greatly reduce the computational cost without sacrificing recall rate. It consists of three parts: binary content construction, fast image retrieval and precise loop closure detection. No offline training is required. Our method is compared with the state-of-the-art loop closure detection methods and the results show that it outperforms the traditional methods at both recall rate and speed.

preprint | 2021 | arXiv

Feasible Computationally Efficient Path Planning for UAV Collision Avoidance

This paper presents a robust computationally efficient real-time collision avoidance algorithm for Unmanned Aerial Vehicle (UAV), namely Memory-based Wall Following-Artificial Potential Field (MWF-APF) method. The new algorithm switches between Wall-Following Method (WFM) and Artificial Potential Field method (APF) with improved situation awareness capability. Historical trajectory is taken into account to avoid repetitive wrong decision. Furthermore, it can be effectively applied to platform with low computing capability. As an example, a quad-rotor equipped with limited number of Time-of-Flight (TOF) rangefinders is adopted to validate the effectiveness and efficiency of this algorithm. Both software simulation and physical flight test have been conducted to demonstrate the capability of the MWF-APF method in complex scenarios.

preprint | 2021 | arXiv

Intensity-SLAM: Intensity Assisted Localization and Mapping for Large Scale Environment

Simultaneous Localization And Mapping (SLAM) is a task to estimate the robot location and to reconstruct the environment based on observation from sensors such as LIght Detection And Ranging (LiDAR) and camera. It is widely used in robotic applications such as autonomous driving and drone delivery. Traditional LiDAR-based SLAM algorithms mainly leverage the geometric features from the scene context, while the intensity information from LiDAR is ignored. Some recent deep-learning-based SLAM algorithms consider intensity features and train the pose estimation network in an end-to-end manner. However, they require significant data collection effort and their generalizability to environments other than the trained one remains unclear. In this paper we introduce intensity features to a SLAM system. And we propose a novel full SLAM framework that leverages both geometry and intensity features. The proposed SLAM involves both intensity-based front-end odometry estimation and intensity-based back-end optimization. Thorough experiments are performed including both outdoor autonomous driving and indoor warehouse robot manipulation. The results show that the proposed method outperforms existing geometric-only LiDAR SLAM methods.

preprint | 2021 | arXiv

Lightweight 3-D Localization and Mapping for Solid-State LiDAR

The LIght Detection And Ranging (LiDAR) sensor has become one of the most important perceptual devices due to its important role in simultaneous localization and mapping (SLAM). Existing SLAM methods are mainly developed for mechanical LiDAR sensors, which are often adopted by large scale robots. Recently, the solid-state LiDAR is introduced and becomes popular since it provides a cost-effective and lightweight solution for small scale robots. Compared to mechanical LiDAR, solid-state LiDAR sensors have higher update frequency and angular resolution, but also have smaller field of view (FoV), which is very challenging for existing LiDAR SLAM algorithms. Therefore, it is necessary to have a more robust and computationally efficient SLAM method for this new sensing device. To this end, we propose a new SLAM framework for solid-state LiDAR sensors, which involves feature extraction, odometry estimation, and probability map building. The proposed method is evaluated on a warehouse robot and a hand-held device. In the experiments, we demonstrate both the accuracy and efficiency of our method using an Intel L515 solid-state LiDAR. The results show that our method is able to provide precise localization and high quality mapping. We made the source codes public at https://github.com/wh200720041/SSL_SLAM.

preprint | 2021 | arXiv

Robust Output Regulation and Reinforcement Learning-based Output Tracking Design for Unknown Linear Discrete-Time Systems

In this paper, we investigate the optimal output tracking problem for linear discrete-time systems with unknown dynamics using reinforcement learning and robust output regulation theory. This output tracking problem only allows to utilize the outputs of the reference system and the controlled system, rather than their states, and differs from most existing tracking results that depend on the state of the system. The optimal tracking problem is formulated into a linear quadratic regulation problem by proposing a family of dynamic discrete-time controllers. Then, it is shown that solving the output tracking problem is equivalent to solving output regulation equations, whose solution, however, requires the knowledge of the complete and accurate system dynamics. To remove such a requirement, an off-policy reinforcement learning algorithm is proposed using only the measured output data along the trajectories of the system and the reference output. By introducing re-expression error and analyzing the rank condition of the parameterization matrix, we ensure the uniqueness of the proposed RL based optimal control via output feedback.

preprint | 2021 | arXiv

Towards Real-time Semantic RGB-D SLAM in Dynamic Environments

Most of the existing visual SLAM methods heavily rely on a static world assumption and easily fail in dynamic environments. Some recent works eliminate the influence of dynamic objects by introducing deep learning-based semantic information to SLAM systems. However such methods suffer from high computational cost and cannot handle unknown objects. In this paper, we propose a real-time semantic RGB-D SLAM system for dynamic environments that is capable of detecting both known and unknown moving objects. To reduce the computational cost, we only perform semantic segmentation on keyframes to remove known dynamic objects, and maintain a static map for robust camera tracking. Furthermore, we propose an efficient geometry module to detect unknown moving objects by clustering the depth image into a few regions and identifying the dynamic regions via their reprojection errors. The proposed method is evaluated on public datasets and real-world conditions. To the best of our knowledge, it is one of the first semantic RGB-D SLAM systems that run in real-time on a low-power embedded platform and provide high localization accuracy in dynamic environments.

preprint | 2021 | arXiv

Vision Based Autonomous UAV Plane Estimation And Following for Building Inspection

Unmanned Aerial Vehicle (UAV) has already demonstrated its potential in many civilian applications, and the façade inspection is among the most promising ones. In this paper, we focus on enabling the autonomous perception and control of a small UAV for a façade inspection task. Specifically, we consider the perception as a planar object pose estimation problem by simplifying the building structure as concatenation of planes, and the control as an optimal reference tracking control problem. First, a vision based adaptive observer is proposed which can realize stable plane pose estimation under very mild observation conditions. Second, a model predictive controller is designed to achieve stable tracking and smooth transition in a multi-plane scenario, while the persistent excitation (PE) condition of the observer and the maneuver constraints of the UAV are satisfied. The proposed autonomous plane pose estimation and plane tracking methods are tested in both simulation and practical building façade inspection scenarios, which demonstrate their effectiveness and practicability.

preprint | 2020 | arXiv

An Optimal Linear Attack Strategy on Remote State Estimation

This work considers the problem of designing an attack strategy on remote state estimation under the condition of strict stealthiness and ε-stealthiness of the attack. An attacker is assumed to be able to launch a linear attack to modify sensor data. A metric based on Kullback-Leibler divergence is adopted to quantify the stealthiness of the attack. We propose a generalized linear attack based on past attack signals and the latest innovation. We prove that the proposed approach can obtain an attack that can cause more estimation performance loss than linear attack strategies recently studied in the literature. The result thus provides a bound on the tradeoff between available information and attack performance, which is useful in the development of mitigation strategies. Finally, some numerical examples are given to evaluate the performance of the proposed strategy.

preprint2020arXiv

Bayesian Filtering with Unknown Sensor Measurement Losses

This work studies the state estimation problem of a stochastic nonlinear system with unknown sensor measurement losses. If the estimator knows the sensor measurement losses of a linear Gaussian system, the minimum variance estimate is easily computed by the celebrated intermittent Kalman filter (IKF). However, this will no longer be the case when the measurement losses are unknown and/or the system is nonlinear or non-Gaussian. By exploiting the binary property of the measurement loss process and the IKF, we design three suboptimal filters for the state estimation, i.e., BKF-I, BKF-II and RBPF. The BKF-I is based on the MAP estimator of the measurement loss process and the BKF-II is derived by estimating the conditional loss probability. The RBPF is a particle filter based algorithm which marginalizes out the loss process to increase the efficiency of particles. All the proposed filters can be easily implemented in recursive forms. Finally, a linear system, a target tracking system and a quadrotor's path control problem are included to illustrate their effectiveness, and show the tradeoff between computational complexity and estimation accuracy of the proposed filters.
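
For readers unfamiliar with the intermittent Kalman filter mentioned above, the following is a minimal scalar sketch under the assumption that the loss indicator is known to the estimator. The function name `ikf_step` and the scalar model `x[k+1] = a*x[k] + w`, `y[k] = c*x[k] + v` are illustrative assumptions; the sketch does not implement BKF-I, BKF-II or RBPF.

```python
def ikf_step(x_est, p_est, y, received, a, c, q, r):
    """One step of a scalar Kalman filter with a known measurement-loss flag.

    x_est, p_est: previous state estimate and its error variance
    y: measurement (ignored when received is False)
    received: True if the measurement arrived, False if it was lost
    a, c: state and measurement coefficients; q, r: noise variances
    """
    # Time update (always performed).
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    if not received:
        # Measurement lost: propagate the prediction unchanged.
        return x_pred, p_pred
    # Measurement update (only when the measurement arrived).
    gain = p_pred * c / (c * p_pred * c + r)
    x_new = x_pred + gain * (y - c * x_pred)
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new


# With a = c = q = r = 1 and prior variance 1, the predicted variance is 2.
x_hit, p_hit = ikf_step(0.0, 1.0, 2.0, True, 1.0, 1.0, 1.0, 1.0)
x_miss, p_miss = ikf_step(0.0, 1.0, 2.0, False, 1.0, 1.0, 1.0, 1.0)
```

When the measurement arrives, the variance drops below the predicted value; when it is lost, the variance stays at the prediction, which is why unknown losses make the problem harder.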

preprint2020arXiv

Cooperative Pursuit with Multi-Pursuer and One Faster Free-moving Evader

This paper addresses a multi-pursuer single-evader pursuit-evasion game where the free-moving evader moves faster than the pursuers. Most of the existing works impose constraints on the faster evader such as limited moving area and moving direction. When the faster evader is allowed to move freely without any constraint, the main issues are how to form an encirclement to trap the evader into the capture domain, how to balance between forming an encirclement and approaching the faster evader, and what conditions make the capture possible. In this paper, a distributed pursuit algorithm is proposed to enable pursuers to form an encirclement and approach the faster evader. An algorithm that balances between forming an encirclement and approaching the faster evader is proposed. Moreover, sufficient capture conditions are derived based on the initial spatial distribution and the speed ratios of the pursuers and the evader. Simulation and experimental results on ground robots validate the effectiveness and practicability of the proposed method.

preprint2020arXiv

Distributed Aggregative Optimization over Multi-Agent Networks

This paper proposes a new framework for distributed optimization, called distributed aggregative optimization, which allows local objective functions to be dependent not only on their own decision variables, but also on the average of summable functions of decision variables of all other agents. To handle this problem, a distributed algorithm, called distributed gradient tracking (DGT), is proposed and analyzed, where the global objective function is strongly convex, and the communication graph is balanced and strongly connected. It is shown that the algorithm can converge to the optimal variable at a linear rate. A numerical example is provided to corroborate the theoretical result.
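
The idea can be illustrated with a small sketch of a gradient-tracking iteration on a toy aggregative problem. Everything specific here is an assumption chosen for illustration and not taken from the paper: the local costs `f_i(x_i, sigma) = (x_i - a_i)^2 + (sigma - b)^2`, the aggregate `sigma` as the plain average of the decisions, the four-agent ring weights, and the step size. Each agent keeps a local estimate `s_i` of the aggregate and a local estimate `y_i` of the average gradient with respect to the aggregate, and exchanges both only with its neighbors.

```python
def aggregative_gradient_tracking(a, b, weights, step=0.1, iterations=500):
    """Toy gradient tracking with an aggregative term (illustrative sketch).

    Local cost (assumed): f_i(x_i, sigma) = (x_i - a[i])**2 + (sigma - b)**2,
    where sigma is the average of all x_i.
    weights: doubly stochastic mixing matrix of the communication graph.
    """
    n = len(a)
    x = [0.0] * n
    s = list(x)                          # s_i tracks the average of the x_j
    y = [2.0 * (s_i - b) for s_i in s]   # y_i tracks the average of d f_j / d sigma
    for _ in range(iterations):
        # Local descent using own gradient plus the tracked aggregate gradient.
        x_new = [x[i] - step * (2.0 * (x[i] - a[i]) + y[i]) for i in range(n)]
        # Mix neighbors' aggregate estimates and add the local change in x.
        s_new = [
            sum(weights[i][j] * s[j] for j in range(n)) + x_new[i] - x[i]
            for i in range(n)
        ]
        # Mix neighbors' gradient estimates and add the local gradient change.
        y_new = [
            sum(weights[i][j] * y[j] for j in range(n))
            + 2.0 * (s_new[i] - b) - 2.0 * (s[i] - b)
            for i in range(n)
        ]
        x, s, y = x_new, s_new, y_new
    return x


# Four agents on a ring; each row and column of the weights sums to one.
ring = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
solution = aggregative_gradient_tracking([1.0, 2.0, 3.0, 4.0], 0.5, ring)
```

For this toy problem the optimum can be worked out by hand as `x_i = a_i + (b - mean(a)) / 2`, which gives `[0, 1, 2, 3]` for the values above, so the iterate can be checked against it.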

preprint2020arXiv

Distributed Online Convex Optimization with an Aggregative Variable

This paper investigates distributed online convex optimization in the presence of an aggregative variable without any global/central coordinators over a multi-agent network, where each individual agent is only able to access partial information of time-varying global loss functions, thus requiring local information exchanges between neighboring agents. Motivated by many practical applications, the considered local loss functions depend not only on their own decision variables, but also on an aggregative variable, such as the average of all decision variables. To handle this problem, an Online Distributed Gradient Tracking algorithm (O-DGT) is proposed with exact gradient information, and it is shown that the dynamic regret is upper bounded by three terms: a sublinear term, a path variation term, and a gradient variation term. Meanwhile, the O-DGT algorithm is also analyzed with stochastic/noisy gradients, showing that the expected dynamic regret has the same upper bound as in the exact gradient case. To the best of our knowledge, this paper is the first to study online convex optimization in the presence of an aggregative variable, which enjoys new characteristics in comparison with the conventional scenario without the aggregative variable. Finally, a numerical experiment is provided to corroborate the obtained theoretical results.

preprint2020arXiv

Distributed Online Optimization for Multi-Agent Networks with Coupled Inequality Constraints

This paper investigates the distributed online optimization problem over a multi-agent network subject to local set constraints and coupled inequality constraints, which has applications in many areas, such as wireless sensor networks, power systems and plug-in electric vehicles. In this problem, the cost function at each time step is the sum of local cost functions, each of which is gradually revealed to its corresponding agent, while only the local functions in the coupled inequality constraints are accessible to each agent. To address this problem, a modified primal-dual algorithm, called the distributed online primal-dual push-sum algorithm (DOPP), is developed in this paper, which does not rest on any assumption on parameter boundedness and is applicable to unbalanced networks. It is shown that the proposed algorithm achieves sublinear dynamic regret and sublinear violation of the coupled inequality constraints. Finally, the theoretical results are supported by a simulation example.

preprint2020arXiv

Distributed Proximal Algorithms for Multi-Agent Optimization with Coupled Inequality Constraints

This paper aims to address distributed optimization problems over directed and time-varying networks, where the global objective function consists of a sum of locally accessible convex objective functions subject to a feasible set constraint and coupled inequality constraints whose information is only partially accessible to each agent. For this problem, a distributed proximal-based algorithm, called distributed proximal primal-dual (DPPD) algorithm, is proposed based on the celebrated centralized proximal point algorithm. It is shown that the proposed algorithm can lead to the global optimal solution with a general stepsize, which is diminishing and non-summable, but not necessarily square-summable, and the saddle-point running evaluation error vanishes proportionally to $O(1/\sqrt{k})$, where $k>0$ is the iteration number. Finally, a simulation example is presented to corroborate the effectiveness of the proposed algorithm.
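
As a concrete instance of such a stepsize (an illustrative choice, not one stated in the abstract), $\alpha_k = 1/\sqrt{k}$ is diminishing and non-summable but not square-summable:

$$\alpha_k = \frac{1}{\sqrt{k}} \to 0, \qquad \sum_{k=1}^{\infty} \alpha_k = \infty, \qquad \sum_{k=1}^{\infty} \alpha_k^2 = \sum_{k=1}^{\infty} \frac{1}{k} = \infty.$$

By contrast, $\alpha_k = 1/k$ is diminishing, non-summable and also square-summable, since $\sum_{k=1}^{\infty} 1/k^2 < \infty$.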

preprint2020arXiv

Efficient Trajectory Planning for Multiple Non-holonomic Mobile Robots via Prioritized Trajectory Optimization

In this paper, we present a novel approach to efficiently generate collision-free optimal trajectories for multiple non-holonomic mobile robots in obstacle-rich environments. Our approach first employs a graph-based multi-agent path planner to find an initial discrete solution, and then refines this solution into smooth trajectories using nonlinear optimization. We divide the robot team into small groups and propose a prioritized trajectory optimization method to improve the scalability of the algorithm. Infeasible sub-problems may arise in some scenarios because of the decoupled optimization framework. To handle this problem, a novel grouping and priority assignment strategy is developed to increase the probability of finding feasible trajectories. Compared to the coupled trajectory optimization, the proposed approach reduces the computation time considerably with a small impact on the optimality of the plans. Simulations and hardware experiments verified the effectiveness and superiority of the proposed approach.

preprint2020arXiv

Graph Optimization Approach to Range-based Localization

In this paper, we propose a general graph-optimization-based framework for localization, which can accommodate different types of measurements with varying measurement time intervals. Special emphasis is placed on range-based localization. Range and trajectory smoothness constraints are constructed in a position graph, and the robot trajectory over a sliding window is then estimated by a graph-based optimization algorithm. Moreover, convergence analysis of the algorithm is provided, and the effects of the number of iterations and the window size in the optimization on the localization accuracy are analyzed. Extensive experiments on a quadcopter under a variety of scenarios verify the effectiveness of the proposed algorithm and demonstrate a much higher localization accuracy than existing range-based localization methods, especially in the altitude direction.

preprint2020arXiv

Kervolutional Neural Networks

Convolutional neural networks (CNNs) have enabled state-of-the-art performance in many computer vision tasks. However, little effort has been devoted to establishing convolution in non-linear space. Existing works mainly rely on activation layers, which can only provide point-wise non-linearity. To solve this problem, a new operation, kervolution (kernel convolution), is introduced to approximate complex behaviors of human perception systems by leveraging the kernel trick. It generalizes convolution, enhances the model capacity, and captures higher-order interactions of features via patch-wise kernel functions, without introducing additional parameters. Extensive experiments show that kervolutional neural networks (KNN) achieve higher accuracy and faster convergence than baseline CNNs.
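
The patch-wise idea can be sketched in one dimension: ordinary convolution takes the inner product of each patch with the filter, while a kernelized variant replaces that inner product with a kernel function of the same two vectors. The function names, the 1D setting, and the polynomial kernel parameters below are illustrative assumptions, not the paper's implementation.

```python
def linear_kernel(patch, weights):
    """Plain inner product; with this kernel the operation is ordinary
    (cross-correlation style) convolution."""
    return sum(p * w for p, w in zip(patch, weights))


def polynomial_kernel(patch, weights, offset=1.0, degree=2):
    """Polynomial kernel (inner product + offset) ** degree.

    offset and degree are illustrative hyperparameters; no extra learnable
    weights are introduced beyond the filter itself.
    """
    return (linear_kernel(patch, weights) + offset) ** degree


def kervolve1d(signal, weights, kernel):
    """Slide the filter over the signal and apply the kernel patch by patch
    (valid mode, stride 1)."""
    width = len(weights)
    return [
        kernel(signal[i:i + width], weights)
        for i in range(len(signal) - width + 1)
    ]


signal = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 1.0]
linear_out = kervolve1d(signal, weights, linear_kernel)
poly_out = kervolve1d(signal, weights, polynomial_kernel)
```

Here the linear kernel yields the patch sums 3, 5 and 7, and the polynomial kernel maps each of them through `(value + 1) ** 2`, giving a non-linear response over the whole patch with the same two filter weights.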

preprint2020arXiv

Multi-Path Region Mining For Weakly Supervised 3D Semantic Segmentation on Point Clouds

Point clouds provide intrinsic geometric information and surface context for scene understanding. Existing methods for point cloud segmentation require a large amount of fully labeled data. With advanced depth sensors, collecting large-scale 3D datasets is no longer a cumbersome process. However, manually producing point-level labels on large-scale datasets is time- and labor-intensive. In this paper, we propose a weakly supervised approach to predict point-level results using weak labels on 3D point clouds. We introduce our multi-path region mining module to generate pseudo point-level labels from a classification network trained with weak labels. It mines the localization cues for each class from various aspects of the network features using different attention modules. Then, we use the point-level pseudo labels to train a point cloud segmentation network in a fully supervised manner. To the best of our knowledge, this is the first method that uses cloud-level weak labels on raw 3D space to train a point cloud semantic segmentation network. In our setting, the 3D weak labels only indicate the classes that appear in our input sample. We discuss both scene-level and subcloud-level weak labels on raw 3D point cloud data and perform in-depth experiments on them. On the ScanNet dataset, our result trained with subcloud-level labels is comparable with some fully supervised methods.

preprint2020arXiv

Online Visual Place Recognition via Saliency Re-identification

As an essential component of visual simultaneous localization and mapping (SLAM), place recognition is crucial for robot navigation and autonomous driving. Existing methods often formulate visual place recognition as feature matching, which is computationally expensive for many robotic applications with limited computing power, e.g., autonomous driving and cleaning robots. Inspired by the fact that human beings recognize a place by remembering salient regions or landmarks that are more attractive or interesting than others, we formulate visual place recognition as saliency re-identification. Meanwhile, we propose to perform both saliency detection and re-identification in the frequency domain, in which all operations become element-wise. The experiments show that our proposed method achieves competitive accuracy and much higher speed than the state-of-the-art feature-based methods. The proposed method is open-sourced and available at https://github.com/wh200720041/SRLCD.git.

preprint2020arXiv

Optimal Local and Remote Controls of Multiple Systems with Multiplicative Noises and Unreliable Uplink Channels

In this paper, the optimal local and remote linear quadratic (LQ) control problem is studied for a networked control system (NCS) consisting of multiple subsystems, each of which is described by a general multiplicative-noise stochastic system with one local controller and one remote controller. Due to the unreliable uplink channels, the remote controller can only access unreliable state information of all subsystems, while the downlink channels from the remote controller to the local controllers are perfect. The difficulties of the LQ control problem for such a system arise from the different information structures of the local controllers and the remote controller. By developing the Pontryagin maximum principle, the necessary and sufficient solvability conditions are derived, which are based on the solution to a group of forward and backward difference equations (G-FBSDEs). Furthermore, by proposing a new method to decouple the G-FBSDEs and introducing new coupled Riccati equations (CREs), the optimal control strategies are derived, and we verify that the separation principle holds for multiplicative-noise NCSs with packet dropouts. This paper can be seen as an important contribution to the optimal control problem with asymmetric information structures.

preprint2020arXiv

Temporal Logic Trees for Model Checking and Control Synthesis of Uncertain Discrete-time Systems

We propose algorithms for performing model checking and control synthesis for discrete-time uncertain systems under linear temporal logic (LTL) specifications. We construct temporal logic trees (TLT) from LTL formulae via reachability analysis. In contrast to automaton-based methods, the construction of the TLT is abstraction-free for infinite systems, that is, we do not construct discrete abstractions of the infinite systems. Moreover, for a given transition system and an LTL formula, we prove that there exist both a universal TLT and an existential TLT via minimal and maximal reachability analysis, respectively. We show that the universal TLT is an underapproximation for the LTL formula and the existential TLT is an overapproximation. We provide sufficient conditions and necessary conditions to verify whether a transition system satisfies an LTL formula by using the TLT approximations. As a major contribution of this work, for a controlled transition system and an LTL formula, we prove that a controlled TLT can be constructed from the LTL formula via control-dependent reachability analysis. Based on the controlled TLT, we design an online control synthesis algorithm, under which a set of feasible control inputs can be generated at each time step. We also prove that this algorithm is recursively feasible. We illustrate the proposed methods for both finite and infinite systems and highlight the generality and online scalability with two simulated examples.

preprint2020arXiv

Towards Stable and Comprehensive Domain Alignment: Max-Margin Domain-Adversarial Training

Domain adaptation tackles the problem of transferring knowledge from a label-rich source domain to a label-scarce or even unlabeled target domain. Recently, domain-adversarial training (DAT) has shown promising capacity to learn a domain-invariant feature space by reversing the gradient propagation of a domain classifier. However, DAT is still vulnerable in several aspects, including (1) training instability due to the overwhelming discriminative ability of the domain classifier in adversarial training, (2) restrictive feature-level alignment, and (3) lack of interpretability or systematic explanation of the learned feature space. In this paper, we propose a novel Max-margin Domain-Adversarial Training (MDAT) by designing an Adversarial Reconstruction Network (ARN). The proposed MDAT stabilizes the gradient reversing in ARN by replacing the domain classifier with a reconstruction network, and in this manner ARN conducts both feature-level and pixel-level domain alignment without involving extra network structures. Furthermore, ARN demonstrates strong robustness to a wide range of hyper-parameter settings, greatly alleviating the task of model selection. Extensive empirical results validate that our approach outperforms other state-of-the-art domain alignment methods. Moreover, reconstructing adapted features reveals a domain-invariant feature space that conforms with our intuition.