Source author record

You Li

You Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

eess.SP, Robotics, Artificial Intelligence, Computer Vision, Machine Learning, Computation and Language, Computer Science and Game Theory, eess.SY, physics.optics, Systems and Control

Catalog footprint

What is connected

14 works

10 topics

4 close collaborators


preprint, 2026, arXiv

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

Recent progress in multimodal large language models (MLLMs) has brought AI capabilities from static offline data processing to real-time streaming interaction, yet they still remain far from human-level multimodal interaction. The key bottlenecks are no longer modality coverage or latency alone, but the interaction paradigm itself. First, perception and response are still separated into alternating phases, preventing models from incorporating new inputs for timely adjustment during generation. Second, most current models remain reactive, responding only to explicit user requests instead of acting proactively in the evolving multimodal environment. We present MiniCPM-o 4.5, our latest effort towards human-like multimodal interaction, which mitigates these gaps by real-time full-duplex omni-modal interaction. It can see, listen, and speak simultaneously in real-time, while also exhibiting proactive behaviors such as issuing reminders or comments based on its continuous understanding of the live scene. The key technique behind MiniCPM-o 4.5 is Omni-Flow, a unified streaming framework that aligns omni-modal inputs and outputs along a shared temporal axis. This formulation converts conventional turn-based interaction into a full-duplex, time-aligned process, enabling simultaneous perception and response and allowing proactive behavior to arise within the same framework. With a total of 9B parameters, MiniCPM-o 4.5 approaches Gemini 2.5 Flash in vision-language capabilities, delivering state-of-the-art open-source performance at its scale. It also surpasses Qwen3-Omni-30B-A3B in omni-modal understanding and delivers better speech generation, with significantly higher computation efficiency. Driven by its efficient architecture design and inference optimization, the model can perform real-time full-duplex omni-modal interaction on edge devices with less than 12GB RAM cost.

preprint, 2022, arXiv

3D ToF LiDAR in Mobile Robotics: A Review

In the past ten years, the use of 3D Time-of-Flight (ToF) LiDARs in mobile robotics has grown rapidly. Based on our accumulation of relevant research, this article systematically reviews and analyzes the use of 3D ToF LiDARs in research and industrial applications. The former includes object detection, robot localization, long-term autonomy, LiDAR data processing under adverse weather conditions, and sensor fusion. The latter encompasses service robots, assisted and autonomous driving, and recent applications performed in response to public health crises. We hope that our efforts can effectively provide readers with relevant references and promote the deployment of existing mature technologies in real-world systems.

preprint, 2022, arXiv

Incentivizing Federated Learning

Federated Learning is an emerging distributed collaborative learning paradigm used by many applications nowadays. The effectiveness of federated learning relies on clients' collective efforts and their willingness to contribute local data. However, due to privacy concerns and the costs of data collection and model training, clients may not always contribute all the data they possess, which would negatively affect the performance of the global model. This paper presents an incentive mechanism that encourages clients to contribute as much data as they can obtain. Unlike previous incentive mechanisms, our approach does not monetize data. Instead, we implicitly use model performance as a reward, i.e., significant contributors are paid off with better models. We theoretically prove that, under certain conditions, clients will use as much data as they can possibly possess to participate in federated learning with our incentive mechanism.

preprint, 2022, arXiv

Supporting GNSS Baseband Using Smartphone IMU and Ultra-Tight Integration

A great surge in the development of global navigation satellite systems (GNSS) has opened up potential in many state-of-the-art technologies, e.g., autonomous ground vehicle navigation. Nevertheless, GNSS is vulnerable to various ground interferences, which significantly disrupt the continuity of the navigation system. Meanwhile, next-generation GNSS-based navigation devices are being developed to be smaller, cheaper, and lighter, as the commercial market forecasts. This work aims to answer whether the smartphone inertial measurement unit (IMU) is sufficient to support the GNSS baseband. Thus, a cascaded ultra-tightly coupled GNSS/inertial navigation system (INS) technique, in which consumer-level smartphone sensors are used, is applied to improve the baseband of GNSS software-defined radios (SDRs). A Doppler value is predicted based on an integrated extended Kalman filter (EKF) navigator where the pseudorange-state-based measurements of GNSS and INS are fused. It is used to assist numerically controlled oscillators (NCOs) in the GNSS baseband. Then, an ultra-tight integration platform is built with the upgraded GNSS SDR, whose baseband processing is integrated with INS mechanization. Finally, tracking and carrier-based positioning performances are assessed on the proposed platform for the smartphone-IMU-aided GNSS baseband via kinematic field tests. The experimental results show that extra hardware costing only a few dollars, rather than more expensive units, can improve the GNSS baseband efficiently.
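The Doppler-aiding step mentioned in this abstract can be illustrated with the standard geometric relation between range rate and carrier Doppler. The sketch below is not the paper's implementation: the function name `predict_doppler` and its arguments are hypothetical, the receiver state is assumed to come from some integrated navigator, and receiver and satellite clock drift are ignored.

```python
import math

C = 299_792_458.0   # speed of light in m/s
F_L1 = 1_575.42e6   # GPS L1 carrier frequency in Hz

def predict_doppler(p_rcv, v_rcv, p_sat, v_sat, carrier_hz=F_L1):
    """Predict carrier Doppler (Hz) from receiver and satellite states.

    Positions (m) and velocities (m/s) are 3-element sequences in one
    common Earth-fixed frame. Clock drift is ignored (assumption).
    """
    # Line-of-sight vector from receiver to satellite.
    los = [s - r for s, r in zip(p_sat, p_rcv)]
    rng = math.sqrt(sum(d * d for d in los))
    unit = [d / rng for d in los]
    # Range rate: relative velocity projected on the line of sight.
    range_rate = sum((vs - vr) * u for vs, vr, u in zip(v_sat, v_rcv, unit))
    # An increasing range lowers the received frequency.
    return -carrier_hz / C * range_rate
```

In an aided tracking loop, a value like this would be fed to the carrier NCO so that the loop only has to track the residual dynamics.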

preprint, 2021, arXiv

Fusion of neural networks, for LIDAR-based evidential road mapping

LIDAR sensors are usually used to provide autonomous vehicles with 3D representations of their environment. In ideal conditions, geometrical models could detect the road in LIDAR scans, at the cost of a manual tuning of numerical constraints, and a lack of flexibility. We instead propose an evidential pipeline to accumulate road detection results obtained from neural networks. First, we introduce RoadSeg, a new convolutional architecture that is optimized for road detection in LIDAR scans. RoadSeg is used to classify individual LIDAR points as either belonging to the road, or not. Yet, such point-level classification results need to be converted into a dense representation that can be used by an autonomous vehicle. We thus secondly present an evidential road mapping algorithm that fuses consecutive road detection results. We benefitted from a reinterpretation of logistic classifiers, which can be seen as generating a collection of simple evidential mass functions. An evidential grid map that depicts the road can then be obtained by projecting the classification results from RoadSeg into grid cells, and by handling moving objects via conflict analysis. The system was trained and evaluated on real-life data. A Python implementation maintains a 10 Hz framerate. Since road labels were needed for training, a soft labelling procedure, relying on lane-level HD maps, was used to generate coarse training and validation sets. An additional test set was manually labelled for evaluation purposes. So as to reach satisfactory results, the system fuses road detection results obtained from three variants of RoadSeg, processing different LIDAR features.
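To make the idea of fusing evidential mass functions and measuring conflict concrete, here is a minimal sketch of Dempster's rule of combination on a two-class frame {road, not_road}. It is a generic textbook rule, not the RoadSeg pipeline itself: the dictionary layout and the name `combine_binary` are assumptions made for illustration.

```python
def combine_binary(m1, m2):
    """Combine two mass functions on {road, not_road} with Dempster's rule.

    Each mass function is a dict with keys 'road', 'not_road' and
    'unknown' (mass on the whole frame); the three values sum to 1.
    Returns the fused mass function and the conflict mass.
    """
    # Conflict: one source supports 'road', the other 'not_road'.
    conflict = m1["road"] * m2["not_road"] + m1["not_road"] * m2["road"]
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    norm = 1.0 - conflict
    road = (m1["road"] * m2["road"]
            + m1["road"] * m2["unknown"]
            + m1["unknown"] * m2["road"]) / norm
    not_road = (m1["not_road"] * m2["not_road"]
                + m1["not_road"] * m2["unknown"]
                + m1["unknown"] * m2["not_road"]) / norm
    unknown = m1["unknown"] * m2["unknown"] / norm
    return {"road": road, "not_road": not_road, "unknown": unknown}, conflict
```

A grid cell whose successive observations produce a high conflict value is a natural candidate for having been crossed by a moving object, which is one way to read the "conflict analysis" mentioned above.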

preprint, 2020, arXiv

A General Architecture for Behavior Modeling of Nonlinear Power Amplifier using Deep Convolutional Neural Network

Nonlinearity of the power amplifier is one of the major limitations on the achievable capacity in wireless transmission systems. Nonlinear impairments are determined by the nonlinear distortions of the power amplifier and modulator imperfections. The Volterra model, several compact Volterra models, and neural network models have all been demonstrated for establishing a nonlinear model of the power amplifier. However, the computational cost of these models increases, and their implementation demands more signal processing resources, as the signal bandwidth gets wider or the number of aggregated carriers grows. A completely different approach uses a deep convolutional neural network to learn the nonlinear distortion from training data. In this work, a low-complexity, general architecture based on the deep real-valued convolutional neural network (DRVCNN) is proposed to model the nonlinear behavior of the power amplifier. With each of the multiple inputs equivalent to an input vector, the DRVCNN tensor weights are constructed from training data thanks to the current and historical envelope-dependent terms, I, and Q, which are components of the input. The effectiveness of the general framework in modeling single-carrier and multi-carrier power amplifiers is verified.

preprint, 2020, arXiv

Autonomous Calibration of MEMS Gyros in Consumer Portable Devices

This paper presents a real-time calibration method for gyro sensors in consumer portable devices. The calibration happens automatically without the need for external equipment or user intervention. Multi-level constraints, including the pseudo-observations, the accelerometer and magnetometer measurements, and the quasi-static attitude updates, are used to make the method reliable and accurate under natural user motions. Walking tests with the Samsung Galaxy S3 and S4 smartphones showed that the method produces promising calibration results even under challenging motion modes such as dangling and pocket, and in challenging indoor environments with frequent magnetic interferences.

preprint, 2020, arXiv

Deep Reinforcement Learning (DRL): Another Perspective for Unsupervised Wireless Localization

Location is key to spatialize internet-of-things (IoT) data. However, it is challenging to use low-cost IoT devices for robust unsupervised localization (i.e., localization without training data that have known location labels). Thus, this paper proposes a deep reinforcement learning (DRL) based unsupervised wireless-localization method. The main contributions are as follows. (1) This paper proposes an approach to model a continuous wireless-localization process as a Markov decision process (MDP) and process it within a DRL framework. (2) To alleviate the challenge of obtaining rewards when using unlabeled data (e.g., daily-life crowdsourced data), this paper presents a reward-setting mechanism, which extracts robust landmark data from unlabeled wireless received signal strengths (RSS). (3) To ease requirements for model re-training when using DRL for localization, this paper uses RSS measurements together with agent location to construct DRL inputs. The proposed method was tested by using field testing data from multiple Bluetooth 5 smart ear tags in a pasture. Meanwhile, the experimental verification process reflected the advantages and challenges for using DRL in wireless localization.

preprint, 2020, arXiv

Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning

Camera-based end-to-end driving neural networks bring the promise of a low-cost system that maps camera images to driving control commands. These networks are appealing because they replace laborious hand-engineered building blocks, but their black-box nature makes them difficult to delve into in case of failure. Recent works have shown the importance of using an explicit intermediate representation that has the benefits of increasing both the interpretability and the accuracy of networks' decisions. Nonetheless, these camera-based networks reason in camera view, where scale is not homogeneous and hence not directly suitable for motion forecasting. In this paper, we introduce a novel monocular camera-only holistic end-to-end trajectory planning network with a Bird-Eye-View (BEV) intermediate representation that comes in the form of binary Occupancy Grid Maps (OGMs). To ease the prediction of OGMs in BEV from camera images, we introduce a novel scheme where the OGMs are first predicted as semantic masks in camera view and then warped in BEV using the homography between the two planes. The key element allowing this transformation to be applied to 3D objects such as vehicles consists in predicting solely their footprint in camera view, hence respecting the flat world hypothesis implied by the homography.
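The camera-view to BEV warp described here relies on a planar homography. As a minimal sketch of that operation, the snippet below maps 2D points (for example, the corners of a vehicle footprint) through a 3x3 homography matrix. The function name `warp_points` and the example matrices are hypothetical; a real system would use a homography obtained from camera calibration and would warp a full mask rather than a few points.

```python
import numpy as np

def warp_points(H, pts):
    """Map (N, 2) points through a 3x3 homography H.

    Points are lifted to homogeneous coordinates, multiplied by H,
    then divided by the third coordinate to return to the plane.
    """
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = homog @ np.asarray(H, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]
```

Because a homography only relates two planes, it is valid for points lying on the ground, which is why the abstract restricts the prediction to object footprints.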

preprint, 2020, arXiv

Inertial Sensing Meets Artificial Intelligence: Opportunity or Challenge?

The inertial navigation system (INS) has been widely used to provide self-contained and continuous motion estimation in intelligent transportation systems. Recently, the emergence of chip-level inertial sensors has expanded the relevant applications from positioning, navigation, and mobile mapping to location-based services, unmanned systems, and transportation big data. Meanwhile, benefiting from the emergence of big data and the improvement of algorithms and computing power, artificial intelligence (AI) has become a consensus tool that has been successfully applied in various fields. This article reviews the research on using AI technology to enhance inertial sensing from various aspects, including sensor design and selection, calibration and error modeling, navigation and motion-sensing algorithms, multi-sensor information fusion, system evaluation, and practical application. Based on over 30 representative articles selected from nearly 300 related publications, this article summarizes the state of the art, advantages, and challenges of each aspect. Finally, it summarizes nine advantages and nine challenges of AI-enhanced inertial sensing and then points out future research directions.

preprint, 2020, arXiv

Lidar for Autonomous Driving: The principles, challenges, and trends for automotive lidar and perception systems

Autonomous vehicles rely on their perception systems to acquire information about their immediate surroundings. It is necessary to detect the presence of other vehicles, pedestrians and other relevant entities. Safety concerns and the need for accurate estimations have led to the introduction of Light Detection and Ranging (LiDAR) systems to complement camera- or radar-based perception systems. This article presents a review of state-of-the-art automotive LiDAR technologies and the perception algorithms used with those technologies. LiDAR systems are introduced first by analyzing the main components, from the laser transmitter to its beam scanning mechanism. Advantages, disadvantages, and the current status of various solutions are introduced and compared. Then, the specific perception pipeline for LiDAR data processing, from an autonomous vehicle perspective, is detailed. The model-driven approaches and the emerging deep learning solutions are reviewed. Finally, we provide an overview of the limitations, challenges and trends for automotive LiDARs and perception systems.

preprint, 2020, arXiv

Towards Robust Crowdsourcing-Based Localization: A Fingerprinting Accuracy Indicator Enhanced Wireless/Magnetic/Inertial Integration Approach

Next-generation internet of things (IoT) systems have an increasing demand for intelligent localization that can scale with big data without human perception. Thus, traditional localization solutions without an accuracy metric will greatly limit vast applications. Crowdsourcing-based localization has been proven to be effective for mass-market location-based IoT applications. This paper proposes an enhanced crowdsourcing-based localization method by integrating inertial, wireless, and magnetic sensors. Both wireless and magnetic fingerprinting accuracy are predicted in real time through the introduction of fingerprinting accuracy indicators (FAI) from three levels (i.e., signal, geometry, and database). The advantages and limitations of these FAI factors and their performances on predicting location errors and outliers are investigated. Furthermore, the FAI-enhanced extended Kalman filter (EKF) is proposed, which improved the dead-reckoning (DR)/WiFi, DR/Magnetic, and DR/WiFi/Magnetic integrated localization accuracy by 30.2 %, 19.4 %, and 29.0 %, and reduced the maximum location errors by 41.2 %, 28.4 %, and 44.2 %, respectively. These outcomes confirm the effectiveness of the FAI-enhanced EKF in improving both accuracy and reliability of multi-sensor integrated localization using crowdsourced data.

preprint, 2020, arXiv

What happens to a ToF LiDAR in fog?

This article focuses on analyzing the performance of a typical time-of-flight (ToF) LiDAR in a foggy environment. By controlling the fog density within the CEREMA Adverse Weather Facility, the relations between the ranging performance and fog are both qualitatively and quantitatively investigated. Furthermore, based on the collected data, a machine-learning-based model is trained to predict the minimum fog visibility that allows successful ranging for this type of LiDAR. The experimental results and methods presented are helpful for ToF LiDAR specifications in the automotive industry.

preprint, 2015, arXiv

Large-angle quasi-self-collimation effect in a rod-type photonic crystal

A rod-type photonic crystal (PC) with a rectangular lattice shows a large-angle quasi-self-collimation (quasi-SC) effect by changing the symmetry of its rectangular lattice to straighten one of the isofrequency contours. To investigate the straightness of the isofrequency contour as well as the quasi-SC effect, we propose a straightness factor L based on the method of least squares. With L smaller than L0 (L0 = 0.01 is the critical value), the isofrequency contour is sufficiently straight to induce the quasi-SC effect, with the beam quasi-collimating in the structure. Furthermore, the efficiency of light coupling to the quasi-SC PC is studied, and can be greatly improved by applying a carefully designed antireflection layer.
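The abstract does not give the exact definition of the straightness factor L, so the following is only one plausible reading of "based on the method of least squares": fit a straight line to sampled contour points and take the root-mean-square residual. The function name `straightness_factor`, the choice of RMS residual, and the lack of any normalization are all assumptions; only the threshold L0 = 0.01 comes from the text.

```python
import numpy as np

L0 = 0.01  # critical value quoted in the abstract

def straightness_factor(kx, ky):
    """RMS residual of a least-squares line fit to contour samples.

    kx, ky are 1D arrays of wavevector components sampled along one
    isofrequency contour. This definition is illustrative only.
    """
    kx = np.asarray(kx, dtype=float)
    ky = np.asarray(ky, dtype=float)
    # Degree-1 polynomial fit returns (slope, intercept).
    slope, intercept = np.polyfit(kx, ky, 1)
    residuals = ky - (slope * kx + intercept)
    return float(np.sqrt(np.mean(residuals ** 2)))
```

Under this reading, a contour would count as straight enough for quasi-SC when `straightness_factor(kx, ky) < L0`, with the caveat that the numerical scale depends on how the wavevectors are normalized.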
