Researcher profile

Lei Gao

Lei Gao contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
19 works
0 followers
14 topics
4 close collaborators

Published work

19 published items

preprint | 2026 | arXiv

Segment-Aligned Policy Optimization for Multi-Modal Reasoning

Existing reinforcement learning approaches for Large Language Models typically perform policy optimization at the granularity of individual tokens or entire response sequences. However, such formulations often misalign with the natural step-wise structure of reasoning processes, leading to suboptimal credit assignment and unstable training in multi-modal reasoning tasks. To bridge this gap, we propose Segment-Aligned Policy Optimization (SAPO), a novel reinforcement learning paradigm that treats coherent reasoning steps, rather than tokens or full sequences, as the fundamental units of policy update. SAPO introduces a step-wise Markov decision process abstraction over reasoning segments, accompanied by segment-level value estimation, advantage computation, and importance sampling mechanisms that are semantically aligned with reasoning boundaries. Experiments on representative reasoning benchmarks demonstrate that SAPO consistently outperforms token-level and sequence-level policy optimization methods, achieving significant accuracy improvements while exhibiting better training stability and value estimation consistency. Our work underscores the importance of aligning reinforcement learning updates with the intrinsic structure of reasoning, paving the way for more efficient and semantically grounded policy optimization in complex reasoning tasks. Code and models will be released to ensure full reproducibility.

preprint | 2025 | arXiv

AcoustoBots: A swarm of robots for acoustophoretic multimodal interactions

Acoustophoresis has enabled novel interaction capabilities, such as levitation, volumetric displays, mid-air haptic feedback, and directional sound generation, to open new forms of multimodal interactions. However, its traditional implementation as a singular static unit limits its dynamic range and application versatility. This paper introduces AcoustoBots - a novel convergence of acoustophoresis with a movable and reconfigurable phased array of transducers for enhanced application versatility. We mount a phased array of transducers on a swarm of robots to harness the benefits of multiple mobile acoustophoretic units. This offers a more flexible and interactive platform that enables a swarm of acoustophoretic multimodal interactions. Our novel AcoustoBots design includes a hinge actuation system that controls the orientation of the mounted phased array of transducers to achieve high flexibility in a swarm of acoustophoretic multimodal interactions. In addition, we designed a BeadDispenserBot that can deliver particles to trapping locations, which automates the acoustic levitation interaction. These attributes allow AcoustoBots to independently work for a common cause and interchange between modalities, allowing for novel augmentations (e.g., a swarm of haptics, audio, and levitation) and bilateral interactions with users in an expanded interaction area. We detail our design considerations, challenges, and methodological approach to extend acoustophoretic central control in distributed settings. This work demonstrates a scalable acoustic control framework with two mobile robots, laying the groundwork for future deployment in larger robotic swarms. Finally, we characterize the performance of our AcoustoBots and explore the potential interactive scenarios they can enable.

preprint | 2022 | arXiv

Early systems change necessary for catalyzing long-term sustainability in a post-2030 agenda

Progress to-date towards the Sustainable Development Goals (SDGs) has fallen short of expectations and is unlikely to fully meet 2030 targets. Despite the little chance of imminent success, past assessments have mostly focused on short- and medium-term evaluations, limiting the ability to explore the longer-term effects of systemic interactions with time lags and delay. Here we undertake global systems modelling with a longer-term view than previous assessments to explore the drivers of sustainability progress and how they could emerge by 2030, 2050, and 2100 under different development pathways and towards quantitative targets. We find that early planning for systems change to shift from business-as-usual to more sustainable pathways is important for accelerating progress towards increasingly ambitious targets by 2030, 2050, and 2100. These findings indicate the importance of adopting longer-term timeframes and pathways to ensure that the necessary pre-conditions are in place for sustainability beyond the current 2030 Agenda.

preprint | 2022 | arXiv

Federated Learning for Internet of Things: Applications, Challenges, and Opportunities

Billions of IoT devices will be deployed in the near future, taking advantage of faster Internet speeds and the possibility of orders of magnitude more endpoints brought by 5G/6G. With the growth of IoT devices, vast quantities of data that may contain users' private information will be generated. The high communication and storage costs, mixed with privacy concerns, will increasingly challenge the traditional ecosystem of centralized over-the-cloud learning and processing for IoT platforms. Federated Learning (FL) has emerged as the most promising alternative approach to this problem. In FL, training data-driven machine learning models is an act of collaboration between multiple clients without requiring the data to be brought to a central point, hence alleviating communication and storage costs and providing a great degree of user-level privacy. However, implementing real FL systems on IoT networks still poses challenges. In this paper, we discuss the opportunities and challenges of FL in IoT platforms, as well as how it can enable diverse IoT applications. In particular, we identify and discuss seven critical challenges of FL in IoT platforms and highlight some recent promising approaches towards addressing them.
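The collaboration described in this abstract, where clients train on their own data and only model parameters reach a central point, can be illustrated with a small sketch. It assumes federated averaging (a sample-count-weighted mean of client parameters) as the aggregation rule and a toy least-squares model; the abstract names neither, and the function names `federated_average`, `local_update`, and `run_round` are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def local_update(weights, data, lr=0.1):
    """One gradient step of least-squares regression (y ~ w.x) on a client's own data."""
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(data)
    return [w - lr * g for w, g in zip(weights, grad)]


def run_round(global_weights, client_datasets):
    """One round: each client trains locally, then the server averages the results.

    Only parameter vectors cross the client boundary; raw data stays local.
    """
    updates = [local_update(global_weights, d) for d in client_datasets]
    sizes = [len(d) for d in client_datasets]
    return federated_average(updates, sizes)


if __name__ == "__main__":
    # Two hypothetical clients whose data follow y = 2x.
    clients = [
        [((1.0,), 2.0), ((2.0,), 4.0)],
        [((3.0,), 6.0)],
    ]
    w = [0.0]
    for _ in range(20):
        w = run_round(w, clients)
    print(w)
```

Weighting by sample count lets clients with more data influence the global model proportionally; real IoT deployments would also need to handle stragglers, communication limits, and non-identical data, which this sketch ignores.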

preprint | 2022 | arXiv

On the existence of pure, broadband toroidal sources in electrodynamics

Multipoles are paramount for describing electromagnetic fields in many areas of nanoscale optics, playing an essential role for the design of devices in plasmonics and all-dielectric nanophotonics. Challenging the traditional division into electric and magnetic moments, toroidal moments have been proposed as a physically distinct family of multipoles with significant contributions to the properties of matter. However, the apparent impossibility of separately measuring their response sheds doubt on their true physical significance. Here, we confirm the possibility of selectively exciting toroidal moments without any other multipole. We develop a set of general conditions that any current distribution must fulfill to be entirely described by toroidal moments, and prove our results in an analytically solvable case. Our new theory allows us to design and verify experimentally for the first time an artificial structure supporting a pure, broadband toroidal dipole response in the complete absence of the electric dipole and other 'ordinary' multipole contributions. In addition, we propose a structure capable of supporting a novel type of nonradiating source, a 'toroidal anapole', originating from the destructive interference of the toroidal dipole with the unconventional electromagnetic sources known as mean square radii. The results in this work provide conclusive evidence on the independent excitation of toroidal moments in electrodynamics.

preprint | 2022 | arXiv

Single-shot measurement of frequency-resolved state of polarization dynamics in ultrafast lasers using dispersed division-of-amplitude

Precise measurement of multiple parameters of ultrafast lasers is vital in both scientific investigations and technical applications, such as optical field manipulation, pulse shaping, sample characterization, and biomedical imaging. Tremendous progress has been made in measuring the parameters of ultrafast lasers, including single-shot spectra acquired by time-stretch dispersive Fourier transform in the spectral domain, and pulse magnification or compression realized by a time lens in the temporal domain. Nevertheless, single-shot measurement of frequency-resolved states of polarization (SOPs) of ultrafast lasers has not been reported so far, and the irregular SOP evolution dynamics in ultrafast pulses remain largely unexplored. Here, we demonstrate a new single-shot frequency-resolved SOP measurement system that uses the division-of-amplitude method under the far-field approximation. Large dispersion is used to time-stretch the laser pulses, so that the spectral information is mapped onto the temporal waveform via dispersive Fourier transform. By calibrating the system matrix at different wavelengths, precise frequency-resolved SOPs are obtained together with high-speed optoelectronic detection. We demonstrate applications in the direct measurement of transient mode-locked fiber laser dynamics. We observe complex frequency-dependent SOP dynamics in the build-up of dissipative solitons, and an apparent discrepancy in SOPs between the sidebands and the main peak in conventional solitons. Our observations reveal that the SOP plays a far more complex part in the mode-locking process than the traditional viewpoint suggests. By taking advantage of broadband achromatic optical elements, this method can be extended to the measurement of pulsed lasers with much broader bandwidths, which will pave the way for reliable measurement and precise control of ultrafast lasers with frequency-resolved SOP structures.

preprint | 2022 | arXiv

Vision Transformer with Convolutions Architecture Search

Transformers exhibit great advantages in handling computer vision tasks. They model image classification tasks by utilizing a multi-head attention mechanism to process a series of patches consisting of split images. However, for complex tasks, Transformer in computer vision not only requires inheriting a bit of dynamic attention and global context, but also needs to introduce features concerning noise reduction, shifting, and scaling invariance of objects. Therefore, here we take a step forward to study the structural characteristics of Transformer and convolution and propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS). The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture while maintaining the benefits of the multi-head attention mechanism. The searched block-based backbone network can extract feature maps at different scales. These features are compatible with a wider range of visual tasks, such as image classification (32 M parameters, 82.0% Top-1 accuracy on ImageNet-1K) and object detection (50.4% mAP on COCO2017). The proposed topology based on the multi-head attention mechanism and CNN adaptively associates relational features of pixels with multi-scale features of objects. It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.

preprint | 2021 | arXiv

A Complete Discriminative Tensor Representation Learning for Two-Dimensional Correlation Analysis

As an effective tool for two-dimensional data analysis, two-dimensional canonical correlation analysis (2DCCA) is not only capable of preserving the intrinsic structural information of original two-dimensional (2D) data, but also reduces the computational complexity effectively. However, due to its unsupervised nature, 2DCCA is incapable of extracting sufficient discriminatory representations, resulting in unsatisfactory performance. In this letter, we propose a complete discriminative tensor representation learning (CDTRL) method based on linear correlation analysis for analyzing 2D signals (e.g., images). This letter shows that the introduction of the complete discriminatory tensor representation strategy provides an effective vehicle for revealing and extracting the discriminant representations across the 2D data sets, leading to improved results. Experimental results show that the proposed CDTRL outperforms state-of-the-art methods on the evaluated data sets.

preprint | 2021 | arXiv

Discriminative Multiple Canonical Correlation Analysis for Information Fusion

In this paper, we propose the Discriminative Multiple Canonical Correlation Analysis (DMCCA) for multimodal information analysis and fusion. DMCCA is capable of extracting more discriminative characteristics from multimodal information representations. Specifically, it finds the projected directions which simultaneously maximize the within-class correlation and minimize the between-class correlation, leading to better utilization of the multimodal information. In the process, we analytically demonstrate that the optimally projected dimension by DMCCA can be quite accurately predicted, leading to both superior performance and substantial reduction in computational cost. We further verify that Canonical Correlation Analysis (CCA), Multiple Canonical Correlation Analysis (MCCA) and Discriminative Canonical Correlation Analysis (DCCA) are special cases of DMCCA, thus establishing a unified framework for Canonical Correlation Analysis. We implement a prototype of DMCCA to demonstrate its performance in handwritten digit recognition and human emotion recognition. Extensive experiments show that DMCCA outperforms the traditional methods of serial fusion, CCA, MCCA and DCCA.

preprint | 2021 | arXiv

Federated Learning for Internet of Things: A Federated Learning Framework for On-device Anomaly Data Detection

Federated learning can be a promising solution for enabling IoT cybersecurity (i.e., anomaly detection in the IoT environment) while preserving data privacy and mitigating the high communication/storage overhead (e.g., high-frequency data from time-series sensors) of centralized over-the-cloud approaches. In this paper, to further push forward this direction with a comprehensive study in both algorithm and system design, we build the FedIoT platform, which contains the FedDetect algorithm for on-device anomaly data detection and a system design for realistic evaluation of federated learning on IoT devices. Furthermore, the proposed FedDetect learning framework improves performance by utilizing a local adaptive optimizer (e.g., Adam) and a cross-round learning rate scheduler. In a network of realistic IoT devices (Raspberry Pi), we evaluate the FedIoT platform and the FedDetect algorithm in both model and system performance. Our results demonstrate the efficacy of federated learning in detecting a wider range of attack types occurring at multiple devices. The system efficiency analysis indicates that both end-to-end training time and memory cost are affordable and promising for resource-constrained IoT devices. The source code is publicly available at https://github.com/FedML-AI/FedIoT.

preprint | 2021 | arXiv

MLMA-Net: multi-level multi-attentional learning for multi-label object detection in textile defect images

For the sake of recognizing and classifying textile defects, deep learning-based methods have been proposed and achieved remarkable success in single-label textile images. However, detecting multi-label defects in a textile image remains challenging due to the coexistence of multiple defects and small-size defects. To address these challenges, a multi-level, multi-attentional deep learning network (MLMA-Net) is proposed and built to 1) increase the feature representation ability to detect small-size defects; 2) generate a discriminative representation that maximizes the capability of attending the defect status, which leverages higher-resolution feature maps for multiple defects. Moreover, a multi-label object detection dataset (DHU-ML1000) in textile defect images is built to verify the performance of the proposed model. The results demonstrate that the network extracts more distinctive features and has better performance than the state-of-the-art approaches on the real-world industrial dataset.

preprint | 2021 | arXiv

Online Behavioral Analysis with Application to Emotion State Identification

In this paper, we propose a novel discriminative model for online behavioral analysis with application to emotion state identification. The proposed model is able to extract more discriminative characteristics from behavioral data effectively and find the direction of optimal projection efficiently to satisfy requirements of online data analysis, leading to better utilization of the behavioral information to produce more accurate recognition results.

preprint | 2021 | arXiv

The Labeled Multiple Canonical Correlation Analysis for Information Fusion

The objective of multimodal information fusion is to mathematically analyze information carried in different sources and create a new representation which will be more effectively utilized in pattern recognition and other multimedia information processing tasks. In this paper, we introduce a new method for multimodal information fusion and representation based on the Labeled Multiple Canonical Correlation Analysis (LMCCA). By incorporating class label information of the training samples, the proposed LMCCA ensures that the fused features carry discriminative characteristics of the multimodal information representations, and are capable of providing superior recognition performance. We implement a prototype of LMCCA to demonstrate its effectiveness on handwritten digit recognition, face recognition and object recognition utilizing multiple features, and bimodal human emotion recognition involving information from both audio and visual domains. The generic nature of LMCCA allows it to take as input features extracted by any means, including those by deep learning (DL) methods. Experimental results show that the proposed method enhanced the performance of both statistical machine learning (SML) methods and methods based on DL.

preprint | 2021 | arXiv

The Property of Frequency Shift in 2D-FRFT Domain with Application to Image Encryption

The Fractional Fourier Transform (FRFT) has been playing a unique and increasingly important role in signal and image processing. In this letter, we investigate the property of frequency shift in two-dimensional FRFT (2D-FRFT) domain. It is shown that the magnitude of image reconstruction from phase information is frequency shift-invariant in 2D-FRFT domain, enhancing the robustness of image encryption, an important multimedia security task. Experiments are conducted to demonstrate the effectiveness of this property against the frequency shift attack, improving the robustness of image encryption.

preprint | 2020 | arXiv

Dynamic Partial Removal: A Neural Network Heuristic for Large Neighborhood Search

This paper presents a novel neural network design that learns the heuristic for Large Neighborhood Search (LNS). LNS consists of a destroy operator and a repair operator that specify a way to carry out the neighborhood search to solve combinatorial optimization problems. The proposed approach applies a Hierarchical Recurrent Graph Convolutional Network (HRGCN) as an LNS heuristic, namely Dynamic Partial Removal, with the advantage of adaptive destruction and the potential to search across a large scale, as well as context-awareness in both spatial and temporal perspectives. This model is generalized as an efficient heuristic approach to different combinatorial optimization problems, especially problems with relatively tight constraints. We apply this model to the vehicle routing problem (VRP) in this paper as an example. The experimental results show that this approach outperforms the traditional LNS heuristics on the same problem. The source code is available at https://github.com/water-mirror/DPR.

preprint | 2020 | arXiv

Learn to Design the Heuristics for Vehicle Routing Problem

This paper presents an approach to learn the local-search heuristics that iteratively improve the solution of the Vehicle Routing Problem (VRP). A local-search heuristic is composed of a destroy operator that destroys part of a candidate solution, and a subsequent repair operator that rebuilds the destroyed solution into a new one. The proposed neural network, trained through an actor-critic framework, consists of an encoder in the form of a modified Graph Attention Network in which node embeddings and edge embeddings are integrated, and a GRU-based decoder that renders a pair of destroy and repair operators. Experimental results show that it outperforms both the traditional heuristic algorithms and existing neural combinatorial optimization methods for VRP on medium-scale data sets, and is able to tackle large-scale data sets (e.g., over 400 nodes), which is a considerable challenge in this area. Moreover, the need for expertise and handcrafted heuristic design is eliminated because the proposed network learns to design heuristics with better performance. Our implementation is available online.
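The destroy-and-repair loop described in this abstract can be sketched with hand-written operators. This is not the learned network from the paper: it assumes random removal as the destroy operator, cheapest insertion as the repair operator, and a single-vehicle route without capacity constraints. All function names are hypothetical.

```python
import math
import random


def tour_length(tour, coords):
    """Total length of a closed route that starts and ends at the depot (node 0)."""
    path = [0] + tour + [0]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


def destroy(tour, k, rng):
    """Destroy operator (assumed here: random removal of k customers)."""
    removed = rng.sample(tour, k)
    kept = [c for c in tour if c not in removed]
    return kept, removed


def repair(partial, removed, coords):
    """Repair operator (assumed here: reinsert each customer at its cheapest position)."""
    tour = list(partial)
    for c in removed:
        best_pos = min(
            range(len(tour) + 1),
            key=lambda i: tour_length(tour[:i] + [c] + tour[i:], coords),
        )
        tour.insert(best_pos, c)
    return tour


def large_neighborhood_search(coords, iterations=200, k=3, seed=0):
    """Repeat destroy and repair, keeping a candidate only if it is shorter."""
    rng = random.Random(seed)
    best = list(range(1, len(coords)))  # customers 1..n-1; node 0 is the depot
    rng.shuffle(best)
    best_len = tour_length(best, coords)
    for _ in range(iterations):
        partial, removed = destroy(best, k, rng)
        candidate = repair(partial, removed, coords)
        cand_len = tour_length(candidate, coords)
        if cand_len < best_len:
            best, best_len = candidate, cand_len
    return best, best_len


if __name__ == "__main__":
    rng = random.Random(1)
    coords = [(rng.random(), rng.random()) for _ in range(12)]
    tour, length = large_neighborhood_search(coords)
    print(tour, round(length, 3))
```

A learned heuristic would replace `destroy` and `repair` with operators chosen by a trained model; the outer accept-if-better loop is one simple acceptance rule among several used in practice.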

preprint | 2020 | arXiv

Phase-gradient metasurfaces based on local Fabry-Perot resonances

In this work, we present a new mechanism for designing phase-gradient metasurfaces (PGMs) to control an electromagnetic wavefront with high efficiency. Specifically, we design a transmission-type PGM formed by a periodic subwavelength metallic slit array filled with identical dielectrics of different heights. It is found that when Fabry-Perot (FP) resonances occur locally inside the dielectric regions, in addition to the common phenomenon of complete transmission, the transmitted phase differences between two adjacent slits are exactly the same, being a non-zero constant. These local FP resonances ensure a total phase shift across a supercell that can fully cover the range of 0 to 2π, satisfying the design requirements of PGMs. Further studies reveal that, due to local FP resonances, there is a one-to-one correspondence between the phase difference and the permittivity of the filled dielectric. A similar approach can be extended to the reflection-type case and other wavefront transformations, creating new opportunities for wave manipulation.

preprint | 2020 | arXiv

Predicting Geographic Information with Neural Cellular Automata

This paper presents a novel framework using neural cellular automata (NCA) to regenerate and predict geographic information. The model extends the idea of using NCA to generate or regenerate a specific image by training the model with various geographic data; taking the traffic condition map as an example, the model is able to predict traffic conditions when given certain induction information. Our research verified the analogy between NCA and genes in biology, while the innovation of the model significantly widens the boundary of possible applications based on NCAs. In our experimental results, the model shows great potential in usability and versatility not available in previous studies. The code for model implementation is available at https://redacted.

preprint | 2020 | arXiv

Switchable bifunctional metasurfaces: nearly perfect retroreflection and absorption at THz regime

Here we make use of vanadium dioxide (VO2) to design a bifunctional metasurface working at the same targeted frequency. With the increase of temperature, the functionality of the designed metasurface can switch from a multi-channel retroreflector to a perfect absorber, caused by the phase transition of VO2 from insulator to conductor. Different from traditional bifunctional metasurfaces designed by simple composition of two functionalities, our proposed bifunctional metasurface is based on the interaction between two functionalities. The device shows good potential for the combination of wavefront manipulation and optical absorption, therefore providing a promising approach for switchable detection and anti-detection devices.