Researcher profile

Lei Cheng

Lei Cheng contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
14 works
0 followers
9 topics
4 close collaborators

Published work

14 published items

Preprint · 2026 · arXiv

Intent-Driven Semantic ID Generation for Grounded Conversational News Recommendation

Conversational news recommendation requires grounding each suggestion in a rapidly evolving article corpus while addressing implicit user intents that lack explicit retrievable keywords. To characterize this scenario, we identify six intent types from production dialogues: five are implicit and pose fundamental challenges to standard RAG pipelines, forming a critical retrieve-first bottleneck. To address these issues, we introduce intent-driven Semantic ID (SID) generation under a Generate-then-Match paradigm. With two-stage training consisting of multi-task SID alignment and GPT-4 Chain-of-Thought distillation, an LLM maps diverse intents to hierarchical SID prefixes, which are then fuzzy-matched to the current news pool to guarantee fully grounded recommendations. Profile-Aware Dual-Signal Reasoning (PADR) further enables cold-start users to obtain valid recommendations using only their profiles. On a mainstream Chinese news platform, our 7B model achieves 0% hallucination and a 12.4% L1 match rate in the 152K open-generation SID space (4x the random baseline). It matches GPT-4+Hybrid RAG on L1 while surpassing it on finer-grained metrics (2x on L2, +1.2pp on Category) at ~100x lower cost. Cold-start users, for whom existing baselines score 0%, achieve 18.0% L1 (6x random), the highest among all user groups.
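
A minimal sketch of the Generate-then-Match idea, written for illustration rather than taken from the paper: it assumes a Semantic ID is a tuple of coarse-to-fine integer codes and that fuzzy matching means ranking articles in the current pool by the length of the SID prefix they share with the generated SID. The names `match_sid` and `common_prefix_len` and the toy pool are hypothetical.

```python
from typing import Dict, List, Tuple

SID = Tuple[int, ...]  # hypothetical hierarchical Semantic ID, coarse-to-fine codes


def common_prefix_len(a: SID, b: SID) -> int:
    """Number of leading levels on which two SIDs agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def match_sid(generated: SID, pool: Dict[str, SID], k: int = 3) -> List[str]:
    """Return up to k article ids from the current pool whose SIDs share the
    longest prefix with the generated SID (ties broken by article id).

    Only ids already in `pool` can be returned, so the output never
    references an article that does not exist (grounding by construction).
    """
    scored = sorted(
        pool.items(),
        key=lambda item: (-common_prefix_len(generated, item[1]), item[0]),
    )
    return [article_id for article_id, _ in scored[:k]]


# Hypothetical news pool: article id -> SID.
pool = {
    "a1": (3, 1, 7),
    "a2": (3, 1, 2),
    "a3": (3, 4, 0),
    "a4": (5, 0, 0),
}
print(match_sid((3, 1, 9), pool, k=2))  # ['a1', 'a2']
```

Even when the generated SID (3, 1, 9) matches no article exactly, the matcher falls back to the closest existing items at the finest shared level, which is why a generated-but-absent ID cannot turn into a hallucinated recommendation in this simplified setting.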

Preprint · 2026 · arXiv

Performance Monitoring of Proton Exchange Membrane Water Electrolyzer by Transformers-Based Machine Learning Model

Green hydrogen plays an essential role in decarbonization, with capacity projected to scale to 560 GW by 2030 (vs. 1.39 GW in 2023) in net-zero scenarios. Proton exchange membrane (PEM) electrolysis is one of the most promising technology routes to green hydrogen production, and real-time system health monitoring of PEM electrolyzers is essential for their scalable deployment. In lab settings, performance degradation can be characterized through electrochemical testing protocols that require periodic pauses in normal operation. Such interruptions are impractical for full-scale stack deployments, limiting system operators' ability to assess state-of-health (SoH) in real time. We present a machine learning (ML) framework that performs virtual electrochemical characterization during normal operation. The method uses an encoder-decoder transformer, conditioned on operational data, to reconstruct characterization outputs, focusing here on polarization curves. Inspired by patch-based sequence tokenization, we segment the inputs into patches and encode them as meaningful tokens, which substantially improves learning efficiency. Across four longitudinal runs lasting up to 478 hours on different test cells and loading cycles, the model accurately reconstructed polarization curves and achieved a 10x reduction in mean squared error (MSE) compared with a vanilla transformer. This proof of concept demonstrates that ML models can enable continuous performance monitoring of PEM electrolyzers and that the encoder captures meaningful latent representations of SoH, opening up opportunities to derive interpretable indicators in future work.
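
The patch-based tokenization mentioned above can be sketched as follows, assuming a 1-D operational signal and a fixed patch length and stride (both hypothetical choices; the paper's actual inputs, patch sizes, and embedding layer are not given here). Each patch would then be projected to an embedding and fed to the transformer encoder as one token, which shortens the token sequence by roughly a factor of the stride.

```python
from typing import List, Sequence


def make_patches(series: Sequence[float], patch_len: int, stride: int) -> List[List[float]]:
    """Split a 1-D operational signal into (possibly overlapping) patches.

    Each patch becomes one token, so a series of length L yields
    (L - patch_len) // stride + 1 tokens instead of L. Trailing samples
    that do not fill a full patch are dropped.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    return [
        list(series[start:start + patch_len])
        for start in range(0, len(series) - patch_len + 1, stride)
    ]


# Hypothetical stack-voltage readings, one per time step.
voltage = [1.80, 1.81, 1.83, 1.82, 1.85, 1.86, 1.88, 1.87]
patches = make_patches(voltage, patch_len=4, stride=2)
print(len(patches))  # 3
print(patches[0])    # [1.8, 1.81, 1.83, 1.82]
```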

Preprint · 2026 · arXiv

Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective

This paper explores a scientific question in supervised fine-tuning (SFT): why SFT is broadly effective for small-scale deep neural networks yet can produce inconsistent or even detrimental effects when applied to large language models (LLMs). Recent advances in interaction-based explanations suggest that interactions between words/tokens provide a faithful metric for quantifying the inference patterns encoded by LLMs. We find that the evolution of interactions during SFT effectively explains the inconsistent effectiveness of SFT for LLMs. Specifically, we find that (1) SFT primarily removes noise-like interactions while rarely acquiring reliable new ones, and (2) this denoising stage is extremely brief, after which continued fine-tuning tends to introduce overfitted interactions. We validate these findings across multiple LLMs and datasets. Our findings provide new insights into early stopping and offer practical guidance for LLM training.
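
Interaction-based explanations in this line of research are often built on the Harsanyi interaction, I(S) = Σ_{T ⊆ S} (-1)^{|S|-|T|} v(T), where v(T) is the model's output when only the tokens in T are kept and the rest are masked. The abstract does not name its metric, so using the Harsanyi interaction here is an assumption, and `toy_v` is a hypothetical stand-in for a masked LLM score. The brute-force enumeration below is only feasible for a handful of tokens.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence

Coalition = FrozenSet[str]


def harsanyi_interactions(
    tokens: Sequence[str], v: Callable[[Coalition], float]
) -> Dict[Coalition, float]:
    """Compute I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T), for every S.

    v(T) is the model score when only the tokens in T are unmasked.
    Cost grows as 3^n, so this is only practical for a few tokens.
    """
    interactions: Dict[Coalition, float] = {}
    for size in range(len(tokens) + 1):
        for s in combinations(tokens, size):
            total = 0.0
            for t_size in range(size + 1):
                for t in combinations(s, t_size):
                    total += (-1) ** (size - t_size) * v(frozenset(t))
            interactions[frozenset(s)] = total
    return interactions


# Hypothetical scorer: "not" and "bad" only matter together (an AND pattern).
def toy_v(kept: Coalition) -> float:
    score = 0.5 * ("good" in kept)
    if {"not", "bad"} <= kept:
        score += 2.0
    return score


I = harsanyi_interactions(["not", "bad", "good"], toy_v)
print(I[frozenset({"not", "bad"})])  # 2.0
print(I[frozenset({"not"})])         # 0.0
```

The interactions sum to v of the full token set, so each one can be read as the part of the output attributable to one AND-pattern of tokens; in the toy scorer, "not" and "bad" only contribute jointly, which shows up as a single non-zero pairwise interaction.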

Preprint · 2026 · arXiv

StreamPhy: Streaming Inference of High-Dimensional Physical Dynamics via State Space Models

Inferring the evolution of high-dimensional and multi-modal (e.g., spatio-temporal) physical fields from irregular sparse measurements in real time is a fundamental challenge in science and engineering. Existing approaches, including diffusion-based generative models and functional tensor methods, typically operate in offline settings, depend on full temporal observations, or incur substantial inference cost. We propose StreamPhy, an end-to-end framework that enables efficient and accurate streaming inference of full-field physical dynamics from incoming irregular sparse measurements. The framework integrates a data-adaptive observation encoder that is robust to arbitrary observation patterns, a structured state-space model that supports memory-efficient online updates across irregular time intervals, and an expressive Functional Tensor Feature-wise Linear Modulation (FT-FiLM) decoder for continuous-field generation. We prove that FT-FiLM is more expressive than the functional Tucker model, admitting a richer function class for handling complex dynamics. Experiments on three representative physical systems under challenging sampling patterns show that StreamPhy consistently outperforms state-of-the-art baselines, with at least 48% improvement in accuracy and up to 20-100x faster inference than diffusion-based methods.
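
One way to see how a state-space model can absorb measurements that arrive at irregular intervals is a diagonal, continuous-time linear SSM discretized with the actual elapsed time. This is a generic illustration under stated assumptions (real, negative diagonal dynamics and a scalar input held constant between observations), not StreamPhy's encoder, state-space layer, or FT-FiLM decoder; the class name and parameters are hypothetical.

```python
import math
from typing import List, Sequence


class DiagonalSSM:
    """Hypothetical continuous-time diagonal linear SSM: dh/dt = a*h + b*u.

    The state is advanced with a zero-order-hold discretization that depends
    on the elapsed time dt, so observations may arrive at irregular intervals.
    Only the current state is stored, so memory does not grow with the stream.
    """

    def __init__(self, a: Sequence[float], b: Sequence[float]) -> None:
        if any(x >= 0 for x in a):
            raise ValueError("use negative a for a stable, decaying state")
        self.a = list(a)
        self.b = list(b)
        self.h = [0.0] * len(a)

    def update(self, u: float, dt: float) -> List[float]:
        """Advance the state by dt, holding the scalar input u constant."""
        for i, (a_i, b_i) in enumerate(zip(self.a, self.b)):
            decay = math.exp(a_i * dt)
            self.h[i] = decay * self.h[i] + (decay - 1.0) / a_i * b_i * u
        return list(self.h)


ssm = DiagonalSSM(a=[-1.0, -0.1], b=[1.0, 1.0])
# Hypothetical sparse sensor readings arriving after uneven gaps.
for value, gap in [(1.0, 0.5), (0.8, 2.0), (1.2, 0.1)]:
    state = ssm.update(value, gap)
print([round(x, 4) for x in state])
```

Because the zero-order-hold update is exact for a constant input, advancing by dt in one step or in two half steps gives the same state, which is what makes uneven sampling gaps unproblematic in this simplified setting.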

Preprint · 2022 · arXiv

Bayesian Low-rank Matrix Completion with Dual-graph Embedding: Prior Analysis and Tuning-free Inference

Recently, there has been a revival of interest in low-rank matrix completion-based unsupervised learning through the lens of dual-graph regularization, which has significantly improved the performance of multidisciplinary machine learning tasks such as recommendation systems, genotype imputation and image inpainting. While dual-graph regularization contributes a major part of this success, it usually involves computationally costly hyperparameter tuning. To circumvent this drawback and improve completion performance, we propose a novel Bayesian learning algorithm that automatically learns the hyperparameters associated with dual-graph regularization while guaranteeing the low-rankness of matrix completion. Notably, a novel prior is devised to promote the low-rankness of the matrix and encode the dual-graph information simultaneously, which is more challenging than the single-graph counterpart. A nontrivial conditional conjugacy between the proposed priors and the likelihood function is then explored, so that an efficient algorithm can be derived under the variational inference framework. Extensive experiments on synthetic and real-world datasets demonstrate the state-of-the-art performance of the proposed learning algorithm for various data analysis tasks.
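
For context on the hyperparameters at stake, a conventional dual-graph regularized factorization objective is a fit term on the observed entries of Y ≈ U V^T plus lam_rows·tr(U^T L_r U) + lam_cols·tr(V^T L_c V), where L_r and L_c are row and column graph Laplacians. The sketch below only evaluates that objective; lam_rows and lam_cols are the kind of tuning knobs the Bayesian algorithm is described as learning automatically. The objective form and all names here are illustrative assumptions, not the paper's model or prior.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def laplacian(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Unweighted graph Laplacian L = D - A for an undirected edge list."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def graph_penalty(factor: Matrix, lap: Matrix) -> float:
    """tr(F^T L F), which equals the sum of ||f_i - f_j||^2 over edges (i, j)."""
    n, rank = len(factor), len(factor[0])
    return sum(
        factor[i][c] * lap[i][j] * factor[j][c]
        for c in range(rank) for i in range(n) for j in range(n)
    )


def dual_graph_objective(
    y: Matrix,
    observed: Sequence[Tuple[int, int]],
    u: Matrix,
    v: Matrix,
    lap_rows: Matrix,
    lap_cols: Matrix,
    lam_rows: float,
    lam_cols: float,
) -> float:
    """Squared error on observed entries of Y ~ U V^T plus both graph penalties."""
    rank = len(u[0])
    fit = sum(
        (y[i][j] - sum(u[i][c] * v[j][c] for c in range(rank))) ** 2
        for i, j in observed
    )
    return fit + lam_rows * graph_penalty(u, lap_rows) + lam_cols * graph_penalty(v, lap_cols)


# Toy 2x2 problem: rows 0-1 are linked, columns 0-1 are linked; y[1][1] is unobserved.
u = [[1.0], [3.0]]
v = [[1.0], [1.0]]
y = [[1.0, 1.0], [3.0, 0.0]]
obj = dual_graph_objective(
    y, observed=[(0, 0), (0, 1), (1, 0)], u=u, v=v,
    lap_rows=laplacian(2, [(0, 1)]), lap_cols=laplacian(2, [(0, 1)]),
    lam_rows=0.5, lam_cols=0.5,
)
print(obj)  # 2.0
```

Here the fit is perfect, so the whole objective comes from the row-graph penalty: linked rows with different embeddings (1 and 3) are penalized, and how strongly depends entirely on lam_rows.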

Preprint · 2022 · arXiv

Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks

Classification of long sequential data is an important machine learning task that appears in many application scenarios. Recurrent Neural Networks, Transformers, and Convolutional Neural Networks are three major techniques for learning from sequential data. Among these methods, Temporal Convolutional Networks (TCNs), which scale to very long sequences, have achieved remarkable progress in time series regression. However, TCNs perform unsatisfactorily on sequence classification because they use a skewed connection protocol and output classes at the last position. This asymmetry restricts their performance on classification, which depends on the whole sequence. In this work, we propose a symmetric multi-scale architecture called the Circular Dilated Convolutional Neural Network (CDIL-CNN), in which every position has an equal chance to receive information from other positions in the previous layers. Our model outputs classification logits at all positions, so a simple ensemble over them can yield a better decision. We have tested CDIL-CNN on various long sequential datasets, and the experimental results show that our method outperforms many state-of-the-art approaches.
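
The circular, symmetric connection pattern described above can be sketched with plain lists. This is an illustration, not the authors' CDIL-CNN: it assumes a single channel, an odd kernel centred on each position, circular (wrap-around) padding, and dilations doubling per layer, and it omits learned weights, nonlinearities, and the per-position classifier.

```python
from typing import List, Sequence


def circular_dilated_conv(
    x: Sequence[float], kernel: Sequence[float], dilation: int
) -> List[float]:
    """Symmetric 1-D convolution with circular padding.

    Output position i mixes inputs at i + (j - k//2) * dilation (mod n) for
    kernel taps j, so every position sees neighbours on both sides and the
    sequence length is preserved.
    """
    n, k = len(x), len(kernel)
    if k % 2 == 0:
        raise ValueError("use an odd kernel so the receptive field is centred")
    half = k // 2
    return [
        sum(kernel[j] * x[(i + (j - half) * dilation) % n] for j in range(k))
        for i in range(n)
    ]


def cdil_stack(x: Sequence[float], kernel: Sequence[float], layers: int) -> List[float]:
    """Apply layers with dilation 1, 2, 4, ... so the receptive field grows
    exponentially with depth (no nonlinearity or learned weights here)."""
    h = list(x)
    for layer in range(layers):
        h = circular_dilated_conv(h, kernel, dilation=2 ** layer)
    return h


x = [0.0] * 8
x[0] = 1.0  # an impulse at the first position
out = cdil_stack(x, kernel=[1.0, 1.0, 1.0], layers=2)
print(out)  # [1.0, 2.0, 1.0, 1.0, 0.0, 1.0, 1.0, 2.0]
```

Starting from an impulse at position 0, two layers already spread information symmetrically to every position within three steps in either direction (wrapping around the end of the sequence); only position 4, four steps away both ways, is not yet reached.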

Preprint · 2022 · arXiv

Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention

Self-attention is a widely used building block in neural modeling for mixing long-range data elements. Most self-attention networks employ pairwise dot products to specify the attention coefficients. However, these methods require $O(N^2)$ computing cost for sequence length $N$. Although some approximation methods have been introduced to relieve the quadratic cost, the performance of the dot-product approach is still bottlenecked by the low-rank constraint in the attention matrix factorization. In this paper, we propose a novel scalable and effective mixing building block called Paramixer. Our method factorizes the interaction matrix into several sparse matrices, where we parameterize the non-zero entries by MLPs with the data elements as input. The overall computing cost of the new building block is as low as $O(N \log N)$. Moreover, all factorizing matrices in Paramixer are full-rank, so it does not suffer from the low-rank bottleneck. We have tested the new method on both synthetic and various real-world long sequential datasets and compared it with several state-of-the-art attention networks. The experimental results show that Paramixer performs better in most learning tasks.
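
To make the $O(N \log N)$ cost concrete, here is a toy sparse-factor mixer under explicit assumptions: the sequence length is a power of two, factor k links each position i to itself and to position (i + 2^k) mod N, and a hypothetical `toy_weight_fn` stands in for the MLP that produces the non-zero entries from the data. This is not Paramixer's actual factorization pattern or parameterization; it only shows how log2(N) sparse factors, each applied in O(N), can connect every position to every other.

```python
import math
from typing import Callable, List, Sequence, Tuple

WeightFn = Callable[[float, int], Tuple[float, float]]


def toy_weight_fn(x_i: float, factor: int) -> Tuple[float, float]:
    """Hypothetical stand-in for a learned MLP: maps one data element to the
    two non-zero entries of its row in sparse factor number `factor`."""
    return math.tanh(x_i + factor), math.tanh(x_i - factor)


def sparse_factor_mix(
    x: Sequence[float], v: Sequence[float], weight_fn: WeightFn = toy_weight_fn
) -> List[float]:
    """Mix v through log2(n) sparse factors with two non-zeros per row.

    Factor k links position i to itself and to (i + 2**k) mod n, so each
    factor costs O(n) and the whole product costs O(n log n). The sparsity
    pattern lets every output position depend on every input position.
    """
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two sequence length")
    h = list(v)
    for k in range(n.bit_length() - 1):  # log2(n) sparse factors
        step = 2 ** k
        new_h = []
        for i in range(n):
            w_self, w_far = weight_fn(x[i], k)
            new_h.append(w_self * h[i] + w_far * h[(i + step) % n])
        h = new_h
    return h


x = [0.1 * i for i in range(8)]
print(sparse_factor_mix(x, v=[1.0] + [0.0] * 7))
```

With unit weights, each output after all factors equals the sum of all inputs, which confirms the full connectivity of this link pattern.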

Preprint · 2022 · arXiv

Tensor-based Basis Function Learning for Three-dimensional Sound Speed Fields

Basis function learning is the stepping stone towards effective three-dimensional (3D) sound speed field (SSF) inversion for various acoustic signal processing tasks, including ocean acoustic tomography, underwater target localization/tracking, and underwater communications. Classical basis functions include empirical orthogonal functions (EOFs), Fourier basis functions, and their combinations. Unsupervised machine learning methods, e.g., the K-SVD algorithm, have recently been applied to basis function design, showing better representation performance than EOFs. However, existing basis function learning approaches do not treat 3D SSF data as a third-order tensor, and thus cannot fully exploit the 3D interactions/correlations therein. To circumvent this drawback, this paper links basis function learning to tensor decomposition, the primary driver of recent multi-dimensional data mining. In particular, a tensor-based basis function learning framework is proposed, which includes the classical basis functions (using EOFs and/or Fourier basis functions) as special cases. This provides a unified tensor perspective for understanding and representing 3D SSFs. Numerical results on South China Sea 3D SSF data demonstrate the excellent performance of the tensor-based basis functions.

Preprint · 2022 · arXiv

Towards Flexible Sparsity-Aware Modeling: Automatic Tensor Rank Learning Using The Generalized Hyperbolic Prior

Tensor rank learning for canonical polyadic decomposition (CPD) has long been deemed an essential yet challenging problem. In particular, since the tensor rank controls the complexity of the CPD model, learning it inaccurately can cause overfitting to noise or underfitting to the signal sources, and can even destroy the interpretability of model parameters. However, optimally determining the tensor rank is known to be a non-deterministic polynomial-time hard (NP-hard) task. Rather than exhaustively searching for the best tensor rank via trial-and-error experiments, Bayesian inference under the Gaussian-gamma prior was introduced in the context of probabilistic CPD modeling and shown to be an effective strategy for automatic tensor rank determination. This triggered flourishing research on other structured tensor CPDs with automatic tensor rank learning. On the other side of the coin, these works also reveal that the Gaussian-gamma model does not perform well for high-rank tensors and/or low signal-to-noise ratios (SNRs). To overcome these drawbacks, in this paper we introduce a more advanced generalized hyperbolic (GH) prior to the probabilistic CPD model, which not only includes the Gaussian-gamma model as a special case but is also more flexible in adapting to different levels of sparsity. Based on this novel probabilistic model, an algorithm is developed under the variational inference framework, where each update is obtained in closed form. Extensive numerical results on synthetic data and real-world datasets demonstrate the significantly improved performance of the proposed method in learning both low and high tensor ranks, even in low-SNR cases.

Preprint · 2022 · arXiv

Towards Tailored Models on Private AIoT Devices: Federated Direct Neural Architecture Search

Neural networks often face stringent resource constraints when deployed on edge devices. To tackle these problems with less human effort, automated machine learning has become popular for finding neural architectures that fit diverse Artificial Intelligence of Things (AIoT) scenarios. Recently, to prevent the leakage of private information while enabling automated machine intelligence, there has been an emerging trend to integrate federated learning and neural architecture search (NAS). Promising as this may seem, the coupling of difficulties from both tenets makes algorithm development quite challenging. In particular, how to efficiently search for the optimal neural architecture directly from massive non-independent and identically distributed (non-IID) data among AIoT devices in a federated manner is a hard nut to crack. In this paper, to tackle this challenge, we leverage advances in ProxylessNAS and propose a Federated Direct Neural Architecture Search (FDNAS) framework that allows for hardware-friendly NAS from non-IID data across devices. To further adapt to both various data distributions and different types of devices with heterogeneous embedded hardware platforms, we draw inspiration from meta-learning and propose a Cluster Federated Direct Neural Architecture Search (CFDNAS) framework that achieves device-aware NAS, in the sense that each device can learn a tailored deep learning model for its particular data distribution and hardware constraints. Extensive experiments on non-IID datasets show the state-of-the-art accuracy-efficiency trade-offs achieved by the proposed solution in the presence of both data and device heterogeneity.

Preprint · 2021 · arXiv

Training Time Minimization for Federated Edge Learning with Optimized Gradient Quantization and Bandwidth Allocation

Training a machine learning model with federated edge learning (FEEL) is typically time-consuming due to the constrained computation power of edge devices and limited wireless resources in edge networks. In this paper, the training time minimization problem is investigated in a quantized FEEL system, where heterogeneous edge devices send quantized gradients to the edge server via orthogonal channels. In particular, a stochastic quantization scheme is adopted to compress uploaded gradients, which reduces the per-round communication burden but may increase the number of communication rounds. The training time is modeled by taking into account the communication time, computation time and the number of communication rounds. Based on the proposed training time model, the intrinsic trade-off between the number of communication rounds and per-round latency is characterized. Specifically, we analyze the convergence behavior of quantized FEEL in terms of the optimality gap. Further, a joint data-and-model-driven fitting method is proposed to obtain the exact optimality gap, based on which closed-form expressions for the number of communication rounds and the total training time are obtained. Constrained by the total bandwidth, the training time minimization problem is formulated as a joint quantization level and bandwidth allocation optimization problem. To this end, an algorithm based on alternating optimization is proposed, which alternately solves the quantization optimization subproblem via successive convex approximation and the bandwidth allocation subproblem via bisection search. Experimental results with different learning tasks and models validate our analysis and demonstrate the near-optimal performance of the proposed optimization algorithm.
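
A common form of unbiased stochastic gradient quantization is sketched below as an assumption about what such a scheme can look like; the paper's exact scheme, normalization, and bit accounting may differ. Each entry is scaled by the largest magnitude in the vector (one choice among several; L2-norm scaling is also common), placed on a uniform grid of `levels` + 1 points, and rounded up or down at random so that the quantized vector is correct in expectation. The `levels` parameter plays the role of the quantization level that the abstract trades off against bandwidth.

```python
import math
import random
from typing import List, Sequence


def stochastic_quantize(
    grad: Sequence[float], levels: int, rng: random.Random
) -> List[float]:
    """Unbiased stochastic quantization of a gradient vector.

    Each |g_i| / max|g| is placed between two of `levels` + 1 uniform grid
    points and rounded up with probability equal to its distance from the
    lower point, so E[Q(g)] = g. Fewer levels mean fewer bits per entry but
    more quantization noise (variance).
    """
    scale = max((abs(g) for g in grad), default=0.0)
    if scale == 0.0:
        return [0.0] * len(grad)
    out = []
    for g in grad:
        r = abs(g) / scale * levels  # position on the grid [0, levels]
        lower = math.floor(r)
        level = lower + (1 if rng.random() < r - lower else 0)
        out.append(math.copysign(level / levels * scale, g))
    return out


rng = random.Random(0)
print(stochastic_quantize([0.3, -0.7, 1.0, 0.0], levels=4, rng=rng))
```

Only the scale and one small integer per entry need to be transmitted, which is where the per-round communication saving comes from; the extra variance is what can increase the number of rounds.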

Preprint · 2020 · arXiv

A Covariance-based User Activity Detection and Channel Estimation Approach with Novel Pilot Design

This paper studies massive machine-type communications (mMTC) for future Internet of Things (IoT) applications, where a large number of IoT devices exist in the network and a random subset of them becomes active at each time instant. Building on the fact that the covariance matrix of the received signal can be accurately estimated in the spatial domain if the base station (BS) is equipped with a massive number of antennas, we propose a covariance-based device activity detection and channel estimation strategy for a massive MIMO (multiple-input multiple-output) aided mMTC system. For this strategy, a novel pilot sequence design is first provided, where the pilot of each device is determined solely by a unique phase parameter. Then, by estimating the phase parameters of the active pilot sequences that contribute to the received covariance matrix, an efficient algorithm is proposed to detect the active devices without prior information about the total number of active devices. Finally, given the estimated active devices, channel estimation is conducted based on the conventional minimum mean-squared error (MMSE) approach. Notably, our proposed strategy obtains all results in closed form, and thus has much lower complexity than existing strategies that rely on iterative algorithms for device detection and channel estimation.
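
The covariance-based idea can be illustrated with a toy simulation. Everything here is an assumption made for illustration: pilots of the form exp(j·θ·l), phases on an orthogonal DFT grid, Rayleigh-fading channels, and detection by scanning the normalised quadratic form s(θ)^H R s(θ) over candidate phases with a fixed threshold. It is not the paper's pilot design or closed-form estimator, and it omits the MMSE channel-estimation step; all function names are hypothetical.

```python
import cmath
import math
import random
from typing import List, Sequence

Matrix = List[List[complex]]


def pilot(theta: float, length: int) -> List[complex]:
    """Hypothetical pilot fully determined by one phase parameter theta."""
    return [cmath.exp(1j * theta * l) for l in range(length)]


def sample_covariance(y: Matrix) -> Matrix:
    """R = (1/M) Y Y^H for Y with L rows (pilot symbols) and M columns (antennas)."""
    length, antennas = len(y), len(y[0])
    return [
        [sum(y[a][m] * y[b][m].conjugate() for m in range(antennas)) / antennas
         for b in range(length)]
        for a in range(length)
    ]


def phase_power(r: Matrix, theta: float) -> float:
    """Normalised quadratic form s(theta)^H R s(theta) / L^2."""
    s = pilot(theta, len(r))
    total = sum(
        s[a].conjugate() * r[a][b] * s[b]
        for a in range(len(r)) for b in range(len(r))
    )
    return total.real / len(r) ** 2


def simulate(active: Sequence[int], thetas: Sequence[float], length: int,
             antennas: int, noise_std: float, rng: random.Random) -> Matrix:
    """Received pilots: active devices' pilots times unit-power Rayleigh channels, plus noise."""
    y = [[0j] * antennas for _ in range(length)]
    for k in active:
        s = pilot(thetas[k], length)
        for m in range(antennas):
            h = complex(rng.gauss(0, math.sqrt(0.5)), rng.gauss(0, math.sqrt(0.5)))
            for l in range(length):
                y[l][m] += s[l] * h
    for l in range(length):
        for m in range(antennas):
            y[l][m] += complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
    return y


L, M = 8, 128
thetas = [2 * math.pi * k / L for k in range(L)]  # orthogonal phase grid
y = simulate(active=[1, 5], thetas=thetas, length=L, antennas=M,
             noise_std=0.1, rng=random.Random(7))
r = sample_covariance(y)
detected = [k for k, t in enumerate(thetas) if phase_power(r, t) > 0.5]
print(detected)
```

With many antennas, the sample covariance concentrates around the sum of the active devices' pilot outer products plus noise, so the scanned power is close to 1 at active phases and close to the noise floor elsewhere, and the number of active devices never has to be known in advance.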

Preprint · 2020 · arXiv

Edge Learning with Unmanned Ground Vehicle: Joint Path, Energy and Sample Size Planning

Edge learning (EL), which uses edge computing as a platform to execute machine learning algorithms, is able to fully exploit the massive sensing data generated by the Internet of Things (IoT). However, due to the limited transmit power of IoT devices, collecting sensing data in EL systems is a challenging task. To address this challenge, this paper proposes to integrate an unmanned ground vehicle (UGV) with EL. With such a scheme, the UGV can improve communication quality by approaching various IoT devices. However, different devices may transmit different data for different machine learning jobs, and a fundamental question is how to jointly plan the UGV path, the devices' energy consumption, and the number of samples for different jobs. This paper further proposes a graph-based path planning model, a network energy consumption model and a sample size planning model that characterizes F-measure as a function of the minority class sample size. With these models, the joint path, energy and sample size planning (JPESP) problem is formulated as a large-scale mixed integer nonlinear programming (MINLP) problem, which is nontrivial to solve due to the high-dimensional discontinuous variables related to UGV movement. To this end, it is proved that each IoT device should be served only once along the path, which significantly reduces the problem dimension. Furthermore, to handle the discontinuous variables, a tabu search (TS) based algorithm is derived, which converges in expectation to the optimal solution of the JPESP problem. Simulation results under different task scenarios show that our optimization schemes outperform the fixed EL and full path EL schemes.

Preprint · 2020 · arXiv

Varied fusion reaction probability induced by ion stopping modification in laser-driven plasma with different temperature

The dynamics of nuclear reactions in plasma is a fundamental issue in many areas of high-energy-density research, such as astrophysical reactions and inertial confinement fusion. The effective reaction cross-sections and ion stopping power in plasma need to be taken into account to analyze the reactivity. In this research, we have experimentally investigated D-D reactions arising from interactions between deuteron beams and deuterated polystyrene (CD) plasma, each driven by a separate laser pulse. The neutron yields, plasma density and deuteron energy loss in plasma have been measured, and the plasma temperature and deuteron stopping power have been analyzed through simulations. It is shown that, compared with a cold target, the reaction probability under plasma conditions can be enhanced or suppressed, which is ascribed to modifications of the deuteron stopping power in plasma. In hotter CD plasma, moderately energetic deuterons lose less energy, which leads to a higher D-D reaction probability, while the opposite happens in colder plasma. This work provides new understanding of fusion reactions in plasma environments.