Source author record

Ming Lei

Ming Lei appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Information Theory, math.IT, eess.AS, Sound, Machine Learning, physics.optics, eess.SP, physics.app-ph, Artificial Intelligence, Computation and Language, cond-mat, cond-mat.mtrl-sci, physics.chem-ph, physics.ins-det

Catalog footprint

What is connected

19 works

14 topics

4 close collaborators


preprint, 2026, arXiv

FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices

Federated fine-tuning provides a practical route to adapt large language models (LLMs) on edge devices without centralizing private data, yet in mobile deployments the training wall-clock is often bottlenecked by straggler-limited uplink communication under heterogeneous bandwidth and intermittent participation. Although parameter-efficient fine-tuning (PEFT) reduces trainable parameters, per-round payloads remain prohibitive in non-IID regimes, where uniform compression can discard rare but task-critical signals. We propose Fed-FSTQ, a Fisher-guided token quantization system primitive for communication-efficient federated LLM fine-tuning. Fed-FSTQ employs a lightweight Fisher proxy to estimate token sensitivity, coupling importance-aware token selection with non-uniform mixed-precision quantization to allocate higher fidelity to informative evidence while suppressing redundant transmission. The method is model-agnostic, serves as a drop-in module for standard federated PEFT pipelines, e.g., LoRA, without modifying the server aggregation rule, and supports bandwidth-heterogeneous clients via compact sparse message packing. Experiments on multilingual QA and medical QA under non-IID partitions show that Fed-FSTQ reduces cumulative uplink traffic required to reach a fixed quality threshold by 46x relative to a standard LoRA baseline, and improves end-to-end wall-clock time-to-accuracy by 52%. Furthermore, enabling Fisher-guided token reduction at inference yields up to a 1.55x end-to-end speedup on NVIDIA Jetson-class edge devices, demonstrating deployability under tight resource constraints.
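
The coupling of importance-aware token selection with mixed-precision quantization described above can be sketched roughly as follows. This is a minimal illustration, not the Fed-FSTQ implementation: it assumes each token carries a single scalar update, that the Fisher proxy is the squared gradient, and that the two precision tiers are 8 and 4 bits. The names `fisher_proxy`, `quantize` and `pack_tokens` are hypothetical.

```python
from typing import List, Tuple


def fisher_proxy(grads: List[float]) -> List[float]:
    """Assumed sensitivity proxy: squared gradient per token."""
    return [g * g for g in grads]


def quantize(value: float, bits: int, max_abs: float) -> float:
    """Uniform symmetric quantization of `value` to the given bit width."""
    if max_abs == 0.0:
        return 0.0
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    step = max_abs / levels
    clipped = max(-max_abs, min(max_abs, value))
    return round(clipped / step) * step


def pack_tokens(
    updates: List[float], fisher: List[float], keep: int, high: int
) -> List[Tuple[int, int, float]]:
    """Keep the `keep` most sensitive tokens. The top `high` of them are sent
    at 8 bits, the rest at 4 bits. Returns sparse (index, bits, value) triples."""
    order = sorted(range(len(updates)), key=lambda i: fisher[i], reverse=True)
    kept = order[:keep]
    max_abs = max((abs(updates[i]) for i in kept), default=0.0)
    message = []
    for rank, i in enumerate(kept):
        bits = 8 if rank < high else 4  # assumed two-tier precision split
        message.append((i, bits, quantize(updates[i], bits, max_abs)))
    return message
```

Tokens that fall outside the `keep` budget are simply not transmitted, which is where the sparse packing saves uplink traffic in this toy version.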

preprint, 2022, arXiv

ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

Expressive text-to-speech (TTS) has become a hot research topic recently, mainly focusing on modeling prosody in speech. Prosody modeling has several challenges: 1) the extracted pitch used in previous prosody modeling works has inevitable errors, which hurt the prosody modeling; 2) different attributes of prosody (e.g., pitch, duration and energy) are dependent on each other and produce the natural prosody together; and 3) due to the high variability of prosody and the limited amount of high-quality data for TTS training, the distribution of prosody cannot be fully shaped. To tackle these issues, we propose ProsoSpeech, which enhances the prosody using quantized latent vectors pre-trained on large-scale unpaired and low-quality text and speech data. Specifically, we first introduce a word-level prosody encoder, which quantizes the low-frequency band of the speech and compresses prosody attributes in the latent prosody vector (LPV). Then we introduce an LPV predictor, which predicts the LPV given the word sequence. We pre-train the LPV predictor on large-scale text and low-quality speech data and fine-tune it on the high-quality TTS dataset. Finally, our model can generate expressive speech conditioned on the predicted LPV. Experimental results show that ProsoSpeech can generate speech with richer prosody compared with baseline methods.
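
The quantization step behind the latent prosody vector can be illustrated with a plain nearest-neighbour codebook lookup. This is a generic vector-quantization sketch, assuming a fixed codebook and squared Euclidean distance. It is not the ProsoSpeech encoder, and `nearest_code` and `quantize_words` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def nearest_code(vector: Sequence[float], codebook: List[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(vector, codebook[k]))


def quantize_words(
    word_vectors: List[Sequence[float]], codebook: List[Sequence[float]]
) -> Tuple[List[int], List[Sequence[float]]]:
    """Map each word-level prosody vector to its nearest codebook entry."""
    indices = [nearest_code(v, codebook) for v in word_vectors]
    return indices, [codebook[k] for k in indices]
```

In this toy version a predictor would only need to output one codebook index per word rather than a continuous vector.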

preprint, 2021, arXiv

DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech

As the number of smart devices increases, the demand for on-device text-to-speech (TTS) is growing rapidly. In recent years, many prominent end-to-end TTS methods have been proposed and have greatly improved the quality of synthesized speech. However, to ensure speech quality, most TTS systems depend on large and complex neural network models, which are hard to deploy on-device. In this paper, a small-footprint, fast, stable network for on-device TTS, named DeviceTTS, is proposed. DeviceTTS uses a duration predictor as a bridge between encoder and decoder to avoid the word skipping and repetition seen in Tacotron. Model size is a key factor for on-device TTS, so DeviceTTS uses the Deep Feedforward Sequential Memory Network (DFSMN) as its basic component. Moreover, to speed up inference, a mix-resolution decoder is proposed to balance inference speed and speech quality. Experiments are conducted with the WORLD and LPCNet vocoders. With only 1.4 million model parameters and 0.099 GFLOPS, DeviceTTS achieves performance comparable to Tacotron and FastSpeech. As far as we know, DeviceTTS can meet the needs of most devices in practical applications.
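
One common way to use a duration predictor as the bridge between encoder and decoder is to repeat each encoder state for its predicted number of frames. The sketch below assumes that FastSpeech-style expansion. The abstract does not specify how DeviceTTS does it, and `expand_by_duration` is a hypothetical helper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def expand_by_duration(encoder_outputs: Sequence[T], durations: Sequence[int]) -> List[T]:
    """Repeat each encoder state `durations[i]` times to form decoder inputs."""
    if len(encoder_outputs) != len(durations):
        raise ValueError("one duration is required per encoder output")
    frames: List[T] = []
    for state, frames_needed in zip(encoder_outputs, durations):
        # Negative predictions are clamped to zero frames.
        frames.extend([state] * max(0, int(frames_needed)))
    return frames
```

Because every input unit is emitted exactly as many times as its duration says, and always in order, this kind of alignment cannot skip or repeat a word the way a learned attention alignment can.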

preprint, 2021, arXiv

Large cross-polarized Raman signal in CrI$_3$: A first-principles study

We find unusually large cross-polarized (and anti-symmetric) Raman signature of A$_{\rm g}$ phonon mode in CrI$_3$, in agreement with experiments. The signal is present only when the following three effects are considered in concert: ferromagnetism on Cr atoms, spin-orbit interaction, and resonant effects. Somewhat surprisingly, we find that the relevant spin-orbit interaction potential originates from iodine atoms, despite magnetism being mostly on chromium atoms. We analyze the Raman signature as a function of magnetic order, the direction of the magnetic moment, energy and polarization of light used for Raman scattering, as well as carrier lifetime. In addition to a strong cross-polarized Raman signal, we also find unusually strong phonon modulated magneto-optical Kerr effect (MOKE) in CrI$_3$.

preprint, 2021, arXiv

Ultra-wideband electrostrictive mechanical antenna

Conventional mechanical antennas provide a strategy for long-wave communication with a surprisingly compact size below 1/1,000 of the wavelength. However, their narrow bandwidth and weak field intensity seriously hamper practical applications. Here, we present a mechanical antenna based on the electrostrictive effect of PMN-PT-based relaxor ferroelectric ceramic to improve radiation capacity and achieve ultra-wideband characteristics (10 kHz - 1 MHz, the relative bandwidth is beyond 196%). Owing to its different underlying mechanism, the mechanical antenna based on the electrostrictive effect exhibits excellent communication properties compared with traditional mechanical antennas. The functions of signal coding, transmitting, receiving, and decoding were experimentally demonstrated. This approach offers a promising way of constructing mechanical antennas for long-wave communication.
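
The quoted relative bandwidth can be checked with a short calculation, assuming the usual definition of fractional bandwidth as the band width divided by the centre frequency. That definition is an assumption here, since the abstract does not state which one it uses.

```python
def relative_bandwidth_percent(f_low_hz: float, f_high_hz: float) -> float:
    """Fractional bandwidth in percent: (f_high - f_low) / centre frequency."""
    centre = (f_low_hz + f_high_hz) / 2.0
    return 100.0 * (f_high_hz - f_low_hz) / centre


# 10 kHz to 1 MHz gives roughly 196.04 %, consistent with "beyond 196%".
print(round(relative_bandwidth_percent(10e3, 1e6), 2))
```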

preprint, 2021, arXiv

Zero-order-free complex beam shaping

Unwanted zero-order diffraction is still an issue in beam shaping using pixelated spatial light modulators. In this paper, we report a new approach for zero-order-free beam shaping by designing an asymmetric triangle reflector and introducing a digital blazed grating and a digital lens to the phase hologram addressed onto the spatial light modulator. By adding the digital lens phase to the previously reported complex-amplitude coding algorithms, we realized the generation of complex beams without the burden of zero-order diffraction. We comparatively investigated the produced complex light fields using the modified complex-amplitude coding algorithms to validate the proposed method.
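
Adding a digital blazed grating and a digital lens to a phase hologram amounts to summing three phase terms per pixel and wrapping the result. The sketch below uses the textbook linear-ramp grating and thin-lens phase. The grating period, wavelength and focal length are hypothetical parameters, and the asymmetric triangle reflector is not modelled.

```python
import math

TWO_PI = 2.0 * math.pi


def grating_phase(x: float, period: float) -> float:
    """Linear phase ramp of a blazed grating along x (unwrapped)."""
    return TWO_PI * x / period


def lens_phase(x: float, y: float, wavelength: float, focal_length: float) -> float:
    """Thin-lens quadratic phase: -pi * (x^2 + y^2) / (wavelength * f)."""
    return -math.pi * (x * x + y * y) / (wavelength * focal_length)


def combined_phase(
    hologram_phase: float,
    x: float,
    y: float,
    period: float,
    wavelength: float,
    focal_length: float,
) -> float:
    """Hologram + grating + lens phase, wrapped for display on the modulator."""
    total = (
        hologram_phase
        + grating_phase(x, period)
        + lens_phase(x, y, wavelength, focal_length)
    )
    return total % TWO_PI
```

The grating steers the shaped beam away from the unmodulated light, and the lens term moves its focus to a different plane, which is the general idea behind separating the two.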

preprint, 2020, arXiv

A PDD Decoder for Binary Linear Codes With Neural Check Polytope Projection

Linear Programming (LP) is an important decoding technique for binary linear codes. However, the advantages of LP decoding, such as a low error floor and strong theoretical guarantees, come at the cost of high computational complexity and poor performance in the low signal-to-noise ratio (SNR) region. In this letter, we adopt the penalty dual decomposition (PDD) framework and propose a PDD algorithm to address the fundamental polytope based maximum likelihood (ML) decoding problem. Furthermore, we propose to integrate machine learning techniques into the most time-consuming part of the PDD decoding algorithm, i.e., check polytope projection (CPP). Inspired by the fact that a multi-layer perceptron (MLP) can theoretically approximate any nonlinear mapping function, we present a specially designed neural CPP (NCPP) algorithm to decrease the decoding latency. Simulation results demonstrate the effectiveness of the proposed algorithms.
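
To make the check polytope concrete, the sketch below tests whether a point lies inside it, using the standard odd-subset inequalities for the convex hull of even-weight binary vectors. It is a brute-force membership test only. It is neither the projection step nor the neural approximation the letter proposes, and `in_check_polytope` is a hypothetical name.

```python
from itertools import combinations
from typing import Sequence


def in_check_polytope(x: Sequence[float], tol: float = 1e-9) -> bool:
    """True if x lies in the convex hull of even-weight binary vectors.

    Conditions: 0 <= x_i <= 1, and for every odd-sized subset S,
    sum_{i in S} x_i - sum_{i not in S} x_i <= |S| - 1.
    The loop is exponential in len(x), so it suits small check degrees only.
    """
    if any(v < -tol or v > 1.0 + tol for v in x):
        return False
    total = sum(x)
    for size in range(1, len(x) + 1, 2):  # odd subset sizes only
        for subset in combinations(range(len(x)), size):
            inside = sum(x[i] for i in subset)
            if inside - (total - inside) > size - 1 + tol:
                return False
    return True
```

A projection routine has to return the nearest point satisfying all of these constraints, which is why it dominates decoding time and is a natural target for a learned approximation.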

preprint, 2020, arXiv

ADMM-based Decoder for Binary Linear Codes Aided by Deep Learning

Inspired by the recent advances in deep learning (DL), this work presents a deep neural network aided decoding algorithm for binary linear codes. Based on the concept of deep unfolding, we design a decoding network by unfolding the alternating direction method of multipliers (ADMM)-penalized decoder. In addition, we propose two improved versions of the proposed network. The first one transforms the penalty parameter into a set of iteration-dependent ones, and the second one adopts a specially designed penalty function, which is based on a piecewise linear function with adjustable slopes. Numerical results show that the resulting DL-aided decoders outperform the original ADMM-penalized decoder for various low density parity check (LDPC) codes with similar computational complexity.

preprint, 2020, arXiv

Hydrogen plasma favored modification of anatase TiO$_2$ (001) surface with desirable water splitting performance

We show that when anatase TiO$_2$ (001) is exposed to hydrogen plasma, the pristine surface termination becomes less favorable than another, slightly modified, surface. On this modified surface the topmost TiO$_2$ layer is intact but out of registry with the bottom layers. Nevertheless, the modified surface has a significantly improved ability to split water under exposure to sunlight. We show by explicit calculation of the water splitting reaction that the energy barrier that exists on a pristine surface is not present on the modified surface. The valence band maximum of the surface is raised relative to the pristine surface, which is a favorable way of adjusting the band gap in TiO$_2$ to the solar spectrum.

preprint, 2020, arXiv

Off-axis optical trapping and transverse spinning of metallic microparticles with a linearly polarized Gaussian beam

Optical trapping of metallic microparticles remains a big challenge because of the strong scattering and absorption of light by the particles. In this paper, we report a new mechanism for stable trapping of metallic microparticles by using a tightly focused linearly polarized Gaussian spot. We theoretically and experimentally demonstrated that metallic microparticles were confined off the optical axis by such a trap. Meanwhile, transverse spinning motion occurred as a consequence of the asymmetric force field exerted on the particle by the trap. The off-axis trapping and transverse spinning of metallic microparticles provide new ways to manipulate metallic microparticles. The work reported in this paper is also significant for a better understanding of the mechanical interaction between light and metallic particles.

preprint, 2020, arXiv

SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition

End-to-end speech recognition has become popular in recent years, since it can integrate the acoustic, pronunciation and language models into a single neural network. Among end-to-end approaches, attention-based methods have emerged as being superior, for example Transformer, which adopts an encoder-decoder architecture. The key improvement introduced by Transformer is the utilization of self-attention instead of recurrent mechanisms, enabling both encoder and decoder to capture long-range dependencies with lower computational complexity. In this work, we propose boosting the self-attention ability with a DFSMN memory block, forming the proposed memory equipped self-attention (SAN-M) mechanism. Theoretical and empirical comparisons have been made to demonstrate the relevancy and complementarity between self-attention and the DFSMN memory block. Furthermore, the proposed SAN-M provides an efficient mechanism to integrate these two modules. We have evaluated our approach on the public AISHELL-1 benchmark and an industrial-level 20,000-hour Mandarin speech recognition task. On both tasks, SAN-M systems achieved much better performance than the self-attention based Transformer baseline system. Specifically, it can achieve a CER of 6.46% on the AISHELL-1 task even without using any external LM, comfortably outperforming other state-of-the-art systems.

preprint, 2020, arXiv

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention. Much effort has been devoted to turning the non-streaming attention-based E2E-ASR system into a streaming architecture. In this work, we propose a novel online E2E-ASR system by using Streaming Chunk-Aware Multihead Attention (SCAMA) and a latency control memory equipped self-attention network (LC-SAN-M). LC-SAN-M uses chunk-level input to control the latency of the encoder. As to SCAMA, a jointly trained predictor is used to control the output of the encoder when feeding it to the decoder, which enables the decoder to generate output in a streaming manner. Experimental results on the open 170-hour AISHELL-1 and an industrial-level 20,000-hour Mandarin speech recognition task show that our approach can significantly outperform the MoChA-based baseline system under a comparable setup. On the AISHELL-1 task, our proposed method achieves a character error rate (CER) of 7.39%, which is, to the best of our knowledge, the best published performance for online ASR.

preprint, 2014, arXiv

A Framework of Performance Analysis for Distributed Antenna Systems Based on Random Matrix Theory

Future communications systems will definitely be built on green infrastructures. To realize such a goal, a new network infrastructure named cloud radio access network (C-RAN) was recently proposed by China Mobile to enhance network coverage and save energy simultaneously. In C-RANs, in order to save more energy, the radio front ends are separated from the colocated baseband units and distributed across physical positions. C-RAN can be recognized as a variant of distributed antenna systems (DASs). In this paper we analyze the performance of C-RANs using random matrix theory. Because the antennas are distributed geographically instead of being installed close together, the variances of the entries in the considered channel matrix are different from each other. To the best of the authors' knowledge, the work on random matrices with elements having different variances is largely open, which is of great importance for DASs. In our work, some fundamental results on the eigenvalue distributions of the random matrices with different variances are derived first. Then, based on these fundamental conclusions, the outage probability of the considered DAS is derived. Finally, the accuracy of our analytical results is assessed by some numerical results.
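
A numerical check of an outage probability with unequal channel variances can be sketched with a small Monte Carlo simulation. The example below is a deliberately simplified stand-in: it assumes a single user received by several distributed antennas with Rayleigh fading and maximum-ratio combining, whereas the paper treats a full channel matrix analytically. All names and parameters are hypothetical.

```python
import math
import random
from typing import Sequence


def outage_probability(
    variances: Sequence[float],
    snr: float,
    rate: float,
    trials: int = 10000,
    seed: int = 0,
) -> float:
    """Estimate P(log2(1 + snr * sum_i |h_i|^2) < rate).

    Each |h_i|^2 is exponential with mean variances[i], which corresponds to
    a zero-mean complex Gaussian channel coefficient of that variance.
    """
    if any(v <= 0.0 for v in variances):
        raise ValueError("variances must be positive")
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        gain = sum(rng.expovariate(1.0 / v) for v in variances)
        if math.log2(1.0 + snr * gain) < rate:
            outages += 1
    return outages / trials
```

Unequal entries in `variances` model antennas at different distances from the user, which is the feature that separates a distributed system from a colocated array.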

preprint, 2014, arXiv

Diversity Multiplexing Tradeoff of the Half-duplex Slow Fading Multiple Access Channel based on Generalized Quantize-and-Forward Scheme

This paper investigates the Diversity Multiplexing Tradeoff (DMT) of the generalized quantize-and-forward (GQF) relaying scheme over the slow fading half-duplex multiple-access relay channel (HD-MARC). The compress-and-forward (CF) scheme has been shown to achieve the optimal DMT when the channel state information (CSI) of the relay-destination link is available at the relay. However, having the CSI of the relay-destination link at the relay is not always possible due to the practical considerations of the wireless system. In contrast, in this work, the DMT of the GQF scheme is derived without relay-destination link CSI at the relay. It is shown that even without knowledge of relay-destination CSI, the GQF scheme achieves the same DMT as that achievable by the CF scheme with full knowledge of CSI.

preprint, 2014, arXiv

Half-Duplex Relaying for the Multiuser Channel

This work studies half-duplex (HD) relaying in the Multiple Access Relay Channel (MARC) and the Compound Multiple Access Channel with a Relay (cMACr). A generalized Quantize-and-Forward (GQF) scheme is proposed to establish the achievable rate regions. This scheme is developed based on a variation of the Quantize-and-Forward (QF) scheme and a single-block, two-slot coding structure. The results in this paper can also be considered a significant extension of the achievable rate region of the Half-Duplex Relay Channel (HDRC). Furthermore, the rate regions based on the GQF scheme are extended to the Gaussian channel case. The scheme performance is shown through some numerical examples.

preprint, 2014, arXiv

Performance of the Generalized Quantize-and-Forward Scheme over the Multiple-Access Relay Channel

This work focuses on half-duplex (HD) relaying based on the generalized quantize-and-forward (GQF) scheme in the slow fading Multiple Access Relay Channel (MARC) where the relay has no channel state information (CSI) of the relay-to-destination link. The relay listens to the channel in the first slot of the transmission block and cooperatively transmits to the destination in the second slot. In order to investigate the performance of the GQF, the following steps have been taken: 1) The GQF scheme is applied to establish the achievable rate regions of the discrete memoryless half-duplex MARC and the corresponding additive white Gaussian noise channel. This scheme is developed based on the generalization of the Quantize-and-Forward (QF) scheme and a single-block, two-slot coding structure. 2) As the general performance measure of the slow fading channel, the common outage probability and the expected sum rate (total throughput) of the GQF scheme have been characterized. The numerical examples show that when the relay has no access to the CSI of the relay-destination link, the GQF scheme outperforms other relaying schemes, e.g., classic compress-and-forward (CF), decode-and-forward (DF) and amplify-and-forward (AF). 3) For a MAC channel with heterogeneous user channels and quality-of-service (QoS) requirements, individual outage probability and total throughput of the GQF scheme are also obtained and shown to outperform the classic CF scheme.

preprint, 2014, arXiv

Quantized CSI-Based Tomlinson-Harashima Precoding in Multiuser MIMO Systems

This paper considers the implementation of Tomlinson-Harashima (TH) precoding for multiuser MIMO systems based on quantized channel state information (CSI) at the transmitter side. Compared with the results in [1], our scheme applies to a more general system setting where the number of users in the system can be less than or equal to the number of transmit antennas. We also study the achievable average sum rate of the proposed quantized CSI-based TH precoding scheme. The expressions of the upper bounds on both the average sum rate of the systems with quantized CSI and the mean loss in average sum rate due to CSI quantization are derived. We also present some numerical results. The results show that the nonlinear TH precoding can achieve much better performance than that of linear zero-forcing precoding for both perfect CSI and quantized CSI cases. In addition, our derived upper bound on the mean rate loss for TH precoding converges to the true rate loss faster than that of zero-forcing precoding obtained in [2] as the number of feedback bits becomes large. Both the analytical and numerical results show that nonlinear precoding suffers from imperfect CSI more than linear precoding does.
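
The nonlinear core of TH precoding is successive interference pre-subtraction followed by a modulo operation that keeps the transmit signal bounded. The sketch below is a real-valued toy under idealized assumptions: perfect CSI, no noise, and an effective channel already reduced to a unit-diagonal lower-triangular matrix called `feedback`. CSI quantization, the subject of the paper, is not modelled, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def sym_mod(x: float, tau: float) -> float:
    """Fold x into the interval [-tau/2, tau/2)."""
    return x - tau * math.floor(x / tau + 0.5)


def th_precode(
    symbols: Sequence[float], feedback: Sequence[Sequence[float]], tau: float
) -> List[float]:
    """Pre-subtract interference from earlier users, then apply the modulo."""
    tx: List[float] = []
    for k, s in enumerate(symbols):
        interference = sum(feedback[k][j] * tx[j] for j in range(k))
        tx.append(sym_mod(s - interference, tau))
    return tx


def th_receive(
    tx: Sequence[float], feedback: Sequence[Sequence[float]], tau: float
) -> List[float]:
    """Noiseless triangular channel followed by the same modulo at each user."""
    rx: List[float] = []
    for k in range(len(tx)):
        y = sum(feedback[k][j] * tx[j] for j in range(k + 1))
        rx.append(sym_mod(y, tau))
    return rx
```

With symbols inside [-tau/2, tau/2), each user recovers its own symbol after the modulo, while every transmitted value stays within the same bounded interval. Linear zero-forcing has no such bound on transmit power, which is the usual motivation for the nonlinear scheme.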

preprint, 2013, arXiv

Multiple-Level Power Allocation Strategy for Secondary Users in Cognitive Radio Networks

In this paper, we propose a multiple-level power allocation strategy for the secondary user (SU) in cognitive radio (CR) networks. Different from the conventional strategies, where the SU either stays silent or transmits with a constant/binary power depending on the busy/idle status of the primary user (PU), the proposed strategy allows the SU to choose different power levels according to a carefully designed function of the receiving energy. The power level selection is optimized to maximize the achievable rate of the SU under the constraints of average transmit power at the SU and average interference power at the PU. Simulation results demonstrate that the proposed strategy can significantly improve the performance of the SU compared to the conventional strategies.
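
A multiple-level rule of this kind can be pictured as a step function from sensed energy to transmit power. The sketch below assumes fixed, sorted energy thresholds and one power level per interval. In the paper these values come out of an optimization, whereas here they are hypothetical inputs, as is the name `select_power`.

```python
from bisect import bisect_right
from typing import Sequence


def select_power(
    received_energy: float,
    thresholds: Sequence[float],
    power_levels: Sequence[float],
) -> float:
    """Pick the power level for the interval containing `received_energy`.

    `thresholds` must be sorted ascending, and `power_levels` needs exactly
    one more entry than `thresholds` (one level per interval).
    """
    if len(power_levels) != len(thresholds) + 1:
        raise ValueError("need len(thresholds) + 1 power levels")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be sorted in ascending order")
    return power_levels[bisect_right(list(thresholds), received_energy)]
```

A conventional binary strategy is the special case with a single threshold and two levels. Adding thresholds lets the SU lower its power gradually as the evidence that the PU is active grows.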

preprint, 1996, arXiv

Numerical confirmation of universality of transmission micro-symmetry relations in a four-probe quantum dot

We study the crossover behavior of the Hall resistance between the integer quantum Hall regime and a regime dominated by the Aharonov-Bohm oscillations, in a four-probe quantum dot system with an artificial impurity confined inside. In a previous study [M. Lei, N.J. Zhu and Hong Guo, Phys. Rev. B. 52, 16784, (1995)], a peculiar set of symmetry relations between various scattering probabilities was found in this crossover regime. In this paper we examine the universality of this set of symmetry relations using different shapes of the quantum dot and positions of the artificial impurity. The symmetry holds for these changes and we conclude that in this transport regime the general behavior of the Hall resistance is determined by the competition of the quantum Hall and Aharonov-Bohm effects, rather than by the detailed shapes of the structure.
