Source author record

Xuewei Zhang

Xuewei Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

cond-mat.mes-hall, Sound, Computation and Language, eess.AS, eess.SP, Networking and Internet Architecture

Catalog footprint

What is connected

6 works

6 topics

4 close collaborators

preprint · 2026 · arXiv

IKFST: IOO and KOO Algorithms for Accelerated and Precise WFST-based End-to-End Automatic Speech Recognition

End-to-end automatic speech recognition has become the dominant paradigm in both academia and industry. To enhance recognition performance, the Weighted Finite-State Transducer (WFST) is widely adopted to integrate acoustic and language models through static graph composition, providing robust decoding and effective error correction. However, WFST decoding relies on a frame-by-frame autoregressive search over CTC posterior probabilities, which severely limits inference efficiency. Motivated by establishing a more principled compatibility between WFST decoding and CTC modeling, we systematically study the two fundamental components of CTC outputs, namely blank and non-blank frames, and identify a key insight: blank frames primarily encode positional information, while non-blank frames carry semantic content. Building on this observation, we introduce Keep-Only-One and Insert-Only-One, two decoding algorithms that explicitly exploit the structural roles of blank and non-blank frames to achieve significantly faster WFST-based inference without compromising recognition accuracy. Experiments on large-scale in-house, AISHELL-1, and LibriSpeech datasets demonstrate state-of-the-art recognition accuracy with substantially reduced decoding latency, enabling truly efficient and high-performance WFST decoding in modern speech recognition systems.
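The abstract does not spell out how Keep-Only-One or Insert-Only-One work, so the following is only a minimal illustration of the underlying observation, not the paper's algorithms. It assumes a frame is "blank" when the blank symbol has the highest CTC posterior, and that the blank symbol sits at index 0. Both are assumptions made for this sketch. Under those assumptions, it reduces the number of frames a frame-by-frame decoder would visit by keeping only the first frame of each run of consecutive blank frames.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def is_blank_frame(posterior: Sequence[float], blank: int = BLANK) -> bool:
    """Treat a frame as blank when the blank symbol has the highest probability."""
    best = max(range(len(posterior)), key=lambda i: posterior[i])
    return best == blank


def collapse_blank_runs(
    posteriors: List[Sequence[float]], blank: int = BLANK
) -> List[Sequence[float]]:
    """Keep every non-blank frame, but only the first frame of each blank run."""
    kept = []
    prev_blank = False
    for frame in posteriors:
        cur_blank = is_blank_frame(frame, blank)
        if not (cur_blank and prev_blank):
            kept.append(frame)
        prev_blank = cur_blank
    return kept


# Toy posteriors over [blank, token_a, token_b]; values are made up.
frames = [
    [0.90, 0.05, 0.05],  # blank
    [0.80, 0.10, 0.10],  # blank (dropped: continues a blank run)
    [0.10, 0.85, 0.05],  # token_a
    [0.70, 0.20, 0.10],  # blank
    [0.95, 0.03, 0.02],  # blank (dropped: continues a blank run)
    [0.10, 0.10, 0.80],  # token_b
]
reduced = collapse_blank_runs(frames)
print(len(frames), "->", len(reduced))
```

Keeping one blank per run preserves the separation between neighboring non-blank frames, which is what allows repeated tokens to remain distinguishable under CTC.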

preprint · 2022 · arXiv

Caching Scalable Videos in the Edge of Wireless Cellular Networks

By pre-fetching popular videos into the local caches of edge nodes, wireless edge caching provides an effective means of reducing repeated content deliveries. To meet the various viewing quality requirements of multimedia users, scalable video coding (SVC) is integrated with edge caching, where the constituent layers of scalable videos are flexibly cached and transmitted to users. In this article, we discuss the challenges arising from the different content popularity and various viewing requirements of scalable videos, and present the diverse types of cached contents as well as the corresponding transmission schemes. We provide an overview of the existing caching schemes, and summarize the criteria of making caching decisions. A case study is then presented, where the transmission delay is quantified and used as the performance metric. Simulation results confirm that giving cognizance to the realistic requirements of end users is capable of significantly reducing the content transmission delay, compared to the existing caching schemes operating without SVC. The results also verify that the transmission delay of the proposed random caching scheme is lower than that of the caching scheme which only provides local caching gain.
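As a rough illustration of layer-wise caching, the sketch below relies on the general SVC property that decoding a given quality level requires the base layer plus every enhancement layer up to that level. The function names, the 1-based layer numbering, and the cache contents are hypothetical, and the snippet does not reproduce the article's caching schemes or its delay model. It only shows which layers an edge node could serve locally and which would still have to be fetched from elsewhere.

```python
from typing import List, Set, Tuple


def layers_needed(quality: int) -> Set[int]:
    """Layers 1..quality are required: layer 1 is the base layer (assumed numbering)."""
    return set(range(1, quality + 1))


def split_delivery(quality: int, cached_layers: Set[int]) -> Tuple[List[int], List[int]]:
    """Return (layers served from the edge cache, layers that must be fetched remotely)."""
    needed = layers_needed(quality)
    from_cache = sorted(needed & cached_layers)
    from_remote = sorted(needed - cached_layers)
    return from_cache, from_remote


# Hypothetical edge node that caches the base layer and the first enhancement layer.
edge_cache = {1, 2}
local, remote = split_delivery(quality=3, cached_layers=edge_cache)
print("served locally:", local, "fetched remotely:", remote)
```

A user who only requests the base quality would be served entirely from this cache, whereas a request for the highest quality still needs the uncached enhancement layer.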

preprint · 2021 · arXiv

3D hinge transport in acoustic higher-order topological insulators

The discovery of topologically protected boundary states in topological insulators opens a new avenue toward exploring novel transport phenomena. The one-way propagation of boundary states, which is robust against disorder and impurities, holds great potential for applications in electronic and classical wave devices. In particular, 3D higher-order topological insulators can host hinge states, which allow energy to be transported along the hinge channels. However, hinge states have only been observed along a single hinge, and a natural question arises: can hinge states exist simultaneously along all three independent directions of one sample? Here we theoretically predict and experimentally observe hinge states along three different directions of a higher-order topological phononic crystal, and demonstrate their robust one-way transport from hinge to hinge. 3D topological hinge transport is thus successfully achieved. This novel sound transport may serve as the basis for acoustic devices with unconventional functions.

preprint · 2021 · arXiv

Higher-order topological semimetal in acoustic crystals

The notion of higher-order topological insulators has endowed materials with topological states beyond the first order. Particularly, a three-dimensional (3D) higher-order topological insulator can host topologically protected 1D hinge states, referred to as the second-order topological insulator, or 0D corner states, referred to as the third-order topological insulator. Similarly, a 3D higher-order topological semimetal can be envisaged if it hosts states on the 1D hinges. Here we report the realization of a second-order topological Weyl semimetal in a 3D-printed acoustic crystal, which possesses Weyl points in 3D momentum space, 2D Fermi arc states on surfaces and 1D gapless states on hinges. Like the arc surface states, the hinge states also connect the projections of the Weyl points. Our experimental results evidence the existence of the higher-order topological semimetal, which may pave the way towards innovative acoustic devices.

preprint · 2020 · arXiv

Learning-Based Multi-Channel Access in 5G and Beyond Networks with Fast Time-Varying Channels

We propose a learning-based scheme to investigate the dynamic multi-channel access (DMCA) problem in fifth-generation (5G) and beyond networks with fast time-varying channels wherein the channel parameters are unknown. The proposed learning-based scheme can maintain near-optimal performance for a long time, even in sharply changing channels. This scheme greatly reduces processing delay, and effectively alleviates the error due to decision lag, which is caused by the non-immediacy of information acquisition and processing. We first propose a psychology-based personalized quality of service model after introducing the network model with unknown channel parameters and the streaming model. Then, two access criteria are presented for the living streaming model and the buffered streaming model. Their corresponding optimization problems are also formulated. The optimization problems are solved by the learning-based DMCA scheme, which combines a recurrent neural network with deep reinforcement learning. In the learning-based DMCA scheme, the agent mainly invokes the proposed prediction-based deep deterministic policy gradient algorithm as the learning algorithm. As a novel technical paradigm, our scheme has strong universality, since it can be easily extended to solve other problems in wireless communications. Simulation results based on real channel data validate that the performance of the learning-based scheme approaches that of exhaustive search when a decision is made at each time-slot, and is superior to the exhaustive search method when a decision is made every few time-slots.

preprint · 2015 · arXiv

THCHS-30 : A Free Chinese Speech Corpus

Speech data is crucially important for speech recognition research. There are quite a few speech databases that can be purchased at prices that are reasonable for most research institutes. However, for young people who are just starting research activities or who have just gained an initial interest in this direction, the cost of data is still an annoying barrier. We support the 'free data' movement in speech recognition: research institutes (particularly those supported by public funds) publish their data freely so that new researchers can obtain sufficient data to kick off their careers. In this paper, we follow this trend and release a free Chinese speech database, THCHS-30, that can be used to build a full-fledged Chinese speech recognition system. We report the baseline system established with this database, including the performance under highly noisy conditions.