Source author record

Xun Zhou

Xun Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Information Theory, math.IT, Artificial Intelligence, Computation and Language, Machine Learning, physics.optics, Sound

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators

preprint, 2026, arXiv

IndexTTS 2.5 Technical Report

In prior work, we introduced IndexTTS 2, a zero-shot neural text-to-speech foundation model comprising two core components: a transformer-based Text-to-Semantic (T2S) module and a non-autoregressive Semantic-to-Mel (S2M) module, which together enable faithful emotion replication and establish the first autoregressive duration-controllable generative paradigm. Building upon this, we present IndexTTS 2.5, which significantly enhances multilingual coverage, inference speed, and overall synthesis quality through four key improvements: 1) Semantic Codec Compression: we reduce the semantic codec frame rate from 50 Hz to 25 Hz, halving sequence length and substantially lowering both training and inference costs; 2) Architectural Upgrade: we replace the U-DiT-based backbone of the S2M module with a more efficient Zipformer-based modeling architecture, achieving notable parameter reduction and faster mel-spectrogram generation; 3) Multilingual Extension: we propose three explicit cross-lingual modeling strategies (boundary-aware alignment, token-level concatenation, and instruction-guided generation), establishing practical design principles for zero-shot multilingual emotional TTS that supports Chinese, English, Japanese, and Spanish and enables robust emotion transfer even without target-language emotional training data; 4) Reinforcement Learning Optimization: we apply GRPO in post-training of the T2S module, improving pronunciation accuracy and naturalness. Experiments show that IndexTTS 2.5 not only supports broader language coverage but also replicates emotional prosody in unseen languages under the same zero-shot setting. IndexTTS 2.5 achieves a 2.28-fold improvement in real-time factor (RTF) while maintaining comparable WER and speaker similarity to IndexTTS 2.
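The frame-rate and RTF figures in this abstract reduce to simple arithmetic, sketched below. This is only an illustration, not code from the report: the function names, the 10-second utterance, and the 2-second baseline synthesis time are hypothetical, and reading the RTF improvement as "baseline RTF divided by 2.28" is an assumption about how the figure is reported.

```python
def semantic_token_count(duration_s: float, frame_rate_hz: float) -> int:
    """Number of semantic codec frames needed to cover duration_s seconds."""
    return round(duration_s * frame_rate_hz)


def real_time_factor(synthesis_time_s: float, audio_duration_s: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    return synthesis_time_s / audio_duration_s


# Hypothetical 10-second utterance (not a figure from the report).
duration_s = 10.0
tokens_50hz = semantic_token_count(duration_s, 50.0)  # original frame rate
tokens_25hz = semantic_token_count(duration_s, 25.0)  # halved frame rate

# Hypothetical baseline: 2 s of compute for 10 s of audio.
baseline_rtf = real_time_factor(2.0, duration_s)
# Assumption: a "2.28-fold improvement" means the RTF shrinks by that factor.
improved_rtf = baseline_rtf / 2.28

print(tokens_50hz, tokens_25hz)
print(round(baseline_rtf, 4), round(improved_rtf, 4))
```

Halving the frame rate halves the number of tokens the T2S module must generate for the same audio, which is where the lower training and inference cost comes from.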

preprint, 2024, arXiv

ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers

Large Language Models (LLMs) such as ChatGPT and LLaMA encounter limitations in domain-specific tasks: they often lack depth and accuracy in specialized areas and lose general capabilities when fine-tuned, particularly analysis ability in small models. To address these gaps, we introduce ICE-GRT, which uses Reinforcement Learning from Human Feedback (RLHF) grounded in Proximal Policy Optimization (PPO) and demonstrates remarkable ability in in-domain scenarios without compromising general task performance. Our exploration of ICE-GRT highlights its understanding and reasoning ability: it not only generates robust answers but also provides detailed analyses of the reasons behind them. This capability marks a significant progression beyond the scope of Supervised Fine-Tuning models. The success of ICE-GRT depends on several crucial factors, including Appropriate Data, Reward Size Scaling, KL-Control, and Advantage Normalization. The ICE-GRT model exhibits state-of-the-art performance in domain-specific tasks and across 12 general language tasks against LLMs of equivalent and even larger size, highlighting the effectiveness of our approach. We provide a comprehensive analysis of ICE-GRT, underscoring the significant advancements it brings to the field of LLMs.
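Two of the factors named in this abstract, KL-Control and Advantage Normalization, have common textbook forms in PPO-based RLHF, sketched below. This is a generic illustration and not the ICE-GRT implementation: the function names, the `beta` coefficient, and the sample numbers are all hypothetical.

```python
from statistics import mean, pstdev


def kl_shaped_reward(reward: float, logp_policy: float, logp_ref: float,
                     beta: float = 0.1) -> float:
    """Penalize the reward by a per-sample KL estimate.

    logp_policy - logp_ref is a single-sample estimate of the KL divergence
    between the trained policy and the frozen reference model; beta is a
    hypothetical penalty coefficient.
    """
    return reward - beta * (logp_policy - logp_ref)


def normalize_advantages(advantages: list[float], eps: float = 1e-8) -> list[float]:
    """Shift a batch of advantages to zero mean and scale to unit variance."""
    mu = mean(advantages)
    sigma = pstdev(advantages)  # population standard deviation of the batch
    return [(a - mu) / (sigma + eps) for a in advantages]


# Hypothetical values for illustration only.
shaped = kl_shaped_reward(reward=1.0, logp_policy=-1.0, logp_ref=-1.5, beta=0.1)
normalized = normalize_advantages([0.5, 1.5, -1.0, 3.0])

print(shaped)
print(normalized)
```

The KL term discourages the policy from drifting far from the reference model, and normalizing advantages per batch keeps the scale of policy-gradient updates stable across batches.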

preprint, 2022, arXiv

HintNet: Hierarchical Knowledge Transfer Networks for Traffic Accident Forecasting on Heterogeneous Spatio-Temporal Data

Traffic accident forecasting is a significant problem for transportation management and public safety. However, this problem is challenging due to the spatial heterogeneity of the environment and the sparsity of accidents in space and time. The occurrence of traffic accidents is affected by complex dependencies among spatial and temporal features. Recent traffic accident prediction methods have attempted to use deep learning models to improve accuracy. However, most of these methods either focus on small-scale and homogeneous areas such as populous cities or simply use sliding-window-based ensemble methods, which are inadequate to handle heterogeneity in large regions. To address these limitations, this paper proposes a novel Hierarchical Knowledge Transfer Network (HintNet) model to better capture irregular heterogeneity patterns. HintNet performs a multi-level spatial partitioning to separate sub-regions with different risks and learns a deep network model for each level using spatio-temporal and graph convolutions. Through knowledge transfer across levels, HintNet achieves both higher accuracy and higher training efficiency. Extensive experiments on a real-world accident dataset from the state of Iowa demonstrate that HintNet outperforms the state-of-the-art methods on spatially heterogeneous and large-scale areas.

preprint, 2016, arXiv

Wireless Power Meets Energy Harvesting: A Joint Energy Allocation Approach in OFDM-based System

This paper investigates an orthogonal frequency division multiplexing (OFDM)-based wireless powered communication system, where one user harvests energy from an energy access point (EAP) to power its information transmission to a data access point (DAP). The channels from the EAP to the user, i.e., the wireless energy transfer (WET) link, and from the user to the DAP, i.e., the wireless information transfer (WIT) link, vary over both time slots and sub-channels (SCs) in general. To avoid interference at the DAP, WET and WIT are scheduled over orthogonal SCs in any slot. Our objective is to maximize the achievable rate at the DAP by jointly optimizing the SC allocation over time and the power allocation over time and SCs for both WET and WIT links. Assuming the availability of full channel state information (CSI), the structural results for the optimal SC/power allocation are obtained and an offline algorithm is proposed to solve the problem. Furthermore, we propose a low-complexity online algorithm for the case when only causal CSI is available.

preprint, 2014, arXiv

Terahertz in-line digital holography of dragonfly hindwing: amplitude and phase reconstruction at enhanced resolution by extrapolation

We report here on terahertz (THz) digital holography of a biological specimen. A continuous-wave (CW) THz in-line holographic setup was built based on a 2.52 THz CO2-pumped THz laser and a pyroelectric array detector. We introduced a novel statistical method of obtaining true intensity values for the pyroelectric array detector's pixels. Absorption and phase-shifting images of a dragonfly's hind wing were reconstructed simultaneously from a single in-line hologram. Furthermore, we applied phase retrieval routines to eliminate the twin image and enhanced the resolution of the reconstructions by hologram extrapolation beyond the detector area. The finest observed features are cross veins 35 μm wide.

preprint, 2014, arXiv

Wireless Information and Power Transfer in Multiuser OFDM Systems

In this paper, we study the optimal design for simultaneous wireless information and power transfer (SWIPT) in downlink multiuser orthogonal frequency division multiplexing (OFDM) systems. For information transmission, we consider two types of multiple access schemes, namely, time division multiple access (TDMA) and orthogonal frequency division multiple access (OFDMA). At the receiver side, due to the practical limitation that circuits for harvesting energy from radio signals are not yet able to decode the carried information directly, each user applies either time switching (TS) or power splitting (PS) to coordinate the energy harvesting (EH) and information decoding (ID) processes. For the TDMA-based information transmission, we employ TS at the receivers; for the OFDMA-based information transmission, we employ PS at the receivers. Under the above two scenarios, we address the problem of maximizing the weighted sum-rate over all users by varying the time/frequency power allocation and either the TS or the PS ratio, subject to a minimum harvested energy constraint on each user as well as a peak and/or total transmission power constraint. For the TS scheme, by an appropriate variable transformation the problem is reformulated as a convex problem, for which the optimal power allocation and TS ratio are obtained by the Lagrange duality method. For the PS scheme, we propose an iterative algorithm to optimize the power allocation, subcarrier (SC) allocation and the PS ratio for each user. The performance of the two schemes is compared numerically, as well as analytically for the special case of a single-user setup. It is revealed that the peak power constraint imposed on each OFDM SC as well as the number of users in the system play a key role in the rate-energy performance comparison of the two proposed schemes.

preprint, 2013, arXiv

Wireless Information and Power Transfer: Architecture Design and Rate-Energy Tradeoff

Simultaneous information and power transfer over the wireless channels potentially offers great convenience to mobile users. Yet practical receiver designs impose technical constraints on its hardware realization, as practical circuits for harvesting energy from radio signals are not yet able to decode the carried information directly. To make theoretical progress, we propose a general receiver operation, namely, dynamic power splitting (DPS), which splits the received signal with adjustable power ratio for energy harvesting and information decoding, separately. Three special cases of DPS, namely, time switching (TS), static power splitting (SPS) and on-off power splitting (OPS) are investigated. The TS and SPS schemes can be treated as special cases of OPS. Moreover, we propose two types of practical receiver architectures, namely, separated versus integrated information and energy receivers. The integrated receiver integrates the front-end components of the separated receiver, thus achieving a smaller form factor. The rate-energy tradeoff for the two architectures is characterized by a so-called rate-energy (R-E) region. The optimal transmission strategy is derived to achieve different rate-energy tradeoffs. With receiver circuit power consumption taken into account, it is shown that the OPS scheme is optimal for both receivers. For the ideal case when the receiver circuit does not consume power, the SPS scheme is optimal for both receivers. In addition, we study the performance for the two types of receivers under a realistic system setup that employs practical modulation. Our results provide useful insights into the optimal practical receiver design for simultaneous wireless information and power transfer (SWIPT).
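The rate-energy tradeoff between time switching and static power splitting can be illustrated with a deliberately simplified single-link model, sketched below. It is not the paper's receiver model: it assumes a single noise term, no circuit power consumption, and a linear energy harvester, and the names `h`, `p`, `noise`, and `zeta` and all numeric values are hypothetical.

```python
import math


def time_switching(alpha: float, h: float, p: float, noise: float,
                   zeta: float = 1.0) -> tuple[float, float]:
    """Harvest energy for a fraction alpha of the time, decode for the rest.

    Returns (rate in bits/s/Hz, harvested energy per unit time).
    """
    energy = alpha * zeta * h * p
    rate = (1.0 - alpha) * math.log2(1.0 + h * p / noise)
    return rate, energy


def static_power_splitting(rho: float, h: float, p: float, noise: float,
                           zeta: float = 1.0) -> tuple[float, float]:
    """Send a fraction rho of the received power to the harvester at all times.

    Returns (rate in bits/s/Hz, harvested energy per unit time).
    """
    energy = rho * zeta * h * p
    rate = math.log2(1.0 + (1.0 - rho) * h * p / noise)
    return rate, energy


# Hypothetical link: channel gain 1, transmit power 10, noise power 1.
ts_rate, ts_energy = time_switching(alpha=0.5, h=1.0, p=10.0, noise=1.0)
sps_rate, sps_energy = static_power_splitting(rho=0.5, h=1.0, p=10.0, noise=1.0)

print(ts_rate, ts_energy)
print(sps_rate, sps_energy)
```

In this idealized model both schemes harvest the same energy when `alpha` equals `rho`, but power splitting achieves at least the rate of time switching because the logarithm is concave. That is consistent with the abstract's statement that SPS is optimal when the receiver circuit consumes no power.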
