Source author record

Hui Wang

Hui Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision, Sound, Artificial Intelligence, Computation and Language, cond-mat.stat-mech, eess.AS, Graphics, math.NT, Neurons and Cognition, nlin.AO, physics.app-ph, physics.optics, quant-ph

Catalog footprint

What is connected

9 works

13 topics

4 close collaborators


preprint, 2026, arXiv

CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models

Automatic speech editing aims to modify spoken content based on textual instructions, yet traditional cascade systems suffer from complex preprocessing pipelines and a reliance on explicit external temporal alignment. Addressing these limitations, we propose CosyEdit, an end-to-end speech editing model adapted from CosyVoice through task-specific fine-tuning and an optimized inference procedure, which internalizes speech-text alignment while ensuring high consistency between the speech before and after editing. By fine-tuning on only 250 hours of supervised data from our curated GigaEdit dataset, our 400M-parameter model achieves reliable speech editing performance. Experiments on the RealEdit benchmark indicate that CosyEdit not only outperforms several billion-parameter language model baselines but also matches the performance of state-of-the-art cascade approaches. These results demonstrate that, with task-specific fine-tuning and inference optimization, robust and efficient speech editing capabilities can be unlocked from a zero-shot TTS model, yielding a novel and cost-effective end-to-end solution for high-quality speech editing.

preprint, 2026, arXiv

eTracer: Towards Traceable Text Generation via Claim-Level Grounding

How can system-generated responses be efficiently verified, especially in the high-stakes biomedical domain? To address this challenge, we introduce eTracer, a plug-and-play framework that enables traceable text generation by grounding claims against contextual evidence. Through post-hoc grounding, each response claim is aligned with contextual evidence that either supports or contradicts it. Building on claim-level grounding results, eTracer not only enables users to precisely trace responses back to their contextual source but also quantifies response faithfulness, thereby enabling the verifiability and trustworthiness of generated responses. Experiments show that our claim-level grounding approach alleviates the limitations of conventional grounding methods in aligning generated statements with contextual sentence-level evidence, resulting in substantial improvements in overall grounding quality and user verification efficiency. The code and data are available at https://github.com/chubohao/eTracer.

preprint, 2026, arXiv

Explicit bounds for the graphicality of the prime gap sequence

We establish explicit unconditional results on the graphic properties of the prime gap sequence. Let $p_n$ denote the $n$-th prime number (with $p_0=1$) and $\mathrm{PD}_n = (p_\ell - p_{\ell-1})_{\ell=1}^n$ be the sequence of the first $n$ prime gaps. Building upon the recent work by Erdős \emph{et al.}, which proved the graphic nature of $\mathrm{PD}_n$ for large $n$ unconditionally, and for all $n$ under RH, we provide the first explicit unconditional thresholds such that: (1) For all $n \geq \exp\exp(30.5)$, $\mathrm{PD}_n$ is graphic. (2) For all $n \geq \exp\exp(34.5)$, every realization $G_n$ of $\mathrm{PD}_n$ satisfies that $(G_n, p_{n+1}-p_n)$ is DPG-graphic. Our proofs utilize a more refined criterion for when a sequence is graphic, and better estimates for the first moment of large prime gaps proven through an explicit zero-free region and explicit zero-density estimate for the Riemann zeta function.
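To make the definition of $\mathrm{PD}_n$ concrete, the following is a minimal sketch that builds the first $n$ prime gaps with the convention $p_0 = 1$ and tests whether they form the degree sequence of a simple graph. It uses the classical Erdős–Gallai inequalities, not the refined criterion the abstract mentions, and it says nothing about the DPG-graphic property. The function names `first_primes`, `prime_gaps` and `is_graphic` are illustrative and do not come from the paper.

```python
def first_primes(n):
    """Return the first n primes by trial division (adequate for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # candidate is prime if no smaller prime p with p*p <= candidate divides it
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def prime_gaps(n):
    """PD_n = (p_l - p_{l-1}) for l = 1..n, using the convention p_0 = 1."""
    seq = [1] + first_primes(n)
    return [seq[i] - seq[i - 1] for i in range(1, n + 1)]


def is_graphic(degrees):
    """Erdős–Gallai test: is `degrees` the degree sequence of some simple graph?"""
    d = sorted(degrees, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2:
        return False  # negative degree or odd degree sum
    for k in range(1, len(d) + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True


if __name__ == "__main__":
    gaps = prime_gaps(5)
    print(gaps, is_graphic(gaps))
```

For small $n$ this can be checked by hand: `prime_gaps(5)` is `[1, 1, 2, 2, 4]`, which passes every Erdős–Gallai inequality. The explicit thresholds in the abstract concern values of $n$ far beyond anything a direct computation like this could reach.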

preprint, 2026, arXiv

LightFormer: A lightweight and efficient decoder for remote sensing image segmentation

Deep learning techniques have achieved remarkable success in the semantic segmentation of remote sensing images and in land-use change detection. Nevertheless, their real-time deployment on edge platforms remains constrained by decoder complexity. Herein, we introduce LightFormer, a lightweight decoder for time-critical tasks that involve unstructured targets, such as disaster assessment, unmanned aerial vehicle search-and-rescue, and cultural heritage monitoring. LightFormer employs a feature-fusion and refinement module built on channel processing and a learnable gating mechanism to aggregate multi-scale, multi-range information efficiently, which drastically curtails model complexity. Furthermore, we propose a spatial information selection module (SISM) that integrates long-range attention with a detail preservation branch to capture spatial dependencies across multiple scales, thereby substantially improving the recognition of unstructured targets in complex scenes. On the ISPRS Vaihingen benchmark, LightFormer attains 99.9% of GLFFNet's mIoU (83.9% vs. 84.0%) while requiring only 14.7% of its FLOPs and 15.9% of its parameters, thus achieving an excellent accuracy-efficiency trade-off. Consistent results on LoveDA, ISPRS Potsdam, RescueNet, and FloodNet further demonstrate its robustness and superior perception of unstructured objects. These findings highlight LightFormer as a practical solution for remote sensing applications where both computational economy and high-precision segmentation are imperative.
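The accuracy figures above are mean intersection-over-union (mIoU) scores. As a rough illustration of how such a score is usually computed, and of the "99.9% of GLFFNet's mIoU" ratio, here is a small sketch. It assumes a confusion matrix with ground-truth classes as rows and predictions as columns. The helper names `mean_iou` and `relative_percent` are hypothetical, and the paper's own evaluation code may differ (for example in how ignored classes are handled).

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: ground truth, cols: prediction)."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # class-c pixels predicted as another class
        fp = sum(confusion[r][c] for r in range(n)) - tp     # other-class pixels predicted as c
        denom = tp + fp + fn
        if denom:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def relative_percent(value, reference):
    """Express `value` as a percentage of `reference`."""
    return 100.0 * value / reference


if __name__ == "__main__":
    # Toy two-class confusion matrix, not data from the paper.
    print(mean_iou([[3, 1], [1, 5]]))
    # Ratio of the two mIoU values quoted in the abstract.
    print(round(relative_percent(83.9, 84.0), 1))
```

The second call reproduces the abstract's arithmetic: 83.9 divided by 84.0 is about 0.9988, which rounds to 99.9%.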

preprint, 2026, arXiv

Macroscopic dynamics of quadratic integrate-and-fire neurons subject to correlated noise

The presence of correlated noise, arising from a mixture of independent fluctuations and a common noisy input shared across the neural population, is a ubiquitous feature of neural circuits, yet its impact on collective network dynamics remains poorly understood. We analyze a network of quadratic integrate-and-fire neurons driven by Gaussian noise with a tunable degree of correlation. Using the cumulant expansion method, we derive a reduced set of effective mean-field equations that accurately describe the evolution of the population's mean firing rate and membrane potential. Our analysis reveals a counterintuitive phenomenon: increasing the noise correlation strength suppresses the mean network activity, an effect we term correlated-noise-inhibited spiking. Furthermore, within a specific parameter regime, the network exhibits metastability, manifesting itself as spontaneous, noise-driven transitions between distinct high- and low-activity states. These results provide a theoretical framework for reducing the dynamics of complex stochastic networks and demonstrate how correlated noise can fundamentally regulate macroscopic neural activity, with implications for understanding state transitions in biological systems.
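One common way to realize "a mixture of independent fluctuations and a common noisy input" with a tunable degree of correlation is to mix a shared Gaussian sample with a private one per neuron. The sketch below assumes that construction, with a correlation parameter `c` between 0 and 1. The abstract does not give the paper's exact noise model, and the names `correlated_noise` and `pearson` are illustrative only.

```python
import math
import random


def correlated_noise(n_neurons, n_steps, c, seed=0):
    """Unit-variance Gaussian noise per neuron with pairwise correlation c (0 <= c <= 1).

    Each sample is sqrt(c) * common + sqrt(1 - c) * private, so the variance is
    c + (1 - c) = 1 and the covariance between two different neurons is c.
    """
    rng = random.Random(seed)
    a, b = math.sqrt(c), math.sqrt(1.0 - c)
    samples = []
    for _ in range(n_steps):
        common = rng.gauss(0.0, 1.0)  # shared across the whole population
        samples.append([a * common + b * rng.gauss(0.0, 1.0) for _ in range(n_neurons)])
    return samples


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    noise = correlated_noise(n_neurons=2, n_steps=20000, c=0.5)
    first = [row[0] for row in noise]
    second = [row[1] for row in noise]
    print(pearson(first, second))  # should be close to the requested c
```

Setting `c = 0` gives fully independent noise and `c = 1` gives a single common input, which are the two limits the tunable correlation interpolates between.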

preprint, 2026, arXiv

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models

The Speaker Diarization and Recognition (SDR) task aims to predict "who spoke when and what" within an audio clip, which is a crucial task in various real-world multi-speaker scenarios such as meeting transcription and dialogue systems. Existing SDR systems typically adopt a cascaded framework, combining multiple modules such as speaker diarization (SD) and automatic speech recognition (ASR). The cascaded systems suffer from several limitations, such as error propagation, difficulty in handling overlapping speech, and lack of joint optimization for exploring the synergy between SD and ASR tasks. To address these limitations, we introduce SpeakerLM, a unified multimodal large language model for SDR that jointly performs SD and ASR in an end-to-end manner. Moreover, to facilitate diverse real-world scenarios, we incorporate a flexible speaker registration mechanism into SpeakerLM, enabling SDR under different speaker registration settings. SpeakerLM is progressively developed with a multi-stage training strategy on large-scale real data. Extensive experiments show that SpeakerLM demonstrates strong data scaling capability and generalizability, outperforming state-of-the-art cascaded baselines on both in-domain and out-of-domain public SDR benchmarks. Furthermore, experimental results show that the proposed speaker registration mechanism effectively ensures robust SDR performance of SpeakerLM across diverse speaker registration conditions and varying numbers of registered speakers.

preprint, 2026, arXiv

Thermally adaptive textile inspired by morpho butterfly for all-season comfort and visible aesthetics

A longstanding challenge in personal thermal management has been transitioning from static, appearance-limited passive daytime radiative cooling (PDRC) materials to systems that are both dynamically adaptive and visually versatile. The central hurdle remains the inherent compromise between color saturation and cooling power. Inspired by organisms such as butterflies, which decouple structural color from thermal function, we present a smart textile that seamlessly merges a dynamic thermochromic layer with static photonic crystals (PCs). This design enables the solar reflectance to switch autonomously, from approximately 0.6 in the colored state for heating to about 0.9 in the high-reflectance state for cooling. Outdoor experiments validated substantial temperature regulation: the fabric achieves a surface temperature reduction of 3-4 °C in summer and a heating difference of <1 °C in winter compared to commercial reference materials, all while maintaining high-saturation colors. This dual-mode operation offers a viable pathway for achieving adaptive, aesthetic, and energy-free thermal comfort.

preprint, 2026, arXiv

VoxScene: Anchor-Conditioned Voxel Diffusion for Indoor Scene Arrangement

We present VoxScene, a novel anchor-conditioned voxel diffusion framework tailored for 3D scene synthesis. Current data-driven layout generation techniques typically rely on bounding proxies or implicit representations, which overlook volumetric structures. This geometric blindness inevitably leads to severe physical collisions and structural entanglement, particularly in densely populated environments. To overcome these limitations, we shift the paradigm to an explicit, object-centric voxel representation. Our pipeline sequentially synthesizes discrete volumetric occupancies conditioned on prior anchors and local context. By exploiting the mutually exclusive nature of discrete voxels, our approach eliminates spatial ambiguities and guarantees collision-free arrangements, even in highly complex environments. Furthermore, the synthesized high-fidelity voxel grids serve as discriminative geometric queries for downstream asset retrieval. Extensive experiments demonstrate the universality of our method, achieving state-of-the-art physical plausibility and unlocking shape diversity compared to existing layout planners.

preprint, 2025, arXiv

Realization of an untrusted intermediate relay architecture using a quantum dot single-photon source

To fully exploit the potential of quantum technologies, quantum networks are needed to link different systems, significantly enhancing applications in computing, cryptography, and metrology. Central to these networks are quantum relays that can facilitate long-distance entanglement distribution and quantum communication. In this work, we present a modular and scalable quantum relay architecture using a high-quality single-photon source. The proposed network incorporates three untrusted intermediate nodes and is capable of a repetition rate of 304.52 MHz. We use a measurement-device-independent protocol to demonstrate secure key establishment over fibers covering up to 300 kilometers. This study highlights the potential of single-photon sources in quantum relays to enhance information transmission, expand network coverage, and improve deployment flexibility, with promising applications in future quantum networks.
