Source author record

Hui Lu

Hui Lu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language eess.AS Sound Artificial Intelligence eess.SP Human-Computer Interaction Multimedia Networking and Internet Architecture Programming Languages Symbolic Computation

Catalog footprint

What is connected

6works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

How Should LLMs Listen While Speaking? A Study of User-Stream Routing in Full-Duplex Spoken Dialogue

Full-duplex spoken dialogue requires a model to keep listening while generating its own spoken response. This is challenging for large language models (LLMs), which are designed to extend a single coherent sequence and do not naturally support user input arriving during generation. We argue that how the user stream is routed into the LLM is therefore a key architectural question for full-duplex modeling. To study this question, we extend a text-only LLM into a unified full-duplex spoken dialogue system and compare two routing strategies under a shared training pipeline: (i) channel fusion, which injects the user stream directly into the LLM input, and (ii) cross-attention routing, which keeps the user stream as external memory accessed through cross-attention adapters. Experiments on spoken question answering and full-duplex interaction benchmarks reveal a clear tradeoff. Channel fusion yields stronger semantic grounding and consistently better question-answering performance. However, under semantically overlapping conditions such as user interruptions, it is more vulnerable to context corruption: if the model fails to stop in time, the overlapping user stream can interfere with ongoing generation and lead to semantically incoherent continuations. Cross-attention routing underperforms on question answering, but better preserves the LLM generation context and is more robust to this failure mode. These results establish user-stream routing as a central design axis in full-duplex spoken dialogue and offer practical guidance on the tradeoff between semantic integration and context robustness. We provide a demo page for qualitative inspection.

preprint2023arXiv

Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs

CoT (Chain-of-Thought) is a way to solve reasoning problems for LLMs . Recently, many researches appear for improving the CoT capability of LLMs. In this work, we also proposed Olapa-MCoT, which is a LLMs based on llama2-13B PLM for finetuning and alignment learning. During the alignment training, we proposed the SimRRHF algorithm and Incorrect Data Relearning and mainly focused on optimizing the Chinese mathematical reasoning ability of Olapa-MCoT. The experiment achieved significant results, with the accuracy of Chinese mathematical reasoning up to 50%, 36% rise compared to llama2-13B. In addition, the accuracy of English reasoning ability also increased by nearly 4%.

preprint2022arXiv

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS

The generative adversarial network (GAN) has shown its outstanding capability in improving Non-Autoregressive TTS (NAR-TTS) by adversarially training it with an extra model that discriminates between the real and the generated speech. To maximize the benefits of GAN, it is crucial to find a powerful discriminator that can capture rich distinguishable information. In this paper, we propose a multi-scale time-frequency spectrogram discriminator to help NAR-TTS generate high-fidelity Mel-spectrograms. It treats the spectrogram as a 2D image to exploit the correlation among different components in the time-frequency domain. And a U-Net-based model structure is employed to discriminate at different scales to capture both coarse-grained and fine-grained information. We conduct subjective tests to evaluate the proposed approach. Both multi-scale and time-frequency discriminating bring significant improvement in the naturalness and fidelity. When combining the neural vocoder, it is shown more effective and concise than fine-tuning the vocoder. Finally, we visualize the discriminating maps to compare their difference to verify the effectiveness of multi-scale discriminating.

preprint2022arXiv

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

Dysarthric speech reconstruction (DSR), which aims to improve the quality of dysarthric speech, remains a challenge, not only because we need to restore the speech to be normal, but also must preserve the speaker's identity. The speaker representation extracted by the speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity. However, the SE may not be able to fully capture the characteristics of dysarthric speakers that are previously unseen. To address this research problem, we propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA). The primary task of ASA fine-tunes the SE with the speech of the target dysarthric speaker to effectively capture identity-related information, and the secondary task applies adversarial training to avoid the incorporation of abnormal speaking patterns into the reconstructed speech, by regularizing the distribution of reconstructed speech to be close to that of reference speech with high quality. Experiments show that the proposed approach can achieve enhanced speaker similarity and comparable speech naturalness with a strong baseline approach. Compared with dysarthric speech, the reconstructed speech achieves 22.3% and 31.5% absolute word error rate reduction for speakers with moderate and moderate-severe dysarthria respectively. Our demo page is released here: https://wendison.github.io/ASA-DSR-demo/

preprint2020arXiv

On Termination of Polynomial Programs with Equality Conditions

We investigate the termination problem of a family of multi-path polynomial programs (MPPs), in which all assignments to program variables are polynomials, and test conditions of loops and conditional statements are polynomial equalities. We show that the set of non-terminating inputs (NTI) of such a program is algorithmically computable, thus leading to the decidability of its termination. To the best of our knowledge, the considered family of MPPs is hitherto the largest one for which termination is decidable. We present an explicit recursive function which is essentially Ackermannian, to compute the maximal length of ascending chains of polynomial ideals under a control function, and thereby obtain a complete answer to the questions raised by Seidenberg. This maximal length facilitates a precise complexity analysis of our algorithms for computing the NTI and deciding termination of MPPs. We extend our method to programs with polynomial guarded commands and show how an incomplete procedure for MPPs with inequality guards can be obtained. An application of our techniques to invariant generation of polynomial programs is further presented.

preprint2015arXiv

Improving Routing Efficiency through Intermediate Target Based Geographic Routing

The greedy strategy of geographical routing may cause the local minimum problem when there is a hole in the routing area. It depends on other strategies such as perimeter routing to find a detour path, which can be long and result in inefficiency of the routing protocol. In this paper, we propose a new approach called Intermediate Target based Geographic Routing (ITGR) to solve the long detour path problem. The basic idea is to use previous experience to determine the destination areas that are shaded by the holes. The novelty of the approach is that a single forwarding path can be used to determine a shaded area that may cover many destination nodes. We design an efficient method for the source to find out whether a destination node belongs to a shaded area. The source then selects an intermediate node as the tentative target and greedily forwards packets to it, which in turn forwards the packet to the final destination by greedy routing. ITGR can combine multiple shaded areas to improve the efficiency of representation and routing. We perform simulations and demonstrate that ITGR significantly reduces the routing path length, compared with existing geographic routing protocols.