Source author record

Li Wan

Li Wan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


cond-mat.mes-hall, eess.AS, Machine Learning, Sound, Computer Vision, cond-mat.soft, cond-mat.str-el, math-ph, math.MP, physics.chem-ph, physics.comp-ph, physics.optics

Catalog footprint

What is connected

9 works

12 topics

4 close collaborators

preprint, 2025, arXiv

SLM-TTA: A Framework for Test-Time Adaptation of Generative Spoken Language Models

Spoken Language Models (SLMs) are increasingly central to modern speech-driven applications, but performance degrades under acoustic shift - real-world noise, reverberation, and microphone variation. Prior solutions rely on offline domain adaptation, which is post-hoc, data-intensive, and slow. We introduce the first test-time adaptation (TTA) framework for generative SLMs that process interleaved audio-text prompts. Our method updates a small, targeted subset of parameters during inference using only the incoming utterance, requiring no source data or labels. This stabilizes token distributions and improves robustness to acoustic variability without degrading core task accuracy. Evaluated on automatic speech recognition, speech translation, and 19 audio understanding tasks from AIR-Bench, our approach yields consistent gains under diverse corruptions. Because adaptation touches only a small fraction of weights, it is both compute- and memory-efficient, supporting deployment on resource-constrained platforms. This work enhances the robustness and adaptability of generative SLMs for real-world speech-driven applications.

preprint, 2022, arXiv

Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization

Training speaker-discriminative and robust speaker verification systems without speaker labels is still challenging and worthwhile to explore. In this study, we propose an effective self-supervised learning framework and a novel regularization strategy to facilitate self-supervised speaker representation learning. Different from contrastive learning-based self-supervised learning methods, the proposed self-supervised regularization (SSReg) focuses exclusively on the similarity between the latent representations of positive data pairs. We also explore the effectiveness of alternative online data augmentation strategies on both the time domain and frequency domain. With our strong online data augmentation strategy, the proposed SSReg shows the potential of self-supervised learning without using negative pairs and it can significantly improve the performance of self-supervised speaker representation learning with a simple Siamese network architecture. Comprehensive experiments on the VoxCeleb datasets demonstrate that our proposed self-supervised approach obtains a 23.4% relative improvement by adding the effective self-supervised regularization and outperforms other previous works.
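The abstract above says SSReg looks only at the similarity between latent representations of positive pairs, with no negative pairs involved. A minimal sketch of such a positive-pair objective is given below. The exact form of the paper's loss is not stated here, so the negative mean cosine similarity and the names `cosine_similarity` and `positive_pair_loss` are assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def positive_pair_loss(batch_a, batch_b):
    """Negative mean cosine similarity over positive pairs (assumed form).

    batch_a[i] and batch_b[i] are embeddings of two augmented views of the
    same utterance. No negative pairs enter the loss.
    """
    sims = [cosine_similarity(u, v) for u, v in zip(batch_a, batch_b)]
    return -sum(sims) / len(sims)


# Hypothetical embeddings: identical views give the minimum loss of -1.
loss_identical = positive_pair_loss([[1.0, 0.0]], [[1.0, 0.0]])
# Orthogonal views give a loss of 0.
loss_orthogonal = positive_pair_loss([[1.0, 0.0]], [[0.0, 1.0]])
```

Minimizing this quantity pulls the two views of each utterance together in the embedding space, which is why strong augmentation matters: it determines how different the two views are to begin with.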

preprint, 2022, arXiv

Speaker Diarization with LSTM

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications. However, mirroring the rise of deep learning in various domains, neural network based audio embeddings, also known as d-vectors, have consistently demonstrated superior speaker verification performance. In this paper, we build on the success of d-vector based speaker verification systems to develop a new d-vector based approach to speaker diarization. Specifically, we combine LSTM-based d-vector audio embeddings with recent work in non-parametric clustering to obtain a state-of-the-art speaker diarization system. Our system is evaluated on three standard public datasets, suggesting that d-vector based diarization systems offer significant advantages over traditional i-vector based systems. We achieved a 12.0% diarization error rate on NIST SRE 2000 CALLHOME, while our model is trained with out-of-domain data from voice search logs.

preprint, 2020, arXiv

Personal VAD: Speaker-Conditioned Voice Activity Detection

In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level. This system is useful for gating the inputs to a streaming on-device speech recognition system, such that it only triggers for the target user, which helps reduce the computational cost and battery consumption, especially in scenarios where a keyword detector is unpreferable. We achieve this by training a VAD-alike neural network that is conditioned on the target speaker embedding or the speaker verification score. For each frame, personal VAD outputs the probabilities for three classes: non-speech, target speaker speech, and non-target speaker speech. Under our optimal setup, we are able to train a model with only 130K parameters that outperforms a baseline system where individually trained standard VAD and speaker recognition networks are combined to perform the same task.

preprint, 2016, arXiv

Finite Size Effects of Thermal Conductivity for One-Dimensional Mesoscopic Systems

The finite-size effects of the thermal conductivity $κ$ have been studied in phonon space. Only a few phonon modes are found to take part in thermal transport when the system size $L$ is decreased, and the number of selected modes is proportional to $L$. As a result, $κ$ decreases as $L$ decreases. This mechanism for the size effect of $κ$ goes beyond phonon-boundary scattering. The exponent $α$ of the power law $κ\sim L^α$ has been fitted, showing that the exponent is not universal.
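Fitting the exponent $α$ of a power law $κ\sim L^α$ is commonly done by linear regression in log-log space, since $\ln κ = α \ln L + \ln c$. The sketch below shows that procedure. It is not the paper's fitting code, the name `fit_power_law` is hypothetical, and the data are synthetic values generated from a known exponent, not results from the paper.

```python
import math


def fit_power_law(lengths, kappas):
    """Least-squares fit of kappa = c * L**alpha in log-log space.

    Returns (alpha, c). All inputs must be positive.
    """
    xs = [math.log(length) for length in lengths]
    ys = [math.log(kappa) for kappa in kappas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                             # slope of the log-log line
    prefactor = math.exp(mean_y - alpha * mean_x)  # exp of the intercept
    return alpha, prefactor


# Synthetic data with a known exponent of 0.4 and prefactor of 2.0.
lengths = [10.0, 20.0, 40.0, 80.0]
kappas = [2.0 * length ** 0.4 for length in lengths]
alpha, prefactor = fit_power_law(lengths, kappas)
```

On noiseless synthetic data the fit recovers the exponent used to generate it. With real data, the fitted value would depend on the range of $L$ included, which is one way a non-universal exponent can show up.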

preprint, 2016, arXiv

Iso-electric point of fluid

The iso-electric point (IEP) is the pH at which the $ζ$ potential is measured to be zero. The occurrence of the IEP has been attributed to the neutralization of the surface charge density (SCD) at the solid-liquid interface. In this work, we use the potential trap model to study the sources of the surface charge density at various PC and pH, taking the water-silica system as an example. It is revealed that for $\mathrm{pH}<8$ the SCD originates mainly from the dissociation of water molecules, while the bulk ions trapped at the interface can dominate the SCD when $\mathrm{pH}>9$. Due to the mass action law, the dissociation of water molecules is suppressed at pH close to the IEP, leading to a zero surface charge density. In this way, zero $ζ$ potential is obtained at the IEP. Increasing the salt concentration in the water is also found to decrease the $ζ$ potential but increase the surface charge density.

preprint, 2014, arXiv

End-to-End Integration of a Convolutional Network, Deformable Parts Model and Non-Maximum Suppression

Deformable Parts Models and Convolutional Networks each have achieved notable performance in object detection. Yet these two approaches find their strengths in complementary areas: DPMs are well-versed in object composition, modeling fine-grained spatial relationships between parts; likewise, ConvNets are adept at producing powerful image features, having been discriminatively trained directly on the pixels. In this paper, we propose a new model that combines these two approaches, obtaining the advantages of each. We train this model using a new structured loss function that considers all bounding boxes within an image, rather than isolated object instances. This enables the non-maximal suppression (NMS) operation, previously treated as a separate post-processing stage, to be integrated into the model. This allows for discriminative training of our combined Convnet + DPM + NMS model in end-to-end fashion. We evaluate our system on PASCAL VOC 2007 and 2011 datasets, achieving competitive results on both benchmarks.
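For context on the NMS step mentioned in the abstract above, here is a minimal sketch of the conventional greedy NMS that is normally run as a separate post-processing stage. It is the standard baseline procedure, not the paper's integrated, trainable version. The function names, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is dropped if it overlaps an already kept box by more than
    iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical detections: the second box overlaps the first heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

Because each keep-or-drop decision depends on the other boxes in the image, a loss defined over all bounding boxes jointly is what makes it possible to train through this step, as the abstract describes.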

preprint, 2011, arXiv

A new method to find full complex roots of a complex dispersion equation for light propagation

A new numerical method is presented to find the full complex roots of a complex dispersion equation. As an application, the complex dispersion equation of a cylindrical metallic nanowire is investigated. Using this method, the locus of the Brewster angle, the complex dispersion curves of Surface Plasmon Polaritons (SPPs) and the complex bulk modes can be obtained in a single calculation. An approximate analytical solution to the complex dispersion equation has also been derived to verify our method.
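The abstract does not describe the new method itself, so it is not reproduced here. For orientation only, the sketch below shows the textbook alternative: Newton's iteration in the complex plane, which finds one root per starting guess. The function `newton_complex` and the test equation $z^2 + 1 = 0$ are illustrative assumptions and stand in for an actual dispersion equation.

```python
def newton_complex(f, df, z0, tol=1e-12, max_iter=100):
    """Newton's method for a complex root of f, starting from z0.

    f and df are the function and its derivative, both taking and
    returning complex numbers.
    """
    z = complex(z0)
    for _ in range(max_iter):
        slope = df(z)
        if slope == 0:
            raise ZeroDivisionError("derivative vanished at %r" % (z,))
        step = f(z) / slope
        z -= step
        if abs(step) < tol:
            return z
    raise RuntimeError("Newton iteration did not converge")


# Stand-in equation z**2 + 1 = 0 with roots +i and -i.
root = newton_complex(lambda z: z * z + 1, lambda z: 2 * z, 0.5 + 0.5j)
```

A single Newton run returns only the root whose basin contains the starting point, so mapping out all branches this way requires many starting guesses. That limitation is the kind of cost a method returning the full set of complex roots in a single calculation would avoid.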

preprint, 2011, arXiv

Analytical solutions to zeroth-order dispersion relations of a cylindrical metallic nanowire

Zeroth-order complex dispersion relations of a cylindrical metallic nanowire have been solved analytically with approximate methods. The analytical solutions are valid for the sections of the dispersion relations whose frequencies are close to the Surface Plasmon frequency. The back bending of the Surface Plasmon-Polaritons (SPPs) is well described by the analytical solutions, confirming that the back bending originates from the metal Ohmic loss. The use of the back bending point in the dispersion relation to measure the metallic Ohmic loss is also suggested.
