Source author record

Xiao-Hui Li

Xiao-Hui Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning Computation and Language Computer Vision eess.AS math.FA Sound

Catalog footprint

What is connected

4works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Minimizing Modality Gap from the Input Side: Your Speech LLM Can Be a Prosody-Aware Text LLM

Speech large language models (SLMs) are typically built from text large language model (TLM) checkpoints, yet they still suffer from a substantial modality gap. Prior work has mainly attempted to reduce this gap from the output side by making speech generation more text-like, but the gap remains. We argue that the key remaining bottleneck lies on the input side. We propose TextPro-SLM, an SLM that makes spoken input more closely resemble that of a prosody-aware text LLM. TextPro-SLM combines WhisperPro, a unified speech encoder that produces synchronized text tokens and prosody embeddings, with an LLM backbone trained to preserve the semantic capabilities of the original TLM while learning paralinguistic understanding. Experiments show that TextPro-SLM achieves the lowest modality gap among leading SLMs at both 3B and 7B scales, while also delivering strong overall performance on paralinguistic understanding tasks. These gains are achieved with only roughly 1,000 hours of LLM training audio, suggesting that reducing the modality gap from the input side is both effective and data-efficient.

preprint2022arXiv

Example Perplexity

Some examples are easier for humans to classify than others. The same should be true for deep neural networks (DNNs). We use the term example perplexity to refer to the level of difficulty of classifying an example. In this paper, we propose a method to measure the perplexity of an example and investigate what factors contribute to high example perplexity. The related codes and resources are available at https://github.com/vaynexie/Example-Perplexity.

preprint2020arXiv

Quantitative Evaluations on Saliency Methods: An Experimental Study

It has been long debated that eXplainable AI (XAI) is an important topic, but it lacks rigorous definition and fair metrics. In this paper, we briefly summarize the status quo of the metrics, along with an exhaustive experimental study based on them, including faithfulness, localization, false-positives, sensitivity check, and stability. With the experimental results, we conclude that among all the methods we compare, no single explanation method dominates others in all metrics. Nonetheless, Gradient-weighted Class Activation Mapping (Grad-CAM) and Randomly Input Sampling for Explanation (RISE) perform fairly well in most of the metrics. Utilizing a set of filtered metrics, we further present a case study to diagnose the classification bases for models. While providing a comprehensive experimental study of metrics, we also examine measuring factors that are missed in current metrics and hope this valuable work could serve as a guide for future research.

preprint2016arXiv

Energy and Laplacian of Fractal Interpolation Functions

In this paper, we first characterize the finiteness of fractal interpolation functions (FIFs) on post critical finite self-similar sets. Then we study the Laplacian of FIFs with uniform vertical scaling factors on Sierpinski gasket (SG). As an application, we prove that the solution of the following Dirichlet problem on SG is an FIF with uniform vertical scaling factor $\frac{1}{5}$: $Δu=0$ on $SG\setminus \{q_1,q_2,q_3\}$, and $u(q_i)=a_i$, $i=1,2,3$, where $q_i$, $i=1,2,3$, are boundary points of SG.