Researcher profile

Yunpeng Li

Yunpeng Li contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
13works
0followers
8topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

13 published item(s)

preprint2026arXiv

DentalX: Context-Aware Dental Disease Detection with Radiographs

Diagnosing dental diseases from radiographs is time-consuming and challenging due to the subtle nature of diagnostic evidence. Existing methods, which rely on object detection models designed for natural images with more distinct target patterns, struggle to detect dental diseases that present with far less visual support. To address this challenge, we propose {\bf DentalX}, a novel context-aware dental disease detection approach that leverages oral structure information to mitigate the visual ambiguity inherent in radiographs. Specifically, we introduce a structural context extraction module that learns an auxiliary task: semantic segmentation of dental anatomy. The module extracts meaningful structural context and integrates it into the primary disease detection task to enhance the detection of subtle dental diseases. Extensive experiments on a dedicated benchmark demonstrate that DentalX significantly outperforms prior methods in both tasks. This mutual benefit arises naturally during model optimization, as the correlation between the two tasks is effectively captured. Our code is available at https://github.com/zhiqin1998/DentYOLOX.

preprint2026arXiv

S^2tory: Story Spine Distillation for Movie Script Summarization

Movie scripts pose a fundamental challenge for automatic summarization due to their non-linear, cross-cut narrative structure, which makes surface-level saliency methods ineffective at preserving core story progression. To address this, we introduce S^2tory (Story Spine Distillation), a narratology-grounded framework that leverages character development trajectories to identify plot nuclei, the essential events that drive the narrative forward, while filtering out peripheral satellite events that merely enrich atmosphere or emotion. Our Narrative Expert Agent (NEAgent) performs theory-constrained reasoning, whose distilled knowledge conditions a small model to identify plot nuclei. Another model then uses these plot nuclei to generate the summary. Experiments on the MovieSum dataset demonstrate state-of-the-art semantic fidelity at approximately 3.5x compression, and zero-shot evaluation on BookSum confirms strong out-of-domain generalization. Human evaluation further validates that narratological theory provides an indispensable foundation for modeling complex, non-linear narratives.

preprint2024arXiv

StreamVC: Real-Time Low-Latency Voice Conversion

We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech. Unlike previous approaches, StreamVC produces the resulting waveform at low latency from the input signal even on a mobile platform, making it applicable to real-time communication scenarios like calls and video conferencing, and addressing use cases such as voice anonymization in these scenarios. Our design leverages the architecture and training strategy of the SoundStream neural audio codec for lightweight high-quality speech synthesis. We demonstrate the feasibility of learning soft speech units causally, as well as the effectiveness of supplying whitened fundamental frequency information to improve pitch stability without leaking the source timbre information.

preprint2022arXiv

Augmented Sliced Wasserstein Distances

While theoretically appealing, the application of the Wasserstein distance to large-scale machine learning problems has been hampered by its prohibitive computational cost. The sliced Wasserstein distance and its variants improve the computational efficiency through the random projection, yet they suffer from low accuracy if the number of projections is not sufficiently large, because the majority of projections result in trivially small values. In this work, we propose a new family of distance metrics, called augmented sliced Wasserstein distances (ASWDs), constructed by first mapping samples to higher-dimensional hypersurfaces parameterized by neural networks. It is derived from a key observation that (random) linear projections of samples residing on these hypersurfaces would translate to much more flexible nonlinear projections in the original sample space, so they can capture complex structures of the data distribution. We show that the hypersurfaces can be optimized by gradient ascent efficiently. We provide the condition under which the ASWD is a valid metric and show that this can be obtained by an injective neural network architecture. Numerical results demonstrate that the ASWD significantly outperforms other Wasserstein variants for both synthetic and real-world problems.

preprint2022arXiv

Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection

Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications. In this paper we propose an uncertainty quantification approach by modelling the distribution of features. We further incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble stochastic neural networks (BE-SNNs) and overcome the feature collapse problem. We compare the performance of the proposed BE-SNNs with the other state-of-the-art approaches and show that BE-SNNs yield superior performance on several OOD benchmarks, such as the Two-Moons dataset, the FashionMNIST vs MNIST dataset, FashionMNIST vs NotMNIST dataset, and the CIFAR10 vs SVHN dataset.

preprint2022arXiv

Boosting Independent Component Analysis

Independent component analysis is intended to recover the mutually independent components from their linear mixtures. This technique has been widely used in many fields, such as data analysis, signal processing, and machine learning. To alleviate the dependency on prior knowledge concerning unknown sources, many nonparametric methods have been proposed. In this paper, we present a novel boosting-based algorithm for independent component analysis. Our algorithm consists of maximizing likelihood estimation via boosting and seeking unmixing matrix by the fixed-point method. A variety of experiments validate its performance compared with many of the presently known algorithms.

preprint2022arXiv

CLSEG: Contrastive Learning of Story Ending Generation

Story Ending Generation (SEG) is a challenging task in natural language generation. Recently, methods based on Pre-trained Language Models (PLM) have achieved great prosperity, which can produce fluent and coherent story endings. However, the pre-training objective of PLM-based methods is unable to model the consistency between story context and ending. The goal of this paper is to adopt contrastive learning to generate endings more consistent with story context, while there are two main challenges in contrastive learning of SEG. First is the negative sampling of wrong endings inconsistent with story contexts. The second challenge is the adaptation of contrastive learning for SEG. To address these two issues, we propose a novel Contrastive Learning framework for Story Ending Generation (CLSEG), which has two steps: multi-aspect sampling and story-specific contrastive learning. Particularly, for the first issue, we utilize novel multi-aspect sampling mechanisms to obtain wrong endings considering the consistency of order, causality, and sentiment. To solve the second issue, we well-design a story-specific contrastive training strategy that is adapted for SEG. Experiments show that CLSEG outperforms baselines and can produce story endings with stronger consistency and rationality.

preprint2022arXiv

Conditional Measurement Density Estimation in Sequential Monte Carlo via Normalizing Flow

Tuning of measurement models is challenging in real-world applications of sequential Monte Carlo methods. Recent advances in differentiable particle filters have led to various efforts to learn measurement models through neural networks. But existing approaches in the differentiable particle filter framework do not admit valid probability densities in constructing measurement models, leading to incorrect quantification of the measurement uncertainty given state information. We propose to learn expressive and valid probability densities in measurement models through conditional normalizing flows, to capture the complex likelihood of measurements given states. We show that the proposed approach leads to improved estimation performance and faster training convergence in a visual tracking experiment.

preprint2022arXiv

Imitation Learning with Sinkhorn Distances

Imitation learning algorithms have been interpreted as variants of divergence minimization problems. The ability to compare occupancy measures between experts and learners is crucial in their effectiveness in learning from demonstrations. In this paper, we present tractable solutions by formulating imitation learning as minimization of the Sinkhorn distance between occupancy measures. The formulation combines the valuable properties of optimal transport metrics in comparing non-overlapping distributions with a cosine distance cost defined in an adversarially learned feature space. This leads to a highly discriminative critic network and optimal transport plan that subsequently guide imitation learning. We evaluate the proposed approach using both the reward metric and the Sinkhorn distance metric on a number of MuJoCo experiments. For the implementation and reproducing results please refer to the following repository https://github.com/gpapagiannis/sinkhorn-imitation.

preprint2022arXiv

Learning to Denoise Historical Music

We propose an audio-to-audio neural network model that learns to denoise old music recordings. Our model internally converts its input into a time-frequency representation by means of a short-time Fourier transform (STFT), and processes the resulting complex spectrogram using a convolutional neural network. The network is trained with both reconstruction and adversarial objectives on a synthetic noisy music dataset, which is created by mixing clean music with real noise samples extracted from quiet segments of old recordings. We evaluate our method quantitatively on held-out test examples of the synthetic dataset, and qualitatively by human rating on samples of actual historical recordings. Our results show that the proposed method is effective in removing noise, while preserving the quality and details of the original music.

preprint2022arXiv

Particle Flow Gaussian Particle Filter

State estimation in non-linear models is performed by tracking the posterior distribution recursively. A plethora of algorithms have been proposed for this task. Among them, the Gaussian particle filter uses a weighted set of particles to construct a Gaussian approximation to the posterior. In this paper, we propose to use invertible particle flow methods, derived under the Gaussian boundary conditions for a flow equation, to generate a proposal distribution close to the posterior. The resultant particle flow Gaussian particle filter (PFGPF) algorithm retains the asymptotic properties of Gaussian particle filters, with the potential for improved state estimation performance in high dimensional spaces. We compare the performance of PFGPF with the particle flow filters and particle flow particle filters in two challenging numerical simulation examples.

preprint2021arXiv

Real-time Speech Frequency Bandwidth Extension

In this paper we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling frequency from 8kHz to 16kHz while restoring the high frequency content to a level almost indistinguishable from the 16kHz ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant of SEANet that can be deployed on-device in streaming mode, achieving an architectural latency of 16ms. When profiled on a single core of a mobile CPU, processing one 16ms frame takes only 1.5ms. The low latency makes it viable for bi-directional voice communication systems.

preprint2021arXiv

Second-order Approximation of Minimum Discrimination Information in Independent Component Analysis

Independent Component Analysis (ICA) is intended to recover the mutually independent sources from their linear mixtures, and F astICA is one of the most successful ICA algorithms. Although it seems reasonable to improve the performance of F astICA by introducing more nonlinear functions to the negentropy estimation, the original fixed-point method (approximate Newton method) in F astICA degenerates under this circumstance. To alleviate this problem, we propose a novel method based on the second-order approximation of minimum discrimination information (MDI). The joint maximization in our method is consisted of minimizing single weighted least squares and seeking unmixing matrix by the fixed-point method. Experimental results validate its efficiency compared with other popular ICA algorithms.