Source author record

Yunpeng Li

Yunpeng Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning, Artificial Intelligence, eess.AS, eess.SP, Computation and Language, Computer Vision, Data Structures and Algorithms, Discrete Mathematics, Robotics, Sound, Computer Science and Game Theory, Digital Libraries, math.MG

Catalog footprint

What is connected

18 works

13 topics

4 close collaborators

preprint 2026 arXiv

DentalX: Context-Aware Dental Disease Detection with Radiographs

Diagnosing dental diseases from radiographs is time-consuming and challenging due to the subtle nature of diagnostic evidence. Existing methods, which rely on object detection models designed for natural images with more distinct target patterns, struggle to detect dental diseases that present with far less visual support. To address this challenge, we propose DentalX, a novel context-aware dental disease detection approach that leverages oral structure information to mitigate the visual ambiguity inherent in radiographs. Specifically, we introduce a structural context extraction module that learns an auxiliary task: semantic segmentation of dental anatomy. The module extracts meaningful structural context and integrates it into the primary disease detection task to enhance the detection of subtle dental diseases. Extensive experiments on a dedicated benchmark demonstrate that DentalX significantly outperforms prior methods in both tasks. This mutual benefit arises naturally during model optimization, as the correlation between the two tasks is effectively captured. Our code is available at https://github.com/zhiqin1998/DentYOLOX.

preprint 2026 arXiv

S^2tory: Story Spine Distillation for Movie Script Summarization

Movie scripts pose a fundamental challenge for automatic summarization due to their non-linear, cross-cut narrative structure, which makes surface-level saliency methods ineffective at preserving core story progression. To address this, we introduce S^2tory (Story Spine Distillation), a narratology-grounded framework that leverages character development trajectories to identify plot nuclei, the essential events that drive the narrative forward, while filtering out peripheral satellite events that merely enrich atmosphere or emotion. Our Narrative Expert Agent (NEAgent) performs theory-constrained reasoning, whose distilled knowledge conditions a small model to identify plot nuclei. Another model then uses these plot nuclei to generate the summary. Experiments on the MovieSum dataset demonstrate state-of-the-art semantic fidelity at approximately 3.5x compression, and zero-shot evaluation on BookSum confirms strong out-of-domain generalization. Human evaluation further validates that narratological theory provides an indispensable foundation for modeling complex, non-linear narratives.

preprint 2024 arXiv

StreamVC: Real-Time Low-Latency Voice Conversion

We present StreamVC, a streaming voice conversion solution that preserves the content and prosody of any source speech while matching the voice timbre from any target speech. Unlike previous approaches, StreamVC produces the resulting waveform at low latency from the input signal even on a mobile platform, making it applicable to real-time communication scenarios like calls and video conferencing, and addressing use cases such as voice anonymization in these scenarios. Our design leverages the architecture and training strategy of the SoundStream neural audio codec for lightweight high-quality speech synthesis. We demonstrate the feasibility of learning soft speech units causally, as well as the effectiveness of supplying whitened fundamental frequency information to improve pitch stability without leaking the source timbre information.

preprint 2022 arXiv

Augmented Sliced Wasserstein Distances

While theoretically appealing, the application of the Wasserstein distance to large-scale machine learning problems has been hampered by its prohibitive computational cost. The sliced Wasserstein distance and its variants improve the computational efficiency through random projections, yet they suffer from low accuracy if the number of projections is not sufficiently large, because the majority of projections result in trivially small values. In this work, we propose a new family of distance metrics, called augmented sliced Wasserstein distances (ASWDs), constructed by first mapping samples to higher-dimensional hypersurfaces parameterized by neural networks. It is derived from a key observation that (random) linear projections of samples residing on these hypersurfaces would translate to much more flexible nonlinear projections in the original sample space, so they can capture complex structures of the data distribution. We show that the hypersurfaces can be optimized by gradient ascent efficiently. We provide the condition under which the ASWD is a valid metric and show that this can be obtained by an injective neural network architecture. Numerical results demonstrate that the ASWD significantly outperforms other Wasserstein variants for both synthetic and real-world problems.
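For orientation, the plain sliced Wasserstein distance that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the random-projection baseline only, not the ASWD itself: the function name `sliced_wasserstein`, the equal-sample-size restriction, and the choice of the 1-Wasserstein cost are assumptions made for the example.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=200, seed=0):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance.

    xs and ys are equal-sized lists of points (tuples of floats).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random direction and normalize it onto the unit sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        # Project both samples onto the direction and sort the projections.
        px = sorted(sum(a * d for a, d in zip(p, direction)) for p in xs)
        py = sorted(sum(a * d for a, d in zip(p, direction)) for p in ys)
        # In one dimension, optimal transport pairs the sorted samples.
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / n_projections
```

Each projection costs only a sort, which is where the efficiency comes from; the accuracy concern raised in the abstract corresponds to needing a large `n_projections` before the average becomes informative.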

preprint 2022 arXiv

Batch-Ensemble Stochastic Neural Networks for Out-of-Distribution Detection

Out-of-distribution (OOD) detection has recently received much attention from the machine learning community due to its importance in deploying machine learning models in real-world applications. In this paper we propose an uncertainty quantification approach by modelling the distribution of features. We further incorporate an efficient ensemble mechanism, namely batch-ensemble, to construct the batch-ensemble stochastic neural networks (BE-SNNs) and overcome the feature collapse problem. We compare the performance of the proposed BE-SNNs with the other state-of-the-art approaches and show that BE-SNNs yield superior performance on several OOD benchmarks, such as the Two-Moons dataset, the FashionMNIST vs MNIST dataset, FashionMNIST vs NotMNIST dataset, and the CIFAR10 vs SVHN dataset.

preprint 2022 arXiv

Boosting Independent Component Analysis

Independent component analysis is intended to recover the mutually independent components from their linear mixtures. This technique has been widely used in many fields, such as data analysis, signal processing, and machine learning. To alleviate the dependency on prior knowledge concerning unknown sources, many nonparametric methods have been proposed. In this paper, we present a novel boosting-based algorithm for independent component analysis. Our algorithm consists of maximum likelihood estimation via boosting and seeking the unmixing matrix by the fixed-point method. A variety of experiments validate its performance compared with many of the presently known algorithms.

preprint 2022 arXiv

CLSEG: Contrastive Learning of Story Ending Generation

Story Ending Generation (SEG) is a challenging task in natural language generation. Recently, methods based on Pre-trained Language Models (PLMs) have achieved great success and can produce fluent and coherent story endings. However, the pre-training objective of PLM-based methods is unable to model the consistency between story context and ending. The goal of this paper is to adopt contrastive learning to generate endings more consistent with the story context, which poses two main challenges. The first is the negative sampling of wrong endings inconsistent with story contexts. The second is the adaptation of contrastive learning to SEG. To address these two issues, we propose a novel Contrastive Learning framework for Story Ending Generation (CLSEG), which has two steps: multi-aspect sampling and story-specific contrastive learning. In particular, for the first issue, we utilize novel multi-aspect sampling mechanisms to obtain wrong endings considering the consistency of order, causality, and sentiment. To solve the second issue, we design a story-specific contrastive training strategy adapted to SEG. Experiments show that CLSEG outperforms baselines and can produce story endings with stronger consistency and rationality.

preprint 2022 arXiv

Conditional Measurement Density Estimation in Sequential Monte Carlo via Normalizing Flow

Tuning of measurement models is challenging in real-world applications of sequential Monte Carlo methods. Recent advances in differentiable particle filters have led to various efforts to learn measurement models through neural networks. But existing approaches in the differentiable particle filter framework do not admit valid probability densities in constructing measurement models, leading to incorrect quantification of the measurement uncertainty given state information. We propose to learn expressive and valid probability densities in measurement models through conditional normalizing flows, to capture the complex likelihood of measurements given states. We show that the proposed approach leads to improved estimation performance and faster training convergence in a visual tracking experiment.
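What "valid probability density" means here can be illustrated with the simplest possible conditional flow: a single affine layer whose shift and scale depend on the state. The sketch below is an assumption-laden toy, not the architecture used in the paper; `conditional_affine_log_density`, `shift_fn` and `log_scale_fn` are hypothetical names, and both the measurement and the state are scalars.

```python
import math


def conditional_affine_log_density(y, x, shift_fn, log_scale_fn):
    """log p(y | x) for the flow y = shift_fn(x) + exp(log_scale_fn(x)) * z, z ~ N(0, 1)."""
    log_scale = log_scale_fn(x)
    # Invert the flow to recover the base-noise value that produced y.
    z = (y - shift_fn(x)) / math.exp(log_scale)
    base_log_prob = -0.5 * (z * z + math.log(2.0 * math.pi))
    # Change of variables: subtract log |dy/dz|, which equals log_scale here.
    return base_log_prob - log_scale
```

Because the log-determinant term is included, the resulting density integrates to one over the measurement for every state, which is the property an unnormalized learned score lacks.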

preprint 2022 arXiv

Imitation Learning with Sinkhorn Distances

Imitation learning algorithms have been interpreted as variants of divergence minimization problems. The ability to compare occupancy measures between experts and learners is crucial to their effectiveness in learning from demonstrations. In this paper, we present tractable solutions by formulating imitation learning as minimization of the Sinkhorn distance between occupancy measures. The formulation combines the valuable properties of optimal transport metrics in comparing non-overlapping distributions with a cosine distance cost defined in an adversarially learned feature space. This leads to a highly discriminative critic network and optimal transport plan that subsequently guide imitation learning. We evaluate the proposed approach using both the reward metric and the Sinkhorn distance metric on a number of MuJoCo experiments. For the implementation and to reproduce the results, please refer to the following repository: https://github.com/gpapagiannis/sinkhorn-imitation.
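The Sinkhorn distance itself can be sketched on two small histograms. This is a generic illustration of entropy-regularized optimal transport, not the imitation learning method: the cost matrix is supplied directly rather than computed as a cosine distance in a learned feature space, and the function name `sinkhorn` and the defaults `reg=0.5` and `n_iters=500` are assumptions for the example.

```python
import math


def sinkhorn(a, b, cost, reg=0.5, n_iters=500):
    """Entropy-regularized optimal transport between histograms a and b.

    Returns (transport_plan, transport_cost).
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / reg).
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total
```

Smaller values of `reg` bring the result closer to unregularized optimal transport but slow the convergence of the scaling iterations.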

preprint 2022 arXiv

Learning to Denoise Historical Music

We propose an audio-to-audio neural network model that learns to denoise old music recordings. Our model internally converts its input into a time-frequency representation by means of a short-time Fourier transform (STFT), and processes the resulting complex spectrogram using a convolutional neural network. The network is trained with both reconstruction and adversarial objectives on a synthetic noisy music dataset, which is created by mixing clean music with real noise samples extracted from quiet segments of old recordings. We evaluate our method quantitatively on held-out test examples of the synthetic dataset, and qualitatively by human rating on samples of actual historical recordings. Our results show that the proposed method is effective in removing noise, while preserving the quality and details of the original music.

preprint 2022 arXiv

Particle Flow Gaussian Particle Filter

State estimation in non-linear models is performed by tracking the posterior distribution recursively. A plethora of algorithms have been proposed for this task. Among them, the Gaussian particle filter uses a weighted set of particles to construct a Gaussian approximation to the posterior. In this paper, we propose to use invertible particle flow methods, derived under the Gaussian boundary conditions for a flow equation, to generate a proposal distribution close to the posterior. The resultant particle flow Gaussian particle filter (PFGPF) algorithm retains the asymptotic properties of Gaussian particle filters, with the potential for improved state estimation performance in high dimensional spaces. We compare the performance of PFGPF with the particle flow filters and particle flow particle filters in two challenging numerical simulation examples.
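The Gaussian approximation step mentioned above amounts to moment matching over the weighted particles. The sketch below shows that step alone for a scalar state; it does not include the particle flow proposal, and the function name `gaussian_from_particles` and the use of log-weights are assumptions for the example.

```python
import math


def gaussian_from_particles(particles, log_weights):
    """Weighted mean and variance of scalar particles (moment matching)."""
    # Subtract the maximum log-weight before exponentiating, for stability.
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    weights = [w / total for w in weights]
    mean = sum(w * x for w, x in zip(weights, particles))
    variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, particles))
    return mean, variance
```

In a Gaussian particle filter, the returned mean and variance define the Gaussian that stands in for the posterior at the current time step.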

preprint 2021 arXiv

Real-time Speech Frequency Bandwidth Extension

In this paper we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling frequency from 8kHz to 16kHz while restoring the high frequency content to a level almost indistinguishable from the 16kHz ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant of SEANet that can be deployed on-device in streaming mode, achieving an architectural latency of 16ms. When profiled on a single core of a mobile CPU, processing one 16ms frame takes only 1.5ms. The low latency makes it viable for bi-directional voice communication systems.
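The latency figures quoted above can be turned into a real-time factor and per-frame sample counts with a few lines of arithmetic. The constants are taken from the abstract; the helper name `samples_per_frame` is an assumption for the example.

```python
FRAME_MS = 16.0         # frame duration reported in the abstract
COMPUTE_MS = 1.5        # per-frame processing time on one mobile CPU core
INPUT_RATE_HZ = 8_000   # narrowband input sampling rate
OUTPUT_RATE_HZ = 16_000 # wideband output sampling rate


def samples_per_frame(rate_hz, frame_ms):
    """Number of audio samples covered by one frame at the given rate."""
    return round(rate_hz * frame_ms / 1000.0)


# Fraction of each frame's duration spent on computation.
real_time_factor = COMPUTE_MS / FRAME_MS
```

A real-time factor well below 1 means each frame is processed long before the next one arrives, which is what leaves headroom for streaming use.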

preprint 2021 arXiv

Second-order Approximation of Minimum Discrimination Information in Independent Component Analysis

Independent Component Analysis (ICA) is intended to recover the mutually independent sources from their linear mixtures, and FastICA is one of the most successful ICA algorithms. Although it seems reasonable to improve the performance of FastICA by introducing more nonlinear functions to the negentropy estimation, the original fixed-point method (approximate Newton method) in FastICA degenerates under this circumstance. To alleviate this problem, we propose a novel method based on the second-order approximation of minimum discrimination information (MDI). The joint maximization in our method consists of minimizing a single weighted least squares objective and seeking the unmixing matrix by the fixed-point method. Experimental results validate its efficiency compared with other popular ICA algorithms.

preprint 2016 arXiv

Dynamic routing for social information sharing

Today mobile users are intensively interconnected thanks to emerging mobile social networks, where they share location-based information with each other when traveling on different routes and visiting different areas of the city. In our model the information collected is aggregated over all users' trips and made publicly available as a public good. Due to information overlap, the total useful content amount increases with the diversity in path choices made by the users, and it is crucial to motivate selfish users to choose different paths despite the potentially higher costs associated with their trips. In this paper we combine the benefits from social information sharing with the fundamental routing problem where a unit mass of non-atomic selfish users decide their trips in a non-cooperative game by choosing between a high-cost and a low-cost path. To remedy the inefficient low-content equilibrium where all users choose to explore a single path (the low-cost path), we propose and analyse two new incentive mechanisms that can be used by the social network application, one based on side payments and the other on restricting access to content for users that choose the low-cost path. We also obtain interesting price of anarchy results that show some fundamental tradeoffs between achieving path diversity and maintaining greater user participation, motivating a combined mechanism to further increase the social welfare. Our model extends classical dynamic routing to the case of externalities caused by traffic on different paths of the network.

preprint 2015 arXiv

A New Exact Algorithm for Traveling Salesman Problem with Time Complexity Interval (O(n^4), O(n^3*2^n))

The traveling salesman problem is an NP-hard problem. Until now, researchers have not found a polynomial-time algorithm for it. Among the existing algorithms, the dynamic programming algorithm can solve the problem in time O(n^2*2^n), where n is the number of nodes in the graph. The branch-and-cut algorithm has been applied to solve instances with a large number of nodes. However, the branch-and-cut algorithm also has an exponential worst-case running time. In this paper, a new exact algorithm for the traveling salesman problem is proposed. The algorithm can be used to solve an arbitrary real-life instance of the traveling salesman problem, and the time complexity interval of the algorithm is (O(n^4), O(n^3*2^n)). This means that for some instances the algorithm can find the optimal solution in polynomial time, although it also has an exponential worst-case running time. In other words, the algorithm tells us that not all instances of the traveling salesman problem need exponential time to compute the optimal solution. The algorithm of this paper can not only help us solve the traveling salesman problem better, but also deepen our understanding of the relationship between NP-complete and P. Therefore, it is worth considering in further research on the traveling salesman problem and NP-hard problems.
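The O(n^2*2^n) dynamic programming baseline cited in this abstract is the classical Held-Karp recurrence, sketched below. This is the well-known baseline only, not the new algorithm the paper proposes; the function name `held_karp` and the distance-matrix input format are assumptions for the example.

```python
from itertools import combinations


def held_karp(dist):
    """Length of the shortest tour that starts and ends at node 0."""
    n = len(dist)
    if n == 1:
        return 0
    # best[(mask, j)]: shortest path from node 0 through the nodes in mask, ending at j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask & ~(1 << j)
                best[(mask, j)] = min(
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )
    full = (1 << n) - 2  # bitmask of every node except the start
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))
```

The table has on the order of n*2^n entries and each is filled in O(n) time, which gives the stated bound.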

preprint 2015 arXiv

A New Single-Source Shortest Path Algorithm for Nonnegative Weight Graph

The single-source shortest path problem is a classical problem in the research field of graph algorithms. In this paper, a new single-source shortest path algorithm for nonnegative weight graphs is proposed. The algorithm can compress multi-round Fibonacci heap operations into one round to save running time relative to Dijkstra's algorithm using a Fibonacci heap. The time complexity of the algorithm is also O(m + n log n) in the worst case, where m is the number of edges and n is the number of nodes. However, the bound can be linear in some cases, for example, when the edge weights of a graph are all the same and the hop count of the longest shortest path is much less than n. Based on the theoretical analyses, we demonstrate that the algorithm is faster than Dijkstra's algorithm using a Fibonacci heap in the average case when n is large enough.
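As a reference point, Dijkstra's algorithm, the baseline this abstract compares against, can be sketched as follows. Note the assumption: Python's standard `heapq` is a binary heap, so this version runs in O((m + n) log n) rather than the O(m + n log n) achieved with a Fibonacci heap, and it is not the paper's new algorithm. The adjacency-list format and the function name `dijkstra` are choices made for the example.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> list of (neighbor, weight)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist
```

The nonnegative-weight assumption in the title is what makes it safe to finalize a node the first time it is popped from the heap.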

preprint 2014 arXiv

Defuzzify firstly or finally: Dose it matter in fuzzy DEMATEL under uncertain environment?

The Decision-Making Trial and Evaluation Laboratory (DEMATEL) method is widely used in many real applications. With its desirable ability to handle uncertain information efficiently in decision making, fuzzy DEMATEL has been heavily studied. Recently, Dytczak and Ginda suggested defuzzifying the fuzzy numbers first and then using the classical DEMATEL to obtain the final result. In this short paper, we show that this is not reasonable in some situations. The results of defuzzification at the first step do not coincide with the results of defuzzification at the final step. It seems that the alternative is to defuzzify in the final step of fuzzy DEMATEL.

preprint 2013 arXiv

A brief network analysis of Artificial Intelligence publication

In this paper, we present an illustration of the history of Artificial Intelligence (AI) with a statistical analysis of publications since 1940. We collected and mined the IEEE publication database to analyze the geographical and chronological variance in the activity of AI research. The connections between different institutes are shown. The results show that the leading communities of AI research are mainly in the USA, China, Europe and Japan. The key institutes, authors and research hotspots are revealed. It is found that the research institutes in fields such as Data Mining, Computer Vision, Pattern Recognition and some other fields of Machine Learning are quite consistent, implying strong interaction between the communities of each field. It is also shown that research on Electronic Engineering and Industrial or Commercial applications is very active in California. Japan also publishes many papers in robotics. Due to the limitations of the data source, the results might be overly influenced by the number of published articles, which we have done our best to mitigate by applying network key-node analysis to the research community instead of merely counting the number of publications.
