Source author record

Le Li

Le Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, Computation and Language, cond-mat.mes-hall, cond-mat.mtrl-sci, Information Retrieval, Networking and Internet Architecture, physics.app-ph, physics.ins-det

Catalog footprint

What is connected

11 works

8 topics

4 close collaborators

preprint | 2026 | arXiv

OmniSelect: Dynamic Modality-Aware Token Compression for Efficient Omni-modal Large Language Models

Omnimodal large language models (OmniLLMs) have recently gained increasing attention for unified audio-video understanding. However, processing long multimodal token sequences introduces substantial computational overhead, making efficient token compression crucial. Existing methods typically rely on fixed, modality-specific guidance, which fails to account for the varying importance of modalities across different queries. To address this limitation, we propose OmniSelect, a training-free, modality-adaptive token pruning framework that dynamically selects appropriate compression strategies for multimodal inputs. Specifically, we leverage a lightweight AudioCLIP model to estimate cross-modal relevance and categorize each input into three pruning regimes: Audio-Centric, Video-Centric, and Uniform pruning. Based on these relevance scores, OmniSelect further performs fine-grained token pruning within each temporal group, adaptively allocating pruning ratios to preserve informative tokens across modalities. By explicitly modeling modality preference and enabling dynamic strategy selection, OmniSelect effectively avoids the pitfalls of one-size-fits-all compression. Extensive experiments demonstrate that our method achieves efficient multimodal token reduction while maintaining strong performance, without requiring any additional training.
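The regime selection described in this abstract can be illustrated with a minimal sketch. The abstract does not give thresholds, pruning ratios, or the scoring interface, so everything below is an assumption for illustration: the relevance scores are passed in as plain floats (no AudioCLIP call is made), and `margin`, `keep_budget`, and `tilt` are hypothetical parameters, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class PruningPlan:
    regime: str        # "audio-centric", "video-centric", or "uniform"
    audio_keep: float  # fraction of audio tokens to keep
    video_keep: float  # fraction of video tokens to keep


def choose_regime(audio_relevance: float, video_relevance: float,
                  margin: float = 0.1) -> str:
    """Pick a regime from the gap between two relevance scores.

    `margin` is a hypothetical threshold: gaps smaller than it are
    treated as "no clear modality preference".
    """
    gap = audio_relevance - video_relevance
    if gap > margin:
        return "audio-centric"
    if gap < -margin:
        return "video-centric"
    return "uniform"


def plan_pruning(audio_relevance: float, video_relevance: float,
                 keep_budget: float = 0.5, margin: float = 0.1,
                 tilt: float = 0.2) -> PruningPlan:
    """Shift the keep ratio toward the preferred modality (assumed rule)."""
    regime = choose_regime(audio_relevance, video_relevance, margin)
    if regime == "audio-centric":
        audio_keep, video_keep = keep_budget + tilt, keep_budget - tilt
    elif regime == "video-centric":
        audio_keep, video_keep = keep_budget - tilt, keep_budget + tilt
    else:
        audio_keep = video_keep = keep_budget
    # Clamp so the ratios stay valid fractions.
    clamp = lambda r: min(1.0, max(0.0, r))
    return PruningPlan(regime, clamp(audio_keep), clamp(video_keep))


# Hypothetical scores for a query that is mostly about the soundtrack.
audio_query_plan = plan_pruning(audio_relevance=0.8, video_relevance=0.3)
# Hypothetical scores with no clear preference.
balanced_plan = plan_pruning(audio_relevance=0.5, video_relevance=0.55)
print(audio_query_plan)
print(balanced_plan)
```

The sketch only covers the coarse regime choice; the fine-grained pruning within each temporal group mentioned in the abstract is not modeled here.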

preprint | 2022 | arXiv

Youling: an AI-Assisted Lyrics Creation System

Recently, a variety of neural models have been proposed for lyrics generation. However, most previous work completes the generation process in a single pass with little human intervention. We believe that lyrics creation is a creative process centered on human intelligence. AI should act as an assistant in the lyrics creation process, where human interactions are crucial for high-quality creation. This paper demonstrates Youling, an AI-assisted lyrics creation system designed to collaborate with music creators. In the lyrics generation process, Youling supports a traditional one-pass full-text generation mode as well as an interactive generation mode, which allows users to select satisfactory sentences from generated candidates conditioned on the preceding context. The system also provides a revision module that enables users to repeatedly revise undesired sentences or words in the lyrics. In addition, Youling allows users to use multifaceted attributes to control the content and format of the generated lyrics. The demo video of the system is available at https://youtu.be/DFeNpHk0pm4.

preprint | 2020 | arXiv

Controllable Descendant Face Synthesis

Kinship face synthesis is an interesting topic raised to answer questions like "what will your future children look like?". Published approaches to this topic are limited. Most existing methods train models for a one-versus-one kin relation, which considers only one parent face and one child face, by directly using an auto-encoder without any explicit control over the resemblance of the synthesized face to the parent face. In this paper, we propose a novel method for controllable descendant face synthesis, which models a two-versus-one kin relation between two parent faces and one child face. Our model consists of an inheritance module and an attribute enhancement module, where the former is designed for accurate control over the resemblance between the synthesized face and the parent faces, and the latter is designed for control over age and gender. As there is no large-scale database with father-mother-child kinship annotation, we propose an effective strategy to train the model without using ground-truth descendant faces. No carefully designed image pairs are required for learning; only the age and gender labels of the training faces are needed. We conduct comprehensive experimental evaluations on three public benchmark databases, which demonstrate encouraging results.

preprint | 2020 | arXiv

Fast response deep-ultraviolet photodetector based on high-quality single-crystalline CVD diamond

A high-performance, fast-response deep-ultraviolet (UV) photodetector with interdigitated Ti/Au planar electrodes has been fabricated on homoepitaxial diamond using lithography. The device shows a high ultraviolet photocurrent at 213 nm, which is eight orders of magnitude higher than the dark current at a bias voltage of 30 V. In addition, time-resolved photoresponse measurements using a mechanical method and a pulsed 213 nm laser show good cycling, low persistent photoconductivity, and a transient time-resolved response of up to 2.4 ns.

preprint | 2020 | arXiv

Joint Face Completion and Super-resolution using Multi-scale Feature Relation Learning

Previous research on face restoration has often focused on repairing a specific type of low-quality facial image, such as low-resolution (LR) or occluded facial images. However, in the real world, both of these forms of image degradation often coexist. Therefore, it is important to design a model that can repair images that are both LR and occluded. This paper proposes a multi-scale feature graph generative adversarial network (MFG-GAN) to restore face images in which both degradation modes coexist, as well as images with a single type of degradation. Based on the GAN, the MFG-GAN integrates graph convolution and a feature pyramid network to restore occluded low-resolution face images to non-occluded high-resolution face images. The MFG-GAN uses a set of customized losses to ensure that high-quality images are generated. In addition, we designed the network in an end-to-end format. Experimental results on the public-domain CelebA and Helen databases show that the proposed approach outperforms state-of-the-art methods in performing face super-resolution (up to 4x or 8x) and face completion simultaneously. Cross-database testing also revealed that the proposed approach has good generalizability.

preprint | 2019 | arXiv

Facial Expression Restoration Based on Improved Graph Convolutional Networks

Facial expression analysis in the wild is challenging when the facial image has low resolution or is partially occluded. Considering the correlations among different local facial regions under different facial expressions, this paper proposes a novel facial expression restoration method based on a generative adversarial network that integrates an improved graph convolutional network (IGCN) and a region relation modeling block (RRMB). Unlike conventional graph convolutional networks, which take vectors as input features, IGCN can use tensors of face patches as inputs, which better retains the structural information of the face patches. The proposed RRMB is designed to address facial generative tasks, including inpainting and super-resolution, together with facial action unit detection, and aims to restore facial expressions to match the ground truth. Extensive experiments conducted on the BP4D and DISFA benchmarks demonstrate the effectiveness of our proposed method through quantitative and qualitative evaluations.

preprint | 2014 | arXiv

Document Clustering Based On Max-Correntropy Non-Negative Matrix Factorization

Nonnegative matrix factorization (NMF) has been successfully applied to many areas for classification and clustering. Commonly used NMF algorithms mainly target minimizing the $l_2$ distance or Kullback-Leibler (KL) divergence, which may not be suitable for nonlinear cases. In this paper, we propose a new decomposition method for document clustering that maximizes the correntropy between the original matrix and the product of two low-rank matrices. This method also allows us to learn the new basis vectors of the semantic feature space from the data. To our knowledge, no prior work has maximized correntropy in NMF to cluster high-dimensional document data. Our experimental results show the superiority of our proposed method over other variants of the NMF algorithm on the Reuters21578 and TDT2 datasets.
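To make the objective in this abstract concrete, here is a minimal sketch of an empirical correntropy score between a matrix and its low-rank reconstruction. It assumes the common Gaussian-kernel estimator applied element-wise, with the kernel's normalizing constant dropped so that a perfect reconstruction scores 1.0. The paper's exact objective and update rules are not given in the abstract and are not reproduced here; the function names and `sigma` default are illustrative.

```python
import math


def matmul(W, H):
    """Plain-Python matrix product of W (n x k) and H (k x m)."""
    return [[sum(W[i][k] * H[k][j] for k in range(len(H)))
             for j in range(len(H[0]))]
            for i in range(len(W))]


def correntropy(X, Y, sigma=1.0):
    """Mean Gaussian-kernel similarity between matching entries of X and Y.

    Each entry contributes exp(-(x - y)^2 / (2 sigma^2)), a value in (0, 1],
    so a single large error can lower the score by at most 1/N.
    """
    total, count = 0.0, 0
    for row_x, row_y in zip(X, Y):
        for x, y in zip(row_x, row_y):
            total += math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))
            count += 1
    return total / count


def squared_error(X, Y):
    """Sum of squared differences, for comparison with correntropy."""
    return sum((x - y) ** 2
               for row_x, row_y in zip(X, Y)
               for x, y in zip(row_x, row_y))


X = [[1.0, 2.0], [3.0, 4.0]]
W = [[1.0, 0.0], [0.0, 1.0]]        # toy factors that reconstruct X exactly
H = [[1.0, 2.0], [3.0, 4.0]]
exact_score = correntropy(X, matmul(W, H))

X_outlier = [[1.0, 2.0], [3.0, 104.0]]  # one corrupted entry
outlier_score = correntropy(X_outlier, matmul(W, H))
outlier_sq_err = squared_error(X_outlier, matmul(W, H))
print(exact_score, outlier_score, outlier_sq_err)
```

The comparison at the end shows why a bounded kernel measure can be less sensitive to a single corrupted entry than the squared $l_2$ distance, which grows with the square of the error.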

preprint | 2014 | arXiv

Graph Regularized Non-negative Matrix Factorization By Maximizing Correntropy

Non-negative matrix factorization (NMF) has proved effective in many clustering and classification tasks. The classic ways to measure the error between the original and the reconstructed matrix are the $l_2$ distance or the Kullback-Leibler (KL) divergence. However, nonlinear cases are not properly handled by these error measures. As a consequence, alternative measures based on nonlinear kernels, such as correntropy, have been proposed. However, current correntropy-based NMF targets only low-level features without considering the intrinsic geometrical distribution of the data. In this paper, we propose a new NMF algorithm that preserves local invariance by adding graph regularization to max-correntropy-based matrix factorization. Meanwhile, each feature can learn its corresponding kernel from the data. Experimental results on Caltech101 and Caltech256 show the benefits of this combination over other NMF algorithms for unsupervised image clustering.

preprint | 2014 | arXiv

Location Aided Energy Balancing Strategy in Green Cellular Networks

Most cellular network communication strategies focus on data traffic scenarios rather than energy balance and efficient utilization. Thus, mobile users in hot cells may suffer from low throughput due to the energy load imbalance problem. In state-of-the-art cellular network technologies, relay stations extend cell coverage and enhance signal strength for mobile users. However, busy traffic makes the relay stations in hot areas run out of energy quickly. In this paper, we propose an energy balancing strategy in which mobile nodes are able to dynamically select and hand over to the relay station with the highest potential energy capacity to resume communication. Key to the strategy is that each relay station maintains only two parameters that capture the trend of its previous energy consumption, and then predicts its future quantity of energy, which is defined as the relay station's potential energy capacity. Each mobile node can then select the relay station with the highest potential energy capacity. Simulations demonstrate that our approach significantly increases the aggregate throughput and the average lifetime of relay stations in a cellular network environment.
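The selection rule in this abstract can be sketched as follows. The abstract does not say which two parameters each relay station keeps or how the prediction is made, so this sketch assumes them: the remaining energy and an exponentially smoothed per-interval consumption rate, with a linear extrapolation over a hypothetical `horizon`. All names and the smoothing factor `alpha` are illustrative, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class RelayStation:
    name: str
    energy: float             # assumed parameter 1: remaining energy
    drain_rate: float = 0.0   # assumed parameter 2: smoothed consumption trend

    def record_interval(self, consumed: float, alpha: float = 0.5) -> None:
        """Update both parameters after one reporting interval."""
        self.energy -= consumed
        # Exponential smoothing keeps the trend without storing history.
        self.drain_rate = alpha * consumed + (1.0 - alpha) * self.drain_rate

    def potential_energy(self, horizon: int = 1) -> float:
        """Predicted energy left after `horizon` more intervals (assumed model)."""
        return max(0.0, self.energy - horizon * self.drain_rate)


def select_relay(stations, horizon: int = 1) -> RelayStation:
    """A mobile node picks the station with the highest predicted energy."""
    return max(stations, key=lambda s: s.potential_energy(horizon))


# Toy scenario: "busy" starts with more energy but drains much faster.
busy = RelayStation("busy", energy=100.0)
quiet = RelayStation("quiet", energy=90.0)
busy.record_interval(40.0)
quiet.record_interval(5.0)
chosen = select_relay([busy, quiet], horizon=3)
print(chosen.name)
```

In this toy scenario the station with the larger initial energy is not chosen, because its consumption trend predicts that it will be depleted sooner.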

preprint | 2014 | arXiv

Strong shape dependence of the Morin transition in alpha-Fe2O3 single-crystalline nanostructures

Single-crystalline alpha-Fe2O3 nanorings (short nanotubes) and nanotubes were synthesized by a hydrothermal method. High-resolution transmission electron microscopy and selected-area electron diffraction confirm that the axial directions of both nanorings and nanotubes are parallel to the crystalline c-axis. What is intriguing is that the Morin transition occurs at about 210 K in the short nanotubes with a mean tube length of about 115 nm and a mean outer diameter of 169 nm, while it disappears in the nanotubes with a mean tube length of about 317 nm and a mean outer diameter of 148 nm. Detailed analyses of magnetization data, x-ray diffraction spectra, and room-temperature Mössbauer spectra demonstrate that this very strong shape dependence of the Morin transition is intrinsic to hematite. We can quantitatively explain this intriguing shape dependence in terms of opposite signs of the surface magnetic anisotropy constants in the surface planes parallel and perpendicular to the c-axis (that is, K_parallel = -0.37 erg/cm^2 and K_perp = 0.42 erg/cm^2).

preprint | 2013 | arXiv

Adaptive Learning of Region-based pLSA Model for Total Scene Annotation

In this paper, we present a region-based pLSA model to accomplish the task of total scene annotation. More specifically, we not only generate a list of tags for each image but also localize each region with its corresponding tag. We integrate the advantages of different existing region-based works: we employ the efficient and powerful JSEG algorithm for segmentation so that each region can easily express meaningful object information, and we introduce the pLSA model to better capture the semantic information behind the low-level features. Moreover, we propose an adaptive padding mechanism that automatically chooses the optimal padding strategy for each region, which directly increases overall system performance. Finally, we conduct three experiments on the Corel database to verify our ideas and demonstrate the effectiveness and accuracy of our system.
