Source author record

Siwei Feng

Siwei Feng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Computation and Language Robotics

Catalog footprint

What is connected

5works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

OmniSelect: Dynamic Modality-Aware Token Compression for Efficient Omni-modal Large Language Models

Omnimodal large language models (OmniLLMs) have recently gained increasing attention for unified audio-video understanding. However, processing long multimodal token sequences introduces substantial computational overhead, making efficient token compression crucial. Existing methods typically rely on fixed, modality-specific guidance, which fails to account for the varying importance of modalities across different queries. To address this limitation, we propose $\textbf{OmniSelect}$, a training-free, modality-adaptive token pruning framework that dynamically selects appropriate compression strategies for multimodal inputs. Specifically, we leverage a lightweight AudioCLIP model to estimate cross-modal relevance and categorize each input into three pruning regimes: Audio-Centric, Video-Centric, and Uniform pruning. Based on these relevance scores, OmniSelect further performs fine-grained token pruning within each temporal group, adaptively allocating pruning ratios to preserve informative tokens across modalities. By explicitly modeling modality preference and enabling dynamic strategy selection, OmniSelect effectively avoids the pitfalls of one-size-fits-all compression. Extensive experiments demonstrate that our method achieves efficient multimodal token reduction while maintaining strong performance, without requiring any additional training.

preprint2022arXiv

Federated Learning for Personalized Humor Recognition

Computational understanding of humor is an important topic under creative language understanding and modeling. It can play a key role in complex human-AI interactions. The challenge here is that human perception of humorous content is highly subjective. The same joke may receive different funniness ratings from different readers. This makes it highly challenging for humor recognition models to achieve personalization in practical scenarios. Existing approaches are generally designed based on the assumption that users have a consensus on whether a given text is humorous or not. Thus, they cannot handle diverse humor preferences well. In this paper, we propose the FedHumor approach for the recognition of humorous content in a personalized manner through Federated Learning (FL). Extending a pre-trained language model, FedHumor guides the fine-tuning process by considering diverse distributions of humor preferences from individuals. It incorporates a diversity adaptation strategy into the FL paradigm to train a personalized humor recognition model. To the best of our knowledge, FedHumor is the first text-based personalized humor recognition model through federated learning. Extensive experiments demonstrate the advantage of FedHumor in recognizing humorous texts compared to nine state-of-the-art humor recognition approaches with superior capability for handling the diversity in humor labels produced by users with diverse preferences.

preprint2022arXiv

Polynomial Time Near-Time-Optimal Multi-Robot Path Planning in Three Dimensions with Applications to Large-Scale UAV Coordination

For enabling efficient, large-scale coordination of unmanned aerial vehicles (UAVs) under the labeled setting, in this work, we develop the first polynomial time algorithm for the reconfiguration of many moving bodies in three-dimensional spaces, with provable $1.x$ asymptotic makespan optimality guarantee under high robot density. More precisely, on an $m_1\times m_2 \times m_3$ grid, $m_1\ge m_2\ge m_3$, our method computes solutions for routing up to $\frac{m_1m_2m_3}{3}$ uniquely labeled robots with uniformly randomly distributed start and goal configurations within a makespan of $m_1 + 2m_2 +2m_3+o(m_1)$, with high probability. Because the makespan lower bound for such instances is $m_1 + m_2+m_3 - o(m_1)$, also with high probability, as $m_1 \to \infty$, $\frac{m_1+2m_2+2m_3}{m_1+m_2+m_3}$ optimality guarantee is achieved. $\frac{m_1+2m_2+2m_3}{m_1+m_2+m_3} \in (1, \frac{5}{3}]$, yielding $1.x$ optimality. In contrast, it is well-known that multi-robot path planning is NP-hard to optimally solve. In numerical evaluations, our method readily scales to support the motion planning of over $100,000$ robots in 3D while simultaneously achieving $1.x$ optimality. We demonstrate the application of our method in coordinating many quadcopters in both simulation and hardware experiments.

preprint2020arXiv

Multi-Participant Multi-Class Vertical Federated Learning

Federated learning (FL) is a privacy-preserving paradigm for training collective machine learning models with locally stored data from multiple participants. Vertical federated learning (VFL) deals with the case where participants sharing the same sample ID space but having different feature spaces, while label information is owned by one participant. Current studies of VFL only support two participants, and mostly focus on binaryclass logistic regression problems. In this paper, we propose the Multi-participant Multi-class Vertical Federated Learning (MMVFL) framework for multi-class VFL problems involving multiple parties. Extending the idea of multi-view learning (MVL), MMVFL enables label sharing from its owner to other VFL participants in a privacypreserving manner. To demonstrate the effectiveness of MMVFL, a feature selection scheme is incorporated into MMVFL to compare its performance against supervised feature selection and MVL-based approaches. Experiment results on real-world datasets show that MMVFL can effectively share label information among multiple VFL participants and match multi-class classification performance of existing approaches.

preprint2016arXiv

Wavelet-Based Semantic Features for Hyperspectral Signature Discrimination

Hyperspectral signature classification is a quantitative analysis approach for hyperspectral imagery which performs detection and classification of the constituent materials at the pixel level in the scene. The classification procedure can be operated directly on hyperspectral data or performed by using some features extracted from the corresponding hyperspectral signatures containing information like the signature's energy or shape. In this paper, we describe a technique that applies non-homogeneous hidden Markov chain (NHMC) models to hyperspectral signature classification. The basic idea is to use statistical models (such as NHMC) to characterize wavelet coefficients which capture the spectrum semantics (i.e., structural information) at multiple levels. Experimental results show that the approach based on NHMC models can outperform existing approaches relevant in classification tasks.