Source author record

Yeqing Li

Yeqing Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Information Theory math.IT

Catalog footprint

What is connected

3works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models

Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is becoming a promising paradigm for open-vocabulary visual recognition. In this work, we extend this paradigm by leveraging motion and audio that naturally exist in video. We present \textbf{MOV}, a simple yet effective method for \textbf{M}ultimodal \textbf{O}pen-\textbf{V}ocabulary video classification. In MOV, we directly use the vision encoder from pre-trained VLMs with minimal modifications to encode video, optical flow and audio spectrogram. We design a cross-modal fusion mechanism to aggregate complimentary multimodal information. Experiments on Kinetics-700 and VGGSound show that introducing flow or audio modality brings large performance gains over the pre-trained VLM and existing methods. Specifically, MOV greatly improves the accuracy on base classes, while generalizes better on novel classes. MOV achieves state-of-the-art results on UCF and HMDB zero-shot video classification benchmarks, significantly outperforming both traditional zero-shot methods and recent methods based on VLMs. Code and models will be released.

preprint2015arXiv

SIRF: Simultaneous Image Registration and Fusion in A Unified Framework

In this paper, we propose a novel method for image fusion with a high-resolution panchromatic image and a low-resolution multispectral image at the same geographical location. The fusion is formulated as a convex optimization problem which minimizes a linear combination of a least-squares fitting term and a dynamic gradient sparsity regularizer. The former is to preserve accurate spectral information of the multispectral image, while the latter is to keep sharp edges of the high-resolution panchromatic image. We further propose to simultaneously register the two images during the fusing process, which is naturally achieved by virtue of the dynamic gradient sparsity property. An efficient algorithm is then devised to solve the optimization problem, accomplishing a linear computational complexity in the size of the output image in each iteration. We compare our method against seven state-of-the-art image fusion methods on multispectral image datasets from four satellites. Extensive experimental results demonstrate that the proposed method substantially outperforms the others in terms of both spatial and spectral qualities. We also show that our method can provide high-quality products from coarsely registered real-world datasets. Finally, a MATLAB implementation is provided to facilitate future research.

preprint2014arXiv

Forest Sparsity for Multi-channel Compressive Sensing

In this paper, we investigate a new compressive sensing model for multi-channel sparse data where each channel can be represented as a hierarchical tree and different channels are highly correlated. Therefore, the full data could follow the forest structure and we call this property as \emph{forest sparsity}. It exploits both intra- and inter- channel correlations and enriches the family of existing model-based compressive sensing theories. The proposed theory indicates that only $\mathcal{O}(Tk+\log(N/k))$ measurements are required for multi-channel data with forest sparsity, where $T$ is the number of channels, $N$ and $k$ are the length and sparsity number of each channel respectively. This result is much better than $\mathcal{O}(Tk+T\log(N/k))$ of tree sparsity, $\mathcal{O}(Tk+k\log(N/k))$ of joint sparsity, and far better than $\mathcal{O}(Tk+Tk\log(N/k))$ of standard sparsity. In addition, we extend the forest sparsity theory to the multiple measurement vectors problem, where the measurement matrix is a block-diagonal matrix. The result shows that the required measurement bound can be the same as that for dense random measurement matrix, when the data shares equal energy in each channel. A new algorithm is developed and applied on four example applications to validate the benefit of the proposed model. Extensive experiments demonstrate the effectiveness and efficiency of the proposed theory and algorithm.