Source author record

Deyu Meng

Deyu Meng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Machine Learning, eess.IV, Artificial Intelligence, Data Structures and Algorithms, Databases, Information Retrieval, Multimedia, Numerical Analysis, physics.comp-ph, physics.geo-ph

Catalog footprint

What is connected

39 works

11 topics

4 close collaborators

preprint, 2026, arXiv

A Distributional View for Visual Mechanistic Interpretability: KL-Minimal Soft-Constraint Principle

Most current paradigms in visual mechanistic interpretability (MI) remain confined to interpreting internal units of the vision model via heuristic methods (e.g., top-$K$ activation retrieval or optimization with regularization). In this work, we establish a theoretical distributional view for visual MI, which models the influence of a feature activation on the natural image distribution, thereby formulating a Kullback-Leibler (KL)-minimal optimization problem to model the MI task. Under this framework, statistical biases are identified within previous MI paradigms, which reveal that they may either be perceptually uninterpretable to humans (i.e., deviate from the natural image distribution), or mechanistically unfaithful to the vision models (i.e., unable to activate model features). To resolve the biases under the distributional view, we propose a model with a KL-minimal soft-constraint principle for visual MI that theoretically balances interpretability and faithfulness. We realize this principle via energy-guided diffusion posterior sampling. Extensive experiments validate the theoretical soundness of the proposed distributional view and demonstrate the practical effectiveness of our paradigm on the DINOv3 vision model.
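As a rough illustration of what a KL-minimal soft constraint can look like, the following worked equation uses the standard exponential-tilting result. The symbols are assumptions introduced here and are not taken from the paper: $p$ stands for the natural image distribution, $a(x)$ for a feature activation, and $\lambda > 0$ for a trade-off weight. The paper's actual formulation may differ.

```latex
% Assumed notation: p = natural image distribution, a(x) = feature activation,
% \lambda > 0 = trade-off weight. Generic exponential-tilting identity.
\begin{aligned}
q^{\star} &= \arg\min_{q}\; \mathrm{KL}\!\left(q \,\|\, p\right) \;-\; \lambda\, \mathbb{E}_{x \sim q}\!\left[a(x)\right], \\
q^{\star}(x) &= \frac{p(x)\, \exp\!\big(\lambda\, a(x)\big)}{Z(\lambda)}, \qquad
Z(\lambda) = \mathbb{E}_{x \sim p}\!\left[\exp\!\big(\lambda\, a(x)\big)\right].
\end{aligned}
```

Under these assumptions, a small $\lambda$ keeps $q^{\star}$ close to natural images, while a large $\lambda$ concentrates mass on strongly activating images, which is one way to read the balance between interpretability and faithfulness described above.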

preprint, 2026, arXiv

Aligning Network Equivariance with Data Symmetry: A Theoretical Framework and Adaptive Approach for Image Restoration

Image restoration is an inherently ill-posed inverse problem. Equivariant networks that embed geometric symmetry priors can mitigate this ill-posedness and improve performance. However, current understanding of the relationship between network equivariance and data symmetry remains largely heuristic. Particularly for real-world data with imperfect symmetry, existing research lacks a systematic theoretical framework to quantify symmetry, select transformation groups, or evaluate model-data alignment. To bridge this gap, we conduct an analysis from an optimization perspective and formalize the intrinsic relationship among data symmetry priors, model equivariance, and generalization capability. Specifically, we propose for the first time a quantifiable definition of non-strict symmetry at the dataset level (rather than the sample level) and use it as a constraint to formulate the restoration inverse problem. We then show that the equivariance of restoration models can be naturally derived from this inverse problem once the proposed symmetry constraints are incorporated, and that the equivariance error of the optimal restoration operator is strictly bounded by the data symmetry error and the discretization mesh size. Furthermore, by analyzing the network's empirical risk, we demonstrate that aligning equivariance with data symmetry optimizes the bias-variance trade-off, minimizing the total expected risk. Guided by these insights, we propose a Sample Adaptive Equivariant Network that uses a hypernetwork and transformation-learnable equivariant convolutions to dynamically align with each sample's inherent symmetry. Extensive experiments on super-resolution, denoising, and deraining validate our theoretical findings and show significant superiority over standard baselines and traditional equivariant models. Our code and supplementary material are available at https://github.com/tanfy929/SA-Conv.

preprint, 2026, arXiv

Deciphering Neural Reparameterized Full-Waveform Inversion with Neural Sensitivity Kernel and Wave Tangent Kernel

Full-waveform inversion (FWI) estimates unknown parameters in the wave equation from limited boundary measurements. Recent advances in neural reparameterized FWI (NeurFWI) demonstrate that representing the parameters using a neural network can reduce the reliance on a high-quality initial model and wavefield data, at the cost of slow high-resolution convergence. However, its underlying theoretical mechanism remains unclear. In this study, we establish the neural sensitivity kernel (NSK) and the wave tangent kernel (WTK) to analyze its convergence behavior in both the model and data domains. These theoretical frameworks show that the neural tangent kernel (NTK) induced by the neural representation adaptively modulates the original sensitivity and wave tangent kernels. This modulation leads to several key outcomes, i.e., the spectral filtering effect, the gradient wavenumber modulation, and the wave frequency bias, connecting the convergence behavior of NeurFWI with the eigen-structures of NSK and WTK. Building on these insights, we propose several enhanced NeurFWI methods with tailored eigen-structures in NSK and WTK to improve inversion performance and efficiency. We numerically validate these theoretical claims and the proposed methods in seismic exploration, and extend their application to medical imaging for the first time.

preprint, 2026, arXiv

HIR-ALIGN: Enhancing Hyperspectral Image Restoration via Diffusion-Based Data Generation

Hyperspectral image (HSI) restoration is crucial for reliable analysis, as real HSIs suffer from degradations like noise, blur, and resolution loss. However, existing models trained on source data often fail on target domains lacking clean references, a common occurrence in practice. To address this issue, we present HIR-ALIGN, a plug-and-play target-adaptive augmentation framework that enhances hyperspectral image restoration by augmenting limited training images with synthetic data that closely matches the target distribution using no extra data. It consists of three stages: (i) proxy generation, where off-the-shelf restoration models restore degraded target observations to produce semantics-preserving proxy HSIs that approximate target-domain clean images; (ii) distribution-adaptive synthesis, where a blur-robust unCLIP diffusion model generates target-aligned RGBs from proxy RGBs, with prompt conditioning and embedding-space noise initialization. Then, a warp-based spectral transfer module synthesizes HSIs by aligning each generated RGB with the proxy RGB, estimating soft patch-wise transport weights, and applying these weights and learnable local interpolation kernels to the proxy HSI; and (iii) aligned supervised finetuning, where restoration networks pretrained on the source distribution are finetuned using both the proxy HSIs and synthesized target-aligned HSIs, and are then deployed on degraded target images. We further provide theoretical analysis showing that augmentation-based finetuning can achieve lower target-domain restoration risk by jointly improving target distribution coverage and controlling spectral bias. Extensive experiments on simulated and real datasets across denoising and super-resolution tasks demonstrate that HIR-ALIGN consistently improves source-only supervised baselines, outperforming both source-only counterparts and representative unsupervised methods.

preprint, 2025, arXiv

HIDFlowNet: A Flow-Based Deep Network for Hyperspectral Image Denoising

Hyperspectral image (HSI) denoising is essentially ill-posed since a noisy HSI can be degraded from multiple clean HSIs. However, existing deep learning (DL)-based approaches only restore one clean HSI from the given noisy HSI with a deterministic mapping, thus ignoring the ill-posed issue and always resulting in an over-smoothing problem. Additionally, these DL-based methods often neglect that noise is part of the high-frequency component, and their network architectures fail to decouple the learning of low-frequency and high-frequency information. To alleviate these issues, this paper proposes a flow-based HSI denoising network (HIDFlowNet) to directly learn the conditional distribution of the clean HSI given the noisy HSI, so that diverse clean HSIs can be sampled from the conditional distribution. Overall, our HIDFlowNet is derived from the generative flow model and comprises an invertible decoder and a conditional encoder, which can explicitly decouple the learning of low-frequency and high-frequency information of the HSI. Specifically, the invertible decoder is built by stacking a succession of invertible conditional blocks (ICBs) to capture the local high-frequency details. The conditional encoder utilizes down-sampling operations to obtain low-resolution images and uses transformers to capture correlations over a long distance so that global low-frequency information can be effectively extracted. Extensive experiments on simulated and real HSI datasets verify that our proposed HIDFlowNet can obtain better or comparable results compared with other state-of-the-art methods.

preprint, 2024, arXiv

Gramformer: Learning Crowd Counting via Graph-Modulated Transformer

The Transformer has been popular in recent crowd counting work since it breaks the limited receptive field of traditional CNNs. However, since crowd images always contain a large number of similar patches, the self-attention mechanism in the Transformer tends to find a homogenized solution where the attention maps of almost all patches are identical. In this paper, we address this problem by proposing Gramformer: a graph-modulated transformer that enhances the network by adjusting the attention and the input node features, respectively, on the basis of two different types of graphs. Firstly, an attention graph is proposed to diversify attention maps so that they attend to complementary information. The graph is built upon the dissimilarities between patches, modulating the attention in an anti-similarity fashion. Secondly, a feature-based centrality encoding is proposed to discover the centrality positions or importance of nodes. We encode them with a proposed centrality indices scheme to modulate the node features and similarity relationships. Extensive experiments on four challenging crowd counting datasets have validated the competitiveness of the proposed method. Code is available at https://github.com/LoraLinH/Gramformer.

preprint, 2024, arXiv

Revisiting Nonlocal Self-Similarity from Continuous Representation

Nonlocal self-similarity (NSS) is an important prior that has been successfully applied in multi-dimensional data processing tasks, e.g., image and video recovery. However, existing NSS-based methods are solely suitable for meshgrid data such as images and videos, but are not suitable for emerging off-meshgrid data, e.g., point cloud and climate data. In this work, we revisit the NSS from the continuous representation perspective and propose a novel Continuous Representation-based NonLocal method (termed as CRNL), which has two innovative features as compared with classical nonlocal methods. First, based on the continuous representation, our CRNL unifies the measure of self-similarity for on-meshgrid and off-meshgrid data and thus is naturally suitable for both of them. Second, the nonlocal continuous groups can be more compactly and efficiently represented by the coupled low-rank function factorization, which simultaneously exploits the similarity within each group and across different groups, while classical nonlocal methods neglect the similarity across groups. This elaborately designed coupled mechanism allows our method to enjoy favorable performance over conventional NSS methods in terms of both effectiveness and efficiency. Extensive multi-dimensional data processing experiments on-meshgrid (e.g., image inpainting and image denoising) and off-meshgrid (e.g., climate data prediction and point cloud recovery) validate the versatility, effectiveness, and efficiency of our CRNL as compared with state-of-the-art methods.

preprint, 2022, arXiv

An Efficient and Accurate Rough Set for Feature Selection, Classification and Knowledge Representation

This paper presents a strong data mining method based on rough sets, which can realize feature selection, classification and knowledge representation at the same time. Rough set theory has good interpretability and is a popular method for feature selection, but low efficiency and low accuracy are its main drawbacks and limit its applicability. In this paper, regarding accuracy, we first find that rough sets are ineffective because of overfitting, especially when processing noise attributes, and propose a robust measurement for an attribute, called relative importance. We also propose the concept of a "rough concept tree" for knowledge representation and classification. Experimental results on public benchmark data sets show that the proposed framework achieves higher accuracy than seven popular or state-of-the-art feature selection methods.

preprint, 2022, arXiv

Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel

While research on model-based blind single image super-resolution (SISR) has achieved tremendous success recently, most methods do not consider the image degradation sufficiently. Firstly, they always assume that image noise obeys an independent and identically distributed (i.i.d.) Gaussian or Laplacian distribution, which largely underestimates the complexity of real noise. Secondly, previous commonly-used kernel priors (e.g., normalization, sparsity) are not effective enough to guarantee a rational kernel solution, and thus degrade the performance of the subsequent SISR task. To address the above issues, this paper proposes a model-based blind SISR method under the probabilistic framework, which elaborately models image degradation from the perspectives of noise and blur kernel. Specifically, instead of the traditional i.i.d. noise assumption, a patch-based non-i.i.d. noise model is proposed to tackle the complicated real noise, with the aim of increasing the degrees of freedom of the model for noise representation. As for the blur kernel, we construct a novel, concise yet effective kernel generator, and plug it into the proposed blind SISR method as an explicit kernel prior (EKP). To solve the proposed model, a theoretically grounded Monte Carlo EM algorithm is specifically designed. Comprehensive experiments demonstrate the superiority of our method over current state-of-the-art methods on synthetic and real datasets. The source code is available at https://github.com/zsyOAOA/BSRDM.

preprint, 2022, arXiv

Decoupled-and-Coupled Networks: Self-Supervised Hyperspectral Image Super-Resolution with Subpixel Fusion

Enormous efforts have been recently made to super-resolve hyperspectral (HS) images with the aid of high spatial resolution multispectral (MS) images. Most prior works usually perform the fusion task by means of multifarious pixel-level priors. Yet the intrinsic effects of a large distribution gap between HS-MS data due to differences in the spatial and spectral resolution are less investigated. The gap might be caused by unknown sensor-specific properties or highly-mixed spectral information within one pixel (due to low spatial resolution). To this end, we propose a subpixel-level HS super-resolution framework by devising a novel decoupled-and-coupled network, called DC-Net, to progressively fuse HS-MS information from the pixel- to subpixel-level, from the image- to feature-level. As the name suggests, DC-Net first decouples the input into common (or cross-sensor) and sensor-specific components to eliminate the gap between HS-MS images before further fusion, and then fully blends them by a model-guided coupled spectral unmixing (CSU) net. More significantly, we append a self-supervised learning module behind the CSU net by guaranteeing the material consistency to enhance the detailed appearances of the restored HS product. Extensive experimental results show the superiority of our method both visually and quantitatively and achieve a significant improvement in comparison with the state-of-the-arts. Furthermore, the codes and datasets will be available at https://sites.google.com/view/danfeng-hong for the sake of reproducibility.

preprint, 2022, arXiv

Diagnosing Batch Normalization in Class Incremental Learning

Extensive researches have applied deep neural networks (DNNs) in class incremental learning (Class-IL). As building blocks of DNNs, batch normalization (BN) standardizes intermediate feature maps and has been widely validated to improve training stability and convergence. However, we claim that the direct use of standard BN in Class-IL models is harmful to both the representation learning and the classifier training, thus exacerbating catastrophic forgetting. In this paper we investigate the influence of BN on Class-IL models by illustrating such BN dilemma. We further propose BN Tricks to address the issue by training a better feature extractor while eliminating classification bias. Without inviting extra hyperparameters, we apply BN Tricks to three baseline rehearsal-based methods, ER, DER++ and iCaRL. Through comprehensive experiments conducted on benchmark datasets of Seq-CIFAR-10, Seq-CIFAR-100 and Seq-Tiny-ImageNet, we show that BN Tricks can bring significant performance gains to all adopted baselines, revealing its potential generality along this line of research.

preprint, 2022, arXiv

Low-light Image Enhancement by Retinex Based Algorithm Unrolling and Adjustment

Motivated by their recent advances, deep learning techniques have been widely applied to the low-light image enhancement (LIE) problem. Among them, Retinex theory based methods, mostly following a decomposition-adjustment pipeline, have taken an important place due to their physical interpretation and promising performance. However, current investigations on Retinex based deep learning are still not sufficient, ignoring many useful experiences from traditional methods. Besides, the adjustment step is either performed with simple image processing techniques, or by complicated networks, both of which are unsatisfactory in practice. To address these issues, we propose a new deep learning framework for the LIE problem. The proposed framework contains a decomposition network inspired by algorithm unrolling, and adjustment networks considering both global brightness and local brightness sensitivity. By virtue of algorithm unrolling, both implicit priors learned from data and explicit priors borrowed from traditional methods can be embedded in the network, facilitating better decomposition. Meanwhile, the consideration of global and local brightness can guide the design of simple yet effective network modules for adjustment. Besides, to avoid manual parameter tuning, we also propose a self-supervised fine-tuning strategy, which can always guarantee a promising performance. Experiments on a series of typical LIE datasets demonstrate the effectiveness of the proposed method, both quantitatively and visually, as compared with existing methods.

preprint, 2022, arXiv

Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation

Precise segmentation of teeth from intra-oral scanner images is an essential task in computer-aided orthodontic surgical planning. The state-of-the-art deep learning-based methods often simply concatenate the raw geometric attributes (i.e., coordinates and normal vectors) of mesh cells to train a single-stream network for automatic intra-oral scanner image segmentation. However, since different raw attributes reveal completely different geometric information, the naive concatenation of different raw attributes at the (low-level) input stage may bring unnecessary confusion in describing and differentiating between mesh cells, thus hampering the learning of high-level geometric representations for the segmentation task. To address this issue, we design a two-stream graph convolutional network (i.e., TSGCN), which can effectively handle inter-view confusion between different raw attributes to more effectively fuse their complementary information and learn discriminative multi-view geometric representations. Specifically, our TSGCN adopts two input-specific graph-learning streams to extract complementary high-level geometric representations from coordinates and normal vectors, respectively. Then, these single-view representations are further fused by a self-attention module to adaptively balance the contributions of different views in learning more discriminative multi-view representations for accurate and fully automatic tooth segmentation. We have evaluated our TSGCN on a real-patient dataset of dental (mesh) models acquired by 3D intraoral scanners. Experimental results show that our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation. Github: https://github.com/ZhangLingMing1/TSGCNet.

preprint, 2022, arXiv

Unsupervised Local Discrimination for Medical Images

Contrastive learning, which aims to capture general representation from unlabeled images to initialize the medical analysis models, has been proven effective in alleviating the high demand for expensive annotations. Current methods mainly focus on instance-wise comparisons to learn the global discriminative features, however, pretermitting the local details to distinguish tiny anatomical structures, lesions, and tissues. To address this challenge, in this paper, we propose a general unsupervised representation learning framework, named local discrimination (LD), to learn local discriminative features for medical images by closely embedding semantically similar pixels and identifying regions of similar structures across different images. Specifically, this model is equipped with an embedding module for pixel-wise embedding and a clustering module for generating segmentation. And these two modules are unified through optimizing our novel region discrimination loss function in a mutually beneficial mechanism, which enables our model to reflect structure information as well as measure pixel-wise and region-wise similarity. Furthermore, based on LD, we propose a center-sensitive one-shot landmark localization algorithm and a shape-guided cross-modality segmentation model to foster the generalizability of our model. When transferred to downstream tasks, the learned representation by our method shows a better generalization, outperforming representation from 18 state-of-the-art (SOTA) methods and winning 9 out of all 12 downstream tasks. Especially for the challenging lesion segmentation tasks, the proposed method achieves significantly better performances. The source codes are publicly available at https://github.com/HuaiChen-1994/LDLearning.

preprint, 2020, arXiv

A Model-driven Deep Neural Network for Single Image Rain Removal

Deep learning (DL) methods have achieved state-of-the-art performance in the task of single image rain removal. Most current DL architectures, however, still lack sufficient interpretability and are not fully integrated with the physical structures inside general rain streaks. To address this issue, in this paper, we propose a model-driven deep neural network for the task, with fully interpretable network structures. Specifically, based on the convolutional dictionary learning mechanism for representing rain, we propose a novel single image deraining model and utilize the proximal gradient descent technique to design an iterative algorithm only containing simple operators for solving the model. Such a simple implementation scheme allows us to unfold it into a new deep network architecture, called rain convolutional dictionary network (RCDNet), with almost every network module corresponding one-to-one to an operation involved in the algorithm. By training the proposed RCDNet end-to-end, all the rain kernels and proximal operators can be automatically extracted, faithfully characterizing the features of both rain and clean background layers, which naturally leads to better deraining performance, especially in real scenarios. Comprehensive experiments substantiate the superiority of the proposed network, especially its good generality to diverse testing scenarios and good interpretability for all its modules, as compared with state-of-the-art methods both visually and quantitatively. The source codes are available at https://github.com/hongwang01/RCDNet.
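The proximal gradient descent technique mentioned above can be illustrated with a generic sketch. This is not the RCDNet algorithm: it swaps in a plain matrix `A` for the convolutional dictionary and a fixed L1 prior (whose proximal operator is soft-thresholding) for the learned proximal operators. The function names and the objective 0.5 * ||Ax - b||^2 + lam * ||x||_1 are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(A, b, lam, n_iter=200):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient descent.

    Hypothetical stand-in: A plays the role of a dictionary, x of its sparse codes.
    """
    # Step size 1/L, where L is the Lipschitz constant of the smooth term's gradient.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                           # gradient step on the data term
        x = soft_threshold(x - step * grad, step * lam)    # proximal step on the prior
    return x
```

Each loop iteration consists of one gradient step and one proximal step. Unfolding a fixed number of such iterations into layers, with the proximal step replaced by a small learned module, is the general idea behind algorithm-unrolled networks.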

preprint, 2020, arXiv

Ball k-means

This paper presents a novel accelerated exact k-means algorithm called the Ball k-means algorithm, which uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation. Ball k-means can accurately find the neighbor clusters of each cluster, so that distances are computed only between a point and the centroids of its neighbor clusters instead of all centroids. Moreover, each cluster can be divided into a stable area and an active area, and the latter can be further divided into annulus areas. The assigned cluster of the points in the stable area does not change in the current iteration, while the points in the annulus areas are adjusted within a few neighbor clusters in the current iteration. Also, there are no upper or lower bounds in the proposed Ball k-means. Furthermore, reducing centroid-centroid distance computation between iterations makes it efficient for large-k clustering. The fast speed, absence of extra parameters, and simple design of Ball k-means make it an all-around replacement for the naive k-means algorithm.
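The neighbor-cluster idea can be sketched in a few lines. The code below is a simplified illustration, not the paper's full algorithm: it omits the stable and annulus areas and only filters centroids per cluster. The filter rule used here follows from the triangle inequality: if every point of cluster i lies within radius r of its centroid, a centroid at distance 2r or more from it cannot be strictly closer to any of those points. The function name and signature are hypothetical.

```python
import numpy as np

def assign_with_ball_filter(X, centroids, labels):
    """One exact assignment step that skips centroids ruled out by cluster balls.

    X: (n, d) points; centroids: (k, d); labels: (n,) current integer assignments.
    Returns the new labels and the number of point-centroid distances computed.
    """
    k = centroids.shape[0]
    # Centroid-centroid distances, computed once per assignment step.
    cc = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    new_labels = labels.copy()
    n_dist = 0
    for i in range(k):
        members = np.where(labels == i)[0]
        if members.size == 0:
            continue
        d_own = np.linalg.norm(X[members] - centroids[i], axis=1)
        n_dist += members.size
        radius = d_own.max()  # ball radius of cluster i
        # Only centroids closer than 2 * radius can take a point from cluster i.
        neighbors = [j for j in range(k) if j != i and cc[i, j] < 2.0 * radius]
        best = np.full(members.size, i)
        best_d = d_own.copy()
        for j in neighbors:
            d = np.linalg.norm(X[members] - centroids[j], axis=1)
            n_dist += members.size
            closer = d < best_d
            best[closer] = j
            best_d[closer] = d[closer]
        new_labels[members] = best
    return new_labels, n_dist
```

When clusters are well separated, most neighbor lists are empty and the step costs roughly one distance per point instead of k, while still returning a nearest-centroid assignment.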

preprint, 2020, arXiv

Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution

The recent advancement of deep learning techniques has made great progress on hyperspectral image super-resolution (HSI-SR). Yet the development of unsupervised deep networks remains challenging for this task. To this end, we propose a novel coupled unmixing network with a cross-attention mechanism, CUCaNet for short, to enhance the spatial resolution of HSI by means of higher-spatial-resolution multispectral image (MSI). Inspired by coupled spectral unmixing, a two-stream convolutional autoencoder framework is taken as backbone to jointly decompose MS and HS data into a spectrally meaningful basis and corresponding coefficients. CUCaNet is capable of adaptively learning spectral and spatial response functions from HS-MS correspondences by enforcing reasonable consistency assumptions on the networks. Moreover, a cross-attention module is devised to yield more effective spatial-spectral information transfer in networks. Extensive experiments are conducted on three widely-used HS-MS datasets in comparison with state-of-the-art HSI-SR models, demonstrating the superiority of the CUCaNet in the HSI-SR application. Furthermore, the codes and datasets will be available at: https://github.com/danfenghong/ECCV2020_CUCaNet.

preprint, 2020, arXiv

Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation

Real-world image noise removal is a long-standing yet very challenging task in computer vision. The success of deep neural network in denoising stimulates the research of noise generation, aiming at synthesizing more clean-noisy image pairs to facilitate the training of deep denoisers. In this work, we propose a novel unified framework to simultaneously deal with the noise removal and noise generation tasks. Instead of only inferring the posteriori distribution of the latent clean image conditioned on the observed noisy image in traditional MAP framework, our proposed method learns the joint distribution of the clean-noisy image pairs. Specifically, we approximate the joint distribution with two different factorized forms, which can be formulated as a denoiser mapping the noisy image to the clean one and a generator mapping the clean image to the noisy one. The learned joint distribution implicitly contains all the information between the noisy and clean images, avoiding the necessity of manually designing the image priors and noise assumptions as traditional. Besides, the performance of our denoiser can be further improved by augmenting the original training dataset with the learned generator. Moreover, we propose two metrics to assess the quality of the generated noisy image, for which, to the best of our knowledge, such metrics are firstly proposed along this research line. Extensive experiments have been conducted to demonstrate the superiority of our method over the state-of-the-arts both in the real noise removal and generation tasks. The training and testing code is available at https://github.com/zsyOAOA/DANet.

preprint, 2020, arXiv

Learning Adaptive Loss for Robust Learning with Noisy Labels

Robust loss minimization is an important strategy for handling robust learning issue on noisy labels. Current robust loss functions, however, inevitably involve hyperparameter(s) to be tuned, manually or heuristically through cross validation, which makes them fairly hard to be generally applied in practice. Besides, the non-convexity brought by the loss as well as the complicated network architecture makes it easily trapped into an unexpected solution with poor generalization capability. To address above issues, we propose a meta-learning method capable of adaptively learning hyperparameter in robust loss functions. Specifically, through mutual amelioration between robust loss hyperparameter and network parameters in our method, both of them can be simultaneously finely learned and coordinated to attain solutions with good generalization capability. Four kinds of SOTA robust loss functions are attempted to be integrated into our algorithm, and comprehensive experiments substantiate the general availability and effectiveness of the proposed method in both its accuracy and generalization performance, as compared with conventional hyperparameter tuning strategy, even with carefully tuned hyperparameters.

preprint, 2020, arXiv

LT-Net: Label Transfer by Learning Reversible Voxel-wise Correspondence for One-shot Medical Image Segmentation

We introduce a one-shot segmentation method to alleviate the burden of manual annotation for medical images. The main idea is to treat one-shot segmentation as a classical atlas-based segmentation problem, where voxel-wise correspondence from the atlas to the unlabelled data is learned. Subsequently, the segmentation label of the atlas can be transferred to the unlabelled data with the learned correspondence. However, since ground truth correspondence between images is usually unavailable, the learning system must be well-supervised to avoid mode collapse and convergence failure. To overcome this difficulty, we resort to forward-backward consistency, which is widely used in correspondence problems, and additionally learn the backward correspondences from the warped atlases back to the original atlas. This cycle-correspondence learning design enables a variety of extra, cycle-consistency-based supervision signals that make the training process stable while also boosting performance. We demonstrate the superiority of our method over both deep learning-based one-shot segmentation methods and a classical multi-atlas segmentation method via thorough experiments.

preprint, 2020, arXiv

Meta Feature Modulator for Long-tailed Recognition

Deep neural networks often degrade significantly when training data suffer from class imbalance problems. Existing approaches, e.g., re-sampling and re-weighting, commonly address this issue by rearranging the label distribution of training data to train the networks fitting well to the implicit balanced label distribution. However, most of them hinder the representative ability of learned features due to insufficient use of intra/inter-sample information of training data. To address this issue, we propose meta feature modulator (MFM), a meta-learning framework to model the difference between the long-tailed training data and the balanced meta data from the perspective of representation learning. Concretely, we employ learnable hyper-parameters (dubbed modulation parameters) to adaptively scale and shift the intermediate features of classification networks, and the modulation parameters are optimized together with the classification network parameters guided by a small amount of balanced meta data. We further design a modulator network to guide the generation of the modulation parameters, and such a meta-learner can be readily adapted to train the classification network on other long-tailed datasets. Extensive experiments on benchmark vision datasets substantiate the superiority of our approach on long-tailed recognition tasks beyond other state-of-the-art methods.
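The "scale and shift" modulation of intermediate features can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the linear modulator, its inputs, and the names `modulation_parameters` and `modulate` are hypothetical, and the meta-learning loop that would optimize the modulator on balanced meta data is omitted.

```python
import numpy as np

def modulation_parameters(summary, W, b):
    """Hypothetical linear modulator: maps a feature summary to (gamma, beta).

    summary: (C,) vector; W: (C, 2C); b: (2C,).
    The scale is parameterized as 1 + delta so zero weights give the identity.
    """
    out = summary @ W + b
    d_gamma, beta = np.split(out, 2)
    return 1.0 + d_gamma, beta

def modulate(features, gamma, beta):
    """Channel-wise scale and shift of intermediate features of shape (N, C)."""
    return features * gamma + beta

# Usage: with zero modulator weights the features pass through unchanged.
features = np.arange(6.0).reshape(2, 3)
gamma, beta = modulation_parameters(features.mean(axis=0), np.zeros((3, 6)), np.zeros(6))
modulated = modulate(features, gamma, beta)
```

Parameterizing the scale around 1 is a design choice made here so that an untrained modulator leaves the classifier's features untouched.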

preprint2020arXiv

Meta Transition Adaptation for Robust Deep Learning with Noisy Labels

To discover the intrinsic inter-class transition probabilities underlying data, learning with noise transition has become an important approach for robust deep learning on corrupted labels. Prior methods obtain such transition knowledge either by pre-assuming strongly confident anchor points that belong to a specific class with probability 1, which is generally infeasible in practice, or by jointly estimating the transition matrix and learning the classifier directly from the noisy samples, which leads to inaccurate estimation misguided by wrong annotation information, especially in large-noise cases. To alleviate these issues, this study proposes a new meta-transition-learning strategy for the task. Specifically, under the guidance of a small set of meta data with clean labels, the noise transition matrix and the classifier parameters can be mutually ameliorated to avoid being trapped by noisy training samples, without the need for any anchor point assumptions. Besides, we prove that our method has a statistical consistency guarantee for correctly estimating the desired transition matrix. Extensive synthetic and real experiments validate that our method extracts the transition matrix more accurately, which naturally leads to more robust performance than prior art. Its essential relationship with label distribution learning is also discussed, which explains its fine performance even under no-noise scenarios.
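As background for the abstract above, the role of a noise transition matrix can be sketched as follows. This shows only the standard forward relation between clean and noisy class probabilities, not the paper's meta-learning procedure; the function name and the row convention `transition[i][j] = P(noisy label = j | clean label = i)` are assumptions made here.

```python
def noisy_class_probs(clean_probs, transition):
    """Map clean class probabilities to noisy-label probabilities.

    Assumed convention: transition[i][j] = P(noisy = j | clean = i),
    so every row of `transition` must sum to 1.
    """
    num_classes = len(clean_probs)
    if len(transition) != num_classes:
        raise ValueError("transition must be a square matrix over the classes")
    for row in transition:
        if len(row) != num_classes or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must be a probability distribution")
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]


# Class 0 is flipped to class 1 with probability 0.2; class 1 to class 0 with 0.1.
T = [[0.8, 0.2], [0.1, 0.9]]
print(noisy_class_probs([0.5, 0.5], T))
```

A classifier trained on noisy labels can be fitted against these noisy probabilities while its clean predictions are read off before the matrix is applied; how the matrix itself is estimated is the subject of the paper.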

preprint2020arXiv

Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction

Machine learning has made great progress in computer vision and many other fields thanks to large amounts of high-quality training samples, but it does not work as well on genomic data analysis, where data sets are notoriously small. In our work, we focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients that can guide treatment decisions for a specific individual, through training on small data. In practice, doctors and clinicians address this problem by studying several interrelated clinical variables simultaneously. We attempt to simulate this clinical perspective and introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks and transfer it to help address new tasks. Our new model is built upon a carefully designed meta-learner, the Prototypical Network, a simple yet effective meta-learning machine for few-shot image classification. Observing that gene expression data are especially high-dimensional and noisy compared with image data, we propose an extension that appends two modules to address these issues. Concretely, we append a feature selection layer to automatically filter out disease-irrelevant genes and incorporate a sample reweighting strategy to adaptively remove noisy data, while the extended model remains capable of learning from a limited number of training examples and generalizing well. Simulations and real gene expression data experiments substantiate the superiority of the proposed method for predicting disease subtypes and identifying potential disease-related genes.
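The prototype-based classification that this abstract builds on can be sketched in a few lines. This is a simplified illustration working on raw feature vectors rather than learned embeddings; the optional `feature_weights` argument is a hypothetical stand-in for the paper's feature selection layer, not its actual design, and the sample reweighting module is not shown.

```python
def class_prototypes(support):
    """Compute one prototype per class as the mean of its support vectors."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(v[k] for v in vectors) / len(vectors) for k in range(dim)
        ]
    return prototypes


def classify(query, prototypes, feature_weights=None):
    """Assign `query` to the class with the nearest prototype.

    Distance is a (optionally feature-weighted) squared Euclidean distance;
    a weight of 0 removes a feature from the comparison (hypothetical gate).
    """
    if feature_weights is None:
        feature_weights = [1.0] * len(query)

    def distance(proto):
        return sum(w * (q - p) ** 2 for w, q, p in zip(feature_weights, query, proto))

    return min(prototypes, key=lambda label: distance(prototypes[label]))


support = {"A": [[0.0, 0.0], [0.0, 2.0]], "B": [[4.0, 4.0], [6.0, 4.0]]}
protos = class_prototypes(support)
print(classify([3.0, 0.0], protos))                              # all features used
print(classify([3.0, 0.0], protos, feature_weights=[1.0, 0.0]))  # second feature gated off
```

In this toy example the second feature pulls the query toward class A; gating it off changes the decision, which is the kind of effect a learned feature selection layer is meant to exploit.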

preprint2020arXiv

Structural Residual Learning for Single Image Rain Removal

To alleviate the adverse effect of rain streaks in image processing tasks, CNN-based single image rain removal methods have recently been proposed. However, the performance of these deep learning methods relies largely on the range of rain shapes covered by the pre-collected rainy-clean training image pairs. This makes them prone to overfitting the training samples and unable to generalize well to practical rainy images with complex and diverse rain streaks. To address this generalization issue, this study proposes a new network architecture that enforces the output residual of the network to possess intrinsic rain structures. Such a structural residual setting guarantees that the rain layer extracted by the network complies with prior knowledge of general rain streaks, and thus regularizes the network so that sound rain shapes can be well extracted from rainy images in both training and prediction stages. This general regularization naturally leads to both better training accuracy and better testing generalization, even for unseen rain configurations. This superiority is comprehensively substantiated, both visually and quantitatively, by experiments on synthetic and real datasets in comparison with current state-of-the-art methods.

preprint2016arXiv

A novel learning-based frame pooling method for Event Detection

Detecting complex events in a large video collection crawled from video websites is a challenging task. When good image-based feature representations, e.g., HOG and SIFT, are applied directly to videos, we face the problem of how to pool multiple frame feature representations into one feature representation. In this paper, we propose a novel learning-based frame pooling method. We formulate pooling weight learning as an optimization problem, so that our method can automatically learn the best pooling weight configuration for each specific event category. Experimental results on TRECVID MED 2011 reveal that our method outperforms the commonly used average pooling and max pooling strategies on both high-level and low-level 2D image features.
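The pooling problem stated in this abstract can be made concrete with a small sketch of the two baselines it names, plus a generic weighted pooling over frames. The weighted form is an assumption used here for illustration only; the abstract does not specify how the paper parameterizes or learns its pooling weights.

```python
def average_pool(frames):
    """Average each feature dimension over all frames."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def max_pool(frames):
    """Take the per-dimension maximum over all frames."""
    return [max(col) for col in zip(*frames)]


def weighted_pool(frames, weights):
    """Pool frames with one weight per frame (hypothetical parameterization).

    Uniform weights 1/len(frames) reproduce average pooling.
    """
    if len(weights) != len(frames):
        raise ValueError("need exactly one weight per frame")
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*frames)]


frames = [[1.0, 4.0], [3.0, 2.0]]  # two frames, two feature dimensions
print(average_pool(frames))
print(max_pool(frames))
print(weighted_pool(frames, [0.25, 0.75]))
```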

preprint2016arXiv

Exploiting Multi-modal Curriculum in Noisy Web Data for Large-scale Concept Learning

Learning video concept detectors automatically from big but noisy web data, with no additional manual annotations, is a novel but challenging area in the multimedia and machine learning communities. A considerable number of videos on the web are associated with rich but noisy contextual information, such as the title, which provides weak annotations or labels about the video content. To leverage the big noisy web labels, this paper proposes a novel method called WEbly-Labeled Learning (WELL), which builds on a state-of-the-art machine learning algorithm inspired by the learning process of humans. WELL introduces a number of novel multi-modal approaches to incorporate meaningful prior knowledge, called curriculum, from the noisy web videos. To investigate this problem, we empirically study the curriculum constructed from the multi-modal features of videos collected from YouTube and Flickr. The efficacy and scalability of WELL have been extensively demonstrated on two public benchmarks, including the largest multimedia dataset and the largest manually-labeled video set. The comprehensive experimental results demonstrate that WELL outperforms state-of-the-art studies by a statistically significant margin on learning concepts from noisy web video data. In addition, the results verify that WELL is robust to the level of noise in the video data. Notably, WELL trained on sufficient noisy web labels achieves accuracy comparable to supervised learning methods trained on clean manually-labeled data.

preprint2016arXiv

Low-rank Matrix Factorization under General Mixture Noise Distributions

Many computer vision problems can be posed as learning a low-dimensional subspace from high-dimensional data. Low-rank matrix factorization (LRMF) is a commonly used subspace learning strategy. Most current LRMF techniques are built on optimization problems using L1-norm and L2-norm losses, which mainly deal with Laplacian and Gaussian noise, respectively. To make LRMF capable of adapting to more complex noise, this paper proposes a new LRMF model that assumes the noise follows a Mixture of Exponential Power (MoEP) distribution, and proposes a penalized MoEP (PMoEP) model by combining the penalized likelihood method with MoEP distributions. This setting enables the learned LRMF model to automatically fit the real noise through MoEP distributions. Each component in this mixture is adapted from a series of preliminary super- or sub-Gaussian candidates. Moreover, to encourage the local continuity of noise components, we embed a Markov random field into the PMoEP model and further propose the advanced PMoEP-MRF model. An Expectation Maximization (EM) algorithm and a variational EM (VEM) algorithm are also designed to infer the parameters involved in the proposed PMoEP and PMoEP-MRF models, respectively. The superiority of our methods is demonstrated by extensive experiments on synthetic data, face modeling, hyperspectral image restoration and background subtraction.

preprint2016arXiv

Strategies for Searching Video Content with Text Queries or Video Examples

The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos, so these videos are unsearchable by current search engines. Content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training and video search/reranking. We present novel strategies in these topics to enhance CBVR in both accuracy and speed under different query inputs, including pure textual queries and query by video examples. Our proposed strategies have been incorporated into our submission for the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed other submissions in both text queries and video example queries, thus demonstrating the effectiveness of our proposed approaches.

preprint2016arXiv

Total Variation Regularized Tensor RPCA for Background Subtraction from Compressive Measurements

Background subtraction has been a fundamental and widely studied task in video analysis, with a wide range of applications in video surveillance, teleconferencing and 3D modeling. Recently, motivated by compressive imaging, background subtraction from compressive measurements (BSCM) has become an active research task in video surveillance. In this paper, we propose a novel tensor-based robust PCA (TenRPCA) approach for BSCM that decomposes video frames into backgrounds with spatio-temporal correlations and foregrounds with spatio-temporal continuity in a tensor framework. In this approach, we use 3D total variation (TV) to enhance the spatio-temporal continuity of foregrounds, and Tucker decomposition to model the spatio-temporal correlations of the video background. Based on this idea, we design a basic tensor RPCA model over the video frames, dubbed the holistic TenRPCA model (H-TenRPCA). To characterize the correlations among groups of similar 3D patches of the video background, we further design a patch-group-based tensor RPCA model (PG-TenRPCA) that models the video background by joint tensor Tucker decompositions of 3D patch groups. Efficient algorithms using the alternating direction method of multipliers (ADMM) are developed to solve the proposed models. Extensive experiments on simulated and real-world videos demonstrate the superiority of the proposed approaches over existing state-of-the-art approaches.

preprint2016arXiv

What Objective Does Self-paced Learning Indeed Optimize?

Self-paced learning (SPL) is a recently proposed methodology designed by simulating the learning principle of humans and animals. A variety of SPL realization schemes have been designed for different computer vision and pattern recognition tasks, and have been empirically shown to be effective in these applications. However, theoretical insight into SPL is still lacking. To address this issue, this study attempts to provide some new theoretical understanding of the SPL scheme. Specifically, we prove that the solving strategy of SPL accords with a majorization minimization algorithm implemented on a latent objective function. Furthermore, we find that the loss function contained in this latent objective has a similar configuration to the non-convex regularized penalty (NSPR) known in statistics and machine learning. This connection inspires us to discover more intrinsic relationships between SPL regimes and NSPR forms, such as SCAD, LOG and EXP. The robustness of SPL can then be clearly explained. We also analyze the capability of SPL to embed easy loss priors, and provide an insightful interpretation of the mechanism behind the effectiveness of previous SPL variations. Besides, we design a group-partial-order loss prior, which is especially useful for weakly labeled large-scale data processing tasks. By applying SPL with this loss prior to the FCVID dataset, which is currently one of the biggest manually annotated video datasets, our method achieves state-of-the-art performance beyond previous methods, which further supports the proposed theoretical arguments.
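The latent-objective idea in this abstract can be illustrated for the simplest SPL regime. The sketch assumes the hard self-paced regularizer, in which each sample weight v in [0, 1] minimizes v * loss - age * v; the function names and the "age" parameter name are choices made here. Under that assumption the optimal weight is a 0/1 threshold, and substituting it back gives a capped loss min(loss, age) up to an additive constant, which is one way to see why large-loss samples stop influencing training.

```python
def spl_hard_weights(losses, age):
    """Optimal weights for the hard regularizer: keep samples with loss < age."""
    return [1.0 if loss < age else 0.0 for loss in losses]


def latent_losses(losses, age):
    """Capped per-sample loss implied by the hard regime: min(loss, age)."""
    return [min(loss, age) for loss in losses]


losses = [0.1, 0.5, 2.0]
age = 1.0
weights = spl_hard_weights(losses, age)
print(weights)                     # the sample with loss 2.0 is excluded
print(latent_losses(losses, age))  # its contribution is capped at `age`

# Check the identity v * loss + age * (1 - v) == min(loss, age) sample by sample.
for v, loss, capped in zip(weights, losses, latent_losses(losses, age)):
    assert v * loss + age * (1.0 - v) == capped
```

Other SPL regimes replace the threshold with softer weighting rules; the abstract relates the resulting latent losses to penalties such as SCAD, LOG and EXP.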

preprint2015arXiv

Detail-preserving and Content-aware Variational Multi-view Stereo Reconstruction

Accurate recovery of 3D geometrical surfaces from calibrated 2D multi-view images is a fundamental yet active research area in computer vision. Despite the steady progress in multi-view stereo reconstruction, most existing methods are still limited in recovering fine-scale details and sharp features while suppressing noise, and may fail to reconstruct regions with little texture. To address these limitations, this paper presents a Detail-preserving and Content-aware Variational (DCV) multi-view stereo method, which reconstructs the 3D surface by alternating between reprojection error minimization and mesh denoising. In reprojection error minimization, we propose a novel inter-image similarity measure, which is effective in preserving fine-scale details of the reconstructed surface and builds a connection between guided image filtering and image registration. In mesh denoising, we propose a content-aware $\ell_{p}$-minimization algorithm that adaptively estimates the $p$ value and regularization parameters based on the current input. It is much more effective at suppressing noise while preserving sharp features than conventional isotropic mesh smoothing. Experimental results on benchmark datasets demonstrate that our DCV method is capable of recovering more surface details, and obtains cleaner and more accurate reconstructions than state-of-the-art methods. In particular, our method achieves the best results among all published methods on the Middlebury dino ring and dino sparse ring datasets in terms of both completeness and accuracy.

preprint2015arXiv

FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test

The maximum mean discrepancy (MMD) is a recently proposed test statistic for the two-sample test. Its quadratic time complexity, however, greatly hampers its applicability to large-scale problems. To accelerate the MMD calculation, in this study we propose an efficient method called FastMMD. The core idea of FastMMD is to equivalently transform the MMD with shift-invariant kernels into the amplitude expectation of a linear combination of sinusoid components based on Bochner's theorem and the Fourier transform (Rahimi & Recht, 2007). By sampling the Fourier transform, FastMMD decreases the time complexity of the MMD calculation from $O(N^2 d)$ to $O(L N d)$, where $N$ and $d$ are the size and dimension of the sample set, respectively, and $L$ is the number of basis functions used to approximate the kernel, which determines the approximation accuracy. For kernels that are spherically invariant, the computation can be further accelerated to $O(L N \log d)$ by using the Fastfood technique (Le et al., 2013). The uniform convergence of our method has also been theoretically proved for both unbiased and biased estimates. We further provide a geometric explanation for our method, namely an ensemble of circular discrepancy, which helps in understanding MMD and may inspire more extensive metrics for the two-sample test. Experimental results substantiate that FastMMD achieves accuracy similar to exact MMD, with faster computation and lower variance than existing MMD approximation methods.
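The complexity reduction from $O(N^2 d)$ to $O(L N d)$ can be illustrated with a plain random Fourier feature estimate of the biased squared MMD for a Gaussian kernel. This is a swapped-in, simpler technique in the spirit of Rahimi & Recht (2007), not the paper's amplitude-based FastMMD estimator; the function name, the default of 200 features, and the bandwidth parameter `sigma` are assumptions made here.

```python
import math
import random


def rff_mmd2(xs, ys, num_features=200, sigma=1.0, seed=0):
    """Approximate biased MMD^2 between samples xs and ys.

    Uses L = num_features random cosine features for the Gaussian kernel
    exp(-||x - y||^2 / (2 * sigma^2)); cost is O(L * N * d), not O(N^2 * d).
    """
    rng = random.Random(seed)
    dim = len(xs[0])
    # Frequencies drawn from the kernel's spectral density N(0, I / sigma^2).
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def mean_embedding(points):
        # Average feature map over the sample: one pass per random feature.
        means = []
        for omega, phase in zip(omegas, phases):
            total = sum(
                math.cos(sum(w * x for w, x in zip(omega, p)) + phase)
                for p in points
            )
            means.append(scale * total / len(points))
        return means

    mean_x, mean_y = mean_embedding(xs), mean_embedding(ys)
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))


near = [[0.0, 0.0], [0.1, -0.1]]
far = [[5.0, 5.0], [5.1, 4.9]]
print(rff_mmd2(near, near))  # identical samples give zero discrepancy
print(rff_mmd2(near, far))   # well-separated samples give a clearly positive value
```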

preprint2015arXiv

Iterated Support Vector Machines for Distance Metric Learning

Distance metric learning aims to learn from the given training data a valid distance metric, with which the similarity between data samples can be more effectively evaluated for classification. Metric learning is often formulated as a convex or nonconvex optimization problem, while many existing metric learning algorithms become inefficient for large scale problems. In this paper, we formulate metric learning as a kernel classification problem, and solve it by iterated training of support vector machines (SVM). The new formulation is easy to implement, efficient in training, and tractable for large-scale problems. Two novel metric learning models, namely Positive-semidefinite Constrained Metric Learning (PCML) and Nonnegative-coefficient Constrained Metric Learning (NCML), are developed. Both PCML and NCML can guarantee the global optimality of their solutions. Experimental results on UCI dataset classification, handwritten digit recognition, face verification and person re-identification demonstrate that the proposed metric learning methods achieve higher classification accuracy than state-of-the-art methods and they are significantly more efficient in training.

preprint2014arXiv

Density-Based Region Search with Arbitrary Shape for Object Localization

Region search is widely used for object localization. Typically, region search methods project the score of a classifier onto the image plane and then search for the region with the maximal score. Recently proposed region search methods, such as efficient subwindow search and efficient region search, are much more efficient than sliding window search. However, for some classifiers and tasks, the projected scores are nearly all positive, so maximizing the score of a region results in localizing nearly the entire image as the object, which is meaningless. In this paper, we observe that large scores are mainly concentrated on or around objects. Based on this observation, we propose a method, named level set maximum-weight connected subgraph (LS-MWCS), which localizes objects with arbitrary shapes by searching for regions with the densest score rather than the maximal score. The region density can be controlled flexibly by a parameter. We also prove an important property of the proposed LS-MWCS, which guarantees that the region with the densest score can be found. Moreover, LS-MWCS can be efficiently optimized by belief propagation. The method is evaluated on the problem of weakly-supervised object localization, and the quantitative results demonstrate the superiority of LS-MWCS over other state-of-the-art methods.

preprint2014arXiv

On the Optimal Solution of Weighted Nuclear Norm Minimization

In recent years, the nuclear norm minimization (NNM) problem has attracted much attention in computer vision and machine learning. The NNM problem benefits from its convexity and can be solved efficiently. The standard nuclear norm regularizes all singular values equally, which is, however, not flexible enough to fit real scenarios. Weighted nuclear norm minimization (WNNM) is a natural extension and generalization of NNM. By assigning properly different weights to different singular values, WNNM can lead to state-of-the-art results in applications such as image denoising. Nevertheless, the globally optimal solution of the WNNM problem has so far not been fully characterized, owing to its non-convexity in general cases. In this article, we study the theoretical properties of WNNM and prove that WNNM can be equivalently transformed into a quadratic programming problem with linear constraints. This implies that WNNM is equivalent to a convex problem and its global optimum can be readily achieved by off-the-shelf convex optimization solvers. We further show that when the weights are non-descending, the globally optimal solution of WNNM can be obtained in closed form.
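For orientation, the closed-form case mentioned in the last sentence can be written out as a weighted singular value thresholding. The notation below ($Y$ for the observed matrix, $X$ for the estimate, $w_i$ for the weights) and the factor $\tfrac{1}{2}$ on the data term are assumptions made here; the abstract fixes neither, and a different scaling of the data term rescales the thresholds accordingly.

```latex
% Assumed problem statement: singular values sorted in non-increasing order,
% weights sorted in non-descending order.
\min_{X}\; \tfrac{1}{2}\,\|Y - X\|_F^2 + \sum_{i=1}^{n} w_i\,\sigma_i(X),
\qquad 0 \le w_1 \le w_2 \le \dots \le w_n,
\qquad \sigma_1(X) \ge \sigma_2(X) \ge \dots \ge \sigma_n(X).

% Closed form via the SVD of Y: each singular value is shrunk by its own weight.
Y = U\,\mathrm{diag}\bigl(\sigma_1(Y),\dots,\sigma_n(Y)\bigr)\,V^{\top}
\;\Longrightarrow\;
\hat{X} = U\,\mathrm{diag}\bigl(\max(\sigma_i(Y) - w_i,\,0)\bigr)\,V^{\top}.
```

With non-descending weights the shrunken values $\sigma_i(Y) - w_i$ stay in non-increasing order, so the ordering constraint on singular values is satisfied automatically; this is what makes the simple thresholding rule valid in that case.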

preprint2013arXiv

A Kernel Classification Framework for Metric Learning

Learning a distance metric from the given training samples plays a crucial role in many machine learning tasks, and various models and optimization algorithms have been proposed in the past decade. In this paper, we generalize several state-of-the-art metric learning methods, such as large margin nearest neighbor (LMNN) and information theoretic metric learning (ITML), into a kernel classification framework. First, doublets and triplets are constructed from the training samples, and a family of degree-2 polynomial kernel functions is proposed for pairs of doublets or triplets. Then, a kernel classification framework is established, which not only generalizes many popular metric learning methods such as LMNN and ITML, but also suggests new metric learning methods that can be efficiently implemented using standard support vector machine (SVM) solvers. Two novel metric learning methods, namely doublet-SVM and triplet-SVM, are then developed under the proposed framework. Experimental results show that doublet-SVM and triplet-SVM achieve classification accuracies competitive with state-of-the-art metric learning methods such as ITML and LMNN, but with significantly less training time.

preprint2013arXiv

Dictionary learning under global sparsity constraint

A new method is proposed in this paper to learn an overcomplete dictionary from training data samples. Unlike current methods, which enforce a similar sparsity constraint on each input sample, the proposed method imposes a global sparsity constraint on the entire data set. This enables the proposed method to flexibly assign the atoms of the dictionary to represent various samples and to adapt to the complicated structures underlying the entire data set. Using sparse coding and sparse PCA techniques, a simple algorithm is designed to implement the method. The efficiency and convergence of the proposed algorithm are also theoretically analyzed. Experimental results on a series of signal and image data sets show that our method performs better than current dictionary learning methods in recovering the original dictionary, reconstructing the input data, and revealing salient data structure.

preprint2012arXiv

A recursive divide-and-conquer approach for sparse principal component analysis

In this paper, a new method is proposed for sparse PCA based on the recursive divide-and-conquer methodology. The main idea is to separate the original sparse PCA problem into a series of much simpler sub-problems, each having a closed-form solution. By recursively solving these sub-problems in an analytical way, an efficient algorithm is constructed to solve the sparse PCA problem. The algorithm only involves simple computations and is thus easy to implement. The proposed method can also be very easily extended to other sparse PCA problems with certain constraints, such as the nonnegative sparse PCA problem. Furthermore, we have shown that the proposed algorithm converges to a stationary point of the problem, and its computational complexity is approximately linear in both data size and dimensionality. The effectiveness of the proposed method is substantiated by extensive experiments implemented on a series of synthetic and real data in both reconstruction-error-minimization and data-variance-maximization viewpoints.

preprint2012arXiv

Divide-and-Conquer Method for L1 Norm Matrix Factorization in the Presence of Outliers and Missing Data

Low-rank matrix factorization formulated as an L1 norm minimization problem has recently attracted much attention due to its intrinsic robustness to outliers and missing data. In this paper, we propose a new method, called the divide-and-conquer method, for solving this problem. The main idea is to break the original problem into a series of smallest possible sub-problems, each involving only a single scalar parameter. Each of these sub-problems is proved to be convex and to have a closed-form solution. By recursively optimizing these small problems analytically, an efficient algorithm for solving the original problem can be constructed that entirely avoids time-consuming numerical optimization as an inner loop. The computational complexity of the proposed algorithm is approximately linear in both data size and dimensionality, making it possible to handle large-scale L1 norm matrix factorization problems. The algorithm is also theoretically proved to be convergent. A series of experimental results substantiates that our method always achieves better results than the current state-of-the-art methods for L1 norm matrix factorization in both computational time and accuracy, especially on large-scale applications such as face recognition and structure from motion.
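A scalar L1 sub-problem of the kind this abstract describes can be sketched as follows. The sketch assumes the sub-problem has the form: minimize the sum over i of |e[i] - u * v[i]| in the single scalar u, for which a standard closed form is the weighted median of the ratios e[i] / v[i] with weights |v[i]|. Whether this matches the paper's exact derivation is an assumption; the function names are hypothetical.

```python
def l1_scalar_fit(e, v):
    """Minimize sum_i |e[i] - u * v[i]| over the scalar u.

    Rewrites each term as |v[i]| * |e[i] / v[i] - u| and returns the
    weighted median of the ratios (entries with v[i] == 0 do not depend on u).
    """
    pairs = sorted((ei / vi, abs(vi)) for ei, vi in zip(e, v) if vi != 0)
    if not pairs:
        raise ValueError("at least one v[i] must be non-zero")
    half = sum(weight for _, weight in pairs) / 2.0
    running = 0.0
    for ratio, weight in pairs:
        running += weight
        if running >= half:
            return ratio


def l1_objective(e, v, u):
    """Evaluate the sub-problem objective at a given u."""
    return sum(abs(ei - u * vi) for ei, vi in zip(e, v))


e = [1.0, 2.0, 100.0]  # the last entry plays the role of an outlier
v = [1.0, 1.0, 1.0]
u_star = l1_scalar_fit(e, v)
print(u_star, l1_objective(e, v, u_star))
```

With unit weights the solution is the ordinary median, so the outlier at 100 does not drag the estimate the way a least-squares fit would; this is the robustness property the abstract refers to.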

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Computer Vision Machine Learning eess.IV Artificial Intelligence Data Structures and Algorithms Databases Information Retrieval Multimedia Numerical Analysis physics.comp-ph physics.geo-ph

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2306.17797:author:5:deyu-meng

Imported May 21, 2026; synced May 21, 2026

arxiv, confidence 95%

external id: arxiv:2605.14370:author:5:deyu-meng

Imported May 20, 2026; synced May 20, 2026

arxiv, confidence 95%

external id: arxiv:2605.13744:author:4:deyu-meng

Imported May 20, 2026; synced May 20, 2026

arxiv, confidence 95%

external id: arxiv:2605.13581:author:4:deyu-meng

Imported May 20, 2026; synced May 20, 2026

arxiv, confidence 95%

external id: arxiv:2605.17504:author:7:deyu-meng

Imported May 20, 2026; synced May 20, 2026

13 works

Qian Zhao

Researcher

Qian Zhao contributes to research discovery and scholarly infrastructure.

Open to collaborate

11 works

Zongben Xu

Researcher

Zongben Xu contributes to research discovery and scholarly infrastructure.

Open to collaborate

6 works

Lei Zhang

Researcher

Lei Zhang contributes to research discovery and scholarly infrastructure.

Open to collaborate

5 works

Jun Shu

Researcher

Jun Shu contributes to research discovery and scholarly infrastructure.

Open to collaborate

39 published item(s)

A Distributional View for Visual Mechanistic Interpretability: KL-Minimal Soft-Constraint Principle

Aligning Network Equivariance with Data Symmetry: A Theoretical Framework and Adaptive Approach for Image Restoration

Deciphering Neural Reparameterized Full-Waveform Inversion with Neural Sensitivity Kernel and Wave Tangent Kernel

HIR-ALIGN: Enhancing Hyperspectral Image Restoration via Diffusion-Based Data Generation

HIDFlowNet: A Flow-Based Deep Network for Hyperspectral Image Denoising

Gramformer: Learning Crowd Counting via Graph-Modulated Transformer

Revisiting Nonlocal Self-Similarity from Continuous Representation

An Efficient and Accurate Rough Set for Feature Selection, Classification and Knowledge Representation

Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel

Decoupled-and-Coupled Networks: Self-Supervised Hyperspectral Image Super-Resolution with Subpixel Fusion

Diagnosing Batch Normalization in Class Incremental Learning

Low-light Image Enhancement by Retinex Based Algorithm Unrolling and Adjustment

Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation

Unsupervised Local Discrimination for Medical Images

A Model-driven Deep Neural Network for Single Image Rain Removal

Ball k-means

Cross-Attention in Coupled Unmixing Nets for Unsupervised Hyperspectral Super-Resolution

Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation

Learning Adaptive Loss for Robust Learning with Noisy Labels

LT-Net: Label Transfer by Learning Reversible Voxel-wise Correspondence for One-shot Medical Image Segmentation

Meta Feature Modulator for Long-tailed Recognition

Meta Transition Adaptation for Robust Deep Learning with Noisy Labels

Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction

Structural Residual Learning for Single Image Rain Removal

A novel learning-based frame pooling method for Event Detection

Exploiting Multi-modal Curriculum in Noisy Web Data for Large-scale Concept Learning

Low-rank Matrix Factorization under General Mixture Noise Distributions

Strategies for Searching Video Content with Text Queries or Video Examples

Total Variation Regularized Tensor RPCA for Background Subtraction from Compressive Measurements

What Objective Does Self-paced Learning Indeed Optimize?

Detail-preserving and Content-aware Variational Multi-view Stereo Reconstruction

FastMMD: Ensemble of Circular Discrepancy for Efficient Two-Sample Test

Iterated Support Vector Machines for Distance Metric Learning

Density-Based Region Search with Arbitrary Shape for Object Localization

On the Optimal Solution of Weighted Nuclear Norm Minimization

A Kernel Classification Framework for Metric Learning

Dictionary learning under global sparsity constraint

A recursive divide-and-conquer approach for sparse principal component analysis

Divide-and-Conquer Method for L1 Norm Matrix Factorization in the Presence of Outliers and Missing Data