Researcher profile

Kede Ma

Kede Ma contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

RELO: Reinforcement Learning to Localize for Visual Object Tracking

Conventional visual object trackers localize targets using handcrafted spatial priors, often in the form of heatmaps. Such priors provide only surrogate supervision and are poorly aligned with tracking optimization and evaluation metrics, such as intersection over union (IoU) and area under the success curve (AUC). Here, we introduce RELO, a REinforcement-learning-to-LOcalize method for visual object tracking that formulates target localization as a Markov decision process. Specifically, RELO replaces handcrafted spatial priors with a localization policy learned over spatial positions via reinforcement learning, with rewards combining frame-level IoU and sequence-level AUC. We additionally introduce layer-aligned temporal token propagation to improve semantic consistency across frames, with negligible computational overhead. Across multiple benchmarks, RELO achieves superior results, attaining 57.5% AUC on LaSOText without template updates. This confirms that reward-driven localization provides an effective alternative to prior-driven localization for visual object tracking.

preprint2022arXiv

Continual Learning for Blind Image Quality Assessment

The explosive growth of image data facilitates the fast development of image processing and computer vision methods for emerging visual applications, meanwhile introducing novel distortions to the processed images. This poses a grand challenge to existing blind image quality assessment (BIQA) models, failing to continually adapt to such subpopulation shift. Recent work suggests training BIQA methods on the combination of all available human-rated IQA datasets. However, this type of approach is not scalable to a large number of datasets, and is cumbersome to incorporate a newly created dataset as well. In this paper, we formulate continual learning for BIQA, where a model learns continually from a stream of IQA datasets, building on what was learned from previously seen data. We first identify five desiderata in the new setting with a measure to quantify the plasticity-stability trade-off. We then propose a simple yet effective method for learning BIQA models continually. Specifically, based on a shared backbone network, we add a prediction head for a new dataset, and enforce a regularizer to allow all prediction heads to evolve with new data while being resistant to catastrophic forgetting of old data. We compute the quality score by an adaptive weighted summation of estimates from all prediction heads. Extensive experiments demonstrate the promise of the proposed continual learning method in comparison to standard training techniques for BIQA. We made the code publicly available at https://github.com/zwx8981/BIQA_CL.

preprint2021arXiv

Exposing Semantic Segmentation Failures via Maximum Discrepancy Competition

Semantic segmentation is an extensively studied task in computer vision, with numerous methods proposed every year. Thanks to the advent of deep learning in semantic segmentation, the performance on existing benchmarks is close to saturation. A natural question then arises: Does the superior performance on the closed (and frequently re-used) test sets transfer to the open visual world with unconstrained variations? In this paper, we take steps toward answering the question by exposing failures of existing semantic segmentation methods in the open visual world under the constraint of very limited human labeling effort. Inspired by previous research on model falsification, we start from an arbitrarily large image set, and automatically sample a small image set by MAximizing the Discrepancy (MAD) between two segmentation methods. The selected images have the greatest potential in falsifying either (or both) of the two methods. We also explicitly enforce several conditions to diversify the exposed failures, corresponding to different underlying root causes. A segmentation method, whose failures are more difficult to be exposed in the MAD competition, is considered better. We conduct a thorough MAD diagnosis of ten PASCAL VOC semantic segmentation algorithms. With detailed analysis of experimental results, we point out strengths and weaknesses of the competing algorithms, as well as potential research directions for further advancement in semantic segmentation. The codes are publicly available at \url{https://github.com/QTJiebin/MAD_Segmentation}.

preprint2021arXiv

Perceptual Quality Assessment of Omnidirectional Images as Moving Camera Videos

Omnidirectional images (also referred to as static 360° panoramas) impose viewing conditions much different from those of regular 2D images. How do humans perceive image distortions in immersive virtual reality (VR) environments is an important problem which receives less attention. We argue that, apart from the distorted panorama itself, two types of VR viewing conditions are crucial in determining the viewing behaviors of users and the perceived quality of the panorama: the starting point and the exploration time. We first carry out a psychophysical experiment to investigate the interplay among the VR viewing conditions, the user viewing behaviors, and the perceived quality of 360° images. Then, we provide a thorough analysis of the collected human data, leading to several interesting findings. Moreover, we propose a computational framework for objective quality assessment of 360° images, embodying viewing conditions and behaviors in a delightful way. Specifically, we first transform an omnidirectional image to several video representations using different user viewing behaviors under different viewing conditions. We then leverage advanced 2D full-reference video quality models to compute the perceived quality. We construct a set of specific quality measures within the proposed framework, and demonstrate their promises on three VR quality databases.

preprint2020arXiv

Characterizing Generalized Rate-Distortion Performance of Video Coding: An Eigen Analysis Approach

Rate-distortion (RD) theory is at the heart of lossy data compression. Here we aim to model the generalized RD (GRD) trade-off between the visual quality of a compressed video and its encoding profiles (e.g., bitrate and spatial resolution). We first define the theoretical functional space $\mathcal{W}$ of the GRD function by analyzing its mathematical properties.We show that $\mathcal{W}$ is a convex set in a Hilbert space, inspiring a computational model of the GRD function, and a method of estimating model parameters from sparse measurements. To demonstrate the feasibility of our idea, we collect a large-scale database of real-world GRD functions, which turn out to live in a low-dimensional subspace of $\mathcal{W}$. Combining the GRD reconstruction framework and the learned low-dimensional space, we create a low-parameter eigen GRD method to accurately estimate the GRD function of a source video content from only a few queries. Experimental results on the database show that the learned GRD method significantly outperforms state-of-the-art empirical RD estimation methods both in accuracy and efficiency. Last, we demonstrate the promise of the proposed model in video codec comparison.

preprint2020arXiv

Comparison of Image Quality Models for Optimization of Image Processing Systems

The performance of objective image quality assessment (IQA) models has been evaluated primarily by comparing model predictions to human quality judgments. Perceptual datasets gathered for this purpose have provided useful benchmarks for improving IQA methods, but their heavy use creates a risk of overfitting. Here, we perform a large-scale comparison of IQA models in terms of their use as objectives for the optimization of image processing algorithms. Specifically, we use eleven full-reference IQA models to train deep neural networks for four low-level vision tasks: denoising, deblurring, super-resolution, and compression. Subjective testing on the optimized images allows us to rank the competing models in terms of their perceptual performance, elucidate their relative advantages and disadvantages in these tasks, and propose a set of desirable properties for incorporation into future IQA models.

preprint2020arXiv

Efficient and Effective Context-Based Convolutional Entropy Modeling for Image Compression

Precise estimation of the probabilistic structure of natural images plays an essential role in image compression. Despite the recent remarkable success of end-to-end optimized image compression, the latent codes are usually assumed to be fully statistically factorized in order to simplify entropy modeling. However, this assumption generally does not hold true and may hinder compression performance. Here we present context-based convolutional networks (CCNs) for efficient and effective entropy modeling. In particular, a 3D zigzag scanning order and a 3D code dividing technique are introduced to define proper coding contexts for parallel entropy decoding, both of which boil down to place translation-invariant binary masks on convolution filters of CCNs. We demonstrate the promise of CCNs for entropy modeling in both lossless and lossy image compression. For the former, we directly apply a CCN to the binarized representation of an image to compute the Bernoulli distribution of each code for entropy estimation. For the latter, the categorical distribution of each code is represented by a discretized mixture of Gaussian distributions, whose parameters are estimated by three CCNs. We then jointly optimize the CCN-based entropy model along with analysis and synthesis transforms for rate-distortion performance. Experiments on the Kodak and Tecnick datasets show that our methods powered by the proposed CCNs generally achieve comparable compression performance to the state-of-the-art while being much faster.

preprint2020arXiv

I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively

The learning of hierarchical representations for image classification has experienced an impressive series of successes due in part to the availability of large-scale labeled data for training. On the other hand, the trained classifiers have traditionally been evaluated on small and fixed sets of test images, which are deemed to be extremely sparsely distributed in the space of all natural images. It is thus questionable whether recent performance improvements on the excessively re-used test sets generalize to real-world natural images with much richer content variations. Inspired by efficient stimulus selection for testing perceptual models in psychophysical and physiological studies, we present an alternative framework for comparing image classifiers, which we name the MAximum Discrepancy (MAD) competition. Rather than comparing image classifiers using fixed test images, we adaptively sample a small test set from an arbitrarily large corpus of unlabeled images so as to maximize the discrepancies between the classifiers, measured by the distance over WordNet hierarchy. Human labeling on the resulting model-dependent image sets reveals the relative performance of the competing classifiers, and provides useful insights on potential ways to improve them. We report the MAD competition results of eleven ImageNet classifiers while noting that the framework is readily extensible and cost-effective to add future classifiers into the competition. Codes can be found at https://github.com/TAMU-VITA/MAD.

preprint2020arXiv

Image Quality Assessment: Unifying Structure and Texture Similarity

Objective measures of image quality generally operate by comparing pixels of a "degraded" image to those of the original. Relative to human observers, these measures are overly sensitive to resampling of texture regions (e.g., replacing one patch of grass with another). Here, we develop the first full-reference image quality model with explicit tolerance to texture resampling. Using a convolutional neural network, we construct an injective and differentiable function that transforms images to multi-scale overcomplete representations. We demonstrate empirically that the spatial averages of the feature maps in this representation capture texture appearance, in that they provide a set of sufficient statistical constraints to synthesize a wide variety of texture patterns. We then describe an image quality method that combines correlations of these spatial averages ("texture similarity") with correlations of the feature maps ("structure similarity"). The parameters of the proposed measure are jointly optimized to match human ratings of image quality, while minimizing the reported distances between subimages cropped from the same texture images. Experiments show that the optimized method explains human perceptual scores, both on conventional image quality databases, as well as on texture databases. The measure also offers competitive performance on related tasks such as texture classification and retrieval. Finally, we show that our method is relatively insensitive to geometric transformations (e.g., translation and dilation), without use of any specialized training or data augmentation. Code is available at https://github.com/dingkeyan93/DISTS.

preprint2020arXiv

Learning to Blindly Assess Image Quality in the Laboratory and Wild

Computational models for blind image quality assessment (BIQA) are typically trained in well-controlled laboratory environments with limited generalizability to realistically distorted images. Similarly, BIQA models optimized for images captured in the wild cannot adequately handle synthetically distorted images. To face the cross-distortion-scenario challenge, we develop a BIQA model and an approach of training it on multiple IQA databases (of different distortion scenarios) simultaneously. A key step in our approach is to create and combine image pairs within individual databases as the training set, which effectively bypasses the issue of perceptual scale realignment. We compute a continuous quality annotation for each pair from the corresponding human opinions, indicating the probability of one image having better perceptual quality. We train a deep neural network for BIQA over the training set of massive image pairs by minimizing the fidelity loss. Experiments on six IQA databases demonstrate that the optimized model by the proposed training strategy is effective in blindly assessing image quality in the laboratory and wild, outperforming previous BIQA methods by a large margin.

preprint2019arXiv

Intrinsic Image Popularity Assessment

The goal of research in automatic image popularity assessment (IPA) is to develop computational models that can accurately predict the potential of a social image to go viral on the Internet. Here, we aim to single out the contribution of visual content to image popularity, i.e., intrinsic image popularity. Specifically, we first describe a probabilistic method to generate massive popularity-discriminable image pairs, based on which the first large-scale image database for intrinsic IPA (I$^2$PA) is established. We then develop computational models for I$^2$PA based on deep neural networks, optimizing for ranking consistency with millions of popularity-discriminable image pairs. Experiments on Instagram and other social platforms demonstrate that the optimized model performs favorably against existing methods, exhibits reasonable generalizability on different databases, and even surpasses human-level performance on Instagram. In addition, we conduct a psychophysical experiment to analyze various aspects of human behavior in I$^2$PA.