Source author record

Yixiong Liang

Yixiong Liang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence

Catalog footprint

What is connected

7works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

MSD-Score: Multi-Scale Distributional Scoring for Reference-Free Image Caption Evaluation

Evaluating image captions without references remains challenging because global embedding similarity often misses fine-grained mismatches such as hallucinated objects, missing attributes, or incorrect relations. We propose MSD-Score, a reference-free metric that models image patch and text token embeddings as von Mises-Fisher mixtures on the unit hypersphere. Instead of treating each modality as a single point, MSD-Score formulates image-text matching as a multi-scale distributional scoring problem. Semantic discrepancies are quantified via a weighted bi-directional KL divergence and combined with global similarity in a multi-scale framework for both single- and multi-candidate evaluations. Extensive experiments show that MSD-Score achieves state-of-the-art correlation with human judgments among reference-free metrics. Beyond accuracy, its probabilistic formulation yields transparent and decomposable diagnostics of local grounding errors, providing a deterministic complementary signal to holistic similarity metrics and judge-based evaluators.

preprint2022arXiv

Exploring Contextual Relationships for Cervical Abnormal Cell Detection

Cervical abnormal cell detection is a challenging task as the morphological discrepancies between abnormal and normal cells are usually subtle. To determine whether a cervical cell is normal or abnormal, cytopathologists always take surrounding cells as references to identify its abnormality. To mimic these behaviors, we propose to explore contextual relationships to boost the performance of cervical abnormal cell detection. Specifically, both contextual relationships between cells and cell-to-global images are exploited to enhance features of each region of interest (RoI) proposals. Accordingly, two modules, dubbed as RoI-relationship attention module (RRAM) and global RoI attention module (GRAM), are developed and their combination strategies are also investigated. We establish a strong baseline by using Double-Head Faster R-CNN with feature pyramid network (FPN) and integrate our RRAM and GRAM into it to validate the effectiveness of the proposed modules. Experiments conducted on a large cervical cell detection dataset reveal that the introduction of RRAM and GRAM both achieves better average precision (AP) than the baseline methods. Moreover, when cascading RRAM and GRAM, our method outperforms the state-of-the-art (SOTA) methods. Furthermore, we also show the proposed feature enhancing scheme can facilitate both image-level and smear-level classification. The code and trained models are publicly available at https://github.com/CVIU-CSU/CR4CACD.

preprint2018arXiv

CNN-Based Automatic Urinary Particles Recognition

The urine sediment analysis of particles in microscopic images can assist physicians in evaluating patients with renal and urinary tract diseases. Manual urine sediment examination is labor-intensive, subjective and time-consuming, and the traditional automatic algorithms often extract the hand-crafted features for recognition. Instead of using the hand-crafted features, in this paper, we exploit CNN to learn features in an end-to-end manner to recognize the urine particles. We treat the urine particles recognition as object detection and exploit two state-of-the-art CNN-based object detection methods, Faster R-CNN and SSD, as well as their variants for urine particles recognition. We further investigate different factors involving these CNN-based object detection methods for urine particles recognition. We comprehensively evaluate these methods on a dataset consisting of 5,376 annotated images corresponding to 7 categories of urine particles, i.e., erythrocyte, leukocyte, epithelial cell, crystal, cast, mycete, epithelial nuclei, and obtain a best mAP (mean average precision) of 84.1% while taking only 72 ms per image on a NVIDIA Titan X GPU.

preprint2018arXiv

Efficient Misalignment-Robust Multi-Focus Microscopical Images Fusion

In this paper we propose a very efficient method to fuse the unregistered multi-focus microscopical images based on the speed-up robust features (SURF). Our method follows the pipeline of first registration and then fusion. However, instead of treating the registration and fusion as two completely independent stage, we propose to reuse the determinant of the approximate Hessian generated in SURF detection stage as the corresponding salient response for the final image fusion, thus it enables nearly cost-free saliency map generation. In addition, due to the adoption of SURF scale space representation, our method can generate scale-invariant saliency map which is desired for scale-invariant image fusion. We present an extensive evaluation on the dataset consisting of several groups of unregistered multi-focus 4K ultra HD microscopic images with size of 4112 x 3008. Compared with the state-of-the-art multi-focus image fusion methods, our method is much faster and achieve better results in the visual performance. Our method provides a flexible and efficient way to integrate complementary and redundant information from multiple multi-focus ultra HD unregistered images into a fused image that contains better description than any of the individual input images. Code is available at https://github.com/yiqingmy/JointRF.

preprint2018arXiv

Scale-Invariant Structure Saliency Selection for Fast Image Fusion

In this paper, we present a fast yet effective method for pixel-level scale-invariant image fusion in spatial domain based on the scale-space theory. Specifically, we propose a scale-invariant structure saliency selection scheme based on the difference-of-Gaussian (DoG) pyramid of images to build the weights or activity map. Due to the scale-invariant structure saliency selection, our method can keep both details of small size objects and the integrity information of large size objects in images. In addition, our method is very efficient since there are no complex operation involved and easy to be implemented and therefore can be used for fast high resolution images fusion. Experimental results demonstrate the proposed method yields competitive or even better results comparing to state-of-the-art image fusion methods both in terms of visual quality and objective evaluation metrics. Furthermore, the proposed method is very fast and can be used to fuse the high resolution images in real-time. Code is available at https://github.com/yiqingmy/Fusion.

preprint2011arXiv

Feature selection via simultaneous sparse approximation for person specific face verification

There is an increasing use of some imperceivable and redundant local features for face recognition. While only a relatively small fraction of them is relevant to the final recognition task, the feature selection is a crucial and necessary step to select the most discriminant ones to obtain a compact face representation. In this paper, we investigate the sparsity-enforced regularization-based feature selection methods and propose a multi-task feature selection method for building person specific models for face verification. We assume that the person specific models share a common subset of features and novelly reformulated the common subset selection problem as a simultaneous sparse approximation problem. To the best of our knowledge, it is the first time to apply the sparsity-enforced regularization methods for person specific face verification. The effectiveness of the proposed methods is verified with the challenging LFW face databases.

preprint2011arXiv

Multi-task GLOH feature selection for human age estimation

In this paper, we propose a novel age estimation method based on GLOH feature descriptor and multi-task learning (MTL). The GLOH feature descriptor, one of the state-of-the-art feature descriptor, is used to capture the age-related local and spatial information of face image. As the exacted GLOH features are often redundant, MTL is designed to select the most informative feature bins for age estimation problem, while the corresponding weights are determined by ridge regression. This approach largely reduces the dimensions of feature, which can not only improve performance but also decrease the computational burden. Experiments on the public available FG-NET database show that the proposed method can achieve comparable performance over previous approaches while using much fewer features.

Yixiong Liang

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

MSD-Score: Multi-Scale Distributional Scoring for Reference-Free Image Caption Evaluation

Exploring Contextual Relationships for Cervical Abnormal Cell Detection

CNN-Based Automatic Urinary Particles Recognition

Efficient Misalignment-Robust Multi-Focus Microscopical Images Fusion

Scale-Invariant Structure Saliency Selection for Fast Image Fusion

Feature selection via simultaneous sparse approximation for person specific face verification

Multi-task GLOH feature selection for human age estimation