Researcher profile

Hongming Shan

Hongming Shan contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
20works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

20 published item(s)

preprint2026arXiv

Bridging Brain and Semantics: A Hierarchical Framework for Semantically Enhanced fMRI-to-Video Reconstruction

Reconstructing dynamic visual experiences as videos from functional magnetic resonance imaging (fMRI) is pivotal for advancing the understanding of neural processes. However, current fMRI-to-video reconstruction methods are hindered by a semantic gap between noisy fMRI signals and the rich content of videos, stemming from a reliance on incomplete semantic embeddings that neither capture video-specific cues (e.g., actions) nor integrate prior knowledge. To this end, we draw inspiration from the dual-pathway processing mechanism in human brain and introduce CineNeuron, a novel hierarchical framework for semantically enhanced video reconstruction from fMRI signals with two synergistic stages. First, a bottom-up semantic enrichment stage maps fMRI signals to a rich embedding space that comprehensively captures textual semantics, image contents, action concepts, and object categories. Second, a top-down memory integration stage utilizes the proposed Mixture-of-Memories method to dynamically select relevant "memories" from previously seen data and fuse them with the fMRI embedding to refine the video reconstruction. Extensive experimental results on two fMRI-to-video benchmarks demonstrate that CineNeuron surpasses state-of-the-art methods across various metrics.

preprint2026arXiv

GraphMAR: Geometry-Aware Graph Learning Framework for Spatially Adaptive CT Metal Artifact Reduction

Computed tomography (CT) metal artifact reduction (MAR) aims to reduce the severe streaking artifacts induced by metallic implants and other high-density objects. Effective MAR generally requires both accurate artifact localization and artifact removal. Sinogram-domain methods can exploit explicit geometric cues, such as metal traces, to identify metal-corrupted measurements, while requiring raw projection data, which is often unavailable in clinical and practical scenarios. Image-domain methods are more flexible and widely applicable, yet they usually lack comparable geometric guidance, limiting their ability to localize artifacts and leading to suboptimal results. To address this limitation, we propose GraphMAR, a geometry-aware learning framework for explicit artifact identification and spatially adaptive MAR in the image domain. The key idea is to introduce graph-based geometric modeling as an image-domain analogue of sinogram metal traces. Specifically, we first construct a geometric graph from the metal mask and derive a geometric density graph that coarsely localizes artifact-prone regions according to inter-implant geometry. We then design GraphMoE, a graph-routed mixture-of-experts module that builds a polar-coordinate artifact graph in feature space and adaptively routes different experts to different spatial regions for MAR. By aligning the learned routing maps with the geometric density graph, GraphMAR provides explicit and interpretable artifact localization while enabling region-adaptive artifact reduction. Experiments on both simulated and real-world datasets demonstrate that GraphMAR achieves superior MAR performance compared with existing methods. To the best of our knowledge, this is the first work to introduce graph-based modeling for CT MAR and to enable explicit artifact identification in the image domain, improving both restoration quality and interpretability.

preprint2026arXiv

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

Video generation is rapidly evolving from single-shot synthesis to complex multi-shot audio-video (MSAV) narratives to meet real-world demands. However, evaluating such frontier models remains a fundamental challenge. Existing benchmarks are limited in scope and data diversity, and rely on rigid evaluation pipelines, preventing systematic and reliable assessment of modern MSAV models. To bridge these gaps, we introduce MSAVBench, the first comprehensive benchmark and adaptive hybrid evaluation framework for multi-shot audio-video generation. Our benchmark spans four key dimensions, video, audio, shot, and reference, covering diverse task settings, varying shot counts of up to 15, and challenging non-realistic scenarios. Our evaluation framework improves robustness through an adaptive self-correction mechanism for shot segmentation, instance-wise rubrics for subjective metrics, and tool-grounded evidence extraction for complex judgments. Furthermore, MSAVBench achieves high alignment with human judgments, reaching a Spearman rank correlation of 91.5%. Our systematic evaluation of 19 state-of-the-art closed- and open-source models shows that current systems still struggle with director-level control and fine-grained audio-visual synchronization, while modular or agentic generation pipelines offer a promising path toward narrowing the gap between open- and closed-source models. We will release the benchmark data and evaluation code to facilitate future research.

preprint2023arXiv

CORE: Learning Consistent Ordinal REpresentations for Image Ordinal Estimation

The goal of image ordinal estimation is to estimate the ordinal label of a given image with a convolutional neural network. Existing methods are mainly based on ordinal regression and particularly focus on modeling the ordinal mapping from the feature representation of the input to the ordinal label space. However, the manifold of the resultant feature representations does not maintain the intrinsic ordinal relations of interest, which hinders the effectiveness of the image ordinal estimation. Therefore, this paper proposes learning intrinsic Consistent Ordinal REpresentations (CORE) from ordinal relations residing in groundtruth labels while encouraging the feature representations to embody the ordinal low-dimensional manifold. First, we develop an ordinal totally ordered set (toset) distribution (OTD), which can (i) model the label embeddings to inherit ordinal information and measure distances between ordered labels of samples in a neighborhood, and (ii) model the feature embeddings to infer numerical magnitude with unknown ordinal information among the features of different samples. Second, through OTD, we convert the feature representations and labels into the same embedding space for better alignment, and then compute the Kullback Leibler (KL) divergence between the ordinal labels and feature representations to endow the latent space with consistent ordinal relations. Third, we optimize the KL divergence through ordinal prototype-constrained convex programming with dual decomposition; our theoretical analysis shows that we can obtain the optimal solutions via gradient backpropagation. Extensive experimental results demonstrate that the proposed CORE can accurately construct an ordinal latent space and significantly enhance existing deep ordinal regression methods to achieve better results.

preprint2022arXiv

Meta Ordinal Regression Forest for Medical Image Classification with Ordinal Labels

The performance of medical image classification has been enhanced by deep convolutional neural networks (CNNs), which are typically trained with cross-entropy (CE) loss. However, when the label presents an intrinsic ordinal property in nature, e.g., the development from benign to malignant tumor, CE loss cannot take into account such ordinal information to allow for better generalization. To improve model generalization with ordinal information, we propose a novel meta ordinal regression forest (MORF) method for medical image classification with ordinal labels, which learns the ordinal relationship through the combination of convolutional neural network and differential forest in a meta-learning framework. The merits of the proposed MORF come from the following two components: a tree-wise weighting net (TWW-Net) and a grouped feature selection (GFS) module. First, the TWW-Net assigns each tree in the forest with a specific weight that is mapped from the classification loss of the corresponding tree. Hence, all the trees possess varying weights, which is helpful for alleviating the tree-wise prediction variance. Second, the GFS module enables a dynamic forest rather than a fixed one that was previously used, allowing for random feature perturbation. During training, we alternatively optimize the parameters of the CNN backbone and TWW-Net in the meta-learning framework through calculating the Hessian matrix. Experimental results on two medical image classification datasets with ordinal labels, i.e., LIDC-IDRI and Breast Ultrasound Dataset, demonstrate the superior performances of our MORF method over existing state-of-the-art methods.

preprint2021arXiv

Convolutional Ordinal Regression Forest for Image Ordinal Estimation

Image ordinal estimation is to predict the ordinal label of a given image, which can be categorized as an ordinal regression problem. Recent methods formulate an ordinal regression problem as a series of binary classification problems. Such methods cannot ensure that the global ordinal relationship is preserved since the relationships among different binary classifiers are neglected. We propose a novel ordinal regression approach, termed Convolutional Ordinal Regression Forest or CORF, for image ordinal estimation, which can integrate ordinal regression and differentiable decision trees with a convolutional neural network for obtaining precise and stable global ordinal relationships. The advantages of the proposed CORF are twofold. First, instead of learning a series of binary classifiers \emph{independently}, the proposed method aims at learning an ordinal distribution for ordinal regression by optimizing those binary classifiers \emph{simultaneously}. Second, the differentiable decision trees in the proposed CORF can be trained together with the ordinal distribution in an end-to-end manner. The effectiveness of the proposed CORF is verified on two image ordinal estimation tasks, i.e. facial age estimation and image aesthetic assessment, showing significant improvements and better stability over the state-of-the-art ordinal regression methods.

preprint2021arXiv

DU-GAN: Generative Adversarial Networks with Dual-Domain U-Net Based Discriminators for Low-Dose CT Denoising

LDCT has drawn major attention in the medical imaging field due to the potential health risks of CT-associated X-ray radiation to patients. Reducing the radiation dose, however, decreases the quality of the reconstructed images, which consequently compromises the diagnostic performance. Various deep learning techniques have been introduced to improve the image quality of LDCT images through denoising. GANs-based denoising methods usually leverage an additional classification network, i.e. discriminator, to learn the most discriminate difference between the denoised and normal-dose images and, hence, regularize the denoising model accordingly; it often focuses either on the global structure or local details. To better regularize the LDCT denoising model, this paper proposes a novel method, termed DU-GAN, which leverages U-Net based discriminators in the GANs framework to learn both global and local difference between the denoised and normal-dose images in both image and gradient domains. The merit of such a U-Net based discriminator is that it can not only provide the per-pixel feedback to the denoising network through the outputs of the U-Net but also focus on the global structure in a semantic level through the middle layer of the U-Net. In addition to the adversarial training in the image domain, we also apply another U-Net based discriminator in the image gradient domain to alleviate the artifacts caused by photon starvation and enhance the edge of the denoised CT images. Furthermore, the CutMix technique enables the per-pixel outputs of the U-Net based discriminator to provide radiologists with a confidence map to visualize the uncertainty of the denoised results, facilitating the LDCT-based screening and diagnosis. Extensive experiments on the simulated and real-world datasets demonstrate superior performance over recently published methods both qualitatively and quantitatively.

preprint2021arXiv

Meta ordinal weighting net for improving lung nodule classification

The progression of lung cancer implies the intrinsic ordinal relationship of lung nodules at different stages-from benign to unsure then to malignant. This problem can be solved by ordinal regression methods, which is between classification and regression due to its ordinal label. However, existing convolutional neural network (CNN)-based ordinal regression methods only focus on modifying classification head based on a randomly sampled mini-batch of data, ignoring the ordinal relationship resided in the data itself. In this paper, we propose a Meta Ordinal Weighting Network (MOW-Net) to explicitly align each training sample with a meta ordinal set (MOS) containing a few samples from all classes. During the training process, the MOW-Net learns a mapping from samples in MOS to the corresponding class-specific weight. In addition, we further propose a meta cross-entropy (MCE) loss to optimize the network in a meta-learning scheme. The experimental results demonstrate that the MOW-Net achieves better accuracy than the state-of-the-art ordinal regression methods, especially for the unsure class.

preprint2021arXiv

RoutingGAN: Routing Age Progression and Regression with Disentangled Learning

Although impressive results have been achieved for age progression and regression, there remain two major issues in generative adversarial networks (GANs)-based methods: 1) conditional GANs (cGANs)-based methods can learn various effects between any two age groups in a single model, but are insufficient to characterize some specific patterns due to completely shared convolutions filters; and 2) GANs-based methods can, by utilizing several models to learn effects independently, learn some specific patterns, however, they are cumbersome and require age label in advance. To address these deficiencies and have the best of both worlds, this paper introduces a dropout-like method based on GAN~(RoutingGAN) to route different effects in a high-level semantic feature space. Specifically, we first disentangle the age-invariant features from the input face, and then gradually add the effects to the features by residual routers that assign the convolution filters to different age groups by dropping out the outputs of others. As a result, the proposed RoutingGAN can simultaneously learn various effects in a single model, with convolution filters being shared in part to learn some specific effects. Experimental results on two benchmarked datasets demonstrate superior performance over existing methods both qualitatively and quantitatively.

preprint2021arXiv

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

To minimize the effects of age variation in face recognition, previous work either extracts identity-related discriminative features by minimizing the correlation between identity- and age-related features, called age-invariant face recognition (AIFR), or removes age variation by transforming the faces of different age groups into the same age group, called face age synthesis (FAS); however, the former lacks visual results for model interpretation while the latter suffers from artifacts compromising downstream recognition. Therefore, this paper proposes a unified, multi-task framework to jointly handle these two tasks, termed MTLFace, which can learn age-invariant identity-related representation while achieving pleasing face synthesis. Specifically, we first decompose the mixed face feature into two uncorrelated components -- identity- and age-related feature -- through an attention mechanism, and then decorrelate these two components using multi-task training and continuous domain adaption. In contrast to the conventional one-hot encoding that achieves group-level FAS, we propose a novel identity conditional module to achieve identity-level FAS, with a weight-sharing strategy to improve the age smoothness of synthesized faces. In addition, we collect and release a large cross-age face dataset with age and gender annotations to advance the development of the AIFR and FAS. Extensive experiments on five benchmark cross-age datasets demonstrate the superior performance of our proposed MTLFace over existing state-of-the-art methods for AIFR and FAS. We further validate MTLFace on two popular general face recognition datasets, showing competitive performance for face recognition in the wild. The source code and dataset are available at~\url{https://github.com/Hzzone/MTLFace}.

preprint2020arXiv

Cine Cardiac MRI Motion Artifact Reduction Using a Recurrent Neural Network

Cine cardiac magnetic resonance imaging (MRI) is widely used for diagnosis of cardiac diseases thanks to its ability to present cardiovascular features in excellent contrast. As compared to computed tomography (CT), MRI, however, requires a long scan time, which inevitably induces motion artifacts and causes patients' discomfort. Thus, there has been a strong clinical motivation to develop techniques to reduce both the scan time and motion artifacts. Given its successful applications in other medical imaging tasks such as MRI super-resolution and CT metal artifact reduction, deep learning is a promising approach for cardiac MRI motion artifact reduction. In this paper, we propose a recurrent neural network to simultaneously extract both spatial and temporal features from under-sampled, motion-blurred cine cardiac images for improved image quality. The experimental results demonstrate substantially improved image quality on two clinical test datasets. Also, our method enables data-driven frame interpolation at an enhanced temporal resolution. Compared with existing methods, our deep learning approach gives a superior performance in terms of structural similarity (SSIM) and peak signal-to-noise ratio (PSNR).

preprint2020arXiv

Low-dimensional Manifold Constrained Disentanglement Network for Metal Artifact Reduction

Deep neural network based methods have achieved promising results for CT metal artifact reduction (MAR), most of which use many synthesized paired images for training. As synthesized metal artifacts in CT images may not accurately reflect the clinical counterparts, an artifact disentanglement network (ADN) was proposed with unpaired clinical images directly, producing promising results on clinical datasets. However, without sufficient supervision, it is difficult for ADN to recover structural details of artifact-affected CT images based on adversarial losses only. To overcome these problems, here we propose a low-dimensional manifold (LDM) constrained disentanglement network (DN), leveraging the image characteristics that the patch manifold is generally low-dimensional. Specifically, we design an LDM-DN learning algorithm to empower the disentanglement network through optimizing the synergistic network loss functions while constraining the recovered images to be on a low-dimensional patch manifold. Moreover, learning from both paired and unpaired data, an efficient hybrid optimization scheme is proposed to further improve the MAR performance on clinical datasets. Extensive experiments demonstrate that the proposed LDM-DN approach can consistently improve the MAR performance in paired and/or unpaired learning settings, outperforming competing methods on synthesized and clinical datasets.

preprint2020arXiv

Meta Ordinal Regression Forest For Learning with Unsure Lung Nodules

Deep learning-based methods have achieved promising performance in early detection and classification of lung nodules, most of which discard unsure nodules and simply deal with a binary classification -- malignant vs benign. Recently, an unsure data model (UDM) was proposed to incorporate those unsure nodules by formulating this problem as an ordinal regression, showing better performance over traditional binary classification. To further explore the ordinal relationship for lung nodule classification, this paper proposes a meta ordinal regression forest (MORF), which improves upon the state-of-the-art ordinal regression method, deep ordinal regression forest (DORF), in three major ways. First, MORF can alleviate the biases of the predictions by making full use of deep features while DORF needs to fix the composition of decision trees before training. Second, MORF has a novel grouped feature selection (GFS) module to re-sample the split nodes of decision trees. Last, combined with GFS, MORF is equipped with a meta learning-based weighting scheme to map the features selected by GFS to tree-wise weights while DORF assigns equal weights for all trees. Experimental results on the LIDC-IDRI dataset demonstrate superior performance over existing methods, including the state-of-the-art DORF.

preprint2020arXiv

Ordinal Distribution Regression for Gait-based Age Estimation

Computer vision researchers prefer to estimate age from face images because facial features provide useful information. However, estimating age from face images becomes challenging when people are distant from the camera or occluded. A person's gait is a unique biometric feature that can be perceived efficiently even at a distance. Thus, gait can be used to predict age when face images are not available. However, existing gait-based classification or regression methods ignore the ordinal relationship of different ages, which is an important clue for age estimation. This paper proposes an ordinal distribution regression with a global and local convolutional neural network for gait-based age estimation. Specifically, we decompose gait-based age regression into a series of binary classifications to incorporate the ordinal age information. Then, an ordinal distribution loss is proposed to consider the inner relationships among these classifications by penalizing the distribution discrepancy between the estimated value and the ground truth. In addition, our neural network comprises a global and three local sub-networks, and thus, is capable of learning the global structure and local details from the head, body, and feet. Experimental results indicate that the proposed approach outperforms state-of-the-art gait-based age estimation methods on the OULP-Age dataset.

preprint2020arXiv

Parameter-Transferred Wasserstein Generative Adversarial Network (PT-WGAN) for Low-Dose PET Image Denoising

Due to the widespread use of positron emission tomography (PET) in clinical practice, the potential risk of PET-associated radiation dose to patients needs to be minimized. However, with the reduction in the radiation dose, the resultant images may suffer from noise and artifacts that compromise diagnostic performance. In this paper, we propose a parameter-transferred Wasserstein generative adversarial network (PT-WGAN) for low-dose PET image denoising. The contributions of this paper are twofold: i) a PT-WGAN framework is designed to denoise low-dose PET images without compromising structural details, and ii) a task-specific initialization based on transfer learning is developed to train PT-WGAN using trainable parameters transferred from a pretrained model, which significantly improves the training efficiency of PT-WGAN. The experimental results on clinical data show that the proposed network can suppress image noise more effectively while preserving better image fidelity than recently published state-of-the-art methods. We make our code available at https://github.com/90n9-yu/PT-WGAN.

preprint2020arXiv

PFA-GAN: Progressive Face Aging with Generative Adversarial Network

Face aging is to render a given face to predict its future appearance, which plays an important role in the information forensics and security field as the appearance of the face typically varies with age. Although impressive results have been achieved with conditional generative adversarial networks (cGANs), the existing cGANs-based methods typically use a single network to learn various aging effects between any two different age groups. However, they cannot simultaneously meet three essential requirements of face aging -- including image quality, aging accuracy, and identity preservation -- and usually generate aged faces with strong ghost artifacts when the age gap becomes large. Inspired by the fact that faces gradually age over time, this paper proposes a novel progressive face aging framework based on generative adversarial network (PFA-GAN) to mitigate these issues. Unlike the existing cGANs-based methods, the proposed framework contains several sub-networks to mimic the face aging process from young to old, each of which only learns some specific aging effects between two adjacent age groups. The proposed framework can be trained in an end-to-end manner to eliminate accumulative artifacts and blurriness. Moreover, this paper introduces an age estimation loss to take into account the age distribution for an improved aging accuracy, and proposes to use the Pearson correlation coefficient as an evaluation metric measuring the aging smoothness for face aging methods. Extensively experimental results demonstrate superior performance over existing (c)GANs-based methods, including the state-of-the-art one, on two benchmarked datasets. The source code is available at~\url{https://github.com/Hzzone/PFA-GAN}.

preprint2019arXiv

A Method of Rapid Quantification of Patient-Specific Organ Dose for CT Using Coupled Deep-Learning based Multi-Organ Segmentation and GPU-accelerated Monte Carlo Dose Computing

Purpose: This paper describes a new method to apply deep-learning algorithms for automatic segmentation of radiosensitive organs from 3D tomographic CT images before computing organ doses using a GPU-based Monte Carlo code. Methods: A deep convolutional neural network (CNN) for organ segmentation is trained to automatically delineate radiosensitive organs from CT. With a GPU-based Monte Carlo dose engine (ARCHER) to derive CT dose of a phantom made from a subject's CT scan, we are then able to compute the patient-specific CT dose for each of the segmented organs. The developed tool is validated by using Relative Dose Error (RDE) against the organ doses calculated by ARCHER with manual segmentation performed by radiologists. The dose computation results are also compared against organ doses from population-average phantoms to demonstrate the improvement achieved by using the developed tool. In this study, two datasets were used: The Lung CT Segmentation Challenge 2017 (LCTSC) dataset, which contains 60 thoracic CT scan patients each with 5 segmented organs, and the Pancreas-CT (PCT) dataset, which contains 43 abdominal CT scan patients each with 8 segmented organs. Five-fold cross-validation of the new method is performed on both datasets. Results: Comparing with the traditional organ dose evaluation method that based on population-average phantom, our proposed method achieved the smaller RDE range on all organs with -4.3%~1.5% vs -31.5%~33.9% (lung), -7.0%~2.3% vs -15.2%~125.1% (heart), -18.8%~40.2% vs -10.3%~124.1% (esophagus) in the LCTSC dataset and -5.6%~1.6% vs -20.3%~57.4% (spleen), -4.5%~4.6% vs -19.5%~61.0% (pancreas), -2.3%~4.4% vs -37.8%~75.8% (left kidney), -14.9%~5.4% vs -39.9% ~14.6% (gall bladder), -0.9%~1.6% vs -30.1%~72.5% (liver), and -23.0%~11.1% vs -52.5%~-1.3% (stomach) in the PCT dataset.

preprint2019arXiv

Look globally, age locally: Face aging with an attention mechanism

Face aging is of great importance for cross-age recognition and entertainment-related applications. Recently, conditional generative adversarial networks (cGANs) have achieved impressive results for face aging. Existing cGANs-based methods usually require a pixel-wise loss to keep the identity and background consistent. However, minimizing the pixel-wise loss between the input and synthesized images likely resulting in a ghosted or blurry face. To address this deficiency, this paper introduces an Attention Conditional GANs (AcGANs) approach for face aging, which utilizes attention mechanism to only alert the regions relevant to face aging. In doing so, the synthesized face can well preserve the background information and personal identity without using the pixel-wise loss, and the ghost artifacts and blurriness can be significantly reduced. Based on the benchmarked dataset Morph, both qualitative and quantitative experiment results demonstrate superior performance over existing algorithms in terms of image quality, personal identity, and age accuracy.

preprint2019arXiv

MRI Super-Resolution with Ensemble Learning and Complementary Priors

Magnetic resonance imaging (MRI) is a widely used medical imaging modality. However, due to the limitations in hardware, scan time, and throughput, it is often clinically challenging to obtain high-quality MR images. The super-resolution approach is potentially promising to improve MR image quality without any hardware upgrade. In this paper, we propose an ensemble learning and deep learning framework for MR image super-resolution. In our study, we first enlarged low resolution images using 5 commonly used super-resolution algorithms and obtained differentially enlarged image datasets with complementary priors. Then, a generative adversarial network (GAN) is trained with each dataset to generate super-resolution MR images. Finally, a convolutional neural network is used for ensemble learning that synergizes the outputs of GANs into the final MR super-resolution images. According to our results, the ensemble learning results outcome any one of GAN outputs. Compared with some state-of-the-art deep learning-based super-resolution methods, our approach is advantageous in suppressing artifacts and keeping more image details.

preprint2019arXiv

Multi-Contrast Super-Resolution MRI Through a Progressive Network

Magnetic resonance imaging (MRI) is widely used for screening, diagnosis, image-guided therapy, and scientific research. A significant advantage of MRI over other imaging modalities such as computed tomography (CT) and nuclear imaging is that it clearly shows soft tissues in multi-contrasts. Compared with other medical image super-resolution (SR) methods that are in a single contrast, multi-contrast super-resolution studies can synergize multiple contrast images to achieve better super-resolution results. In this paper, we propose a one-level non-progressive neural network for low up-sampling multi-contrast super-resolution and a two-level progressive network for high up-sampling multi-contrast super-resolution. Multi-contrast information is combined in high-level feature space. Our experimental results demonstrate that the proposed networks can produce MRI super-resolution images with good image quality and outperform other multi-contrast super-resolution methods in terms of structural similarity and peak signal-to-noise ratio. Also, the progressive network produces a better SR image quality than the non-progressive network, even if the original low-resolution images were highly down-sampled.