Source author record

Xiaoran Zhang

Xiaoran Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision eess.IV Machine Learning Artificial Intelligence math.AP physics.med-ph Quantitative Methods

Catalog footprint

What is connected

8works

7topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Does DINOv3 Set a New Medical Vision Standard? Benchmarking 2D and 3D Classification, Segmentation, and Registration

The advent of large-scale vision foundation models, pre-trained on diverse natural images, has marked a paradigm shift in computer vision. However, how the frontier vision foundation models' efficacies transfer to specialised domains such as medical imaging remains an open question. This report investigates whether DINOv3, a state-of-the-art self-supervised vision transformer (ViT) pre-trained on natural images, can directly serve as a powerful, unified encoder for medical vision tasks without domain-specific fine-tuning. To answer this, we benchmark DINOv3 across common medical vision tasks, including 2D and 3D classification, segmentation, and registration on a wide range of medical imaging modalities. We systematically analyse its scalability by varying model sizes and input image resolutions. Our findings reveal that DINOv3 shows impressive performance and establishes a formidable new baseline. Remarkably, it can even outperform medical-specific foundation models like BiomedCLIP and CT-Net on several tasks, despite being trained solely on natural images. However, we identify clear limitations: The model's features degrade in scenarios requiring deep domain specialisation, such as in whole-slide images (WSIs), electron microscopy (EM), and positron emission tomography (PET). Furthermore, we observe that DINOv3 does not consistently follow the scaling law in the medical domain. Its performance does not reliably increase with larger models or finer feature resolutions, showing diverse scaling behaviours across tasks. Overall, our work establishes DINOv3 as a strong baseline, whose powerful visual features can serve as a robust prior for multiple medical tasks. This opens promising future directions, such as leveraging its features to enforce multiview consistency in 3D reconstruction.

preprint2022arXiv

An alternative proof of Tataru's dispersive estimates

The aim of this article is to give an alternative proof of Tataru's dispersive estimates for wave equations posed on the hyperbolic space. Based on the formula for the wave kernel on hyperbolic spaces, in \cite{MR2743652}, we give the proof from the perspective of Bessel potentials, by exploiting various facts about Gamma functions, modified Bessel functions, and Bessel potentials. This leads to our proof being more self-contained than that in Tataru \cite{MR1804518}.

preprint2022arXiv

Incremental Learning Meets Transfer Learning: Application to Multi-site Prostate MRI Segmentation

Many medical datasets have recently been created for medical image segmentation tasks, and it is natural to question whether we can use them to sequentially train a single model that (1) performs better on all these datasets, and (2) generalizes well and transfers better to the unknown target site domain. Prior works have achieved this goal by jointly training one model on multi-site datasets, which achieve competitive performance on average but such methods rely on the assumption about the availability of all training data, thus limiting its effectiveness in practical deployment. In this paper, we propose a novel multi-site segmentation framework called incremental-transfer learning (ITL), which learns a model from multi-site datasets in an end-to-end sequential fashion. Specifically, "incremental" refers to training sequentially constructed datasets, and "transfer" is achieved by leveraging useful information from the linear combination of embedding features on each dataset. In addition, we introduce our ITL framework, where we train the network including a site-agnostic encoder with pre-trained weights and at most two segmentation decoder heads. We also design a novel site-level incremental loss in order to generalize well on the target domain. Second, we show for the first time that leveraging our ITL training scheme is able to alleviate challenging catastrophic forgetting problems in incremental learning. We conduct experiments using five challenging benchmark datasets to validate the effectiveness of our incremental-transfer learning approach. Our approach makes minimal assumptions on computation resources and domain-specific expertise, and hence constitutes a strong starting point in multi-site medical image segmentation.

preprint2022arXiv

Learning correspondences of cardiac motion from images using biomechanics-informed modeling

Learning spatial-temporal correspondences in cardiac motion from images is important for understanding the underlying dynamics of cardiac anatomical structures. Many methods explicitly impose smoothness constraints such as the $\mathcal{L}_2$ norm on the displacement vector field (DVF), while usually ignoring biomechanical feasibility in the transformation. Other geometric constraints either regularize specific regions of interest such as imposing incompressibility on the myocardium or introduce additional steps such as training a separate network-based regularizer on physically simulated datasets. In this work, we propose an explicit biomechanics-informed prior as regularization on the predicted DVF in modeling a more generic biomechanically plausible transformation within all cardiac structures without introducing additional training complexity. We validate our methods on two publicly available datasets in the context of 2D MRI data and perform extensive experiments to illustrate the effectiveness and robustness of our proposed methods compared to other competing regularization schemes. Our proposed methods better preserve biomechanical properties by visual assessment and show advantages in segmentation performance using quantitative evaluation metrics. The code is publicly available at \url{https://github.com/Voldemort108X/bioinformed_reg}.

preprint2022arXiv

MyoPS: A Benchmark of Myocardial Pathology Segmentation Combining Three-Sequence Cardiac Magnetic Resonance Images

Assessment of myocardial viability is essential in diagnosis and treatment management of patients suffering from myocardial infarction, and classification of pathology on myocardium is the key to this assessment. This work defines a new task of medical image analysis, i.e., to perform myocardial pathology segmentation (MyoPS) combining three-sequence cardiac magnetic resonance (CMR) images, which was first proposed in the MyoPS challenge, in conjunction with MICCAI 2020. The challenge provided 45 paired and pre-aligned CMR images, allowing algorithms to combine the complementary information from the three CMR sequences for pathology segmentation. In this article, we provide details of the challenge, survey the works from fifteen participants and interpret their methods according to five aspects, i.e., preprocessing, data augmentation, learning strategy, model architecture and post-processing. In addition, we analyze the results with respect to different factors, in order to examine the key obstacles and explore potential of solutions, as well as to provide a benchmark for future research. We conclude that while promising results have been reported, the research is still in the early stage, and more in-depth exploration is needed before a successful application to the clinics. Note that MyoPS data and evaluation tool continue to be publicly available upon registration via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/myops20/).

preprint2021arXiv

Label-free virtual HER2 immunohistochemical staining of breast tissue using deep learning

The immunohistochemical (IHC) staining of the human epidermal growth factor receptor 2 (HER2) biomarker is widely practiced in breast tissue analysis, preclinical studies and diagnostic decisions, guiding cancer treatment and investigation of pathogenesis. HER2 staining demands laborious tissue treatment and chemical processing performed by a histotechnologist, which typically takes one day to prepare in a laboratory, increasing analysis time and associated costs. Here, we describe a deep learning-based virtual HER2 IHC staining method using a conditional generative adversarial network that is trained to rapidly transform autofluorescence microscopic images of unlabeled/label-free breast tissue sections into bright-field equivalent microscopic images, matching the standard HER2 IHC staining that is chemically performed on the same tissue sections. The efficacy of this virtual HER2 staining framework was demonstrated by quantitative analysis, in which three board-certified breast pathologists blindly graded the HER2 scores of virtually stained and immunohistochemically stained HER2 whole slide images (WSIs) to reveal that the HER2 scores determined by inspecting virtual IHC images are as accurate as their immunohistochemically stained counterparts. A second quantitative blinded study performed by the same diagnosticians further revealed that the virtually stained HER2 images exhibit a comparable staining quality in the level of nuclear detail, membrane clearness, and absence of staining artifacts with respect to their immunohistochemically stained counterparts. This virtual HER2 staining framework bypasses the costly, laborious, and time-consuming IHC staining procedures in laboratory, and can be extended to other types of biomarkers to accelerate the IHC tissue staining used in life sciences and biomedical workflow.

preprint2020arXiv

A Comparative Study for Non-rigid Image Registration and Rigid Image Registration

Image registration algorithms can be generally categorized into two groups: non-rigid and rigid. Recently, many deep learning-based algorithms employ a neural net to characterize non-rigid image registration function. However, do they always perform better? In this study, we compare the state-of-art deep learning-based non-rigid registration approach with rigid registration approach. The data is generated from Kaggle Dog vs Cat Competition \url{https://www.kaggle.com/c/dogs-vs-cats/} and we test the algorithms' performance on rigid transformation including translation, rotation, scaling, shearing and pixelwise non-rigid transformation. The Voxelmorph is trained on rigidset and nonrigidset separately for comparison and we also add a gaussian blur layer to its original architecture to improve registration performance. The best quantitative results in both root-mean-square error (RMSE) and mean absolute error (MAE) metrics for rigid registration are produced by SimpleElastix and non-rigid registration by Voxelmorph. We select representative samples for visual assessment.

preprint2020arXiv

Fully automated deep learning based segmentation of normal, infarcted and edema regions from multiple cardiac MRI sequences

Myocardial characterization is essential for patients with myocardial infarction and other myocardial diseases, and the assessment is often performed using cardiac magnetic resonance (CMR) sequences. In this study, we propose a fully automated approach using deep convolutional neural networks (CNN) for cardiac pathology segmentation, including left ventricular (LV) blood pool, right ventricular blood pool, LV normal myocardium, LV myocardial edema (ME) and LV myocardial scars (MS). The input to the network consists of three CMR sequences, namely, late gadolinium enhancement (LGE), T2 and balanced steady state free precession (bSSFP). The proposed approach utilized the data provided by the MyoPS challenge hosted by MICCAI 2020 in conjunction with STACOM. The training set for the CNN model consists of images acquired from 25 cases, and the gold standard labels are provided by trained raters and validated by radiologists. The proposed approach introduces a data augmentation module, linear encoder and decoder module and a network module to increase the number of training samples and improve the prediction accuracy for LV ME and MS. The proposed approach is evaluated by the challenge organizers with a test set including 20 cases and achieves a mean dice score of $46.8\%$ for LV MS and $55.7\%$ for LV ME+MS

Xiaoran Zhang

What is connected

Connect this record

See the researcher in context

Building this map preview

8 published item(s)

Does DINOv3 Set a New Medical Vision Standard? Benchmarking 2D and 3D Classification, Segmentation, and Registration

An alternative proof of Tataru's dispersive estimates

Incremental Learning Meets Transfer Learning: Application to Multi-site Prostate MRI Segmentation

Learning correspondences of cardiac motion from images using biomechanics-informed modeling

MyoPS: A Benchmark of Myocardial Pathology Segmentation Combining Three-Sequence Cardiac Magnetic Resonance Images

Label-free virtual HER2 immunohistochemical staining of breast tissue using deep learning

A Comparative Study for Non-rigid Image Registration and Rigid Image Registration

Fully automated deep learning based segmentation of normal, infarcted and edema regions from multiple cardiac MRI sequences