Researcher profile

Cheng Ouyang

Cheng Ouyang contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, Unclaimed author
15 works
0 followers
9 topics
4 close collaborators

Published work

15 published items

Preprint, 2026, arXiv

Does DINOv3 Set a New Medical Vision Standard? Benchmarking 2D and 3D Classification, Segmentation, and Registration

The advent of large-scale vision foundation models, pre-trained on diverse natural images, has marked a paradigm shift in computer vision. However, how well the efficacy of frontier vision foundation models transfers to specialised domains such as medical imaging remains an open question. This report investigates whether DINOv3, a state-of-the-art self-supervised vision transformer (ViT) pre-trained on natural images, can directly serve as a powerful, unified encoder for medical vision tasks without domain-specific fine-tuning. To answer this, we benchmark DINOv3 across common medical vision tasks, including 2D and 3D classification, segmentation, and registration on a wide range of medical imaging modalities. We systematically analyse its scalability by varying model sizes and input image resolutions. Our findings reveal that DINOv3 shows impressive performance and establishes a formidable new baseline. Remarkably, it can even outperform medical-specific foundation models like BiomedCLIP and CT-Net on several tasks, despite being trained solely on natural images. However, we identify clear limitations: the model's features degrade in scenarios requiring deep domain specialisation, such as whole-slide images (WSIs), electron microscopy (EM), and positron emission tomography (PET). Furthermore, we observe that DINOv3 does not consistently follow the scaling law in the medical domain. Its performance does not reliably increase with larger models or finer feature resolutions, showing diverse scaling behaviours across tasks. Overall, our work establishes DINOv3 as a strong baseline, whose powerful visual features can serve as a robust prior for multiple medical tasks. This opens promising future directions, such as leveraging its features to enforce multiview consistency in 3D reconstruction.

Preprint, 2026, arXiv

Segmentation, Detection and Explanation: A Unified Framework for CT Appearance Reasoning

Recent progress in deep learning has significantly advanced CT image analysis, particularly for segmentation tasks. However, these advances are largely confined to image-level pattern recognition, with most methods lacking explicit anatomical or contextual reasoning. Large vision-language models introduce linguistic context into image analysis, yet most approaches focus on a single task, which is insufficient for clinical workflow analysis that requires multiple fine-grained types of analysis, such as anatomy detection and segmentation. In this paper, we propose a unified autoregressive framework that integrates language-guided visual reasoning into CT interpretation. Our method introduces task-routing tokens that trigger detection and segmentation heads conditioned on the hidden states of a large vision-language model, enabling coherent generation of visual outputs (e.g., masks and bounding boxes) and textual reasoning. To progressively enhance localisation accuracy and semantic clarity, we further design a "closer-look" mechanism that allows the model to perform progressive coarse-to-fine visits to regions of interest under refined fields of view. To support model training and evaluation, we curated a new multimodal CT dataset containing pixel-wise masks, bounding boxes, spatial prompts, and structured descriptions for visual objects, constructed through an AI-assisted annotation process with human verification. Experiments on public benchmarks demonstrate consistent improvements over the SoTA, achieving up to 1.0% Dice on BTCV and 1.7% Dice on MosMed+, while additionally providing appearance reasoning outputs. The code and dataset will be made available.

Preprint, 2024, arXiv

Parabolic Anderson model in bounded domains of recurrent metric measure spaces

A metric measure space equipped with a Dirichlet form is called recurrent if its Hausdorff dimension is less than its walk dimension. In bounded domains of such spaces we study the parabolic Anderson models \[ \partial_{t} u(t,x) = \Delta u(t,x) + \beta u(t,x) \, \dot{W}_{\alpha}(t,x) \] where the noise $W_{\alpha}$ is white in time and colored in space when $\alpha>0$ while for $\alpha=0$ it is also white in space. Both Dirichlet and Neumann boundary conditions are considered. Besides proving existence and uniqueness in the Itô sense we also get precise $L^p$ estimates for the moments and intermittency properties of the solution as a consequence. Our study reveals new exponents which are intrinsically associated to the geometry of the underlying space and the results for instance apply in metric graphs or fractals like the Sierpiński gasket for which we prove scaling invariance properties of the models.

Preprint, 2022, arXiv

Enhancing MR Image Segmentation with Realistic Adversarial Data Augmentation

The success of neural networks on medical image segmentation tasks typically relies on large labeled datasets for model training. However, acquiring and manually labeling a large medical image set is resource-intensive, expensive, and sometimes impractical due to data sharing and privacy issues. To address this challenge, we propose AdvChain, a generic adversarial data augmentation framework, aiming at improving both the diversity and effectiveness of training data for medical image segmentation tasks. AdvChain augments data with dynamic data augmentation, generating randomly chained photo-metric and geometric transformations to resemble realistic yet challenging imaging variations to expand training data. By jointly optimizing the data augmentation model and a segmentation network during training, challenging examples are generated to enhance network generalizability for the downstream task. The proposed adversarial data augmentation does not rely on generative networks and can be used as a plug-in module in general segmentation networks. It is computationally efficient and applicable for both low-shot supervised and semi-supervised learning. We analyze and evaluate the method on two MR image segmentation tasks: cardiac segmentation and prostate segmentation with limited labeled data. Results show that the proposed approach can alleviate the need for labeled data while improving model generalization ability, indicating its practical value in medical imaging applications.

Preprint, 2022, arXiv

Improved post-hoc probability calibration for out-of-domain MRI segmentation

Probability calibration for deep models is highly desirable in safety-critical applications such as medical imaging. It makes the output probabilities of deep networks interpretable by aligning prediction probability with the actual accuracy on test data. In image segmentation, well-calibrated probabilities allow radiologists to identify regions where model-predicted segmentations are unreliable. These unreliable predictions often occur on out-of-domain (OOD) images, which are caused by imaging artifacts or unseen imaging protocols. Unfortunately, most previous calibration methods for image segmentation perform sub-optimally on OOD images. To reduce the calibration error when confronted with OOD images, we propose a novel post-hoc calibration model. Our model leverages the pixel susceptibility against perturbations at the local level, and the shape prior information at the global level. The model is tested on cardiac MRI segmentation datasets that contain unseen imaging artifacts and images from an unseen imaging protocol. We demonstrate reduced calibration errors compared with the state-of-the-art calibration algorithm.
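
To make "aligning prediction probability with the actual accuracy" concrete, here is a minimal sketch of the standard expected calibration error (ECE), a common way to measure that gap. It is not the calibration model proposed in the paper. The function name, the equal-width binning, and the toy inputs are illustrative assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: weighted mean gap between confidence and accuracy per bin.

    confidences: predicted probability of the chosen class for each pixel, in [0, 1]
    correct:     booleans, True where the prediction matched the label
    Illustrative sketch only; not the method from the paper above.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Toy example: four pixels predicted with 0.9 confidence, three of them correct.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

In the toy example the model claims 90% confidence but is right 75% of the time, so the ECE is about 0.15. A well-calibrated model would drive this value toward zero.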

Preprint, 2022, arXiv

MaxStyle: Adversarial Style Composition for Robust Medical Image Segmentation

Convolutional neural networks (CNNs) have achieved remarkable segmentation accuracy on benchmark datasets where training and test sets are from the same domain, yet their performance can degrade significantly on unseen domains, which hinders the deployment of CNNs in many clinical scenarios. Most existing works improve model out-of-domain (OOD) robustness by collecting multi-domain datasets for training, which is expensive and may not always be feasible due to privacy and logistical issues. In this work, we focus on improving model robustness using a single-domain dataset only. We propose a novel data augmentation framework called MaxStyle, which maximizes the effectiveness of style augmentation for model OOD performance. It attaches an auxiliary style-augmented image decoder to a segmentation network for robust feature learning and data augmentation. Importantly, MaxStyle augments data with improved image style diversity and hardness, by expanding the style space with noise and searching for the worst-case style composition of latent features via adversarial training. With extensive experiments on multiple public cardiac and prostate MR datasets, we demonstrate that MaxStyle leads to significantly improved out-of-distribution robustness against unseen corruptions as well as common distribution shifts across multiple, different, unseen sites and unknown image sequences under both low- and high-training data settings. The code can be found at https://github.com/cherise215/MaxStyle.

Preprint, 2022, arXiv

Parabolic Anderson model on Heisenberg groups: the Itô setting

In this note we focus our attention on a stochastic heat equation defined on the Heisenberg group $\mathbf{H}^{n}$ of order $n$. This equation is written as $\partial_t u=\frac{1}{2}\Delta u+u\dot{W}_{\alpha}$, where $\Delta$ is the hypoelliptic Laplacian on $\mathbf{H}^{n}$ and $\{\dot{W}_{\alpha}; \alpha>0\}$ is a family of Gaussian space-time noises which are white in time and have a covariance structure generated by $(-\Delta)^{-\alpha}$ in space. Our aim is threefold: (i) Give a proper description of the noise $W_{\alpha}$; (ii) Prove that one can solve the stochastic heat equation in the Itô sense as soon as $\alpha>\frac{n}{2}$; (iii) Give some basic moment estimates for the solution $u(t,x)$.

Preprint, 2020, arXiv

Density bounds for solutions to differential equations driven by Gaussian rough paths

We consider finite dimensional rough differential equations driven by centered Gaussian processes. Combining Malliavin calculus, rough paths techniques and interpolation inequalities, we establish upper bounds on the density of the corresponding solution for any fixed time $t>0$. In addition, we provide Varadhan estimates for the asymptotic behavior of the density for small noise. The emphasis is on working with general Gaussian processes with covariance function satisfying suitable abstract, checkable conditions.

Preprint, 2020, arXiv

Moment estimates for some renormalized parabolic Anderson models

The theory of regularity structures enables the definition of the following parabolic Anderson model in a very rough environment: $\partial_{t} u_{t}(x) = \frac12 \Delta u_{t}(x) + u_{t}(x) \, \dot W_{t}(x)$, for $t\in\mathbb{R}_{+}$ and $x\in \mathbb{R}^{d}$, where $\dot W_{t}(x)$ is a Gaussian noise whose space time covariance function is singular. In this rough context, we shall give some information about the moments of $u_{t}(x)$ when the stochastic heat equation is interpreted in the Skorohod as well as the Stratonovich sense. Of special interest is the critical case, for which one observes a blowup of moments for large times.

Preprint, 2020, arXiv

Precise Local Estimates for Differential Equations driven by Fractional Brownian Motion: Elliptic Case

This article is concerned with stochastic differential equations driven by a $d$ dimensional fractional Brownian motion with Hurst parameter $H>1/4$, understood in the rough paths sense. Whenever the coefficients of the equation satisfy a uniform ellipticity condition, we establish a sharp local estimate on the associated control distance function and a sharp local lower estimate on the density of the solution.

Preprint, 2020, arXiv

Precise Local Estimates for Differential Equations driven by Fractional Brownian Motion: Hypoelliptic Case

This article is concerned with stochastic differential equations driven by a $d$ dimensional fractional Brownian motion with Hurst parameter $H>1/4$, understood in the rough paths sense. Whenever the coefficients of the equation satisfy a uniform hypoellipticity condition, we establish a sharp local estimate on the associated control distance function and a sharp local lower estimate on the density of the solution. Our methodology relies heavily on the rough paths structure of the equation.

Preprint, 2020, arXiv

Realistic Adversarial Data Augmentation for MR Image Segmentation

Neural network-based approaches can achieve high accuracy in various medical image segmentation tasks. However, they generally require large labelled datasets for supervised learning. Acquiring and manually labelling a large medical dataset is expensive and sometimes impractical due to data sharing and privacy issues. In this work, we propose an adversarial data augmentation method for training neural networks for medical image segmentation. Instead of generating pixel-wise adversarial attacks, our model generates plausible and realistic signal corruptions, which model the intensity inhomogeneities caused by a common type of artefact in MR imaging: the bias field. The proposed method does not rely on generative networks, and can be used as a plug-in module for general segmentation networks in both supervised and semi-supervised learning. Using cardiac MR imaging we show that such an approach can improve the generalization ability and robustness of models as well as provide significant improvements in low-data scenarios.
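
As a rough illustration of what a bias-field corruption looks like, the sketch below multiplies an image by a smooth, slowly varying gain map. It is a simplified stand-in: the field here is a random second-order polynomial, whereas the paper searches for the field adversarially, and that optimisation is not reproduced. The function names, the polynomial parameterisation, and the `strength` parameter are all assumptions made for this example.

```python
import random


def random_bias_field(height, width, strength=0.3, rng=None):
    """Return a smooth multiplicative gain map with values in [1 - strength, 1 + strength].

    Hypothetical simplification: a random second-order polynomial over
    coordinates normalised to [-1, 1]. Not the parameterisation from the paper.
    """
    rng = rng or random.Random()
    coeffs = [rng.uniform(-1.0, 1.0) for _ in range(5)]
    field = []
    for r in range(height):
        y = 2.0 * r / (height - 1) - 1.0 if height > 1 else 0.0
        row = []
        for c in range(width):
            x = 2.0 * c / (width - 1) - 1.0 if width > 1 else 0.0
            poly = (coeffs[0] * x + coeffs[1] * y + coeffs[2] * x * y
                    + coeffs[3] * x * x + coeffs[4] * y * y)
            # |poly| <= 5 because every term is bounded by 1, so divide by 5.
            row.append(1.0 + strength * poly / 5.0)
        field.append(row)
    return field


def apply_bias_field(image, field):
    """Multiply each pixel by the gain at the same position."""
    return [[pixel * gain for pixel, gain in zip(img_row, field_row)]
            for img_row, field_row in zip(image, field)]


# Toy example: a flat 8x8 image picks up a smooth intensity gradient.
image = [[100.0] * 8 for _ in range(8)]
field = random_bias_field(8, 8, strength=0.3, rng=random.Random(0))
corrupted = apply_bias_field(image, field)
```

Because the corruption changes only intensities and not geometry, the segmentation labels of the original image can be reused unchanged for the corrupted copy.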

Preprint, 2019, arXiv

Unsupervised Multi-modal Style Transfer for Cardiac MR Segmentation

In this work, we present a fully automatic method to segment cardiac structures from late-gadolinium enhanced (LGE) images without using labelled LGE data for training, but instead by transferring the anatomical knowledge and features learned on annotated balanced steady-state free precession (bSSFP) images, which are easier to acquire. Our framework mainly consists of two neural networks: a multi-modal image translation network for style transfer and a cascaded segmentation network for image segmentation. The multi-modal image translation network generates realistic and diverse synthetic LGE images conditioned on a single annotated bSSFP image, forming a synthetic LGE training set. This set is then utilized to fine-tune the segmentation network pre-trained on labelled bSSFP images, achieving the goal of unsupervised LGE image segmentation. In particular, the proposed cascaded segmentation network is able to produce accurate segmentation by taking both shape prior and image appearance into account, achieving an average Dice score of 0.92 for the left ventricle, 0.83 for the myocardium, and 0.88 for the right ventricle on the test set.
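
For readers unfamiliar with the metric quoted above, the Dice score for a structure is twice the overlap between predicted and reference regions divided by their combined size. The sketch below computes it per label on flattened label maps. The function name, the label encoding, and the toy arrays are illustrative assumptions, not data from the paper.

```python
def dice_score(pred, target, label):
    """Dice overlap for one label between two equally sized flat label lists."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    pred_hits = [p == label for p in pred]
    target_hits = [t == label for t in target]
    intersection = sum(1 for p, t in zip(pred_hits, target_hits) if p and t)
    total = sum(pred_hits) + sum(target_hits)
    if total == 0:
        # Convention assumed here: a label absent from both maps counts as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy flattened label maps (hypothetical encoding: 0 background, 1-3 structures).
pred = [0, 1, 1, 2, 2, 2, 0, 3]
target = [0, 1, 1, 1, 2, 2, 0, 3]
scores = {label: dice_score(pred, target, label) for label in (1, 2, 3)}
```

In this toy case labels 1 and 2 each score 0.8 and label 3 scores 1.0. A reported average Dice of 0.92 would correspond to this quantity averaged over test cases for one structure.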