Source author record

Yuheng Li

Yuheng Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Computer Vision, Artificial Intelligence, Machine Learning, cond-mat.mtrl-sci, cond-mat.other, eess.IV, Graphics, physics.chem-ph

Catalog footprint

What is connected

5 works

8 topics

4 close collaborators

Preprint, 2026, arXiv

BrainDINO: A Brain MRI Foundation Model for Generalizable Clinical Representation Learning

Brain MRI underpins a wide range of neuroscientific and clinical applications, yet most learning-based methods remain task-specific and require substantial labeled data. Here we show that a single self-supervised representation can generalize across heterogeneous brain MRI endpoints. We trained BrainDINO, a self-distilled foundation model, on approximately 6.6 million unlabeled axial slices from 20 datasets encompassing broad variation in population, disease, and acquisition setting. Using a frozen encoder with lightweight task heads, BrainDINO supported transfer across tumor segmentation, classification of neurodegenerative and neurodevelopmental conditions, brain age estimation, post-stroke temporal prediction, molecular status prediction, MRI sequence classification, and survival modeling. Across tasks and supervision regimes, BrainDINO consistently equaled or exceeded natural-image and MRI-specific self-supervised baselines, with particularly strong advantages under label scarcity. Representation analyses further showed anatomically organized and pathology-sensitive feature structure in the absence of task-specific supervision. Our findings indicate that large-scale slice-wise self-supervised learning can yield a unified brain MRI representation that supports diverse neuroimaging tasks without volumetric pretraining or full-network fine-tuning, establishing a scalable foundation for robust and data-efficient brain imaging analysis.
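The "frozen encoder with lightweight task heads" setup in the abstract can be illustrated with a toy linear probe. This is a minimal sketch under stated assumptions, not the BrainDINO code: `frozen_encoder` is a hypothetical stand-in (a fixed random projection), the data are synthetic Gaussian blobs rather than MRI slices, and the head is plain logistic regression. The point it shows is that only the head's parameters are updated while the encoder weights stay untouched.

```python
import numpy as np

def frozen_encoder(x, weights):
    """Hypothetical stand-in for a pretrained encoder: fixed projection + tanh.
    The weights are never updated during head training."""
    return np.tanh(x @ weights)

def train_linear_head(features, labels, lr=0.5, steps=500):
    """Fit a logistic-regression head on frozen features by gradient descent."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = probs - labels  # gradient of cross-entropy w.r.t. the logits
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def predict(features, w, b):
    return (features @ w + b > 0).astype(int)

rng = np.random.default_rng(0)
# Toy inputs: two Gaussian blobs in 16-D standing in for two classes of slices.
x = np.vstack([rng.normal(-1.0, 1.0, size=(200, 16)),
               rng.normal(+1.0, 1.0, size=(200, 16))])
y = np.concatenate([np.zeros(200), np.ones(200)])

encoder_weights = rng.normal(0.0, 0.3, size=(16, 8))  # assumed "pretrained"
weights_before = encoder_weights.copy()

feats = frozen_encoder(x, encoder_weights)  # computed once, encoder untouched
w, b = train_linear_head(feats, y)
accuracy = (predict(feats, w, b) == y).mean()
```

Because the features are computed once and cached, the cost of adding a new task in this setup is only the cost of fitting a small head, which is one reason frozen-encoder evaluation is attractive when labels are scarce.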

Preprint, 2026, arXiv

Quantitative Video World Model Evaluation for Geometric-Consistency

Generative video models are increasingly studied as implicit world models, yet evaluating whether they produce physically plausible 3D structure and motion remains challenging. Most existing video evaluation pipelines rely heavily on human judgment or learned graders, which can be subjective and weakly diagnostic for geometric failures. We introduce PDI-Bench (Perspective Distortion Index), a quantitative framework for auditing geometric coherence in generated videos. Given a generated clip, we obtain object-centric observations via segmentation and point tracking (e.g., SAM 2, MegaSaM, and CoTracker3), lift them to 3D world-space coordinates via monocular reconstruction, and compute a set of projective-geometry residuals capturing three failure dimensions: scale-depth alignment, 3D motion consistency, and 3D structural rigidity. To support systematic evaluation, we build PDI-Dataset, covering diverse scenarios designed to stress these geometric constraints. Across state-of-the-art video generators, PDI reveals consistent geometry-specific failure modes that are not captured by common perceptual metrics, and provides a diagnostic signal for progress toward physically grounded video generation and physical world models. Our code and dataset can be found at https://pdi-bench.github.io/.
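To make the idea of a scale-depth residual concrete, here is a minimal sketch. It is an illustrative assumption, not the PDI-Bench definition. Under a pinhole camera, a rigid object of fixed physical size has image size inversely proportional to its depth, so a log-log fit of size against depth should have slope -1. The function name `scale_depth_residual` and the numbers for focal length and object height are hypothetical.

```python
import numpy as np

def scale_depth_residual(pixel_heights, depths):
    """Absolute deviation of the log-size vs log-depth slope from the
    pinhole-camera value of -1 (0 means perfectly consistent)."""
    log_h = np.log(np.asarray(pixel_heights, dtype=float))
    log_z = np.log(np.asarray(depths, dtype=float))
    slope, _intercept = np.polyfit(log_z, log_h, 1)
    return abs(slope + 1.0)

# An object receding from 2 m to 10 m over 20 frames.
depths = np.linspace(2.0, 10.0, 20)
focal_px, true_height_m = 500.0, 1.7  # hypothetical camera and object

# Geometrically consistent clip: image height follows h = f * H / Z.
consistent = focal_px * true_height_m / depths
# Inconsistent clip: the object keeps the same image size while receding.
frozen_size = np.full_like(depths, 150.0)

res_consistent = scale_depth_residual(consistent, depths)
res_frozen = scale_depth_residual(frozen_size, depths)
```

In this toy setting the consistent clip gives a residual near 0 and the frozen-size clip gives a residual near 1. A real pipeline would take the sizes from tracked segmentation masks and the depths from monocular reconstruction, both of which are noisy, so a robust fit would likely be needed.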

Preprint, 2022, arXiv

GIRAFFE HD: A High-Resolution 3D-aware Generative Model

3D-aware generative models have shown that the introduction of 3D information can lead to more controllable image generation. In particular, the current state-of-the-art model GIRAFFE can control each object's rotation, translation, scale, and scene camera pose without corresponding supervision. However, GIRAFFE only operates well when the image resolution is low. We propose GIRAFFE HD, a high-resolution 3D-aware generative model that inherits all of GIRAFFE's controllable features while generating high-quality, high-resolution images ($512^2$ resolution and above). The key idea is to leverage a style-based neural renderer, and to independently generate the foreground and background to force their disentanglement while imposing consistency constraints to stitch them together to composite a coherent final image. We demonstrate state-of-the-art 3D controllable high-resolution image generation on multiple natural image datasets.
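The step of generating foreground and background separately and then stitching them together can be pictured as mask-based blending. The sketch below is generic alpha compositing under assumed inputs, not the GIRAFFE HD implementation: `composite`, the array shapes, and the constant-valued images are all illustrative.

```python
import numpy as np

def composite(foreground, background, mask):
    """Blend per pixel: mask=1 keeps the foreground, mask=0 keeps the background."""
    mask = np.clip(mask, 0.0, 1.0)[..., None]  # add a channel axis for broadcasting
    return mask * foreground + (1.0 - mask) * background

# Toy 4x4 RGB images: a bright foreground and a dark background.
fg = np.full((4, 4, 3), 0.8)
bg = np.full((4, 4, 3), 0.2)

# The foreground object occupies the central 2x2 region.
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0

image = composite(fg, bg, mask)
```

Keeping the two branches separate until this final blend is what lets the foreground be edited without touching the background. The consistency constraints mentioned in the abstract address the cost of that separation, namely keeping the two halves visually coherent.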

Preprint, 2022, arXiv

H$_2$O and CO$_2$ Surface Contamination of the Lithium-Stuffed Garnet

Understanding the reactivity of ubiquitous molecules on complex oxides has broad impacts in energy applications and catalysis. The garnet-type Li$_7$La$_3$Zr$_2$O$_{12}$ is a promising solid-state electrolyte for lithium (Li)-ion batteries, and it readily reacts with H$_2$O and CO$_2$ when exposed to ambient air. Such reactions form a contamination layer on Li$_7$La$_3$Zr$_2$O$_{12}$ that is detrimental to battery operation. The strong interactions of Li$_7$La$_3$Zr$_2$O$_{12}$ with H$_2$O and CO$_2$, however, make Li$_7$La$_3$Zr$_2$O$_{12}$ a promising support to catalyze H$_2$O dissociation and CO$_2$ adsorption. Here, using first-principles calculations, we investigate the adsorption and reactions of H$_2$O and CO$_2$ on a Li$_7$La$_3$Zr$_2$O$_{12}$ surface. We show that H$_2$O reacts through the exchange of a proton and Li$^{+}$ and produces metal hydroxide species. At high H$_2$O coverage, half of the H$_2$O molecules dissociate while the other half remain intact. CO$_2$ reacts with the Li$_7$La$_3$Zr$_2$O$_{12}$ surface directly to produce carbonate species. We clarify that the individual reactions of H$_2$O and CO$_2$ with Li$_7$La$_3$Zr$_2$O$_{12}$ are more thermodynamically favorable than the co-adsorption of H$_2$O and CO$_2$. Finally, we demonstrate that low temperature and high partial pressure promote the reactions of H$_2$O and CO$_2$ with Li$_7$La$_3$Zr$_2$O$_{12}$. For energy storage applications of Li$_7$La$_3$Zr$_2$O$_{12}$, our study guides processing conditions to minimize surface contamination. From a catalysis point of view, our findings reveal the potential of using complex oxides, such as Li$_7$La$_3$Zr$_2$O$_{12}$, as a support for reactions requiring H$_2$O dissociation and strong CO$_2$ adsorption.
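The temperature and pressure trend stated in the abstract matches the standard ab initio thermodynamics picture. The relation below is the textbook form and an assumption about the approach; the paper's exact formulation may differ. Vibrational contributions of the solid are neglected, and $\Delta E_{\mathrm{ads}}$ denotes the computed adsorption energy per gas molecule (negative for binding).

$$
\Delta G_{\mathrm{ads}}(T, p) \approx \Delta E_{\mathrm{ads}} - \Delta\mu_{\mathrm{gas}}(T, p),
\qquad
\Delta\mu_{\mathrm{gas}}(T, p) = \Delta\mu^{\circ}(T) + k_B T \ln\frac{p}{p^{\circ}}
$$

Here $\Delta\mu^{\circ}(T)$ is the gas-phase chemical potential at the reference pressure $p^{\circ}$, measured relative to the molecule's total energy. It becomes more negative as $T$ increases because of the entropy of the gas, while the logarithmic term increases with partial pressure $p$. Lower temperature and higher partial pressure therefore both raise $\Delta\mu_{\mathrm{gas}}$, which makes $\Delta G_{\mathrm{ads}}$ more negative and the surface reaction more favorable.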

Preprint, 2020, arXiv

MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation

We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. We build upon FineGAN, an unconditional generative model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to learn the latent factor encoders. MixNMatch requires bounding boxes during training to model background, but requires no other supervision. Through extensive experiments, we demonstrate MixNMatch's ability to accurately disentangle, encode, and combine multiple factors for mix-and-match image generation, including sketch2color, cartoon2img, and img2gif applications. Our code/models/demo can be found at https://github.com/Yuheng-Li/MixNMatch