
DALPHIN: Benchmarking Digital Pathology AI Copilots Against Pathologists on an Open Multicentric Dataset

Foundation models with visual question answering capabilities for digital pathology are emerging. Such unprecedented technology requires independent benchmarking to assess its potential in assisting pathologists in routine diagnostics. We created DALPHIN, the first multicentric open benchmark for pathology AI copilots, comprising 1236 images from 300 cases, spanning 130 rare to common diagnoses, 6 countries, and 14 subspecialties. The DALPHIN design and dataset are introduced alongside a human performance benchmark of 31 pathologists from 10 countries with varying expertise. We report results for two general-purpose copilots (GPT-5, Gemini 2.5 Pro) and one pathology-specific copilot (PathChat+) for sequential and independent answer generation. We observed no statistically significant difference from expert-level performance in four of six tasks for PathChat, two of six tasks for Gemini, and one of six tasks for GPT. DALPHIN is publicly released with sequestered, indirectly accessible ground truth to foster robust and enduring benchmarking. Data, methods, and the evaluation platform are accessible through dalphin.grand-challenge.org.

Preprint, 2026, arXiv, Open access

Carlijn Lems, Sander Moonemans, Natálie Klubíčková, Biagio Brattoli, Taebum Lee, Seokhwi Kim, Veronica Vilaplana, Laura Pons, Sapir Hochman, Mauricio Eduardo Suárez-Franck, Pedro Luis Fernandez, Julius Drachneris, Donatas Petroska, Renaldas Augulis, Arvydas Laurinavicius, Domingos Oliveira, Diana Montezuma, Anouk B. Bouwmeester, Dominique van Midden, Anne-Marie Vos, Shoko Vos, Jolique van Ipenburg, Maschenka Balkenhol, Koen Winkler, Iris Nagtegaal, Konnie Hebeda, Uta Flucke, Katrien Grünberg, Josef Skopal, Brinder S. Chohan, Jordi Temprana-Salvador, Enrico Munari, Luca Cima, Giulia Querzoli, Yosamin Gonzalez Belisario, Jaeike W. Faber, Geert J. L. H. van Leenders, Jan H. von der Thüsen, Lodewijk A. A. Brosens, Ronald R. de Krijger, Pieter Wesseling, Sandrine Florquin, Mateusz Maniewski, Adam Kowalewski, Robert Barna, Dina Tiniakos, Joan Lop Gros, Rogier Donders, Jake S. F. Maurits, Ming Yang Lu, Chengkuan Chen, Faisal Mahmood, Jeroen van der Laak, Nadieh Khalili, Frédérique Meeuwsen, Francesco Ciompi

Computer Vision, Artificial Intelligence

