Source author record

Ibrahim Ethem Hamamci

Ibrahim Ethem Hamamci appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Artificial Intelligence, Machine Learning

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators

Preprint, 2026, arXiv

Comprehensive language-image pre-training for 3D medical image understanding

Vision-language pre-training, i.e., aligning images with paired text, is a powerful paradigm to create encoders that can be directly used for tasks such as classification, retrieval, and segmentation. In the 3D medical image domain, these capabilities allow vision-language encoders (VLEs) to support radiologists by retrieving patients with similar abnormalities, predicting likelihoods of abnormality, or, with downstream adaptation, generating radiological reports. While the methodology holds promise, data availability and domain-specific hurdles limit the capabilities of current 3D VLEs. In this paper, we overcome these challenges by injecting additional supervision via a report generation objective and combining vision-language with vision-only pre-training. This allows us to leverage both image-only and paired image-text 3D datasets, increasing the total amount of data to which our model is exposed. Through these additional objectives, paired with best practices of the 3D medical imaging domain, we develop the Comprehensive Language-Image Pre-training (COLIPRI) encoder family. Our COLIPRI encoders achieve state-of-the-art performance in report generation, semantic segmentation, classification probing, and zero-shot classification. The model is available at https://huggingface.co/microsoft/colipri.
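The zero-shot classification use case mentioned in this abstract can be illustrated with a minimal sketch. It assumes that an image encoder and a text encoder have already produced embeddings in a shared space and that the class whose text prompt is most similar to the image wins; the abstract does not confirm that COLIPRI scores classes this way. The function names, the prompts, and the three-dimensional vectors below are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, prompt_embeddings):
    """Return the best-matching label and the similarity score of every label.

    prompt_embeddings maps a label to the embedding of its text prompt.
    """
    scores = {
        label: cosine_similarity(image_embedding, embedding)
        for label, embedding in prompt_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical embeddings; real encoders produce much longer vectors.
image_embedding = [0.9, 0.1, 0.2]
prompt_embeddings = {
    "abnormality present": [1.0, 0.0, 0.1],
    "no abnormality": [0.0, 1.0, 0.1],
}

label, scores = zero_shot_classify(image_embedding, prompt_embeddings)
print(label, scores)
```

The same similarity score could also rank stored scans against a query for retrieval, which is the other encoder use the abstract names.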

Preprint, 2026, arXiv

DeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agents

Medical vision-language models (VLMs) and AI agents have made significant progress in learning to analyze and reason about clinical images. However, existing medical visual question answering (VQA) benchmarks collapse model capabilities into a single accuracy score, obscuring where and why models fail. We propose DeepTumorVQA, a hierarchical benchmark that follows the multi-stage evidence chain in tumor diagnosis and decomposes 3D CT reasoning into four stages: recognition, measurement, visual reasoning, and medical reasoning. Higher-level questions remain independently scorable, while their ground-truth evidence chains are defined over lower-level primitives. The benchmark contains 476K questions across 42 clinical subtypes on 9,262 3D CT volumes. In addition to a direct reasoning mode for VLMs, DeepTumorVQA provides tool-interaction environments for agent evaluation, where a model can call external tools, including segmentation models, measurement programs, and medical knowledge modules, before answering the question. Evaluating over 30 model configurations, we find that reliable quantitative measurement is the primary bottleneck, making later-stage visual and medical reasoning harder for VLMs, while tool augmentation substantially mitigates this issue. When tools are available, leveraging medical knowledge and tools to reason about medical images becomes a new challenge. We further show that ground-truth step-by-step tool-use traces from DeepTumorVQA can supervise agents and reduce tool-use and reasoning failures. This stage-wise progression from recognition to measurement to visual and medical reasoning provides a concrete roadmap for future medical VLM and AI agent studies. All data and code are released at https://github.com/Schuture/DeepTumorVQA.
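The contrast between a single accuracy score and stage-wise evaluation can be made concrete with a short sketch. The four stage names come from the abstract; the record format (a stage name paired with a correctness flag), the function names, and the sample results are assumptions made for illustration and are not taken from the DeepTumorVQA code.

```python
from collections import defaultdict

# Stage names as listed in the abstract, in evidence-chain order.
STAGES = ["recognition", "measurement", "visual reasoning", "medical reasoning"]


def stage_wise_accuracy(results):
    """Compute accuracy separately for each stage.

    results is an iterable of (stage, is_correct) pairs (hypothetical format).
    Stages without any questions are left out of the returned dict.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for stage, is_correct in results:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        total[stage] += 1
        correct[stage] += int(is_correct)
    return {s: correct[s] / total[s] for s in STAGES if total[s] > 0}


def overall_accuracy(results):
    """Single collapsed score, shown only for comparison."""
    return sum(int(ok) for _, ok in results) / len(results)


# Made-up outcomes for eight questions.
results = [
    ("recognition", True), ("recognition", True),
    ("measurement", False), ("measurement", True),
    ("visual reasoning", False), ("visual reasoning", True),
    ("medical reasoning", False), ("medical reasoning", False),
]

per_stage = stage_wise_accuracy(results)
overall = overall_accuracy(results)
print(per_stage, overall)
```

In this made-up sample the collapsed score hides that every recognition question was answered correctly and every medical reasoning question was missed, which is the kind of information a per-stage breakdown preserves.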

Preprint, 2022, arXiv

AutoCOR: Autonomous Condylar Offset Ratio Calculator on TKA-Postoperative Lateral Knee X-ray

The postoperative range of motion is one of the crucial factors indicating the outcome of Total Knee Arthroplasty (TKA). Although the correlation between range of knee flexion and posterior condylar offset (PCO) is controversial in the literature, PCO remains important in the evaluation of TKA. Because of limitations in PCO measurement, two novel parameters, posterior condylar offset ratio (PCOR) and anterior condylar offset ratio (ACOR), were introduced. Currently, orthopedic surgeons calculate PCOR and ACOR manually on plain lateral radiographs. We therefore developed AutoCOR, software that calculates PCOR and ACOR autonomously using an unsupervised machine learning algorithm (k-means clustering) and digital image processing techniques. AutoCOR is capable of detecting the anterior/posterior edge points and the anterior/posterior cortex of the femoral shaft on true postoperative lateral conventional radiographs. To test the algorithm, 50 postoperative true lateral radiographs (32 patients) from the Istanbul Kosuyolu Medipol Hospital Database were used. The mean PCOR was 0.984 (SD 0.235) in software results and 0.972 (SD 0.164) in ground truth values, a strong and significant correlation between software and ground truth values (Pearson r=0.845, p<0.0001). The mean ACOR was 0.107 (SD 0.092) in software results and 0.107 (SD 0.070) in ground truth values, a moderate and significant correlation between software and ground truth values (Spearman's rs=0.519, p=0.0001412). We suggest that AutoCOR is a useful tool that can be used in clinical practice.
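The two agreement statistics reported in this abstract, Pearson's r and Spearman's rs, can be computed as in the sketch below. It uses only the Python standard library, treats Spearman's rs as Pearson's r applied to average ranks, and does not compute p-values. The sample values are made up for illustration and are not the study's measurements; the function names are likewise hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rs(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Made-up ratio values standing in for software output and manual readings.
software = [0.95, 1.10, 0.80, 1.20, 0.99]
ground_truth = [0.97, 1.05, 0.85, 1.15, 0.96]

print(pearson_r(software, ground_truth))
print(spearman_rs(software, ground_truth))
```

Spearman's rs depends only on the ordering of the values, so it is less sensitive to outliers and non-normal distributions than Pearson's r; the abstract does not say why a different statistic was used for ACOR than for PCOR.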