Researcher profile

Rafet Sifa

Rafet Sifa contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Benchmark Success, Clinical Failure: When Reinforcement Learning Optimizes for Benchmarks, Not Patients

Recent Reinforcement Learning (RL) advances for Large Language Models (LLMs) have improved reasoning tasks, yet their resource-constrained application to medical imaging remains underexplored. We introduce ChexReason, a vision-language model trained via R1-style methodology (SFT followed by GRPO) using only 2,000 SFT samples, 1,000 RL samples, and a single A100 GPU. Evaluations on CheXpert and NIH benchmarks reveal a fundamental tension: GRPO recovers in-distribution performance (23% improvement on CheXpert, macro-F1 = 0.346) but degrades cross-dataset transferability (19% drop on NIH). This mirrors high-resource models like NV-Reason-CXR-3B, suggesting the issue stems from the RL paradigm rather than scale. We identify a generalization paradox where the SFT checkpoint uniquely improves on NIH before optimization, indicating teacher-guided reasoning captures more institution-agnostic features. Furthermore, cross-model comparisons show structured reasoning scaffolds benefit general-purpose VLMs but offer minimal gain for medically pre-trained models. Consequently, curated supervised fine-tuning may outperform aggressive RL for clinical deployment requiring robustness across diverse populations.

preprint2026arXiv

Knowing When Not to Predict: Self Supervised Learning and Abstention for Safer DR Screening

Self-supervised learning (SSL) is now a standard way to pretrain medical image models, but performance is still mostly judged by downstream accuracy. For safety-critical screening tasks such as diabetic retinopathy grading, this is not enough: a model must also know when its predictions are unreliable and defer uncertain cases for clinical review. In this work, we examine how the length of SSL pretraining influences calibrated confidence and confidence-based abstention. We evaluate multiple SSL checkpoints under a fixed fine-tuning protocol and assess calibrated confidence, coverage, selective accuracy, and selective macro-F1. Across datasets and data regimes, SSL pretraining improves selective prediction compared to training from scratch. Unlike prior SSL studies that primarily evaluate downstream accuracy or AUROC, we analyze how SSL pretraining duration influences confidence behavior under calibrated confidence-based abstention. However, once accuracy saturates, selective performance can still change markedly across checkpoints, and longer pretraining does not consistently improve reliability. These results underscore the importance of abstention-aware evaluation and suggest that pretraining length should be treated as an important reliability-related design choice rather than only a computational detail. Code is available at GitHub.

preprint2022arXiv

KPI-BERT: A Joint Named Entity Recognition and Relation Extraction Model for Financial Reports

We present KPI-BERT, a system which employs novel methods of named entity recognition (NER) and relation extraction (RE) to extract and link key performance indicators (KPIs), e.g. "revenue" or "interest expenses", of companies from real-world German financial documents. Specifically, we introduce an end-to-end trainable architecture that is based on Bidirectional Encoder Representations from Transformers (BERT) combining a recurrent neural network (RNN) with conditional label masking to sequentially tag entities before it classifies their relations. Our model also introduces a learnable RNN-based pooling mechanism and incorporates domain expert knowledge by explicitly filtering impossible relations. We achieve a substantially higher prediction performance on a new practical dataset of German financial reports, outperforming several strong baselines including a competing state-of-the-art span-based entity tagging approach.

preprint2022arXiv

Towards Bundle Adjustment for Satellite Imaging via Quantum Machine Learning

Given is a set of images, where all images show views of the same area at different points in time and from different viewpoints. The task is the alignment of all images such that relevant information, e.g., poses, changes, and terrain, can be extracted from the fused image. In this work, we focus on quantum methods for keypoint extraction and feature matching, due to the demanding computational complexity of these sub-tasks. To this end, k-medoids clustering, kernel density clustering, nearest neighbor search, and kernel methods are investigated and it is explained how these methods can be re-formulated for quantum annealers and gate-based quantum computers. Experimental results obtained on digital quantum emulation hardware, quantum annealers, and quantum gate computers show that classical systems still deliver superior results. However, the proposed methods are ready for the current and upcoming generations of quantum computing devices which have the potential to outperform classical systems in the near future.

preprint2022arXiv

Towards Linguistically Informed Multi-Objective Pre-Training for Natural Language Inference

We introduce a linguistically enhanced combination of pre-training methods for transformers. The pre-training objectives include POS-tagging, synset prediction based on semantic knowledge graphs, and parent prediction based on dependency parse trees. Our approach achieves competitive results on the Natural Language Inference task, compared to the state of the art. Specifically for smaller models, the method results in a significant performance boost, emphasizing the fact that intelligent pre-training can make up for fewer parameters and help building more efficient models. Combining POS-tagging and synset prediction yields the overall best results.

preprint2014arXiv

A Comparison of Methods for Player Clustering via Behavioral Telemetry

The analysis of user behavior in digital games has been aided by the introduction of user telemetry in game development, which provides unprecedented access to quantitative data on user behavior from the installed game clients of the entire population of players. Player behavior telemetry datasets can be exceptionally complex, with features recorded for a varying population of users over a temporal segment that can reach years in duration. Categorization of behaviors, whether through descriptive methods (e.g. segmention) or unsupervised/supervised learning techniques, is valuable for finding patterns in the behavioral data, and developing profiles that are actionable to game developers. There are numerous methods for unsupervised clustering of user behavior, e.g. k-means/c-means, Non-negative Matrix Factorization, or Principal Component Analysis. Although all yield behavior categorizations, interpretation of the resulting categories in terms of actual play behavior can be difficult if not impossible. In this paper, a range of unsupervised techniques are applied together with Archetypal Analysis to develop behavioral clusters from playtime data of 70,014 World of Warcraft players, covering a five year interval. The techniques are evaluated with respect to their ability to develop actionable behavioral profiles from the dataset.