RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations
Anomaly detection is a core capability for robotic perception and industrial inspection, yet most existing benchmarks are collected under controlled conditions with fixed viewpoints and stable illumination, failing to reflect real deployment scenarios. We introduce RAD (Realistic Anomaly Detection), a robot-captured, multi-view dataset designed to stress pose variation, reflective materials, and viewpoint-dependent defect visibility. RAD covers 13 everyday object categories and four realistic defect types--scratched, missing, stained, and squeezed--captured from over 60 robot viewpoints per object under uncontrolled lighting. We benchmark a wide range of state-of-the-art approaches, including 2D feature-based methods, 3D reconstruction pipelines, and vision-language models (VLMs), under a pose-agnostic setting. Surprisingly, we find that mature 2D feature-embedding methods consistently outperform recent 3D and VLM-based approaches at the image level, while the performance gap narrows for pixel-level localization. Our analysis reveals that reflective surfaces, geometric symmetry, and sparse viewpoint coverage fundamentally limit current geometry-based and zero-shot methods. RAD establishes a challenging and realistic benchmark for robotic anomaly detection, highlighting critical open problems beyond controlled laboratory settings.