Robustness of existing radiology report evaluation approaches across modalities and anatomies
Determine whether evaluation approaches developed primarily for chest X-ray reports, namely large language model–based metrics and fine-tuned small-model evaluators, remain robust when applied to radiology reports from other imaging modalities and anatomical regions.
References
However, it remains unclear whether these approaches are robust when applied to reports from other modalities and anatomies.
— VERT: Reliable LLM Judges for Radiology Report Evaluation
(2604.03376 - Bologna et al., 3 Apr 2026) in Abstract