A Generalist Audio Foundation Model for Comprehensive Body Sound Auscultation

Published 12 Nov 2024 in cs.SD and eess.AS | (2411.07547v2)

Abstract: Accurate and efficient auscultation-based diagnostics are vital for early disease detection, especially in resource-limited settings where specialized clinical expertise is scarce. Traditional auscultation, which heavily depends on clinician experience, suffers from significant inter-observer variability, while existing AI models often falter due to the limitations of non-representative training data. In this study, we introduce AuscultaBase, a novel AI-driven diagnostic framework that harnesses self-supervised and contrastive learning techniques alongside large-scale, multi-source data integration to advance body sound analysis. By generating robust feature representations, AuscultaBase markedly enhances performance in abnormality detection, disease classification, and activity recognition tasks. Comprehensive evaluations on our newly established benchmark, AuscultaBench, demonstrate that AuscultaBase consistently outperforms state-of-the-art methods across key performance metrics, underscoring its potential as a scalable and cost-effective tool for clinical screening and early disease intervention. The code and model checkpoint has been released in https://github.com/applewpj/AuscultaBase.