
A Tutorial on Clinical Speech AI Development: From Data Collection to Model Validation

Published 29 Oct 2024 in eess.AS, cs.AI, and cs.SD | (2410.21640v1)

Abstract: There has been a surge of interest in leveraging speech as a marker of health for a wide spectrum of conditions. The underlying premise is that any neurological, mental, or physical deficits that impact speech production can be objectively assessed via automated analysis of speech. Recent advances in speech-based AI models for diagnosing and tracking mental health, cognitive, and motor disorders often use supervised learning, similar to mainstream speech technologies like recognition and verification. However, clinical speech AI has distinct challenges, including the need for specific elicitation tasks, small available datasets, diverse speech representations, and uncertain diagnostic labels. As a result, application of the standard supervised learning paradigm may lead to models that perform well in controlled settings but fail to generalize in real-world clinical deployments. With translation into real-world clinical scenarios in mind, this tutorial paper provides an overview of the key components required for robust development of clinical speech AI. Specifically, this paper will cover the design of speech elicitation tasks and protocols most appropriate for different clinical conditions, collection of data and verification of hardware, development and validation of speech representations designed to measure clinical constructs of interest, development of reliable and robust clinical prediction models, and ethical and participant considerations for clinical speech AI. The goal is to provide comprehensive guidance on building models whose inputs and outputs link to the more interpretable and clinically meaningful aspects of speech, that can be interrogated and clinically validated on clinical datasets, and that adhere to ethical, privacy, and security considerations by design.

  144. H. Lee and A. Saeed, “Distilled non-semantic speech embeddings with binary neural networks for low-resource devices,” Pattern Recognition Letters, vol. 177, pp. 15–19, 2024.
  145. J. Shor, A. Jansen, R. Maor, O. Lang, O. Tuval, F. de Chaumont Quitry, M. Tagliasacchi, I. Shavitt, D. Emanuel, and Y. Haviv, “Towards learning a universal non-semantic representation of speech,” in Proc. Interspeech, 2020, pp. 140–144.
  146. R. D. Kent, “Hearing and believing: Some limits to the auditory-perceptual assessment of speech and voice disorders,” American Journal of Speech-Language Pathology, vol. 5, no. 3, pp. 7–23, 1996.
  147. T. R. Goldstein, D. J. Miklowitz, and K. L. Mullen, “Social skills knowledge and performance among adolescents with bipolar disorder,” Bipolar disorders, vol. 8, no. 4, pp. 350–361, 2006.
  148. J. Lee, L. Altshuler, D. C. Glahn, D. J. Miklowitz, K. Ochsner, and M. F. Green, “Social and nonsocial cognition in bipolar disorder and schizophrenia: relative levels of impairment,” American Journal of Psychiatry, vol. 170, no. 3, pp. 334–341, 2013.
  149. F. L. Wuyts, M. S. D. Bodt, G. Molenberghs, M. Remacle, L. Heylen, B. Millet, K. V. Lierde, J. Raes, and P. H. V. d. Heyning, “The dysphonia severity index: an objective measure of vocal quality based on a multiparameter approach,” Journal of speech, language, and hearing research, vol. 43, no. 3, pp. 796–809, 2000.
  150. V. Berisha and J. M. Liss, “Responsible development of clinical speech ai: Bridging the gap between clinical research and technology,” NPJ Digital Medicine, vol. 7, no. 1, p. 208, 2024.
  151. S. A. Borrie, M. J. McAuliffe, and J. M. Liss, “Perceptual learning of dysarthric speech: A review of experimental studies,” Journal of Speech, Language, and Hearing Research, vol. 55, no. 1, pp. 290–305, 2012. [Online]. Available: https://pubs.asha.org/doi/abs/10.1044/1092-4388%282011/10-0349%29
  152. M. Tu, V. Berisha, and J. Liss, “Interpretable objective assessment of dysarthric speech based on deep neural networks.” in Proc. Interspeech, 2017, pp. 1849–1853.
  153. L. B. Helou, N. P. Solomon, L. R. Henry, G. L. Coppit, R. S. Howard, and A. Stojadinovic, “The role of listener experience on consensus auditory-perceptual evaluation of voice (cape-v) ratings of postthyroidectomy voice,” American Journal of Speech-Language Pathology, vol. 19, no. 3, p. 248, 2010.
  154. R. R. Patel, S. N. Awan, J. Barkmeier-Kraemer, M. Courey, D. Deliyski, T. Eadie, D. Paul, J. G. Švec, and R. Hillman, “Recommended protocols for instrumental assessment of voice: American speech-language-hearing association expert panel to develop a protocol for instrumental assessment of vocal function,” American journal of speech-language pathology, vol. 27, no. 3, pp. 887–905, 2018.
  155. J. G. Švec and S. Granqvist, “Tutorial and guidelines on measurement of sound pressure level in voice and speech,” Journal of Speech, Language, and Hearing Research, vol. 61, no. 3, pp. 441–461, 2018.
  156. D. Karimi, H. Dou, S. K. Warfield, and A. Gholipour, “Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis,” Medical image analysis, vol. 65, p. 101759, 2020.
  157. W. Li, G. Dasarathy, and V. Berisha, “Regularization via structural label smoothing,” in International Conference on Artificial Intelligence and Statistics.   PMLR, 2020, pp. 1453–1463.
  158. X. Ma, H. Huang, Y. Wang, S. Romano, S. Erfani, and J. Bailey, “Normalized loss functions for deep learning with noisy labels,” in International conference on machine learning.   PMLR, 2020, pp. 6543–6553.
  159. C. Sauder, M. Bretl, and T. Eadie, “Predicting voice disorder status from smoothed measures of cepstral peak prominence using praat and analysis of dysphonia in speech and voice (adsv),” Journal of Voice, vol. 31, no. 5, pp. 557–566, 2017.
  160. L. Xu, J. Liss, and V. Berisha, “Dysarthria detection based on a deep learning model with a clinically-interpretable layer,” JASA Express Letters, vol. 3, no. 1, 2023.
  161. S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Proc. NeurIPS, p. 4768–4777, 2017.
  162. A. Yeung, A. Iaboni, E. Rochon, M. Lavoie, C. Santiago, M. Yancheva, J. Novikova, M. Xu, J. Robin, L. D. Kaufman et al., “Correlating natural language processing and automated speech analysis with clinician assessment to quantify speech-language changes in mild cognitive impairment and alzheimer’s dementia,” Alzheimer’s research & therapy, vol. 13, no. 1, p. 109, 2021.
  163. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” Advances in neural information processing systems, vol. 26, 2013.
  164. J. Pennington, R. Socher, and C. D. Manning, “Glove: Global vectors for word representation,” in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543.
  165. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of NAACL-HLT, 2019, pp. 4171–4186.
  166. N. B. Lundin, M. N. Jones, E. J. Myers, A. Breier, and K. S. Minor, “Semantic and phonetic similarity of verbal fluency responses in early-stage psychosis,” Psychiatry research, vol. 309, p. 114404, 2022.
  167. N. M. Docherty, M. DeRosa, and N. C. Andreasen, “Communication disturbances in schizophrenia and mania,” Archives of General Psychiatry, vol. 53, no. 4, pp. 358–364, 1996.
  168. W. Xu, J. Portanova, A. Chander, D. Ben-Zeev, and T. Cohen, “The centroid cannot hold: comparing sequential and global estimates of coherence as indicators of formal thought disorder,” in AMIA Annual Symposium Proceedings, vol. 2020.   American Medical Informatics Association, 2020, p. 1315.
  169. A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov, “Fasttext.zip: Compressing text classification models,” arXiv preprint arXiv:1612.03651, 2016.
  170. S. X. Tang, Y. Cong, A. H. Nikzad, A. Mehta, S. Cho, K. Hänsel, S. Berretta, A. A. Dhar, J. M. Kane, and A. K. Malhotra, “Clinical and computational speech measures are associated with social cognition in schizophrenia spectrum disorders,” Schizophrenia Research, vol. 259, pp. 28–37, 2023.
  171. J. C. Vásquez-Correa, C. D. Rios-Urrego, T. Arias-Vergara, M. Schuster, J. Rusz, E. Noeth, and J. R. Orozco-Arroyave, “Transfer learning helps to improve the accuracy to classify patients with different speech disorders in different languages,” Pattern Recognition Letters, vol. 150, pp. 272–279, 2021.
  172. D. Wang and T. F. Zheng, “Transfer learning for speech and language processing,” in 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).   IEEE, 2015, pp. 1225–1237.
  173. A. Bailey and M. D. Plumbley, “Gender bias in depression detection using audio features,” in 2021 29th European Signal Processing Conference (EUSIPCO).   IEEE, 2021, pp. 596–600.
  174. M. Yang, A.-A. El-Attar, and T. Chaspari, “Deconstructing demographic bias in speech-based machine learning models for digital health,” Frontiers in Digital Health, vol. 6, p. 1351637, 2024.
  175. J. Rusz, J. Švihlík, P. Krýže, M. Novotný, and T. Tykalová, “Reproducibility of voice analysis with machine learning,” Movement Disorders, vol. 36, no. 5, pp. 1282–1283, 2021.
  176. J. Zhang, J. Liss, S. Jayasuriya, and V. Berisha, “Robust vocal quality feature embeddings for dysphonic voice detection,” IEEE/ACM transactions on audio, speech, and language processing, vol. 31, pp. 1348–1359, 2023.
  177. T. Ko, V. Peddinti, D. Povey, M. L. Seltzer, and S. Khudanpur, “A study on data augmentation of reverberant speech for robust speech recognition,” in Proc. ICASSP.   IEEE, 2017, pp. 5220–5224.
  178. B. Vachhani, C. Bhat, and S. K. Kopparapu, “Data augmentation using healthy speech for dysarthric speech recognition,” in Proc. Interspeech, 2018, pp. 471–475.
  179. S. Shahnawazuddin, W. Ahmad, N. Adiga, and A. Kumar, “In-domain and out-of-domain data augmentation to improve children’s speaker verification system in limited data scenario,” in Proc. ICASSP.   IEEE, 2020, pp. 7554–7558.
  180. L. Prananta, B. Halpern, S. Feng, and O. Scharenborg, “The effectiveness of time stretching for enhancing dysarthric speech for improved dysarthric speech recognition,” in Proc. Interspeech, 2022, pp. 36–40.
  181. J. Zhang, S. Jayasuriya, and V. Berisha, “Learning repeatable speech embeddings using an intra-class correlation regularizer,” Advances in Neural Information Processing Systems, vol. 36, 2024.
  182. W. Zhou, B. Y. Lin, and X. Ren, “Isobn: Fine-tuning bert with isotropic batch normalization,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 16, 2021, pp. 14621–14629.
  183. C. Peyser, W. R. Huang, A. Rosenberg, T. Sainath, M. Picheny, and K. Cho, “Towards disentangled speech representations,” in Proc. Interspeech, 2022, pp. 3603–3607.
  184. J. Gao, D. He, X. Tan, T. Qin, L. Wang, and T. Liu, “Representation degeneration problem in training natural language generation models,” in Proc. ICLR, 2019.
  185. L. Xu, K. D. Mueller, J. Liss, and V. Berisha, “Decorrelating language model embeddings for speech-based prediction of cognitive impairment,” in Proc. ICASSP.   IEEE, 2023, pp. 1–5.
  186. J. M. Perero-Codosero, F. Espinoza-Cuadros, J. Antón-Martín, M. A. Barbero-Alvarez, and L. A. Hernández-Gómez, “Modeling obstructive sleep apnea voices using deep neural network embeddings and domain-adversarial training,” IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 2, pp. 240–250, 2019.
  187. D. Park, Y. Yu, D. Katabi, and H. K. Kim, “Adversarial continual learning to transfer self-supervised speech representations for voice pathology detection,” IEEE Signal Processing Letters, 2023.
  188. Y.-T. Hsu, Z. Zhu, C.-T. Wang, S.-H. Fang, F. Rudzicz, and Y. Tsao, “Robustness against the channel effect in pathological voice detection,” arXiv preprint arXiv:1811.10376, 2018.
  189. M. Amiri and I. Kodrasi, “Test-time adaptation for automatic pathological speech detection in noisy environments,” in Proc. European Signal Processing Conference, Lyon, France, 2024.
  190. P.-H. C. Chen, Y. Liu, and L. Peng, “How to develop machine learning models for healthcare,” Nature materials, vol. 18, no. 5, pp. 410–414, 2019.
  191. B. Schuller, S. Steidl, A. Batliner, J. Epps, F. Eyben, F. Ringeval, E. Marchi, and Y. Zhang, “The interspeech 2014 computational paralinguistics challenge: Cognitive & physical load, multitasking,” in Proc. Interspeech, 2014.
  192. V. Berisha, C. Krantsevich, P. R. Hahn, S. Hahn, G. Dasarathy, P. Turaga, and J. Liss, “Digital medicine and the curse of dimensionality,” NPJ digital medicine, vol. 4, no. 1, p. 153, 2021.
  193. J. Robin, J. E. Harrison, L. D. Kaufman, F. Rudzicz, W. Simpson, and M. Yancheva, “Evaluation of speech-based digital biomarkers: review and recommendations,” Digital Biomarkers, vol. 4, no. 3, pp. 99–108, 2020.
  194. T. P. Quinn, S. Jacobs, M. Senadeera, V. Le, and S. Coghlan, “The three ghosts of medical ai: Can the black-box present deliver?” Artificial intelligence in medicine, vol. 124, p. 102158, 2022.
  195. G. Montavon, A. Binder, S. Lapuschkin, W. Samek, and K.-R. Müller, “Layer-wise relevance propagation: an overview,” Explainable AI: interpreting, explaining and visualizing deep learning, pp. 193–209, 2019.
  196. B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, “Learning deep features for discriminative localization,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2921–2929.
  197. M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why should i trust you?’ Explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016, pp. 1135–1144.
  198. C. Rudin, “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead,” Nature machine intelligence, vol. 1, no. 5, pp. 206–215, 2019.
  199. D. Alvarez Melis and T. Jaakkola, “Towards robust interpretability with self-explaining neural networks,” Advances in neural information processing systems, vol. 31, 2018.
  200. G. M. Stegmann, S. Hahn, C. J. Duncan, S. B. Rutkove, J. Liss, J. M. Shefner, and V. Berisha, “Estimation of forced vital capacity using speech acoustics in patients with als,” Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration, vol. 22, no. sup1, pp. 14–21, 2021.
  201. G. Stegmann, S. Hahn, S. Bhandari, K. Kawabata, J. Shefner, C. J. Duncan, J. Liss, V. Berisha, and K. Mueller, “Automated semantic relevance as an indicator of cognitive decline: Out-of-sample validation on a large-scale longitudinal dataset,” Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring, vol. 14, no. 1, p. e12294, 2022.
  202. T. B. Holmlund, C. Chandler, P. W. Foltz, A. S. Cohen, J. Cheng, J. C. Bernstein, E. P. Rosenfeld, and B. Elvevåg, “Applying speech technologies to assess verbal memory in patients with serious mental illness,” NPJ digital medicine, vol. 3, no. 1, p. 33, 2020.
  203. L. Semenova, C. Rudin, and R. Parr, “On the existence of simpler machine learning models,” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, pp. 1827–1858.
  204. J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “Gpt-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.
  205. H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023.
  206. H. Nori, N. King, S. M. McKinney, D. Carignan, and E. Horvitz, “Capabilities of gpt-4 on medical challenge problems,” arXiv preprint arXiv:2303.13375, 2023.
  207. J. Kasai, Y. Kasai, K. Sakaguchi, Y. Yamada, and D. Radev, “Evaluating gpt-4 and chatgpt on japanese medical licensing examinations,” arXiv preprint arXiv:2303.18027, 2023.
  208. M. Rosoł, J. S. Gasior, J. Łaba, K. Korzeniewski, and M. Młyńczak, “Evaluation of the performance of gpt-3.5 and gpt-4 on the polish medical final examination,” Scientific Reports, vol. 13, no. 1, p. 20512, 2023.
  209. Q. Jin, F. Chen, Y. Zhou, Z. Xu, J. M. Cheung, R. Chen, R. M. Summers, J. F. Rousseau, P. Ni, M. J. Landsman et al., “Hidden flaws behind expert-level accuracy of multimodal gpt-4 vision in medicine,” ArXiv, pp. arXiv–2401, 2024.
  210. M. Bektaş, J. K. Pereira, F. Daams, and D. L. van der Peet, “Chatgpt in surgery: a revolutionary innovation?” Surgery today, vol. 54, no. 8, pp. 964–971, 2024.
  211. M. A. Fink, A. Bischoff, C. A. Fink, M. Moll, J. Kroschke, L. Dulz, C. P. Heußel, H.-U. Kauczor, and T. F. Weber, “Potential of chatgpt and gpt-4 for data mining of free-text ct reports on lung cancer,” Radiology, vol. 308, no. 3, p. e231362, 2023.
  212. C. Wang, S. Liu, A. Li, and J. Liu, “Text dialogue analysis for primary screening of mild cognitive impairment: Development and validation study,” Journal of Medical Internet Research, vol. 25, p. e51501, 2023.
  213. X. Fei, Y. Tang, J. Zhang, Z. Zhou, I. Yamamoto, and Y. Zhang, “Evaluating cognitive performance: Traditional methods vs. chatgpt,” Digital Health, vol. 10, p. 20552076241264639, 2024.
  214. C. Botelho, J. Mendonça, A. Pompili, T. Schultz, A. Abad, and I. Trancoso, “Macro-descriptors for alzheimer’s disease detection using large language models,” in Proc. Interspeech, 2024, pp. 1975–1979.
  215. V. Hristidis, N. Ruggiano, E. L. Brown, S. R. R. Ganta, and S. Stewart, “Chatgpt vs google for queries related to dementia and other cognitive decline: comparison of results,” Journal of Medical Internet Research, vol. 25, p. e48966, 2023.
  216. Z. Wang, R. Li, B. Dong, J. Wang, X. Li, N. Liu, C. Mao, W. Zhang, L. Dong, J. Gao et al., “Can llms like gpt-4 outperform traditional ai tools in dementia diagnosis? maybe, but not today,” arXiv preprint arXiv:2306.01499, 2023.
  217. R. H. Perlis, “Application of gpt-4 to select next-step antidepressant treatment in major depression,” MedRxiv, 2023.
  218. E. Mohamad, C. Boutoleau-Bretonnière, and G. Chapelet, “Chatgpt’s dance with neuropsychological data: a case study in alzheimer’s disease,” Ageing Research Reviews, p. 102117, 2023.
  219. U.S. Food and Drug Administration, “Artificial intelligence and machine learning (ai/ml)-enabled medical devices,” AI/ML-Enabled Medical Devices, 2022.
  220. D. Vela, A. Sharp, R. Zhang, T. Nguyen, A. Hoang, and O. S. Pianykh, “Temporal quality degradation in ai models,” Scientific Reports, vol. 12, no. 1, p. 11654, 2022.
  221. A. Wong, J. Cao, P. G. Lyons, S. Dutta, V. J. Major, E. Ötleş, and K. Singh, “Quantification of sepsis model alerts in 24 us hospitals before and during the covid-19 pandemic,” JAMA Network Open, vol. 4, no. 11, p. e2135286, 2021.
  222. B. Wang, M. Dohopolski, T. Bai, J. Wu, R. Hannan, N. Desai, A. Garant, D. Yang, D. Nguyen, M.-H. Lin et al., “Performance deterioration of deep learning models after clinical deployment: a case study with auto-segmentation for definitive prostate cancer radiotherapy,” Machine Learning: Science and Technology, vol. 5, no. 2, p. 025077, 2024.
  223. J. Cao, A. Ganesh, J. Cai, R. Southwell, E. M. Perkoff, M. Regan, K. Kann, J. H. Martin, M. Palmer, and S. D’Mello, “A comparative analysis of automatic speech recognition errors in small group classroom discourse,” in Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization, 2023, pp. 250–262.
  224. A. Vaidya, R. J. Chen, D. F. Williamson, A. H. Song, G. Jaume, Y. Yang, T. Hartvigsen, E. C. Dyer, M. Y. Lu, J. Lipkova et al., “Demographic bias in misdiagnosis by computational pathology models,” Nature Medicine, vol. 30, no. 4, pp. 1174–1190, 2024.
  225. I. Straw and C. Callison-Burch, “Artificial intelligence in mental health and the biases of language based models,” PloS one, vol. 15, no. 12, p. e0240376, 2020.
  226. C. Lu, K. Chang, P. Singh, S. Pomerantz, S. Doyle, S. Kakarmath, C. Bridge, and J. Kalpathy-Cramer, “Deploying clinical machine learning? consider the following…” arXiv preprint arXiv:2109.06919, 2021.
  227. S. Yu, X. Wang, and J. C. Príncipe, “Request-and-reverify: Hierarchical hypothesis testing for concept drift detection with expensive labels,” pp. 3033–3039, 2018.
  228. T. Ginart, M. J. Zhang, and J. Zou, “Mldemon: Deployment monitoring for machine learning systems,” in International conference on artificial intelligence and statistics.   PMLR, 2022, pp. 3962–3997.
  229. Y. Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, J. Dillon, B. Lakshminarayanan, and J. Snoek, “Can you trust your model’s uncertainty? evaluating predictive uncertainty under dataset shift,” Advances in neural information processing systems, vol. 32, 2019.
  230. L. M. Koch, C. F. Baumgartner, and P. Berens, “Distribution shift detection for the postmarket surveillance of medical ai algorithms: a retrospective simulation study,” NPJ Digital Medicine, vol. 7, no. 1, p. 120, 2024.
  231. W. Ma, C. Chen, S. Zheng, J. Qin, H. Zhang, and Q. Dou, “Test-time adaptation with calibration of medical image classification nets for label distribution shift,” in International Conference on Medical Image Computing and Computer-Assisted Intervention.   Springer, 2022, pp. 313–323.
  232. S. E. Davis, R. A. Greevy Jr, T. A. Lasko, C. G. Walsh, and M. E. Matheny, “Detection of calibration drift in clinical prediction models to inform model updating,” Journal of biomedical informatics, vol. 112, p. 103611, 2020.
  233. A. Leschanowsky and S. Das, “Examining the interplay between privacy and fairness for speech processing: A review and perspective,” in 4th Symposium on Security and Privacy in Speech Communication, 2024, pp. 1–11.
  234. M. P. Gelfer and V. A. Mikos, “The relative contributions of speaking fundamental frequency and formant frequencies to gender identification based on isolated vowels,” Journal of voice, vol. 19, no. 4, pp. 544–554, 2005.
  235. U. Reubold, J. Harrington, and F. Kleber, “Vocal aging effects on f0 and the first formant: A longitudinal analysis in adult speakers,” Speech communication, vol. 52, no. 7-8, pp. 638–651, 2010.
  236. M. Berg, M. Fuchs, K. Wirkner, M. Loeffler, C. Engel, and T. Berger, “The speaking voice in the general population: normative data and associations to sociodemographic and lifestyle factors,” Journal of Voice, vol. 31, no. 2, pp. 257–e13, 2017.
  237. C. Bertelsen, S. Zhou, E. R. Hapner, and M. M. Johns, “Sociodemographic characteristics and treatment response among aging adults with voice disorders in the united states,” JAMA Otolaryngology–Head & Neck Surgery, vol. 144, no. 8, pp. 719–726, 2018.
  238. C. G. Clopper and R. Smiljanic, “Effects of gender and regional dialect on prosodic patterns in american english,” Journal of phonetics, vol. 39, no. 2, pp. 237–245, 2011.
  239. S. Feng, B. M. Halpern, O. Kudina, and O. Scharenborg, “Towards inclusive automatic speech recognition,” Computer Speech & Language, vol. 84, p. 101567, 2024.
  240. W. T. Hutiri and A. Y. Ding, “Bias in automated speaker recognition,” in Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 2022, pp. 230–247.
  241. A. Kaushal, R. Altman, and C. Langlotz, “Geographic distribution of us cohorts used to train deep learning algorithms,” Jama, vol. 324, no. 12, pp. 1212–1213, 2020.
  242. T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz et al., “Huggingface’s transformers: State-of-the-art natural language processing,” arXiv preprint arXiv:1910.03771, 2019.
  243. K. Li, C. Baird, and D. Lin, “Defend data poisoning attacks on voice authentication,” IEEE Transactions on Dependable and Secure Computing, 2023.
  244. D. Oliynyk, R. Mayer, and A. Rauber, “I know what you trained last summer: A survey on stealing machine learning models and defences,” ACM Computing Surveys, vol. 55, no. 14s, pp. 1–41, 2023.
  245. L. Verde, F. Marulli, and S. Marrone, “Exploring the impact of data poisoning attacks on machine learning model reliability,” Procedia Computer Science, vol. 192, pp. 2624–2632, 2021, Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 25th International Conference KES2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1877050921017695
  246. H. Kwon, Y. Kim, H. Yoon, and D. Choi, “Selective audio adversarial example in evasion attack on speech recognition system,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 526–538, 2019.
  247. P. Żelasko, S. Joshi, Y. Shao, J. Villalba, J. Trmal, N. Dehak, and S. Khudanpur, “Adversarial attacks and defenses for speech recognition systems,” arXiv preprint arXiv:2103.17122, 2021.
  248. K. Pizzi, F. Boenisch, U. Sahin, and K. Böttinger, “Introducing model inversion attacks on automatic speaker recognition,” arXiv preprint arXiv:2301.03206, 2023.
  249. M. A. Shah, J. Szurley, M. Mueller, A. Mouchtaris, and J. Droppo, “Evaluating the Vulnerability of End-to-End Automatic Speech Recognition Models to Membership Inference Attacks,” in Proc. Interspeech, 2021, pp. 891–895.
  250. N. Tomashenko, X. Wang, E. Vincent, J. Patino, B. M. L. Srivastava, P.-G. Noé, A. Nautsch, N. Evans, J. Yamagishi, B. O’Brien, A. Chanclu, J.-F. Bonastre, M. Todisco, and M. Maouche, “The voiceprivacy 2020 challenge: Results and findings,” Computer Speech & Language, vol. 74, p. 101362, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0885230822000080
  251. M. U. Rahman, M. Larson, L. ten Bosch, and C. Tejedor-García, “Scenario of use scheme: Threat modelling for speaker privacy protection in the medical domain,” in 4th Symposium on Security and Privacy in Speech Communication, 2024, pp. 21–25.
  252. S. Ghosh, M. Jouaiti, A. Das, Y. Sinha, T. Polzehl, I. Siegert, and S. Stober, “Anonymising elderly and pathological speech: Voice conversion using ddsp and query-by-example,” in Proc. Interspeech, 2024, pp. 4438–4442.
  253. R. Aloufi, H. Haddadi, and D. Boyle, “Privacy-preserving voice analysis via disentangled representations,” in Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop, ser. CCSW’20.   New York, NY, USA: Association for Computing Machinery, 2020, p. 1–14. [Online]. Available: https://doi.org/10.1145/3411495.3421355
  254. A. H. Orabi, P. Buddhitha, M. H. Orabi, and D. Inkpen, “Deep learning for depression detection of twitter users,” in Proceedings of the fifth workshop on computational linguistics and clinical psychology: from keyboard to clinic, 2018, pp. 88–97.
  255. U. Petti, S. Baker, A. Korhonen, and J. Robin, “How much speech data is needed for tracking language change in alzheimer’s disease? a comparison of random length, 5-min, and 1-min spontaneous speech samples,” Digital Biomarkers, vol. 7, no. 1, pp. 157–166, 2023.
  256. B. M. Halpern, S. Feng, R. van Son, M. van den Brekel, and O. Scharenborg, “Automatic evaluation of spontaneous oral cancer speech using ratings from naive listeners,” Speech Communication, vol. 149, pp. 84–97, 2023.
