Abstract
In this work, we compared two input approaches for estimating autism severity from speech signals. We analyzed 127 audio recordings of young children obtained during administration of the Autism Diagnostic Observation Schedule, 2nd edition (ADOS-2). Two sets of features were extracted from each recording: 1) hand-crafted features, comprising acoustic and prosodic descriptors, and 2) log-mel spectrograms, which provide a time-frequency representation of the signal. We examined two Convolutional Neural Network (CNN) architectures for each of the two inputs and compared their autism severity estimation performance. We showed that the hand-crafted features yielded a lower prediction error (normalized RMSE) than the log-mel spectrograms in most of the examined configurations. Moreover, fusing the autism severity scores estimated from the two feature extraction methods yielded the best results, with both architectures exhibiting similar performance (Pearson R = 0.66, normalized RMSE = 0.24).
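As a rough illustration of the two input types and the late fusion described above, the sketch below computes a log-mel spectrogram and a few acoustic/prosodic descriptors with librosa, then averages two severity estimates. All parameter values (sampling rate, number of mel bands), the choice of descriptors, and the averaging fusion rule are illustrative assumptions; the paper's actual feature set, spectrogram settings, CNN architectures, and fusion method are not specified in this abstract.

```python
# Minimal sketch of the two inputs and score fusion; all settings are
# assumptions, not the paper's actual configuration.
import librosa
import numpy as np

def log_mel_spectrogram(wav_path, sr=16000, n_mels=64):
    """Time-frequency representation of the kind fed to a spectrogram CNN."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, n_frames)

def handcrafted_features(wav_path, sr=16000):
    """A few acoustic/prosodic descriptors as stand-ins for the paper's
    hand-crafted feature set (which is not detailed in the abstract)."""
    y, sr = librosa.load(wav_path, sr=sr)
    # pyin returns NaN for unvoiced frames, hence the nan-aware statistics.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"))
    rms = librosa.feature.rms(y=y)
    return np.array([
        np.nanmean(f0),       # mean pitch (prosodic)
        np.nanstd(f0),        # pitch variability (prosodic)
        float(np.mean(rms)),  # mean energy (acoustic)
    ])

def fuse_scores(score_handcrafted, score_spectrogram):
    """Late fusion of the two severity estimates; the abstract does not
    state the fusion rule, so simple averaging is assumed here."""
    return 0.5 * (score_handcrafted + score_spectrogram)
```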
| Original language | English |
| --- | --- |
| Pages (from-to) | 4154-4158 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2023-August |
| DOIs | |
| State | Published - 1 Jan 2023 |
| Event | 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland. Duration: 20 Aug 2023 → 24 Aug 2023 |
Keywords
- ADOS
- CNN
- audio
- autism
- features
- severity estimation
- spectrogram
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation