Classifying room volume from reverberant speech signals can be useful in acoustic scene analysis applications such as forensic audio, rescue and security. Previous work explicitly required knowledge of the room impulse response (RIR) to either estimate or classify the room volume. In this work, different approaches to extracting room-volume features from speech rather than from the RIR are investigated. These include using abrupt stops in speech, dereverberation, extracting room-volume features directly from speech, and using speech recognition features such as mel-frequency cepstral coefficients (MFCCs). The room volume is classified using a pattern recognition-based system in which the room-volume features are chosen by a feature selection algorithm. Three experimental studies are presented, using (1) speech convolved with simulated RIRs, (2) speech convolved with measured RIRs and (3) speech recorded in reverberant environments. In all three studies, the approach of using abrupt stops in speech outperforms all other approaches. The results also show that the classification of room volume is affected mainly by room-volume features calculated from the low-frequency bands of the room transfer function, in addition to reverberation time or early decay time.
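For readers unfamiliar with the two decay measures named above, the following is a minimal sketch of how reverberation time and early decay time are conventionally obtained from an RIR by Schroeder backward integration. It is not the speech-based feature extraction studied in this work. The synthetic RIR (random-sign samples under an exponential envelope), the sampling rate, and all function names are assumptions made purely for illustration.

```python
import math
import random


def synthetic_rir(rt60, fs, duration, seed=0):
    """Hypothetical test RIR: random-sign samples under an exponential envelope."""
    rng = random.Random(seed)
    # Amplitude falls by 60 dB (a factor of 10**-3) after rt60 seconds.
    k = 3.0 * math.log(10.0) / (rt60 * fs)
    return [rng.choice((-1.0, 1.0)) * math.exp(-k * n)
            for n in range(int(duration * fs))]


def schroeder_decay_db(rir):
    """Energy decay curve in dB (0 dB at t = 0) via backward integration."""
    edc = [0.0] * len(rir)
    running = 0.0
    for i in range(len(rir) - 1, -1, -1):
        running += rir[i] * rir[i]
        edc[i] = running
    total = edc[0]
    # Only trailing zero-energy samples (if any) are dropped here.
    return [10.0 * math.log10(e / total) for e in edc if e > 0.0]


def decay_time(decay_db, fs, upper_db, lower_db):
    """Least-squares line fit between two decay levels, extrapolated to 60 dB."""
    pts = [(i / fs, d) for i, d in enumerate(decay_db)
           if lower_db <= d <= upper_db]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_d = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - mean_t) * (d - mean_d) for t, d in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))
    return -60.0 / slope  # seconds needed for a 60 dB drop


fs = 8000  # assumed sampling rate in Hz
rir = synthetic_rir(rt60=0.5, fs=fs, duration=1.0)
decay = schroeder_decay_db(rir)

edt = decay_time(decay, fs, 0.0, -10.0)   # early decay time: 0 to -10 dB
t30 = decay_time(decay, fs, -5.0, -35.0)  # reverberation time: -5 to -35 dB
print(f"EDT = {edt:.3f} s, T30-based RT = {t30:.3f} s")
```

Because the synthetic envelope is a single exponential, both measures recover roughly the same value here. In a real room the early and late parts of the decay can differ, which is why the two quantities are reported separately.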
ASJC Scopus subject areas
- Acoustics and Ultrasonics