Neural-Network-Based Direction-of-Arrival Estimation for Reverberant Speech - The Importance of Energetic, Temporal, and Spatial Information

Orel Ben Zaken, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Direction-of-arrival (DOA) estimation is a fundamental task in audio signal processing that becomes difficult in real-world environments because of reverberation. To address this difficulty, Direct-Path Dominance (DPD) tests have been proposed as an effective way to detect time-frequency (TF) bins dominated by the direct sound, which carry accurate DOA information. These tests have proven particularly efficient with spherical arrays. Methods based on neural networks (NNs) have also been developed to estimate the DOA, but they have limitations: they typically require a large training database, and insight into how the system operates is often lacking. This work proposes two novel DPD-test methods based on a model-based deep learning approach that combines the original DPD-test model with a data-driven system. This makes it possible to preserve the robustness of the original DPD test across acoustic environments while using a data-driven approach to better extract information about the direct sound, thereby enhancing the original method's performance. In particular, the paper investigates how energetic, temporal, and spatial information contribute to identifying TF bins dominated by the direct signal. The proposed methods are trained on simulated data of a single sound source in a room and evaluated on both simulated and real data. The results show that energetic and temporal information, which have not been considered in previous works, provide new information about the direct sound and can improve DPD-test performance.
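The abstract describes the original DPD test only in words. As a rough illustration, the sketch below implements the classical rank-one formulation as it is commonly described: for each TF bin, a local spatial correlation matrix is averaged over a neighbourhood of bins, and the bin is flagged as direct-path dominated when the ratio of its two largest singular values exceeds a threshold. This is an assumption-laden simplification, not the paper's method. It operates on a generic multichannel STFT rather than on spherical-harmonic-domain signals, and the function name `dpd_test`, the neighbourhood sizes, and the threshold value are hypothetical choices for illustration. The paper's proposed methods combine this kind of model-based test with a data-driven (NN) system, which is not reproduced here.

```python
import numpy as np


def dpd_test(stft, threshold=4.0, freq_ctx=2, time_ctx=2):
    """Simplified rank-one DPD test (illustrative sketch, hypothetical parameters).

    stft: complex array of shape (channels, freqs, frames).
    Returns a boolean mask of shape (freqs, frames) that is True where the
    local spatial correlation matrix is approximately rank one.
    """
    n_ch, n_f, n_t = stft.shape
    mask = np.zeros((n_f, n_t), dtype=bool)
    for f in range(n_f):
        # Frequency neighbourhood, clipped at the spectrum edges.
        f0, f1 = max(0, f - freq_ctx), min(n_f, f + freq_ctx + 1)
        for t in range(n_t):
            # Time neighbourhood, clipped at the signal edges.
            t0, t1 = max(0, t - time_ctx), min(n_t, t + time_ctx + 1)
            # Stack neighbouring TF observations as columns: (channels, J).
            x = stft[:, f0:f1, t0:t1].reshape(n_ch, -1)
            # Local spatial correlation matrix R = X X^H / J.
            r = x @ x.conj().T / x.shape[1]
            # Singular values are returned in descending order.
            s = np.linalg.svd(r, compute_uv=False)
            # Rank-one test: sigma_1 / sigma_2 >= threshold (guarded against sigma_2 = 0).
            mask[f, t] = s[0] >= threshold * max(s[1], np.finfo(float).tiny)
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 16, 20)  # 4 channels, 16 frequencies, 20 frames (arbitrary)

    def complex_noise(size):
        return rng.standard_normal(size) + 1j * rng.standard_normal(size)

    # A single plane wave: the same steering vector for every TF bin.
    steering = complex_noise(shape[0])
    source = complex_noise(shape[1:])
    direct = steering[:, None, None] * source[None, :, :] + 1e-3 * complex_noise(shape)

    # Spatially uncorrelated noise, a crude stand-in for diffuse reverberation.
    diffuse = complex_noise(shape)

    print("direct-dominated fraction (plane wave):", dpd_test(direct).mean())
    print("direct-dominated fraction (diffuse):", dpd_test(diffuse).mean())
```

A single plane wave yields a near rank-one correlation matrix in every neighbourhood, so nearly all bins pass. Spatially uncorrelated noise spreads energy across several singular values, so few bins pass. In practice, the DOA would then be estimated only from the bins that pass the test.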

Original language: English
Pages (from-to): 1298-1309
Number of pages: 12
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 32
State: Published - 1 Jan 2024

Keywords

  • Speaker localization
  • deep learning
  • direction-of-arrival (DOA)
  • long short-term memory (LSTM)
  • machine learning
  • multilayer perceptron (MLP)
  • neural network (NN)
  • spherical arrays

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
