TY - GEN
T1 - BINAURAL SOUND SOURCE LOCALIZATION USING A HYBRID TIME AND FREQUENCY DOMAIN MODEL
AU - Geva, Gil
AU - Warusfel, Olivier
AU - Dubnov, Shlomo
AU - Dubnov, Tammuz
AU - Amedi, Amir
AU - Hel-Or, Yacov
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - This paper introduces a new approach to sound source localization based on head-related transfer function (HRTF) characteristics, which enable precise full-sphere localization from raw data. While previous research focused primarily on extensive microphone arrays in the frontal plane, such arrangements often suffer from limited accuracy and robustness when the arrays are small. We propose a model that uses both the time and frequency domains for sound source localization within a deep learning (DL) framework. The performance of our proposed model surpasses the current state-of-the-art results: it achieves an average angular error of 0.24° and an average Euclidean distance of 0.01 meters, whereas the previous state of the art reports an average angular error of 19.07° and an average Euclidean distance of 1.08 meters. This level of accuracy is of paramount importance for a wide range of applications, including robotics, virtual reality, and aiding individuals with cochlear implants (CI).
AB - This paper introduces a new approach to sound source localization based on head-related transfer function (HRTF) characteristics, which enable precise full-sphere localization from raw data. While previous research focused primarily on extensive microphone arrays in the frontal plane, such arrangements often suffer from limited accuracy and robustness when the arrays are small. We propose a model that uses both the time and frequency domains for sound source localization within a deep learning (DL) framework. The performance of our proposed model surpasses the current state-of-the-art results: it achieves an average angular error of 0.24° and an average Euclidean distance of 0.01 meters, whereas the previous state of the art reports an average angular error of 19.07° and an average Euclidean distance of 1.08 meters. This level of accuracy is of paramount importance for a wide range of applications, including robotics, virtual reality, and aiding individuals with cochlear implants (CI).
KW - Binaural
KW - Deep-Learning
KW - Head-related transfer function
KW - Sound source localization
UR - http://www.scopus.com/inward/record.url?scp=85195383099&partnerID=8YFLogxK
U2 - 10.1109/ICASSP48485.2024.10448005
DO - 10.1109/ICASSP48485.2024.10448005
M3 - Conference contribution
AN - SCOPUS:85195383099
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 8821
EP - 8825
BT - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers
T2 - 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Y2 - 14 April 2024 through 19 April 2024
ER -