Deep HRTF Encoding & Interpolation: Exploring Spatial Correlations using Convolutional Neural Networks

Devansh Zurale, Shahrokh Yadegari, Shlomo Dubnov

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Scopus citations

Abstract

With the advancement of deep learning technologies, computers today achieve remarkable success in several domains involving images and audio. One area of 3D audio where deep learning is promising is binaural sound localization for headphones, which requires individualized and accurate representations of the filtering effects of a listener's anthropometric features. Such filters are often stored as a set of Head Related Impulse Responses (HRIRs) or as their frequency-domain representations, Head Related Transfer Functions (HRTFs), for specific individuals. A challenge in applying deep learning networks in this area is the scarcity of complete and accurate HRTF datasets, which is known to cause networks to over-fit the training data easily. As opposed to images, where the correlations between pixels are more statistical, the correlations that HRTFs share in space are expected to be more a function of the body and pinna reflections. We hypothesize that these spatial correlations between the elements of an HRTF set could be learned using Deep Convolutional Neural Networks (DCNNs). In this work, we first present a CNN-based auto-encoding strategy for HRTF encoding, and then we use the learned auto-encoder to provide an alternative solution for interpolating HRTFs from a sparse distribution of HRTFs in space. We thereby conclude that DCNNs are capable of achieving results comparable to those of other, non-deep-learning approaches, in spite of using only a few tens of data points.
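The abstract notes that an HRTF is the frequency-domain representation of an HRIR. As a minimal sketch of that relationship only (not the authors' pipeline), the following computes a one-sided discrete Fourier transform of a single impulse response using the Python standard library. The function names `hrir_to_hrtf` and `magnitude_db` and the toy impulse response are illustrative assumptions; real HRIRs would come from a measured dataset.

```python
import cmath
import math


def hrir_to_hrtf(hrir):
    """Return the one-sided DFT (bins 0..N/2) of a real impulse response.

    Hypothetical helper for illustration; a direct O(N^2) DFT is used so the
    example needs no third-party packages.
    """
    n = len(hrir)
    return [
        sum(hrir[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def magnitude_db(hrtf, floor=1e-12):
    """Convert complex frequency bins to magnitude in dB, guarding log(0)."""
    return [20.0 * math.log10(max(abs(h), floor)) for h in hrtf]


# Toy HRIR (made-up values, not measured data): a unit impulse delayed by
# two samples. A pure delay has a flat magnitude response and linear phase.
toy_hrir = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
toy_hrtf = hrir_to_hrtf(toy_hrir)
toy_mag_db = magnitude_db(toy_hrtf)
```

For measured data, a library FFT would normally replace the direct sum; the plain-Python version is kept here only so the sketch stays self-contained.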

Original language: English
Title of host publication: SMC/JIM/IFC 2022 - Proceedings of the 19th Sound and Music Computing Conference
Editors: Romain Michon, Laurent Pottier, Yann Orlarey
Publisher: Sound and Music Computing Network
Pages: 350-357
Number of pages: 8
ISBN (Electronic): 9782958412609
State: Published - 1 Jan 2022
Externally published: Yes
Event: 19th Sound and Music Computing Conference, SMC 2022 - Saint-Etienne, France
Duration: 5 Jun 2022 to 12 Jun 2022

Publication series

Name: Proceedings of the Sound and Music Computing Conferences
ISSN (Electronic): 2518-3672

Conference

Conference: 19th Sound and Music Computing Conference, SMC 2022
Country/Territory: France
City: Saint-Etienne
Period: 5/06/22 to 12/06/22

ASJC Scopus subject areas

  • Music
  • Computer Science Applications
  • Media Technology
