
Loss functions incorporating auditory spatial perception in deep learning – a review

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

Binaural reproduction aims to deliver immersive spatial audio with high perceptual realism over headphones. Loss functions play a central role in optimizing and evaluating algorithms that generate binaural signals. However, traditional signal-related difference measures often fail to capture the perceptual properties that are essential to spatial audio quality. This review paper surveys recent loss functions that incorporate spatial perception cues relevant to binaural reproduction. It focuses on losses applied to binaural signals, which are often derived from microphone recordings or Ambisonics signals, while excluding those based on room impulse responses. Guided by the Spatial Audio Quality Inventory (SAQI), the review emphasizes perceptual dimensions related to source localization and room response, while excluding general spectral–temporal attributes. The literature survey reveals a strong focus on localization cues, such as interaural time and level differences (ITDs, ILDs), while reverberation and other room acoustic attributes remain less explored in loss function design. Recent works that estimate room acoustic parameters and develop embeddings that capture room characteristics indicate their potential for future integration into neural network training. The paper concludes by highlighting future research directions toward more perceptually grounded loss functions that better capture the listener's spatial experience.
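To make the localization cues named in the abstract concrete, the following is a minimal sketch of how a broadband ILD and ITD might be computed from a two-channel signal and combined into a cue-matching penalty. It is an illustration under stated assumptions, not a loss taken from the reviewed literature: the function names (`ild_db`, `itd_samples`, `spatial_cue_loss`), the weights, and the broadband (non-frequency-dependent) formulation are all hypothetical. The arg-max ITD estimate used here is not differentiable, so a practical training loss would likely need a differentiable surrogate.

```python
import math

def ild_db(left, right, eps=1e-12):
    """Broadband interaural level difference in dB (left energy relative to right)."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))

def itd_samples(left, right, max_lag):
    """Lag in samples that maximizes the interaural cross-correlation.

    A positive value means the right channel lags the left channel.
    """
    n = len(left)
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        val = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def spatial_cue_loss(est, ref, max_lag=8, w_ild=1.0, w_itd=1.0):
    """Weighted absolute ILD and ITD mismatch between two (left, right) pairs.

    The weights w_ild and w_itd are assumptions chosen for illustration only.
    """
    est_l, est_r = est
    ref_l, ref_r = ref
    ild_err = abs(ild_db(est_l, est_r) - ild_db(ref_l, ref_r))
    itd_err = abs(itd_samples(est_l, est_r, max_lag) - itd_samples(ref_l, ref_r, max_lag))
    return w_ild * ild_err + w_itd * itd_err
```

A pair of signals with identical interaural cues yields a penalty of zero under this sketch, while a shift in the interaural delay or level balance increases it, independently of any sample-wise waveform difference.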

Original language: English
Title of host publication: 2025 Immersive and 3D Audio
Subtitle of host publication: from Architecture to Automotive, I3DA 2025
Publisher: Institute of Electrical and Electronics Engineers
ISBN (Electronic): 9798331558284
State: Published - 1 Jan 2025
Event: 2025 Immersive and 3D Audio: from Architecture to Automotive, I3DA 2025 - Bologna, Italy
Duration: 10 Sep 2025 to 12 Sep 2025

Publication series

Name: 2025 Immersive and 3D Audio: from Architecture to Automotive, I3DA 2025

Conference

Conference: 2025 Immersive and 3D Audio: from Architecture to Automotive, I3DA 2025
Country/Territory: Italy
City: Bologna
Period: 10/09/25 to 12/09/25

Keywords

  • Spatial audio
  • audio signal processing
  • deep learning
  • machine learning
  • perceptual loss
  • spatial perception

ASJC Scopus subject areas

  • Automotive Engineering
  • Media Technology
  • Architecture
  • Acoustics and Ultrasonics
  • Instrumentation
