Abstract
High-fidelity spatial audio often performs better when produced using a personalized head-related transfer function (HRTF). However, the direct acquisition of HRTFs is cumbersome and requires specialized equipment. Thus, many personalization methods estimate HRTF features from easily obtained anthropometric features of the pinna, head, and torso. The first HRTF notch frequency (N1) is known to be a dominant feature in elevation localization and is thus a useful feature for HRTF personalization. This paper describes the prediction of N1 frequency from pinna anthropometry using a neural model. Prediction is performed separately on three databases, both simulated and measured, and then by domain mixing between the databases. The model successfully predicts N1 frequency for individual databases and by domain mixing between some databases. Prediction errors are better than or comparable to those previously reported, showing significant improvement when acquired over a large database and with a larger output range.
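As a rough illustration of the kind of regression the abstract describes, the following is a minimal sketch and not the paper's model. It trains a small one-hidden-layer network to map pinna measurements to an N1 frequency, then evaluates it on held-out subjects after pooling two databases. Everything here is an assumption made for illustration: the three input features, the synthetic linear relationship, the database offset, the network size, and the helper names `make_synthetic_db` and `train_mlp`.

```python
import numpy as np

def make_synthetic_db(n, offset_hz, rng):
    """Hypothetical stand-in for one HRTF database: 3 pinna measurements (cm) -> N1 (Hz)."""
    X = rng.uniform(low=[1.0, 0.5, 5.0], high=[2.5, 1.5, 7.5], size=(n, 3))
    # Invented relationship, for illustration only: larger dimensions -> lower notch.
    y = 12000.0 - 1500.0 * X[:, 0] - 800.0 * X[:, 1] - 300.0 * X[:, 2] + offset_hz
    y = y + rng.normal(0.0, 100.0, size=n)  # measurement noise in Hz
    return X, y

def train_mlp(X, y, hidden=16, epochs=2000, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network with full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    x_mu, x_sd = X.mean(axis=0), X.std(axis=0)
    y_mu, y_sd = y.mean(), y.std()
    Xn, yn = (X - x_mu) / x_sd, (y - y_mu) / y_sd  # standardize inputs and target
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(Xn @ W1 + b1)                 # hidden activations
        pred = (h @ W2 + b2).ravel()              # standardized N1 estimate
        g_pred = (2.0 / len(yn)) * (pred - yn)[:, None]  # dMSE/dpred
        g_W2, g_b2 = h.T @ g_pred, g_pred.sum(axis=0)
        g_h = (g_pred @ W2.T) * (1.0 - h ** 2)    # backprop through tanh
        g_W1, g_b1 = Xn.T @ g_h, g_h.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def predict(X_new):
        h = np.tanh(((X_new - x_mu) / x_sd) @ W1 + b1)
        return (h @ W2 + b2).ravel() * y_sd + y_mu  # back to Hz

    return predict

rng = np.random.default_rng(42)
Xa, ya = make_synthetic_db(200, offset_hz=0.0, rng=rng)    # hypothetical database A
Xb, yb = make_synthetic_db(200, offset_hz=200.0, rng=rng)  # hypothetical database B

# Domain mixing in its simplest form: pool both databases, hold out 25% of each.
X_train = np.vstack([Xa[:150], Xb[:150]])
y_train = np.concatenate([ya[:150], yb[:150]])
X_test = np.vstack([Xa[150:], Xb[150:]])
y_test = np.concatenate([ya[150:], yb[150:]])

predict = train_mlp(X_train, y_train)
mae_model = np.mean(np.abs(predict(X_test) - y_test))
mae_baseline = np.mean(np.abs(y_train.mean() - y_test))  # always predict the training mean
print(f"MAE (network): {mae_model:.0f} Hz, MAE (mean baseline): {mae_baseline:.0f} Hz")
```

Comparing against a predict-the-mean baseline is one simple way to check that the anthropometric inputs carry usable information; how the paper itself defines and reports prediction error is not stated in the abstract.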
Original language | English |
---|---|
Title of host publication | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Publisher | IEEE DataPort |
Pages | 816-820 |
Number of pages | 5 |
ISBN (Print) | 979-8-3503-4486-8 |
DOIs | |
State | Published - 19 Apr 2024 |
Event | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Seoul, Republic of Korea. Duration: 14 Apr 2024 → 19 Apr 2024 |
Keywords
- Torso
- Location awareness
- Databases
- Spatial audio
- Neural networks
- Transfer functions
- Predictive models