Group Conversations in Noisy Environments (GiN) - Multimedia Recordings for Location-Aware Speech Enhancement

Emilie D'Olne, Alastair H. Moore, Patrick A. Naylor, Jacob Donley, Vladimir Tourbabin, Thomas Lunner

Research output: Contribution to journal, Article, peer-review

Abstract

Recent years have seen a growing interest in the use of smart glasses mounted with microphones to solve the cocktail party problem using beamforming techniques or machine learning. Many such approaches could bring substantial advances in hearing aid or Augmented Reality (AR) research. To validate these methods, the EasyCom [Donley et al., 2021] dataset introduced high-quality multi-modal recordings of conversations in noise, including egocentric multi-channel microphone array audio, speech source pose, and headset microphone audio. While providing comprehensive data, EasyCom lacks diversity in the acoustic environments considered and the degree of overlapping speech in conversations. This work therefore presents the Group in Noise (GiN) dataset of over 2 hours of group conversations in noisy environments recorded using binaural microphones and a pair of glasses mounted with 5 microphones. The recordings took place in 3 rooms and contain 6 seated participants as well as a standing facilitator. The data also include close-talking microphone audio and head-pose data for each speaker, an audio channel from a fixed reference microphone, and automatically annotated speaker activity information. A baseline method is used to demonstrate the use of the data for speech enhancement. The dataset is publicly available in d'Olne et al. [2023].

Original language: English
Pages (from-to): 374-382
Number of pages: 9
Journal: IEEE Open Journal of Signal Processing
Volume: 5
State: Published - 1 Jan 2024
Externally published: Yes

Keywords

  • Augmented reality (AR)
  • cocktail party
  • dataset
  • head-worn array
  • speech enhancement

ASJC Scopus subject areas

  • Signal Processing
