TY - GEN
T1 - Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction
AU - Shao, Keren
AU - Chen, Ke
AU - Berg-Kirkpatrick, Taylor
AU - Dubnov, Shlomo
N1 - Publisher Copyright:
© K. Shao, K. Chen, T. Berg-Kirkpatrick, S. Dubnov.
PY - 2023/1/1
Y1 - 2023/1/1
AB - In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model's sensitivity to the trailing harmonics, we modify the Combined Frequency and Periodicity (CFP) representation using the discrete z-transform. Second, vocal and non-vocal segments of extremely short duration are uncommon. To ensure a more stable melody contour, we design a differentiable loss function that prevents the model from predicting such segments. We apply these modifications to several models, including MSNet, FTANet, and a newly introduced model, PianoNet, adapted from a piano transcription network. Our experimental results demonstrate that the proposed modifications are empirically effective for singing melody extraction.
UR - http://www.scopus.com/inward/record.url?scp=85193430090&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85193430090
T3 - 24th International Society for Music Information Retrieval Conference, ISMIR 2023 - Proceedings
SP - 657
EP - 663
BT - 24th International Society for Music Information Retrieval Conference, ISMIR 2023 - Proceedings
A2 - Sarti, Augusto
A2 - Antonacci, Fabio
A2 - Sandler, Mark
A2 - Bestagini, Paolo
A2 - Dixon, Simon
A2 - Liang, Beici
A2 - Richard, Gaël
A2 - Pauwels, Johan
PB - International Society for Music Information Retrieval
T2 - 24th International Society for Music Information Retrieval Conference, ISMIR 2023
Y2 - 5 November 2023 through 9 November 2023
ER -