On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks

Abstract

Speech enhancement (SE) systems aim to improve the quality and intelligibility of degraded speech signals obtained from far-field microphones. Subjective evaluation of the intelligibility performance of these SE systems is uncommon. Instead, objective intelligibility measures (OIMs) are generally used to predict subjective performance increases. Many recent deep learning (DL) based SE systems are expected to improve the intelligibility of degraded speech as measured by OIMs. However, validation of the ability of these OIMs to predict subjective intelligibility when enhancing a speech signal using DL-based systems is lacking. Therefore, in this study, we evaluate the predictive performance of five popular OIMs by comparing the metrics' predictions with subjective results. For this purpose, we recruited 50 human listeners and subjectively tested both single-channel and multi-channel Deep Complex Convolutional Recurrent Network (DCCRN) based speech enhancement systems. We found that none of the OIMs gave reliable predictions, and that all OIMs overestimated the intelligibility of ‘enhanced’ speech signals.

Language

English

Author(s)

Femke B. Gelderblom
Tron Vedul Tronstad
Torbjørn Karl Svendsen
Tor Andre Myrvoll

Affiliation

SINTEF Digital / Sustainable Communication Technologies
Norwegian University of Science and Technology

Year

2023

Published in

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

ISSN

2329-9290

Volume

Page(s)

215 - 226

DOI

https://doi.org/10.1109/taslp.2023.3329378

Read fulltext

https://hdl.handle.net/11250/3136237
