Advancing listening effort-based evaluation of speech enhancement systems

Abstract

This study investigates response time as a behavioral indicator related to listening effort (LE) for evaluating speech enhancement (SE) systems. English and Norwegian intelligibility matrix tests were conducted within a single-task paradigm that incorporated click-time recording (logging the precise time of all participant clicks), enabling simultaneous estimation of speech intelligibility and LE-related temporal behavior. Three temporal proxy measures for LE were examined — time per stimulus, reaction time, and word click time — across a broad range of input signal-to-noise ratios (SNRs) and for both discriminative and generative enhancement approaches. Time per stimulus showed an inverted-U pattern across SNRs, whereas reaction time and word click time exhibited monotonic behavior, providing more directly interpretable metrics for comparative evaluation. Analyses of pairwise SNR comparisons revealed that increases in our LE-related temporal measures at higher SNRs precede measurable intelligibility declines, suggesting that these temporal metrics can be more sensitive than intelligibility in this regime. Overall, the proposed framework — where LE-related measurements remain unknown to participants — offers a comprehensive and nuanced behavioral tool for SE evaluation, complementing intelligibility particularly under realistic, moderate-to-high SNR conditions.
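The abstract names three temporal measures derived from click-time logs but does not define them. The sketch below shows one way such measures could be computed from a single trial's log. Every definition in it is an assumption made for illustration, not the authors' method: time per stimulus is taken as the span from stimulus offset to the last click, reaction time as the span from stimulus offset to the first click, and word click time as the mean gap between consecutive word clicks. The `Trial` layout and field names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Trial:
    """One matrix-test trial (hypothetical layout); times in seconds on one clock."""
    stimulus_end: float        # playback offset of the sentence
    click_times: List[float]   # timestamps of the word-selection clicks, in order


def time_per_stimulus(trial: Trial) -> float:
    # Assumed definition: stimulus offset to the final click of the trial.
    return trial.click_times[-1] - trial.stimulus_end


def reaction_time(trial: Trial) -> float:
    # Assumed definition: stimulus offset to the first click.
    return trial.click_times[0] - trial.stimulus_end


def word_click_time(trial: Trial) -> float:
    # Assumed definition: mean gap between consecutive word clicks.
    # Requires at least two clicks in the trial.
    gaps = [b - a for a, b in zip(trial.click_times, trial.click_times[1:])]
    return mean(gaps)


if __name__ == "__main__":
    # Made-up timestamps for a five-word response, for demonstration only.
    trial = Trial(stimulus_end=2.0, click_times=[3.0, 3.5, 4.5, 5.0, 6.0])
    print(time_per_stimulus(trial), reaction_time(trial), word_click_time(trial))
```

Because all three values come from the same click log that scores intelligibility, a setup along these lines would need no secondary task, which is consistent with the single-task paradigm the abstract describes.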

Category

Academic article

Language

English

Author(s)

Affiliation

  • SINTEF Digital / Sustainable Communication Technologies
  • Trinity College Dublin
  • University of Granada
  • Norwegian University of Science and Technology

Date

08.05.2026

Year

2026

Published in

Computer Speech and Language

ISSN

0885-2308

Volume

101

Page(s)

101996
