Abstract
This study investigates response time as a behavioral indicator of listening effort (LE) for evaluating speech enhancement (SE) systems. English and Norwegian matrix intelligibility tests were conducted in a single-task paradigm with click-time recording (logging the exact time of every participant click), which enabled simultaneous estimation of speech intelligibility and LE-related temporal behavior. Three temporal proxy measures of LE were examined: time per stimulus, reaction time, and word click time. These were evaluated across a broad range of input signal-to-noise ratios (SNRs) and for both discriminative and generative enhancement approaches. Time per stimulus followed an inverted-U pattern across SNRs, whereas reaction time and word click time varied monotonically with SNR, which makes them more directly interpretable metrics for comparative evaluation. Pairwise SNR comparisons revealed that, at higher SNRs, the LE-related temporal measures increase before any measurable decline in intelligibility, suggesting that these temporal metrics can be more sensitive than intelligibility in this regime. Overall, the proposed framework, in which participants are not made aware that LE-related timing is being measured, offers a behavioral tool for SE evaluation that captures intelligibility and LE-related timing within a single test and complements intelligibility, particularly under realistic, moderate-to-high SNR conditions.
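The abstract does not specify how the three temporal measures are derived from the click log. The Python sketch below computes them for a single trial, as a rough illustration only. It assumes that time per stimulus spans from stimulus onset to the last click, that reaction time spans from stimulus offset to the first click, and that word click time is the mean interval between consecutive clicks. The `Trial` record, its field names, and all three definitions are hypothetical and may differ from those used in the study.

```python
import math
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Trial:
    """One matrix-test trial from a hypothetical click log (times in seconds)."""
    onset: float                 # stimulus playback start
    offset: float                # stimulus playback end
    clicks: list[float] = field(default_factory=list)  # time of every word click


def temporal_measures(trial: Trial) -> dict[str, float]:
    """Compute three LE proxies under assumed (hypothetical) definitions."""
    if not trial.clicks:
        raise ValueError("trial contains no clicks")
    clicks = sorted(trial.clicks)  # guard against out-of-order logging

    # Assumed: time per stimulus = stimulus onset to the final click.
    time_per_stimulus = clicks[-1] - trial.onset
    # Assumed: reaction time = stimulus offset to the first click
    # (negative if the participant starts clicking during playback).
    reaction_time = clicks[0] - trial.offset
    # Assumed: word click time = mean interval between consecutive clicks;
    # undefined (NaN) when the trial has only one click.
    intervals = [later - earlier for earlier, later in zip(clicks, clicks[1:])]
    word_click_time = mean(intervals) if intervals else math.nan

    return {
        "time_per_stimulus": time_per_stimulus,
        "reaction_time": reaction_time,
        "word_click_time": word_click_time,
    }


if __name__ == "__main__":
    # Hypothetical five-word matrix trial.
    trial = Trial(onset=0.0, offset=2.5, clicks=[3.1, 3.9, 4.6, 5.2, 6.0])
    print(temporal_measures(trial))
```

Per-trial values like these would then be averaged within each input SNR and enhancement condition before comparing conditions. The aggregation used in the study is not described in the abstract.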