Abstract
Background
Automated measurement in cardiac imaging using deep learning (DL) is a highly active area of research and innovation. However, several concerns challenge the translation of DL methods from research into clinical practice.
Objectives
The authors evaluated 3 challenges for DL-based cardiac measurements, using left ventricular ejection fraction (LVEF) for heart failure (HF) management as a case study, and discuss mitigation strategies.
Methods
Using 3 different populations (N = 3,538), automated LVEF measurements were obtained with supervised end-to-end learning and analyzed in the context of HF management. Three common challenges, related to evaluation metrics, training data, and model generalization, were studied.
Results
For the evaluation challenge, the authors identified significant unreliability of the area under the receiver-operating characteristic curve (AUC) when applied to dichotomized HF diagnosis. Specifically, the AUC varied from 0.71 to 0.98 owing solely to changes in population characteristics. For the training data challenge, model performance could be enhanced even after reducing the number of training subjects by 40%. For the generalization challenge, performance degraded on external test data compared with internal data. Integrating medical imaging domain knowledge into the DL framework effectively helped to recover performance and improve generalizability.
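The AUC's dependence on population characteristics can be illustrated with a minimal simulation (an assumption-laden sketch, not the authors' analysis): holding the LVEF measurement error fixed and changing only the spread of true LVEF values in the population shifts the AUC for detecting a dichotomized reduced-EF label. All numbers below (mean LVEF 50%, threshold 40%, noise SD 6%) are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(scores_pos, scores_neg):
    # Mann-Whitney U estimate of AUC: probability that a random positive
    # case scores higher than a random negative case (ties count 0.5)
    diff = scores_pos[:, None] - scores_neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

def simulate(spread, n=2000, noise_sd=6.0, threshold=40.0):
    true_ef = rng.normal(50.0, spread, n)            # true LVEF (%), population spread varies
    measured = true_ef + rng.normal(0.0, noise_sd, n)  # identical measurement error in both runs
    hf = true_ef < threshold                          # dichotomized reduced-EF label
    # Lower measured LVEF should predict reduced EF, so negate the score
    return auc(-measured[hf], -measured[~hf])

a_narrow = simulate(8.0)   # narrow clinical spectrum
a_broad = simulate(16.0)   # broad clinical spectrum
print(f"narrow spectrum (SD 8%):  AUC = {a_narrow:.2f}")
print(f"broad spectrum (SD 16%):  AUC = {a_broad:.2f}")
```

The broad-spectrum population yields a higher AUC than the narrow one even though the measurement model is unchanged, which is the kind of case-mix sensitivity the abstract attributes to the evaluation-metric challenge.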
Conclusions
Both training data and model generalization challenge the performance of DL algorithms for automated cardiac measurements. In addition, the choice of evaluation metrics challenges the ability to detect underperforming algorithms. By considering evaluation metrics and training data distribution, and by incorporating imaging domain knowledge, the design and evaluation of DL models can be improved, leading to more robust models, improved interpretation, and easier comparison across data sets. These findings may guide researchers and clinicians in implementing DL models for cardiovascular imaging.