SafetyCage: A misclassification detector for feed-forward neural networks

Abstract

Deep learning classifiers have reached state-of-the-art performance in many fields, particularly in image classification. A wrong class assignment is often inconsequential when distinguishing pictures of cats and dogs, but in safety-critical applications such as autonomous driving or industrial process control, a misclassification can lead to disastrous events. While reducing the classifier's error rate is of primary importance, the error can never be eliminated entirely. A system that flags wrong or suspicious classifications is therefore a necessary component for safe and robust operation. In this work, we present SafetyCage, a general statistical inference framework for detecting misclassifications. We evaluate the approach on two well-known benchmark datasets, MNIST and CIFAR-10, and show that, provided the underlying classifier is well trained, SafetyCage is effective at flagging wrong classifications. We also discuss the drawbacks of the approach in detail, along with ways it can be improved.
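The abstract does not spell out the statistical test SafetyCage uses, so the following is only a minimal, hypothetical sketch of one common way such a detector can be built: fit a class-conditional Gaussian to a network's hidden-layer activations and flag a prediction whose activation is statistically atypical for the predicted class. The Gaussian assumption, the chi-squared threshold, the synthetic features, and the helper `flag_misclassification` are all illustrative choices, not the paper's actual method.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n_classes, dim = 10, 64

# Stand-in for penultimate-layer activations of a trained network,
# grouped by true class (synthetic data purely for illustration).
train_feats = {c: rng.normal(loc=c, scale=1.0, size=(500, dim))
               for c in range(n_classes)}

# Fit per-class mean and (regularised) inverse covariance.
stats = {}
for c, X in train_feats.items():
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(dim)
    stats[c] = (mu, np.linalg.inv(cov))

def flag_misclassification(feat, pred_class, alpha=0.01):
    """Flag `feat` if it is implausible for `pred_class` at level alpha."""
    mu, cov_inv = stats[pred_class]
    d2 = (feat - mu) @ cov_inv @ (feat - mu)   # squared Mahalanobis distance
    return d2 > chi2.ppf(1.0 - alpha, df=dim)  # reject if in the upper tail

# Usage: a typical activation for class 3 vs. a clearly atypical one.
ok = rng.normal(loc=3.0, scale=1.0, size=dim)
odd = rng.normal(loc=3.0, scale=4.0, size=dim)
print(flag_misclassification(ok, pred_class=3))   # likely False
print(flag_misclassification(odd, pred_class=3))  # likely True
```

Any such scheme inherits the limitations the paper hints at: if the underlying classifier is poorly trained, the class-conditional activation statistics are unreliable and the detector's flags lose meaning.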

Category

Academic article

Client

  • Research Council of Norway (RCN) / 304843

Language

English

Affiliation

  • SINTEF Digital / Mathematics and Cybernetics

Year

2024

Published in

Proceedings of Machine Learning Research (PMLR)

ISSN

2640-3498

Volume

233

Page(s)

113–119

View this publication at Cristin