TY - JOUR
T1 - Is it all a cluster game? - Exploring Out-of-Distribution Detection based on Clustering in the Embedding Space
AU - Sinhamahapatra, Poulami
AU - Koner, Rajat
AU - Roscher, Karsten
AU - Günnemann, Stephan
N1 - Publisher Copyright:
Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2022
Y1 - 2022
N2 - It is essential for safety-critical applications of deep neural networks to determine when new inputs are significantly different from the training distribution. In this paper, we explore this out-of-distribution (OOD) detection problem for image classification using clusters of semantically similar embeddings of the training data and exploit the differences in distance relationships to these clusters between in- and out-of-distribution data. We study the structure and separation of clusters in the embedding space and find that supervised contrastive learning leads to well-separated clusters, while its self-supervised counterpart fails to do so. In our extensive analysis of different training methods, clustering strategies, distance metrics, and thresholding approaches, we observe that there is no clear winner: the optimal approach depends on the model architecture and the selected datasets for in- and out-of-distribution. While we could reproduce the outstanding results for contrastive training on CIFAR-10 as in-distribution data, we find that standard cross-entropy paired with cosine similarity outperforms all contrastive training methods when training on CIFAR-100 instead. Cross-entropy thus provides competitive results compared to expensive contrastive training methods.
UR - http://www.scopus.com/inward/record.url?scp=85125401996&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85125401996
SN - 1613-0073
VL - 3087
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2022 Workshop on Artificial Intelligence Safety, SafeAI 2022
Y2 - 28 February 2022
ER -