TY - JOUR
T1 - Analysis of loss functions for fast single-class classification
AU - Keren, Gil
AU - Sabato, Sivan
AU - Schuller, Björn
N1 - Publisher Copyright:
© 2019, Springer-Verlag London Ltd., part of Springer Nature.
PY - 2020/1/1
Y1 - 2020/1/1
AB - We consider neural network training in applications with many possible classes, where the test-time task is binary: deciding whether a given example belongs to a specific class. We define the single logit classification (SLC) task: training the network so that, at test time, it is possible to accurately and efficiently identify whether an example belongs to a given class, based only on the output logit for that class. We propose a natural principle, the Principle of Logit Separation, as a guideline for choosing and designing losses suitable for the SLC task. We show that the cross-entropy loss is not aligned with the Principle of Logit Separation, whereas several known loss functions, as well as novel batch loss functions that we propose, are aligned with it. Our experiments show that in almost all cases, losses aligned with the Principle of Logit Separation obtain at least a 20% relative accuracy improvement on the SLC task over losses that are not aligned with it, and sometimes considerably more. Furthermore, we show that fast SLC causes no drop in binary classification accuracy compared to standard classification, in which all logits are computed, and yields a speedup that grows with the number of classes.
KW - Classification
KW - Extreme classification
KW - Neural networks
UR - http://www.scopus.com/inward/record.url?scp=85071113414&partnerID=8YFLogxK
U2 - 10.1007/s10115-019-01395-6
DO - 10.1007/s10115-019-01395-6
M3 - Article
AN - SCOPUS:85071113414
SN - 0219-1377
VL - 62
SP - 337
EP - 358
JO - Knowledge and Information Systems
JF - Knowledge and Information Systems
IS - 1
ER -
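
The abstract above describes single logit classification (SLC): deciding class membership from one output logit rather than from the full set of logits. The following is a minimal NumPy sketch of SLC-style inference, together with the shift-invariance of the softmax that makes plain cross-entropy poorly suited to it. All names, sizes, and the decision threshold (hidden_dim, num_classes, 0.0) are illustrative assumptions, not values taken from the paper.

import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability; softmax is shift-invariant
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
hidden_dim, num_classes = 128, 10_000  # illustrative sizes

# Illustrative "trained" output-layer parameters and one example's hidden state.
W = rng.standard_normal((num_classes, hidden_dim)) / np.sqrt(hidden_dim)
b = np.zeros(num_classes)
h = rng.standard_normal(hidden_dim)

# Standard inference computes every logit: O(num_classes * hidden_dim).
all_logits = W @ h + b

# SLC inference computes only the logit of the queried class c: O(hidden_dim),
# so the saving grows with the number of classes.
c = 42
single_logit = float(W[c] @ h + b[c])

# Binary decision from that one logit; the fixed threshold 0.0 is an assumption
# made here for illustration, not a value from the paper.
is_member = single_logit > 0.0

# Why plain softmax cross-entropy is ill-suited to SLC: the softmax output is
# unchanged when all logits of an example are shifted by a constant, so the
# loss never ties any single logit to a scale shared across examples.
assert np.allclose(softmax(all_logits), softmax(all_logits + 123.4))

Roughly, losses aligned with the Principle of Logit Separation encourage the logit of a class to be comparable across examples, so that a fixed, example-independent threshold like the one above suffices for the binary decision.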