TY - JOUR
T1 - Categorising the world into local climate zones
T2 - towards quantifying labelling uncertainty for machine learning models
AU - Hechinger, Katharina
AU - Zhu, Xiao Xiang
AU - Kauermann, Göran
N1 - Publisher Copyright: © The Royal Statistical Society 2023. All rights reserved.
PY - 2024/1
Y1 - 2024/1
N2 - Image classification is often prone to labelling uncertainty. To generate suitable training data, images are labelled according to evaluations of human experts. This can result in ambiguities, which will affect subsequent models. In this work, we aim to model the labelling uncertainty in the context of remote sensing and the classification of satellite images. We construct a multinomial mixture model given the evaluations of multiple experts. This is based on the assumption that the ambiguity lies not in the image class itself but in the experts’ opinions about it. The model parameters can be estimated by a stochastic expectation maximisation algorithm. Analysing the estimates gives insights into sources of label uncertainty. Here, we focus on the general class ambiguity, the heterogeneity of experts, and the origin city of the images. The results are relevant for all machine learning applications where image classification is pursued and labelling is performed by humans.
AB - Image classification is often prone to labelling uncertainty. To generate suitable training data, images are labelled according to evaluations of human experts. This can result in ambiguities, which will affect subsequent models. In this work, we aim to model the labelling uncertainty in the context of remote sensing and the classification of satellite images. We construct a multinomial mixture model given the evaluations of multiple experts. This is based on the assumption that the ambiguity lies not in the image class itself but in the experts’ opinions about it. The model parameters can be estimated by a stochastic expectation maximisation algorithm. Analysing the estimates gives insights into sources of label uncertainty. Here, we focus on the general class ambiguity, the heterogeneity of experts, and the origin city of the images. The results are relevant for all machine learning applications where image classification is pursued and labelling is performed by humans.
KW - expert evaluations
KW - labelling uncertainty
KW - mixture models
KW - multiple labellers
KW - stochastic expectation maximisation
UR - http://www.scopus.com/inward/record.url?scp=85182707756&partnerID=8YFLogxK
U2 - 10.1093/jrsssc/qlad089
DO - 10.1093/jrsssc/qlad089
M3 - Article
AN - SCOPUS:85182707756
SN - 0035-9254
VL - 73
SP - 143
EP - 161
JO - Journal of the Royal Statistical Society. Series C: Applied Statistics
JF - Journal of the Royal Statistical Society. Series C: Applied Statistics
IS - 1
ER -