TY - GEN
T1 - PIXEL-WISE FAILURE PREDICTION FOR SEMANTIC VIDEO SEGMENTATION
AU - Kuhn, Christopher B.
AU - Hofbauer, Markus
AU - Xu, Ziqin
AU - Petrovic, Goran
AU - Steinbach, Eckehard
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - We propose a pixel-accurate failure prediction approach for semantic video segmentation. The proposed scheme improves on previously proposed failure prediction methods, which have so far disregarded the temporal information in videos. Our approach consists of two main steps: First, we train an LSTM-based model to detect spatio-temporal patterns that indicate pixel-wise misclassifications in the current video frame. Second, we use sequences of failure predictions to train a denoising autoencoder that both refines the current failure prediction and predicts future misclassifications. Since public data sets for this scenario are limited, we introduce the large-scale densely annotated video driving (DAVID) data set generated using the CARLA simulator. We evaluate our approach on the real-world Cityscapes data set and the simulator-based DAVID data set. Our experimental results show that spatio-temporal failure prediction outperforms single-image failure prediction by up to 8.8%. Refining the prediction using a sequence of previous failure predictions further improves the performance by a significant 15.2% and enables accurate prediction of misclassifications in future frames. While we focus our study on driving videos, the proposed approach is general and can easily be applied in other scenarios as well.
KW - Failure prediction
KW - Introspection
KW - Recurrent neural network
KW - Semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85117809481&partnerID=8YFLogxK
U2 - 10.1109/ICIP42928.2021.9506552
DO - 10.1109/ICIP42928.2021.9506552
M3 - Conference contribution
AN - SCOPUS:85117809481
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 614
EP - 618
BT - 2021 IEEE International Conference on Image Processing, ICIP 2021 - Proceedings
PB - IEEE Computer Society
T2 - 2021 IEEE International Conference on Image Processing, ICIP 2021
Y2 - 19 September 2021 through 22 September 2021
ER -