TY - GEN
T1 - Diabetes60 - Inferring Bread Units From Food Images Using Fully Convolutional Neural Networks
AU - Christ, Patrick Ferdinand
AU - Schlecht, Sebastian
AU - Ettlinger, Florian
AU - Grün, Felix
AU - Heinle, Christoph
AU - Tatavarty, Sunil
AU - Ahmadi, Seyed Ahmad
AU - Diepold, Klaus
AU - Menze, Bjoern H.
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2018/1/19
Y1 - 2018/1/19
AB - In this paper we propose a challenging new computer vision task: inferring Bread Units (BUs) from food images. Assessing nutritional information and nutrient volume from a meal is an important task for diabetes patients. At the moment, diabetes patients learn to assess BUs on a scale of one to ten by learning the correspondence between BUs and meals from textbooks. We introduce a large-scale data set of around 9k different RGB-D images of 60 western dishes acquired using a Microsoft Kinect v2 sensor. We recruited 20 diabetes patients to give expert assessments of BU values for each dish based on several images. For this task, we set a challenging baseline using state-of-the-art CNNs and evaluated it against the performance of human annotators. We present a CNN architecture that infers depth from RGB-only food images for use in BU regression, so that the pipeline can operate on RGB data only, and we compare its performance to that with RGB-D input data. We show that our inferred depth maps from RGB images can replace RGB-D input data at high significance for the BU regression task. In its best configuration, our proposed method achieves an RMSE of 1.53 BUs using RGB and inferred depth. Considering that the variability among the raters themselves is RMSE = 0.89, we can show that our baseline method with depth prediction can extract reasonable nutritional information from RGB image data only.
UR - https://www.scopus.com/pages/publications/85046254012
U2 - 10.1109/ICCVW.2017.180
DO - 10.1109/ICCVW.2017.180
M3 - Conference contribution
AN - SCOPUS:85046254012
T3 - Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017
SP - 1526
EP - 1535
BT - Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE International Conference on Computer Vision Workshops, ICCVW 2017
Y2 - 22 October 2017 through 29 October 2017
ER -