Systematic outperformance of 112 dermatologists in multiclass skin cancer image classification by convolutional neural networks

Roman C. Maron, Michael Weichenthal, Jochen S. Utikal, A. Hekler, Carola Berking, Axel Hauschild, Alexander H. Enk, Sebastian Haferkamp, Joachim Klode, Dirk Schadendorf, Philipp Jansen, Tim Holland-Letz, Bastian Schilling, Christof von Kalle, Stefan Fröhling, Maria R. Gaiser, Daniela Hartmann, Anja Gesierich, Katharina C. Kähler, Ulrike Wehkamp, Ante Karoglan, Claudia Bär, Titus J. Brinker, Laurenz Schmitt, Wiebke K. Peitsch, Friederike Hoffmann, Jürgen C. Becker, Christina Drusio, G. Lodde, Stefanie Sammet, Wiebke Sondermann, S. Ugurel, Jeannine Zader, Alexander Enk, Martin Salzmann, S. Schäfer, Knut Schäkel, J. Winkler, Priscilla Wölbing, Hiba Asper, Ann Sophie Bohne, Victoria Brown, Bianca Burba, Sophia Deffaa, Cecilia Dietrich, Matthias Dietrich, Katharina Antonia Drerup, Friederike Egberts, Anna Sophie Erkens, S. Greven, V. Harde, Marion Jost, M. Kaeding, Katharina Kosova, Stephan Lischner, M. Maagk, Anna Laetitia Messinger, M. Metzner, Rogina Motamedi, Ann Christine Rosenthal, Ulrich Seidl, Jana Stemmermann, Kaspar Torz, Juliana Giraldo Velez, Jennifer Haiduk, Mareike Alter, Paul Bergenthal, Anne Gerlach, Christian Holtorf, Sophie Kindermann, L. Kraas, Moritz Felcht, Claus Detlev Klemke, Hjalmar Kurzen, Thomas Leibing, Verena Müller, Raphael R. Reinhard, Jochen Utikal, Franziska Winter, Laurie Eicher, Markus Heppt, Katharina Kilian, Sebastian Krammer, D. Lill, Anne Charlotte Niesert, Eva Oppel, Elke Sattler, S. Senner, Jens Wallmichrath, Hans Wolff, Tina Giner, Valerie Glutsch, Andreas Kerstan, Dagmar Presser, Philipp Schrüfer, Patrick Schummer, Ina Stolze, Judith Weber, Konstantin Drexler, Marion Mickler, Camila Toledo Stauner, Alexander Thiem

Research output: Contribution to journal › Article › peer-review

157 Scopus citations

Abstract

Background: Convolutional neural networks (CNNs) have recently been shown to systematically outperform dermatologists in distinguishing dermoscopic images of melanoma from those of nevi. However, such a binary classification does not reflect the clinical reality of skin cancer screenings, in which multiple diagnoses need to be taken into account.

Methods: A CNN was trained with novel deep learning techniques on 11,444 dermoscopic images covering the dermatologic diagnoses that make up the majority of pigmented skin lesions commonly encountered in skin cancer screenings. A test set of 300 biopsy-verified images was used to compare the classifier's performance with that of 112 dermatologists from 13 German university hospitals. The primary end-point was the correct classification of the lesions as benign or malignant. The secondary end-point was the correct classification of the images into one of five diagnostic categories.

Findings: For the primary end-point, the dermatologists' sensitivity and specificity were 74.4% (95% confidence interval [CI]: 67.0–81.8%) and 59.8% (95% CI: 49.8–69.8%), respectively. At equal sensitivity, the algorithm achieved a specificity of 91.3% (95% CI: 85.5–97.1%). For the secondary end-point, the dermatologists' mean sensitivity and specificity were 56.5% (95% CI: 42.8–70.2%) and 89.2% (95% CI: 85.0–93.3%), respectively. At equal sensitivity, the algorithm achieved a specificity of 98.8%. Two-sided McNemar tests showed a significant difference for the primary end-point (p < 0.001). For the secondary end-point, the algorithm significantly outperformed the dermatologists (p < 0.001) in every category except basal cell carcinoma, where performance was on par.

Interpretation: Our findings show that automated classification of dermoscopic melanoma and nevi images can be extended to a multiclass classification problem, which better reflects clinical differential diagnosis, while still significantly outperforming dermatologists (p < 0.001).
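The comparison in the Findings rests on two steps: fix the classifier's decision threshold so that its sensitivity matches the dermatologists', read off its specificity at that operating point, and then test paired per-image outcomes with a two-sided McNemar test. The abstract does not say how the threshold was chosen or which McNemar variant was used, so the following is only a minimal Python sketch under stated assumptions: binary labels (1 = malignant, 0 = benign), one CNN malignancy score per image, the highest threshold meeting the target sensitivity, and the exact binomial form of McNemar's test. The function names `specificity_at_sensitivity` and `mcnemar_exact` are hypothetical, and the toy data are not from the study.

```python
import math
from typing import Sequence, Tuple


def specificity_at_sensitivity(
    scores: Sequence[float], labels: Sequence[int], target_sensitivity: float
) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) for the highest decision
    threshold whose sensitivity reaches target_sensitivity.

    Assumption: labels use 1 = malignant, 0 = benign, and an image is called
    malignant when its score is >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Lowering the threshold can only raise sensitivity and lower specificity,
    # so the first (highest) threshold that meets the target maximises specificity.
    for t in sorted(set(scores), reverse=True):
        sens = sum(s >= t for s in positives) / len(positives)
        if sens >= target_sensitivity:
            spec = sum(s < t for s in negatives) / len(negatives)
            return t, sens, spec
    raise ValueError("target_sensitivity must be <= 1")


def mcnemar_exact(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar test on paired per-image correctness.

    Assumption: the exact binomial variant; the study may have used another.
    """
    # Only discordant pairs carry information:
    # b = A right and B wrong, c = A wrong and B right.
    b = sum(a and not o for a, o in zip(correct_a, correct_b))
    c = sum(o and not a for a, o in zip(correct_a, correct_b))
    n = b + c
    if n == 0:
        return 1.0
    # Under H0 the discordant pairs follow Binomial(n, 0.5); double the smaller tail.
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy data for illustration only, not from the study.
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 0, 1, 0, 1, 0, 0]
    print(specificity_at_sensitivity(scores, labels, 0.75))  # (0.6, 0.75, 0.75)
    print(mcnemar_exact([True] * 10, [False] * 10))  # 0.001953125
```

Matching sensitivity first reduces the comparison to a single number (specificity) at one operating point, which is why the Findings report the algorithm's specificity "at equal sensitivity" rather than a full ROC comparison.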

Original language: English
Pages (from-to): 57-65
Number of pages: 9
Journal: European Journal of Cancer
Volume: 119
State: Published - Sep 2019

Keywords

  • Artificial intelligence
  • Melanoma
  • Skin cancer
  • Skin cancer screening
