Comparing artificial intelligence algorithms to 157 German dermatologists: the melanoma classification benchmark

Please always cite using this URN: urn:nbn:de:bvb:20-opus-220569
  • Background: Several recent publications have demonstrated that convolutional neural networks can classify images of melanoma on par with board-certified dermatologists. However, the lack of a public human benchmark limits the comparability of these algorithms' performance and thereby technical progress in this field.

Methods: An electronic questionnaire was sent to dermatologists at 12 German university hospitals. Each questionnaire comprised 100 dermoscopic and 100 clinical images (each set containing 80 nevus images and 20 biopsy-verified melanoma images), all open-source. The questionnaire recorded factors such as years of experience in dermatology, skin checks performed, age, sex, and rank within the university hospital or status as resident physician. For each image, the dermatologists were asked to make a management decision (treat/biopsy the lesion or reassure the patient). The main outcome measures were sensitivity, specificity and the receiver operating characteristic (ROC).

Results: In total, 157 dermatologists assessed all 100 dermoscopic images, with an overall sensitivity of 74.1%, a specificity of 60.0% and an ROC of 0.67 (range 0.538–0.769); 145 dermatologists assessed all 100 clinical images, with an overall sensitivity of 89.4%, a specificity of 64.4% and an ROC of 0.769 (range 0.613–0.9). Results differed significantly between the two test sets (P < 0.05), confirming the need for a standardised benchmark.
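The abstract does not say how a single "ROC" value was derived from binary treat/reassure decisions. The reported figures are, however, consistent with the area under a one-point ROC curve, which equals (sensitivity + specificity) / 2: (0.741 + 0.600) / 2 ≈ 0.67 and (0.894 + 0.644) / 2 = 0.769. The sketch below assumes that definition (it is not confirmed as the authors' exact method). It scores one reader's decisions against ground-truth labels. The names `evaluate_decisions` and `ReaderMetrics`, and the reader in the usage lines, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReaderMetrics:
    sensitivity: float
    specificity: float
    roc: float  # assumed: area under the single-operating-point ROC curve


def evaluate_decisions(biopsy: Sequence[bool], melanoma: Sequence[bool]) -> ReaderMetrics:
    """Score one reader's management decisions against ground truth.

    biopsy[i]   -- True if the reader chose "treat/biopsy" for image i
    melanoma[i] -- True if image i is a biopsy-verified melanoma
    """
    if len(biopsy) != len(melanoma):
        raise ValueError("decisions and labels must have the same length")
    pairs = list(zip(biopsy, melanoma))
    tp = sum(1 for b, m in pairs if b and m)          # melanoma, biopsied
    fn = sum(1 for b, m in pairs if not b and m)      # melanoma, reassured
    tn = sum(1 for b, m in pairs if not b and not m)  # nevus, reassured
    fp = sum(1 for b, m in pairs if b and not m)      # nevus, biopsied
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # A binary decision yields one ROC point (1 - spec, sens); the polyline
    # (0,0) -> (1 - spec, sens) -> (1,1) encloses an area of (sens + spec) / 2.
    return ReaderMetrics(sensitivity, specificity, (sensitivity + specificity) / 2)


# Hypothetical reader on one 100-image test set (20 melanomas, 80 nevi)
# who biopsies 15 of the melanomas and 32 of the nevi.
labels = [True] * 20 + [False] * 80
decisions = [True] * 15 + [False] * 5 + [True] * 32 + [False] * 48
print(evaluate_decisions(decisions, labels))
```

For this hypothetical reader, sensitivity is 15/20 and specificity is 48/80. Pooling over many readers, as in the reported overall figures, would sum the confusion-matrix counts across readers before dividing.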
Conclusions: We present the first public melanoma classification benchmark for both non-dermoscopic and dermoscopic images, for comparing artificial intelligence algorithms with the diagnostic performance of 157 (dermoscopic images) and 145 (clinical images) dermatologists. The Melanoma Classification Benchmark should be considered a reference standard for white-skinned Western populations in the field of binary algorithmic melanoma classification.
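The abstract does not prescribe how an algorithm should be compared with the dermatologists. One common approach, assumed here rather than taken from the paper, is to fix the algorithm's operating point at the dermatologists' mean specificity on the same test set and then compare sensitivities. The function `sensitivity_at_specificity` is hypothetical. `scores` stands for an algorithm's per-image melanoma scores, where a higher score means more likely melanoma.

```python
from typing import Sequence, Tuple


def sensitivity_at_specificity(
    scores: Sequence[float], melanoma: Sequence[bool], target_spec: float
) -> Tuple[float, float]:
    """Return (threshold, sensitivity) for the lowest threshold whose specificity
    is at least target_spec. An image counts as "biopsy" when score >= threshold."""
    if len(scores) != len(melanoma):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, m in zip(scores, melanoma) if m]      # melanoma scores
    neg = [s for s, m in zip(scores, melanoma) if not m]  # nevus scores
    if not pos or not neg:
        raise ValueError("need at least one melanoma and one nevus")
    # Raising the threshold can only raise specificity and lower sensitivity,
    # so the first (lowest) qualifying threshold maximises sensitivity.
    # The +inf candidate calls every image benign (specificity 1.0).
    for t in sorted(set(scores)) + [float("inf")]:
        spec = sum(s < t for s in neg) / len(neg)
        if spec >= target_spec:
            return t, sum(s >= t for s in pos) / len(pos)
    raise ValueError("target_spec must be <= 1.0")
```

Under this assumed protocol, an algorithm evaluated on the dermoscopic set would be called with `target_spec=0.600`, and its returned sensitivity compared with the dermatologists' 74.1%. For the clinical set, the corresponding values would be 0.644 and 89.4%.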

Metadata
Author: Titus J. Brinker, Achim Hekler, Axel Hauschild, Carola Berking, Bastian Schilling, Alexander H. Enk, Sebastian Haferkamp, Ante Karoglan, Christof von Kalle, Michael Weichenthal, Elke Sattler, Dirk Schadendorf, Maria R. Gaiser, Joachim Klode, Jochen S. Utikal
URN: urn:nbn:de:bvb:20-opus-220569
Document Type: Journal article
Faculties: Faculty of Medicine / Department of Dermatology, Venereology and Allergology
Language: English
Parent Title (English): European Journal of Cancer
Year of Completion: 2019
Volume: 111
Pages: 30-37
Source: European Journal of Cancer (2019) 111:30-37. https://doi.org/10.1016/j.ejca.2018.12.016
DOI: https://doi.org/10.1016/j.ejca.2018.12.016
Dewey Decimal Classification: 6 Technology, medicine, applied sciences / 61 Medicine and health / 610 Medicine and health
Tag: artificial intelligence; benchmark; deep learning; melanoma
Release Date: 2024/08/08
Licence: CC BY: Creative Commons Attribution 4.0 International