Experts fail to reliably detect AI-generated histological data.
Jan Hartung, Stefanie Reuter, Vera Anna Kulow, Michael Fähling, Cord Spreckelsen, Ralf Mrowka
Author Information
Jan Hartung: Institute for Physiology, Faculty of Medicine, University of Freiburg, 79108, Freiburg, Germany. jan.hartung@physiologie.uni-freiburg.de.
Vera Anna Kulow: Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institut für Translationale Physiologie (CCM), Charitéplatz 1, 10117, Berlin, Germany.
Michael Fähling: Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institut für Translationale Physiologie (CCM), Charitéplatz 1, 10117, Berlin, Germany.
Cord Spreckelsen: Institute of Medical Statistics, Computer and Data Sciences, Jena University Hospital, Bachstraße 18, 07743, Jena, Germany.
Ralf Mrowka: Department of Internal Medicine III, Experimental Nephrology, Jena University Hospital, Nonnenplan 4, 07745, Jena, Germany. ralf.mrowka@med.uni-jena.de.
AI-based image generation methods have advanced at an unprecedented pace in recent years, challenging both image forensics and human perceptual capabilities. Accordingly, these methods are expected to play an increasingly important role in the fraudulent fabrication of data. This includes images with complicated intrinsic structures, such as histological tissue samples, which are harder to forge manually. Here, we use Stable Diffusion, one of the most recent generative algorithms, to create a set of such artificial histological samples. In a large study with over 800 participants, we test the ability of human subjects to discriminate between these artificial images and genuine histological images. Although experts perform better than naive participants, we find that even they fail to reliably identify fabricated data. While participant performance depends on the amount of training data used to generate the images, even small quantities are sufficient to create convincing images, necessitating methods and policies to detect fabricated data in scientific publications.