A deep learning approach to private data sharing of medical images using conditional generative adversarial networks (GANs).
Hanxi Sun, Jason Plawinski, Sajanth Subramaniam, Amir Jamaludin, Timor Kadir, Aimee Readie, Gregory Ligozio, David Ohlssen, Mark Baillie, Thibaud Coroller
Author Information
Hanxi Sun: Department of Statistics, Purdue University, West Lafayette, IN, United States of America.
Jason Plawinski: Novartis Pharmaceutical Corporation, East Hanover, New Jersey, United States of America.
Sajanth Subramaniam: Novartis Pharmaceutical Corporation, East Hanover, New Jersey, United States of America.
Amir Jamaludin: Oxford Big Data Institute, Oxford, United Kingdom.
Timor Kadir: Plexalis Ltd, Oxford, United Kingdom.
Aimee Readie: Novartis Pharmaceutical Corporation, East Hanover, New Jersey, United States of America.
Gregory Ligozio: Novartis Pharmaceutical Corporation, East Hanover, New Jersey, United States of America.
David Ohlssen: Novartis Pharmaceutical Corporation, East Hanover, New Jersey, United States of America.
Mark Baillie: Novartis Pharmaceutical Corporation, East Hanover, New Jersey, United States of America.
Thibaud Coroller: Novartis Pharmaceutical Corporation, East Hanover, New Jersey, United States of America.
Clinical data sharing can facilitate data-driven scientific research, allowing a broader range of questions to be addressed and thereby leading to greater understanding and innovation. However, sharing biomedical data can put sensitive personal information at risk. This risk is usually addressed by data anonymization, which is a slow and expensive process. An alternative to anonymization is to construct a synthetic dataset that behaves similarly to the real clinical data while preserving patient privacy. As part of a collaboration between Novartis and the Oxford Big Data Institute, a synthetic dataset was generated based on images from COSENTYX® (secukinumab) ankylosing spondylitis (AS) clinical studies. An auxiliary classifier Generative Adversarial Network (ac-GAN) was trained to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs), conditioned on the VU location (cervical, thoracic, or lumbar). Here, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity, and dataset privacy.
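To make the conditioning idea concrete, the following is a minimal sketch of the objective commonly associated with auxiliary classifier GANs: a source term (real versus synthetic) plus a class term (here, the VU location). It is an illustration only and is not taken from the study; the function names, the tuple layout of the discriminator outputs, and the toy logits are all assumptions, and the authors' actual architecture and training losses may differ.

```python
import math

# Class labels named in the abstract; their order here is an assumption.
VU_LOCATIONS = ["cervical", "thoracic", "lumbar"]


def sigmoid(z):
    """Map a source logit to P(source = real)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Map class logits to a probability distribution over VU locations."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def source_log_lik(source_logit, is_real):
    """Log-likelihood of the correct source (real or synthetic)."""
    p_real = sigmoid(source_logit)
    return math.log(p_real if is_real else 1.0 - p_real)


def class_log_lik(class_logits, label):
    """Log-likelihood of the correct VU location."""
    return math.log(softmax(class_logits)[VU_LOCATIONS.index(label)])


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def acgan_objectives(real_batch, fake_batch):
    """Return the quantities each network tries to maximize.

    Each batch is a list of hypothetical discriminator outputs:
    (source_logit, class_logits, vu_location_label).
    """
    # L_S: how well the discriminator tells real from synthetic images.
    l_s = (mean(source_log_lik(s, True) for s, _, _ in real_batch)
           + mean(source_log_lik(s, False) for s, _, _ in fake_batch))
    # L_C: how well the auxiliary head recovers the VU location.
    l_c = (mean(class_log_lik(c, y) for _, c, y in real_batch)
           + mean(class_log_lik(c, y) for _, c, y in fake_batch))
    # Discriminator maximizes L_S + L_C; generator maximizes L_C - L_S.
    return {"discriminator": l_s + l_c, "generator": l_c - l_s}


# Toy usage with made-up logits for one real and one synthetic image.
real = [(2.0, [3.0, 0.5, 0.1], "cervical")]
fake = [(-1.0, [0.2, 0.1, 2.5], "lumbar")]
objectives = acgan_objectives(real, fake)
```

In this formulation the class term rewards the generator for producing images whose VU location is recognizable, which is what allows sampling a synthetic image for a requested location.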