※ Overview
    Since December 2019, an outbreak of viral pneumonia of unknown cause has severely affected Wuhan, China. The causative virus was quickly identified and named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and the World Health Organization (WHO) named the resulting disease coronavirus disease 2019 (COVID-19). By the end of March 2020, nearly 200 countries and regions had been affected, with more than 500,000 confirmed cases and the number still rising. This severe situation underscores the urgent need for effective measures to control the pandemic.
    From the data accumulated in our hospitals, we assemble two cohorts of 1170 and 351 patients, respectively, comprising laboratory-confirmed COVID-19, COVID-19-negative (control) and suspected cases, and collect their chest computed tomography (CT) images, clinical features (CFs) and, where available, SARS-CoV-2 laboratory test results. We then develop a patient-centric resource named integrative CT images and CFs for COVID-19 (iCTCF), which archives chest CT images, 130 types of CFs and SARS-CoV-2 laboratory test results for 1521 patients with or without COVID-19 pneumonia, totaling 265.1 GB of data.
    Using Cohort 1, we integrate the highly heterogeneous CT and CF datasets and build a novel framework, Hybrid-learning for UnbiaSed predicTion of COVID-19 patients (HUST-19), to predict clinical outcomes. For morbidity outcomes, HUST-19 achieves area under the curve (AUC) values of 0.978, 0.921 and 0.931 for predicting negative cases (Control), mild/regular (Type I) patients and severe/critically ill (Type II) patients, respectively. We also use Cohort 2 as an independent dataset to evaluate HUST-19, on which it achieves consistently promising accuracy. For mortality outcomes, HUST-19 achieves an AUC of 0.856 for predicting deceased cases on the two cohorts merged. Using HUST-19, we conduct a retrospective analysis of 299 suspected cases in Cohort 1 and predict 207 and 71 of them to be potential Type I and Type II patients, respectively. These predictions are highly consistent with subsequent RT-PCR validation.
    In conclusion, iCTCF can serve not only as a fundamental resource for retrospective analysis but also as a useful tool for improving the diagnosis and treatment of COVID-19 patients. iCTCF, together with HUST-19, will be continuously maintained and updated, and all source datasets, including chest CT images, CFs and laboratory confirmations, are available for academic research. All datasets in iCTCF are released under a CC BY-NC 4.0 license.
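    Per-class AUC values such as those reported for Control, Type I and Type II are commonly computed one-vs-rest: each class in turn is treated as positive and the other classes as negative. The minimal Python sketch below illustrates that convention under stated assumptions: the classifier is assumed to output one probability per class, whether HUST-19 uses exactly this convention is an assumption, and the function names and toy scores are hypothetical, not taken from HUST-19 or the iCTCF data.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    probs: Sequence[Sequence[float]], labels: Sequence[int], classes: List[str]
) -> Dict[str, float]:
    """Per-class AUC: class k is positive, all other classes are negative."""
    result = {}
    for k, name in enumerate(classes):
        scores = [row[k] for row in probs]          # predicted probability of class k
        binary = [1 if y == k else 0 for y in labels]  # 1 for class k, else 0
        result[name] = binary_auc(scores, binary)
    return result


# Hypothetical toy predictions for five patients (NOT iCTCF data).
classes = ["Control", "Type I", "Type II"]
probs = [
    [0.8, 0.15, 0.05],
    [0.1, 0.7, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
    [0.3, 0.5, 0.2],
]
labels = [0, 1, 2, 0, 2]  # index into `classes`

aucs = one_vs_rest_auc(probs, labels, classes)
print(aucs)
```

    The pairwise loop is quadratic but keeps the sketch dependency-free; for real cohorts, a library routine such as scikit-learn's `roc_auc_score`, applied to each class's binarized labels, is the practical choice.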

If you publish results based on these data, please cite the following article:

Open resource of clinical data from patients with pneumonia for the prediction of COVID-19 outcomes via deep learning
Wanshan Ning, Shijun Lei, Jingjing Yang, Yukun Cao, Peiran Jiang, Qianqian Yang, Jiao Zhang, Xiaobei Wang, Fenghua Chen, Zhi Geng, Liang Xiong, Hongmei Zhou, Yaping Guo, Yulan Zeng, Heshui Shi, Lin Wang, Yu Xue, Zheng Wang.


Last update: Aug. 19th, 2020