Development and Validation of a Risk Assessment Model for Pulmonary Nodules Using Plasma Proteins and Clinical Factors.
Anil Vachani, Stephen Lam, Pierre P Massion, James K Brown, Michael Beggs, Amanda L Fish, Luis Carbonell, Shan X Wang, Peter J Mazzone
Author Information
Anil Vachani: Pulmonary, Allergy, and Critical Care Division, Department of Medicine, University of Pennsylvania, Philadelphia, PA; Corporal Michael J. Crescenz VA Medical Center, Department of Medicine, Philadelphia, PA. Electronic address: avachani@pennmedicine.upenn.edu.
Stephen Lam: Department of Integrative Oncology, British Columbia Cancer Research Institute, University of British Columbia, Vancouver, BC, Canada.
Pierre P Massion: Division of Allergy, Pulmonary and Critical Care Medicine, Vanderbilt University, Nashville, TN.
James K Brown: Division of Pulmonary, Critical Care, Allergy and Sleep Medicine, Department of Medicine, University of California, San Francisco, CA; VA Medical Center San Francisco, Department of Medicine, San Francisco, CA.
Michael Beggs: MagArray, Inc., Milpitas, CA.
Amanda L Fish: MagArray, Inc., Milpitas, CA.
Luis Carbonell: MagArray, Inc., Milpitas, CA.
Shan X Wang: MagArray, Inc., Milpitas, CA.
Peter J Mazzone: Respiratory Institute, Cleveland Clinic, Cleveland, OH.
BACKGROUND: Deficiencies in risk assessment for patients with pulmonary nodules (PNs) contribute to unnecessary invasive testing and delays in diagnosis.
RESEARCH QUESTION: What is the accuracy of a novel PN risk model that includes plasma proteins and clinical factors? How does the accuracy compare with that of an established risk model?
STUDY DESIGN AND METHODS: Assays for seven plasma proteins were developed on a magnetic nanosensor platform. In a training cohort (n = 429), machine learning approaches were used to identify an optimal algorithm, which was then evaluated in a validation cohort (n = 489) and compared with the Mayo Clinic model.
RESULTS: In the training set, we identified a support vector machine algorithm, incorporating the seven plasma proteins and six clinical factors, that achieved an area under the receiver operating characteristic curve of 0.87 and met other selection criteria. The resulting risk reclassification model (RRM) was used to recategorize patients with a pretest risk of between 10% and 84%, and its performance was assessed across five risk strata (low, ≤ 10%; moderate, 10%-34%; intermediate, 35%-70%; high, 71%-84%; very high, > 85%). Stratification by the RRM decreased the proportion of intermediate-risk patients from 26.7% to 10.8% (P < .001) and increased the low-risk and high-risk strata from 16.8% to 21.9% (P < .001) and from 3.7% to 12.1% (P < .001), respectively. Among patients classified as low risk by the RRM and Mayo Clinic model, the true-negative to false-negative ratios were 16.8 and 19.5, respectively. Among patients classified as very high risk by the RRM and Mayo Clinic model, the true-positive to false-positive ratios were 28.5 and 17.0, respectively. Compared with the Mayo Clinic model, the RRM provided higher specificity at the low-risk threshold and higher sensitivity at the very high-risk threshold.
INTERPRETATION: The RRM accurately reclassified some patients into low-risk and very high-risk categories, suggesting the potential to improve PN risk assessment.
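The five risk strata and the ratios reported in the results can be illustrated with a minimal Python sketch. It is not the RRM itself (the abstract does not give the support vector machine's inputs or parameters); it only shows how a predicted malignancy probability might be binned and how a true-negative to false-negative (or true-positive to false-positive) ratio is computed. The function names are hypothetical, and the handling of boundary values the abstract leaves open (exactly 10%, values between 70% and 71%, and 84% through 85%) is an assumption.

```python
def risk_stratum(p: float) -> str:
    """Map a malignancy probability in [0, 1] to one of five risk strata.

    Cut points follow the abstract (low <= 10%, very high > 85%).
    Assumptions: values between listed bounds fall into the lower
    stratum's neighbor as coded below, and exactly 85% counts as "high".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p <= 0.10:
        return "low"
    if p < 0.35:
        return "moderate"
    if p < 0.71:
        return "intermediate"
    if p <= 0.85:
        return "high"
    return "very high"


def correct_to_incorrect_ratio(correct: int, incorrect: int) -> float:
    """Ratio of correct to incorrect classifications within one stratum.

    For the low-risk stratum this is TN:FN (benign vs malignant nodules
    labeled low risk); for the very high-risk stratum it is TP:FP.
    """
    if correct < 0 or incorrect < 0:
        raise ValueError("counts must be non-negative")
    if incorrect == 0:
        return float("inf")
    return correct / incorrect


if __name__ == "__main__":
    # Illustrative probabilities only, not study data.
    for p in (0.05, 0.20, 0.50, 0.80, 0.90):
        print(f"{p:.2f} -> {risk_stratum(p)}")
    # Made-up counts chosen only to show the arithmetic.
    print(correct_to_incorrect_ratio(40, 2))
```

A higher ratio in the low-risk stratum means fewer missed cancers per correctly reassured patient; a higher ratio in the very high-risk stratum means fewer benign nodules per cancer sent toward invasive management.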