Research Paper Volume 12, Issue 10 pp 9840–9854
Development of a machine learning-based multimode diagnosis system for lung cancer
- 1 College of Public Health, Zhengzhou University, Zhengzhou 450001, China
- 2 The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450001, China
- 3 Henan Provincial Chest Hospital, Zhengzhou 450001, China
- 4 Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB R3E 3N4, Canada
- 5 Henan Joint International Research Laboratory of Green Construction of Functional Molecules and Their Bioanalytical Applications, Zhengzhou 450001, China
- 6 The Key Laboratory of Nanomedicine and Health Inspection of Zhengzhou, Zhengzhou 450001, China
Received: February 10, 2020 Accepted: April 20, 2020 Published: May 23, 2020
https://doi.org/10.18632/aging.103249
Copyright © 2020 Duan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
As an emerging technology, artificial intelligence has been applied to identify various physical disorders. Here, we developed a three-layer diagnosis system for lung cancer involving three machine learning approaches: decision tree C5.0, artificial neural network (ANN) and support vector machine (SVM). The area under the curve (AUC) was used to evaluate their discriminative power. In the first layer, the AUCs of C5.0, ANN and SVM were 0.676, 0.736 and 0.640, respectively, so ANN outperformed both C5.0 and SVM. In the second layer, the AUCs were 0.804, 0.889 and 0.825, respectively; ANN was similar to SVM but superior to C5.0. Much higher AUCs of 0.908, 0.910 and 0.849 were obtained in the third layer, where C5.0 showed the highest sensitivity (94.12%). These data support a three-layer diagnosis system for lung cancer: first, ANN serves as a broad-spectrum screening subsystem, based on 14 epidemiological and clinical symptom variables, to screen high-risk groups; second, combined with 5 additional tumor biomarkers, ANN serves as an auxiliary diagnosis subsystem to identify suspected lung cancer patients; finally, C5.0 is used to confirm lung cancer based on 22 CT nodule-based radiomic features.
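The evaluation metric and the layered decision flow summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `auc` and `cascade`, the 0.5 decision threshold, and the output labels are assumptions made for illustration, and the three scorers stand in for trained models (ANN, ANN and C5.0 in the proposed system) that are not reproduced here. The AUC is computed from its rank-based definition, the probability that a randomly chosen case scores higher than a randomly chosen control.

```python
from typing import Callable, Dict, Sequence

Patient = Dict[str, float]
Scorer = Callable[[Patient], float]  # returns a risk score in [0, 1]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    correct = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return correct / (len(cases) * len(controls))


def cascade(patient: Patient, screen: Scorer, auxiliary: Scorer,
            confirm: Scorer, threshold: float = 0.5) -> str:
    """Apply the three layers in order; stop at the first negative layer.

    The threshold of 0.5 is an illustrative assumption, not a value
    reported in the paper.
    """
    if screen(patient) < threshold:      # layer 1: epidemiology and symptoms
        return "low risk"
    if auxiliary(patient) < threshold:   # layer 2: adds tumor biomarkers
        return "high risk"
    if confirm(patient) < threshold:     # layer 3: CT radiomic features
        return "suspected"
    return "confirmed"
```

In such a cascade, only patients flagged by an earlier layer proceed to the next and more resource-intensive layer, which is why a broad-spectrum screen is placed first and the CT-based model last.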
Abbreviations
CT: computed tomography; DT: decision tree; ANN: artificial neural network; SVM: support vector machine; AUC: area under the receiver operating characteristic curve; LDCT: low-dose computed tomography; ProGRP: progastrin-releasing peptide; VEGF: vascular endothelial growth factor; CEA: carcinoembryonic antigen; CYFRA21-1: cytokeratin 19 fragment; NSE: neuron-specific enolase; PPV: positive predictive value; NPV: negative predictive value; CI: confidence interval.