COVID-19 Research Paper | Volume 12, Issue 20 | pp 19938-19944

An improved multivariate model that distinguishes COVID-19 from seasonal flu and other respiratory diseases

Xing Guo¹,*, Yanrong Li²,*, Hua Li³, Xueqin Li⁴, Xu Chang⁵, Xuemei Bai⁶, Zhanghong Song⁷, Junfeng Li¹, Kefeng Li⁸

¹Department of Radiology, Heping Hospital Affiliated to Changzhi Medical College, Shanxi 046000, China
²Department of Pharmacy, Changzhi Medical College, Shanxi 046000, China
³Department of Respiratory Medicine, Third Hospital of Linfen, Shanxi 041000, China
⁴Department of Respiratory Medicine, Jincheng General Hospital, Shanxi 048006, China
⁵Graduate School of Changzhi Medical College, Shanxi 046000, China
⁶Department of Nephrology, Jiexiu People’s Hospital, Shanxi 032000, China
⁷Department of Nephrology, Fenyang Hospital, Shanxi 032200, China
⁸School of Medicine, University of California, San Diego, CA 92093, USA

* Equal contribution

Received: June 3, 2020 | Accepted: September 5, 2020 | Published: October 21, 2020

Copyright: © 2020 Guo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

COVID-19 shares many symptoms with seasonal flu and community-acquired pneumonia (CAP). Because the response required for COVID-19 differs dramatically from that for these diseases, this multicenter study aimed to develop and validate a multivariate model that accurately discriminates COVID-19 from influenza and CAP. Three independent cohorts from two hospitals were included (50 in the discovery and internal validation sets and 55 in the external validation cohort), and 12 variables, including symptoms, blood tests, first reverse transcription-polymerase chain reaction (RT-PCR) results, and chest CT images, were collected. An integrated multi-feature model (RT-PCR, CT features, and blood lymphocyte percentage) built with the random forest algorithm showed a diagnostic accuracy of 92.0% (95% CI: 73.9-99.1) in the training set and 96.6% (95% CI: 79.6-99.9) in the internal validation cohort. The model also performed well in the external validation cohort, with an area under the receiver operating characteristic curve (AUC) of 0.93 (95% CI: 0.79-1.00), an F1 score of 0.80, and a Matthews correlation coefficient (MCC) of 0.76. In conclusion, the multivariate model developed with machine learning techniques could be an efficient tool for COVID-19 screening in nonendemic regions with high rates of influenza and CAP in the post-COVID-19 era.
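The external-validation results above are reported as accuracy-style metrics, an F1 score, and an MCC. As a reminder of how these are defined, the following is a minimal sketch that computes accuracy, F1, and MCC from binary confusion-matrix counts, assuming COVID-19 is treated as the positive class and influenza/CAP as the negative class. The `ConfusionCounts` type, the function names, and the counts in the usage lines are hypothetical illustrations; they are not the study's code or data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumption: COVID-19 = positive class)."""
    tp: int  # COVID-19 cases predicted as COVID-19
    fp: int  # influenza/CAP cases predicted as COVID-19
    tn: int  # influenza/CAP cases predicted as influenza/CAP
    fn: int  # COVID-19 cases predicted as influenza/CAP


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all cases that were classified correctly.
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total


def f1_score(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
    denom = 2 * c.tp + c.fp + c.fn
    return 0.0 if denom == 0 else 2 * c.tp / denom


def mcc(c: ConfusionCounts) -> float:
    # Matthews correlation coefficient, in [-1, 1]; 0 means chance level.
    # By convention, return 0.0 when any row or column total is zero.
    num = c.tp * c.tn - c.fp * c.fn
    denom = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    return 0.0 if denom == 0 else num / denom


# Hypothetical counts for illustration only (not from this study).
example = ConfusionCounts(tp=18, fp=2, tn=25, fn=5)
print(round(accuracy(example), 3))  # 0.86
print(round(f1_score(example), 3))  # 0.837
print(round(mcc(example), 3))       # 0.721
```

Unlike accuracy, MCC uses all four cells of the confusion matrix in a balanced way, which is why it is often reported alongside F1 when class sizes are unequal. In practice, `sklearn.metrics.f1_score` and `sklearn.metrics.matthews_corrcoef` compute the same quantities from label vectors.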