Research Paper Volume 14, Issue 10 pp 4270–4280
Identification of combined biomarkers for predicting the risk of osteoporosis using machine learning
- 1 Department of Dermatology, Yanbian University Hospital, Yanji, Jilin Province, China
- 2 Department of Dermatology and Cutaneous Biology Research Institute, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
- 3 Department of Pathology, Yanbian University College of Medicine, Yanji, Jilin Province, China
- 4 Oral Cancer Research Institute, Yonsei University College of Dentistry, Seoul, Korea
- 5 Institute for the Integration of Medicine and Innovative Technology, Hanyang University College of Medicine, Seoul, Korea
- 6 BK21 PLUS Project, Department of Dental Education, Yonsei University College of Dentistry, Seoul, Korea
Received: January 20, 2022 Accepted: May 7, 2022 Published: May 17, 2022
https://doi.org/10.18632/aging.204084
Copyright: © 2022 Zheng et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Abstract
Osteoporosis is a severe chronic skeletal disorder that affects older individuals, especially postmenopausal women. However, molecular biomarkers for predicting the risk of osteoporosis are not well characterized. The aim of this study was to identify combined biomarkers for predicting the risk of osteoporosis using machine learning methods. We merged three publicly available gene expression datasets (GSE56815, GSE13850, and GSE2208) to obtain expression data for 6354 unique genes in postmenopausal women (45 with high bone mineral density and 45 with low bone mineral density). All machine learning methods were implemented in R; the GEOquery package was used to download the datasets and the limma package to identify differentially expressed genes. A nomogram for predicting the risk of osteoporosis was then constructed. We detected 378 significant differentially expressed genes, representing 15 major biological pathways. Predictive models based on combined biomarkers (two or three genes) outperformed models based on a single gene. The best-performing two-gene set comprised PLA2G2A and WRAP73, and the best-performing three-gene set comprised LPN1, PFDN6, and DOHH. Overall, we demonstrated the advantages of using combined rather than single biomarkers for predicting the risk of osteoporosis. Further, the predictive nomogram constructed using combined biomarkers could be used by clinicians to identify high-risk individuals and to design efficient clinical trials aimed at reducing the incidence of osteoporosis.
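The comparison of single-gene and combined-gene predictors described in the abstract can be illustrated with a small sketch. It is written in Python rather than the R workflow used in the study, and it is not the authors' method. The scoring rule (a signed sum of z-scores), the in-sample AUC, the gene names `geneA`, `geneB`, `geneC`, and the toy expression values are all assumptions chosen for illustration. The idea shown is only this: enumerate every gene set of a given size, score each sample, and rank the sets by how well the score separates low from high bone mineral density.

```python
from itertools import combinations
from statistics import mean, pstdev


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def zscores(values):
    """Standardize one gene's expression across all samples."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def combined_score(expr, genes, labels):
    """Sum z-scores over genes, flipping sign so higher means 'low BMD'.

    This simple rule is an assumed stand-in for a fitted model.
    """
    total = [0.0] * len(labels)
    for g in genes:
        z = zscores(expr[g])
        pos_mean = mean(v for v, y in zip(z, labels) if y == 1)
        neg_mean = mean(v for v, y in zip(z, labels) if y == 0)
        sign = 1.0 if pos_mean >= neg_mean else -1.0
        total = [t + sign * v for t, v in zip(total, z)]
    return total


def rank_gene_sets(expr, labels, size):
    """Return (AUC, gene set) pairs for all sets of `size`, best first."""
    results = [(auc(combined_score(expr, genes, labels), labels), genes)
               for genes in combinations(sorted(expr), size)]
    return sorted(results, reverse=True)


# Hypothetical toy data: 1 = low BMD, 0 = high BMD (not from the study).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = {
    "geneA": [2, 2, 2, 0, 0, 0, 0, 2],
    "geneB": [0, 2, 2, 2, 2, 0, 0, 0],
    "geneC": [1, 0, 1, 0, 1, 0, 1, 0],
}

if __name__ == "__main__":
    for size in (1, 2):
        best_auc, best_genes = rank_gene_sets(expr, labels, size)[0]
        print(size, best_genes, best_auc)
```

In this toy data each informative gene misclassifies a different sample, so combining the two recovers information that neither carries alone. Because the AUC here is computed on the same samples used to choose the sign of each gene, a real analysis would need held-out data or cross-validation to avoid optimistic estimates.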