Copyright: © 2022 Hong et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Background: Breast cancer is an invasive disease with complex molecular mechanisms. Prognosis-related biomarkers are still urgently needed to predict outcomes of breast cancer patients.
Methods: Original data were download from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). The analyses were performed using perl-5.32 and R-x64-4.1.1.
Results: In this study, 1086 differentially expressed genes (DEGs) were identified in the TCGA cohort; 523 shared DEGs were identified in the TCGA and GSE10886 cohorts. Eight subtypes were estimated using non-negative matrix factorization clustering with significant differences seen in overall survival (OS) and progression-free survival (PFS) (P < 0.01). Univariate Cox analysis and least absolute shrinkage and selection operator (LASSO) regression analysis were performed to develop a related risk score related to the 17 DEGs; this score separated breast cancer into low- and high-risk groups with significant differences in survival (P < 0.01) and showed powerful effectiveness (TCGA all group: 1-year area under the curve [AUC] = 0.729, 3-year AUC = 0.778, 5-year AUC = 0.781). A nomogram prediction model was constructed using non-negative matrix factorization clustering, the risk score, and clinical characteristics. Our model was confirmed to be related with tumor microenvironment. Furthermore, DEGs in high-risk breast cancer were enriched in histidine metabolism (normalized enrichment score [NES] = 1.49, P < 0.05), protein export (NES = 1.58, P < 0.05), and steroid hormone biosynthesis signaling pathways (NES = 1.56, P < 0.05).
Conclusions: We established a comprehensive model that can predict prognosis and guide treatment.