Research Paper Volume 12, Issue 14 pp 14506–14527
Identification of lncRNA biomarkers for lung cancer through integrative cross-platform data analyses
- 1 Department of Quantitative Health Sciences, University of Hawaii John A. Burns School of Medicine, The University of Hawaii at Manoa, Honolulu, HI 96813, USA
- 2 Department of Molecular Biosciences and Bioengineering, The University of Hawaii at Manoa College of Tropical Agriculture and Human Resources, Agricultural Sciences 218, Honolulu, HI 96822, USA
Received: January 9, 2020 Accepted: June 1, 2020 Published: July 16, 2020
https://doi.org/10.18632/aging.103496
Abstract
This study was designed to identify candidate lncRNA biomarkers for lung cancer by analyzing RNA-Seq and microarray data separately and then validating the results across platforms.
Lung cancer datasets were obtained from the Gene Expression Omnibus (GEO, n = 287) and The Cancer Genome Atlas (TCGA, n = 216) repositories; only lncRNAs common to the platforms were used. Differentially expressed (DE) lncRNAs in tumor versus normal samples were selected from the Affymetrix and TCGA datasets. A training model built on the top 20 DE Affymetrix lncRNAs was validated on the TCGA and Agilent datasets, and a second model was built in the same way using the TCGA dataset.
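The feature-selection steps above (restrict to lncRNAs shared across platforms, then rank tumor-versus-normal differential expression and keep the top 20) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the DE test, so Welch's t-statistic is used here as a stand-in, and the function names (`common_features`, `welch_t`, `top_de`) and the toy lncRNA IDs are hypothetical.

```python
from math import inf, sqrt
from statistics import mean, variance


def common_features(platform_a, platform_b):
    """Keep only lncRNA IDs measured on both platforms (sorted for determinism)."""
    return sorted(set(platform_a) & set(platform_b))


def welch_t(tumor, normal):
    """Welch's t-statistic for tumor vs. normal expression of one feature.

    Stand-in DE statistic (an assumption); uses sample variances (n - 1).
    """
    diff = mean(tumor) - mean(normal)
    se = sqrt(variance(tumor) / len(tumor) + variance(normal) / len(normal))
    if se == 0:
        # Zero variance in both groups: perfectly separated if means differ.
        return inf if diff != 0 else 0.0
    return diff / se


def top_de(expr, tumor_idx, normal_idx, features, k=20):
    """Rank features by |t| (largest first) and return the top k.

    expr maps feature ID -> list of expression values, one per sample.
    """
    scores = {}
    for f in features:
        values = expr[f]
        t = welch_t([values[i] for i in tumor_idx], [values[i] for i in normal_idx])
        scores[f] = abs(t)
    # sorted() is stable, so ties keep the (sorted) input order.
    return sorted(features, key=lambda f: scores[f], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: samples 0-2 are tumors, 3-5 are normals (hypothetical values).
    expr = {
        "LNC_A": [5, 6, 5, 1, 1, 2],
        "LNC_B": [3, 3, 4, 3, 4, 3],
        "LNC_C": [1, 2, 1, 6, 7, 6],
    }
    feats = common_features(["LNC_A", "LNC_B", "LNC_C", "LNC_X"],
                            ["LNC_C", "LNC_B", "LNC_A", "LNC_Y"])
    print(feats)
    print(top_de(expr, [0, 1, 2], [3, 4, 5], feats, k=2))
```

Ranking by absolute t keeps both up- and down-regulated lncRNAs; in practice a moderated test and multiple-testing correction would usually replace the plain Welch statistic.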
A model trained on the top 20 DE lncRNAs from Affymetrix and validated on TCGA and Agilent achieved high prediction accuracy in both training (AUC of 98.5% for Affymetrix) and validation (AUC of 99.2% for TCGA and 92.8% for Agilent). A similar model trained on the top 20 DE lncRNAs from TCGA and validated on Affymetrix and Agilent also achieved high prediction accuracy in both training (AUC of 97.7% for TCGA) and validation (AUC of 96.5% for Affymetrix and 80.9% for Agilent). Eight lncRNAs were shared between these two top-20 lists.
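The AUC values and the list overlap reported above can be illustrated with a short sketch. The abstract does not state which classifier produced the per-sample scores, so this example assumes only that each sample receives a numeric tumor score (for instance a predicted probability) and computes AUC in its rank-based (Mann-Whitney) form: the fraction of tumor/normal pairs in which the tumor sample scores higher, with ties counted as one half. The helper names `auc` and `overlap` and the toy inputs are hypothetical.

```python
def auc(scores, labels):
    """AUC as P(random tumor score > random normal score); ties count 0.5.

    labels: 1 = tumor, 0 = normal. Pairwise O(n * m) form, fine for small n.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one tumor and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def overlap(list_a, list_b):
    """Features shared by two top-k lists, in the order they appear in list_a."""
    b = set(list_b)
    return [f for f in list_a if f in b]


if __name__ == "__main__":
    # Hypothetical scores: 3 tumors, 2 normals; 5 of 6 pairs are ordered correctly.
    print(auc([0.9, 0.8, 0.4, 0.7, 0.2], [1, 1, 1, 0, 0]))
    print(overlap(["a", "b", "c"], ["c", "x", "a"]))
```

Because AUC depends only on how samples are ranked, the same model can score differently on each validation platform (as with the Agilent results above) without any change to its decision threshold.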
Abbreviations
LUAD/ADC: adenocarcinoma; LUSC/SCC: squamous cell carcinoma; SCLC: small cell lung cancer; NSCLC: non-small cell lung cancer; lncRNA: long non-coding RNA; GEO: Gene Expression Omnibus; TCGA: The Cancer Genome Atlas; PCA: principal component analysis; TANRIC: The Atlas of ncRNA in Cancer; DAVID: Database for Annotation, Visualization, and Integrated Discovery; TARGET: Tumor Alterations Relevant for Genomics-driven Therapy.