Introduction

Lung cancer is the deadliest malignancy worldwide [1], and its 5-year survival rate is approximately 4-17% [2]. The pathological types of lung cancer include non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). NSCLC is further divided into adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC), of which adenocarcinoma is the most common type. For last decades, it was believed that smoking was the primary cause of NSCLC. However, the incidence of NSCLC in non-smokers is increasing year by year [3], but the specific reason is still unclear. Surgery is considered to be the gold standard in treating early-stage lung cancer [4], while radiotherapy, chemotherapy, targeted therapy, and immunotherapy are also available for patients with advanced disease [5]. In recent years, EGFR-TKIs targeting EGFR mutations have shown significant effects [6]. The application of the immune checkpoint inhibitors PD-1 and PD-L1 also provides a broader strategy for the comprehensive treatment of lung adenocarcinoma [7]. Thus, exploration of novel approaches for monitoring prognosis and improving LUAD immunotherapy is urgently needed.

Ubiquitination is a type of post-translational modification that participates in regulating protein function or degradation by the proteasome [8]. The process of protein ubiquitination involves three types of enzymes: ubiquitin-activating enzyme (E1), ubiquitin-coupled enzyme (E2), and ubiquitin ligase (E3) [9]. The E1 enzyme is responsible for the adenylation of ubiquitin, a process that consumes one molecule ATP, and the E2 enzyme transports ubiquitin that is adenylated by E1 to E3 [10]. Finally, different E3 enzymes recognize diverse substrate proteins and catalyze ubiquitin transfer from E2 to the lysine (K) residues of substrate proteins to complete monoubiquitination or polyubiquitination of substrate proteins [11, 12]. Indeed, like other posttranslational modifications, ubiquitin chains on the lysine (K) residues of substrate proteins can be removed by deubiquitinases (DUBs, also known as deubiquitinating enzymes). The most classical function of ubiquitination and deubiquitination is to mediate the degradation of a protein by the 26S proteasome or stabilize it [13]. However, it has been reported that ubiquitin enzymes (UBs) and deubiquitinases (DUBs) are associated with various biological processes, such as the cell cycle, transcription, signal transduction, apoptosis, the immune response, protein interactions, and subcellular localization [1416]. Among these enzymes, E3s and DUBs have been shown to bind specifically to substrates [17].

In lung cancer, several studies have reported that E3s/DUBs can interact with their substrates and play significant roles in LUAD. For example, it has been demonstrated that USP7 Monoubiquitination Histone H2b sensitizes lung cancer cells to ferroptosis [18]. Furthermore, UBE2O mediates Mxi1 ubiquitination and degradation to promotes lung cancer cell proliferation and radioresistance [19]. An E3 ubiquitin ligase complex CRL3(BTBD9) targets TNFAIP1 for degradation to suppress lung cancer cell migration. Moreover, the mRNA expression of BTBD9 is associated with the overall survival in lung cancer patients [20]. Thus, it is believed that constructing an E3s/DUB-related gene prediction model to monitor LUAD patients' prognosis would be valuable and that timely intervention can significantly improve the prognosis of patients with high-risk scores. In this study, we analyzed E3s/DUB-related gene expression associated with patients’ clinical outcomes obtained from The Cancer Genome Atlas (TCGA) database. A prognostic prediction model for LUAD patients was constructed. The substrate proteins of these signature genes were next predicted. Besides, functional enrichment analysis of the substrate proteins showed that they might affect sensitivity to chemotherapy, targeted therapy, and immunotherapy.

Results

Evaluation of the predictive ability of the five signature genes for overall survival in the training group and test group

We calculated the risk scores of all patients in the training group using the five signature gene prediction model and classified patients into high-risk and low-risk groups according to the median risk score values (Figure 3A). Kaplan-Meier log-rank analyses showed that patients with high risk had a worse survival than patients with low risk in the training group (Figure 3B). Moreover, the mortality rate of patients in the high-risk group was higher than that of patients in the low-risk group (Figure 3C). The prediction model's ROC curves at 1, 2, and 3 years all showed that the model accurately predicted patient survival (Figure 3D). Additionally, the heatmap showed the landscape of the five signature genes in the training group (Figure 3E). To verify the model's reliability, we calculated the risk score of all patients in the test group and then used the same cutoff value as that used for the training group to group the patients (Figure 4A). As shown in Figure 4B, 4C, in the test group, high-risk patients had a significantly worse prognosis and mortality rate than low-risk patients. Besides, the ROC curve for the test group proved that the model was reliable (Figure 4D). Additionally, we visualized the expression of five signature genes in the test group (Figure 4E).

The five-gene signature predicts overall survival in the training group. (A) The distribution of risk-score and patients' grouping. (B) Kaplan-Meier survival curves of high and low-risk patients. (C) Vital status of patients. (D) ROC curves of the predictive model in the training group. (E) Heatmap of five genes in the training group.
Validation of the predictive model. (A) The distribution of risk-score in the test group. (B) Kaplan-Meier survival curves of high and low-risk patients in the test group. (C) Vital status of patients in the test group. (D) ROC curves of the predictive model in the test group. (E) Heatmap of five genes in the test group.

Prediction of the substrate proteins of the signature genes and their functional analysis

To investigate the potential functional impact of the five signature genes, we predicted the likely substrate proteins of the signature genes using “UbiBrowser”. The top 20 substrate proteins of each signature gene, which were identified according to their predicted scores were selected for further functional analysis (Figure 6A). GO enrichment analysis showed that all substrate proteins were involved in regulating the classical P53 pathway and various epigenetic regulation mechanisms, including histone modification, chromatin modification, and protein modification (Figure 6B). Additionally, KEGG analysis suggested that the substrate proteins were mainly enriched in the non-small cell lung cancer pathway, which confirmed our model's validity. Besides, the substrate proteins were found to participate in some intriguing pathways, such as EGFR tyrosine kinase inhibitor resistance, Fc epsilon RI signaling pathway, platinum drug resistance, human T-cell leukemia virus 1 infection, PD-L1 expression and PD-1 checkpoint pathway in cancer, and Fc gamma R-mediated phagocytosis (Figure 6C). Enrichment for the above pathways predicts that our five signature genes may regulate immune infiltration. Since protein ubiquitination has been previously reported to influence innate and adaptive immunity [21], we further performed a detailed analysis of this process.

Substrate protein prediction and potential functional analysis. (A) The prediction of substrate proteins. (B) GO enrichment analysis. (C) KEGG pathway enrichment analysis.

Identification of immune infiltration in the low-risk group and high-risk groups

According to the risk score, we divided all patients into high- and low-risk groups and then used the R package “CIBERSORT” to analyze infiltrated immune cells in each sample (Figure 7A). As shown in Figure 7B, the infiltration of CD8 cells was higher in the low-risk group than in the high-risk group, while NK cells, dendritic cells, and mast cells had higher infiltration levels in the high-risk group than in the low-risk group. However, the activation levels of these cells were not significantly different between the two groups group. In contrast, CD4 memory cells were significantly more activated in the low-risk group than in the high-risk group (Figure 7B). The above differences in these immune cells may be associated with differences in overall survival and tumor resistance to multiple treatments.

The immune infiltration analysis in low-risk and high-risk groups. (A) The distribution of the immune cells. (B) Differential analysis of immune cell infiltration in the high-risk and low-risk group.

Discussion

Protein ubiquitination has been reported to be involved in various biological processes to regulate the growth or death of tumor cells [22]. E3s/DUB is involved in the regulation of numerous pathways by ubiquitinating substrates or changing the level of ubiquitination. Deepening further research and understanding of such E3s/DUB will help to expand existing cancer therapeutic targets and effective biomarkers, so it is worth exploring a prognostic prediction model for patients based on E3s/DUB. Although some signature gene-based prediction models for lung adenocarcinoma were recently reported [2325], no study has focused on protein ubiquitination in LUAD. In the present study, we constructed an E3s/DUB-related signature gene model to predict prognosis and further analyzed the functional impact.

We systematically investigated the expression levels of E3s/DUB-related genes in lung adenocarcinoma patients and their relationship with overall survival. We constructed a brand-new five signature gene (WDR86, TRIM2, TRIM17, RNF144B, and UHRF1)-based prediction model for the training group and then validated the model in the test group. In this model, WDR86, TRIM2, and RNF144B were protective factors for risk score, while UHRF1 and TRIM17 were risk factors for LUAD. Among these genes, UHRF1, which has been demonstrated experimentally can promote disease progression and is associated with poor prognosis by affecting the cell cycle pathway in lung adenocarcinoma [26], which is consistent with our findings. However, previous studies have reported that TRIM2 mediates proliferation and metastasis of lung adenocarcinoma, by deubiquitinating and stabilizing Snail1 protein [27]. That is contradictory to our study in which TRIM2 was a risk protective factor. We suggest that TRIM2 binding to Snail1 may not be unique. Because deubiquitinase recognizes protein possessing the same specific domain. By affecting the ubiquitination levels of different substrate proteins, TRIM2 may play inconsistent cancer-promoting or cancer-inhibiting roles. When considering the impact of TRIM2 on patient survival, the combined effects of TRIM2 needs to be considered. More experiments are needed to investigate the combined effects of TRIM2 on survival in patients with lung adenocarcinoma. Besides, the specific role of other genes in the prediction model in lung cancer has not been reported, and further experiments on them can also help us to expand the understanding of the mechanisms of lung cancer development and progression. In our study, both Kaplan-Meier survival curves and ROC curves confirmed the reliability of our model in predicting survival. Then, we predicted the substrate proteins of the five signature genes to explore their functional impact. Interestingly, GO and KEGG analysis showed that the enriched pathways are associated with treatment resistance and the immune response and thus affect prognosis. Then, we studied immune cell infiltration in samples grouped according to risk scores calculated by our prediction model. Indeed, we observed some differences in immune cell infiltration scores, with the most significant difference being related to the activation of CD4 memory T cells, which was significantly increased in the low-risk group. This difference may have contributed to the better survival of patients in the low-risk group than those in the high-risk group. This finding is also consistent with previous reports showing that quiescent CD4 memory T cells exist in the microenvironment with tumors, where they can be activated by locally continuously released IL-12, allowing them to proliferate and secrete IFN-γ, leading to tumor cell death [28]. Of course, this study has some limitations, such as insufficient sample size and lack of experimental validation, which is also the focus of our future research.

In conclusion, we constructed an E3s/DUB-related signature gene-based model for predicting prognosis that performed well in predicting LUAD patient survival. Additionally, we identified the infiltrated immune cells significantly associated with the prognostic signature genes. These findings may provide novel insights for monitoring LUAD prognosis and guide the development of cancer immunotherapy.

Data availability statement

The public datasets can be downloaded from TCGA (https://portal.gdc.cancer.gov/) and the IUUCD database (http://iuucd.biocuckoo.org/). The data used to support the findings of this study are available from the corresponding author upon request.

Materials and Methods

Data downloading and processing

The RNA-seq data (HTSeq-FPKM) and clinical datasets for tumors (n=497) and normal samples (n=54) were downloaded from The Cancer Genome Atlas (TCGA) website (https://cancergenome.nih.gov/). E3s/DUB-related genes (n=1016) were identified from the IUUCD database (http://iuucd.biocuckoo.org/).

Patient grouping and establishment and validation of the prediction model

Mean gene expression levels for samples with the same patient ID were calculated by the R package “limma”. Repeated patient samples and samples with insufficient follow-up information were excluded (n = 43). Then, the package “caret” was used to randomly divide the tumor samples into the training group (n=227; 50%) and test group (n=227; 50%). We identified E3/DUB-related DEGs associated with overall survival by the univariate Cox regression method after combining gene expression and response to clinical treatment information (P < 0.1). Moreover, we performed LASSO-Cox regression using signatures derived from univariate Cox regression and established prognostic signature formulae to avoid overfitting. LASSO regression was performed by the R package “glmnet” to identify robustly expressed genes. The parameter for LASSO was selected by ten-fold cross-validation. Subsequently, multivariate Cox regression analyses (P < 0.05) of these robustly expressed genes were performed to construct the risk score model. The risk scores of patients in the training group were calculated, and the median value of the risk score was used to divide the patients into high-risk and low-risk groups. Risk scores for patients in the test group were calculated as described above, and the same cutoff score was used for grouping. Kaplan-Meier analysis with the log-rank test and a ROC curve was used to detect the reliability of the model. A heatmap showing signature gene expression in the two groups was generated with the R package “survival”.

Prediction and functional enrichment analysis of substrate proteins

The substrate protein of E3s/DUB is predicted using “UbiBrowser” (http://ubibrowser.ncpsb.org). The substrate proteins with the top 20 prediction scores were then selected for functional enrichment analysis. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed with the R package “clusterProfiler”.

Analysis of immune infiltration between the low-risk and high-risk groups

To understand the relationship between the model and infiltrated immune cells, we characterized immune cell composition using the R package “CIBERSORT”. The low-risk and high-risk groups were divided according to risk scores, and differentially infiltrated immune cells were identified. Here, a P-value < 0.05 was considered to be statistically significant.

Statistical analysis

All statistical analyses are performed with R version 4.0.2 and the attached packages. The p-value and hazard ratio (HR) for survival analysis were derived from Cox regression. LASSO (least absolute shrinkage and selection operator) regression was used to filter the robustly expressed genes. The student’s t-test was used to explore the differences between the two groups. Moreover, the log-rank test was used for Kaplan-Meier survival analysis.

Author Contributions

RX was responsible for analyzing and processing data and writing the article, TL was responsible for plotting figures, and JW was responsible for writing the figure legend. LYZ was accountable for the guidance of the idea of the article and the review of the article.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References