Figure 4. Validation of the reliability of Fges-derived subtypes in independent datasets. (A) Schematic of the XGBoost model used to predict Fges-derived subtypes (see Materials and Methods). (B) Heatmap showing, for each of the 151 samples, whether the XGBoost classifier built on the training set assigned it to Cluster 3 (light blue) or Cluster 1&2 (purple). (C, D) Kaplan-Meier survival curves comparing predicted Cluster 3 with predicted Cluster 1&2, used to test the association between predicted clusters and survival. P-values are from log-rank tests. (E) Bar plot showing the importance ranking of the top 35 feature genes selected by XGBoost in the training set, ordered by Gain index. (F) Bubble plots showing the expression of the top 15 most important feature genes in the three NB cohorts. Color intensity indicates the average expression level of a gene in a given subtype, and circle size indicates the percentage of samples in that subtype expressing the gene. (G) ROC curves showing how accurately the expression of each of three genes distinguishes Cluster 1&2 from Cluster 3 in the TARGET-NB cohort. (top) GREB1; (middle) CDK4; (bottom) GPR125. (H) Kaplan-Meier curves showing the prognostic impact of high versus low expression of GREB1, CDK4 and GPR125 in the three NB cohorts. (top panel) TARGET-NB; (middle panel) GSE49710; (bottom panel) GSE85047.