Research Paper. Volume 16, Issue 5, pp. 4075–4094

Genome-wide transcriptome profiling and development of age prediction models in the human brain

Joseph A. Zarrella1, Amy Tsurumi2,3

  • 1 Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
  • 2 Department of Surgery, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
  • 3 Shriners Hospitals for Children-Boston, Boston, MA 02114, USA

Received: May 2, 2022; Accepted: March 28, 2023; Published: February 28, 2024

https://doi.org/10.18632/aging.205609

Copyright: © 2024 Zarrella and Tsurumi. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Aging-related transcriptome changes in various regions of the healthy human brain have been explored in previous work; however, studies developing models that predict age from the expression levels of specific panels of transcripts are lacking. Moreover, studies assessing sexually dimorphic gene activity in the aging brain have reported discrepant results, indicating that further investigation is warranted. In a previous study comparing different regions of the human brain, the prefrontal cortex (PFC) showed a particularly large number of significant transcriptome alterations during healthy aging. We harmonized neuropathologically normal PFC transcriptome datasets obtained from the Gene Expression Omnibus (GEO) repository, comprising samples from individuals aged 21 to 105 years. We found a large number of transcripts differentially regulated in old and elderly samples compared with young samples overall, and we compared female- and male-specific expression alterations. We characterized the age-associated genes using Gene Ontology (GO), pathway, and network analyses. Furthermore, we applied established machine learning algorithms, the least absolute shrinkage and selection operator (Lasso) and Elastic Net (EN), as well as more recent ones, eXtreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), to develop accurate prediction models for chronological age, and we validated these models. Further studies to validate these models in other large populations, and molecular studies to elucidate the mechanisms by which the identified transcripts may be related to aging phenotypes, are warranted.
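The abstract names Lasso and Elastic Net among the algorithms used to predict chronological age from transcript expression. As a minimal sketch, and not the authors' pipeline (which is described in the full paper), the block below fits an Elastic Net age predictor by cyclic coordinate descent and chooses the penalty strength by k-fold cross-validation. It uses only the Python standard library and synthetic data; the expression matrix, effect sizes, penalty grid, and every function name (soft_threshold, elastic_net, cv_lambda, and so on) are hypothetical. The objective follows the glmnet/scikit-learn parameterization, (1/2n)||y − b0 − Xβ||² + λ(α||β||₁ + (1 − α)/2 ||β||²₂), where α = 1 reduces to the Lasso.

```python
# Hypothetical sketch on synthetic data; not the authors' pipeline.
import math
import random

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) induced by the L1 (Lasso) penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def fit_scaler(X):
    """Per-column mean and population SD (SD of a constant column is set to 1)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(math.sqrt(var) if var > 0 else 1.0)
    return means, sds

def apply_scaler(X, means, sds):
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

def elastic_net(X, y, lam, alpha, n_iter=500, tol=1e-6):
    """Cyclic coordinate descent; X must be standardized with fit_scaler/apply_scaler.

    Minimizes (1/(2n))||y - b0 - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2);
    alpha=1 is the Lasso, 0 < alpha < 1 is the Elastic Net.
    """
    n, p = len(X), len(X[0])
    b0 = sum(y) / n  # columns are centered, so the intercept is mean(y)
    beta = [0.0] * p
    resid = [yi - b0 for yi in y]  # current residuals y - b0 - X @ beta
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            old = beta[j]
            # (1/n) * x_j . (residual with feature j's contribution added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * old) for i in range(n)) / n
            # standardized columns have (1/n) * sum(x_ij^2) == 1, hence the 1.0
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - old
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return b0, beta

def predict(X, b0, beta):
    return [b0 + sum(b * v for b, v in zip(beta, row)) for row in X]

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def cv_lambda(X, y, lams, alpha, k=5, seed=0):
    """Choose lam by k-fold CV (lowest mean absolute error); scaling is refit per fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = {}
    for lam in lams:
        errs = []
        for val in folds:
            val_set = set(val)
            tr = [i for i in idx if i not in val_set]
            X_tr = [X[i] for i in tr]
            means, sds = fit_scaler(X_tr)
            b0, beta = elastic_net(apply_scaler(X_tr, means, sds), [y[i] for i in tr], lam, alpha)
            X_va = apply_scaler([X[i] for i in val], means, sds)
            errs.append(mae([y[i] for i in val], predict(X_va, b0, beta)))
        scores[lam] = sum(errs) / k
    return min(scores, key=scores.get)

# HYPOTHETICAL data: 150 donors x 30 "transcripts"; only the first three track age.
rng = random.Random(42)
ages = [rng.uniform(21, 105) for _ in range(150)]
X = []
for a in ages:
    row = [rng.gauss(0.0, 1.0) for _ in range(30)]
    row[0] += 0.05 * a  # increases with age
    row[1] -= 0.04 * a  # decreases with age
    row[2] += 0.03 * a
    X.append(row)

X_train, y_train, X_test, y_test = X[:100], ages[:100], X[100:], ages[100:]
alpha = 0.5  # Elastic Net mixing weight; alpha = 1.0 gives the Lasso
lam = cv_lambda(X_train, y_train, [0.25, 0.5, 1.0, 2.0, 4.0, 8.0], alpha)
means, sds = fit_scaler(X_train)  # scaling learned on training data only
b0, beta = elastic_net(apply_scaler(X_train, means, sds), y_train, lam, alpha)
pred = predict(apply_scaler(X_test, means, sds), b0, beta)
test_mae = mae(y_test, pred)
baseline_mae = mae(y_test, [b0] * len(y_test))  # always predict the training mean age
selected = [j for j, b in enumerate(beta) if b != 0.0]
```

The soft-thresholding step is what lets the L1 part of the penalty set uninformative coefficients exactly to zero, while the L2 part shrinks correlated predictors together. In practice one would typically rely on an established implementation such as scikit-learn's ElasticNetCV or the R package glmnet; the gradient-boosting models (XGBoost, LightGBM) mentioned in the abstract are not covered by this sketch.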

Abbreviations

PFC: Prefrontal cortex; GEO: Gene Expression Omnibus; Lasso: Least absolute shrinkage and selection operator; EN: Elastic Net; XGBoost: eXtreme Gradient Boosting; LightGBM: Light Gradient Boosting Machine; SHAP: SHapley Additive exPlanations; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; CV: Cross-validation.