Aging
Research Paper | Volume 9, Issue 7 | pp 1721–1737

Machine learning for predicting lifespan-extending chemical compounds

Diogo G. Barardo¹, Danielle Newby², Daniel Thornton¹, Taravat Ghafourian³, João Pedro de Magalhães¹, Alex A. Freitas⁴
  • ¹ Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
  • ² Department of Psychiatry, University of Oxford, Warneford Hospital, Oxford, UK
  • ³ School of Life Sciences, University of Sussex, Falmer, Brighton, UK
  • ⁴ School of Computing, University of Kent, Canterbury, UK
* Equal contribution
# Joint last authors
Received: June 6, 2017 | Accepted: July 12, 2017 | Published: July 18, 2017

Copyright: © 2017 Barardo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Increasing age is a risk factor for many diseases; therefore, developing pharmacological interventions that slow down ageing, and consequently postpone the onset of many age-related diseases, is highly desirable. In this work we analyse data from the DrugAge database, which contains chemical compounds and their effects on the lifespan of model organisms. We built predictive models with the random forest machine learning method to predict whether or not a chemical compound will increase the lifespan of Caenorhabditis elegans. The features were Gene Ontology (GO) terms annotated for the proteins targeted by the compounds and chemical descriptors calculated from each compound’s chemical structure. The model with the best predictive accuracy used both biological and chemical features and achieved a prediction accuracy of 80%. The 20 most important GO terms include terms related to mitochondrial processes, to enzymatic and immunological processes, and to metabolic and transport processes. We applied our best model to the DGIdb database, where the effect of the compounds on an organism’s lifespan is unknown, to predict which compounds are more likely to increase C. elegans’ lifespan. The top hit compounds can be broadly divided into four groups: compounds affecting mitochondria, compounds for cancer treatment, anti-inflammatories, and compounds for gonadotropin-releasing hormone therapies.
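As an illustration of the feature representation described in the abstract, the sketch below shows one way to combine binary GO-term indicators (collected from the proteins a compound targets) with numeric chemical descriptors into a single feature row per compound. It is a minimal sketch, not the authors' pipeline: the compound and protein names, the GO term assignments, the descriptor names (`mol_weight`, `logp`) and all values are hypothetical, and the encoding (one 0/1 column per GO term) is an assumption about how such features could be laid out.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data; none of these values come from DrugAge or DGIdb.
compound_targets: Dict[str, List[str]] = {
    "compound_A": ["protein_1", "protein_2"],
    "compound_B": ["protein_3"],
}
# Hypothetical GO annotations for each target protein.
protein_go_terms: Dict[str, Set[str]] = {
    "protein_1": {"GO:0005739"},
    "protein_2": {"GO:0008152", "GO:0005739"},
    "protein_3": {"GO:0006810"},
}
# Hypothetical chemical descriptors calculated from each structure.
chemical_descriptors: Dict[str, Dict[str, float]] = {
    "compound_A": {"mol_weight": 180.2, "logp": 1.2},
    "compound_B": {"mol_weight": 320.5, "logp": 3.4},
}


def go_terms_for(compound: str) -> Set[str]:
    """Union of GO terms annotated for all proteins the compound targets."""
    terms: Set[str] = set()
    for protein in compound_targets.get(compound, []):
        terms |= protein_go_terms.get(protein, set())
    return terms


def build_feature_table(compounds: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Return column names and one feature row per compound.

    Biological features are 0/1 indicators, one per GO term seen in the
    data; chemical features are the raw descriptor values.
    """
    go_vocab = sorted(set().union(*(go_terms_for(c) for c in compounds)))
    descriptor_names = sorted(
        {name for c in compounds for name in chemical_descriptors[c]}
    )
    rows: List[List[float]] = []
    for c in compounds:
        terms = go_terms_for(c)
        go_part = [1.0 if t in terms else 0.0 for t in go_vocab]
        chem_part = [chemical_descriptors[c][n] for n in descriptor_names]
        rows.append(go_part + chem_part)
    return go_vocab + descriptor_names, rows


columns, rows = build_feature_table(["compound_A", "compound_B"])
for name, row in zip(["compound_A", "compound_B"], rows):
    print(name, dict(zip(columns, row)))
```

Rows built this way, paired with a binary label (lifespan increased or not), could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`. The same encoding would have to be applied to unlabelled compounds before ranking them by predicted probability.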