Priority Research Paper Volume 16, Issue 22 pp 13452—13504

Cell-type specific epigenetic clocks to quantify biological age at cell-type resolution

Huige Tong1, *, , Xiaolong Guo1, *, , Macsue Jacques2, , Qi Luo1, , Nir Eynon2, , Andrew E. Teschendorff1, ,

  • 1 CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institute for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
  • 2 Australian Regenerative Medicine Institute (ARMI), Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, Victoria 3800, Australia
* Equal contribution

Received: August 12, 2024       Accepted: December 12, 2024       Published: December 29, 2024      

https://doi.org/10.18632/aging.206184
How to Cite

Copyright: © 2024 Tong et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

The ability to accurately quantify biological age could help monitor and control healthy aging. Epigenetic clocks have emerged as promising tools for estimating biological age, yet they have been developed from heterogeneous bulk tissues, and are thus composites of two aging processes, one reflecting the change of cell-type composition with age and another reflecting the aging of individual cell-types. There is thus a need to dissect and quantify these two components of epigenetic clocks, and to develop epigenetic clocks that can yield biological age estimates at cell-type resolution. Here we demonstrate that in blood and brain, approximately 39% and 12% of an epigenetic clock’s accuracy is driven by underlying shifts in lymphocyte and neuronal subsets, respectively. Using brain and liver tissue as prototypes, we build and validate neuron and hepatocyte specific DNA methylation clocks, and demonstrate that these cell-type specific clocks yield improved estimates of chronological age in the corresponding cell and tissue-types. We find that neuron and glia specific clocks display biological age acceleration in Alzheimer’s Disease with the effect being strongest for glia in the temporal lobe. Moreover, CpGs from these clocks display a small but significant overlap with the causal DamAge-clock, mapping to key genes implicated in neurodegeneration. The hepatocyte clock is found accelerated in liver under various pathological conditions. In contrast, non-cell-type specific clocks do not display biological age-acceleration, or only do so marginally. In summary, this work highlights the importance of dissecting epigenetic clocks and quantifying biological age at cell-type resolution.

Introduction

Individuals of the same chronological age may display marked differences in health and future disease risk [1]. These differences are attributed to variations in biological age, and whilst a number of biomarkers for biological age have been developed, these are mostly measured in, or restricted to, easily accessible tissues like blood [14]. More recently, epigenetic clocks based on DNA methylation (DNAm) marks [57] have emerged as powerful tools to predict chronological age, and to some extent also biological age [3, 816]. However, many challenges remain that limit the practical utility and interpretation of these epigenetic clocks [9, 10, 17].

Prominent among these challenges is cell-type heterogeneity (CTH) [18, 19]: most if not all epigenetic clocks have been developed or trained from heterogeneous bulk-tissues that are composed of many different cell-types. This CTH has two undesirable effects, as recently stressed by others [2022]. First, it can distort biological age estimates by underlying variations in cell-type fractions. As a concrete example, whole blood, as well as solid tissues with a significant immune-cell infiltration, are all characterized by an age-related shift from naïve to mature T-cell fractions [2326], which as shown by Jonkman et al. can confound biological age estimates in immune cell subtypes [23]. Whole blood tissue is also subject to an age-related skew towards the myeloid lineage [27]. Another example is brain, with early studies reporting correlations of an accelerated Horvath-clock [28] with neurodegenerative features (e.g. neurotic plaques or amyloid load) [29, 30] in the prefrontal cortex, or with genetic variants associated with neurodegenerative diseases [3133], but with more recent studies attributing such age-accelerations to changes in underlying cell-type proportions [3436]. The second undesirable effect of CTH is that, even if we adjust for it a-posteriori, as it is often done in blood tissue by adjusting epigenetic clock values for variations in estimated immune cell fractions [28], the ensuing epigenetic clock estimates are still averages over all the distinct cell-types in the tissue. Thus, standard a-posteriori adjustment for cell-type fractions does not inform us about the biological aging of individual cell-types. This is a serious limitation for understanding which cell-types age faster and contribute most to age-related diseases, or for identifying the cell-types that respond best to cellular rejuvenation and other healthy-age promoting interventions [3742].

Thus, current epigenetic clocks are composites, reflecting a mixture of two underlying age-related processes: an extrinsic one reflecting age-related variations of a tissue’s cell-type composition, and an intrinsic one, reflecting age-related DNAm changes in the individual cell-types of the tissue, the latter being itself a composite process, as each cell-type may age at a different rate. Importantly, the relative contribution of extrinsic and intrinsic aging processes to an epigenetic clock’s accuracy has not yet been quantified, and the only attempts so far at quantifying biological age at cell-type resolution have come from studies that have applied epigenetic clocks derived from bulk-tissue (e.g. liver) to sparse and unreliable single-cell DNAm data (e.g. hepatocytes) [43].

Here we advance the epigenetic clock field in two ways. First, we dissect epigenetic clocks into their extrinsic and intrinsic components, rigorously quantifying these aging components in two of the most frequently profiled tissues (whole blood and brain), demonstrating that extrinsic aging can account for a substantial proportion of an epigenetic clock’s predictive accuracy. Second, we present a computational strategy for building cell-type specific epigenetic clocks that can yield biological age estimates at cell-type resolution. This strategy explicitly adjusts for CTH whilst training the clock, leveraging recent state-of-the-art cell-type deconvolution methods [24, 44] to identify cell-type specific age-associated DNAm changes [45]. As such this is one of the first studies to characterize age-associated DNAm changes at cell-type resolution. We illustrate our strategy in brain and liver tissue, demonstrating how cell-type specific epigenetic clocks from these tissues can improve tissue-specific estimation of chronological and biological age.

Results

Dissecting and quantifying the two components of epigenetic clocks

Epigenetic clocks are composites of two underlying age-related processes, an extrinsic one that reflects age-associated variations in cell-type composition and an intrinsic one that captures age-associated DNAm changes in individual cell-types (Figure 1A). Thus, the predictive accuracy (R2 value) of a clock can be composed as the sum of an extrinsic and intrinsic accuracy, so that the extrinsic R2 value can be estimated from the predictive accuracies of two clocks: one adjusted for CTH, and another which is not (Figure 1A). To quantify the relative rates of extrinsic and intrinsic aging, we assembled a large collection of whole blood Illumina DNAm datasets, using two of the largest cohorts to train Elastic Net age-predictors unadjusted and adjusted for CTH, respectively (Methods). Adjustment for CTH was performed at the resolution of 12 immune cell subtypes using a validated DNAm reference matrix encompassing naïve and memory B-cells, naïve and memory CD4T-cells, naïve and memory CD8T-cells, T-regulatory cells, natural-killer cells, monocytes, neutrophils, eosinophils and basophils [24] (Methods). We also built an elastic net clock adjusting for 9 immune-cell subtypes, by merging together the estimated fractions of the naïve and memory lymphocyte subsets, allowing us to assess the importance of the known age-related shift from naïve to mature T-cell subsets in driving an epigenetic clock’s accuracy (Methods) [24]. We then estimated R2 values of the 3 clocks in 11 independent whole blood DNAm datasets (Methods). This revealed that the unadjusted clock achieved substantially higher R2 values than the two adjusted clocks, especially when compared to the clock adjusted for all 12 immune cell types (Figure 1B and Supplementary Figures 1, 2). We estimated that the intrinsic aging component accounts for approximately 61% of the predictive accuracy of the unadjusted clock (Figure 1B), which means that extrinsic aging accounts for a substantial fraction of a clock’s accuracy (39%). Of note, not adjusting for relative shifts in naïve to memory subsets, we would have over-and-under estimated the intrinsic and extrinsic aging components to be approximately 84% and 16%, respectively (Figure 1B). By assembling a large collection of brain DNAm datasets (Methods, Supplementary Table 1) and adjusting for 7 brain cell-type fractions (GABAergic excitatory neurons, glutamatergic inhibitory neurons, astrocytes, microglia, oligodendrocytes, endothelial cells and stromal cells), as estimated with the HiBED algorithm [46], we verified that the intrinsic component of aging in brain is around 88%, higher than the estimate obtained in blood (Figure 1C and Supplementary Figure 3). To see which brain cell-types are driving the extrinsic aging component, we performed a fixed and random effects meta-analysis over 13 Illumina DNAm brain datasets, encompassing over 2,000 samples (Supplementary Table 1), which revealed an age-associated increase of excitatory neurons, whilst inhibitory neurons and astrocytes decreased (Figure 2). Thus, this age-related shift in neuronal subtypes accounts for the 12% extrinsic aging component. Of note, these findings are broadly consistent with recent single-nucleus DNAm studies reporting reduced inhibitory neuron fractions in the brain of aged donors [47, 48]. Overall, these results obtained in blood and brain tissue highlight the critical importance of adjusting for CTH when building epigenetic clocks.

Quantification of intrinsic and extrinsic aging. (A) Graphical depiction of how the predictive accuracy of an epigenetic clock derived from bulk-tissue can be decomposed in terms of an extrinsic and intrinsic aging process, as shown. The accuracy, measured by the R2 value, of a clock reflecting the extrinsic process can be estimated from the given formula, if one can estimate the accuracies of the full unadjusted clock and the one fully adjusted for cell-type heterogeneity (CTH), the latter reflecting the intrinsic aging of the individual cell-types. (B) Barplots compare the R2 values of the unadjusted clock, and clocks adjusted for 12 and 9 immune cell types, in each of 11 whole blood cohorts. Right panel displays the average and standard deviation of the two indicated ratios across the 11 cohorts. These ratios measure the fraction of the unadjusted clock’s accuracy that can be attributed to intrinsic aging. (C) As (B) but for brain DNAm datasets, where the adjusted clock was adjusted for 7 brain cell-types.

Figure 1. Quantification of intrinsic and extrinsic aging. (A) Graphical depiction of how the predictive accuracy of an epigenetic clock derived from bulk-tissue can be decomposed in terms of an extrinsic and intrinsic aging process, as shown. The accuracy, measured by the R2 value, of a clock reflecting the extrinsic process can be estimated from the given formula, if one can estimate the accuracies of the full unadjusted clock and the one fully adjusted for cell-type heterogeneity (CTH), the latter reflecting the intrinsic aging of the individual cell-types. (B) Barplots compare the R2 values of the unadjusted clock, and clocks adjusted for 12 and 9 immune cell types, in each of 11 whole blood cohorts. Right panel displays the average and standard deviation of the two indicated ratios across the 11 cohorts. These ratios measure the fraction of the unadjusted clock’s accuracy that can be attributed to intrinsic aging. (C) As (B) but for brain DNAm datasets, where the adjusted clock was adjusted for 7 brain cell-types.

Meta-analysis of brain cell-type fractions with age. (A) Forest plots of associations between brain cell-type fractions and age, for 7 brain cell-types (Abbreviations: ExN: excitatory neurons; InN: inhibitory neurons; Oligo/OPC: oligodendrocytes/oligo precursor cells; Micro: microglia; Astro: astrocytes, Stromal; Endo: endothelial) across 13 independent DNAm datasets. Number of samples in each cohort is given in brackets. P-values from a Fixed and Random Effects models are also given. Heterogeneity statistics and P-values of heterogeneity are given above each plot. (B) Effect sizes and P-values from the Random Effects meta-analysis model.

Figure 2. Meta-analysis of brain cell-type fractions with age. (A) Forest plots of associations between brain cell-type fractions and age, for 7 brain cell-types (Abbreviations: ExN: excitatory neurons; InN: inhibitory neurons; Oligo/OPC: oligodendrocytes/oligo precursor cells; Micro: microglia; Astro: astrocytes, Stromal; Endo: endothelial) across 13 independent DNAm datasets. Number of samples in each cohort is given in brackets. P-values from a Fixed and Random Effects models are also given. Heterogeneity statistics and P-values of heterogeneity are given above each plot. (B) Effect sizes and P-values from the Random Effects meta-analysis model.

A computational strategy to build cell-type specific epigenetic clocks

In order to dissect the intrinsic aging component, we next devised a computational strategy to build cell-type specific epigenetic clocks (Figure 3A). After estimation of the cell-type fractions in a given bulk-tissue DNAm dataset, we apply the CellDMC algorithm [45] to a training subset of the data to identify age-associated differentially methylated cytosines in each cell-type of interest (age-DMCTs) (Figure 3A). CellDMC uses the estimated cell-type fractions as weights in a model that expresses a CpG’s DNAm profile across samples as a corresponding weighted mixture of latent phenotype-dependent DNAm profiles. In the present context, the phenotype is age and the described model includes interaction terms between age and the cell-type fractions (Methods). These interaction terms allow cell-type specific age-DMCTs to be found (see Figure 3A for a hypothetical example). Assuming there is a substantial number of age-DMCTs for a given cell-type, we then apply an Elastic Net Regression model [49] to train predictive models for chronological age parameterized by a penalty parameter (Figure 3B). This training is restricted to age-DMCTs of the given cell-type and can be performed either on the residuals of the DNAm data matrix, obtained by regressing out the effect of cell-type composition, or on the unadjusted DNAm data matrix (Methods). In the former case, we define the clocks as being “intrinsic”, otherwise they are “semi-intrinsic”. The best performing models in a blinded model-selection set define our cell-type specific intrinsic and semi- intrinsic epigenetic clocks, one for each cell-type (Figure 3B). These clocks are then validated in independent DNAm datasets.

Computational strategy for building cell-type specific epigenetic clocks. (A) Given a tissue-type of interest with DNAm profiles measured over a relatively large number of samples, we first estimate the underlying cell-type proportions in each sample using an existing tissue-specific DNAm reference matrix. Density plots depict the distribution of cell-type fractions across all samples. Next, using a sufficiently large training set of samples encompassing a relatively wide age-range, we apply the CellDMC algorithm to infer age-associated DNAm changes in each of the underlying cell-types (age-DMCTs). Barplot depicts the number of age-DMCTs in each cell-type. (B) The construction of cell-type specific clocks then proceeds by restricting to age-DMCTs of one cell-type: an intrinsic clock is built by adjusting the DNAm training dataset for variations in cell-type fractions (CTFs), defining a matrix of DNAm residuals. Alternatively, the training over the age-DMCTs can be done on the DNAm data matrix without adjustment for CTFs which will result in a “semi-intrinsic” clock. In either case, Elastic Net models are learned for each choice of a penalty parameter L, and the optimal model L* is selected based on the best generalization performance (smallest root mean square error (RMSE)) obtained in a blinded model selection set. This optimal model then defines the corresponding cell-type specific DNAm-clock. This procedure can be done for each cell-type separately, assuming that sufficient numbers of age-DMCTs in that cell-type can be identified. Once the cell-type specific clocks are built, these are then validated in independent DNAm datasets. (C) Top row: boxplots display the root mean square error (RMSE) between predicted and true ages, for semi-intrinsic (Sin) and intrinsic (In) clocks, as assessed in 50 simulated validation sets of 200 mixtures for four different scenarios. Cell-type specific clocks were constructed from 50 simulated training sets of 200 mixtures (mixing together 3 cell-types that we call granulocytes, monocytes and lymphocytes) with age-DMCTs occurring only in one cell-type (lymphocytes). In scenarios (i) and (iii), no cell-type fractions change with age, in scenarios (ii) and (iv) the lymphocyte fraction changes with age. In scenarios (i) and (ii), age-DMCTs do not discriminate immune-cell types from each other, in scenarios (iii) and (iv) age-DMCTs discriminate lymphocytes from granulocytes and monocytes. P-values from a two-tailed paired Wilcoxon test are given. Bottom row: as top-row but now displaying the PCC of predicted vs. true age.

Figure 3. Computational strategy for building cell-type specific epigenetic clocks. (A) Given a tissue-type of interest with DNAm profiles measured over a relatively large number of samples, we first estimate the underlying cell-type proportions in each sample using an existing tissue-specific DNAm reference matrix. Density plots depict the distribution of cell-type fractions across all samples. Next, using a sufficiently large training set of samples encompassing a relatively wide age-range, we apply the CellDMC algorithm to infer age-associated DNAm changes in each of the underlying cell-types (age-DMCTs). Barplot depicts the number of age-DMCTs in each cell-type. (B) The construction of cell-type specific clocks then proceeds by restricting to age-DMCTs of one cell-type: an intrinsic clock is built by adjusting the DNAm training dataset for variations in cell-type fractions (CTFs), defining a matrix of DNAm residuals. Alternatively, the training over the age-DMCTs can be done on the DNAm data matrix without adjustment for CTFs which will result in a “semi-intrinsic” clock. In either case, Elastic Net models are learned for each choice of a penalty parameter L, and the optimal model L* is selected based on the best generalization performance (smallest root mean square error (RMSE)) obtained in a blinded model selection set. This optimal model then defines the corresponding cell-type specific DNAm-clock. This procedure can be done for each cell-type separately, assuming that sufficient numbers of age-DMCTs in that cell-type can be identified. Once the cell-type specific clocks are built, these are then validated in independent DNAm datasets. (C) Top row: boxplots display the root mean square error (RMSE) between predicted and true ages, for semi-intrinsic (Sin) and intrinsic (In) clocks, as assessed in 50 simulated validation sets of 200 mixtures for four different scenarios. Cell-type specific clocks were constructed from 50 simulated training sets of 200 mixtures (mixing together 3 cell-types that we call granulocytes, monocytes and lymphocytes) with age-DMCTs occurring only in one cell-type (lymphocytes). In scenarios (i) and (iii), no cell-type fractions change with age, in scenarios (ii) and (iv) the lymphocyte fraction changes with age. In scenarios (i) and (ii), age-DMCTs do not discriminate immune-cell types from each other, in scenarios (iii) and (iv) age-DMCTs discriminate lymphocytes from granulocytes and monocytes. P-values from a two-tailed paired Wilcoxon test are given. Bottom row: as top-row but now displaying the PCC of predicted vs. true age.

The reasoning behind defining separate intrinsic and semi-intrinsic clocks can be understood with simulation models. We generated simulated DNAm mixtures of 3 underlying cell-types (granulocytes, monocytes and lymphocytes), and in which age-DMCTs are introduced into only one cell-type (lymphocytes) under four different scenarios (Methods): (i) age-DMCTs do not map to cell-type specific CpGs and no cell-type fractions change with age, (ii) as (i) but with the lymphocyte fraction changing with age, (iii) age-DMCTs discriminate lymphocytes from the other two cell-types and no cell-type fraction changes with age, (iv) as (iii) but with the lymphocyte fraction changing with age. For each scenario, we generated training and validation sets, deriving lymphocyte specific intrinsic and semi-intrinsic clocks, subsequently validating them in the independent test sets. Interestingly, whilst the intrinsic clock displayed marginally better performance in the scenarios where cell-type fractions did not change with age, the semi-intrinsic clock was substantially better in cases where the lymphocyte fraction did change with age (Figure 3C). Thus, the use of residuals in scenarios where a given cell-type fraction changes with age can dilute out the biological signal in that cell-type, and under these circumstances, the cell-type specific semi-intrinsic clock would be preferable.

Construction and validation of neuron-specific epigenetic clocks

Next, we applied this strategy to the case of brain tissue. For training the clocks, we used one of the largest collections of prefrontal cortex (PFC) samples profiled with Illumina 450k technology [50], encompassing 416 samples with a wide age range (18–97 years) (Supplementary Table 1 and Figure 4A). Although the HiBED algorithm [46] inferred fractions for 7 brain cell-types (Supplementary Figure 4A), for subsequent CellDMC analysis, power considerations [45, 51] required us to collapse the inferred fractions to 3 broad cell-types: neurons, glia and endothelial/stromal (Figure 4A). Applying CellDMC with the estimated fractions, disease status and sex as covariates, resulted in 16,814 neuron-specific and 11,157 glia-specific age-DMCTs (Figure 4A, Methods). Visualization of representative age-DMCTs confirmed the cell-type specific nature of their age-associated DNAm changes (Figure 4B). Interestingly, neuron and glia age-DMCTs displayed a weak marginal co-localization with the cell-type specific CpGs in the brain DNAm reference matrix (Supplementary Figure 5). Focusing on the neuron-DMCTs and adjusting the DNAm data for variations in cell-type fractions, we next applied the Elastic Net regression framework to learn different models specified by a penalty parameter. For model (i.e. optimal parameter) selection, we used a completely independent Illumina DNAm dataset from Philstrom et al. [52] which included 67 PFC control samples with an age range 49–100 years (Figure 4C and Supplementary Table 1). We declared the model achieving best generalization performance in the Philstrom dataset, as defining our neuron-specific intrinsic (“Neu-In”) clock (Figure 4C, Supplementary Figure 4B and Supplementary Table 2). This neuron-specific clock was further validated in 5 independent datasets of sorted neuron samples, encompassing a total of 143 samples (Figure 4D, Supplementary Figure 4C and Supplementary Table 1). An analogous neuron “semi-intrinsic” clock (“Neu-Sin”) was obtained by applying the same training procedure to the same age-DMCTs in the Jaffe et al. DNAm data but now without adjusting the DNAm data for variations in cell-type fractions (Methods, Figure 4D, Supplementary Figure 4D and Supplementary Table 3). To assess specificity of the clocks to neurons and brain tissue, we assembled a large collection of Illumina DNAm datasets encompassing other brain cell-types (glia), other sorted non-brain cell-types and bulk tissue-types (Supplementary Table 1, Methods, and Supplementary Methods). As a further control, we also benchmarked our Neu-In/Sin clocks to two other clocks: a clock obtained by running the Elastic Net on all CpGs but adjusting the DNAm data for cell-type fractions (Methods, Figure 4D, Supplementary Figure 4E and Supplementary Table 4), thus defining a brain, but not neuron-specific, clock (“BrainClock”), and the multi-tissue Horvath clock. We restricted the assessment of all clocks to control samples only, to avoid potential confounding by disease. First, we observed that the neuron intrinsic and semi-intrinsic clocks also displayed good correlations with chronological age in sorted glia datasets, although the largest of these glia datasets displayed a significantly weaker association, suggesting that our neuron-clocks are indeed neuron-specific (Figure 4D). In line with this, a multivariate linear regression of correlation strength against cell-type (neuron vs. glia), weighted by cohort size, revealed a significantly higher correlation in neurons compared to glia for the neuron-intrinsic clock but not for the other 3 clocks (Supplementary Figure 6A). When assessed between brain and non-brain tissue/cell-types, we observed that the two neuron-specific clocks as well as the brain-specific clock achieved better predictions of chronological age in the brain-tissue datasets compared to non-brain ones, whilst, as expected for a multi-tissue clock, Horvath’s clock performed similarly between brain and non-brain tissues (Figure 4E, 4F). Importantly, and despite the relatively small number of sorted neuron datasets, the neuron-specific clocks also displayed stronger correlations with age in the sorted neuron datasets compared to the brain-specific clock (Supplementary Figure 6B). Thus, overall, our data support the view that the neuron-specific clocks are indeed optimized to predict chronological age in brain cell subtypes and that they are less confounded by changes in cell-type composition compared to e.g. the brain-specific clock.

Construction and validation of neuron-clocks. (A) The training is done on the Jaffe et al. dataset encompassing 416 prefrontal cortex (PFC) samples. Fractions for 3 broad cell-types (neurons, glia and endothelial/stromal) are inferred using the HiBED algorithm. Subsequently, CellDMC is applied to infer DMCTs in each cell-type. The Fisher-test one-tailed P-value of overlap between Neu-DMCTs and Glia-DMCTs is given. Finally, elastic net predictors of chronological age are trained from the Neu-DMCTs parameterized by a penalty parameter. (B) Scatterplots of adjusted DNAm values (adjusted for variations in cell-type fractions) of one Neu-DMCT and one Glia-DMCT against the corresponding Neuron and Glia fraction, respectively. Samples have been colored according to age-group and regression lines for each age-group have been added. (C) Optimal model (parameter) selection is performed using a completely independent PFC dataset from Philstrom et al. Scatterplot displays the predicted vs. chronological age for the optimal model. (D) Barplot of Pearson Correlation Coefficients (PCC) for 4 clocks (Neu-In, Neu-Sin, Brain and Horvath) across sorted neuron and glia datasets. Number of samples in each dataset is given in brackets. (E) Barplot of Pearson Correlation Coefficients (PCC) for the same 4 clocks (Neu-In, Neu-Sin, Brain and Horvath) across brain and non-brain DNAm datasets. Each cohort is labeled with the number of samples in brackets. Abbreviations: WB: whole blood; PFC: prefrontal cortex; TL: temporal lobe; FL: frontal lobe; MTG/STG: middle/superior temporal gyrus; PL: parietal lobe; SC: sensory cortex; VC: visual cortex; MC: motor cortex; Hippo: hippocampus; EC: entorhinal cortex; Stria: striatum; CG: cingulate gyrus; NAc: nucleus accumbens; CRB: cerebellum; Mono: monocyte. (F) For each of the 4 clocks, boxplots of PCCs comparing the values obtained in brain vs. non-brain datasets. The number of datasets in each group is given below x-axis. P-value is from a one-tailed Wilcoxon rank sum test.

Figure 4. Construction and validation of neuron-clocks. (A) The training is done on the Jaffe et al. dataset encompassing 416 prefrontal cortex (PFC) samples. Fractions for 3 broad cell-types (neurons, glia and endothelial/stromal) are inferred using the HiBED algorithm. Subsequently, CellDMC is applied to infer DMCTs in each cell-type. The Fisher-test one-tailed P-value of overlap between Neu-DMCTs and Glia-DMCTs is given. Finally, elastic net predictors of chronological age are trained from the Neu-DMCTs parameterized by a penalty parameter. (B) Scatterplots of adjusted DNAm values (adjusted for variations in cell-type fractions) of one Neu-DMCT and one Glia-DMCT against the corresponding Neuron and Glia fraction, respectively. Samples have been colored according to age-group and regression lines for each age-group have been added. (C) Optimal model (parameter) selection is performed using a completely independent PFC dataset from Philstrom et al. Scatterplot displays the predicted vs. chronological age for the optimal model. (D) Barplot of Pearson Correlation Coefficients (PCC) for 4 clocks (Neu-In, Neu-Sin, Brain and Horvath) across sorted neuron and glia datasets. Number of samples in each dataset is given in brackets. (E) Barplot of Pearson Correlation Coefficients (PCC) for the same 4 clocks (Neu-In, Neu-Sin, Brain and Horvath) across brain and non-brain DNAm datasets. Each cohort is labeled with the number of samples in brackets. Abbreviations: WB: whole blood; PFC: prefrontal cortex; TL: temporal lobe; FL: frontal lobe; MTG/STG: middle/superior temporal gyrus; PL: parietal lobe; SC: sensory cortex; VC: visual cortex; MC: motor cortex; Hippo: hippocampus; EC: entorhinal cortex; Stria: striatum; CG: cingulate gyrus; NAc: nucleus accumbens; CRB: cerebellum; Mono: monocyte. (F) For each of the 4 clocks, boxplots of PCCs comparing the values obtained in brain vs. non-brain datasets. The number of datasets in each group is given below x-axis. P-value is from a one-tailed Wilcoxon rank sum test.

Neuron and glia-specific clocks are accelerated in Alzheimer’s Disease

Next, we asked if the clocks predict age-acceleration in neurodegeneration. To this end, we collated eleven DNAm-studies of Alzheimer’s Disease encompassing two brain regions, frontal lobe (FL) and temporal lobe (TL) (Supplementary Table 5), amounting to a total of 213 AD cases and 185 controls in FL, and 456 cases and 341 controls in TL. As before, we first performed a meta-analysis of cell-type fractions to see if any of these fractions change in AD (Methods). This revealed a significant decrease of excitatory neurons and corresponding increases of oligodendrocyte/OPC and microglia fractions in AD (Supplementary Figure 7). Whilst the decreased excitatory neuron fraction was observed in both TL and FL, the microglia and oligo/OPC increases were more specific to the frontal and temporal lobes, respectively (Supplementary Figure 8). Thus, only the excitatory neuron fraction is observed to change with both age and AD (Figure 2 and Supplementary Figure 7), although interestingly the increased fraction seen with age is reversed in AD cases. For each of the 4 clocks in each brain region we then performed a fixed and random effects meta-analysis for AD (Methods), revealing a weak but statistically significant age-acceleration in AD cases for the neuron-intrinsic clock in the TL, but not for brain-specific and Horvath’s clocks (Figure 5A, 5B). In the case of Horvath’s clock, the association remained non-significant upon adjustment for brain cell-type fractions (Supplementary Figure 9). The association of the neuron-intrinsic clock in the PFC/FL was marginally significant. Given that the I2 test for heterogeneity showed great consistency in the effect sizes between cohorts and regions (Figure 5A, 5B), we extended the meta-analysis to include all datasets from both regions, now revealing significant associations for both intrinsic and semi-intrinsic neuron-clocks (Figure 5C). Because all of the AD datasets considered in this analysis are from bulk-tissue, we wondered if a similar finding would apply to the other major cell-type in brain, i.e. glia. To this end, we used the previous Jaffe and Philstrom DNAm datasets to build and validate analogous glia intrinsic (Glia-In) and semi-intrinsic (Glia-Sin) clocks (Methods, Supplementary Figure 10, Supplementary Tables 6, 7), subsequently computing their DNAm-Ages in the AD datasets. Interestingly, this also revealed age-acceleration effects in the glia compartment of both FL and TL regions, with the effect being strongest for the Glia-Sin clock (Figure 5A5C). Comparing the predicted age-acceleration of the Neu-In with the Glia-In clock, and that of the Neu-Sin with the Glia-Sin clocks, revealed moderate but significant correlations between the Neu and Glia clock estimates (Supplementary Figure 11). Thus, age-acceleration patterns in neurons and glia are similar, despite the two clocks being made up of entirely distinct sets of CpGs. In summary, these data indicate that both neuron and glia specific clocks display age-acceleration in AD, and that this acceleration is more prominent in the TL glia fraction.

Neuron and glia specific clocks predict age-acceleration in AD. (A) Forest plot of associations between DNAmAge and Alzheimer’s Disease (AD) for 6 different epigenetic clocks across 5 independent DNAm datasets from frontal lobe (FL) or prefrontal cortex (PFC). The effect sizes and P-values of a fixed and random effects models are also given. Statistics and P-values of heterogeneity are shown above each panel. Number of samples in each dataset is given to the left. (B) As (A) but for DNAm datasets profiling regions in the temporal lobe (TL), which includes middle and superior temporal gyrus (MTG and STG). (C) As (A), but combing FL and TL regions.

Figure 5. Neuron and glia specific clocks predict age-acceleration in AD. (A) Forest plot of associations between DNAmAge and Alzheimer’s Disease (AD) for 6 different epigenetic clocks across 5 independent DNAm datasets from frontal lobe (FL) or prefrontal cortex (PFC). The effect sizes and P-values of a fixed and random effects models are also given. Statistics and P-values of heterogeneity are shown above each panel. Number of samples in each dataset is given to the left. (B) As (A) but for DNAm datasets profiling regions in the temporal lobe (TL), which includes middle and superior temporal gyrus (MTG and STG). (C) As (A), but combing FL and TL regions.

Construction and validation of a hepatocyte-specific epigenetic clock

As a second example, we considered the case of liver tissue, aiming to build a hepatocyte-specific epigenetic clock. We used our EpiSCORE algorithm [53] and liver DNAm reference matrix defined over 5 cell-types (hepatocytes, cholangiocytes, Kupffer cells, endothelial cells and lymphocytes) [44] to estimate corresponding cell-type fractions in a series of 210 normal liver specimens (age range = 18–75, mean age = 47, sd = 12) with available Illumina EPIC DNAm profiles [54] (Methods). Liver cell-type fractions changed with chronological age, although this was mainly restricted to cholangiocytes and endothelial cells, a pattern that was replicated in independent cohorts (Supplementary Figure 12). Next, we assessed if we can reliably identify age-DMCTs. To this end we applied CellDMC [45] to all 210 samples, revealing that most age-DMCTs were in either hepatocytes or cholangiocytes, with hepatocyte and cholangiocyte age-DMCTs displaying strong overlap but with opposite directionality of change (Supplementary Figure 13A13D). As with the age-DMCTs in neurons and glia, here too we observed a preferential co-localization to marker CpGs in our liver DNAm reference matrix (Supplementary Figure 5). To validate the hepatocyte age-DMCTs, we computed the average DNAm over hepatocyte age-hypermethylated DMCTs in an independent set of 55 primary hepatocyte cultures from donors of different ages (Methods), which revealed the expected positive correlation with age (Supplementary Figure 13E). Correspondingly, hepatocyte age-hypomethylated DMCTs displayed the expected anti-correlation, with the pattern exactly inverted for cholangiocyte age-DMTCs (Supplementary Figure 13E). To further validate the hepatocyte and cholangiocyte age-DMCTs, we collected three additional liver DNAm datasets (Methods) and repeated the CellDMC analysis in each one. This confirmed the strong overlap between hepatocyte and cholangiocyte age-DMCTs in each validation cohort (Supplementary Figures 13F and 14). This strong validation supports the view that we can confidently identify age-associated DMCTs in hepatocytes and that these display opposite directionality of change in cholangiocytes.

Having demonstrated that we can successfully identify hepatocyte age-DMCTs, we next applied a 10-fold cross-validation procedure to build a hepatocyte-specific semi-intrinsic clock (Methods). Of note, this procedure involved re-applying the CellDMC algorithm on the full set of CpGs but now using only on 9 folds, with the leave-one-out bag being used for model selection (Methods). This procedure was iterated 10 times, each time using a different bag as the leave-one-out test set, a procedure that avoids overfitting [55], which resulted in a final age-predictor (“HepClock”) (Supplementary Table 8 and Figure 6A). We also trained an analogous clock but now using age-DMCs derived from a linear model that only adjusts for cell-type fractions, the resulting clock thus being liver-specific but not hepatocyte-specific (“LiverClock”, Methods, Supplementary Table 9). To validate and benchmark these clocks, we applied them to the DNAm dataset of primary hepatocyte cultures (Methods). Although R-values (Pearson Correlation Coefficients) were similar for all clocks, HepClock attained a significantly lower median absolute error of 4.9 years compared to values of 20 and 10 for the liver and Horvath clocks, respectively (Figure 6B). This suggests that improved predictions of chronological age in sorted hepatocytes are indeed possible with a corresponding hepatocyte specific clock. Next, we applied HepClock and LiverClock to a collection of DNAm datasets from bulk-liver tissue, sorted immune-cell types, as well as other tissue-types including blood, skin, prostate, colon and kidney (Methods). This confirmed the clear specificity of HepClock and LiverClock to liver tissue (Figure 6C). We note that Horvath’s clock was excluded from this analysis, because some of the liver DNAm datasets were used for the original training of the Horvath clock itself [28].

Construction and validation of hepatocyte clock. (A) Overall strategy used to train a hepatocyte specific clock from bulk liver tissue samples. Plots show how the root mean square error (RMSE) changes as a function of the penalty parameter in the elastic net regression model, and scatterplot at the bottom compares the predicted and chronological ages for the optimal model defining the hepatocyte-clock. Pearson Correlation Coefficient (PCC), Median Absolute Error (MedAE) and regression P-value are given. (B) Validation of the hepatocyte and liver clocks (latter trained on age-DMCs, not hepatocyte-DMCTs) in a primary hepatocyte culture. Also shown is the result for Horvath’s clock. PCC, MedAE and P-values from a linear regression are given. (C) Barplot comparing the PCC-values of the hepatocyte and liver clocks across several liver and non-liver tissue datasets. Boxplots compare the PCC values of the hepatocyte and liver clocks in the liver/hepatocyte dataset compared to all others. P-value is from a one-tailed Wilcoxon rank sum test. (D) Boxplots compare the extrinsic age-acceleration (EAA) of the liver, hepatocyte and Horvath clocks in two independent liver DNAm datasets, one profiling non-alcoholic fatty liver disease (NAFLD) cases and the other profiling liver-tissue from obese (BMI >35) individuals. In each case, the P-value is from a one-tailed Wilcoxon rank sum test comparing the distribution to a mean value of 0 (red dashed line). (E) Boxplots of EAA from the three clocks in an independent DNAm liver dataset, with samples stratified according to BMI levels, as shown. For each panel, we give the PCC and linear regression test P-value.

Figure 6. Construction and validation of hepatocyte clock. (A) Overall strategy used to train a hepatocyte specific clock from bulk liver tissue samples. Plots show how the root mean square error (RMSE) changes as a function of the penalty parameter in the elastic net regression model, and scatterplot at the bottom compares the predicted and chronological ages for the optimal model defining the hepatocyte-clock. Pearson Correlation Coefficient (PCC), Median Absolute Error (MedAE) and regression P-value are given. (B) Validation of the hepatocyte and liver clocks (latter trained on age-DMCs, not hepatocyte-DMCTs) in a primary hepatocyte culture. Also shown is the result for Horvath’s clock. PCC, MedAE and P-values from a linear regression are given. (C) Barplot comparing the PCC-values of the hepatocyte and liver clocks across several liver and non-liver tissue datasets. Boxplots compare the PCC values of the hepatocyte and liver clocks in the liver/hepatocyte dataset compared to all others. P-value is from a one-tailed Wilcoxon rank sum test. (D) Boxplots compare the extrinsic age-acceleration (EAA) of the liver, hepatocyte and Horvath clocks in two independent liver DNAm datasets, one profiling non-alcoholic fatty liver disease (NAFLD) cases and the other profiling liver-tissue from obese (BMI >35) individuals. In each case, the P-value is from a one-tailed Wilcoxon rank sum test comparing the distribution to a mean value of 0 (red dashed line). (E) Boxplots of EAA from the three clocks in an independent DNAm liver dataset, with samples stratified according to BMI levels, as shown. For each panel, we give the PCC and linear regression test P-value.

Hepatocyte specific clock predicts biological age-acceleration in liver pathologies

We next applied HepClock, LiverClock and Horvath’s clock to 3 separate DNAm studies representing a variety of liver pathologies, in order to assess the ability of these clocks to predict biological age-acceleration. One study encompassed 115 liver samples from patients with non-alcoholic fatty liver disease (NAFLD) [54], another set comprised 67 liver tissue samples from obese individuals (BMI >35) [56], with the 3rd cohort comprising 94 liver tissue samples from patients with Alpha-1 Antitrypsin Deficiency (AATD) [57], a genetically inherited disorder that increases the risk of cirrhosis (Methods). Whilst on the NAFLD samples all three clocks were accelerated, the observed acceleration was strongest for the Liver and HepClocks (Figure 6D). On the clinically obese samples only HepClock displayed a very significant age-acceleration effect, with Horvath’s clock being marginally associated (Figure 6D). In the AATD-cohort, only the Liver and HepClocks displayed age-acceleration (Figure 6D). In another independent liver-tissue DNAm dataset encompassing healthy non-obese and obese individuals [58, 59], HepClock and Horvath clock age-acceleration measures displayed a correlation with BMI, but not so for the LiverClock (Figure 6E), mirroring the results in the previous cohort of 67 obese individuals (Figure 6D). The age-acceleration effect in the hepatocytes of NAFLD and obese livers is noteworthy, because the hepatocyte fraction itself decreases substantially in NAFLD and obese individuals (Supplementary Figure 15). Overall, as assessed over the 4 studies, the hepatocyte-specific epigenetic clock improves the sensitivity to detect biological age acceleration in the hepatocytes of liver tissue in variety of different liver pathologies including an inherited one where Horvath’s clock did not predict age-acceleration.

Minimal but significant overlaps with chronological age clocks

Having demonstrated how cell-type specific clocks predict age-acceleration in relevant conditions and cell-types, it is important to study the nature of the specific CpGs making up these clocks. We first assessed overlaps of our clock CpGs with previous clocks including Horvath [28], Zhang [11] and PhenoAge [3]. In general, although overlaps were small in absolute terms, some were statistically significant (Supplementary Figure 16). For instance, most of our cell-type specific clock CpGs displayed a significant overlap with CpGs from Zhang and Horvath’s clocks, two clocks trained to predict chronological age. The smaller non-significant overlap with PhenoAge clock CpGs may indicate that our cell-type specific clocks in brain and liver are capturing different facets of biological aging compared to PhenoAge.

Minimal but significant overlap with DamAge causality clock

The overlap with the recently proposed causal DamAge and AdaptAge clocks [60] was also small in absolute terms, but interestingly the neuron and glia-clock CpGs displayed a significant overlap with the DamAge clock (Binomial P < 0.001) and not with AdaptAge (Binomial P > 0.05) (Supplementary Figure 16). The overlapping CpGs from Neu-In and Neu-Sin with DamAge mapped to 4 genes (SESN2, EPS8L2, SGMS1, ZNF642), whilst the corresponding overlapping DamAge-CpGs from Glia-In and Glia-Sin mapped to 3 genes (ST3GAL4, CBX7, TOMM40L) (Supplementary Table 10). Supporting the statistical significance of these overlaps with the neuron and glia clocks, these overlapping genes have previously been implicated in AD. For instance, SESN2 plays an important role in AD progression [61, 62] and is regarded as a diagnostic AD biomarker [63]. The expression level of EPS8L2 is significantly reduced in an AD mouse model [64], whilst SGMS1 is functionally implicated in AD pathogenesis [65], displaying elevated expression levels in the brain of AD cases [66]. Interestingly, ZNF642 is upregulated in Parkinson’s disease [67], and Zinc finger proteins are well-known to be associated with neurological diseases [68]. One of the shared DamAge glia-clock CpGs mapped to the promoter of ST3GAL4, a gene associated with brain atrophy in AD patients [69, 70] and that correlates with Braak staging [71]. Expression of PRC1 component CBX7 has been shown to regulate the AD related INK4b-ARF-INK4a locus [72] and is associated with suppression of axon growth and regeneration [73]. Finally, TOMM40L is in linkage disequilibrium with APOE and contributes synergistically to the risk of Alzheimer’s Disease [7477]. Specifically, the APOEɛ4-TOMM40 523 L haplotype is associated with a higher risk and shorter times of conversion from mild cognitive impairment (MCI) to AD [78], possibly driven by mitochondrial dysfunction [75]. Overall, these data demonstrate that our brain cell-type specific clocks do map to CpGs and genes that have been causally and negatively implicated in neurodegeneration and AD, which is consistent with our neuron and glia clocks displaying age-acceleration in AD.

The Neu-In clock CpGs are enriched for AD EWAS hits and mQTLs

Given the associations of our clocks with AD and BMI, we next asked if our clock CpGs overlap with AD or BMI associated differentially methylated cytosines (DMCs), as determined by corresponding EWAS [79, 80] (Methods). Whilst for the brain-clocks we did observe a significant overlap with AD-DMCs, this was not evident for the Hep/LiverClocks and BMI-DMCs (Supplementary Figure 17).

Next, we asked if our clock CpGs are associated with mQTLs. We performed an enrichment against three broad classes of mQTLs: adult blood derived mQTLs as given by the ARIES database [81], mQTLs derived from the eGTEX consortium [82] which are available for a number of different tissue-types including colon, prostate, ovary, lung, breast, kidney, skeletal muscle, testis and whole blood, and mQTLs derived for fetal and adult brain [83, 84] (Methods). Using the ARIES database, this revealed a significant enrichment for mQTLs among brain clock CpGs, but only marginal associations for the Hep/LiverClocks (Supplementary Figure 18A), probably because of the smaller number of Hep/LiverClock CpGs, itself reflecting the smaller training set in the case of liver tissue. Interestingly, all brain clock CpGs displayed significant enrichment for mQTLs as defined in adult brain, but not fetal brain, although this negative finding could be due to the smaller number of mQTLs in fetal brain (Supplementary Figure 18B). For the mQTLs as defined by eGTEX we also observed enrichment, but this was restricted mainly to the neuron-clocks and tissues with the largest numbers of mQTLs (e.g. colon, lung, prostate and ovary) (Supplementary Figure 18C), suggesting that other associations may become significant if we had more CpGs in either the clock or mQTL list. Supporting this, we note that we did not observe any enrichment for mQTLs as derived from whole blood in eGTEX, whilst we did observe enrichment against the larger mQTL ARIES database. In the case of the Neu-In clock and blood mQTLs from ARIES, we observed 2 clock-CpGs defining cis-mQTLs where the linked SNP has been associated with AD [85] (Supplementary Figure 18D and Supplementary Table 11). Interestingly, these cis-mQTLs map to the gene CASS4, a CAS scaffolding protein family member involved in the formation of neuritic plaques and neurofibrillary tangles, as well as in the disruption of synaptic connections in AD [86, 87]. Of note, Neu-In clock CpGs defining cis-mQTLs in adult brain also included SNPs that mapped to CASS4. In addition, another Neu-In/Neu-Sin clock CpG defines a cis-mQTL in brain mapping to the TSPAN14 gene (Supplementary Figure 18D), which is associated with increased AD risk [85, 88, 89]. Thus, a minor component of the neuron-specific clocks involves CpGs whose DNAm-levels are also influenced by genetic variants associated with AD.

Discussion

Here we have quantified, within the context of epigenetic clocks, the relative contributions of extrinsic and intrinsic aging, demonstrating that the extrinsic component in blood can account for up to around 40% of an epigenetic clock’s accuracy, whilst in brain this fraction is much lower at around 12%. Thus, whilst most of an epigenetic clock’s accuracy is intrinsic, driven by age-associated DNAm changes in individual cell-types, the age-related shifts in tissue composition can contribute a substantial amount. For instance, based on our analysis, epigenetic clocks like Horvath or Zhang clocks [11, 28], which typically attain an R2 value of 0.8 or higher, would likely only display R2 values of approximately 0.5 if one were to rederive the clocks adjusting for 12 immune-cell type fractions. Whilst in the case of blood, the extrinsic component is driven by an age-related increase of the mature to naïve T-cell fractions, in brain, this is driven by an age-related shift in the excitatory to inhibitory neuron fractions.

Our second key contribution is the dissection of the intrinsic component of aging through construction of cell-type specific epigenetic clocks. As shown here in the case of brain and liver tissue, the resulting cell-type specific clocks demonstrate clear cell-type and tissue-specificity, achieving improved predictions of chronological age in the corresponding sorted cells and tissue-types. Most importantly, our cell-type specific clocks yield improved estimates of biological age in the relevant cell-types and tissues (here brain and liver), as compared to Horvath’s multi-tissue clock and the non-cell-type specific (but tissue-specific) brain and liver clocks, the latter result highlighting the importance of distinguishing cell-type specific from tissue-specific clocks.

The case of Alzheimer’s Disease is important to discuss further, since there has been substantial controversy and debate as to whether epigenetic clocks display age-acceleration in AD or not. Here we have significantly advanced the field by constructing neuron and glia-specific clocks, and demonstrating an age-acceleration effect for both neurons and glia, with the effect being strongest for the Glia-Sin clock in the temporal lobe, suggesting that epigenetic age-acceleration associated with AD may be particularly prominent in the glia TL compartment. As far as the age-acceleration of the Neu-In and Neu-Sin clocks is concerned, it is worth noting that the excitatory neuron fraction was observed to increase with age, whilst it was decreased in AD. This pattern would suggest that the observed age-acceleration of the neuron-compartment in AD is not due to shifts in underlying neuronal subtype proportions, but instead caused by DNAm changes in these specific neuronal subtypes. It is also noteworthy that one of the CpGs in the Neu-In clock defines a cis-mQTL linked with a SNP that has been associated with AD and which maps to CASS4, a gene which has previously been implicated in neurotic plaque formation and loss of synaptic connectivity in AD [86, 87]. Another CpG in the Neu-In clock mapped to a cis-mQTL implicating TSPAN14, which is associated with increased AD risk [85, 88, 89]. These findings are important, because it is conceivable that clock-CpGs whose DNAm levels track age-acceleration in AD due to environmental risk factors (e.g. social isolation, sleep deprivation) may also be under the influence of genetic variants that determine AD-risk, which would thus point towards a shared causal pathway to AD [17].

The neuron and glia specific clock CpGs also displayed a statistically significant overlap with CpGs of the causal DamAge clocks. Remarkably, the few overlapping CpGs mapped to genes that have been strongly implicated in AD. Alongside CBX7 and TOMM40L discussed earlier, it is worth highlighting here also SESN2 and SGMS1. Inactivation of Sestrin-2 (SESN2) results in diverse metabolic pathologies, including oxidative damage, fat accumulation, mitochondrial dysfunction, and muscle degeneration, that resemble accelerated tissue aging [90, 91]. SESN2 is also induced by amyloid-β, playing a protective role against amyloid-β neurotoxicity [61]. Knockdown of sphingomyelin synthase-1 (SGMS1) attenuates AD-like pathology through promoting lysosomal degradation of BACE1 [65]. Moreover, SGMS1 is significantly elevated in the hippocampus of AD brain and inhibition significantly reduces the level of amyloid-β [66]. Of note, for SGMS1, the implicated CpG maps to the 5′UTR, where an age-related decrease in DNAm (as observed here in our brain clocks, Supplementary Table 10) would normally be associated with increased expression, hence consistent with the damaging effect as implicated in the DamAge clock. However, it should also be noted that the relation between DNAm and gene-expression is complex.

The ability to quantify biological age at cell-type resolution can also help address whether age-acceleration in various cell-types of a tissue correlate with each other or not. Here we explored this important question in the context of the neuron and glia clocks, revealing that the age-acceleration of these two clocks are indeed significantly correlated. Despite this, the age-acceleration measures were also sufficiently distinct, with the glia clocks displaying stronger acceleration in AD compared to the neuron-clocks. In the case of liver, it should be noted that most of the identified age-associated DNAm changes occurred in hepatocytes and cholangiocytes, with the underlying DNAm changes in these two cell-types displaying a very strong anti-correlation. Thus, a putative cholangiocyte clock would be composed of overwhelmingly the exact same CpGs making up the hepatocyte clock, and thus be effectively analogous to it, which is why we did not consider it here.

Whilst the computational framework presented here could be extended to construct cell-type specific epigenetic clocks for other tissue and cell-types, it is worth highlighting some of the key limitations. First, it must be stressed that our strategy hinges critically on the ability to detect age-associated DNAm changes in individual cell-types and that this requires appropriately powered datasets [26, 51]. Indeed, the need for large sample sizes (typically at least many hundreds) has recently been emphasized by us [51]. The required sample size to ensure adequate power is a complex function of many parameters, including the number of cell-types (i.e. cellular resolution) at which the inference is to be made and the distribution (notably means and variances) of their proportions across samples. At higher cellular resolution, there is a corresponding need that the cohort be big enough so that all possible combinations of cell-type fractions are realized, which in practice may be difficult because of intrinsic correlations between cell-types. Thus, the need for fairly large DNAm datasets currently limits our strategy to a few tissue-types (e.g. brain/liver) and a low cellular resolution (e.g. neuron, glia and stroma). In order to be able to quantify biological age in additional subsets, say glia subtypes in brain, or for the different immune cell subtypes in liver, one would need on the order of thousands of samples, which in future could be realized by merging DNAm datasets together. For instance, for an age-related disease like NAFLD that is associated with molecular alterations in hepatocytes and which leads to a marked reduction in the hepatocyte fraction with age, other cell-types (e.g. Kupffer macrophages) may also undergo biological aging, marking the persistent inflammation associated with NAFLD [92], but here we were underpowered to detect age-DMCTs in macrophages. Ultimately, increased cellular resolution may be critical as for instance, adjustment for a broad cell-type that encompasses two cell subtypes cannot account for age-related relative variations in these subtypes. A related important question is the relevant cellular resolution or definition of cell-types [93] to use in such analyses, as this is often dependent on the specific biological questions to be addressed.

Another key unresolved issue is the relative quantification of cell-type specific versus shared/common age-associated DNAm changes. For instance, as shown by us, most age-associated DNAm changes in blood seem to be shared between distinct immune cell-types [94], implying that the number of truly cell-type specific age-associated DNAm changes may be small in a tissue like blood. Ideally, cell-type specific clocks should be built from those CpGs that only change in one cell-type of the tissue and not others, yet confidently identifying these is challenging and would require the aforementioned very large DNAm datasets. Certainly, an in-depth study in a tissue like blood, where large numbers of DNAm datasets are available would merit further investigation, although given that the relative fractions of naïve and mature T-cell subtypes change with age [24], this may require on the order of tens of thousands of samples. In the case of brain too, we have observed a substantial overlap between neuron and glia age-associated DMCs, the resulting lineage-specificity of the corresponding clocks also being less evident than in the case of liver and the hepatocyte specific clock. Indeed, in the case of liver it is worth stressing that we were able to confidently validate the hepatocyte specific age-DMCs in sorted hepatocytes, as well as the hepatocyte and cholangiocyte age-DMCs in 3 independent bulk liver DNAm datasets.

In summary, we have quantified the relative contributions of extrinsic and intrinsic aging to an epigenetic clock’s predictive accuracy, whilst also presenting a computational strategy to dissect the intrinsic component, building cell-type specific epigenetic clocks that clearly demonstrate cell-type and tissue specificity in relation to prediction of chronological and biological age. The construction of such cell-type specific clocks will be critical to advance our understanding of biological aging at cell-type resolution and for evaluating in which cell-types cellular rejuvenation interventions work best.

Methods

Brief description of main datasets analyzed

Human liver DNA methylation datasets

Johnson et al. [54]: This is an Illumina EPIC DNAm dataset of liver-tissue, comprising 341 samples, with 16 duplicates among them. The dataset includes samples from four stages representing the progression of non-alcoholic fatty liver disease (NAFLD): stage 0, stage 3, stage 3–4, and stage 4. Stage 0 is normal liver. Data is available at GEO (https://www.ncbi.nlm.nih.gov/geo/; accession number GSE180474). We downloaded a preprocessed data matrix following QC, including P-values of detection for each probe. To select high-quality probes for clock construction, we utilized the dropLociWithSnps() function of minfi v.1.44.0 to remove SNP probes [95]. Sex chromosome CpGs were also excluded to minimize gender disparities. Any beta value with a P-value greater than 0.01 was considered as a missing value, and any probe with a coverage-fraction less than 0.99 was discarded. Missing values were then imputed using impute.knn (k = 5) of impute v.1.72.3 [96]. Type-2 bias was corrected using BMIQ algorithm [97], and finally duplicates were averaged, resulting in a final data matrix of 325 samples encompassing 812,168 CpGs on autosomes. Of these 325 samples, 210 were from normal healthy liver, whereas the remaining 115 represent NAFLD of grade-3 or higher. In this work, the 210 normal liver samples were used to build and validate the Liver and HepClocks in a 10-fold CV-strategy, whereas the 115 NAFLD samples were used to test for biological age-acceleration.

Park et al. [98]: Illumina HM850 (EPIC) dataset encompassing 56 primary hepatocyte cultures. Raw IDAT files were downloaded from GSE123995. Data was processed as before. Any probe with missing values was excluded, resulting in 809,289 probes. A sample was removed due to very low predicted hepatocyte fraction (as estimated using EpiSCORE).

Bonder et al. [56]: Illumina HM450k dataset profiling 67 obese liver samples. All individuals have a BMI greater than 35. Raw IDAT files were downloaded from GSE61446. minfi::preprocessRaw() was employed to process the raw IDAT data. The remaining steps in the preprocessing pipeline align with the aforementioned description, with a probe coverage threshold set at 0.90. A total of 411,233 probes remained.

Bacalini et al. [99]: Illumina HM450 dataset profiling 40 normal liver samples. Raw IDAT files were downloaded from GSE107038 and processed as the other datasets above. Any probe with missing values was excluded, resulting in 446,651 probes. A sample was removed due to very low predicted hepatocyte fraction (as estimated using EpiSCORE).

Horvath and Ahrens et al. [58, 59]: Illumina HM450 datasets comprising 77 samples from healthy non-obese (n = 44) and healthy obese (n = 33) individuals. Processed data was downloaded from GEO under accession numbers GSE61258 and GSE48325. For both datasets, we only used probes with no missing values (i.e., significant detection P-value < 0.01) across all samples. Datasets were then normalized with BMIQ. The merged dataset encompassed 77 samples and 417,213 probes. A subset of samples was obtained following bariatric surgery, but since surgery had no effect on clock estimates [58, 59] we kept these samples in. Of note, the subset of 44 healthy non-obese samples was used to assess the predictive accuracy of the clocks with chronological age.

Wang et al. AATD dataset [57]: Illumina HM850 (EPIC) dataset encompassing 108 liver samples from 94 Alpha-1 Antitrypsin Deficiency (AATD) patients and 14 AATD patients with cirrhosis. Only the normal liver samples from the 94 AATD patients without cirrhosis were used. Processed normalized data was downloaded from GEO under accession number GSE119100. This normalized data is already adjusted for type-2 probe bias (adjusted with Subset-quantile Within Array Normalization (SWAN)), and probes that failed to achieve significant detection (p < 0.01) in all samples alongside cross-reactive probes were filtered out, leaving a total of 866,836 probes.

Human brain DNA methylation datasets used for training and model selection

Jaffe: The 450k dataset from Jaffe et al. [50], which profiled DNAm in human dorsolateral prefrontal cortex from controls and patients with schizophrenia (SZ), was obtained from the NCBI GEO website under the accession number GSE74193 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE74193). The file “GSE169156_RAW.tar” which contains the IDAT files was downloaded and processed with minfi package [95]. Probes with >25% NAs (defined by P > 0.01) were discarded. Samples younger than 18 years old were also discarded. The filtered data was subsequently normalized with BMIQ [97], resulting in a normalized data matrix for 473,536 probes and 416 samples, of which 226 are controls and 190 are SZ cases, age range 18–97 years.

Pihlstrøm: The EPIC dataset from Pihlstrøm et al. [52], which profiled DNAm in human frontal cortex, was obtained from the NCBI GEO website under the accession number GSE203332 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE203332). The file “GSE203332_RAW.tar” which contains the IDAT files was downloaded and processed with minfi package. Probes with NAs (defined by P > 0.01) were discarded. The filtered data was subsequently normalized with BMIQ, resulting in a normalized data matrix for 832,899 probes and 442 samples, of which 67 are controls with an age-range of 49 to 99 years.

Human brain DNA methylation datasets used for validation

Due to the large number of brain DNAm datasets analyzed, these are all summarized in Supplementary Methods.

Human DNA methylation datasets from other tissues

All the whole blood datasets used here have been preprocessed and analyzed previously by us [24]. In addition to whole blood, we also analyzed the following DNAm datasets from other tissue-types:

Colon [100]: Illumina HM450 dataset comprising 96 normal colon mucosa. Raw data is available from GSE131013. IDAT files were downloaded and processed with minfi package. Probes with NAs (defined by P > 0.01) were discarded. The filtered data was subsequently normalized with BMIQ, resulting in a normalized data matrix for 448,286 probes and 240 samples. 144 tumor samples were discarded.

Prostate [101]: Illumina HM450 dataset comprising 51 normal prostate samples. Raw data is available from GSE76938. IDAT files were downloaded and processed with minfi package. Probes with NAs (defined by P > 0.01) were discarded. The filtered data was subsequently normalized with BMIQ, resulting in a normalized data matrix for 471,112 probes and 123 samples and 72 prostate cancer samples were discarded.

Kidney [102]: Illumina HM450 dataset profiling 31 normal kidney tissue samples. Raw data is available from GSE79100. There are three technical replicates per tissue. IDAT files were downloaded and processed with minfi package. Probes with NAs (defined by P > 0.01) were discarded. The filtered data was subsequently normalized with BMIQ, resulting in a normalized data matrix for 454,671 probes and 93 samples.

Skin [103]: Illumina HM450 dataset profiling 322 normal skin samples from a peri-umbilical punch biopsy. From GSE90124, we downloaded the processed type-2 probe bias adjusted DNAm data with 450,531 probes and 322 samples.

Estimating cell-type fractions in whole blood, liver and brain

For a given bulk-tissue sample with a genome-wide DNAm profile, it is possible to estimate the cell-type composition of the tissue with a corresponding tissue-specific DNAm reference matrix [104]. Such a DNAm reference matrix consists of cell-type specific CpGs and cell-types, and is typically built from genome-wide DNAm data of sorted cell-types [105]. The cell-type fractions themselves are estimated by regressing the DNAm profile of the bulk-tissue samples against the reference profiles of the DNAm reference matrix, a procedure that only uses the highly discriminatory cell-type specific CpGs as defined in the DNAm reference matrix [106]. For whole blood, we estimated fractions for 12 immune cell subtypes (naïve and memory B-cells, naïve and memory CD4+ T-cells, naïve and memory CD8+ T-cells, T-regulatory cells, natural killer cells, monocytes, neutrophils, eosinophils, basophils) using EpiDISH and our validated DNAm reference matrix for these 12 cell-types [24]. In the case of liver tissue, we used our validated EpiSCORE algorithm and the associated liver DNAm reference matrix defined for hepatocytes, cholangiocytes, endothelial cells, Kupffer macrophages and lymphocytes [53]. In the case of brain-tissue, we applied the HiBED algorithm [46], since the EpiSCORE brain DNAm reference matrix led to the microglia fraction always being zero. We applied HiBED to estimate proportions for 7 brain cell-types including GABAergic excitatory neurons, glutamatergic inhibitory neurons, astrocytes, microglia, oligodendrocytes, endothelial cells and stromal cells. When applying CellDMC we collapsed these estimates to 3 broad cell-types: neurons, glia and endothelial/stromal.

Relative quantification of extrinsic and intrinsic aging in whole blood and brain

Whole blood: As training data we used two of the largest whole blood DNAm datasets, the Airway [107] and Lehne [108] cohorts, which we have previously normalized and analyzed [24]. Briefly, the Airway EPIC DNAm dataset (GSE147740) included 1032 samples with age information. The Lehne dataset is a 450k DNAm dataset (GSE55763) encompassing 2707 samples in total. The two datasets were merged, resulting in 314,436 CpGs and 3739 samples. In the Airway dataset, age is cofounded by gender (Pearson correlation test, P-value = 0.003896). To train the unadjusted and adjusted clocks, we applied the Elastic Net regression to the training dataset with alpha = 0.5 and a 10-fold cross-validation to infer the optimal penalty parameter lambda. In the unadjusted case, we applied the Elastic Net to the standardized DNAm residuals after adjusting for sex and cohort only. In the adjusted case, we applied the Elastic Net to the standardized DNAm residuals after adjusting for sex, cohort and the 12 estimated immune-cell type fractions. We also trained another clock adjusted for 9 immune cell-type fractions by merging the naïve and memory lymphocyte subsets together, in other words, by adding the fractions of naïve and memory B-cells to yield a fraction for B-cells, and similarly for CD4T and CD8T-cells. The 3 optimal clocks were then applied to 11 independent whole blood datasets (using only control/healthy samples, see Supplementary Table 1), all previously normalized as in Luo et al. [24]. In the case of the adjusted clocks, the clocks were applied to the standardized DNAm residuals after adjustment for the 12 or 9 immune-cell fractions. Finally, the predicted ages were linearly regressed to the true chronological ages in each cohort, resulting in R2 values for the unadjusted and the two adjusted clocks. The ratio of the adjusted clock’s R2 value to that of the unadjusted clock, RADJ2/RUNADJ2 quantifies the fraction of the unadjusted clock’s accuracy that can be attributed to intrinsic aging. For the extrinsic component, its fractional contribution is given by 1RADJ2/RUNADJ2.

Brain: For training, we used the Jaffe et al. dataset [50] encompassing 491 samples, including 300 controls and 191 samples from schizophrenia patients, with an age range 0.003 ~ 96.98 years. We used HiBED to estimate the fractions for 7-cell-types (ExN, InN, Oligo/OPC, Micro, Astro, Endo, Strom), as described earlier. To train the Elastic Net clocks with alpha = 0.5, we used a 10-fold cross-validation to infer the optimal penalty parameter lambda. In the unadjusted case, we applied the Elastic Net to the standardized DNAm residuals obtained after adjusting for sex and disease-status, whilst in the adjusted case, it was applied on the standardized DNAm residuals obtained after adjusting for sex, disease-status and 7 brain cell-type fractions. The two clocks were then applied to 11 independent brain DNAm datasets (using control/healthy samples only, see Supplementary Table 1), obtaining in each cohort corresponding RADJ2&RUNADJ2 values. The fraction of the unadjusted clock’s accuracy attributed to intrinsic and extrinsic aging was then estimated as described above for whole blood.

Construction of the neuron/glia/brain-specific epigenetic clocks

We used HiBED [46] to estimate fractions for 7 brain cell-types in the Jaffe and Philstrom datasets, in each case summing up fractions to yield estimates for 3 broad cell-types: neurons (Neu), glia (Glia) and endothelial/stromal. We then applied CellDMC [45], as implemented in the EpiDISH package [109], to the Jaffe dataset in order to infer age-associated DMCTs in each of these 3 broad cell-types. Briefly, the CellDMC model postulates the following relationship for the DNAm level βcs of a CpG c in sample s:

βcs=t=13f^tsβcts=t=13f^ts(μct+Agesγct)+εcs

where f^ts are the estimated cell-type fractions for cell-type t in sample s. This can be re-expressed as a multivariate regression model with interaction terms between age and cell-type fractions:

βcs=t=13μctf^ts+t=13γctf^tsAges+εcs

This estimates the regression coefficients μct and γct so as to minimize the error in a least squares sense. When applying CellDMC to the Jaffe dataset, we also adjusted for schizophrenia (SZ) status and gender, i.e., we ran the model:

βcs=t=13μctf^ts+t=13γctf^tsAges+θcSexs+φcSZs+εcs

We included both SZ cases and controls, adjusting for case/control status, so as to increase power. Having identified the significant neuron-DMCTs (those CpGs c with a significant γct with t = neuron, t-test FDR <0.05), we next trained Elastic Net regression (glmnet R package) [49] models (alpha = 0.5) for age, each parameterized by a different penalty parameter (lambda) value. This training was done on the residuals obtained by regressing out the cell-type fractions from the DNAm beta matrix. To select the optimal model, we then applied the age-predictors for each choice of lambda to the Philstrom DNAm dataset (using the residuals after adjusting this DNAm data matrix for cell-type fractions), selecting the lambda that minimized the root-mean-square error (RMSE). This defines a neuron-specific epigenetic clock that is applicable to residuals after adjusting for cell-type fraction, a clock we call “Neu-In” (neuron-intrinsic).

We also constructed a similar clock called “Neu-Sin” (neuron semi-intrinsic), following the same procedure as before, but now not adjusting the DNAm data matrix for cell-type fractions. In other words, whilst the Neu-In and Neu-Sin clocks are both built from CpGs that change with age in neurons, the Neu-In clock uses the residuals for construction, whilst the Neu-Sin clock does not. A third non-cell-type specific clock (Brain) was also built using the same procedure as before, but not using CellDMC to identify CpGs changing with age, instead identifying age-DMCs by running a linear model of DNAm versus age adjusting for cell-type fractions, i.e. in this case the regression model is:

βcs=t=13μctf^ts+γcAges+θcSexs+φcSZs+εcs

and one finds age-DMCs, as those CpGs c with a significant γc (t-test, FDR <0.05). The age-predictor itself was then also trained on residuals obtained after regressing out the effect of cell-type fractions. Of note, the model selection step to define the Neu-Sin and Brain clocks was also done using the independent Philstrom DNAm dataset.

Finally, an analogous procedure describing the construction of the Neu-In and Neu-Sin clocks was followed to build corresponding glia-specific intrinsic and semi-intrinsic clocks (Glia-In and Glia-Sin).

Validation of the neuron/glia/brain specific clocks

To validate and demonstrate specificity of the Neu-In/Sin, Glia-In/Sin and Brain clocks, we applied them to sorted neuron, sorted glia, brain and non-brain DNAm cohorts. When applying the Neu-In, Glia-In and Brain clocks, we adjusted the DNAm datasets for underlying cell-type fractions, which were estimated using HiBED in the case of brain tissue, EpiDISH in whole blood cohorts and EpiSCORE in other non-brain tissues.

Construction of HepClock and LiverClock

We first describe the construction of the hepatocyte-specific clock (HepClock). We selected 210 normal liver samples from Johnson et al.’s dataset for training and model selection. EpiSCORE was used to estimate fractions for 5 liver cell-types including hepatocytes, as described above. The 210 samples were then randomly divided into 10 bags (each bag with 21 samples), with stratified sampling within each age group to ensure a similar age distribution in each bag. To identify CpGs changing with age in hepatocytes, we then applied the CellDMC algorithm [45] to a subset composed of 9 bags, resulting in a set of hepatocyte age-DMCTs. The specific CellDMC model run was:

βcs=t=15μctf^ts+t=15γctf^tsAges+εcs

Having identified the significant hepatocyte-DMCTs (those CpGs c with a significant γct with t = hepatocyte, t-test FDR <0.05), and using the same 9 bags, we then trained a range of lasso regression models parameterized by a penalty parameter lambda. Each of these models (parameterized by lambda) were subsequently applied to the left-out-bag to yield age-predictions for the samples in this left-out-bag. This procedure was repeated 10-times, each time using a different subset of 9 bags for training and one left-out-bag for model selection. The age-predictions over all 10 folds were then combined and compared to the true ages computing the RMSE for each choice of lambda. The model minimizing the RMSE was then selected as the optimal model. Of note, the above procedure applies CellDMC a total of 10 times, once for each fold. The number of hepatocyte age-DMCTs is variable between folds, so we identified the minimum number (k) obtained across all folds, and subsequently only used the top-k hepatocyte age-DMCTs from each fold before running the lasso regression model. Lasso was implemented using the glmnet function from the glmnet R-package (version 4.7.1) [49], with alpha set to 1. Having identified the optimal model (lambda parameter), we finally reran the lasso model with this optimal parameter on the full set of 210 samples using the top-k hepatocyte age-DMCTs as inferred using CellDMC on all samples.

The LiverClock was built in an analogous fashion with only one key difference: instead of applying CellDMC in each fold, we ran a linear regression of DNAm against age with all the liver cell-type fractions as covariates. That is, the model in this case is:

βcs=t=15μctf^ts+γctAges+εcs

and one finds age-DMCs, as those CpGs c with a significant γc (t-test, FDR <0.05). After optimization of lambda, the final age lasso predictor was built from such age-DMCs derived using all 210 samples.

Simulation analysis on mixtures of sorted cell-types

We used a simulation framework to assess and compare the intrinsic and semi-intrinsic paradigms for cell-type specific clock construction. We first generated basis DNAm data matrices for 3 hypothetical cell-types “A”, “B” and “C” each one defined over 10,000 CpGs and 200 samples, drawing DNAm beta-values from Beta (10,40), i.e. from a Beta-distribution with a mean value of 0.25. We then declared the first 1000 CpGs to be cell-type specific for a cell-type “A”, by drawing the DNAm-values for two other cell-types (“B” and “C”) from Beta (40,10), i.e. from a Beta-distribution with mean 0.75. Thus, the first 1000 CpGs are unmethylated in cell-type “A” compared to “B” and “C”. Likewise, another 1000 non-overlapping CpGs were defined to be unmethylated in cell-type “B” and another 1000 non-overlapping CpGs for cell-type “C”. The remaining 7000 CpGs are unmethylated in all cell-types. To generate realistic mixtures of these 3 cell-types, we randomly sampled cell-type fractions from true estimated fractions in whole blood by applying our EpiDISH algorithm [109] to estimate 7 immune cell-type fractions in a large dataset of 656 whole blood samples [110]. We declared 3 cell-types to be granulocytes, monocytes and lymphocytes, by adding together the fractions of B-cells, CD4T-cells, CD8 T-cells and NK-cells to represent “lymphocytes” and adding together the fractions of neutrophils and eosinophils to represent “granulocytes”. To define the ages of the 200 samples we bootstrapped 200 values from the ages of 139 donors from the BLUEPRINT consortium [111] in order to mimic an age distribution of a real human cohort. Before mixing together the artificially generated DNAm data matrix for the 3 hypothetical cell-types, we introduced age-DMCTs in the cell-type defining lymphocytes. This was done by drawing DNAm-values for these age-DMCTs to increase with the age of the sample. In more detail, because the minimum age of the BLUERPINT samples was 30, we first assigned M-values according to the formula:

Mval=Mval0+(age30)×effS+σ

where effS = 0.1 and Mval0 = log2(0.1/(1–0.1)) and σ representing a Gaussian deviate with a standard deviation value equal to effS. These values were chosen to mimic age-related DNAm changes as seen in real data, although with a smaller noise component in order to better understand and interpret downstream results.

In total we considered 4 different scenarios depending on the choice of age-DMCTs in lymphocytes and whether the fraction of “lymphocytes” increases with age or not. In scenarios (i) and (ii), 1000 age-DMCTs were chosen to not overlap with any of the 3000 cell-type specific marker CpGs. In scenarios (iii) and (iv), 900 age-DMCTs in lymphocytes were chosen to be lymphocyte-specific markers. Reason we did not pick all 1000 lymphocyte markers, is to leave 100 markers that remain cell-type specific to allow for reference construction and estimation of cell-type fractions using EpiDISH. In scenarios (i) and (iii), cell-type fractions were chosen so that no fraction changed significantly with age, whilst in scenarios (ii) and (iv) the lymphocyte-fraction was chosen to increase with age.

For each of the 4 scenarios, we generated training and validation mixture sets. DNAm reference matrices were built from different realizations of the basis DNAm data matrices but using the same parameter values. These reference matrices were then applied to the training dataset to infer cell-type fractions. With the estimated cell-type fractions we then applied CellDMC [45] to the training set to infer age-DMCTs. Since age-DMCTs are only introduced in lymphocytes, we focus on the predicted age-DMCTs in this cell-type, applying an Elastic Net (alpha = 0.5) penalized regression framework in a 10-fold cross-validation framework to learn a predictor of chronological age using either an intrinsic or semi-intrinsic paradigm: in the semi-intrinsic paradigm the Elastic Net predictor was trained on the original DNAm-values, whilst in the intrinsic case it was applied to residuals obtained by regressing out the effect of the cell-type fractions. The optimal penalty parameter was selected via the 10-fold CV procedure and the optimal predictor then applied to the validation dataset. Of note, in the semi-intrinsic setting the validation set is defined on the original DNAm-values, whereas the intrinsic model is applied to the residuals of the validation set. For each scenario we ran 50 different simulations, recording the accuracy (root mean square error-RMSE and Pearson Correlation Coefficient-PCC) of the semi-intrinsic and intrinsic lymphocyte-specific clock predictors in the corresponding validation set.

Meta-analyses

In the case of brain tissue datasets, we performed two separate meta-analyses. One of these explored if brain cell-type fractions change with age, whilst the other was done to explore if our cell type specific clocks (extrinsic age-acceleration-EAA) display correlations with Alzheimer’s Disease (AD) status. The former analysis was done for 7 brain cell types over 13 independent DNAm datasets encompassing frontal and temporal lobe regions, in each dataset adjusting for disease status and sex. The latter meta-analysis was done for frontal lobe (n = 5) and temporal lobe (n = 6) separately, and then again treating both regions together (n = 11). In each cohort, associations between EAA and AD status was assessed using a multivariate linear regression with sex and age as covariates, AD status encoded as ordinal with 0 = control, 1 = AD case. We extracted the corresponding effect size, standard error, Student’s t-test, and P-value for each regression. Meta-analysis was then performed using both fixed effect and random effect inverse variance method using the metagen function implemented in the meta R-package [112, 113].

Co-localization of age-DMCTs with DNAm reference matrix CpGs

The liver DNAm reference matrix used to estimate cell type proportions before training the clocks contains 171 marker gene promoters. We calculated the number of hepatocyte and cholangiocyte age-DMCTs that mapped to the promoter regions of these 171 marker genes (where the region is defined as within 1kb upstream or downstream of the transcription start site). To generate a null distribution, we did the following: from the DNAm dataset that we used to infer the age-DMCTs, we randomly selected a matched number of age-DMCTs and checked how many mapped to the promoter regions of these 171 marker genes. This process was repeated 1,000 times to obtain the null distribution, yielding an empirical P-value. In the case of the HiBED DNAm reference matrix, we focused on the CpGs in the first layer of deconvolution, and computed overlaps with neuron and glia age-DMCTs using a 1kb window. A null distribution was estimated in the same way as for liver.

Enrichment analyses

We assessed enrichment of clock-CpGs against EWAS-DMCs for AD and BMI, as determined by the EWAS catalog [79] and EWAS-atlas [80]. Enrichment analysis was done using a one-tailed Fisher exact test. We also assessed enrichment of our clock-CpGs for (i) blood mQTLs as determined by the ARIES mQTL database (n = 123010 is the number of distinct CpGs in mQTLs present in the training cohorts of the clocks) [81], focusing on the mQTLs identified in the adult population, (ii) mQTLs (n(breast) = 13256, n(colon) = 156419, n(kidney) = 18578, n(lung) = 166242, n(skeleton muscle) = 14437, n(ovary) = 134215, n(prostate) = 68724, n(testis) = 14894, n(blood) = 21222) identified by the eGTEX consortium [82] and (iii) mQTLs identified from adult [84] (n = 71965) and fetal [83] (n = 30707) brain DNAm datasets. Numbers of mQTLs mapping to the datasets used to train the clocks were computed. Since the GTEx project identified mQTLs using the Illumina EPIC array, while our clocks were trained using 450k probes, we only used mQTLs that map to 450k probes. Significance was evaluated using a one-tailed Binomial test. Overlap with GWAS AD SNPs from Bellenguez et al. [85] and GWAS BMI SNPs from Pulit et al. [114] was done by restricting to SNPs that were part of mQTLs that included clock-CpGs.

Code availability

All the cell-type specific clocks presented here are freely available as part of the R-package CTSclocks, downloadable from https://github.com/HGT-UwU/CTSclocks.

Data availability

The Illumina DNA methylation datasets analyzed here are all freely available from GEO (https://www.ncbi.nlm.nih.gov/geo). Details: Liver-NAFLD (n = 325, GEO: GSE180474); Hepatocyte cultures (n = 56, GEO: GSE123995); Liver-obese (n = 67, GEO: GSE61446); Liver-Horvath/Ahrens (n = 77, GEO: GSE61258 and GSE48325); Liver tissue (40 normal, GEO: GSE107038); Liver-AATD (n = 94, GEO: GSE119100); MESA (214 purified CD4+ T-cells and 1202 Monocyte samples, GEO: GSE56046 and GSE56581); Colon (96 normal and 144 adenocarcinoma, GEO: GSE131013); Prostate (123 normal and 63 benign, GEO: GSE76938); Kidney (93 normal, GEO: GSE79100); Skin (322 normal, GEO: GSE90124); Brain sorted-Pai (33 normal and 67 BD, SCZ, GEO: GSE112179); Brain sorted-Gasparoni (32 normal and 30 AD, GEO: GSE66351); Brain sorted-Kozlenkov (29 normal and 59 Heroin or suicide, GEO: GSE98203); Brain sorted-Guintivano (58 normal, GEO: GSE41826); Brain sorted-Hannon (103 normal, GEO: GSE234520); Brain sorted-Witte (20 normal and 36 BD, SCZ, MDD, GEO: GSE191200); Brain-Jaffe (226 normal and 190 SCZ, GEO: GSE74193); Brain-Philstrom (67 normal and 375 Various mental disease and neurodegenerative disease, GEO: GSE203332); Brain-Semick (187 normal and 82 AD, GEO: GSE125895); Brain-Pidsley (71 normal and 179 AD, GEO: GSE43414); Brain-Smith (138 normal and 148 AD, GEO: GSE80970); Brain-Horvath (95 normal and 205 AD, HD, GEO: GSE72778); Brain-Gasparoni (52 normal and 76 AD, GEO: GSE66351); Brain-Watson (34 normal and 34 AD, GEO: GSE76105); Brain-Brokaw (358 normal and 450 AD, GEO: GSE134379); Brain-Murphy (38 normal and 37 MDD, GEO: GSE88890); Brain-Rydbirk (37 normal and 41 MSA, GEO: GSE143157); Brain-Torabi (50 normal and 75 SCZ, GEO: GSE128601); Brain-Xu (25 normal and 23 AUD, GEO: GSE49393); Brain-Huynh (19 normal and 27 Multiple sclerosis, GEO: GSE40360); Brain-Viana2 (13 normal and 14 SCZ, GEO: GSE89703); Brain-Viana3 (17 normal and 16 SCZ, GEO: GSE89705); Brain-Viana4 (28 normal and 21 SCZ, GEO: GSE89706); Brain-Markunas (221 normal, GEO: GSE147040); Brain-Viana1 (17 normal and 16 SCZ, GEO: GSE89702); Brain-Stefania1 (37 normal and 21 Suicide, GEO: GSE137222); Brain-Stefania2 (15 normal and 18 Suicide, GEO: GSE137223); Whole blood-Airway (1032 normal, GEO: GSE147740); Whole blood-Barturen (101 normal and 473 SARS, GEO: GSE179325); Whole blood-Flanagan (184 normal, GEO: GSE61151); Whole blood-Hannon1 (304 normal and 332 SCZ, GEO: GSE80417); Whole blood-Hannon2 (405 normal and 260 SCZ, GEO: GSE84727); Whole blood-Hannum (656 normal, GEO: GSE40279); Whole blood-Johansson (729 normal, GEO: GSE87571); Whole blood-Lehne (2707 normal, GEO: GSE55763); Whole blood-LiuMS (139 normal and 140 Multiple Scleroris, GEO: GSE106648); Whole blood-LiuRA (335 normal and 354 Rheumatoid arthritis, GEO: GSE42861); Whole blood-TZH (85 normal and 620 Different health conditions, GEO: Requested Access); Whole blood-Ventham (101 normal and 279 Inflammatory bowel disease, GEO: GSE87648); Whole blood-Zannas (144 normal and 278 Trauma, GEO: GSE72680). The DNAm-atlas encompassing DNAm reference matrices for 13 tissue-types encompassing over 40 cell-types is freely available from the EpiSCORE R-package https://github.com/aet21/EpiSCORE.

Author Contributions

TH, GX and AET performed the statistical and bioinformatic analyses. AET conceived the study, and wrote the manuscript with contributions from TH and GX. QL contributed to the data analyses. MJ and NE contributed pointers to data and helped revise the manuscript.

Acknowledgments

We thank everyone who supports open-access data.

Conflicts of Interest

The authors declare no conflicts of interest related to this study.

Funding

This work was supported by NSFC (National Science Foundation of China) grants, Grant Numbers: 31970632, 32170652 and 32370699.

References

  • 1. Baker GT 3rd, Sprott RL. Biomarkers of aging. Exp Gerontol. 1988; 23:223–39. https://doi.org/10.1016/0531-5565(88)90025-3 [PubMed]
  • 2. Hadley EC, Lakatta EG, Morrison-Bogorad M, Warner HR, Hodes RJ. The future of aging therapies. Cell. 2005; 120:557–67. https://doi.org/10.1016/j.cell.2005.01.030 [PubMed]
  • 3. Levine ME, Lu AT, Quach A, Chen BH, Assimes TL, Bandinelli S, Hou L, Baccarelli AA, Stewart JD, Li Y, Whitsel EA, Wilson JG, Reiner AP, et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging (Albany NY). 2018; 10:573–91. https://doi.org/10.18632/aging.101414 [PubMed]
  • 4. Moqri M, Herzog C, Poganik JR, Justice J, Belsky DW, Higgins-Chen A, Moskalev A, Fuellen G, Cohen AA, Bautmans I, Widschwendter M, Ding J, Fleming A, et al, and Biomarkers of Aging Consortium. Biomarkers of aging for the identification and evaluation of longevity interventions. Cell. 2023; 186:3758–75. https://doi.org/10.1016/j.cell.2023.08.003 [PubMed]
  • 5. Beck S. Taking the measure of the methylome. Nat Biotechnol. 2010; 28:1026–8. https://doi.org/10.1038/nbt1010-1026 [PubMed]
  • 6. Bell CG. Epigenomic insights into common human disease pathology. Cell Mol Life Sci. 2024; 81:178. https://doi.org/10.1007/s00018-024-05206-2 [PubMed]
  • 7. Seale K, Horvath S, Teschendorff A, Eynon N, Voisin S. Making sense of the ageing methylome. Nat Rev Genet. 2022; 23:585–605. https://doi.org/10.1038/s41576-022-00477-6 [PubMed]
  • 8. Horvath S, Raj K. DNA methylation-based biomarkers and the epigenetic clock theory of ageing. Nat Rev Genet. 2018; 19:371–84. https://doi.org/10.1038/s41576-018-0004-3 [PubMed]
  • 9. Field AE, Robertson NA, Wang T, Havas A, Ideker T, Adams PD. DNA Methylation Clocks in Aging: Categories, Causes, and Consequences. Mol Cell. 2018; 71:882–95. https://doi.org/10.1016/j.molcel.2018.08.008 [PubMed]
  • 10. Bell CG, Lowe R, Adams PD, Baccarelli AA, Beck S, Bell JT, Christensen BC, Gladyshev VN, Heijmans BT, Horvath S, Ideker T, Issa JJ, Kelsey KT, et al. DNA methylation aging clocks: challenges and recommendations. Genome Biol. 2019; 20:249. https://doi.org/10.1186/s13059-019-1824-y [PubMed]
  • 11. Zhang Q, Vallerga CL, Walker RM, Lin T, Henders AK, Montgomery GW, He J, Fan D, Fowdar J, Kennedy M, Pitcher T, Pearson J, Halliday G, et al. Improved precision of epigenetic clock estimates across tissues and its implication for biological ageing. Genome Med. 2019; 11:54. https://doi.org/10.1186/s13073-019-0667-1 [PubMed]
  • 12. Lu AT, Quach A, Wilson JG, Reiner AP, Aviv A, Raj K, Hou L, Baccarelli AA, Li Y, Stewart JD, Whitsel EA, Assimes TL, Ferrucci L, Horvath S. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging (Albany NY). 2019; 11:303–27. https://doi.org/10.18632/aging.101684 [PubMed]
  • 13. Levine ME, Hosgood HD, Chen B, Absher D, Assimes T, Horvath S. DNA methylation age of blood predicts future onset of lung cancer in the women's health initiative. Aging (Albany NY). 2015; 7:690–700. https://doi.org/10.18632/aging.100809 [PubMed]
  • 14. Levine ME, Lu AT, Chen BH, Hernandez DG, Singleton AB, Ferrucci L, Bandinelli S, Salfati E, Manson JE, Quach A, Kusters CD, Kuh D, Wong A, et al. Menopause accelerates biological aging. Proc Natl Acad Sci U S A. 2016; 113:9327–32. https://doi.org/10.1073/pnas.1604558113 [PubMed]
  • 15. Marioni RE, Shah S, McRae AF, Chen BH, Colicino E, Harris SE, Gibson J, Henders AK, Redmond P, Cox SR, Pattie A, Corley J, Murphy L, et al. DNA methylation age of blood predicts all-cause mortality in later life. Genome Biol. 2015; 16:25. https://doi.org/10.1186/s13059-015-0584-6 [PubMed]
  • 16. Marioni RE, Shah S, McRae AF, Ritchie SJ, Muniz-Terrera G, Harris SE, Gibson J, Redmond P, Cox SR, Pattie A, Corley J, Taylor A, Murphy L, et al. The epigenetic clock is correlated with physical and cognitive fitness in the Lothian Birth Cohort 1936. Int J Epidemiol. 2015; 44:1388–96. https://doi.org/10.1093/ije/dyu277 [PubMed]
  • 17. de Magalhães JP. Distinguishing between driver and passenger mechanisms of aging. Nat Genet. 2024; 56:204–11. https://doi.org/10.1038/s41588-023-01627-0 [PubMed]
  • 18. Teschendorff AE, Relton CL. Statistical and integrative system-level analysis of DNA methylation data. Nat Rev Genet. 2018; 19:129–47. https://doi.org/10.1038/nrg.2017.86 [PubMed]
  • 19. Jaffe AE, Irizarry RA. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 2014; 15:R31. https://doi.org/10.1186/gb-2014-15-2-r31 [PubMed]
  • 20. Zhang Z, Reynolds SR, Stolrow HG, Chen JQ, Christensen BC, Salas LA. Deciphering the role of immune cell composition in epigenetic age acceleration: Insights from cell-type deconvolution applied to human blood epigenetic clocks. Aging Cell. 2024; 23:e14071. https://doi.org/10.1111/acel.14071 [PubMed]
  • 21. Marttila S, Rajić S, Ciantar J, Mak JKL, Junttila IS, Kummola L, Hägg S, Raitoharju E, Kananen L. Biological aging of different blood cell types. Geroscience. 2024. [Epub ahead of print]. https://doi.org/10.1007/s11357-024-01287-w [PubMed]
  • 22. Tomusiak A, Floro A, Tiwari R, Riley R, Matsui H, Andrews N, Kasler HG, Verdin E. Development of an epigenetic clock resistant to changes in immune cell composition. Commun Biol. 2024; 7:934. https://doi.org/10.1038/s42003-024-06609-4 [PubMed]
  • 23. Jonkman TH, Dekkers KF, Slieker RC, Grant CD, Ikram MA, van Greevenbroek MMJ, Franke L, Veldink JH, Boomsma DI, Slagboom PE, Heijmans BT, and BIOS Consortium. Functional genomics analysis identifies T and NK cell activation as a driver of epigenetic clock progression. Genome Biol. 2022; 23:24. https://doi.org/10.1186/s13059-021-02585-8 [PubMed]
  • 24. Luo Q, Dwaraka VB, Chen Q, Tong H, Zhu T, Seale K, Raffaele JM, Zheng SC, Mendez TL, Chen Y, Carreras N, Begum S, Mendez K, et al. A meta-analysis of immune-cell fractions at high resolution reveals novel associations with common phenotypes and health outcomes. Genome Med. 2023; 15:59. https://doi.org/10.1186/s13073-023-01211-5 [PubMed]
  • 25. Maity AK, Teschendorff AE. Cell-attribute aware community detection improves differential abundance testing from single-cell RNA-Seq data. Nat Commun. 2023; 14:3244. https://doi.org/10.1038/s41467-023-39017-z [PubMed]
  • 26. Zheng SC, Webster AP, Dong D, Feber A, Graham DG, Sullivan R, Jevons S, Lovat LB, Beck S, Widschwendter M, Teschendorff AE. A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix. Epigenomics. 2018; 10:925–40. https://doi.org/10.2217/epi-2018-0037 [PubMed]
  • 27. Mejia-Ramirez E, Florian MC. Understanding intrinsic hematopoietic stem cell aging. Haematologica. 2020; 105:22–37. https://doi.org/10.3324/haematol.2018.211342 [PubMed]
  • 28. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013; 14:R115. https://doi.org/10.1186/gb-2013-14-10-r115 [PubMed]
  • 29. Levine ME, Lu AT, Bennett DA, Horvath S. Epigenetic age of the pre-frontal cortex is associated with neuritic plaques, amyloid load, and Alzheimer's disease related cognitive functioning. Aging (Albany NY). 2015; 7:1198–211. https://doi.org/10.18632/aging.100864 [PubMed]
  • 30. Levine AJ, Quach A, Moore DJ, Achim CL, Soontornniyomkij V, Masliah E, Singer EJ, Gelman B, Nemanim N, Horvath S. Accelerated epigenetic aging in brain is associated with pre-mortem HIV-associated neurocognitive disorders. J Neurovirol. 2016; 22:366–75. https://doi.org/10.1007/s13365-015-0406-3 [PubMed]
  • 31. Levine AJ, Panos SE, Horvath S. Genetic, transcriptomic, and epigenetic studies of HIV-associated neurocognitive disorder. J Acquir Immune Defic Syndr. 2014; 65:481–503. https://doi.org/10.1097/QAI.0000000000000069 [PubMed]
  • 32. Lu AT, Hannon E, Levine ME, Hao K, Crimmins EM, Lunnon K, Kozlenkov A, Mill J, Dracheva S, Horvath S. Genetic variants near MLST8 and DHX57 affect the epigenetic age of the cerebellum. Nat Commun. 2016; 7:10561. https://doi.org/10.1038/ncomms10561 [PubMed]
  • 33. Lu AT, Hannon E, Levine ME, Crimmins EM, Lunnon K, Mill J, Geschwind DH, Horvath S. Genetic architecture of epigenetic and neuronal ageing rates in human brain regions. Nat Commun. 2017; 8:15353. https://doi.org/10.1038/ncomms15353 [PubMed]
  • 34. Murthy M, Shireby G, Miki Y, Viré E, Lashley T, Warner TT, Mill J, Bettencourt C. Epigenetic age acceleration is associated with oligodendrocyte proportions in MSA and control brain tissue. Neuropathol Appl Neurobiol. 2023; 49:e12872. https://doi.org/10.1111/nan.12872 [PubMed]
  • 35. Shireby G, Dempster EL, Policicchio S, Smith RG, Pishva E, Chioza B, Davies JP, Burrage J, Lunnon K, Seiler Vellame D, Love S, Thomas A, Brookes K, et al. DNA methylation signatures of Alzheimer's disease neuropathology in the cortex are primarily driven by variation in non-neuronal cell-types. Nat Commun. 2022; 13:5620. https://doi.org/10.1038/s41467-022-33394-7 [PubMed]
  • 36. Shireby GL, Davies JP, Francis PT, Burrage J, Walker EM, Neilson GWA, Dahir A, Thomas AJ, Love S, Smith RG, Lunnon K, Kumari M, Schalkwyk LC, et al. Recalibrating the epigenetic clock: implications for assessing biological age in the human cortex. Brain. 2020; 143:3763–75. https://doi.org/10.1093/brain/awaa334 [PubMed]
  • 37. Brunet A, Goodell MA, Rando TA. Ageing and rejuvenation of tissue stem cells and their niches. Nat Rev Mol Cell Biol. 2023; 24:45–62. https://doi.org/10.1038/s41580-022-00510-w [PubMed]
  • 38. Chondronasiou D, Gill D, Mosteiro L, Urdinguio RG, Berenguer-Llergo A, Aguilera M, Durand S, Aprahamian F, Nirmalathasan N, Abad M, Martin-Herranz DE, Stephan-Otto Attolini C, Prats N, et al. Multi-omic rejuvenation of naturally aged tissues by a single cycle of transient reprogramming. Aging Cell. 2022; 21:e13578. https://doi.org/10.1111/acel.13578 [PubMed]
  • 39. de Magalhães JP, Ocampo A. Cellular reprogramming and the rise of rejuvenation biotech. Trends Biotechnol. 2022; 40:639–42. https://doi.org/10.1016/j.tibtech.2022.01.011 [PubMed]
  • 40. Gill D, Parry A, Santos F, Okkenhaug H, Todd CD, Hernando-Herraez I, Stubbs TM, Milagre I, Reik W. Multi-omic rejuvenation of human cells by maturation phase transient reprogramming. Elife. 2022; 11:e71624. https://doi.org/10.7554/eLife.71624 [PubMed]
  • 41. Cole JJ, Robertson NA, Rather MI, Thomson JP, McBryan T, Sproul D, Wang T, Brock C, Clark W, Ideker T, Meehan RR, Miller RA, Brown-Borg HM, Adams PD. Diverse interventions that extend mouse lifespan suppress shared age-associated epigenetic changes at critical gene regulatory regions. Genome Biol. 2017; 18:58. https://doi.org/10.1186/s13059-017-1185-3 [PubMed]
  • 42. Wang T, Tsui B, Kreisberg JF, Robertson NA, Gross AM, Yu MK, Carter H, Brown-Borg HM, Adams PD, Ideker T. Epigenetic aging signatures in mice livers are slowed by dwarfism, calorie restriction and rapamycin treatment. Genome Biol. 2017; 18:57. https://doi.org/10.1186/s13059-017-1186-2 [PubMed]
  • 43. Trapp A, Kerepesi C, Gladyshev VN. Profiling epigenetic age in single cells. Nat Aging. 2021; 1:1189–201. https://doi.org/10.1038/s43587-021-00134-3 [PubMed]
  • 44. Zhu T, Liu J, Beck S, Pan S, Capper D, Lechner M, Thirlwell C, Breeze CE, Teschendorff AE. A pan-tissue DNA methylation atlas enables in silico decomposition of human tissue methylomes at cell-type resolution. Nat Methods. 2022; 19:296–306. https://doi.org/10.1038/s41592-022-01412-7 [PubMed]
  • 45. Zheng SC, Breeze CE, Beck S, Teschendorff AE. Identification of differentially methylated cell types in epigenome-wide association studies. Nat Methods. 2018; 15:1059–66. https://doi.org/10.1038/s41592-018-0213-x [PubMed]
  • 46. Zhang Z, Wiencke JK, Kelsey KT, Koestler DC, Molinaro AM, Pike SC, Karra P, Christensen BC, Salas LA. Hierarchical deconvolution for extensive cell type resolution in the human brain using DNA methylation. Front Neurosci. 2023; 17:1198243. https://doi.org/10.3389/fnins.2023.1198243 [PubMed]
  • 47. Chien JF, Liu H, Wang BA, Luo C, Bartlett A, Castanon R, Johnson ND, Nery JR, Osteen J, Li J, Altshul J, Kenworthy M, Valadon C, et al. Cell-type-specific effects of age and sex on human cortical neurons. Neuron. 2024; 112:2524–39.e5. https://doi.org/10.1016/j.neuron.2024.05.013 [PubMed]
  • 48. Cai M, Zhou J, McKennan C, Wang J. scMD facilitates cell type deconvolution using single-cell DNA methylation references. Commun Biol. 2024; 7:1. https://doi.org/10.1038/s42003-023-05690-5 [PubMed]
  • 49. Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw. 2010; 33:1–22. [PubMed]
  • 50. Jaffe AE, Gao Y, Deep-Soboslay A, Tao R, Hyde TM, Weinberger DR, Kleinman JE. Mapping DNA methylation across development, genotype and schizophrenia in the human frontal cortex. Nat Neurosci. 2016; 19:40–7. https://doi.org/10.1038/nn.4181 [PubMed]
  • 51. You C, Wu S, Zheng SC, Zhu T, Jing H, Flagg K, Wang G, Jin L, Wang S, Teschendorff AE. A cell-type deconvolution meta-analysis of whole blood EWAS reveals lineage-specific smoking-associated DNA methylation changes. Nat Commun. 2020; 11:4779. https://doi.org/10.1038/s41467-020-18618-y [PubMed]
  • 52. Pihlstrøm L, Shireby G, Geut H, Henriksen SP, Rozemuller AJM, Tunold JA, Hannon E, Francis P, Thomas AJ, Love S, Mill J, van de Berg WDJ, Toft M. Epigenome-wide association study of human frontal cortex identifies differential methylation in Lewy body pathology. Nat Commun. 2022; 13:4932. https://doi.org/10.1038/s41467-022-32619-z [PubMed]
  • 53. Teschendorff AE, Zhu T, Breeze CE, Beck S. EPISCORE: cell type deconvolution of bulk tissue DNA methylomes from single-cell RNA-Seq data. Genome Biol. 2020; 21:221. https://doi.org/10.1186/s13059-020-02126-9 [PubMed]
  • 54. Johnson ND, Wu X, Still CD, Chu X, Petrick AT, Gerhard GS, Conneely KN, DiStefano JK. Differential DNA methylation and changing cell-type proportions as fibrotic stage progresses in NAFLD. Clin Epigenetics. 2021; 13:152. https://doi.org/10.1186/s13148-021-01129-y [PubMed]
  • 55. Teschendorff AE. Avoiding common pitfalls in machine learning omic data science. Nat Mater. 2019; 18:422–7. https://doi.org/10.1038/s41563-018-0241-z [PubMed]
  • 56. Bonder MJ, Kasela S, Kals M, Tamm R, Lokk K, Barragan I, Buurman WA, Deelen P, Greve JW, Ivanov M, Rensen SS, van Vliet-Ostaptchouk JV, Wolfs MG, et al. Genetic and epigenetic regulation of gene expression in fetal and adult human livers. BMC Genomics. 2014; 15:860. https://doi.org/10.1186/1471-2164-15-860 [PubMed]
  • 57. Wang L, Marek GW 3rd, Hlady RA, Wagner RT, Zhao X, Clark VC, Fan AX, Liu C, Brantly M, Robertson KD. Alpha-1 Antitrypsin Deficiency Liver Disease, Mutational Homogeneity Modulated by Epigenetic Heterogeneity With Links to Obesity. Hepatology. 2019; 70:51–66. https://doi.org/10.1002/hep.30526 [PubMed]
  • 58. Horvath S, Erhart W, Brosch M, Ammerpohl O, von Schönfels W, Ahrens M, Heits N, Bell JT, Tsai PC, Spector TD, Deloukas P, Siebert R, Sipos B, et al. Obesity accelerates epigenetic aging of human liver. Proc Natl Acad Sci U S A. 2014; 111:15538–43. https://doi.org/10.1073/pnas.1412759111 [PubMed]
  • 59. Ahrens M, Ammerpohl O, von Schönfels W, Kolarova J, Bens S, Itzel T, Teufel A, Herrmann A, Brosch M, Hinrichsen H, Erhart W, Egberts J, Sipos B, et al. DNA methylation analysis in nonalcoholic fatty liver disease suggests distinct disease-specific and remodeling signatures after bariatric surgery. Cell Metab. 2013; 18:296–302. https://doi.org/10.1016/j.cmet.2013.07.004 [PubMed]
  • 60. Ying K, Liu H, Tarkhov AE, Sadler MC, Lu AT, Moqri M, Horvath S, Kutalik Z, Shen X, Gladyshev VN. Causality-enriched epigenetic age uncouples damage and adaptation. Nat Aging. 2024; 4:231–46. https://doi.org/10.1038/s43587-023-00557-0 [PubMed]
  • 61. Chen YS, Chen SD, Wu CL, Huang SS, Yang DI. Induction of sestrin2 as an endogenous protective mechanism against amyloid beta-peptide neurotoxicity in primary cortical culture. Exp Neurol. 2014; 253:63–71. https://doi.org/10.1016/j.expneurol.2013.12.009 [PubMed]
  • 62. Rai N, Kumar R, Desai GR, Venugopalan G, Shekhar S, Chatterjee P, Tripathi M, Upadhyay AD, Dwivedi S, Dey AB, Dey S. Relative Alterations in Blood-Based Levels of Sestrin in Alzheimer's Disease and Mild Cognitive Impairment Patients. J Alzheimers Dis. 2016; 54:1147–55. https://doi.org/10.3233/JAD-160479 [PubMed]
  • 63. Wang B, Fu C, Wei Y, Xu B, Yang R, Li C, Qiu M, Yin Y, Qin D. Ferroptosis-related biomarkers for Alzheimer's disease: Identification by bioinformatic analysis in hippocampus. Front Cell Neurosci. 2022; 16:1023947. https://doi.org/10.3389/fncel.2022.1023947 [PubMed]
  • 64. Yu M, Nie Y, Yang J, Yang S, Li R, Rao V, Hu X, Fang C, Li S, Song D, Guo F, Snyder MP, Chang HY, et al. Integrative multi-omic profiling of adult mouse brain endothelial cells and potential implications in Alzheimer's disease. Cell Rep. 2023; 42:113392. https://doi.org/10.1016/j.celrep.2023.113392 [PubMed]
  • 65. Lu MH, Ji WL, Xu DE, Yao PP, Zhao XY, Wang ZT, Fang LP, Huang R, Lan LJ, Chen JB, Wang TH, Cheng LH, Xu RX, et al. Inhibition of sphingomyelin synthase 1 ameliorates alzheimer-like pathology in APP/PS1 transgenic mice through promoting lysosomal degradation of BACE1. Exp Neurol. 2019; 311:67–79. https://doi.org/10.1016/j.expneurol.2018.09.012 [PubMed]
  • 66. Hsiao JH, Fu Y, Hill AF, Halliday GM, Kim WS. Elevation in sphingomyelin synthase activity is associated with increases in amyloid-beta peptide generation. PLoS One. 2013; 8:e74016. https://doi.org/10.1371/journal.pone.0074016 [PubMed]
  • 67. Garcia-Esparcia P, Koneti A, Rodríguez-Oroz MC, Gago B, Del Rio JA, Ferrer I. Mitochondrial activity in the frontal cortex area 8 and angular gyrus in Parkinson's disease and Parkinson's disease with dementia. Brain Pathol. 2018; 28:43–57. https://doi.org/10.1111/bpa.12474 [PubMed]
  • 68. Bu S, Lv Y, Liu Y, Qiao S, Wang H. Zinc Finger Proteins in Neuro-Related Diseases Progression. Front Neurosci. 2021; 15:760567. https://doi.org/10.3389/fnins.2021.760567 [PubMed]
  • 69. Deng Y, He T, Fang R, Li S, Cao H, Cui Y. Genome-Wide Gene-Based Multi-Trait Analysis. Front Genet. 2020; 11:437. https://doi.org/10.3389/fgene.2020.00437 [PubMed]
  • 70. Yuan S, Li H, Xie J, Sun X. Quantitative Trait Module-Based Genetic Analysis of Alzheimer's Disease. Int J Mol Sci. 2019; 20:5912. https://doi.org/10.3390/ijms20235912 [PubMed]
  • 71. Arefin AS, Mathieson L, Johnstone D, Berretta R, Moscato P. Unveiling clusters of RNA transcript pairs associated with markers of Alzheimer's disease progression. PLoS One. 2012; 7:e45535. https://doi.org/10.1371/journal.pone.0045535 [PubMed]
  • 72. Popov N, Gil J. Epigenetic regulation of the INK4b-ARF-INK4a locus: in sickness and in health. Epigenetics. 2010; 5:685–90. https://doi.org/10.4161/epi.5.8.12996 [PubMed]
  • 73. Duan RS, Tang GB, Du HZ, Hu YW, Liu PP, Xu YJ, Zeng YQ, Zhang SF, Wang RY, Teng ZQ, Liu CM. Polycomb protein family member CBX7 regulates intrinsic axon growth and regeneration. Cell Death Differ. 2018; 25:1598–611. https://doi.org/10.1038/s41418-018-0064-0 [PubMed]
  • 74. Caselli RJ, Dueck AC, Huentelman MJ, Lutz MW, Saunders AM, Reiman EM, Roses AD. Longitudinal modeling of cognitive aging and the TOMM40 effect. Alzheimers Dement. 2012; 8:490–5. https://doi.org/10.1016/j.jalz.2011.11.006 [PubMed]
  • 75. Lee EG, Chen S, Leong L, Tulloch J, Yu CE. TOMM40 RNA Transcription in Alzheimer's Disease Brain and Its Implication in Mitochondrial Dysfunction. Genes (Basel). 2021; 12:871. https://doi.org/10.3390/genes12060871 [PubMed]
  • 76. Tirozzi A, Quiccione MS, Cerletti C, Donati MB, de Gaetano G, Iacoviello L, Gialluisi A. A Multi-Trait Association Analysis of Brain Disorders and Platelet Traits Identifies Novel Susceptibility Loci for Major Depression, Alzheimer's and Parkinson's Disease. Cells. 2023; 12:245. https://doi.org/10.3390/cells12020245 [PubMed]
  • 77. Zhu Z, Yang Y, Xiao Z, Zhao Q, Wu W, Liang X, Luo J, Cao Y, Shao M, Guo Q, Ding D. TOMM40 and APOE variants synergistically increase the risk of Alzheimer's disease in a Chinese population. Aging Clin Exp Res. 2021; 33:1667–75. https://doi.org/10.1007/s40520-020-01661-6 [PubMed]
  • 78. Cardoso R, Lemos C, Oliveiros B, Almeida MR, Baldeiras I, Pereira CF, Santos A, Duro D, Vieira D, Santana I, Oliveira CR. APOEɛ4-TOMM40L Haplotype Increases the Risk of Mild Cognitive Impairment Conversion to Alzheimer's Disease. J Alzheimers Dis. 2020; 78:587–601. https://doi.org/10.3233/JAD-200556 [PubMed]
  • 79. Battram T, Yousefi P, Crawford G, Prince C, Sheikhali Babaei M, Sharp G, Hatcher C, Vega-Salas MJ, Khodabakhsh S, Whitehurst O, Langdon R, Mahoney L, Elliott HR, et al. The EWAS Catalog: a database of epigenome-wide association studies. Wellcome Open Res. 2022; 7:41. https://doi.org/10.12688/wellcomeopenres.17598.2 [PubMed]
  • 80. Xiong Z, Yang F, Li M, Ma Y, Zhao W, Wang G, Li Z, Zheng X, Zou D, Zong W, Kang H, Jia Y, Li R, et al. EWAS Open Platform: integrated data, knowledge and toolkit for epigenome-wide association study. Nucleic Acids Res. 2022; 50:D1004–9. https://doi.org/10.1093/nar/gkab972 [PubMed]
  • 81. Gaunt TR, Shihab HA, Hemani G, Min JL, Woodward G, Lyttleton O, Zheng J, Duggirala A, McArdle WL, Ho K, Ring SM, Evans DM, Davey Smith G, Relton CL. Systematic identification of genetic influences on methylation across the human life course. Genome Biol. 2016; 17:61. https://doi.org/10.1186/s13059-016-0926-z [PubMed]
  • 82. Oliva M, Demanelis K, Lu Y, Chernoff M, Jasmine F, Ahsan H, Kibriya MG, Chen LS, Pierce BL. DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits. Nat Genet. 2023; 55:112–22. https://doi.org/10.1038/s41588-022-01248-z [PubMed]
  • 83. Hannon E, Spiers H, Viana J, Pidsley R, Burrage J, Murphy TM, Troakes C, Turecki G, O'Donovan MC, Schalkwyk LC, Bray NJ, Mill J. Methylation QTLs in the developing brain and their enrichment in schizophrenia risk loci. Nat Neurosci. 2016; 19:48–54. https://doi.org/10.1038/nn.4182 [PubMed]
  • 84. Ng B, White CC, Klein HU, Sieberts SK, McCabe C, Patrick E, Xu J, Yu L, Gaiteri C, Bennett DA, Mostafavi S, De Jager PL. An xQTL map integrates the genetic architecture of the human brain's transcriptome and epigenome. Nat Neurosci. 2017; 20:1418–26. https://doi.org/10.1038/nn.4632 [PubMed]
  • 85. Bellenguez C, Küçükali F, Jansen IE, Kleineidam L, Moreno-Grau S, Amin N, Naj AC, Campos-Martin R, Grenier-Boley B, Andrade V, Holmans PA, Boland A, Damotte V, et al, and EADB, and GR@ACE, and DEGESCO, and EADI, and GERAD, and Demgene, and FinnGen, and ADGC, and CHARGE. New insights into the genetic etiology of Alzheimer's disease and related dementias. Nat Genet. 2022; 54:412–36. https://doi.org/10.1038/s41588-022-01024-z [PubMed]
  • 86. Andrade-Guerrero J, Santiago-Balmaseda A, Jeronimo-Aguilar P, Vargas-Rodríguez I, Cadena-Suárez AR, Sánchez-Garibay C, Pozo-Molina G, Méndez-Catalá CF, Cardenas-Aguayo MD, Diaz-Cintra S, Pacheco-Herrero M, Luna-Muñoz J, Soto-Rojas LO. Alzheimer's Disease: An Updated Overview of Its Genetics. Int J Mol Sci. 2023; 24:3754. https://doi.org/10.3390/ijms24043754 [PubMed]
  • 87. Hassan M, Shahzadi S, Alashwal H, Zaki N, Seo SY, Moustafa AA. Exploring the mechanistic insights of Cas scaffolding protein family member 4 with protein tyrosine kinase 2 in Alzheimer's disease by evaluating protein interactions through molecular docking and dynamic simulations. Neurol Sci. 2018; 39:1361–74. https://doi.org/10.1007/s10072-018-3430-2 [PubMed]
  • 88. Sáez ME, García-Sánchez A, de Rojas I, Alarcón-Martín E, Martínez J, Cano A, García-González P, Puerta R, Olivé C, Capdevila M, García-Gutiérrez F, Castilla-Martí M, Castilla-Martí L, et al. Genome-wide association study and polygenic risk scores of retinal thickness across the cognitive continuum: data from the NORFACE cohort. Alzheimers Res Ther. 2024; 16:38. https://doi.org/10.1186/s13195-024-01398-8 [PubMed]
  • 89. Yang X, Wen J, Yang H, Jones IR, Zhu X, Liu W, Li B, Clelland CD, Luo W, Wong MY, Ren X, Cui X, Song M, et al. Functional characterization of Alzheimer's disease genetic variants in microglia. Nat Genet. 2023; 55:1735–44. https://doi.org/10.1038/s41588-023-01506-8 [PubMed]
  • 90. Lee JH, Budanov AV, Karin M. Sestrins orchestrate cellular metabolism to attenuate aging. Cell Metab. 2013; 18:792–801. https://doi.org/10.1016/j.cmet.2013.08.018 [PubMed]
  • 91. Lee JH, Budanov AV, Park EJ, Birse R, Kim TE, Perkins GA, Ocorr K, Ellisman MH, Bodmer R, Bier E, Karin M. Sestrin as a feedback inhibitor of TOR that prevents age-related pathologies. Science. 2010; 327:1223–8. https://doi.org/10.1126/science.1182228 [PubMed]
  • 92. Barreby E, Chen P, Aouadi M. Macrophage functional diversity in NAFLD - more than inflammation. Nat Rev Endocrinol. 2022; 18:461–72. https://doi.org/10.1038/s41574-022-00675-6 [PubMed]
  • 93. Arendt D, Musser JM, Baker CVH, Bergman A, Cepko C, Erwin DH, Pavlicev M, Schlosser G, Widder S, Laubichler MD, Wagner GP. The origin and evolution of cell types. Nat Rev Genet. 2016; 17:744–57. https://doi.org/10.1038/nrg.2016.127 [PubMed]
  • 94. Zhu T, Zheng SC, Paul DS, Horvath S, Teschendorff AE. Cell and tissue type independent age-associated DNA methylation changes are not rare but common. Aging (Albany NY). 2018; 10:3541–57. https://doi.org/10.18632/aging.101666 [PubMed]
  • 95. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, Irizarry RA. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 2014; 30:1363–9. https://doi.org/10.1093/bioinformatics/btu049 [PubMed]
  • 96. Troyanskaya O, Cantor M, Sherlock G, Brown P, Hastie T, Tibshirani R, Botstein D, Altman RB. Missing value estimation methods for DNA microarrays. Bioinformatics. 2001; 17:520–5. https://doi.org/10.1093/bioinformatics/17.6.520 [PubMed]
  • 97. Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D, Beck S. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics. 2013; 29:189–96. https://doi.org/10.1093/bioinformatics/bts680 [PubMed]
  • 98. Park CS, De T, Xu Y, Zhong Y, Smithberger E, Alarcon C, Gamazon ER, Perera MA. Hepatocyte gene expression and DNA methylation as ancestry-dependent mechanisms in African Americans. NPJ Genom Med. 2019; 4:29. https://doi.org/10.1038/s41525-019-0102-y [PubMed]
  • 99. Bacalini MG, Franceschi C, Gentilini D, Ravaioli F, Zhou X, Remondini D, Pirazzini C, Giuliani C, Marasco E, Gensous N, Di Blasio AM, Ellis E, Gramignoli R, et al. Molecular Aging of Human Liver: An Epigenetic/Transcriptomic Signature. J Gerontol A Biol Sci Med Sci. 2019; 74:1–8. https://doi.org/10.1093/gerona/gly048 [PubMed]
  • 100. Díez-Villanueva A, Sanz-Pamplona R, Carreras-Torres R, Moratalla-Navarro F, Alonso MH, Paré-Brunet L, Aussó S, Guinó E, Solé X, Cordero D, Salazar R, Berdasco M, Peinado MA, Moreno V. DNA methylation events in transcription factors and gene expression changes in colon cancer. Epigenomics. 2020; 12:1593–610. https://doi.org/10.2217/epi-2020-0029 [PubMed]
  • 101. Kirby MK, Ramaker RC, Roberts BS, Lasseigne BN, Gunther DS, Burwell TC, Davis NS, Gulzar ZG, Absher DM, Cooper SJ, Brooks JD, Myers RM. Genome-wide DNA methylation measurements in prostate tissues uncovers novel prostate cancer diagnostic biomarkers and transcription factor binding patterns. BMC Cancer. 2017; 17:273. https://doi.org/10.1186/s12885-017-3252-2 [PubMed]
  • 102. Huan T, Mendelson M, Joehanes R, Yao C, Liu C, Song C, Bhattacharya A, Rong J, Tanriverdi K, Keefe J, Murabito JM, Courchesne P, Larson MG, et al. Epigenome-wide association study of DNA methylation and microRNA expression highlights novel pathways for human complex traits. Epigenetics. 2020; 15:183–98. https://doi.org/10.1080/15592294.2019.1640547 [PubMed]
  • 103. Roos L, Sandling JK, Bell CG, Glass D, Mangino M, Spector TD, Deloukas P, Bataille V, Bell JT. Higher Nevus Count Exhibits a Distinct DNA Methylation Signature in Healthy Human Skin: Implications for Melanoma. J Invest Dermatol. 2017; 137:910–20. https://doi.org/10.1016/j.jid.2016.11.029 [PubMed]
  • 104. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, Wiencke JK, Kelsey KT. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012; 13:86. https://doi.org/10.1186/1471-2105-13-86 [PubMed]
  • 105. Salas LA, Zhang Z, Koestler DC, Butler RA, Hansen HM, Molinaro AM, Wiencke JK, Kelsey KT, Christensen BC. Enhanced cell deconvolution of peripheral blood using DNA methylation for high-resolution immune profiling. Nat Commun. 2022; 13:761. https://doi.org/10.1038/s41467-021-27864-7 [PubMed]
  • 106. Teschendorff AE, Zheng SC. Cell-type deconvolution in epigenome-wide association studies: a review and recommendations. Epigenomics. 2017; 9:757–68. https://doi.org/10.2217/epi-2016-0153 [PubMed]
  • 107. Robinson O, Chadeau Hyam M, Karaman I, Climaco Pinto R, Ala-Korpela M, Handakas E, Fiorito G, Gao H, Heard A, Jarvelin MR, Lewis M, Pazoki R, Polidoro S, et al. Determinants of accelerated metabolomic and epigenetic aging in a UK cohort. Aging Cell. 2020; 19:e13149. https://doi.org/10.1111/acel.13149 [PubMed]
  • 108. Lehne B, Drong AW, Loh M, Zhang W, Scott WR, Tan ST, Afzal U, Scott J, Jarvelin MR, Elliott P, McCarthy MI, Kooner JS, Chambers JC. A coherent approach for analysis of the Illumina HumanMethylation450 BeadChip improves data quality and performance in epigenome-wide association studies. Genome Biol. 2015; 16:37. https://doi.org/10.1186/s13059-015-0600-x [PubMed]
  • 109. Teschendorff AE, Breeze CE, Zheng SC, Beck S. A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies. BMC Bioinformatics. 2017; 18:105. https://doi.org/10.1186/s12859-017-1511-5 [PubMed]
  • 110. Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, Klotzle B, Bibikova M, Fan JB, Gao Y, Deconde R, Chen M, Rajapakse I, et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell. 2013; 49:359–67. https://doi.org/10.1016/j.molcel.2012.10.016 [PubMed]
  • 111. Chen L, Ge B, Casale FP, Vasquez L, Kwan T, Garrido-Martín D, Watt S, Yan Y, Kundu K, Ecker S, Datta A, Richardson D, Burden F, et al. Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells. Cell. 2016; 167:1398–414.e24. https://doi.org/10.1016/j.cell.2016.10.026 [PubMed]
  • 112. Borenstein M, Hedges LV, Higgins JP, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Res Synth Methods. 2010; 1:97–111. https://doi.org/10.1002/jrsm.12 [PubMed]
  • 113. Balduzzi S, Rücker G, Schwarzer G. How to perform a meta-analysis with R: a practical tutorial. Evid Based Ment Health. 2019; 22:153–60. https://doi.org/10.1136/ebmental-2019-300117 [PubMed]
  • 114. Pulit SL, Stoneman C, Morris AP, Wood AR, Glastonbury CA, Tyrrell J, Yengo L, Ferreira T, Marouli E, Ji Y, Yang J, Jones S, Beaumont R, et al, and GIANT Consortium. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Hum Mol Genet. 2019; 28:166–74. https://doi.org/10.1093/hmg/ddy327 [PubMed]