AlphaGenome does not return exactly the same output when the same input is run through the model repeatedly. I propose measuring these small differences to test whether the distribution of outputs maps onto the demographic distribution of the datasets used to train AlphaGenome. If it does for both humans and mice, AlphaGenome has inadvertently (or purposely?) characterized the epigenetic changes that occur in people and mice as they age, down to base-pair resolution.
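As a first step, the run-to-run differences could be quantified per position. The sketch below is a minimal illustration, not AlphaGenome's API: `toy_predict` is a hypothetical stand-in that returns a fixed signal plus small random noise, and any callable that maps a sequence to a numeric track could be passed in its place.

```python
import random
import statistics
from typing import Callable, List, Sequence

# Seeded generator so the toy example is reproducible.
_rng = random.Random(0)


def toy_predict(sequence: str) -> List[float]:
    """Hypothetical stand-in for a model call: fixed signal plus small noise."""
    base = [1.0 if base_pair in "GC" else 0.0 for base_pair in sequence]
    return [value + _rng.gauss(0.0, 0.01) for value in base]


def repeat_predictions(
    predict: Callable[[str], Sequence[float]], sequence: str, n_runs: int
) -> List[List[float]]:
    """Call `predict` n_runs times on the same input and keep every output."""
    return [list(predict(sequence)) for _ in range(n_runs)]


def per_position_spread(runs: List[List[float]]) -> List[float]:
    """Sample standard deviation across runs at each position of the track."""
    return [statistics.stdev(column) for column in zip(*runs)]


runs = repeat_predictions(toy_predict, "ACGTGGCA", n_runs=20)
spread = per_position_spread(runs)
print(spread)
```

Positions with consistently larger spread would be the natural places to look for structure in the variation, assuming the variation is not purely numerical noise.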
Proper verification would require running AlphaGenome enough times, across enough of each species’ genome, and in enough tissues to match the size of the training datasets, while accounting for how likely each age group is to contribute each type of tissue sample (everyone gives blood, but few teens get biopsies during colonoscopies). GTEx and FANTOM5 are already formatted for accessing age and sex; age data for ENCODE and 4D Nucleome could in principle be recovered by tracing tissue samples back to their sources. Initial verification would be done on blood, because it has the most data across the widest range of ages, to see whether AlphaGenome’s most common outputs match the known epigenomes of the most common blood donors (males over thirty). If they do, larger-scale studies would be justified in seeking funding.
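One way to make the comparison concrete is to treat both sides as distributions over age bins and measure how far apart they are. The sketch below assumes each repeated run has already been assigned to the age bin whose reference epigenome it most resembles (that assignment step is not shown and is itself an assumption), and uses Jensen-Shannon divergence as one possible distance. The bin names and counts are made up for illustration; real donor counts would be tabulated per tissue to account for unequal sampling across ages.

```python
import math
from typing import Dict


def normalize(counts: Dict[str, float]) -> Dict[str, float]:
    """Turn raw counts into proportions that sum to 1."""
    total = float(sum(counts.values()))
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {key: value / total for key, value in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(a: Dict[str, float]) -> float:
        # Kullback-Leibler divergence from `a` to the mixture distribution.
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


# Made-up numbers for illustration only.
donor_counts = {"under_30": 20, "30_to_59": 50, "60_plus": 30}  # blood donors per bin
run_counts = {"under_30": 18, "30_to_59": 55, "60_plus": 27}    # model runs per bin

divergence = js_divergence(normalize(donor_counts), normalize(run_counts))
print(f"JS divergence: {divergence:.4f} bits")
```

A divergence near zero would be consistent with the hypothesis for that tissue; it would not by itself rule out other explanations for the match.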
With a comprehensive map of how the human epigenome changes with age, work can begin on identifying the protein-coding and non-coding genes that cause the predicted histone modifications, and on learning how to “open” and “close” chromatin. Additionally, AlphaGenome’s strength in predicting differences across tissue types means age-related treatments can start as tissue-specific modifications, for example, practicing on rejuvenating skin before editing cardiac muscle. Candidate targets could be found with AlphaGenome’s core functionality of predicting disease-causing mutations, by searching for mutations whose predicted effects match the epigenetic changes found in the elderly.
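The search for mutations that mimic aging could be framed as a ranking problem. The sketch below is illustrative only: it assumes an "aging signature" vector (old minus young signal per feature) and a predicted change vector for each candidate variant are already available, and it ranks variants by cosine similarity to the signature. The variant names and numbers are hypothetical, and cosine similarity is just one reasonable choice of score.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_variants(
    aging_delta: Sequence[float], variant_deltas: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Sort variants from most to least similar to the aging signature."""
    scored = [
        (name, cosine_similarity(aging_delta, delta))
        for name, delta in variant_deltas.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical values: change in predicted signal at four features.
aging_delta = [0.5, -0.2, 0.0, 0.3]
variant_deltas = {
    "variant_a": [0.4, -0.1, 0.0, 0.2],    # similar direction to aging
    "variant_b": [-0.5, 0.2, 0.0, -0.3],   # opposite direction
    "variant_c": [0.0, 0.0, 0.1, 0.0],     # unrelated change
}

ranked = rank_variants(aging_delta, variant_deltas)
for name, score in ranked:
    print(f"{name}: {score:+.3f}")
```

Variants at the top of the list would mimic the aging signature, while those at the bottom push in the opposite direction and might be of interest as rejuvenation candidates.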