Predicting variant effects in cell lines without ontology terms

Hi AlphaGenome team,

I’m trying to use AlphaGenome to assess variant effects in primary cell lines that we established in-house and that are not represented in standard cell ontologies. Since the model outputs tracks only for predefined cell types, I’m not sure how best to compare model predictions against our experimental data from these cells.

Any guidance would be appreciated. Thanks!

Best,
Peihua

Hi there!

Thanks for reaching out.

The most straightforward approach is to find the closest predefined proxy in the AlphaGenome library. AlphaGenome uses UBERON and CL (Cell Ontology) terms for its metadata. Have a look at our navigating ontologies colab for details on how to search available ontologies by name.

If tracks for similar cell types aren’t available, you could look for high-consensus variants: those that show a strong predicted effect across most of the cell types related to yours. For example, if a variant has a strong effect across 90% of tracks, it likely disrupts a constitutive regulatory element, and the effect should also show up in your primary cell line. Alternatively, as a cell-type-agnostic approach, you could extract variant effect predictions across a set of cell types and modalities, treat them as features, and train a model such as LASSO regression against your experimental data to learn how to weight them.
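To make the consensus idea concrete, here is a minimal sketch in plain Python. It assumes you have already collected one effect score per track for each variant. The scores, the variant and track names, the 0.5 effect threshold, and the `consensus_fraction` helper are all made up for illustration and are not part of the AlphaGenome API.

```python
from typing import Dict, List, Mapping


def consensus_fraction(scores: Mapping[str, float], threshold: float) -> float:
    """Return the fraction of tracks whose absolute effect score meets the threshold."""
    if not scores:
        raise ValueError("no track scores supplied")
    strong = sum(1 for score in scores.values() if abs(score) >= threshold)
    return strong / len(scores)


# Hypothetical track names and effect scores (illustrative numbers only).
TRACKS: List[str] = [f"track_{i}" for i in range(10)]
RAW_SCORES: Dict[str, List[float]] = {
    "variant_A": [0.9, -0.8, 0.7, 0.85, 0.6, -0.75, 0.95, 0.65, 0.7, 0.1],
    "variant_B": [0.05, 0.9, -0.02, 0.1, 0.0, 0.03, -0.7, 0.08, 0.01, 0.04],
}
variant_scores = {
    variant: dict(zip(TRACKS, values)) for variant, values in RAW_SCORES.items()
}

EFFECT_THRESHOLD = 0.5   # assumed cutoff for a "strong" effect
CONSENSUS_CUTOFF = 0.9   # the 90% figure mentioned above

# Keep variants whose effect is strong in at least 90% of tracks.
high_consensus = [
    variant
    for variant, scores in variant_scores.items()
    if consensus_fraction(scores, EFFECT_THRESHOLD) >= CONSENSUS_CUTOFF
]
print(high_consensus)
```

Taking the absolute value treats strong increases and strong decreases alike. If the direction of the effect matters for your assay, you may prefer to count the two signs separately.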

Kind regards,
Tumi

Dear Tumi,

Thank you for your detailed response.

I noticed that the types of output tracks available vary depending on the selected cell type. For example:

CL:2000045 (foreskin melanocyte) includes:
OutputType.DNASE, OutputType.RNA_SEQ, OutputType.CHIP_HISTONE (H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me3, H3K9ac, H3K9me3), OutputType.SPLICE_SITE_USAGE, OutputType.SPLICE_JUNCTIONS

Whereas CL:1000458 (melanocyte of skin) only includes:
OutputType.RNA_SEQ, OutputType.SPLICE_SITE_USAGE, OutputType.SPLICE_JUNCTIONS

Does this mean that if a particular assay (such as ATAC-seq or histone ChIP-seq) was not performed for a given cell type in the training data, the model cannot generate predictions for that output type for that cell type?

Thank you for your time and I look forward to your response.

Best regards,
Peihua Zhao

Hi Peihua,

Thanks for the follow-up.

Yes, that’s correct. AlphaGenome’s outputs are tied to the experimental datasets it was trained on. Have a look at our colab on navigating ontologies to find which assay types are available for each biosample. You may be able to find a biosample similar to your target under a different CURIE.
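As a rough illustration of that lookup, the sketch below checks which biosamples cover a required set of output types. It uses a hand-written table copied from the two examples earlier in this thread, with plain strings standing in for the `OutputType` values. The table and the `biosamples_with` and `missing_outputs` helpers are hypothetical and not part of the AlphaGenome API; in practice you would build the table from the model's track metadata as shown in the colab.

```python
from typing import Dict, List, Set

# Output types per ontology CURIE, taken from the two examples in this thread.
AVAILABLE_OUTPUTS: Dict[str, Set[str]] = {
    # foreskin melanocyte
    "CL:2000045": {
        "DNASE", "RNA_SEQ", "CHIP_HISTONE",
        "SPLICE_SITE_USAGE", "SPLICE_JUNCTIONS",
    },
    # melanocyte of skin
    "CL:1000458": {"RNA_SEQ", "SPLICE_SITE_USAGE", "SPLICE_JUNCTIONS"},
}


def biosamples_with(required: Set[str], table: Dict[str, Set[str]]) -> List[str]:
    """Return the CURIEs whose available outputs include every required type."""
    return sorted(curie for curie, outputs in table.items() if required <= outputs)


def missing_outputs(curie: str, required: Set[str], table: Dict[str, Set[str]]) -> Set[str]:
    """Return the required output types that a given biosample lacks."""
    return required - table[curie]


# Which biosamples could serve as a proxy if you need both DNase and RNA-seq?
print(biosamples_with({"DNASE", "RNA_SEQ"}, AVAILABLE_OUTPUTS))

# What would be missing if you used "melanocyte of skin" instead?
print(missing_outputs("CL:1000458", {"DNASE", "RNA_SEQ"}, AVAILABLE_OUTPUTS))
```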

Kind regards,
Tumi