Predicting variant effects in cell lines without ontology terms

Peihua · March 4, 2026, 11:50am

Hi AlphaGenome Team,

I’m trying to use AlphaGenome to assess variant effects in primary cell lines that we established in-house and that are not represented in standard cell ontologies. Since the model only outputs tracks for predefined cell types, I’m not sure how best to compare its predictions against our experimental data from these cells.

Any guidance would be appreciated. Thanks!

Best,
Peihua

Tumi_Makgatho · March 9, 2026, 4:38pm

Hi there!

Thanks for reaching out.

The most straightforward approach is to find the closest predefined proxy in the AlphaGenome library. AlphaGenome uses UBERON and CL (Cell Ontology) terms for its metadata. Have a look at our navigating ontologies colab for details on how to search available ontologies by name.

If tracks for similar cell types aren’t available, you could look for high-consensus variants, meaning those with a strong predicted effect across most related cell types. If a variant has a strong predicted effect across 90% of tracks, the element it affects is likely a constitutive regulatory element, so the effect should also show up in your primary cell line. Alternatively, as a cell-type-agnostic approach, you could extract variant effect predictions across a set of cell types and modalities, then fit a model such as LASSO regression that aggregates these features and is trained against your experimental data.
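As a rough illustration of the high-consensus idea, here is a minimal sketch in plain Python. It assumes you have already extracted one effect score per track for each variant; the variant names, track names, scores, and the `threshold` of 0.5 are all made up for the example and do not come from AlphaGenome. Only the 90% cut-off mirrors the figure mentioned above.

```python
from typing import Dict, List


def consensus_fraction(track_scores: Dict[str, float], threshold: float) -> float:
    """Return the fraction of tracks whose absolute score is >= threshold."""
    if not track_scores:
        raise ValueError("no track scores supplied")
    strong = sum(1 for score in track_scores.values() if abs(score) >= threshold)
    return strong / len(track_scores)


def high_consensus_variants(
    scores_by_variant: Dict[str, Dict[str, float]],
    threshold: float = 0.5,      # hypothetical cut-off for a "strong" effect
    min_fraction: float = 0.9,   # the 90%-of-tracks rule of thumb
) -> List[str]:
    """Return variants with a strong effect in at least min_fraction of tracks."""
    return [
        variant
        for variant, track_scores in scores_by_variant.items()
        if consensus_fraction(track_scores, threshold) >= min_fraction
    ]


def _demo_scores(n_strong: int, n_tracks: int = 10) -> Dict[str, float]:
    """Build made-up scores: the first n_strong tracks are strong, the rest weak."""
    return {f"track_{i}": (0.8 if i < n_strong else 0.05) for i in range(n_tracks)}


# Illustrative data only: three hypothetical variants scored on ten tracks.
example_scores = {
    "variant_a": _demo_scores(10),  # strong in 10/10 tracks
    "variant_b": _demo_scores(9),   # strong in 9/10 tracks
    "variant_c": _demo_scores(2),   # strong in 2/10 tracks
}

consensus = high_consensus_variants(example_scores)
print(consensus)  # ['variant_a', 'variant_b']
```

How a "strong" effect is defined depends on the scorer and modality, so the threshold would need to be chosen per track type rather than shared as it is here.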

Kind regards,
Tumi

Peihua · March 11, 2026, 1:16pm

Dear Tumi,

Thank you for your detailed response.

I noticed that the types of output tracks that can be predicted vary depending on the selected cell type. For example:

CL:2000045 (foreskin melanocyte) includes:
OutputType.DNASE, OutputType.RNA_SEQ, OutputType.CHIP_HISTONE (H3K27ac, H3K27me3, H3K36me3, H3K4me1, H3K4me3, H3K9ac, H3K9me3), OutputType.SPLICE_SITE_USAGE, OutputType.SPLICE_JUNCTIONS

Whereas CL:1000458 (melanocyte of skin) only includes:
OutputType.RNA_SEQ, OutputType.SPLICE_SITE_USAGE, OutputType.SPLICE_JUNCTIONS

Does this mean that if a particular assay (such as ATAC-seq or histone ChIP-seq) was not performed for a given cell type in the training data, the model cannot generate predictions for that output type in that cell type?

Thank you for your time and I look forward to your response.

Best regards,
Peihua Zhao

Tumi_Makgatho · March 11, 2026, 3:21pm

Hi Peihua,

Thanks for the follow-up.

Yes, that’s correct. AlphaGenome’s outputs are tied to the experimental datasets it was trained on. Have a look at our colab on navigating ontologies to find which assay types are available for each biosample. You may be able to find a biosample similar to your target under a different CURIE.
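To make the proxy search concrete, here is a small sketch that checks which candidate CURIEs cover the output types you need. The coverage table is hard-coded from the two melanocyte examples earlier in this thread; in practice you would build it from the track metadata described in the colab. The helper names `proxies_covering` and `missing_outputs` are hypothetical and not part of the AlphaGenome API.

```python
from typing import Dict, List, Set

# Output types per ontology CURIE, copied from the examples in this thread.
AVAILABLE_OUTPUTS: Dict[str, Set[str]] = {
    # foreskin melanocyte
    "CL:2000045": {
        "DNASE", "RNA_SEQ", "CHIP_HISTONE",
        "SPLICE_SITE_USAGE", "SPLICE_JUNCTIONS",
    },
    # melanocyte of skin
    "CL:1000458": {"RNA_SEQ", "SPLICE_SITE_USAGE", "SPLICE_JUNCTIONS"},
}


def proxies_covering(required: Set[str], available: Dict[str, Set[str]]) -> List[str]:
    """Return CURIEs whose available output types include every required type."""
    return sorted(curie for curie, outputs in available.items() if required <= outputs)


def missing_outputs(required: Set[str], available: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each CURIE, list the required output types it does not provide."""
    return {curie: sorted(required - outputs) for curie, outputs in available.items()}


# Example: experimental data consist of DNase-seq and RNA-seq.
needed = {"DNASE", "RNA_SEQ"}
print(proxies_covering(needed, AVAILABLE_OUTPUTS))  # ['CL:2000045']
print(missing_outputs(needed, AVAILABLE_OUTPUTS)["CL:1000458"])  # ['DNASE']
```

With this table, only the foreskin melanocyte entry could serve as a proxy for a DNase-seq comparison, while either entry would do for RNA-seq alone.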

Kind regards,
Tumi
