I’m trying to use AlphaGenome to assess variant effects in primary cell lines that we established in-house and that are not represented in standard cell ontologies. Since the model outputs tracks only for predefined cell types, I’m not sure how best to compare its predictions against our experimental data from these cells.
The most straightforward approach is to find the closest predefined proxy among the tracks AlphaGenome provides. AlphaGenome annotates its tracks with UBERON (anatomy) and CL (Cell Ontology) terms. Have a look at our Colab on navigating ontologies for details on how to search the available ontology terms by name.
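As a rough illustration of a name-based proxy search, here is a minimal sketch that runs over a small, hand-written table of (ontology CURIE, biosample name, output type) rows. The table and the `search_biosamples` helper are hypothetical: the rows do not reflect AlphaGenome’s actual track coverage, and in practice you would build the table from the track metadata the AlphaGenome client exposes, as shown in the ontologies Colab.

```python
# Illustrative rows of (ontology CURIE, biosample name, output type). These are
# hand-written examples, NOT a listing of AlphaGenome's actual track coverage.
TRACKS = [
    ("CL:1000458", "melanocyte of skin", "RNA_SEQ"),
    ("CL:1000458", "melanocyte of skin", "SPLICE_JUNCTIONS"),
    ("CL:0000057", "fibroblast", "RNA_SEQ"),
    ("UBERON:0002097", "skin of body", "RNA_SEQ"),
]


def search_biosamples(tracks, query):
    """Hypothetical helper: return {curie: biosample_name} for biosamples whose
    name contains every whitespace-separated word of `query` (case-insensitive)."""
    words = query.lower().split()
    hits = {}
    for curie, name, _output_type in tracks:
        if all(word in name.lower() for word in words):
            hits[curie] = name  # one entry per biosample, however many tracks it has
    return hits


print(search_biosamples(TRACKS, "skin"))
# {'CL:1000458': 'melanocyte of skin', 'UBERON:0002097': 'skin of body'}
```

Matching on individual words rather than the full string means a query such as "skin melanocyte" still finds "melanocyte of skin".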
If no single close proxy is available, you could look for high-consensus variants: those that show a strong predicted effect across most of the broadly similar cell types. If a variant shows a strong effect in, say, 90% of those tracks, it probably lies in a constitutive regulatory element and should also have an effect in your primary cell line.
For a more cell-type-agnostic approach, you could extract variant effect predictions across a set of cell types and modalities, then use them as features in a sparse regression model such as LASSO, trained against your experimental measurements, so that the model learns how to aggregate them.
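Both ideas can be sketched with NumPy, assuming you have already collected a matrix of predicted variant effect scores with shape (n_variants, n_tracks). The effect threshold and the 90% cutoff are illustrative, and the data below are synthetic stand-ins for real predictions and measurements. Instead of a library implementation (for example scikit-learn’s `Lasso`), this sketch fits LASSO by plain cyclic coordinate descent on standardised features so that it has no dependencies beyond NumPy; `consensus_fraction` and `lasso_cd` are hypothetical helper names.

```python
import numpy as np


def consensus_fraction(scores, threshold):
    """Fraction of tracks in which each variant's |predicted effect| exceeds threshold.

    scores: array of shape (n_variants, n_tracks).
    """
    return (np.abs(scores) > threshold).mean(axis=1)


def _soft_threshold(x, lam):
    # Shrink x towards zero by lam; values inside [-lam, lam] become exactly 0.
    return float(np.sign(x) * max(abs(x) - lam, 0.0))


def lasso_cd(X, y, alpha, n_iter=200):
    """LASSO fitted by cyclic coordinate descent.

    Minimises (1 / (2n)) * ||y - b - Z w||^2 + alpha * ||w||_1, where Z is X with
    each column standardised. Returns (coef, intercept) on the original scale of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd < 1e-12] = 1.0                  # avoid dividing by ~0 for constant columns
    Z = (X - mu) / sd
    y_mean = y.mean()
    r = y - y_mean                        # residual when all weights are zero
    w = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n     # 1 for non-constant standardised columns
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * w[j]           # drop feature j's term from the fit
            rho = Z[:, j] @ r / n
            w[j] = _soft_threshold(rho, alpha) / col_sq[j] if col_sq[j] > 0 else 0.0
            r -= Z[:, j] * w[j]           # put the updated term back into the fit
    coef = w / sd                         # undo the standardisation
    intercept = y_mean - mu @ coef
    return coef, intercept


# Synthetic example: 200 variants x 10 tracks; the "measured" effect depends on track 0 only.
rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 10))
measured = 2.0 * scores[:, 0] + 0.1 * rng.normal(size=200)

high_consensus = consensus_fraction(scores, threshold=1.0) >= 0.9
coef, intercept = lasso_cd(scores, measured, alpha=0.1)
```

The L1 penalty drives the weights of uninformative tracks to exactly zero, so the fitted coefficients also hint at which predicted tracks best reflect your cells; `alpha` would normally be chosen by cross-validation rather than fixed as here.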
Whereas CL:1000458 (melanocyte of skin) only includes:
OutputType.RNA_SEQ, OutputType.SPLICE_SITE_USAGE, OutputType.SPLICE_JUNCTIONS
Does this mean that if a particular assay (such as ATAC-seq or histone ChIP-seq) was not performed for a given cell type in the training data, the model cannot generate predictions for that output type for that cell type?
Thank you for your time, and I look forward to your response.
Yes, that’s correct. AlphaGenome’s output tracks are tied to the experimental datasets it was trained on, so if an assay was not in the training data for a given biosample, there is no corresponding track to predict. Have a look at our Colab on navigating ontologies to see which assay types are available for each biosample; you may find a similar biosample that has the assay you need under a different CURIE.
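Checking coverage amounts to grouping track metadata by CURIE. The sketch below assumes a hand-written table of (CURIE, biosample name, output type) rows; the output-type strings mirror the `OutputType` names quoted in this thread, but the rows themselves are illustrative and do not describe AlphaGenome’s real coverage. The helpers `output_types_by_curie` and `biosamples_with` are hypothetical; the real metadata should be pulled from the AlphaGenome client as shown in the ontologies Colab.

```python
from collections import defaultdict

# Illustrative (CURIE, biosample name, output type) rows; NOT AlphaGenome's actual coverage.
TRACKS = [
    ("CL:1000458", "melanocyte of skin", "RNA_SEQ"),
    ("CL:1000458", "melanocyte of skin", "SPLICE_SITE_USAGE"),
    ("CL:1000458", "melanocyte of skin", "SPLICE_JUNCTIONS"),
    ("CL:0000057", "fibroblast", "ATAC"),
    ("CL:0000057", "fibroblast", "RNA_SEQ"),
    ("UBERON:0002097", "skin of body", "CHIP_HISTONE"),
]


def output_types_by_curie(tracks):
    """Map each ontology CURIE to the set of output types that have at least one track."""
    available = defaultdict(set)
    for curie, _name, output_type in tracks:
        available[curie].add(output_type)
    return dict(available)


def biosamples_with(tracks, output_type):
    """Return sorted (curie, name) pairs for biosamples that have the given output type."""
    return sorted({(curie, name) for curie, name, ot in tracks if ot == output_type})


available = output_types_by_curie(TRACKS)
print("ATAC" in available["CL:1000458"])  # False: no ATAC track for this biosample here
print(biosamples_with(TRACKS, "ATAC"))    # [('CL:0000057', 'fibroblast')]
```

The second helper answers the practical follow-up: once you know an assay is missing for your closest proxy, it lists the other biosamples that do carry that output type, which you can then judge for biological similarity.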