Hi AlphaGenome team/community,
We are working on a research project involving genome variant analysis, and we are exploring how to integrate AlphaGenome as an additional annotation layer for variant prioritization in a research-only context.
For practical downstream interpretation, we would ideally like to reduce the AlphaGenome output to a small number of columns per variant, rather than storing thousands of rows across output types, tracks, tissues/cell types and scorers.
We have seen that for splicing there is a recommended merged score combining SPLICE_SITES, SPLICE_SITE_USAGE and SPLICE_JUNCTIONS, using the maximum effect across tracks/genes and combining them into a single alphagenome_splicing score. Would a similar strategy be meaningful for other modalities, such as expression or regulatory outputs?
More specifically, we are considering something like:
- alphagenome_priority_score: maximum or combined score across selected relevant modalities
- top_modality: splicing / expression / chromatin accessibility / TF / histone, etc.
- top_output_type
- top_track or tissue/cell type where the maximum effect was observed
- modality-specific scores, e.g. splicing_score, expression_score, regulatory_score
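
To make the idea concrete, here is a minimal pandas sketch of the collapse we have in mind. It is only an illustration under our own assumptions, not a recommended method: we assume a long-format score table with columns variant_id, output_type, track_name and quantile_score; the grouping of output types into modalities is our own hypothetical choice; and ranking by the absolute quantile score presumes exactly the cross-track comparability we are asking about.

```python
import pandas as pd

# Hypothetical grouping of output types into coarse modalities (our own choice,
# not something defined by AlphaGenome).
MODALITY_BY_OUTPUT_TYPE = {
    "SPLICE_SITES": "splicing",
    "SPLICE_SITE_USAGE": "splicing",
    "SPLICE_JUNCTIONS": "splicing",
    "RNA_SEQ": "expression",
    "CAGE": "expression",
    "ATAC": "regulatory",
    "DNASE": "regulatory",
    "CHIP_TF": "regulatory",
    "CHIP_HISTONE": "regulatory",
}


def collapse_to_one_row(scores: pd.DataFrame) -> pd.DataFrame:
    """Collapse a long-format score table to one row per variant.

    Assumes columns: variant_id, output_type, track_name, quantile_score.
    """
    df = scores.copy()
    df["modality"] = df["output_type"].map(MODALITY_BY_OUTPUT_TYPE)
    # Drop rows with an unmapped output type or a missing score.
    df = df.dropna(subset=["modality", "quantile_score"]).reset_index(drop=True)
    # Assumption: effect magnitude matters, not direction.
    df["abs_score"] = df["quantile_score"].abs()

    # One column per modality holding the maximum absolute score.
    per_modality = df.pivot_table(
        index="variant_id", columns="modality", values="abs_score", aggfunc="max"
    ).add_suffix("_score")
    per_modality.columns.name = None

    # The single row that carries the overall maximum for each variant.
    top = df.loc[df.groupby("variant_id")["abs_score"].idxmax()]
    top = top.set_index("variant_id")[
        ["abs_score", "modality", "output_type", "track_name"]
    ]
    top.columns = [
        "alphagenome_priority_score",
        "top_modality",
        "top_output_type",
        "top_track",
    ]
    return top.join(per_modality).reset_index()


# Toy data, invented purely for illustration.
example = pd.DataFrame(
    {
        "variant_id": ["v1", "v1", "v1", "v2", "v2"],
        "output_type": ["SPLICE_SITES", "RNA_SEQ", "DNASE", "RNA_SEQ", "ATAC"],
        "track_name": ["t_splice", "t_rna", "t_dnase", "t_rna", "t_atac"],
        "quantile_score": [0.2, -0.9, 0.5, 0.1, 0.3],
    }
)
table = collapse_to_one_row(example)
print(table)
```

A variant with no scored track for a given modality ends up with a missing value in that modality's column, which seems preferable to filling in zero.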
Our main question is whether quantile_score values are comparable across output types, scorers and tracks, so that taking the maximum quantile score per variant would be a reasonable prioritization strategy. If not, would you recommend summarizing scores separately by modality instead?
We are also trying to avoid generating very large outputs for many tissues or cell types that are not relevant to our disease context. Would the best approach be to restrict ontology_terms / tracks before scoring, or to score broadly and then aggregate/filter afterwards?
In short, what would be the recommended way to obtain a compact, clinically usable AlphaGenome annotation table with one row per variant, while preserving enough information to know which modality and track drove the prioritization?
Any guidance or examples of best practices for pipeline integration would be very helpful.
Thank you!