Best way to aggregate AlphaGenome scores across tracks and modalities for variant prioritization?

Hi AlphaGenome team/community,

We are working on a research project involving genome variant analysis, and we are exploring how to integrate AlphaGenome as an additional annotation layer for variant prioritization in a research-only context.

For practical downstream interpretation, we would ideally like to reduce the AlphaGenome output to a small number of columns per variant, rather than storing thousands of rows across output types, tracks, tissues/cell types and scorers.

We have seen that for splicing there is a recommended merged score combining SPLICE_SITES, SPLICE_SITE_USAGE and SPLICE_JUNCTIONS, using the maximum effect across tracks/genes and combining them into a single alphagenome_splicing score. Would a similar strategy be meaningful for other modalities, such as expression or regulatory outputs?

More specifically, we are considering something like:

  • alphagenome_priority_score: maximum or combined score across selected relevant modalities

  • top_modality: splicing / expression / chromatin accessibility / TF / histone, etc.

  • top_output_type

  • top_track or tissue/cell type where the maximum effect was observed

  • modality-specific scores, e.g. splicing_score, expression_score, regulatory_score
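To make the idea concrete, here is a rough sketch of the aggregation we have in mind. It assumes the scores have already been exported to a long format with one dict per variant/track pair. The field names `variant_id`, `track_name` and the `MODALITY_OF` mapping are our own placeholders, not AlphaGenome API names, and the output types other than the three splicing ones are only illustrative. Taking the maximum absolute `quantile_score` is exactly the step we are unsure about, so this shows the intended table shape rather than a validated method.

```python
from collections import defaultdict

# Hypothetical mapping from output type to a coarse modality label.
# Only the three splicing types come from the post; the rest are illustrative.
MODALITY_OF = {
    "SPLICE_SITES": "splicing",
    "SPLICE_SITE_USAGE": "splicing",
    "SPLICE_JUNCTIONS": "splicing",
    "RNA_SEQ": "expression",
    "DNASE": "regulatory",
}


def summarize(rows):
    """Collapse long-format score rows into one summary dict per variant."""
    by_variant = defaultdict(list)
    for row in rows:
        # Ignore output types we have not assigned to a modality.
        if row["output_type"] in MODALITY_OF:
            by_variant[row["variant_id"]].append(row)

    summary = {}
    for variant_id, variant_rows in by_variant.items():
        # Assumption: scores may be signed, so rank by absolute value.
        top = max(variant_rows, key=lambda r: abs(r["quantile_score"]))
        record = {
            "alphagenome_priority_score": abs(top["quantile_score"]),
            "top_modality": MODALITY_OF[top["output_type"]],
            "top_output_type": top["output_type"],
            "top_track": top["track_name"],
        }
        # Per-modality maximum, e.g. splicing_score, expression_score.
        for r in variant_rows:
            col = MODALITY_OF[r["output_type"]] + "_score"
            record[col] = max(record.get(col, 0.0), abs(r["quantile_score"]))
        summary[variant_id] = record
    return summary


rows = [
    {"variant_id": "v1", "output_type": "SPLICE_SITES",
     "track_name": "t1", "quantile_score": 0.4},
    {"variant_id": "v1", "output_type": "RNA_SEQ",
     "track_name": "t2", "quantile_score": -0.9},
    {"variant_id": "v2", "output_type": "DNASE",
     "track_name": "t4", "quantile_score": 0.2},
]
table = summarize(rows)
```

With this layout each variant keeps its modality-specific columns, so the combined priority score can be dropped later if the quantiles turn out not to be comparable across modalities.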

Our main question is whether quantile_score values are comparable across output types, scorers and tracks, so that taking the maximum quantile score per variant would be a reasonable prioritization strategy. If not, would you recommend summarizing scores separately by modality instead?

We are also trying to avoid generating very large outputs for many tissues or cell types that are not relevant to our disease context. Would the best approach be to restrict ontology_terms / tracks before scoring, or to score broadly and then aggregate/filter afterwards?
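For reference, the "score broadly, then filter" variant would look roughly like the sketch below on our side. It assumes each exported score row carries an ontology identifier in a field we call `ontology_curie` (our placeholder name), and the two UBERON terms are only illustrative. The alternative would be to pass the restricted term list at scoring time so that these rows are never produced.

```python
def filter_by_ontology(rows, keep_terms):
    """Keep only score rows whose ontology term is in the disease-relevant set."""
    keep = set(keep_terms)
    # Rows without an ontology field are dropped as well.
    return [row for row in rows if row.get("ontology_curie") in keep]


# Illustrative rows; field names and term IDs are assumptions.
scored = [
    {"variant_id": "v1", "ontology_curie": "UBERON:0002107", "quantile_score": 0.7},
    {"variant_id": "v1", "ontology_curie": "UBERON:0000955", "quantile_score": 0.3},
    {"variant_id": "v2", "quantile_score": 0.5},
]
relevant = filter_by_ontology(scored, ["UBERON:0002107"])
```

Filtering afterwards keeps the full output available for re-analysis, at the cost of the storage and runtime we are trying to avoid.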

In short, what would be the recommended way to obtain a compact, clinically usable AlphaGenome annotation table with one row per variant, while preserving enough information to know which modality and track drove the prioritization?

Any guidance or examples of best practices for pipeline integration would be very helpful.

Thank you!

Hello, I’m not part of the AlphaGenome support team; I’m also an AlphaGenome user. However, I noticed that the Terms of Service state: ‘You must not use AlphaGenome API or outputs for clinical purposes or rely on them for medical or other professional advice.’ (https://deepmind.google.com/science/alphagenome/terms) This restriction is deeply frustrating to me too.