Hello! I am using AlphaGenome to prioritize noncoding variants with potential regulatory effects. As the docs note, a single genome contains many variants and most will have little impact.
The Variant scoring section of the preprint states that the goal is a single, informative scalar value per variant. In practice, however, I’m getting thousands of quantile scores per variant (often ~5k–40k) spanning output types, ontology terms/tissues, and scorer types.
The scoring documentation briefly describes aggregating scores, but I don’t see concrete guidance on best practices for reducing them to this single score. If we take the mean of all scores, the output types that aren’t expected to show an effect will surely drag down the overall score. Should we instead take the maximum score for each variant? That seems likely to overestimate some variants’ effects, but maybe not…
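To make the concern concrete, here is a toy example with made-up numbers (not real AlphaGenome output). I compare score magnitudes because I’m assuming some quantile scores can be signed:

```python
from statistics import mean

# Toy quantile scores for two hypothetical variants (made-up numbers, not real
# AlphaGenome output). Each list stands in for the thousands of scores one
# variant gets across output types, ontology terms, and scorers.
variant_a = [0.02] * 995 + [0.99, 0.98, 0.97, 0.95, 0.90]  # strong signal in several relevant tracks
variant_b = [0.02] * 999 + [0.99]  # a single high score, everything else near zero


def summarize(scores):
    """Return (mean, max) of the absolute quantile scores."""
    magnitudes = [abs(s) for s in scores]
    return mean(magnitudes), max(magnitudes)


for name, scores in [("A", variant_a), ("B", variant_b)]:
    avg, top = summarize(scores)
    print(f"variant {name}: mean={avg:.3f}, max={top:.2f}")
# variant A: mean=0.025, max=0.99
# variant B: mean=0.021, max=0.99
```

Both variants look negligible by the mean and identical by the max, even though A is supported by several tracks and B by only one.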
My questions:
- Is there a built-in aggregator that implements the single per-variant score described in the preprint? If so, what is the recommended configuration (i.e., which outputs and ontologies to include) depending on variant type/location?
- If not, what best practices do you recommend to avoid aggregating the scores inappropriately (e.g., taking the mean across all scores)? For example, is there a way to compute an adjusted maximum quantile, or a weighted aggregation restricted to the relevant outputs and ontologies?
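For reference, this is roughly the kind of filtered/weighted aggregation I have in mind. It is a minimal sketch under my own assumptions: the scores have been exported to a table with one row per track, the field names (`output_type`, `ontology`, `quantile_score`) and values are hypothetical, and the top-k mean and weighted max are ad hoc choices, not anything taken from the preprint or the AlphaGenome API:

```python
from statistics import mean

# Hypothetical rows, one per (output type, ontology term) track. Field names and
# scores are illustrative assumptions, not the AlphaGenome output schema.
rows = [
    {"output_type": "RNA_SEQ", "ontology": "UBERON:0002107", "quantile_score": 0.97},  # liver
    {"output_type": "RNA_SEQ", "ontology": "UBERON:0000955", "quantile_score": 0.10},  # brain
    {"output_type": "DNASE", "ontology": "UBERON:0002107", "quantile_score": 0.93},
    {"output_type": "DNASE", "ontology": "UBERON:0000955", "quantile_score": -0.05},
]


def select(rows, output_types, ontologies=None):
    """Keep rows for the chosen output types (and, optionally, ontology terms)."""
    return [
        r for r in rows
        if r["output_type"] in output_types
        and (ontologies is None or r["ontology"] in ontologies)
    ]


def top_k_mean(rows, k):
    """Mean of the k largest |quantile scores| (k=1 is the max)."""
    values = sorted((abs(r["quantile_score"]) for r in rows), reverse=True)
    top = values[:k]
    return mean(top) if top else 0.0


def weighted_max(rows, weights):
    """Max of |score| times a per-output-type weight; unlisted types get weight 0."""
    return max(
        (weights.get(r["output_type"], 0.0) * abs(r["quantile_score"]) for r in rows),
        default=0.0,
    )


# Example: a variant near a liver gene, restricted to liver tracks.
liver = select(rows, {"RNA_SEQ", "DNASE"}, {"UBERON:0002107"})
print(round(top_k_mean(liver, k=2), 2))  # 0.95
print(weighted_max(rows, {"RNA_SEQ": 1.0, "DNASE": 0.5}))  # 0.97
```

The top-k mean sits between the two extremes: with `k=1` it reduces to the max, and with `k` equal to the number of rows it becomes the mean.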
Thanks!