How to use a different GENCODE version (e.g., v49) with score_variant()?

Hello all,

I am trying to calculate variant scores related to RNA-seq and splicing (splice sites, splice site usage, splice junctions).

It appears that AlphaGenome’s score_variant() function has GENCODE v46 built in by default. However, I would like to use the latest version, GENCODE v49.

I have a few questions regarding this:

  1. Is there a way to change the GENCODE version in the score_variant() function?

  2. If not, should I generate a .gtf.gz.feather file from the v49 GTF file using the script below and then calculate variant scores directly using the predict_variant() function?
    alphagenome/scripts/process_gtf.py at main · google-deepmind/alphagenome · GitHub

If I need to use predict_variant(), is there a formula or example code for calculating each score?

In particular, according to the documentation, the splice_junctions score is calculated as max(|log(ALT) - log(REF)|) from “top-k splice sites” within the gene body:

  • What is the value of k?
  • What criteria are used to select the “top-k” splice sites?

Reference: How variant scoring works — AlphaGenome

Thank you in advance for your help.

Hello,

Apologies for not replying sooner. You should now be able to run the model locally with an updated GTF, using our code and weights.

This should also help answer your questions about junction scoring (the scorer uses k = 256 for the top-k selection).
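
For anyone reading along, here is a minimal sketch of the formula quoted in the question, max(|log(ALT) - log(REF)|) over the top-k entries, with k = 256. It is an illustration only, not the scorer's actual code: the function name `junction_score`, the `eps` pseudocount, and ranking by the larger of the REF/ALT values are all assumptions, since the selection criterion is not spelled out in this thread. Please check the released code for the real behaviour.

```python
import math


def junction_score(ref, alt, k=256, eps=1e-6):
    """Return max |log(ALT) - log(REF)| over the top-k entries.

    ref, alt: equal-length sequences of non-negative predicted values
    for the junctions of one gene (REF and ALT alleles).
    ASSUMPTION: "top-k" is taken by max(ref, alt) per entry; the real
    scorer may rank differently.
    ASSUMPTION: eps is a hypothetical pseudocount to avoid log(0).
    """
    if len(ref) != len(alt):
        raise ValueError("ref and alt must have the same length")
    if len(ref) == 0:
        return 0.0

    # Rank entries by their largest predicted value, highest first.
    ranked = sorted(
        range(len(ref)),
        key=lambda i: max(ref[i], alt[i]),
        reverse=True,
    )
    top = ranked[:k]

    # Largest absolute log fold change among the selected entries.
    return max(
        abs(math.log(alt[i] + eps) - math.log(ref[i] + eps)) for i in top
    )


# Toy values, not model output.
example = junction_score([1.0, 2.0], [1.0, 4.0], eps=0.0)
```

With the toy values above, the second entry doubles between REF and ALT, so the score is log(2).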

Thanks!

Tom