Hello all,
I am trying to calculate variant scores related to RNA-seq and splicing (splice sites, splice site usage, splice junctions).
It appears that AlphaGenome’s score_variant() function has GENCODE v46 built in by default. However, I would like to use the latest version, GENCODE v49.
I have a few questions regarding this:
- Is there a way to change the GENCODE version in the score_variant() function?
- If not, should I generate a .gtf.gz.feather file from the v49 GTF file using the script below and then calculate variant scores directly using the predict_variant() function?
Script: alphagenome/scripts/process_gtf.py (main branch of google-deepmind/alphagenome on GitHub)
If I need to use predict_variant(), is there a formula or example code for calculating each score?
In particular, according to the documentation, the splice_junctions score is calculated as max(|log(ALT) - log(REF)|) from “top-k splice sites” within the gene body:
- What is the value of k?
- What criteria are used to select the “top-k” splice sites?
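
To make the question concrete, here is how I currently read that formula. This is only a minimal sketch in plain Python: the function name, the pseudocount eps, and the toy junction values are my own assumptions, not AlphaGenome's API or its actual implementation, and it does not attempt the top-k selection I am asking about.

```python
import math


def max_abs_log_fold_change(ref, alt, eps=1e-7):
    """Return max(|log(ALT) - log(REF)|) over paired predictions.

    `eps` is a hypothetical pseudocount added to avoid log(0); the real
    scorer may handle zeros differently.
    """
    return max(
        abs(math.log(a + eps) - math.log(r + eps))
        for r, a in zip(ref, alt)
    )


# Made-up predicted junction values for REF and ALT (illustration only).
ref_counts = [10.0, 5.0, 1.0]
alt_counts = [10.0, 20.0, 1.0]

score = max_abs_log_fold_change(ref_counts, alt_counts)
print(score)
```

In this toy case the score is driven entirely by the second junction, the only one whose value differs between REF and ALT. What I cannot reproduce is which junctions go into the comparison in the first place.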
Reference: How variant scoring works — AlphaGenome
Thank you in advance for your help.