Hello all,
When calculating splicing-related scores (splice junctions, splice sites, splice site usage) using the score_variant() function, it appears that all results are returned as absolute values.
However, I would like to know whether a variant increases (+) or decreases (-) splicing.
I have a few questions regarding this:
- Is there a way to obtain raw scores with signs (i.e., non-absolute values)?
- If this is not possible with score_variant(), do I need to calculate the scores directly using the predict_variant() function?
If manual calculation is required, the documentation does not seem to specify the k value or the selection criteria for “top-k splice sites” used in splice_junctions score calculation. Could you clarify this?
Reference: How variant scoring works — AlphaGenome
Thank you in advance for your help.
To also respond to your other post: I think your intuition about the scoring is on the right track. It is not possible to get raw (signed) scores using score_variant, because a variant scorer is always applied to the output, as you referenced.
I think you can get the raw score with predict_variant or predict_sequence, and then process it however you like. If you find this cumbersome, I think the best way to evaluate whether a variant increases or decreases a splicing event is to look at changes in RNA-seq signals, since splice junctions, sites, and site usage are fundamentally derived from RNA-seq. In that case, use GeneMaskLFCScorer or GeneMaskActiveScorer to get the fold-change of the RNA-seq signal only in exons.
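As a rough illustration of "process it however you like", here is a minimal sketch of a signed effect computed from REF and ALT predictions. It is not the library's scorer: the function name signed_max_effect and the toy numbers are hypothetical, and it assumes you have already pulled the per-position values for a single track out of the predict_variant output into two equal-length sequences.

```python
def signed_max_effect(ref, alt):
    """Return (index, ALT - REF) at the position with the largest absolute change.

    ref, alt: equal-length sequences of per-position predictions for one track.
    Hypothetical helper, not part of AlphaGenome.
    """
    if len(ref) != len(alt):
        raise ValueError("ref and alt must have the same length")
    diffs = [a - r for r, a in zip(ref, alt)]
    # Rank positions by |ALT - REF|, but report the signed difference so that
    # a positive value means an increase and a negative value a decrease.
    idx = max(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return idx, diffs[idx]


# Toy per-position values (made up for illustration).
ref_track = [0.02, 0.90, 0.05, 0.10]
alt_track = [0.03, 0.35, 0.06, 0.40]
position, effect = signed_max_effect(ref_track, alt_track)
print(position, round(effect, 2))  # the largest change is a decrease at index 1
```

A negative result would suggest the variant weakens the signal at that position, a positive one that it strengthens it.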
As you pointed out, the variant scorer intrinsically masks exons based on GENCODE V46. You might use GENCODE V49 when visualizing the MANE transcript with the predicted track, as shown here (after downloading the GTF from GENCODE - Human Release 49 and gunzipping it):
import pyranges
gtf = pyranges.read_gtf("gencode.v49.annotation.gtf", as_df=True, duplicate_attr=True)
I believe you’ll need to implement custom exon masking and score calculation yourself if you want the scoring to use V49. So, stick with V46 unless its exon annotation for your gene of interest really differs from V49.
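If you do go the custom route, the idea could be sketched roughly as below. This is only an illustration under my own assumptions, not the formula the built-in scorers use: exon_mask and exon_log_fold_change are hypothetical helpers, the exons are assumed to be 0-based half-open (start, end) pairs, the pseudocount eps is arbitrary, and ref/alt are per-position RNA-seq values for one track over the predicted interval.

```python
import math


def exon_mask(interval_start, length, exons):
    """Boolean mask over the predicted interval, True at exonic positions.

    exons: iterable of (start, end) genomic coordinates, 0-based half-open
    (an assumption; convert first if your coordinates are 1-based inclusive).
    """
    mask = [False] * length
    for start, end in exons:
        # Clip each exon to the interval before marking it.
        lo = max(start - interval_start, 0)
        hi = min(end - interval_start, length)
        for i in range(lo, hi):
            mask[i] = True
    return mask


def exon_log_fold_change(ref, alt, mask, eps=1e-3):
    """Signed log2 fold change (ALT vs REF) of the mean signal inside the mask."""
    ref_vals = [r for r, m in zip(ref, mask) if m]
    alt_vals = [a for a, m in zip(alt, mask) if m]
    if not ref_vals:
        raise ValueError("mask selects no positions")
    ref_mean = sum(ref_vals) / len(ref_vals)
    alt_mean = sum(alt_vals) / len(alt_vals)
    return math.log2((alt_mean + eps) / (ref_mean + eps))


# Toy example: two exons inside a 10 bp interval starting at position 100.
mask = exon_mask(100, 10, [(102, 105), (108, 120)])
ref = [1.0] * 10
alt = [1.0, 1.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 2.0, 2.0]
print(exon_log_fold_change(ref, alt, mask, eps=0.0))  # positive: exonic signal goes up
```

The exon coordinates would come from whichever GENCODE release you choose, for example by filtering the GTF table for exon rows of the transcript of interest.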
Thanks Donghyeon_Baek for providing a great answer.
We use top-128 splice sites when scoring variants with splice_junctions.