Splice junction across-tissue accuracy

d_bogdanoff · July 26, 2025, 9:47am

Hello,

Congratulations on the impressive AlphaGenome preprint.

I am interested in evaluating how well AlphaGenome predicts tissue-specific splice junction usage, but I’m finding this information a bit difficult to parse from the paper. Is the third plot of Extended Data Figure 2e the proper metric to focus on? (The figure caption describes it as a measure of the tissue specificity of splice junction predictions.) It looks similar to the tissue-specific splicing histogram in the MTSplice paper (Figure 5B in Cheng et al. 2021), whose author contributed to AlphaGenome, so I guessed it may be a similar measure. If so, would you be able to provide the thresholds used to select these splice junctions? For example, MTSplice designates exons with a deltaPSI of 0.2 from the tissue mean as tissue-specific.

Any help would be much appreciated,
Thanks,
Derek
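
For readers unfamiliar with the kind of filter being asked about, here is a minimal sketch of a deltaPSI threshold. It assumes a PSI matrix with one row per exon (or junction) and one column per tissue. The function name `tissue_specific_mask`, the use of the absolute deviation, and the "at least one tissue" rule are illustrative assumptions, not the exact MTSplice procedure.

```python
import numpy as np

def tissue_specific_mask(psi, threshold=0.2):
    """Flag rows whose PSI deviates from the row's across-tissue mean.

    psi: array of shape (n_rows, n_tissues); NaN marks missing measurements.
    Returns a boolean array of shape (n_rows,).
    """
    psi = np.asarray(psi, dtype=float)
    # deltaPSI: deviation of each tissue from the mean over tissues.
    delta = psi - np.nanmean(psi, axis=1, keepdims=True)
    # Assumption: a row counts as tissue-specific if any tissue deviates
    # by at least `threshold` in absolute value.
    return np.nanmax(np.abs(delta), axis=1) >= threshold

# Toy PSI values (made up for illustration).
psi = np.array([
    [0.50, 0.50, 0.50],   # constant across tissues
    [0.90, 0.50, 0.40],   # one tissue far above the mean
    [0.10, 0.15, 0.20],   # small variation only
])
mask = tissue_specific_mask(psi)
```

In this toy input only the second row passes the 0.2 threshold.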

Jun_Cheng · July 28, 2025, 5:12pm

Hi Derek, the third plot of Extended Data Fig. 2e is a histogram of the across-tissue correlation between predicted and measured PSI3. The histogram for PSI5 looks similar, and we could have included it. Yes, this is the metric we use to measure the tissue specificity of PSI5/3 predictions. Unlike in MTSplice, we did not filter junctions for this plot. Another difference is that MTSplice predicts PSI for exons, while AlphaGenome (AG) predicts junction counts, from which you can derive PSI5/3.
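
As a rough illustration of deriving PSI5/3 from junction counts, the sketch below uses the commonly used definitions: PSI5 is a junction's count divided by the total count of all junctions sharing its donor site, and PSI3 is the same ratio over junctions sharing its acceptor site. That these definitions match the ones used in the paper is an assumption, and the function name and input layout are hypothetical.

```python
import numpy as np

def psi_from_junction_counts(donors, acceptors, counts):
    """Compute PSI5 and PSI3 per junction and tissue.

    donors, acceptors: length n_junctions arrays of site coordinates.
    counts: array of shape (n_junctions, n_tissues) with junction read counts.
    Returns (psi5, psi3), each of shape (n_junctions, n_tissues);
    entries are NaN where the group total is zero.
    """
    donors = np.asarray(donors)
    acceptors = np.asarray(acceptors)
    counts = np.asarray(counts, dtype=float)

    def usage_within_groups(site_ids):
        psi = np.full_like(counts, np.nan)
        for site in np.unique(site_ids):
            rows = site_ids == site
            # Total count over all junctions sharing this site, per tissue.
            total = counts[rows].sum(axis=0, keepdims=True)
            psi[rows] = np.divide(
                counts[rows], total,
                out=np.full_like(counts[rows], np.nan),
                where=total > 0,
            )
        return psi

    psi5 = usage_within_groups(donors)      # junctions grouped by shared donor
    psi3 = usage_within_groups(acceptors)   # junctions grouped by shared acceptor
    return psi5, psi3

# Toy example: three junctions, two tissues (counts are made up).
donors = [100, 100, 200]
acceptors = [300, 400, 400]
counts = [[30, 10], [10, 30], [20, 20]]
psi5, psi3 = psi_from_junction_counts(donors, acceptors, counts)
```

Here the first two junctions share a donor, so their PSI5 values sum to 1 in each tissue, while the last two share an acceptor, so their PSI3 values sum to 1.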

d_bogdanoff · July 28, 2025, 7:11pm

Hi Jun,

Thanks for the response.
Looking at the histogram in question, if I understand it correctly, there must be some filter or threshold on cross-tissue delta PSI that deems a junction tissue-specific, which would explain the generally lower predicted-vs-measured correlations in the histogram. If all junctions were included, I would expect these correlations to be higher, because the vast majority of junctions aren’t tissue-specific and AlphaGenome has high mean accuracy when predicting all splice junctions (Pearson r of ~0.7-0.75 in human tissues, based on Figure 2).

Though, I could be off on my interpretation. Any help clearing this up would be much appreciated, and thanks for contributing to this very impressive work.

Derek

Jun_Cheng · August 4, 2025, 2:58pm

Hi Derek, the correlation in the third plot is calculated across tissues, to measure whether the model ranks tissue-specific expression correctly. As you mentioned, most junctions are not tissue-specifically expressed, so it is hard to predict tissue-specific splicing for them (the per-tissue splicing level is essentially mean + noise). Only for junctions that are spliced tissue-specifically due to biology can the model make meaningful predictions (since there is signal beyond random noise across tissues). Not filtering on delta PSI makes the problem harder, and therefore the correlation lower. Does that make sense?
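
The "mean + noise" argument can be checked with a small simulation. The sketch below is a toy model under stated assumptions: all numbers (noise level, effect size, fraction of tissue-specific junctions) are made up, values are not clipped to [0, 1], Pearson correlation is used as the across-tissue metric, and the "model" is assumed to know each junction's mean and any true tissue effect. It is not based on AlphaGenome outputs.

```python
import numpy as np

def rowwise_pearson(a, b):
    """Pearson correlation between matching rows of a and b."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    num = (a * b).sum(axis=1)
    den = np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return np.divide(num, den, out=np.full_like(num, np.nan), where=den > 0)

rng = np.random.default_rng(0)
n_junctions, n_tissues = 2000, 30
noise_sd, effect_sd = 0.03, 0.10          # hypothetical values

# Each junction has its own mean PSI; about 10% also have a real tissue effect.
mean_psi = rng.uniform(0.2, 0.8, size=(n_junctions, 1))
is_specific = rng.random(n_junctions) < 0.1
tissue_effect = rng.normal(0, effect_sd, (n_junctions, n_tissues)) * is_specific[:, None]

# Measurement noise and prediction error are independent of each other.
measured = mean_psi + tissue_effect + rng.normal(0, noise_sd, (n_junctions, n_tissues))
predicted = mean_psi + tissue_effect + rng.normal(0, noise_sd, (n_junctions, n_tissues))

# Across tissues: one correlation per junction (the histogram-style metric).
r_across_tissues = rowwise_pearson(predicted, measured)
# Across junctions: one correlation per tissue (the overall-accuracy-style metric).
r_across_junctions = rowwise_pearson(predicted.T, measured.T)

median_r_nonspecific = np.median(r_across_tissues[~is_specific])
median_r_specific = np.median(r_across_tissues[is_specific])
median_r_per_tissue = np.median(r_across_junctions)
```

In this toy setting the per-tissue correlation across junctions is high because junction means dominate, the across-tissue correlation is near zero for junctions without a tissue effect, and it is high only for the tissue-specific subset. An unfiltered histogram of across-tissue correlations is therefore pulled toward zero by the majority of junctions.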

d_bogdanoff · August 4, 2025, 5:19pm

Hi Jun,

Thanks for the clarification, this clears things up.

Derek
