Splice junction across-tissue accuracy

Hello,

Congratulations on the impressive AlphaGenome preprint.

I was interested in comparing how well AlphaGenome predicts tissue-specific splice junction usage, but I’m finding this information a bit difficult to parse from the paper. Is the third plot of Extended Data Figure 2e the right metric to focus on? (The figure caption describes it as a measure of the tissue specificity of splice junction predictions.) This looks similar to the tissue-specific splicing metric histogram in the MTSplice paper (Figure 5B in Cheng et al. 2021), one of whose authors contributed to AlphaGenome, so I was guessing it may be a similar measure. If so, would you be able to provide the thresholds used to select these splice junctions? For example, in MTSplice, exons with a deltaPSI of 0.2 from the tissue mean are designated as tissue-specific.

Any help would be much appreciated,
Thanks,
Derek

Hi Derek, the third plot of Extended Data Fig. 2e is a histogram of correlations across tissues (predicted versus measured PSI3). The histogram for PSI5 looks similar, and we could have included it. Yes, this is the metric we use to measure the tissue specificity of PSI5/3 predictions. Unlike MTSplice, we did not filter junctions for this plot. Another difference is that MTSplice predicts PSI for exons, while AlphaGenome (AG) predicts junction counts, from which you can derive PSI5/3.
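
For readers following along, here is a minimal sketch of how PSI5/PSI3 could be derived from junction counts. It assumes the commonly used definitions (PSI5 normalizes a junction's count by all junctions sharing its donor site, PSI3 by all junctions sharing its acceptor site); the paper is the reference for the exact AlphaGenome definition. The function name `psi_from_junction_counts` and the example counts are hypothetical, not part of any AlphaGenome API.

```python
from collections import defaultdict


def psi_from_junction_counts(counts):
    """Derive PSI5 and PSI3 for one tissue.

    counts: dict mapping (donor, acceptor) -> junction read count.
    Assumed definitions (not taken from the AlphaGenome code):
      PSI5 = count / total count of junctions sharing the donor
      PSI3 = count / total count of junctions sharing the acceptor
    """
    donor_total = defaultdict(float)
    acceptor_total = defaultdict(float)
    for (donor, acceptor), n in counts.items():
        donor_total[donor] += n
        acceptor_total[acceptor] += n

    psi5, psi3 = {}, {}
    for (donor, acceptor), n in counts.items():
        # Leave PSI undefined (NaN) when a site has no supporting reads.
        d_tot = donor_total[donor]
        a_tot = acceptor_total[acceptor]
        psi5[(donor, acceptor)] = n / d_tot if d_tot > 0 else float("nan")
        psi3[(donor, acceptor)] = n / a_tot if a_tot > 0 else float("nan")
    return psi5, psi3


# Made-up counts: donor 100 splices to acceptors 200 and 300,
# and acceptor 300 is also used by donor 150.
example_counts = {(100, 200): 30, (100, 300): 10, (150, 300): 30}
psi5, psi3 = psi_from_junction_counts(example_counts)
```

Repeating this per tissue gives one PSI5 and one PSI3 vector per junction, which is what an across-tissue correlation would then be computed on.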

Hi Jun,

Thanks for the response.
Looking at the histogram in question, if I understand it correctly, there must be some filter/threshold on cross-tissue delta PSI in order to deem a junction tissue-specific, which would generally lead to the lower predicted-vs.-measured correlations in the histogram. If all junctions were included, I would expect these correlations to be higher, since the vast majority of junctions aren’t tissue-specific and AlphaGenome has high mean accuracy when predicting all splice junctions (Pearson r of ~0.7-0.75 in human tissues, based on Figure 2).

Though, I could be off on my interpretation. Any help clearing this up would be much appreciated, and thanks for contributing to this very impressive work.

Derek

Hi Derek, the correlation in the third plot is calculated across tissues, to measure whether the model ranks tissue-specific expression correctly. As you mentioned, most junctions are not tissue-specifically expressed, so it is hard to predict tissue-specific splicing for them (the per-tissue splicing level is essentially mean + noise). Only for junctions that are spliced tissue-specifically due to biology can the model make meaningful predictions (since there is signal beyond random noise across tissues). Not filtering on delta PSI makes the problem harder, and therefore the correlations lower. Does that make sense?
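
The point above can be illustrated with a toy simulation on purely synthetic data (not AlphaGenome outputs). The sketch below assumes a hypothetical predictor that recovers each junction's true per-tissue PSI up to independent noise, and compares the across-tissue correlation for junctions that are "mean + noise" against junctions with real tissue effects. All names and parameter values are made up for illustration, and the toy PSI values are not clipped to [0, 1].

```python
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_junctions, n_tissues, tissue_sd, noise_sd, rng):
    """Return (measured, predicted) PSI matrices, one row per junction."""
    measured, predicted = [], []
    for _ in range(n_junctions):
        mean_psi = rng.uniform(0.3, 0.7)
        # tissue_sd = 0 means no tissue-specific signal for this junction.
        true_psi = [mean_psi + rng.gauss(0.0, tissue_sd) for _ in range(n_tissues)]
        # Measurement noise and prediction error are drawn independently.
        measured.append([p + rng.gauss(0.0, noise_sd) for p in true_psi])
        predicted.append([p + rng.gauss(0.0, noise_sd) for p in true_psi])
    return measured, predicted


def across_tissue_r(measured, predicted):
    """One correlation per junction, computed across tissues."""
    return [pearson(m, p) for m, p in zip(measured, predicted)]


def pooled_r(measured, predicted):
    """Single correlation over all (junction, tissue) pairs."""
    flat_m = [v for row in measured for v in row]
    flat_p = [v for row in predicted for v in row]
    return pearson(flat_m, flat_p)


rng = random.Random(0)
flat_meas, flat_pred = simulate(500, 30, tissue_sd=0.0, noise_sd=0.03, rng=rng)
spec_meas, spec_pred = simulate(500, 30, tissue_sd=0.1, noise_sd=0.03, rng=rng)

median_flat = statistics.median(across_tissue_r(flat_meas, flat_pred))
median_spec = statistics.median(across_tissue_r(spec_meas, spec_pred))
pooled_flat = pooled_r(flat_meas, flat_pred)

print(f"median across-tissue r, no tissue signal:   {median_flat:.2f}")
print(f"median across-tissue r, with tissue signal: {median_spec:.2f}")
print(f"pooled r over all junctions, no tissue signal: {pooled_flat:.2f}")
```

In this toy setting, the junctions without tissue signal give across-tissue correlations centered near zero even though the pooled correlation over all junctions is high, because the pooled value is driven by differences in mean PSI between junctions. That is one way an unfiltered across-tissue histogram can sit well below a pooled all-junction Pearson r.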

Hi Jun,

Thanks for the clarification, this clears things up.

Derek