How to interpret junction_End starting in an intron?

Hello all,

I am wondering how I should interpret the “smaller_splice” in the image / table below.

Some context: the gene is on the negative strand, so as I understand it, junction_Start is the 3′ splice acceptor site and junction_End is the 5′ splice donor site.

The “big_splice” makes sense to me: it starts at the end of exon1 and goes to the cassette exon. The “smaller_splice”, however, seems to start in the middle of the intron. Does this mean that AlphaGenome predicts that everything before it is an exon? Or could it be a second splice donor, located after the one that should be there at the end of exon1, that is ignored?

Thank you in advance for the feedback.

| junction_Start | junction_End | name | output_type | variant_scorer | track_name | track_strand | Assay title | ontology_curie | biosample_name | biosample_type | raw_score | quantile_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 173915186 | 173917218 | big_splice | SPLICE_JUNCTIONS | SpliceJunctionScorer() | junction_UBERON:0002107 polyA plus RNA-seq | . | polyA plus RNA-seq | UBERON:0002107 | liver | tissue | 0.20898438 | 0.993351 |
| 173915186 | 173917218 | big_splice | SPLICE_JUNCTIONS | SpliceJunctionScorer() | junction_UBERON:0002107 total RNA-seq | . | total RNA-seq | UBERON:0002107 | liver | tissue | 0.1607666 | 0.9912486 |
| 173915115 | 173916406 | smaller_splice | SPLICE_JUNCTIONS | SpliceJunctionScorer() | junction_UBERON:0002107 total RNA-seq | . | total RNA-seq | UBERON:0002107 | liver | tissue | 0.12902832 | 0.9890026 |
| 173915115 | 173916406 | smaller_splice | SPLICE_JUNCTIONS | SpliceJunctionScorer() | junction_UBERON:0002107 polyA plus RNA-seq | . | polyA plus RNA-seq | UBERON:0002107 | liver | tissue | 0.11065674 | 0.9868027 |

Hi, it looks like the “smaller_splice” starts (5′ end, on the right) from an annotated exon in one of the transcripts. The end of the junction (left side) looks like a newly predicted acceptor. This does not mean the model is predicting one big exon before the smaller_splice junction; there might be another, smaller junction being predicted. Try lowering the filter_threshold argument and see whether that junction shows up in the reference and alternative predictions. Also note that you appear to be looking at the variant prediction AnnData, where only junctions relevant to the variant score are shown.
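
To make the coordinate reading concrete, here is a minimal sketch in plain Python (no AlphaGenome calls) that labels the donor and acceptor for the two junctions in the table. It assumes the gene is on the negative strand, as stated in the question (the track_strand column itself shows "."), and that junction_Start is always the lower genomic coordinate. The Junction class and its methods are hypothetical helpers for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical helper holding one row of the junction table."""
    name: str
    start: int  # junction_Start: assumed to be the lower genomic coordinate
    end: int    # junction_End: assumed to be the higher genomic coordinate

    def donor(self, strand: str) -> int:
        # 5' splice site: lower coordinate on '+', higher coordinate on '-'
        return self.start if strand == "+" else self.end

    def acceptor(self, strand: str) -> int:
        # 3' splice site: higher coordinate on '+', lower coordinate on '-'
        return self.end if strand == "+" else self.start


# Coordinates copied from the table in the question.
big = Junction("big_splice", 173915186, 173917218)
small = Junction("smaller_splice", 173915115, 173916406)

strand = "-"  # assumption: taken from the question, not from track_strand

for j in (big, small):
    print(j.name, "donor:", j.donor(strand), "acceptor:", j.acceptor(strand))

# Does the smaller junction's donor fall inside the span of the big junction?
donor_inside_big = big.start < small.donor(strand) < big.end

# Distance between the two acceptor positions, in bp.
acceptor_shift = big.acceptor(strand) - small.acceptor(strand)

print("donor inside big junction:", donor_inside_big)
print("acceptor shift (bp):", acceptor_shift)
```

Under these assumptions the two junctions differ at both ends: the donor of the smaller junction lies within the span of the big junction, and the two acceptors sit at different positions. That fits the reading above of a separate donor paired with a different acceptor, rather than one long exon.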