Interpreting prediction inconsistency across input interval lengths

Hi everyone,

I am trying to test-run and understand AlphaGenome predictions for TG (a gene selectively expressed in the thyroid gland) in different tissue contexts and with different interval sizes. I have found a discrepancy: summing predictions from 1KB intervals tiling a 1MB region gives a different value than a single prediction over the whole 1MB interval. This led me to believe that regulatory elements/promoter binding motifs must fall inside the prediction interval for the prediction to be accurate, which raises a couple of questions:

  1. Would a 1MB interval prediction always be better than any smaller interval prediction, since it includes the most potential regulatory motifs? If so, we could narrow down the window for plotting instead of predicting on a shorter interval. Which prediction interval would be more "reliable", or more accurate?

  2. How should I interpret the y-axis of the graph? Is the prediction a predicted "raw count" of RNA-seq reads, or a normalised count? If it is normalised, what is it normalised against? Are predictions made on different interval sizes comparable?

Thanks so much 🙂

Best wishes,

Ken


Hi Ken,

  1. A 1MB interval prediction is more reliable. As you say, the larger window captures more potential regulatory context, which is why summing smaller interval predictions won’t equal the prediction from a single large interval. Also, the model’s performance may improve when the prediction interval matches the 1MB sequence length it was trained on. Your best approach is to generate the prediction on the 1MB interval and then narrow the window for plotting.
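The "predict wide, plot narrow" workflow can be sketched as plain interval arithmetic: resize the gene's coordinates up to the model's ~1MB input width, then slice the predicted track back down to the region you want to plot. This is a minimal sketch with made-up coordinates and random data standing in for the model output; `resize_interval` is a hypothetical helper, not part of the AlphaGenome API.

```python
import numpy as np

def resize_interval(start: int, end: int, width: int) -> tuple[int, int]:
    """Return a `width`-bp interval centred on the input interval,
    mirroring the idea of resizing a small region up to the model's
    1MB input length before predicting."""
    centre = (start + end) // 2
    new_start = centre - width // 2
    return new_start, new_start + width

# Hypothetical coordinates for the TG locus; not real genome positions.
gene_start, gene_end = 1_500_000, 1_510_000
win_start, win_end = resize_interval(gene_start, gene_end, 2**20)  # ~1MB

# Stand-in for the model's base-resolution RNA-seq coverage track
# over the 1MB window (random numbers in place of real predictions).
track = np.random.rand(win_end - win_start)

# Narrow down for plotting: slice the 1MB prediction to the gene body
# instead of re-running the model on a shorter interval.
lo, hi = gene_start - win_start, gene_end - win_start
plot_values = track[lo:hi]
```

The key point is that the slice still reflects a prediction conditioned on the full 1MB of regulatory context, which a direct prediction on the 10KB sub-interval would lack.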

  2. The y-axis represents the predicted read coverage for RNA-seq. The training data, and therefore the model’s predictions, are normalized to a common factor of 1 million reads multiplied by a common read length of 100. The units do not depend on the interval size.
