Predict_interval throughput/server-side output averaging

howdy! I was hoping to evaluate AlphaGenome’s full output on an external benchmark, GUANinE. Unfortunately, querying all ~5k output tracks at the full 1Mb context size with predict_interval seems to be rather slow (~1 inference/minute).

I believe this is due to the (enormous) amount of data serialized & sent to the client – one vector of track values for each of the ~1 million base pairs. Is there any way to either
a) have AlphaGenome average_pool the tracks at specifiable interval widths (e.g. center, center +/- 1bp, center +/- 2bp, center +/- 4bp, …, center +/- 500kbp) or
b) send a lossy, low-rank (i.e. PCA component) approximation of the outputs?

one (or both) of these could substantially improve throughput (which, again, appears to be a networking rather than a computational bottleneck, since variant-based throughput is closer to 24 inferences/min).

I ask because even the smallest task in GUANinE, dnase_propensity, requires ~105k inferences for a few-shot evaluation (which would take about 10 weeks of client runtime :grimacing:, and multi-threading hits the Mb rate-limit quota)

as an addendum/clarifier, this could be addressed by enabling concurrent, concentric CENTER_MASK scorers that apply MEAN or SUM aggregation within the existing BaseIntervalScorer, which currently only supports gene masking
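to make the averaging concrete, here's roughly what I'm imagining, done client-side with numpy (the array shape and the helper function are made up for illustration – they're not part of the AlphaGenome API):

```python
import numpy as np

def center_mask_means(tracks: np.ndarray, half_widths: list[int]) -> np.ndarray:
    """Mean-pool each track over concentric windows around the sequence center.

    tracks: (seq_len, n_tracks) array, e.g. (1_048_576, 5_168).
    half_widths: half-window sizes in bp, e.g. [0, 1, 2, 4, ..., 500_000].
    Returns (len(half_widths), n_tracks) -- a tiny payload vs. the full tracks.
    """
    center = tracks.shape[0] // 2
    return np.stack([
        tracks[center - hw : center + hw + 1].mean(axis=0)  # center +/- hw bp
        for hw in half_widths
    ])

# toy example: 8 positions x 2 tracks
toy = np.arange(16, dtype=np.float64).reshape(8, 2)
pooled = center_mask_means(toy, [0, 1])
print(pooled.shape)  # (2, 2) -- two window widths, two tracks
```

doing this server-side would mean sending len(half_widths) vectors of tracks instead of ~a million of them.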

Hi @eyes_r, welcome to the forum!

Yes, if you’re asking for all outputs with a 1Mb sequence, the response can be quite large! We do compress predictions, and since predictions are somewhat sparse this gives a typical 10x size reduction (~5.5GiB → 760MiB). We contemplated lossy compression, but prioritized exact reproduction of our results over network bandwidth.

It’s strange that it takes O(1min) for a full prediction: on my home wifi I can get predictions in more like O(15s)… could your network connection be a bit slow?

We’ll add center-mask interval scoring to our backlog, but in the meantime you might be better off requesting only the outputs you need for a specific benchmark. E.g. for dnase_propensity I assume you only need the DNase outputs? Requesting just those would be an order of magnitude smaller than asking for everything, and filtering by ontology would reduce the size further.

Hope this helps, and we look forward to seeing how things go with AlphaGenome on external benchmarks!

@tward thanks for the welcome! And I appreciate the elaboration.

As for the question of local network – this is running on a high-RAM Colab instance, so unless the Compute Engine Region/Zone makes a difference, I don’t believe that’s the bottleneck?

As for filtering tracks/ontologies, this could certainly mitigate the bandwidth issues, and it’s usually fine for the DNase/cCRE tasks (e.g. SEI’s dnase_propensity score is about 99% the same before/after filtration)… however, this both:

a) presumes non-DNase tracks are completely orthogonal to the DNase signal (this varies by model – CTCF tracks tend to help in SEI, etc.)

b) doesn’t work for conservation tasks in GUANinE (e.g. cons30), since there isn’t a singular ‘conservation score’ in AlphaGenome (or most other models)

Either way, such a restriction would preclude AlphaGenome from being evaluated in an apples-to-apples (filtered vs. non-filtered) setting against other models.

Again, the variant-interpretation speed is quite good – while scoring is incomplete, AlphaGenome handily beat Pangolin on the new ClinVar task releasing with v1.1 of the benchmark – but it seems less of the engineering/prototyping has gone into this specific use case of interval prediction.

If it’s any easier, is there a way to just crop interval outputs before sending them? (i.e. inference on 1 Mbp but return results for ~ 200 kbp?).
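for reference, the client-side equivalent of that crop (which obviously saves no bandwidth) is just a center slice – a sketch, assuming tracks arrive as a (seq_len, n_tracks) array:

```python
import numpy as np

def crop_center(tracks: np.ndarray, out_len: int) -> np.ndarray:
    """Return the central out_len base pairs of a (seq_len, n_tracks) array."""
    start = (tracks.shape[0] - out_len) // 2
    return tracks[start : start + out_len]

full = np.zeros((1_048_576, 4), dtype=np.float32)  # 1 Mbp window, 4 tracks
cropped = crop_center(full, 200_000)               # keep the central ~200 kbp
print(cropped.shape)  # (200000, 4)
# done server-side, the payload would shrink ~5x (200_000 / 1_048_576)
```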

=======================

P.S., the float32 representations of the 5_168 tracks (excluding contact maps & splice junctions) seem to be sitting at about 9.6 GiB before metadata (am I perhaps using a different subset than your 5.5 GiB?). Matrix sizes are as follows:

(1_048_576, 167) ~= 668 MiB
(1_048_576, 546) ~= 2_184 MiB
(1_048_576, 305) ~= 1_220 MiB
(1_048_576, 667) ~= 2_668 MiB
(1_048_576, 4) ~= 16 MiB
(1_048_576, 734) ~= 2_936 MiB
(1_048_576, 12) ~= 48 MiB
(8_192, 1_116) ~= 34 MiB
(8_192, 1_617) ~= 50 MiB
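for transparency, the 9.6 GiB figure is just the shapes above at 4 bytes per float32:

```python
# sizes of the float32 matrices listed above, in MiB (4 bytes per value)
shapes = [
    (1_048_576, 167), (1_048_576, 546), (1_048_576, 305),
    (1_048_576, 667), (1_048_576, 4),   (1_048_576, 734),
    (1_048_576, 12),  (8_192, 1_116),   (8_192, 1_617),
]
mib = [rows * cols * 4 / 2**20 for rows, cols in shapes]
print([int(m) for m in mib])      # [668, 2184, 1220, 2668, 16, 2936, 48, 34, 50]
print(round(sum(mib) / 1024, 1))  # 9.6 (GiB)
```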

I have seen the occasional first-time inference run in closer to ~30 seconds, however, I think the API max bandwidth quota kicks in after a while and pushes this down to 1 inf/min.

Thanks for the clarifications! Responses inline:

I have seen the occasional first-time inference run in closer to ~30 seconds, however, I think the API max bandwidth quota kicks in after a while and pushes this down to 1 inf/min.

Ah yes that makes sense, you’re likely getting throttled after the first query. Network bandwidth definitely won’t be an issue if you’re running Colab instances on Google Cloud :smiley:

the float32 representations of the 5_168 tracks (excluding contact maps & splice junctions) seems to be sitting at about 9.6 GiB before metadata (am I perhaps using a different subset than your 5.5 GiB?) Matrix sizes are as follows:

Ah, this disparity is because we transfer predictions as bfloat16s, as this is the precision our model computes in. We then upcast to float32 on the client so that users don’t inadvertently do analyses at low precision (e.g. aggregations). So you’re right that predict_interval returns 9.6GiB, but on the wire this is usually more like 800MiB with compression.
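As a back-of-envelope check (rounded figures, not exact measurements):

```python
float32_gib = 9.6                   # what the client sees after upcasting
bf16_gib = float32_gib / 2          # bfloat16 is 2 bytes/value on the wire
wire_mib = 800                      # typical compressed transfer size
ratio = bf16_gib * 1024 / wire_mib  # compression ratio on the bf16 payload
print(bf16_gib, round(ratio, 1))    # 4.8 6.1
```

i.e. halving from the dtype, then roughly another ~6x from compression.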

If it’s any easier, is there a way to just crop interval outputs before sending them? (i.e. inference on 1 Mbp but return results for ~ 200 kbp?).

They’re about as hard as each other to implement, though cropping non-track data like splice junctions would be a little more involved. We’ll try to add center-mask interval scoring, but we’re pretty swamped at the moment.

In the interim, we’ll try to increase the bandwidth quotas to allow you to make more predictions :smiley:

Great to hear our model is performing well on ClinVar – we look forward to any further results!