Predict_interval throughput/server-side output averaging

howdy! I was hoping to evaluate AlphaGenome’s full output on an external benchmark, GUANinE. Unfortunately, querying all ~5k output tracks at the full 1Mb context size with predict_interval seems to be rather slow (~1 inference/minute).

I believe this is due to the (enormous) amount of data serialized & sent to the client – one vector of track values for each of the ~1 million base pairs. Is there any way to either
a) have AlphaGenome average_pool the tracks at specifiable interval widths (e.g. center, center +/- 1bp, center +/- 2bp, center +/- 4bp, …, center +/- 500kbp) or
b) send a lossy, low-rank (i.e. PCA component) approximation of the outputs?

one (or both) of these could substantially improve throughput (which, again, appears to be a networking rather than a computational bottleneck, since variant-based throughput is closer to 24 inferences/min).

I ask because even the smallest task in GUANinE, dnase_propensity, requires ~105k inferences for a few-shot evaluation (which would take about 10 weeks of client runtime :grimacing:, and multi-threading hits the Mb rate-limit quota)

as an addendum/clarifier, this could be addressed by enabling concurrent, concentric CENTER_MASK scorers that apply MEAN or SUM aggregation within the existing BaseIntervalScorer, which currently only supports gene masking
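to make the averaging concrete, here's roughly what I'm imagining, done client-side with numpy (the array shape and the helper function are made up for illustration – they're not part of the AlphaGenome API):

```python
import numpy as np

def center_mask_means(tracks: np.ndarray, half_widths: list[int]) -> np.ndarray:
    """Mean-pool each track over concentric windows around the sequence center.

    tracks: (seq_len, n_tracks) array, e.g. (1_048_576, 5_168).
    half_widths: half-window sizes in bp, e.g. [0, 1, 2, 4, ..., 500_000].
    Returns (len(half_widths), n_tracks) -- a tiny payload vs. the full tracks.
    """
    center = tracks.shape[0] // 2
    return np.stack([
        tracks[center - hw : center + hw + 1].mean(axis=0)  # center +/- hw bp
        for hw in half_widths
    ])

# toy example: 8 positions x 2 tracks
toy = np.arange(16, dtype=np.float64).reshape(8, 2)
pooled = center_mask_means(toy, [0, 1])
print(pooled.shape)  # (2, 2) -- two window widths, two tracks
```

doing this server-side would mean sending len(half_widths) vectors of tracks instead of ~a million of them.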

Hi @eyes_r, welcome to the forum!

Yes, if you’re asking for all outputs with a 1Mb sequence, the response can be quite large! We do compress predictions, and since predictions are somewhat sparse this gives a typical 10x size reduction (~5.5GiB → 760MiB). We contemplated lossy compression, but prioritized exact reproduction of our results over network bandwidth.

It’s strange that it takes O(1min) for a full prediction: on my home wifi I can get predictions in more like O(15s)… could your network connection be a bit slow?

We’ll add center-mask interval scoring to our backlog, but in the meantime you might be better off requesting only the outputs you need for a specific benchmark. E.g. for dnase_propensity I assume you only need the DNase outputs? Requesting just those would be an order of magnitude smaller than asking for everything, and filtering by ontology would reduce the size further.

Hope this helps, and we look forward to seeing how things go with AlphaGenome on external benchmarks!

@tward thanks for the welcome! And I appreciate the elaboration.

As for the question of local network – this is running on a high-RAM Colab instance, so unless the Compute Engine Region/Zone makes a difference, I don’t believe that’s the bottleneck?

As for filtering tracks/ontologies, this could certainly mitigate the bandwidth issues, and it’s usually fine for the DNase/cCRE tasks (e.g. SEI’s dnase_propensity score is about 99% the same before/after filtration)… however, this both:

a) presumes non-DNase tracks are completely orthogonal to the DNase signal (this varies by model – CTCF tracks tend to help in SEI, etc.)

b) doesn’t work for conservation tasks in GUANinE (e.g. cons30), since there isn’t a singular ‘conservation score’ in AlphaGenome (or most other models)

Either way, such a restriction would preclude AlphaGenome from being evaluated in an apples-to-apples (filtered vs. non-filtered) setting against other models.

Again, the variant-interpretation speed is quite good – while scoring is incomplete, AlphaGenome handily beat Pangolin on the new ClinVar task releasing with v1.1 of the benchmark – but it seems less of the engineering/prototyping has gone into this specific use case of interval prediction.

If it’s any easier, is there a way to just crop interval outputs before sending them? (i.e. inference on 1 Mbp but return results for ~ 200 kbp?).
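for reference, the client-side equivalent of that crop (which obviously saves no bandwidth) is just a center slice – a sketch, assuming tracks arrive as a (seq_len, n_tracks) array:

```python
import numpy as np

def crop_center(tracks: np.ndarray, out_len: int) -> np.ndarray:
    """Return the central out_len base pairs of a (seq_len, n_tracks) array."""
    start = (tracks.shape[0] - out_len) // 2
    return tracks[start : start + out_len]

full = np.zeros((1_048_576, 4), dtype=np.float32)  # 1 Mbp window, 4 tracks
cropped = crop_center(full, 200_000)               # keep the central ~200 kbp
print(cropped.shape)  # (200000, 4)
# done server-side, the payload would shrink ~5x (200_000 / 1_048_576)
```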

=======================

P.S., the float32 representations of the 5_168 tracks (excluding contact maps & splice junctions) seem to be sitting at about 9.6 GiB before metadata (am I perhaps using a different subset than your 5.5 GiB?). Matrix sizes are as follows:

(1_048_576, 167) ~= 668 MiB
(1_048_576, 546) ~= 2_184 MiB
(1_048_576, 305) ~= 1_220 MiB
(1_048_576, 667) ~= 2_668 MiB
(1_048_576, 4) ~= 16 MiB
(1_048_576, 734) ~= 2_936 MiB
(1_048_576, 12) ~= 48 MiB
(8_192, 1_116) ~= 34 MiB
(8_192, 1_617) ~= 50 MiB
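for transparency, the 9.6 GiB figure is just the shapes above at 4 bytes per float32:

```python
# sizes of the float32 matrices listed above, in MiB (4 bytes per value)
shapes = [
    (1_048_576, 167), (1_048_576, 546), (1_048_576, 305),
    (1_048_576, 667), (1_048_576, 4),   (1_048_576, 734),
    (1_048_576, 12),  (8_192, 1_116),   (8_192, 1_617),
]
mib = [rows * cols * 4 / 2**20 for rows, cols in shapes]
print([int(m) for m in mib])      # [668, 2184, 1220, 2668, 16, 2936, 48, 34, 50]
print(round(sum(mib) / 1024, 1))  # 9.6 (GiB)
```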

I have seen the occasional first-time inference run in closer to ~30 seconds, however, I think the API max bandwidth quota kicks in after a while and pushes this down to 1 inf/min.

Thanks for the clarifications! Responses inline:

I have seen the occasional first-time inference run in closer to ~30 seconds, however, I think the API max bandwidth quota kicks in after a while and pushes this down to 1 inf/min.

Ah yes that makes sense, you’re likely getting throttled after the first query. Network bandwidth definitely won’t be an issue if you’re running Colab instances on Google Cloud :smiley:

the float32 representations of the 5_168 tracks (excluding contact maps & splice junctions) seems to be sitting at about 9.6 GiB before metadata (am I perhaps using a different subset than your 5.5 GiB?) Matrix sizes are as follows:

Ah, this disparity is because we transfer predictions as bfloat16s, as this is the precision our model computes in. We then upcast to float32 on the client so that users don’t inadvertently do analyses at low precision (e.g. aggregations). So you’re right that predict_interval returns 9.6GiB, but on the wire this is usually more like 800MiB with compression.
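As a back-of-envelope check (rounded figures, not exact measurements):

```python
float32_gib = 9.6                   # what the client sees after upcasting
bf16_gib = float32_gib / 2          # bfloat16 is 2 bytes/value on the wire
wire_mib = 800                      # typical compressed transfer size
ratio = bf16_gib * 1024 / wire_mib  # compression ratio on the bf16 payload
print(bf16_gib, round(ratio, 1))    # 4.8 6.1
```

i.e. halving from the dtype, then roughly another ~6x from compression.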

If it’s any easier, is there a way to just crop interval outputs before sending them? (i.e. inference on 1 Mbp but return results for ~ 200 kbp?).

They’re about as hard as each other to implement, though cropping non-track data like splice junctions would be a little more involved. We’ll try to add center-mask interval scoring, but we’re pretty swamped at the moment.

In the interim, we’ll try to increase the bandwidth quotas to allow you to make more predictions :smiley:

Great to hear our model is performing well on ClinVar – we look forward to any further results!