Hi. I have been using alphagenome’s score_ism_variants function and just realized that it runs many redundant forward passes:
Forward passes in score_ism_variants
ism_variants generates 3 × L variants (3 non-reference substitutions at each of L positions in the ism_interval).
Each variant is scored via score_variant → _predict_variant, which makes 4 forward passes:
apply_fn(reference_sequence, …) — main model, ref
apply_fn(alternate_sequence, …) — main model, alt
junctions_apply_fn(ref_embeddings, …) — splice junction head, ref
junctions_apply_fn(alt_embeddings, …) — splice junction head, alt
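To put numbers on this, here is a back-of-the-envelope count for the main model only; the junction head follows the same pattern. The function names are hypothetical, and the "reference once" scheme describes a possible alternative, not anything alphagenome currently does.

```python
import math


def main_model_calls_current(L: int) -> int:
    # As described above: 3L variants, each running the main model on ref and alt (batch size 1).
    return 2 * 3 * L


def main_model_calls_ref_once(L: int, batch_size: int = 1) -> int:
    # Hypothetical alternative: score the shared reference once, group alternates into batches.
    return 1 + math.ceil(3 * L / batch_size)


for L in (100, 1000):
    print(L, main_model_calls_current(L), main_model_calls_ref_once(L),
          main_model_calls_ref_once(L, batch_size=32))
# 100 600 301 11
# 1000 6000 3001 95
```

Batching only reduces the number of calls, not the number of sequences processed. The gain from batching comes from better accelerator utilization. Scoring the reference once removes almost half of the main-model work (6L sequences down to 3L + 1).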
All forward passes use a batch size of 1 (the sequence is expanded with [np.newaxis], giving shape [1, S, 4]).
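For comparison, all ISM alternates for an interval can be stacked into a single array with a leading batch dimension instead of a leading 1. The sketch assumes a one-hot [S, 4] reference (toy channel order A, C, G, T) and mutates positions start..end-1. ism_alt_batch is a hypothetical helper, not an alphagenome function.

```python
import numpy as np


def ism_alt_batch(ref_onehot: np.ndarray, start: int, end: int) -> tuple[np.ndarray, np.ndarray]:
    """Build all 3 * (end - start) single-base substitutions of a one-hot reference.

    ref_onehot: [S, 4] one-hot sequence; positions start..end-1 are mutated.
    Returns (alts, positions): alts has shape [3L, S, 4], and positions[i] is the
    mutated position of alts[i].
    """
    L = end - start
    ref_idx = ref_onehot[start:end].argmax(axis=-1)                     # [L] reference base index
    candidates = np.broadcast_to(np.arange(4), (L, 4))                  # [L, 4]
    alt_idx = candidates[candidates != ref_idx[:, None]].reshape(L, 3)  # 3 non-ref bases per position
    positions = np.repeat(np.arange(start, end), 3)                     # [3L]
    alts = np.repeat(ref_onehot[np.newaxis], 3 * L, axis=0)             # [3L, S, 4] copies of the reference
    rows = np.arange(3 * L)
    alts[rows, positions, :] = 0                                        # clear the reference base
    alts[rows, positions, alt_idx.reshape(-1)] = 1                      # set the alternate base
    return alts, positions


ref = np.eye(4, dtype=np.float32)[[0, 1, 2, 3, 0, 1]]  # toy "ACGTAC", shape [6, 4]
alts, positions = ism_alt_batch(ref, 2, 4)              # ISM over positions 2 and 3
print(ref[np.newaxis].shape, alts.shape)                # (1, 6, 4) (6, 6, 4)
```

For long contexts the full [3L, S, 4] array grows quickly, so in practice it would likely be built and scored in chunks rather than all at once.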
Each of the 3L variants gets its own thread, and within that thread _predict_variant runs 4 forward passes serially: main-ref → main-alt → junction-ref → junction-alt. The parallelism is only across variants (threads), not within a single variant’s scoring.
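This seems wasteful: the identical reference sequence goes through the main model 3L times (and through the junction head another 3L times) when once should suffice, and every pass uses a batch size of 1 even though larger batches could likely fit in memory, at least for smaller context sizes.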
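To illustrate the request, here is a minimal sketch of ISM scoring that runs the reference through the model once and the alternates in fixed-size batches. model_fn is a hypothetical stand-in for a forward pass mapping [B, S, 4] to [B, ...]. The sketch covers only the main model and assumes the model is deterministic, with outputs for one sequence that do not depend on the other sequences in the batch. Whether the junction-head passes can be shared the same way depends on their inputs, which this sketch does not check.

```python
from typing import Callable, Iterator

import numpy as np


def iter_batches(n: int, batch_size: int) -> Iterator[slice]:
    """Yield consecutive slices covering range(n) in chunks of batch_size."""
    for start in range(0, n, batch_size):
        yield slice(start, min(start + batch_size, n))


def score_ism_batched(
    model_fn: Callable[[np.ndarray], np.ndarray],
    ref_onehot: np.ndarray,   # [S, 4]
    alt_onehots: np.ndarray,  # [N, S, 4], e.g. all 3L ISM alternates
    batch_size: int = 8,
) -> np.ndarray:
    """Return alt - ref predictions for every alternate, shape [N, ...]."""
    ref_pred = model_fn(ref_onehot[np.newaxis])  # single reference pass, [1, ...]
    scores = []
    for sl in iter_batches(len(alt_onehots), batch_size):
        alt_pred = model_fn(alt_onehots[sl])     # one call per batch, [B, ...]
        scores.append(alt_pred - ref_pred)       # reference broadcasts over the batch
    return np.concatenate(scores, axis=0)


# Usage with a toy linear "model" (hypothetical stand-in for the real network).
rng = np.random.default_rng(0)
ref = np.eye(4)[[0, 1, 2, 3, 0, 1]]  # toy "ACGTAC", shape [6, 4]
W = rng.normal(size=ref.shape)


def toy_model(x: np.ndarray) -> np.ndarray:
    return np.einsum("bsc,sc->b", x, W)  # [B, S, 4] -> [B]


alts = []
for pos in range(ref.shape[0]):
    for base in range(4):
        if ref[pos, base] == 0:          # skip the reference base
            alt = ref.copy()
            alt[pos] = np.eye(4)[base]
            alts.append(alt)
alts = np.stack(alts)                    # [18, 6, 4]

batched = score_ism_batched(toy_model, ref, alts, batch_size=4)
serial = np.array([toy_model(a[None])[0] - toy_model(ref[None])[0] for a in alts])
print(np.allclose(batched, serial))      # True
```

Under jax.jit, a final batch smaller than batch_size has a new shape and triggers a recompile; padding it up to batch_size and discarding the extra rows is a common way to avoid that.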
Have you considered making the ISM workflow more efficient? Is there a way to at least increase the batch size, or to run a single forward pass for the reference allele? This is currently a major bottleneck for me, so any help, suggestions, or code changes would be much appreciated.
Thank you!