Versioning for AlphaGenome API and model

Hello,

First, thank you for your amazing contributions with this project!

My feature request is to offer clearer support for querying specific model and API versions. I have a set of benchmark questions that I frequently re-run, and I’ve noticed the answers change over time with changes to the AG model and API. For example, one of my queries is: "Analyze variant chr22:36201698:A>C with ATAC-seq for B cells (CL:0000236). What is the quantile_score for B cell chromatin accessibility?"
When I ran this query about 3 weeks ago, the returned answer was ~0.7818. When I re-ran it today, the score was ~0.7379. I suspect the difference may be due to changes in the AG API or model over time. Ideally, I’d love to be able to point to specific versions of the API or underlying model when running these queries, e.g. something like:
```
dna_model = dna_client.create(os.getenv('ALPHAGENOME_API_KEY'), model_version="1.0.0")
```

This feature would hopefully allow a more apples-to-apples comparison between queries over time. This is just a suggestion; if there’s an existing feature that would resolve my problem or a better solution, I defer to your direction.

Thank you in advance!

Joe

Hi Joe,

Thanks for your question! We have not explicitly made any changes to the served model since launch back in June.

Unfortunately, however, our model outputs are not fully deterministic, due to factors outside our control. A common cause is XLA’s GPU compilation, which, even with deterministic ops enabled, can still produce small differences between machines running the exact same model.
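To illustrate the general mechanism, here is a toy sketch in plain Python (not AlphaGenome or XLA code). Floating-point addition is not associative, so summing the same values in a different order can round differently, which is one commonly cited reason compiled GPU kernels produce slightly different outputs. The helper `sum_in_order` is made up for this example.

```python
import math

def sum_in_order(values, order):
    """Add values left to right, visiting indices in the given order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Three numbers whose true sum is exactly 1.0.
values = [1e16, 1.0, -1e16]

print(sum_in_order(values, [0, 1, 2]))  # 0.0: the 1.0 is lost when added to 1e16
print(sum_in_order(values, [0, 2, 1]))  # 1.0: the large terms cancel first
print(math.fsum(values))                # 1.0: correctly rounded reference sum
```

The inputs are identical in both calls; only the order of accumulation differs.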

This is somewhat amplified in the variant scoring use case, where aggregating these small differences across the sequence dimension can yield larger differences. For your specific example, the raw score for that variant is 0.0315, which is relatively small and thus more likely to be affected by these small deltas.
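For a benchmark that is re-run over time, one way to cope with this is to compare scores within a tolerance rather than for exact equality, and to repeat a query a few times to see how much it varies. The sketch below is only an illustration: `summarize_repeats`, `within_tolerance`, and the tolerance values are hypothetical and not part of the AlphaGenome API, and `score_fn` stands in for whatever call produces your score.

```python
import math
import statistics
from typing import Callable, Dict

def summarize_repeats(score_fn: Callable[[], float], n_repeats: int = 5) -> Dict[str, float]:
    """Call score_fn n_repeats times and summarize the spread of the results."""
    if n_repeats < 2:
        raise ValueError("n_repeats must be at least 2 to estimate spread")
    scores = [score_fn() for _ in range(n_repeats)]
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "min": min(scores),
        "max": max(scores),
    }

def within_tolerance(reference: float, observed: float, abs_tol: float) -> bool:
    """Return True if observed is within abs_tol of reference."""
    return math.isclose(reference, observed, rel_tol=0.0, abs_tol=abs_tol)

# The two quantile scores reported in this thread; tolerances are arbitrary examples.
print(within_tolerance(0.7818, 0.7379, abs_tol=0.01))  # False
print(within_tolerance(0.7818, 0.7379, abs_tol=0.05))  # True
```

A sensible tolerance depends on the scorer and the variant; the spread from `summarize_repeats` on your own queries is a more reliable guide than the example values here.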

I hope that helps to explain what you have observed, and that it isn’t causing any issues in your downstream research!

Thank you for the explanation!