Limiting Batch SNP Predictions to Specific Biosamples or Assays to Reduce Runtime and File Size

Hi @Maryam_Dashtiahangar, welcome to the forum!

We currently don’t have a way to filter score_variant requests, but you can filter the responses down to the biosamples or assays you care about. There’s no extra compute overhead per se (we compute all assays in parallel regardless). You’re right that we send back predictions you don’t necessarily want, but the responses should be fairly small.

Example of filtering the tidied scores by the ontology term CL:0000084 (T cell):

from alphagenome import colab_utils
from alphagenome.data import genome
from alphagenome.models import dna_client
from alphagenome.models import variant_scorers


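model = dna_client.create(colab_utils.get_api_key())
variant = genome.Variant.from_str('chr10:120714877:G>T')
# Score over a 2**20 bp (~1 Mb) interval centered on the variant.
interval = variant.reference_interval.resize(2**20)

# Scores come back for all assays and biosamples; there is no request-side filter.
scores = model.score_variant(interval, variant)
# Flatten the scorer outputs into a single long-format DataFrame.
scores = variant_scorers.tidy_scores(scores)

# Keep only the rows for the biosample of interest.
filtered_scores = scores[scores['ontology_curie'] == 'CL:0000084']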
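
If you want to keep several biosamples and/or specific assays, the same idea can be wrapped in a small helper. This is just a rough sketch, not part of the AlphaGenome API: filter_scores is a hypothetical name, the toy DataFrame stands in for the output of tidy_scores with made-up values, and I’m assuming the assay column is called 'output_type' and holds string names like 'RNA_SEQ' (check scores.columns on your own output first).

```python
import pandas as pd


def filter_scores(scores, ontology_curies=None, output_types=None):
    """Keep only rows matching the requested biosamples and/or assays.

    Hypothetical helper: assumes 'ontology_curie' and 'output_type' columns.
    """
    # Start with every row selected, then narrow down per requested filter.
    mask = pd.Series(True, index=scores.index)
    if ontology_curies is not None:
        mask &= scores['ontology_curie'].isin(ontology_curies)
    if output_types is not None:
        mask &= scores['output_type'].isin(output_types)
    return scores[mask]


# Toy stand-in for a tidy_scores DataFrame (all values are made up).
toy_scores = pd.DataFrame({
    'variant_id': ['chr10:120714877:G>T'] * 4,
    'ontology_curie': ['CL:0000084', 'CL:0000084', 'UBERON:0002107', 'CL:0000236'],
    'output_type': ['RNA_SEQ', 'ATAC', 'RNA_SEQ', 'DNASE'],
    'raw_score': [0.12, -0.03, 0.40, 0.01],
})

# Keep only RNA-seq rows for the CL:0000084 biosample.
filtered = filter_scores(
    toy_scores,
    ontology_curies=['CL:0000084'],
    output_types=['RNA_SEQ'],
)
print(filtered)
```

For a batch of SNPs, applying this to each variant’s tidied scores before concatenating or writing to disk should keep the output file small.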

Hope this helps!