Limiting Batch SNP Predictions to Specific Biosamples or Assays to Reduce Runtime and File Size

Hello Team,

I recently ran a batch prediction on 3,000 autoimmune-disease-associated SNPs. The job took over an hour and produced a 9 GB dataframe containing many assays and biosamples I don't need. I know that when predicting individual variants we can specify which assays or biosamples to include; is there a similar way to restrict batch scoring to only the relevant cell types (T cells, in my case), so that we avoid computing predictions for irrelevant data?

Thank you!

Best regards,
Maryam

Hi @Maryam_Dashtiahangar, welcome to the forum!

We currently don’t have a way to filter score_variant requests, but you can filter the responses afterwards to reduce the dimensionality. There’s no compute overhead per se (we compute all assays in parallel regardless); you’re right that we do send back predictions you don’t necessarily want, but the variant scores themselves should be pretty small.

Here’s an example of filtering the tidy scores by the ontology term CL:0000084 (T cell):

from alphagenome import colab_utils
from alphagenome.data import genome
from alphagenome.models import dna_client
from alphagenome.models import variant_scorers

# Create the client and define the variant plus a 1 Mb interval around it.
model = dna_client.create(colab_utils.get_api_key())
variant = genome.Variant.from_str('chr10:120714877:G>T')
interval = variant.reference_interval.resize(2**20)

# Score the variant and convert the results into a tidy pandas DataFrame.
scores = model.score_variant(interval, variant)
scores = variant_scorers.tidy_scores(scores)

# Keep only the rows for the T cell ontology term.
filtered_scores = scores[scores['ontology_curie'] == 'CL:0000084']
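
For your full batch, one simple approach is to filter each variant’s tidy scores before accumulating them, so the combined dataframe only ever holds the T cell rows. A rough sketch (the variant list and window size below are just placeholders):

import pandas as pd

from alphagenome import colab_utils
from alphagenome.data import genome
from alphagenome.models import dna_client
from alphagenome.models import variant_scorers

model = dna_client.create(colab_utils.get_api_key())

# Placeholder variant strings; substitute your 3,000 SNPs here.
variant_strs = [
    'chr10:120714877:G>T',
    'chr10:120714900:A>C',
]

filtered_frames = []
for variant_str in variant_strs:
    variant = genome.Variant.from_str(variant_str)
    interval = variant.reference_interval.resize(2**20)

    scores = variant_scorers.tidy_scores(model.score_variant(interval, variant))

    # Filter to T cells before accumulating so the result stays small.
    filtered_frames.append(scores[scores['ontology_curie'] == 'CL:0000084'])

t_cell_scores = pd.concat(filtered_frames, ignore_index=True)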

Hope this helps!