Help with filtering biosample_life_stage tracks in variant prediction

Audrey · November 4, 2025, 3:13pm

Hello,

I am interested in heart expression in embryonic tissue. I am scoring variants like this:

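```python
from alphagenome.data import genome
from alphagenome.models import dna_client, variant_scorers

dna_model = dna_client.create(API_KEY)  # API_KEY: your AlphaGenome API key

# chrom, pos, ref, alt describe the variant of interest.
variant = genome.Variant(
    chromosome=chrom,
    position=pos,
    reference_bases=ref,
    alternate_bases=alt,
)

# Score the variant in a 1 Mb window around it.
interval = variant.reference_interval.resize(dna_client.SEQUENCE_LENGTH_1MB)

variant_scores = dna_model.score_variant(
    interval=interval,
    variant=variant,
    variant_scorers=[variant_scorers.RECOMMENDED_VARIANT_SCORERS['RNA_SEQ']],
)

# Flatten the per-scorer results into a single pandas DataFrame.
df_scores = variant_scorers.tidy_scores(variant_scores)

ontology_terms = ['UBERON:0000948']  # heart
```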

However, df_scores has no biosample_life_stage column to filter on, even though that metadata is available elsewhere.

I can see this information in variant_scores:

```python
variant_scores[0].var[variant_scores[0].var['ontology_curie'] == 'UBERON:0000948']
```

| | name | strand | Assay title | ontology_curie | biosample_name | biosample_type | biosample_life_stage | data_source | endedness | genetically_modified | nonzero_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 182 | UBERON:0000948 polyA plus RNA-seq | + | polyA plus RNA-seq | UBERON:0000948 | heart | tissue | adult | encode | paired | False | 0.340610 |
| 183 | UBERON:0000948 total RNA-seq | + | total RNA-seq | UBERON:0000948 | heart | tissue | adult | encode | paired | False | 0.087664 |
| 453 | UBERON:0000948 polyA plus RNA-seq | - | polyA plus RNA-seq | UBERON:0000948 | heart | tissue | adult | encode | paired | False | 0.340610 |
| 454 | UBERON:0000948 total RNA-seq | - | total RNA-seq | UBERON:0000948 | heart | tissue | adult | encode | paired | False | 0.087664 |
| 595 | UBERON:0000948 polyA plus RNA-seq | . | polyA plus RNA-seq | UBERON:0000948 | heart | tissue | embryonic | encode | single | False | 0.467385 |
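The same selection can be made on the track metadata itself by filtering on biosample_life_stage rather than relying on strand. A minimal pandas sketch, using a toy DataFrame that copies the relevant columns from the rows above (in practice `meta` would be `variant_scores[0].var`):

```python
import pandas as pd

# Toy copy of the heart rows shown above; in practice use variant_scores[0].var.
meta = pd.DataFrame(
    {
        "name": [
            "UBERON:0000948 polyA plus RNA-seq",
            "UBERON:0000948 total RNA-seq",
            "UBERON:0000948 polyA plus RNA-seq",
            "UBERON:0000948 total RNA-seq",
            "UBERON:0000948 polyA plus RNA-seq",
        ],
        "strand": ["+", "+", "-", "-", "."],
        "ontology_curie": ["UBERON:0000948"] * 5,
        "biosample_life_stage": ["adult", "adult", "adult", "adult", "embryonic"],
    },
    index=[182, 183, 453, 454, 595],
)

# Select tracks by tissue and life stage explicitly, not by strand.
embryonic_heart = meta[
    (meta["ontology_curie"] == "UBERON:0000948")
    & (meta["biosample_life_stage"] == "embryonic")
]
print(embryonic_heart[["name", "strand"]])
```

Here the embryonic track happens to be the unstranded one, but the filter states the intent directly, so it keeps working if another model version adds stranded embryonic tracks.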

From this, I infer that if I filter df_scores to the unstranded track ('.'), I am looking at the embryonic heart track:

```python
df_scores[(df_scores['ontology_curie'] == 'UBERON:0000948') & (df_scores['track_strand'] == '.')]
```

However, I’m not sure how robust this is in general. Is there a better way to filter directly to the track I want based on life stage? Or am I misunderstanding the output? Please let me know. Thanks!

tward · November 5, 2025, 4:22pm

Hi @Audrey ,

Thanks for the report! Yes, it looks like we’re inadvertently dropping columns in the tidy_scores function. We’ll try to get this fixed ASAP.
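Until a fixed release is available, one possible workaround is to re-attach the missing metadata by merging the track metadata into the tidy scores. A minimal sketch with toy DataFrames, assuming df_scores identifies each track by columns named track_name and track_strand that match name and strand in variant_scores[0].var (track_strand appears in the question above; track_name is an assumption, and the score column is a hypothetical placeholder):

```python
import pandas as pd

# Toy stand-ins: `meta` mimics variant_scores[0].var, `df_scores` mimics tidy_scores output.
# ASSUMPTION: df_scores has a `track_name` column matching meta["name"].
meta = pd.DataFrame(
    {
        "name": ["UBERON:0000948 polyA plus RNA-seq"] * 3,
        "strand": ["+", "-", "."],
        "biosample_life_stage": ["adult", "adult", "embryonic"],
    }
)
df_scores = pd.DataFrame(
    {
        "track_name": ["UBERON:0000948 polyA plus RNA-seq"] * 3,
        "track_strand": ["+", "-", "."],
        "ontology_curie": ["UBERON:0000948"] * 3,
        "score": [0.1, 0.2, 0.3],  # hypothetical column, dummy values for illustration only
    }
)

# Track names repeat across strands, so join on (name, strand) together.
life_stage = meta[["name", "strand", "biosample_life_stage"]].rename(
    columns={"name": "track_name", "strand": "track_strand"}
)
# validate= raises if a (name, strand) pair is duplicated in the metadata.
merged = df_scores.merge(
    life_stage, on=["track_name", "track_strand"], how="left", validate="many_to_one"
)

embryonic = merged[merged["biosample_life_stage"] == "embryonic"]
print(embryonic)
```

Joining on both keys matters here: the adult and embryonic polyA plus tracks share the same name and differ only by strand.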

tward · November 5, 2025, 7:22pm

This should now be fixed with this commit.

We’ll push a new package version later this week, but in the meantime you can install from HEAD with:

```shell
pip install git+https://github.com/google-deepmind/alphagenome
```
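Once the fixed version is installed, the life-stage filter should be expressible directly on df_scores. A minimal sketch, assuming the restored column in the tidy_scores output is named biosample_life_stage as in the track metadata; the helper name select_tracks is hypothetical, and the column check gives a clearer error if an older package without the fix is still installed:

```python
import pandas as pd


def select_tracks(df_scores: pd.DataFrame, ontology_curie: str, life_stage: str) -> pd.DataFrame:
    """Hypothetical helper: keep rows for one tissue and one biosample life stage."""
    # ASSUMPTION: the fixed tidy_scores output carries a `biosample_life_stage` column.
    if "biosample_life_stage" not in df_scores.columns:
        raise KeyError(
            "df_scores has no 'biosample_life_stage' column; "
            "the installed alphagenome version may predate the tidy_scores fix."
        )
    mask = (df_scores["ontology_curie"] == ontology_curie) & (
        df_scores["biosample_life_stage"] == life_stage
    )
    return df_scores[mask]


# Toy example, not real model output.
df_scores = pd.DataFrame(
    {
        "ontology_curie": ["UBERON:0000948", "UBERON:0000948", "UBERON:0000955"],
        "track_strand": ["+", ".", "."],
        "biosample_life_stage": ["adult", "embryonic", "embryonic"],
    }
)
print(select_tracks(df_scores, "UBERON:0000948", "embryonic"))
```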

Thanks again for reporting the bug!

Audrey · November 6, 2025, 7:22pm

Great, thanks for the fix!
