Help filtering tracks by biosample_life_stage in variant prediction

Hello,

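I am interested in heart expression in embryonic tissue. I am scoring variants like this:

```python
from alphagenome.data import genome
from alphagenome.models import dna_client, variant_scorers

# dna_model comes from dna_client.create(API_KEY); chrom, pos, ref and alt describe the variant.
variant = genome.Variant(
    chromosome=chrom,
    position=pos,
    reference_bases=ref,
    alternate_bases=alt,
)

# Score the variant within a 1 Mb window around it.
interval = variant.reference_interval.resize(dna_client.SEQUENCE_LENGTH_1MB)

variant_scores = dna_model.score_variant(
    interval=interval,
    variant=variant,
    variant_scorers=[variant_scorers.RECOMMENDED_VARIANT_SCORERS['RNA_SEQ']],
)

# Flatten the per-scorer results into a single pandas DataFrame.
df_scores = variant_scorers.tidy_scores(variant_scores)

ontology_terms = ['UBERON:0000948']  # heart
```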

However, df_scores has no biosample_life_stage column to filter on, even though that column appears elsewhere.

The life stage is available in the track metadata of variant_scores:

```python
variant_scores[0].var[variant_scores[0].var['ontology_curie'] == 'UBERON:0000948']
```

| | name | strand | Assay title | ontology_curie | biosample_name | biosample_type | biosample_life_stage | gtex_tissue | data_source | endedness | genetically_modified | nonzero_mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 182 | UBERON:0000948 polyA plus RNA-seq | + | polyA plus RNA-seq | UBERON:0000948 | heart | tissue | adult | | encode | paired | False | 0.340610 |
| 183 | UBERON:0000948 total RNA-seq | + | total RNA-seq | UBERON:0000948 | heart | tissue | adult | | encode | paired | False | 0.087664 |
| 453 | UBERON:0000948 polyA plus RNA-seq | - | polyA plus RNA-seq | UBERON:0000948 | heart | tissue | adult | | encode | paired | False | 0.340610 |
| 454 | UBERON:0000948 total RNA-seq | - | total RNA-seq | UBERON:0000948 | heart | tissue | adult | | encode | paired | False | 0.087664 |
| 595 | UBERON:0000948 polyA plus RNA-seq | . | polyA plus RNA-seq | UBERON:0000948 | heart | tissue | embryonic | | encode | single | False | 0.467385 |

From this, I infer that if I filter df_scores to the non-stranded track (strand `.`), I am always looking at the embryonic heart track:

```python
df_scores[(df_scores['ontology_curie'] == 'UBERON:0000948') & (df_scores['track_strand'] == '.')]
```

However, I’m not sure this holds for every result. Is there a better way to filter directly to the track I want based on life stage, or am I misunderstanding the output? Thanks!
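Until tidy_scores keeps the life-stage column, a sturdier option than relying on strand might be to select tracks from the metadata in `variant_scores[0].var` and match them back to df_scores. The sketch below assumes the tidy table identifies tracks by `track_name` plus `track_strand` (`track_strand` appears above; `track_name` is an assumption), and uses small hypothetical DataFrames that mirror the table above, with made-up placeholder scores. Matching on both name and strand matters because the same track name appears on `+`, `-` and `.`.

```python
import pandas as pd

# Hypothetical stand-in for variant_scores[0].var, mirroring part of the table above.
track_metadata = pd.DataFrame(
    {
        "name": ["UBERON:0000948 polyA plus RNA-seq", "UBERON:0000948 total RNA-seq",
                 "UBERON:0000948 polyA plus RNA-seq", "UBERON:0000948 total RNA-seq",
                 "UBERON:0000948 polyA plus RNA-seq"],
        "strand": ["+", "+", "-", "-", "."],
        "ontology_curie": ["UBERON:0000948"] * 5,
        "biosample_life_stage": ["adult", "adult", "adult", "adult", "embryonic"],
    },
    index=[182, 183, 453, 454, 595],
)

# Hypothetical tidy_scores()-style table WITHOUT biosample_life_stage.
# Column names track_name/track_strand are assumptions; raw_score values are placeholders.
df_scores = pd.DataFrame(
    {
        "track_name": track_metadata["name"].tolist(),
        "track_strand": track_metadata["strand"].tolist(),
        "ontology_curie": track_metadata["ontology_curie"].tolist(),
        "raw_score": [0.1, 0.2, 0.3, 0.4, 0.5],
    }
)


def select_tracks(scores, metadata, curie, life_stage):
    """Keep score rows whose (name, strand) pair matches a track with the given term and life stage."""
    wanted = metadata.loc[
        (metadata["ontology_curie"] == curie)
        & (metadata["biosample_life_stage"] == life_stage),
        ["name", "strand"],
    ].rename(columns={"name": "track_name", "strand": "track_strand"})
    # Inner merge on both keys: a name alone is ambiguous across strands.
    return scores.merge(wanted, on=["track_name", "track_strand"], how="inner")


embryonic_heart = select_tracks(df_scores, track_metadata, "UBERON:0000948", "embryonic")
print(embryonic_heart)
```

Because the merge is many-to-one, this should still work if df_scores has several rows per track (for example, one per gene).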

Hi @Audrey,

Thanks for the report! Yes, it looks like we’re inadvertently dropping columns in the tidy_scores function. We’ll try to get this fixed ASAP.

This should now be fixed with this commit.

We’ll push a new package version later this week, but in the meantime you can install from HEAD:

```shell
pip install git+https://github.com/google-deepmind/alphagenome
```
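With a fixed version installed, and assuming the restored columns include biosample_life_stage, the filter can target life stage directly instead of inferring it from strand. This sketch uses a hypothetical tidy_scores()-style DataFrame with placeholder scores, and the helper name `filter_by_life_stage` is illustrative rather than part of the package. It raises a clear error if the column is still missing, which would suggest an older installation.

```python
import pandas as pd


def filter_by_life_stage(scores: pd.DataFrame, curie: str, life_stage: str) -> pd.DataFrame:
    """Return rows of a tidy score table for one ontology term and life stage."""
    if "biosample_life_stage" not in scores.columns:
        # Older tidy_scores versions dropped metadata columns.
        raise KeyError(
            "biosample_life_stage is missing; the installed package may predate the tidy_scores fix"
        )
    mask = (scores["ontology_curie"] == curie) & (scores["biosample_life_stage"] == life_stage)
    return scores[mask]


# Hypothetical tidy table; the last row is a different, non-heart ontology term.
df_scores = pd.DataFrame(
    {
        "ontology_curie": ["UBERON:0000948"] * 3 + ["UBERON:0002107"],
        "track_strand": ["+", "-", ".", "."],
        "biosample_life_stage": ["adult", "adult", "embryonic", "embryonic"],
        "raw_score": [0.1, 0.2, 0.3, 0.4],
    }
)

embryonic_heart = filter_by_life_stage(df_scores, "UBERON:0000948", "embryonic")
print(embryonic_heart)
```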

Thanks again for reporting the bug!

Great, thanks for the fix!