Hey there,
I have been setting up a pipeline for variant prediction and would like to export the values underlying the track visualization.
However, I ran into a problem when trying to access the RNA-seq data: there are multiple assays stored in output.reference.rna_seq, and I don’t understand how to access/export the values of only one RNA-seq assay at a time (e.g. I just want the values for total RNA-seq on the positive strand).
Would really appreciate some help on this!
Hi there!
Thanks for reaching out.
For any given biosample you may have up to four RNA-seq output tracks: ‘polyA plus RNA-seq’ and ‘total RNA-seq’, each on the positive and the negative strand. To extract the values for a specific RNA-seq assay and strand, you can use the filtering methods available on the TrackData object. First, isolate the strand using .filter_to_positive_strand(). Next, apply .filter_tracks() with a boolean mask built from the .metadata DataFrame to select the desired assay (e.g., ‘total RNA-seq’). Finally, access the .values attribute to get the raw NumPy array for exporting.
# 1. Filter to positive strand
pos_tracks = output.reference.rna_seq.filter_to_positive_strand()
# 2. Filter to specific assay
mask = pos_tracks.metadata['Assay title'] == 'total RNA-seq'
specific_track = pos_tracks.filter_tracks(mask.values)
# 3. Export values
exported_values = specific_track.values
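If you then want to write those values to disk, here is a minimal sketch using only the Python standard library. The helper name write_tracks_csv, the output file name, and the start offset are hypothetical and not part of the library; it also assumes the values are laid out as one row per position and one column per track, and that the metadata DataFrame has a 'name' column you can use for the header.

```python
import csv


def write_tracks_csv(values, track_names, path, start=0):
    """Write a (positions x tracks) array to CSV, one row per position.

    `values` can be any 2D row-iterable, such as a NumPy array or a list
    of lists. `start` is an optional genomic offset added to the row index.
    Returns the number of data rows written.
    """
    track_names = list(track_names)
    rows_written = 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["position", *track_names])
        for offset, row in enumerate(values):
            if len(row) != len(track_names):
                raise ValueError(
                    f"row {offset} has {len(row)} values, "
                    f"expected {len(track_names)}"
                )
            # Cast to plain floats so NumPy scalars are written as numbers.
            writer.writerow([start + offset, *(float(v) for v in row)])
            rows_written += 1
    return rows_written


# Hypothetical usage with the objects from the snippet above
# (assumes a 'name' column in the metadata):
# write_tracks_csv(
#     specific_track.values,
#     specific_track.metadata["name"],
#     "total_rnaseq_positive.csv",
# )
```

For long intervals a binary format such as NumPy's .npy will be smaller and faster to load; CSV is shown here only because it is easy to inspect.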
Kind regards,
Tumi