Hi,
I’m a researcher working on DNA regulation, and we generated and inserted some DNA sequences around the AXIN2 gene.
I’d like to predict the RNA-seq signal for our sequences, but I’m struggling to use the output of predict_sequence.
Here is my code (an extract from the relevant part):
output = dna_model.predict_sequence(
    sequence=seq_string.center(dna_client.SEQUENCE_LENGTH_500KB, 'N'),  # Pad to valid sequence length.
    organism=dna_client.Organism.HOMO_SAPIENS,
    requested_outputs=[dna_client.OutputType.RNA_SEQ],
    ontology_terms=['CL:0000236'],  # B-cell
    # interval=myinterval
)
print(f'RNA-SEQ predictions shape: {output.rna_seq.values.shape}')
seq_string is a DNA sequence of 393,216 bp that I pad to the model's ~500 kb input length.
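To make the padding concrete, here is a small standalone sketch of what the str.center call does to a sequence of this length. It assumes that dna_client.SEQUENCE_LENGTH_500KB equals 2**19 = 524,288 (please check against the library), and it uses a placeholder string of 'A's instead of my real sequence.

```python
# Assumption: dna_client.SEQUENCE_LENGTH_500KB == 2**19 (524,288 bp).
TARGET_LENGTH = 2**19

# Placeholder with the same length as the real sequence (393,216 bp).
seq_string = 'A' * 393_216

# str.center pads both sides with 'N' until the target length is reached.
padded = seq_string.center(TARGET_LENGTH, 'N')

# Offset of the real sequence inside the padded input.
left_pad = padded.index('A')
right_pad = TARGET_LENGTH - left_pad - len(seq_string)

print(len(padded), left_pad, right_pad)
```

Under that assumption, the real sequence sits in the middle of the padded input, so any genomic interval attached to the output would have to account for the 'N' flanks on both sides.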
The prediction works correctly, but I’d like to visualize the output as a track, and I’m not sure how to do that.
Basically, the plot_components.plot function fails because it does not know the interval in the human genome, but:
- I tried to pass a corresponding interval to the predict_sequence() function (as mentioned in your docs: alphagenome.models.dna_client.DnaClient — AlphaGenome), but it does not work
- I tried to “make a fake interval” but apparently the output.rna_seq.interval field is immutable
- I saw the predict_variant alternative, and I really like the visualization output, but it does not apply in my case because the sequences are too different (and do not really correspond to different ‘variants’ as they are usually defined)
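To illustrate the “fake interval” idea from the second point, here is a minimal sketch with hypothetical stand-in classes (FakeInterval and FakeTrack are not AlphaGenome types, and the coordinates are placeholders). It assumes the output container behaves like a frozen Python dataclass, which I have not verified: in that case direct assignment fails, but dataclasses.replace returns a copy carrying the new interval.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class FakeInterval:
    """Hypothetical stand-in for a genomic interval type."""
    chromosome: str
    start: int
    end: int


@dataclasses.dataclass(frozen=True)
class FakeTrack:
    """Hypothetical stand-in for the RNA-seq output container."""
    values: tuple
    interval: Optional[FakeInterval] = None


track = FakeTrack(values=(0.1, 0.2, 0.3))
placeholder = FakeInterval('chr17', 0, 3)  # placeholder coordinates only

# Direct assignment on a frozen dataclass raises FrozenInstanceError.
try:
    track.interval = placeholder
except dataclasses.FrozenInstanceError as err:
    print(f'direct assignment fails: {err}')

# dataclasses.replace builds a new object instead of mutating the old one.
relabelled = dataclasses.replace(track, interval=placeholder)
print(relabelled.interval)
```

Whether something equivalent is possible (or advisable) with the real output objects is exactly what I am unsure about.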
How should I proceed?
Thanks in advance