In my recent exploration of genomic prediction tools, I used AlphaGenome, a deep learning framework for modeling DNA sequences and predicting functional genomic signals such as RNA-seq profiles, splice sites, and chromatin accessibility. My goal was to apply this tool to wheat genome sequences, focusing on annotated gene regions on chromosome 5A (e.g., `chr5A:587411454-587423416`), which contain key genes involved in developmental or stress-response pathways.
The workflow began with interval normalization. Because AlphaGenome only accepts specific sequence lengths (such as 2KB, 16KB, 100KB, etc.), I used the `.resize()` method to extend my region of interest to exactly 16KB (16,384 bp), keeping the center point unchanged to preserve biological context. I then retrieved the corresponding DNA sequence and passed it to the model via the `predict_sequence()` API, requesting outputs such as `RNA_SEQ`, `SPLICE_SITES`, `SPLICE_SITE_USAGE`, and `SPLICE_JUNCTIONS`.
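The centered resize described above comes down to a little arithmetic. The following is a minimal pure-Python sketch of it, not the library's own implementation; `resize_centered` is a hypothetical helper name, and treating the coordinates as half-open is an assumption. Applied to the gene region quoted earlier, it gives the same 16,384 bp window that appears as a comment in the Colab code at the end of this post (`chr5A:587409243-587425627`).

```python
def resize_centered(start: int, end: int, width: int) -> tuple[int, int]:
    """Return a (start, end) window of `width` bp that keeps the input's center."""
    if width <= 0:
        raise ValueError("width must be positive")
    center = (start + end) // 2
    new_start = center - width // 2
    return new_start, new_start + width


# Gene region from this post (hypothetical helper, half-open coordinates assumed).
gene_start, gene_end = 587_411_454, 587_423_416
window = resize_centered(gene_start, gene_end, 16_384)
print(window)  # (587409243, 587425627)
```

In the real workflow the same step is a single `.resize()` call on a `genome.Interval`; the sketch only makes the arithmetic explicit so the resulting coordinates can be checked by hand.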
One of the most powerful features of AlphaGenome is that it returns high-resolution track data, including predictions for multiple tissues or conditions. This let me compare predicted expression patterns across assays and look for potential alternative splicing events in the splice junction predictions. I visualized the results with the built-in plotting components `plot_components.Tracks()` and `plot_components.TranscriptAnnotation()`, which overlay transcript structures on the predicted RNA expression tracks.
However, there were some challenges. Because intervals must have a fixed length, a region much smaller than the target length ends up surrounded mostly by flanking sequence, which can dilute the biological context of the region itself. Also, while the API is well documented, there is no native support for plant genomes (especially wheat), so I had to map gene annotations manually and take care to handle strand correctly. Finally, having to assign `.interval` fields to `TrackData` objects by hand was cumbersome and error-prone.
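One way to make the manual interval assignment less error-prone is to wrap the repeated `dataclasses.replace()` calls in a small helper. The sketch below is only an illustration: `with_interval`, `FakeTrack`, and `FakeOutput` are hypothetical names, and the stand-in classes are not the AlphaGenome ones. It assumes only that the output and its tracks are dataclasses with an `interval` field, which is what the `dataclasses.replace()` calls in my Colab code rely on.

```python
import dataclasses
from typing import Any, Optional


def with_interval(output: Any, interval: Any, track_names: list[str]) -> Any:
    """Return a copy of `output` whose named tracks carry `interval`."""
    updated = {
        name: dataclasses.replace(getattr(output, name), interval=interval)
        for name in track_names
        if getattr(output, name) is not None  # skip outputs that were not requested
    }
    return dataclasses.replace(output, **updated)


# Stand-ins for illustration only; these are NOT the AlphaGenome classes.
@dataclasses.dataclass(frozen=True)
class FakeTrack:
    values: tuple
    interval: Any = None


@dataclasses.dataclass(frozen=True)
class FakeOutput:
    rna_seq: Optional[FakeTrack] = None
    splice_sites: Optional[FakeTrack] = None


out = FakeOutput(rna_seq=FakeTrack((1.0, 2.0)), splice_sites=FakeTrack((0.1,)))
new_out = with_interval(
    out, ("chr5A", 587_409_243, 587_425_627), ["rna_seq", "splice_sites"]
)
print(new_out.rna_seq.interval)
```

Because nothing is mutated, the original `out` stays untouched, so the same prediction can be re-plotted against a different interval without re-running the model.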
Suggestions

- Support for Plant Genomes Out-of-the-Box
  Currently, AlphaGenome seems optimized for mammalian genomes (e.g., human, mouse). Adding pre-trained models or annotation mappings for common crop species like wheat, rice, and maize would greatly enhance usability in agricultural genomics.
- Flexible Interval Handling
  Allow dynamic padding or trimming strategies (e.g., "pad_left", "center", "trim_start") to better preserve biological relevance when resizing intervals.
- Mutable TrackData Objects
  Consider making `TrackData` mutable or providing helper functions to easily update attributes like `.interval`, rather than requiring manual use of `dataclasses.replace()`.
- Batch Processing Improvements
  While batch prediction is possible using `ThreadPoolExecutor`, adding higher-level wrappers or integration with Dask/Biopython pipelines could streamline large-scale analyses.
- Visualization Enhancements
  Improve visualization components to allow easier customization of color schemes, labels, and multi-track comparison, which would be especially useful for comparing variants or splice isoforms.
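As a rough idea of what a higher-level batch wrapper could look like, here is a minimal sketch built on `ThreadPoolExecutor`. `predict_many` and `fake_predict` are hypothetical names; `fake_predict` is a placeholder that just counts G/C bases, and in practice it would be replaced by a small function that calls `predict_sequence()` for one sequence. Nothing here is part of the AlphaGenome API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def predict_many(
    predict_fn: Callable[[T], R], inputs: Iterable[T], max_workers: int = 4
) -> list[R]:
    """Apply `predict_fn` to every input concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(predict_fn, inputs))


# Placeholder for a real per-sequence prediction call (assumption for the demo).
def fake_predict(seq: str) -> int:
    return seq.count("G") + seq.count("C")


results = predict_many(fake_predict, ["GATTACA", "GGCC", "ATAT"])
print(results)  # [2, 4, 0]
```

Threads suit this case because each prediction is a remote request that spends most of its time waiting; `max_workers` should stay small enough to respect whatever rate limits the service applies.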
Conclusion
Overall, AlphaGenome provides a robust and flexible platform for genomic prediction tasks. Its integration of deep learning with genomic interval manipulation makes it ideal for studying gene regulation, splice variation, and expression dynamics. With minor enhancements to support non-model organisms like wheat, it could become a go-to tool for both basic research and applied crop breeding programs.
Code I used in Colab:
from IPython.display import clear_output
! pip install alphagenome
clear_output()
# Only the modules actually used below are imported.
from alphagenome.data import genome, transcript
from alphagenome.models import dna_client
from alphagenome.visualization import plot_components
from google.colab import userdata
import matplotlib.pyplot as plt
dna_model = dna_client.create(userdata.get('ALPHA_GENOME_API_KEY'))
sequence = 'ACATCATAACTATGCGCTCTTCTATTAATTGTTCAACAGTAATTTGTTTACCCACAGTTACTACGCTCATGAGAGAGATGCCTCTAGTGAAAGCTATGGCCCCCGCGTCCATTCATAGTATATTACTAAAATCCTAAATACCTTGCTGCAATTTATTTAATTGTTTTGTTTTACAATTTATCTATCTATCATTACCAGAATTAATCTTGCAATTAATGAGTACAAGGGGATTATTAACCCTCTTGCCTGCATTGGGTGCAAGTATTTGTTTTGTGTGTGTGCAATTTATCTTTGTTTGCGTGAATCTTCTATTGGTTCGATAAACATTGGTTCTTAACCGAGGGAAATACTATCTGCTACTATACTACATCACCCTTCCTCTTTGGGGAAATCCCAACGCCTGTCACAAGTAGTAGAAGAATTTCCGACGCCGTTGACGAGGAGGTTTCACCAAAAAAATTAGGTACCTGCACACACACACACACACCTTATTTTCTTGCTTTATTTATGCTTTCTTTCATGATGACTCAATAAAATAAAAATAAAGACATATGGATCCTCATCCACTTGCTAATCTTTTCAAACTTGCTGCTTATTGGGATACATTTATTGAATGAAAAACATGATTGCAATGTTGTTAATATTAATTCCATGAAAGTTAATTGTGCTAATGACAATGACTGGGGTGACGATCATAATGCTATGAATATTAAAAGTGGGTTTGGAATCTCACTATTTTGGAGAACAATCAATATTACGAAATTTCTGATAAAAGTTGGTTTGGAGAGGTCATGACTTTAGTTGATGTTAATATAATCCCACTATCTTGGAAGATGATAAAATTTGCATGCATGTGAATCATGGAGAGAATATTTTATATGATAGCTATATTGTTGAATTTGATTGCGATTCTACATGTAATTATTTTGAGAAAGGAAAGTATAGCTATAGAAATCTTCATGTTACTCAATTACCTCTCTTGATGTTCAAATTAGTAATGTCTCATCCTTCTTTCTTGCATTTGCATATGCTAAATATTGCTTAGTTTGATAATTAGTTTCATTATGAAATGCATTTACATAGGAAGTGGATTAGACTTAAATGTGATCATTACATGTTATATGACGCTCTCTTTGTGTTTAAATTCTTGTATTTTATGTGAGTATCATTGAAATCTAAAGCCTATCTTAATGACTATAAAGAGCGAGCTTGTTGGGAGACAACCCGATAGTTATCTTTATTTTTCTTTTCTGCCTTGTGAGTCAACATGGTTATTGCTACTGTAATGATTGTGTTTTATCTTTTATTTTGAGTCTGTGCTAAGTAAAGTCTTTGTGATGATTTAGATGATAGTTGAATTGATTCTGTGCAAAAACAGAAAGTTTCGCGCCCAGTATTTGAATTTCACAATAAATACTTGAGTTCTTATGACTAATGTGAATTTGTGGACTTTAAGTACTTCCACCCTTCCATATTTTGCTAGCCTCTTTGGTACCATGCATTGCTCGTTCTTACCTTGAGACTTGGTGCAAACTTCACCGGTGCATCCAAATCCTGTGGTATGACACACTTTATCACACATAAACTTTATTGCACCCTTCGTCAAAACAAACACCATACCTACCTATCATGACATTTTCATAGTCATTCAGAGATATATTGTCATGCAACTTCCACGATTACTATTCACATGACTCGAGTGTTCATTGTCATTTTACTTTGCATGATCATATAGAGCTGACATGATATTTGTGGCAAAGCCACCGTTCGACATTGTTATACATGTTACGCTAGATCATTGCACATCCTGACACACTGCCAGAGGCATTCATATGTAGTGATATCATTCGGTTTATCGAGTTGTAAGTAAAAAGAAGTGTGATCATCATTATAAGAGCGTTGTCCCAGTGAGGGAAAGAATGATGGAGACTAATGAGTCCTCGAAAAAGTGGGAATGAGGTTCATGATATTGTTCTAAAAAAGGCCAA
AATATCGTCAGCGGGCTCAAACTGCATAGTTCATTGGTGTTGATTAAAAAGTGTGCTGTTTAGAACTAGTGCAGAGTATCCATATCAATCATTCCTTTTCTTTTATGGCATGTTTCTCTCGGTTTATGATGGCAGGAAATGTATGGGCGGCAGAACCTTTTGCGTTGGTGACTCTGATGTCATCCCTATATACTCTCTTTCAAGTCCAGATGGGAAGACGACGAAGACATAAAGGTCCTGATTTATTTATTTTCAAGAATCACAATATACACAATGCTCATATTTGATAGGTTAATTCACATAAACAACACCCAGTAGAGACGGGTATCATGGAAACTTGAAAATTCATCACTCAAGCTACATATAGATTACATGGTAAATTACTCGTACAGCCATCTCAGCCCAGCTGGGAGGGAAACTGAGGTGGACAAAGTGAAATATACACATCGCTGCAGCTTGCTACTTTACTCTGATTTCTTTTCCTTTCCCCTCCAAAGGGGTCAGGCGTGCTAGCAACCGCAACATACACCAGGCTGGCCGGTGCAACTTGTTACCCTCTACTGAATAGTACGCCTGTATGGGCTGGATGCCCTTCACCCGTTGATGTGGCTCACCATCCACGGTGGAAGCCCCGTCCGGGGTGGGGCCTGCGGCTGCACTGCCGCATCCTCTGCCCTCTCGCCTGTTGCCGCTGGATGAATGCTGCACAACCAAAGCGAACACAAGCATATTAGTATTATGTTACTTAAATGTGGCCCGACAGAACTGCATAGAGACCAAAAGTGGGCTGCAAGGAACAACCCGACCAATTCAAAAGATGGTTACTTGAACTCTGAATAAGATTAGACTTGTGATGTACATCATCACCTGGTATTTGCGGCAGGGGGAGCATCCCTCAGCATGAAGGAAGAAGATGAAGAGCTGGTTTGAGGCTGAGTTTGATCTTGCTGCGCCGCATGGGCCTTCTGCTTCTCCACGAGCTGTGGGGAAGGGACACGGACCTCTGTCAATAAAATTTGCTATACGGAACAATTTAGACCGGTTGGACCATAAGGTGATATAACGGGCTTACTTCCTTCTGGAGAACTTTATTCTCCTCCTGCAGTGACCTCTCCTGCAGCAAGAACGATGTAATGAGGTTACGTGCATGTAGACCAGTTACTTGCATACATGTTATAATGTCACAAATCTTGAAACAAGCTAAGGCTTCATGACAAGTTGACCAGTTCGAATACCGAATAAGGTATGCAAGGTTGACAGCTTACCTTCTTCTGAAGCTCAGAAATGGATTCGTGCATAAGTTGGTTCTGCAAGTACAAAAAATGAAAAAAATTAGTCATTTTTTTAACTATGAAGAGCATATGACTAAAATGCATATACATATTTTGAACATCTCAGTCTAGAATCTGATTTCAGATATACTTTGCTGAACTTCTCTGCAAGTGTTTTTCTTGTTTTTTTATATATTGTGCTGCTGTATCAAATCATTTAAATCAGTACCTTCCTGGATCTGATATGTTTCAGTGAGCTTTCCAGCTGCTGCTCCAGTTGCTGCAACTCCTTGAGATTCAAAGATTCAAGATCCTCTCCCATGAGATGCCTGCAGATGCAGGATGTAGAAATTCAGTCCCATTTATCTTCAGAGTGTATACAATATACTGGCAATCAACCAAAATCGTTACAAATTACTTTTGACATTTCTGTATTGTCTCAACCTTCGCCTTCAGTTTCCTATATTCGTGACACCAGTTTCCCTGCACAAACAAATAACAATCGCATTGCTAACTTTGGGTGGGACAGGAAGGAACAATGACAGAAATGCTACAAATAGGCTAAGTTAGTTCAGAAATGAACAAAGAGACTTTAACAACATTGACACCACCACCACCAACAACAACAACAAAGCCTTTAGTCCCAAACAAGTTGGGGTAGGCTAGAGGTGAAACCCATAAGATCTCGCAACCAACTCATGGCTCTGGCACATGGATAGCAAGCTTCCA
CGCACCCCTGTCCATAGCTAGCTCTTTGGCTTTTGGGTTTCATCTCCATAAGAGTGGCTGAGTTTTTACGTTGGCTCGCCAAGCCTATCACAACCCTCCTCCTTTACCCCGGGGGCATAAATCAGGAAGGATTGTTCTGTGCAGTGAACTTGAATTTTCATAACTTCAAATCAATTATGTCACTGCAACAAGTGTTACTATCAGATGTAAAAATCATATTTCGTAATATTAAAAATGCTGCAGCGGAGGTAAATATTTAAATAAAATTTGGAATATTTTTTGGGTTAGTAATTACATGTTTAAATAGTCCTCGTCATACGTACATGCTGAATCTGTTTGTCAGCAATCAGATAGAACTGGTTGGATCCCTCAGTTGGAGCTACAACTTGGTTTGTGCAAAAATTGGCCAAAAAATTGAAGCGAGGTCCTGGAGTTCCGAGTTACAAAAATAAGAATATTGAGCCTAACTGGATCTTCTAAGTCTGCAGGATCTACTTTTTATCCTAGATGGTTAGCCTTAGACAAGCTGTAATAAGAAACTGCAGCAATATCGCAAACAATACAGAAAAATTGGAGATGAGAACTAATGTCATGGGGCTATCCTTAATGGCCATTAGTAGCATAATACAATCCGAAGGACTGGAGATTCAACAGTTTATCCTAAGAACTTAAGATCAATGACCAATCTAGTAACAAGTCTTAATAATCTAGGATAGCATTTTGTTATTGCAATCAAAATATTGTGATTTTCACTGATGCACACAAGTATTTTAAAGCCATTATAATATATAAGCTTGTCAATTTTTGTTGCTAAGCCCTTCAAAAACTCAGCCGGCTAATCCAGAATTGCTGATCTGGCCATGCATTGCAGTTTACTACGTACTAATTTAGTAACTCTTAAAGTCAGATCACATTTTTGTAGTTCTTCCAAAGGCAGTATGTACGAATCAGCACGCTAGTCTAGTTCAGTAGCCATGAAAGTAAATTGGTCAAGTGGTATTTGGTTACATTATTCGACCATGCCTAAGCATTCATCGAAGAAAAACTGATAATAGGTTACAAGAGAATCAAATAGACCAACAATCCATAGTTGCAAGTCAATTAACTTCATAGCGCATGGAAATCATTCAAAATATAGTGGCCTACAAATAAATTTCAGAACATCTGATATCGGCAACAATTTATTTCATAGCTAAAGGAAAGCAAACCGCTTGTTTTTCATTTTTACCTGAATTTCAGATTCACTTGAAACGAGAACCTTTTCTGCATAAGAATAGCGCTCATACCGTTCAAGAATTTTGTCCATACTGCATAAGAGAAAGAAATATCTAGTCATTGGAAGTACATGTAAACAAGATTAATGTCCAGTATATGCACAATCTCAAGAATTGATATATGATACATTTTGACATATCAGAAGCTTTTCTGTAATGCGTTGTTTCATGTCAGTATATGCAAATGAGATTCAAATGTGGTATACAGCATCTGGTCAAATAGTTTGCAAAGCATTTCACTGTGACGTGTAAATGATAGAAAGAGCACGAAACTGACTTCAAATTAAGGTTCTGGTGTTGTTCAAGTTTTGTTGGGAAAATAGAGTGACCCTTTTCTTCTGTCTATAGACTTAACGAAATTTGCAAAACAGAAGTTTAGGGGTGGAATTGTGCTAGGCAAGGTTAGTATAATTTCTGCCAGATTCAAATTATTTAACTATAGTGCTAGAGCTTCAGTTTAAGAAACGTTCCTTGATGTTGTATTTAAGTCATTAACTTTTTTACTACTCATGTTATTCTCCTCCTATACCTTCTCCCTGTTTTCCCGTAGGCTTTACCCAGCATCCTAACTATTTGTGGAGGTCAACGTGTTCTCTGTCACCACAAACATCGTGTCCTCAATTGTCACAAGTACCCACAACACATGGAAAATTGTCAAACGCTACCCTATAACCTATAGTAGGAATGAACCAACAATTAGTTATAGTGTTGAACTATGTGCTAGCC
TATAACCCGCAAATGGTTACAATGGTCATCAAGTCAAAATCAGTTCAAGAGTGTGCAACAACTCACGTATCCAATCAGCTAGACTCACAAACCGATAATATAGAAGGGTGTGACCAACAACAATTATCTGGGCAAGAGCCATGCCTAGATAAGTAAGACAACACGAATGTGAGAACCATTTTAAAACCTAATGTGTGGGTCCATGCAAAGGTATTGTCAGTACAAATATAGGTGGTAGTAGCAAATTTTCAAATGAAAGCTTTGATGCCAAAGCAGATAAATGCCCGTGTGAAAAGAAAACATGCACATAGTAAGGTGGTCCAAGGGATCCCAAGGGTTAGGCAACCCTAGAAAATAGGCGCGGCTCTTATCGGGGTGACGACACACCGGGTGGTCTTGCTGGTTTAGTTTTGCCTTTGACGAGTGGCTTGTCCACCGGGGTTCCTTATTGTAAGCATCGTGTTATCATGGTGTGAGGATGTTCCATCGGATGCATGTGGTAATTTATAATGGACTACTTGGTGCACCCCTGCAGGGCTAAATCTTTTCGGAAGCCGTGCCCGCGGTTATGTGGCGACTTAAAAATTTATAATATCCGATTTTAGAGAACTTGACACTGAACCCAATTAAAATACACCAACCGCGTGCGTTACGTGATCGTCTCTTTTCCAAGGAGTTCGGGAAGTGAACACGGTGGGGTTATGACTGACTCATAAGTAGTTCAGGATCACTTCTTGATCATTAATAGTTTGCGACCGCTATGTGTAGTTTACTCTTCTTACTCTTGTACTCGTAAGTTAGCCACCATACAAATGCTTAATGCTTCCAGCCTCCTCACCACTTAACCCTTCCATACCCATTAATCTTTGCTAGTTTTGCTACCCTTGGTAATGAGATTGCTGGGTCCCCATGGCTCATAGATTACTACAACAGGTGCAGGTATAGATAAAGCAATGCTTGACGTGAGAGCGATGCATGTTTGCTTTTGGAGTTCTTCTTCTGCTTCTTCGTCGATCATAGGATGGGTTCCAGTTCAGGAGCCCGGGATTAGCAGGGTAGATGTCGTTCTTCTTTTTGTTTGATTTCATCCATAGTCGGATCCTGCTCTCCTGTATGATGATTGTTGTGTATTGATGTATTCATGTTGTAGCTTGTGGCGAGTGTAAACCTTTATCTGGTATTCTCATCTATTTAGTGCATGGTATGTTGTAATGATATCCACCTCGCTATGCGCTCGAAATGCGATTCTGCTCCGATCATGAATTCATCACGTGATCGGGATAGAATCTTGGGTGCTACAAGTTGCTCTGCGCAGCCCCCCTCCAGGACTAGGTCCGTCTGGTCCCTAAAGGGGGGCAAAGGATCCGCGGGCTACTCCTCCACCTTATGCCAAAAATTGAAGGCGAGAGGGAGCTCCTCTGGATGCGCTGTGGACGACAAAGGGGATGGTGGATTGGAGGCGTGTGAGATTGGAATAAGAGGCGTCGAAGATAGCATTGAAGCTAGAGAGCGGAGGGCGCCGGCCTTTTACAGGCCACCGCAGGCGCCATTAGGGAGGCACTTGGGGACGCGGGTGGCAGCTAATGGCGGGTAGTCGAAGGGCGCGCCTCCCTATGAGAAATGCTGCAGCGACCTACTCGCACACCTCCAGTCTGCTCACATGCAAGCAAGAAGATGATGCCGCATGGCGGTCATCCAATGGACCGCCATAAGAATCCGAGGGCGCGGGTTGCTTCAAATCCGACAGGAGGCGCGAGGAGGAAGCTTAAAACTTTAGCGGTTGGTTTTGGGGCGAGCTCCTTACTGGACCTCTCCAACCTCCAAATATGGAGGCCACGAGACCTGTGTATGGAGCGAGATAAAAGAAATATAGCGACGGAAGCCATGAAGACTCTGCTAGAGTTGCTCTAAGACCTAGTTGTTGCATCATCATGAGATGGGAAATCTGATGTACCGTGGAGCAAGGACATCAAATTGTACAATGTACAACTTCAGGAAAT
ACTTGGCATTCAATTTTCCTTTTTGAGAAAAGACATGCACATGGTAAGTTGCTAAGACCAAGGCAAGAGATACAACAACAAAGCGTTTAGTCCCAAATAAGTTGGATAGGGTAGAGCTGAAATCCATAAGATCTCAAAACCAAGTAATGGTTCTGGCGTGTGGATAGCTAACTTCCACACACCCCTGTCCATGACTAGTTCCGTGTGATACTTTAGTGCTTCAGATCTCTCTTCACGGACTCCTTCCATGTCAAGTTTGGTCTAGCCCAACCTCTCTTACATTATTAGCATGTATTAACCGCCTGCTATGCACGAAAGTTTCTAAAGGCCTACTTTGTATATATCCAAGCCATCTCAGACGATGTTCGACAAGCTTCCCTTCAATCGGTGCTACCCCGACTCTACATGTATATCATCATTTTGGACTCGATCCTTTCTTGGGTGACCACACATCCATCTCAAGATGCGCATTTCTGCTACACCCAATTGTTGAATATGTCTCATTTTAGTCGGCCAATACTCAGCGCCATACATTGCGGGGCGCCGTCCTATCGAACCTGCCTTTTAGCTTTCGTGGCACTATCTTGTCAAGAGAGAACACCAGAAACTTGGCATCATTTCCTCCATCCGACTTTGATTTAATGGCTCACATCTTCATCAATATCGGCATCCTTTTCCAACACTGATCCCAAATATCAAAAGGTGCCCTTCTGAGGTACCACCTACCCATTGAGGCTAACCTCCTCGTGCCAAGTAGTACTGAAACCGCACCTCATGTACTCAATTTTAGTTCTACTAAGCCTAAAAACTTTGATCCAAAGTTTGTCTCCATAGCTCTAACTTTCCATTATGCCTCGAGCTACTATCATTGACTAGCACCACATCAATAGTAAAGAGAATACACCATGGAATATCTCCTTGCATATCCCTTGTGACCTCATCCATCATCAAGGCAAAAAGATAAGGGCTCGAAGTTGACCCTTGGTGCTGTCCTATTTTATGCGGAAGGTCATCAGTGTCGCCATCACTTTTTTGAACACTTGTCATGAGATCATTGTACATGTCCTTGATGAGAGTAATGTACTTTATTGGGACTTTGTGTTTCTCCAAGGACCACCACATGACATTTTGCAGTATCTTATTGCAGGACTTTTTCAAGTCAATGAACATCATATGCAAGTCTTCTTTTTGCTCCCTGTATTTCTCCATAAGTTTTCATACCAAGAAAATGACTTCCATGATCAACCTCTCAGGCATGAAAACAAACTGATTTTTGGTCACGCTTGCCATTACTCTTAAGTGGTGCCCAATGACTCCTATAACATTATTCTTTGTAAAGCCTCCCTAATCTCAGACCCCTAGATTCATCTCTCCATTGAATAGCTTGTCGGAGTACTCCTGTCATCTATGCTTGATCTCCTTGTCCTTCACCAGGAGTTGATCTGCTCTATCCTTGATGCATTTGACTTGTTTGACATCCCTCGCCTTCCTCACGATCTTAGCCATCTTATAGATGTCCCTTGCGCCTTCCTTCCCGACCCTCATATGCCCGGGCACAATGTGGTCTTTTGTACTCCCTGGTAAACATTAGTGCACAATGTCCCTTGCGGTTATTCTCTCACTAATTTTTACTAGATGTCCCTTGCGCCTTCCTTCCCGACCCTCATATGCCCGGGCATAACCACTAGTAAAAATTAGTGCAAGAATATGGTCTTTTGTACTCCCTGGTAAACAAATATAAGAGCGTTCAGATCACTCAAGTAGTGACCTAAACGCTCTTATATTTCTTTACGGAGGGAGTATCTTGGTATGGCAAACAAAGATATATTTTTTGCCCTTGACAAAACACATTTTTCAAAGGATCTTCAGAGACTTGAATTTTGAATTTTGATTATCAAAGTTGTAATACATGGCTCTCCACCACAATACCCTCGCGTTTGTAAGATAAAAGTGGTACAGAAAGTACTAGATGATAACCAATCTATGATTAGTGGGTTAGG
AGTGCGGTTGTACTCCCAACATGGAAAAAAGGACAATGAAAAGAAGCCACTTAGGCGCTAAGCAGACTACCGCCTCACTTTTGCCATAGGCCAAGGTTTAGGCGCCCGACTATGTTCTCCATGACCGATGAGATCTTTTTCCGGCGACGGTAGGAAAGGAGAAGCGAATGCAGGGAAGAAGCTGCAGACACCACCATTATCGCTACATACAAATGACGAATGAAAGAGACCTAACGATACAAAGAGACGGGAAAAGGGGGATGCAAAGAAAATGGACGGTCTTGTTTGGGCTATGTTGTCAAGAAAGGTTGAATCAGAGGAGGGATTTCTCACTCCAACTCACCACCGATACCTCCCTCTAGGCTCCGCACCATTGATTTCATTGTTGTAAAATCTCCTTCTTTGTCATACGAAGGCATAAGTGCGCCTAGGCGGTCATTGCACTAGGCATAGGCAAAATTCAGATTAGCGCCTACCGCTTTTTTAGTACCTTGACTCACAGCCCACCAAGGAAGATCGAACCAATGGTTTGACACACTGTGCGGATTATTCGTTCCGTGGGAGGCGACATTCCTGACGTTCCAGAGATCTTGAAGCTCCACAATTTTAGTCCTTACTAAACATTGTGTGAGTGTGTACGTGTGGTGGTAATGTGCACTTGTGTGTGCATCTACGTTGTAATCGGTGTTTCTCAAAAAAGAGATAGTCCTACTACCTTAAGCAGGAAAAGCACACATGTGATATGAAGACTGCATGTATATGTCTTTAATGCGCGTCAATTTGAAAATCAAACGCAAACTATTTGAGTGATGGGTCATAAGGTTTTGCTTAAAAACACACTTTCGTTATTGCTTGTGGTTTCATAATAGTTCATACATAAATTTATGCCATCCAAACTTTTTTATAATAAAAAAGAAATTCAAAAAGTTTTAAAATAAAATGTTAATATTAAAATTTCAAATTATAAAATTTGAGTTCAGAGAATTGTCAGTTGGAAAAAGTAGAAACATAACTTTTGTTCTTAATATTTAAATTTTGAGAATATGAAATTTATGAATAGTGACTGTGCATGTTTGTTGATGCATAGTTTGGTAAGATAAATGGTTGAACAAGAAGGGGGGCACAAAGGTACAAAATAACTAATTTTAACTAATGTAAAGCAATCAAGTTGTAACATAAATAATTATTCATTAATAGACTTCATGGTAAAACCCTTTTTGGCATACAGTCGATACCAACAACAATGACTGTGAGATTACATTTTAAATACATAACATTTCTATCATCTGGGTGGGCAAACACAATGTAAAATAAATAAATAAAGTAACAACCAGTTCGTGACATATCAAGACTATGCCATGAAATCCTGAGGTGTGATTCCATCAGCGGATACGCGAAAGTCGATAGGATCTGCTTACATGTACGTGCCCAGGATCTCCATATATTAGTTAACTTGTAACTGGGAGCTAACAAAATATCACTGCAGCAAAAATGTAACAATATGTGAAATGGTTAATAGATATGACACTTATGAAGTCATCTGCCGAGCTTCAACTATATTGTGTGCCTATTCAGTTTTTCTTCTATTTCATGGAATTAATACATGTAATGAGCAGATATAAACTCTAAATAACGGTTCTAGCTGATCATCTATATGTCGTTTGTGGACTAAGTAAAGATGTGGCAGGAGCAAAAATGGAAGTTGACTATCCCATAGCTTAGTCAGATATTCCTAGGAATAACTTCTGTTTTACTCTCAAGTATATAGGTTTAGGGACACAATAGGTGTAGAGGATAACTGGTTCCCAGACTTGGTTAAGTACTAGACCGACAACAAATGCATGTAACATAGACCCATCACACAGTCCCTAACAGCTTACTTCGTGTTTACTAGTCTAGAGATGAGTAAAGTTGGTTGCAATGTCAGCTTTGGACCTTTGCTATGTGTGGTAAATTAATCGACCGATAAGGAGATGGAGGATATAGAGTGAATAACAATT
GGGTACATTAACCATTATCCAAGCCAGGTCCAGAAAATCAGCAGGCTACATCATTAGCTGCATTTTAGTGTATGCATATGATGCCTGCTCGATCAATGCTGATTTTGTGCCTGAGTCGGTTATATGCAGGCTATAGATGCCTTTTTAGAATATTGTAAGATCTCAAGATTTTAGTTCCGATCCTAAACCATGGAATATCGTGGTGCATCAGCGTGCTGTCTCATAGGCTCTAGACAAAGCATAGCAGTTACTTGTCCCCGTGAGCTACTTACACGTGCCTATCGTGCAGAGGTAGGCAAAGCAAACCAAATGACAAGGAACGACAGAGAAGAAAATAAAGCCGGCAAGCTGATTAATGCATGGTTACCAATTCGGATGTTCATGGCAGTCTTTTCCAAATGAACAGCACGGAAACAGGGGGTAATTTTATTCAGTACTTTGATCTGAGATACTGAAGTTCACAGTGCAGCACTAGCTTAAAATATGACTGCATCTATTTATATCTTAATTTAACCGACTTTGATTCTTACGCGTTTTCATCACGTGACCAGCTTGATACATGCATGCAAGCACCTATGCGTCAAACAACCAAGGAAGCTGAAGGCAAGCCAATAAAACTTTACCTACCCTTTTCCGTTAGCGCCGCGCATACCCTCAGTGGTACAAAACTAGAAGAGATTTAGCACCTCAACATACAGGTCCTTGCCATGAAGCTGATGGTAATGTTGCTTTCTCAGCAAGATTACAAACAACAATGGATATTTACTAAGCAGTAGCGATCCTTTCGACGGTATGATATATCTTACTTTCAAACCGTGGAGGCTGATCTAGTACTTCGCCGGACGGGAATTATGGAAGCAGACAGAAATGGGAAACTAACATGGACTAATGTCAGCGTGTGGGGGAAATAATGGATCAGTCTCCGTGATTTTATTGACTCCTTTACTTTGTGAAGAACCATCGCCAGCACCAACAAATATCACGGGCAAACTATGATGGTTGAAAGGGAAAAAAAGTATGCAGAAACTCTTATTTTATATTTGGAAGGAACTCTCTACTTTTTGGTTTGACTCTTCATGCCGGAGTTTCATCAGCTGCAGACAATAATTCCATGCATTAGCTGGTCAAGTGACTCTTTTCTCCGTGCGTGCCGTAGACAGTATCTCCAAGCAAGCAACAAAATGGCCGCATGGCCGGAGCTACCAATATGAATGATAGCAAAGCCTGTGGAACAAAATTCTGGAAATAACCGAGATGACGAGGTTCATTTGTTCAAGCAGCAAGCACAGATTTGATGAGGGAGACAACAAGGTATTGTAGCGTCTAGTTAAGATTCCAAATGAGAAGATGAGGTAGCATGCTAAATTGCGATGTCACTTTAACCCCATATATATTTTTTACGTAAATAACCCCAGCCCGATAGAAATATCAAGAGATTACCGTCTTAACCCTTCCACTTGGGGTGCCACCTGGCGCGTCCGGGACCAAAATTTCCCCAAAATACGCCAGATGAACAACCTTCCCGGCGCTGGCGGAGCATAAAATAGCCCGGCGACCCATCGCGCGCCCTACCCCCCATAGCAAGAAATCGCGGCCGCGCGGAACGCCAGCTCCCCCATCGATTCTAAGAAAAACGAGCAAGCAGCCACGGCAGAATCGGATGGAAACAGCGAGCGGGCAAACGGAATCTACCAAACAGCAGCGCGGAAATGGAGAAGGCCCTGGGGTCGGCCGGGTCGTCTCGCAGGACCAGAACGGTGCGCAGGAAATCGAAATCGAAGGCGTATTGGGGAACAAATTTAAAGACAGCGCGTGCTTAATTTACCATGACTCGGTGGAGAACTCGTAGAGCTTTCCCTTGGTGGAGAAGATGATGAGGCCGACCTCGGCGTCGCAGAGCACGGAGATCTCGTGCGCCTTCTTGAGAAGCCCCGAGCGGCGCTTGGAGAAGGTCACCTGCCGGTTGATCTTGTTCTCGATCCGCTTCAGCTGCACCTTCC
CCCGCCCCATCTCCGCTCGAGAACCGGGCCAACCCTACGCCCCTACCCTCCAACACCGGCAGACCGCGACGGCTACTCCGACTGGCGCAGGCGGAGGCGAGGCGGCGGAGCCATGGCTATCAGGTGGTTGGGTGAGGACGTGAGGTGGAAGAGAGGGGAGGAGAGGGAGGATGGCCAGGCCAAAACGAGGATTCCGGCAGGGGGGGAGGGGTTTTTAAAGGGATCTGGCCCCGAGCGCGGTATGCAATACCGGGCTGGTCTGGAGCATAGCCGCTGTCGAGCACCCATTGGCCCGGCCCGCTTTGCGGGGCCCCGCGGTCCTGCAGCCACACGATGCCCCACCCCGGGCCCGCCGGGCGTCGACGTGTCGAACGATCGTACGTGCGAATCTCCGGATTTTGCTTTCCCCAAATCATTTTCTCTCGCCACAAACACACACCACAGAGCAAAAAAACGAGCAGAATTTTTCCTTTCATAATATTCTACTGTACCCTAGGCCTAGAAGAAGGGAAAGAGCGGAGTTTTTTCCTTTAGCATTTCCCCTTTTTCCTTTTTTGCTTTTTTGGTCACGGCGGCGGGGGACGAAAGAGGAAATGCTGGCTGGCTAGGTCAGGGCGGCTCGGGACGGGTCTGGTCCGTCCAGGGAAAATGGAATGGAAGGCGAGATGACGGTCGTGGGGGCGGGCTGCCACAGTAGGAGGCTGCTTTCCTTGCTCGTTTTGGGCCGTCTCGCTTGCCCCGTTTGGGAGAGGTCGGCTGCGGGGCGGGCGGGCGGGCGCGGCGGGCCGCCCGGGTCGTTCGCGCCGCGTCCATCGCGTGCATGATTGAGCTCGACTTTTGACATGAAGCGTTGCTGCAGTGCAGTGCAGTGCGCTGCGTGGATCGAGTGGGCGCAGTGTATTTTTCTACTTTTGGCCGACCCGCGTATTTCTGGAATGAATGAAAGATACTCGCCAGTCCTAGGAGTAGGGGTACTGTGCTGTGCGTGTATACCGACCAGCCATCCATCAATACACTGTTGTCAAGCGTCTCATCGTGGGGTTGTCCGACTCCATACCAATTATATACTCCAGAAACGGTTCGTGTATTTTTCTTTATATATGGAGAAAAAAAACGCTCTATAGAGAACTGTACATGTGATATATCTGCACTTCTGCAGTGACCAGCTTCTCGATATATAGGGCCGTGCTGAAACCATTTGCAGCACTAAAATGGTTTCTACACTAAAACTGCTGTTGGGCCTAATACAACATTTAGTAGTTCAGTGTTCGAACATGTGAATCATGACCCATTTTATTGTTTATGCTGACAAACAAACAAAAAGCCCATATGCATGCTTGTCAAATGACCACTTTGCAACCATTTGACGTAACCAGCAACCACCAATGCTGCTCTTTATGACCGATGATGACAGATCTCGCAATCATATCATGCACATCTCTTTTCTTTTCTTTTTTTTCTTTGCGAATATGCCATGTACATCTTTTTTCTTTTTCTTTATTCTGCAAATATGCCATGTATTAACTCCGTCCCAAATTATTTATCTTAAATTTGTCTAGATACCGAGATATCTGACACTAAAACGTGTCTAGATACTATATCTAGACAAATGTAAGACAACTAATTCGAAACATCTCTGCGATTGAAAATATACAAACATGAACATCCCTAAACTTTGCGGTGTATCTCCAAGAATGCAAAAACGATATTTAGATCACACGCACTCTATATAATCAGTACTTCAAAAACATGTCAGAGTTCCAAATGCTTGTGAAAAAATGGGTATAGCAGATATAATAATTTCTAGTTTACTGTAACTTAAAGACTATTTCAGGCAAGTTAGTGATTACTTGGTTTCCTATTTATCTTTAGCTCACGGATGATCCTTTTATTGCAAATTTTGCATGTATATGGACCTACCTATTATCTAAGCTAAGTTTTCACAATTGTTAGAACATCTCCACTCGTTCGGCCCCAGGGGCTAGAAATAGCGCCGTCCT
GGGGGAGTGCCGGCTGAAATATCGGCCTGGGGGCGATCTGGTTCCCAGCCGCCGTCCTCAGGTGCTGATTTTGGCCCACTTTCAGTGCAAATTGGCCCACTTTTCAGCTCATTTTGGGCGCAAATTACCCCACTATCGGCGCAAATTGGCCCACTATCGGCCGGTATTCGGCGTGCTTCGGCACAAATTCAACAGAAGCTATTTTTTTGTCACGTAGTTCATCATAGAAAATCAATAGAAATCAAATAGTTCAATACAAATAATATAGTTCAACAAATAAAACATCACACGTCGAACTAGGCGTTACCCTTGAGCCTCCATAGGTGCTCCACTAGATCCTGCTGCAGTTGTTGATGCACCAGTGGGTCTCGGATCTCCTGACGCATATTGAGGAAG'
# Define the tissues/cell types to predict expression for.
# I don't have equivalent data for wheat, so I reused the human ontology terms
# from the AlphaGenome tour notebook.
ontology_terms = [
    'UBERON:0001159',  # Colon - Sigmoid.
    'UBERON:0001155',  # Colon - Transverse.
]
# Run the model on the raw 16KB sequence and request the expression and splicing outputs.
output = dna_model.predict_sequence(
    sequence=sequence,
    requested_outputs=[
        dna_client.OutputType.RNA_SEQ,
        dna_client.OutputType.SPLICE_SITES,
        dna_client.OutputType.SPLICE_SITE_USAGE,
        dna_client.OutputType.SPLICE_JUNCTIONS,
    ],
    ontology_terms=ontology_terms,
)
# Manual annotation for the transcript on the minus strand. Coordinates are the
# chr5A positions minus 570,000,000, labelled 'chr22'.
exons = [
    genome.Interval('chr22', 17411454, 17411932, '-'),
    genome.Interval('chr22', 17412098, 17412210, '-'),
    genome.Interval('chr22', 17412303, 17412344, '-'),
    genome.Interval('chr22', 17412496, 17412537, '-'),
    genome.Interval('chr22', 17412731, 17412830, '-'),
    genome.Interval('chr22', 17412920, 17412984, '-'),
    genome.Interval('chr22', 17414459, 17414537, '-'),
    genome.Interval('chr22', 17423056, 17423416, '-'),
]
cds = [
    genome.Interval('chr22', 17411824, 17411932, '-'),
    genome.Interval('chr22', 17412098, 17412210, '-'),
    genome.Interval('chr22', 17412303, 17412344, '-'),
    genome.Interval('chr22', 17412496, 17412537, '-'),
    genome.Interval('chr22', 17412731, 17412830, '-'),
    genome.Interval('chr22', 17412920, 17412984, '-'),
    genome.Interval('chr22', 17414459, 17414537, '-'),
    genome.Interval('chr22', 17423056, 17423240, '-'),
]
start_codon = [genome.Interval('chr22', 17423240, 17423243, '-')]
stop_codon = [genome.Interval('chr22', 17411824, 17411827, '-')]

# Build a Transcript from the pieces above.
my_transcript = transcript.Transcript(
    exons=exons,
    cds=cds,
    start_codon=start_codon,
    stop_codon=stop_codon,
    transcript_id='TraesCS5A02G391700.1',
    gene_id='TraesCS5A02G391700',
    protein_id='TraesCS5A02G391700.1',
    uniprot_id=None,
    info={'gene_name': 'VRN-A1'},
)
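# Plotting interval, corresponding to chr5A:587409243-587425627 (chr5A position
# minus 570,000,000). It is already 16,384 bp wide, so resize() should leave it
# unchanged; the call is kept as a guard.
interval = genome.Interval('chr22', 17_409_243, 17_425_627).resize(
    dna_client.SEQUENCE_LENGTH_16KB
)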
import dataclasses

# Attach the plotting interval to each track. The objects are replaced rather
# than modified in place.
new_rna_seq = dataclasses.replace(output.rna_seq, interval=interval)
new_splice_sites = dataclasses.replace(output.splice_sites, interval=interval)
new_splice_site_usage = dataclasses.replace(
    output.splice_site_usage, interval=interval
)
new_output = dataclasses.replace(
    output,
    rna_seq=new_rna_seq,
    splice_sites=new_splice_sites,
    splice_site_usage=new_splice_site_usage,
)
# Build the plot: the transcript annotation is repeated above each track group.
plot = plot_components.plot(
    [
        plot_components.TranscriptAnnotation([my_transcript]),
        plot_components.Tracks(
            tdata=new_output.rna_seq,
            ylabel_template='RNA_SEQ: {biosample_name} ({strand})\n{name}',
        ),
        plot_components.TranscriptAnnotation([my_transcript]),
        plot_components.Tracks(
            tdata=new_output.splice_sites,
            ylabel_template='SPLICE_SITES',
        ),
        plot_components.TranscriptAnnotation([my_transcript]),
        plot_components.Tracks(
            tdata=new_output.splice_site_usage,
            ylabel_template='SPLICE_SITE_USAGE',
        ),
    ],
    interval=interval,
    title='Predicted RNA-seq and splicing tracks for the sequence',
)
plt.show()
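A note on the coordinates in the code above: the exon, CDS, and plotting intervals are the chr5A positions with 570,000,000 subtracted and a `'chr22'` label. Typing shifted coordinates by hand is an easy place to slip, so a small conversion helper may be safer. This is a minimal sketch under that assumption; `OFFSET`, `to_plot_coords`, and `to_genome_coords` are hypothetical names, not part of AlphaGenome.

```python
OFFSET = 570_000_000  # chr5A position minus plotting position, as used in this post


def to_plot_coords(start: int, end: int, offset: int = OFFSET) -> tuple[int, int]:
    """Shift genome coordinates into the plotting frame."""
    if start - offset < 0:
        raise ValueError("offset is larger than the start coordinate")
    return start - offset, end - offset


def to_genome_coords(start: int, end: int, offset: int = OFFSET) -> tuple[int, int]:
    """Shift plotting coordinates back to genome coordinates."""
    return start + offset, end + offset


# Gene region quoted at the top of the post.
print(to_plot_coords(587_411_454, 587_423_416))  # (17411454, 17423416)
# Plotting window back on chr5A.
print(to_genome_coords(17_409_243, 17_425_627))  # (587409243, 587425627)
```

Keeping the annotation in real chr5A coordinates and shifting only at plot time would also make it easier to cross-check exon boundaries against the wheat reference annotation.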