Uploading sequences to AlphaGenome

CRUZ_DE_CASAS_Paulin · July 1, 2025, 5:35pm

Hi everyone! I am wondering whether it is possible to upload a WT vs a modified sequence (e.g. mutation or missing 500-1000bp) to analyze the gene expression and open chromatin predictions?
I have been reading the tutorials, quick guide, etc., but I get the impression that I cannot upload the sequence itself?
Thank you!

tward · July 2, 2025, 10:09pm

Hello,

If you can express the modification as a Variant, you can make a prediction via predict_variant.
Otherwise you should be able to use predict_sequence to make two predictions (one for the WT and one for the modified sequence) and then compare any changes in expression. For example:

from alphagenome.models import dna_client

model = dna_client.create('my_key')

reference = 'A' * dna_client.SEQUENCE_LENGTH_2KB
alternate = 'G' * dna_client.SEQUENCE_LENGTH_2KB

ref_predictions = model.predict_sequence(
    reference,
    requested_outputs=[dna_client.OutputType.RNA_SEQ],
    ontology_terms=['UBERON:0001496'],
)

alt_predictions = model.predict_sequence(
    alternate,
    requested_outputs=[dna_client.OutputType.RNA_SEQ],
    ontology_terms=['UBERON:0001496'],
)

You can then do e.g. ref_predictions.rna_seq - alt_predictions.rna_seq to compute the diff.
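To rank where the two predictions differ most, something like the following could work. It is only a sketch: `largest_changes` is a hypothetical helper, not part of AlphaGenome, and it assumes you have pulled each prediction out as a plain nested list of shape (positions, tracks), for instance from the track data's `.values` array via `.tolist()`. It reports alt minus ref, so positive numbers mean higher signal in the modified sequence.

```python
def largest_changes(ref_values, alt_values, top_n=3):
    """Return the top_n (position, track, alt - ref) entries by absolute change.

    ref_values / alt_values: nested sequences of shape (positions, tracks).
    Hypothetical helper for illustration only.
    """
    changes = []
    for pos, (ref_row, alt_row) in enumerate(zip(ref_values, alt_values)):
        for track, (ref, alt) in enumerate(zip(ref_row, alt_row)):
            changes.append((pos, track, alt - ref))
    # Largest absolute difference first.
    changes.sort(key=lambda change: abs(change[2]), reverse=True)
    return changes[:top_n]


if __name__ == "__main__":
    ref = [[1.0, 2.0], [3.0, 4.0]]
    alt = [[1.0, 2.5], [0.0, 4.0]]
    for pos, track, delta in largest_changes(ref, alt, top_n=2):
        print(f"position {pos}, track {track}: change {delta:+.2f}")
```

For full-length predictions a vectorised NumPy version would be much faster; the loop form is just easier to read.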

Hope that helps!

CRUZ_DE_CASAS_Paulin · July 16, 2025, 8:40pm

Hi! Thanks a lot for the response
I basically have a 500 000 bp sequence as WT and an altered one of 499 600 bp. Shall I just upload them as follows?
WT_reference = “copy paste the 500 000 bp sequence” * dna_client.SEQUENCE_LENGTH_0.57KB
P_altered = “copy paste the 499 600 bp sequence” * dna_client.SEQUENCE_LENGTH_0.49MB

Thanks a lot again!

tward · July 16, 2025, 10:10pm

Not quite: you’ll need to pad the sequence strings to be a supported length, which you can do using Python string methods. So for your example:

WT_reference = "GATTACA"
WT_reference = WT_reference.center(dna_client.SEQUENCE_LENGTH_500KB, 'N')

P_altered = 'GATTACA'
P_altered = P_altered.center(dna_client.SEQUENCE_LENGTH_500KB, 'N')

This pads both the WT and P_altered sequences with N’s to a supported 500KB length. You can then make the two predictions and compare them using e.g. OverlaidTracks (see the "Visualizing predictions" page of the AlphaGenome documentation for examples).
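If it helps, here is a small sketch of the same padding idea that also picks the target length automatically and reports how far the sequence was shifted. `pad_to_supported_length` is a hypothetical helper, and the lengths in `SUPPORTED_LENGTHS` are my understanding of the `dna_client.SEQUENCE_LENGTH_*` values, so please check them against your installed version.

```python
# Assumed values of dna_client.SEQUENCE_LENGTH_2KB ... SEQUENCE_LENGTH_1MB.
# Verify against the installed alphagenome package before relying on them.
SUPPORTED_LENGTHS = (2_048, 16_384, 131_072, 524_288, 1_048_576)


def pad_to_supported_length(sequence, pad_char="N"):
    """Centre `sequence` in the smallest supported length that fits it.

    Returns (padded_sequence, left_offset), where left_offset is the number
    of pad characters added on the left. Hypothetical helper.
    """
    for length in SUPPORTED_LENGTHS:
        if len(sequence) <= length:
            total_pad = length - len(sequence)
            left = total_pad // 2
            right = total_pad - left
            return pad_char * left + sequence + pad_char * right, left
    raise ValueError(
        f"sequence of {len(sequence)} bp exceeds the largest supported length"
    )


if __name__ == "__main__":
    wt = "A" * 500_000        # stand-in for the real WT sequence
    altered = "A" * 499_600   # stand-in for the real altered sequence
    wt_padded, wt_offset = pad_to_supported_length(wt)
    alt_padded, alt_offset = pad_to_supported_length(altered)
    print(len(wt_padded), wt_offset)
    print(len(alt_padded), alt_offset)
```

One thing to keep in mind: because the two sequences differ by 400 bp, centring each one separately shifts them by different amounts (200 bp apart in this case), so the same base sits at different indices in the two predictions. Keeping the offsets around makes it possible to realign tracks before comparing.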

Riccardo_Fratti · July 17, 2025, 8:56am

Thanks again for your great work! Just a quick clarification:

If I format modifications as Variant, is it possible to combine multiple variants together for a single prediction (i.e., one forward pass within same context window)? Or in that case, is it better to use predict_sequence instead?

Also, when using Variant, does the model still reconstruct the full sequence internally? I assume yes — but if not, is there any known difference in either speed or output performance compared to predict_sequence?

Thank you again

tward · July 17, 2025, 1:05pm

If I format modifications as Variant, is it possible to combine multiple variants together for a single prediction (i.e., one forward pass within same context window)?

No, our variant pipeline can only support single position edits. For more complicated variants, as you said, it’s best to use predict_sequence, which gives you full control of the DNA to pass to the model.
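As a rough illustration of building such a sequence yourself, the sketch below applies several edits to a reference string before it would be handed to predict_sequence. `apply_edits` and its (start, ref, alt) tuple format with 0-based starts are assumptions made for this example, not part of the AlphaGenome API.

```python
def apply_edits(sequence, edits):
    """Apply several non-overlapping edits to `sequence`.

    edits: iterable of (start, ref, alt) with a 0-based start into the
    original sequence. `ref` must match the sequence at that position;
    use ref="" for an insertion and alt="" for a deletion.
    Hypothetical helper for illustration only.
    """
    result = sequence
    previous_start = len(sequence)
    # Work right to left so earlier coordinates stay valid after each edit.
    for start, ref, alt in sorted(edits, key=lambda e: e[0], reverse=True):
        end = start + len(ref)
        if start < 0 or end > previous_start:
            raise ValueError(f"edit at {start} overlaps another edit or is out of range")
        if sequence[start:end] != ref:
            raise ValueError(f"ref {ref!r} does not match sequence at {start}")
        result = result[:start] + alt + result[end:]
        previous_start = start
    return result


if __name__ == "__main__":
    wild_type = "GATTACA"
    # One substitution (A->C at index 1) and one 2 bp deletion (TA at index 3).
    edited = apply_edits(wild_type, [(1, "A", "C"), (3, "TA", "")])
    print(edited)
```

The edited string would then still need padding to a supported length before prediction.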

Also, when using Variant, does the model still reconstruct the full sequence internally?

Yes, internally we read the sequence from the organism’s FASTA file, apply the variant bases, and then run the forward pass twice (once for REF and once for ALT).

There is a subtle difference for splicing: with a Variant, we use the union of predicted splice site positions and RNA-seq data for the specific interval to generate splice junction predictions, which we can’t do for predict_sequence because we don’t know the genomic region. In practice this doesn’t make a huge difference to splicing performance, but it is worth being aware of.
