How could AlphaGenome predict Transcriptome from a Genome ❓

jiehuang001 · September 6, 2025, 11:47pm

Hi, guys:

Based on the central dogma, it seems quite easy and straightforward to infer the mRNA sequence and therefore amino acid sequence from a DNA sequence. However, due to alternative splicing, this task becomes uncertain.

I have been using this https://github.com/ikmb/**vcf2prot** tool to infer amino acid sequence from DNA variation data (in VCF format). For example, below is my script to infer the ABO protein amino acid sequence for the ~3,000 individuals with high coverage sequencing data from the 1000 genomes project.

chr=9; pos1=133233278; pos2=133276024; prot=ABO # GRCh38 position for ABO gene
loc=$chr:$pos1-$pos2

bcftools view 1kg/high_coverage/chr$chr.vcf.gz -Ou --regions chr$loc | bcftools annotate -Oz -o $prot.vcf.gz --rename-chrs chr.prefix; tabix -f $prot.vcf.gz

bcftools csq $prot.vcf -Ov -o $prot.csq.vcf.gz -f Homo_sapiens.GRCh38.dna.chromosome.$chr.fa -g Homo_sapiens.GRCh38.114.chromosome.$chr.gff3.gz; tabix -f $prot.csq.vcf.gz

vcf2prot -f $prot.csq.vcf -r software/vcf2prot/examples/reference_sequences.fasta -v -g st -o out

The output .fasta files for the first two individuals are posted at my Github foler 001/alphag at master · jielab/001 · GitHub .

Somehow, the second person has an empty HG00097.fasta file. The VCF2PROT software developer said that this is because this person has NO mutations in any of the missense SNPs in the ABO gene, compared to the reference genome. Therefore, his/her amino acid sequence for ABO is identical to that of the reference. But in reality, how could this be even possible, since this HG00097 person is certainly not a twin of the human genome reference?

As shown in HG00096.fasta, the first person has 4 transcripts in two sets. One set has 353 amino acids, and the other has 373 amino acids. So far, I have no way to be sure whether it is 353 or 373. Should I do a RNAseq to confirm?

Will AlphaGenome be able to perform this task and potentially output a more definite result?

Thank you & best regards,

Jie

Jun_Cheng · September 18, 2025, 7:02am

Hi,

We don’t predict which transcript isoforms will be produced. Please read what AlphaGenome predicts GitHub - google-deepmind/alphagenome: This API provides programmatic access to the AlphaGenome model developed by Google DeepMind. or read the paper for more detailed information.

jiehuang001 · September 20, 2025, 3:05am

Thanks, Jun!

I managed to use Chat-GPT to write the following code 001/alphag/alphaG.py at master · jielab/001 · GitHub

It actually works, to convert an input DNA .fa file to a AA .fa file.

I hope other users might find this code useful, once you see no problem with it.

Best regards,

Jie

Topic		Replies	Views
DNA methylation data for gene expression prediction Feedback & Feature Requests	2	270	March 2, 2026
How can I add the pig genome Help & Support	2	1485	July 14, 2025
Use alphagenome prediect wheat genome Feedback & Feature Requests testing	2	1654	February 5, 2026
Uploading sequences to AlphaGenome Help & Support	5	1973	July 17, 2025
Can AlphaGenome cover the function of AlphaFold? Help & Support testing	1	655	November 16, 2025

How could AlphaGenome predict Transcriptome from a Genome ❓

Related topics