Greetings dear AlphaGenome community,
First of all, I'm an undergraduate biology student, so I apologize in advance if this question seems basic because of gaps in my knowledge of biology or bioinformatics.
I'm trying to test the effect of a single-nucleotide variant (SNP) from ClinVar using AlphaGenome, but I noticed something confusing. Usually, to locate a gene, I use the web-based "Genome Browser". For example, if you search for "HOXA13", the location is chr7: 27_194_364 - 27_200_091.
However, if you run the following code with AlphaGenome, you get a different result:
interval = gene_annotation.get_gene_interval(gtf, gene_symbol='HOXA13')
interval
OUTPUT RESULT: Interval(chromosome='chr7', start=27193502, end=27200091, strand='-', name='HOXA13')
The interval returned by AlphaGenome is slightly different: the end coordinate matches, but the start is 862 bp lower (27_193_502 instead of 27_194_364). Which coordinates should I use when I define a variant, as in the script below? My SNP in ClinVar is located at chr7: 27_102_230 (GRCh38). Should I use that exact position and assume AlphaGenome will evaluate exactly the locus I want?
variant = genome.Variant(
    chromosome='chr7',
    position=27102230,
    reference_bases='A',  # Can differ from the true reference genome base.
    alternate_bases='G',
)
One more question: HOXA13 is transcribed from the minus strand. If ClinVar reports the variant as A>G, should I enter it exactly as A>G, or should I use the plus-strand equivalent (T>C) when creating the Variant object?
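To make the strand question concrete, here is a minimal sketch in plain Python of how I would work out the plus-strand equivalent of a minus-strand allele. The helper name `reverse_complement` is my own and is not part of the AlphaGenome API; the snippet only shows the conversion and does not settle which form the Variant object expects.

```python
# Hypothetical helper for illustration only; not part of the AlphaGenome API.
_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement(bases: str) -> str:
    """Return the same DNA sequence as read from the opposite strand."""
    # Complement every base, then reverse so the result reads 5' to 3'.
    return bases.translate(_COMPLEMENT)[::-1]


# The alleles as I would read them on the minus strand (A>G).
minus_ref, minus_alt = 'A', 'G'

# The same change expressed on the plus strand.
plus_ref = reverse_complement(minus_ref)
plus_alt = reverse_complement(minus_alt)

print(f'{minus_ref}>{minus_alt} on the minus strand is '
      f'{plus_ref}>{plus_alt} on the plus strand')
```

For single bases this is just the complement; the reversal only matters for alleles longer than one base, such as indels.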
Thank you all for your time and help, and again, sorry if my question is a bit basic.