Hello AlphaGenome community,
I am currently using the library to model genetic variants and have a question regarding the coordinate system and strand orientation used in the genome.Variant function.
I am working with a gene located on the reverse strand (negative strand). When defining a multi-base substitution (e.g., 20bp), I want to ensure I’m following the library’s expected conventions. Could you please clarify the following:
- Position Indexing: Does the position parameter always default to the forward-strand coordinate system, specifically requiring the minimum numeric index of the affected range, regardless of the gene's actual orientation?
- Reference Bases: For a gene on the negative strand, should the reference_bases be the sequence as it appears on the forward strand (i.e., the reverse complement of the negative-strand sequence)?
Thank you very much for your time and help.
Hi there,
Thanks for reaching out.
Yes, the position parameter in genome.Variant refers to the forward strand and uses 1-based indexing. For a 20bp substitution, position is the forward-strand coordinate of the first base of that block, i.e. the smallest coordinate it covers. Note that AlphaGenome uses 0-based indexing for its output tracks and intervals (like BED files or Python), whereas the position attribute for specifying variants is 1-based to maintain compatibility with common public variant formats (see documentation here). The reference and alternative sequences of genome.Variant are also given on the forward strand.
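As a rough illustration of these conventions, here is a minimal pure-Python sketch that converts a substitution read off the negative strand into forward-strand fields. The helper names (reverse_complement, forward_strand_variant_fields) are hypothetical and not part of AlphaGenome; the sketch assumes you already know the 0-based (BED-style) start of the affected block on the forward strand.

```python
# Complement table for uppercase DNA bases (N maps to N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def forward_strand_variant_fields(start_0based: int, minus_ref: str, minus_alt: str) -> dict:
    """Hypothetical helper: express a minus-strand substitution in forward-strand terms.

    start_0based: 0-based start of the affected block on the forward strand
                  (the smallest genomic coordinate it covers).
    minus_ref / minus_alt: bases as read 5'->3' on the minus strand.
    """
    if len(minus_ref) != len(minus_alt):
        raise ValueError("a substitution must keep the same length")
    return {
        # 1-based coordinate of the first forward-strand base of the block.
        "position": start_0based + 1,
        # Forward-strand bases are the reverse complement of the minus-strand read.
        "reference_bases": reverse_complement(minus_ref),
        "alternate_bases": reverse_complement(minus_alt),
    }


if __name__ == "__main__":
    # A 4bp example; a 20bp block works the same way.
    print(forward_strand_variant_fields(999, "AAAC", "TTTG"))
```

The resulting values could then be passed to genome.Variant together with a chromosome name, assuming the constructor takes position, reference_bases and alternate_bases keyword arguments as described in the AlphaGenome documentation.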
You can still create a genome.Interval on the negative strand: AlphaGenome's internal logic flips the interval to the positive strand, inserts the genome.Variant reference/alternative bases, then reverse complements the result back to the negative strand to pass into downstream functions like predict_variant or score_variant.
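To make the flip/insert/flip-back idea concrete, here is a simplified model in plain Python. It is not AlphaGenome's actual implementation; the function apply_variant_on_strand is hypothetical and assumes you have the interval's bases on the forward strand, a 0-based interval start, and a 1-based forward-strand variant position.

```python
# Complement table for uppercase DNA bases (N maps to N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def apply_variant_on_strand(forward_seq: str, interval_start: int, strand: str,
                            position: int, ref: str, alt: str) -> str:
    """Hypothetical model of inserting a variant into a stranded interval.

    forward_seq: the interval's bases on the forward strand.
    interval_start: 0-based start of the interval.
    position: 1-based forward-strand position of the variant's first base.
    ref / alt: forward-strand reference and alternate bases.
    """
    # Convert the 1-based genomic position to a 0-based offset inside the interval.
    offset = position - 1 - interval_start
    if offset < 0 or offset + len(ref) > len(forward_seq):
        raise ValueError("variant falls outside the interval")
    if forward_seq[offset:offset + len(ref)].upper() != ref.upper():
        raise ValueError("reference bases do not match the forward-strand sequence")
    # Work on the forward strand: swap in the alternate bases.
    mutated = forward_seq[:offset] + alt + forward_seq[offset + len(ref):]
    # For a negative-strand interval, flip the result back to that strand.
    return reverse_complement(mutated) if strand == "-" else mutated


if __name__ == "__main__":
    print(apply_variant_on_strand("AAGTTTCC", 100, "-", 103, "GTTT", "CAAA"))
```

Because the variant is always matched and applied on the forward strand, the same Variant works unchanged whether the interval is on the positive or negative strand; only the final orientation of the sequence differs.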
Kind regards,
Tumi