MFASS evaluation clarification

Hi all,

In the methods section of the AlphaGenome pre-print, in the subsection on MFASS (Massively Parallel Assay of Splicing Sequences) on page 47, it is stated that “Variants with mislabelled strand information were removed”. Variants are usually recorded relative to the forward strand, so no strand information is needed. However, the exons tested in MFASS do have strand information. Does this mean that the variants provided by MFASS were not in the typical VCF-style format and were instead already oriented to the respective strand?

Additionally, it seems that these variants were not removed in the MMSplice analysis that is linked in the same methods section of the pre-print. Could you clarify which variants were mislabeled?

Thanks,
Matthew

Hi Matthew,

The MFASS data is not in VCF format; it is a table listing each variant and the exon tested, including the strand. We remove a variant if the sequence we obtain from the genomic coordinates provided by MFASS and the reference FASTA file differs from the nat_seq column they provided. The predictions are made with reference sequences generated from the standard reference FASTA and the MFASS coordinates. This differs from MMSplice, which used the sequence from the nat_seq column directly. Let me know if you have further questions.
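
As a rough illustration of the filter described above (a sketch only, not the actual AlphaGenome code): the idea is to rebuild each sequence from the reference using the MFASS coordinates and strand, then keep the row only if it matches the sequence MFASS provides. Everything except the nat_seq name is an assumption here: the `chrom`, `start`, `end`, `strand` and `id` field names are hypothetical, coordinates are assumed to be 0-based half-open, minus-strand sequences are assumed to be reverse-complemented, and a small dict stands in for the reference FASTA.

```python
from typing import Dict, List

# Complement table for reverse-complementing minus-strand sequences.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reference_sequence(genome: Dict[str, str], chrom: str,
                       start: int, end: int, strand: str) -> str:
    """Extract [start, end) from the reference, oriented to `strand`.

    Assumes 0-based half-open coordinates (an assumption, not taken
    from the MFASS table itself).
    """
    seq = genome[chrom][start:end]
    return reverse_complement(seq) if strand == "-" else seq


def filter_consistent(rows: List[dict], genome: Dict[str, str]) -> List[dict]:
    """Keep rows whose reference-derived sequence equals nat_seq."""
    kept = []
    for row in rows:
        ref = reference_sequence(genome, row["chrom"], row["start"],
                                 row["end"], row["strand"])
        if ref.upper() == row["nat_seq"].upper():
            kept.append(row)
    return kept


# Toy stand-in for the reference FASTA (hypothetical data).
genome = {"chr1": "AAACCCGGGTTT"}

rows = [
    # Plus strand, sequence matches the reference: kept.
    {"id": "a", "chrom": "chr1", "start": 2, "end": 8,
     "strand": "+", "nat_seq": "ACCCGG"},
    # Minus strand, sequence is the reverse complement: kept.
    {"id": "b", "chrom": "chr1", "start": 2, "end": 8,
     "strand": "-", "nat_seq": "CCGGGT"},
    # Labelled minus, but the sequence is the forward one: removed.
    {"id": "c", "chrom": "chr1", "start": 2, "end": 8,
     "strand": "-", "nat_seq": "ACCCGG"},
]

kept_ids = [row["id"] for row in filter_consistent(rows, genome)]
```

In this toy example, row `c` is dropped because its strand label does not agree with the sequence recovered from the reference, which is the kind of inconsistency the check is meant to catch. A tool that reads the nat_seq column directly never performs this comparison, so it would not exclude such rows.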
