Seeking Guidance on Relevant AlphaGenome Features for RNU4-2 Variant Interpretation

Hi all,

We are working on variant interpretation for RNU4-2, a very small spliceosomal snRNA where pathogenic variants cluster within very short structural elements. Our current AlphaGenome-based model uses only five features and fails to correctly classify known ClinVar-pathogenic variants in this gene.

From the scoring script, the features we currently derive are:

  1. center_mask_atac (CenterMaskScorer on OutputType.ATAC)
  2. H3K4me3 (from OutputType.CHIP_HISTONE, histone_mark = "H3K4me3")
  3. H3K27ac (from OutputType.CHIP_HISTONE, histone_mark = "H3K27ac")
  4. gene_lfc (GeneMaskLFCScorer on OutputType.RNA_SEQ)
  5. splice (CenterMaskScorer on OutputType.SPLICE_SITES)
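
For reference, the same five features can be written as a small spec table. This is only a sketch: it uses plain strings that mirror the names listed above rather than the real AlphaGenome objects, and `FeatureSpec` and `by_output_type` are hypothetical helpers, not part of any library. The scorer for the two histone features is left as `None` because only the output type and mark are listed for them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FeatureSpec:
    """One feature from the list above, described with plain strings."""
    name: str
    scorer: Optional[str]        # scorer class name as listed; None where not stated
    output_type: str             # name of the OutputType member as listed
    histone_mark: Optional[str]  # only set for the CHIP_HISTONE features


FEATURES: List[FeatureSpec] = [
    FeatureSpec("center_mask_atac", "CenterMaskScorer", "ATAC", None),
    FeatureSpec("H3K4me3", None, "CHIP_HISTONE", "H3K4me3"),
    FeatureSpec("H3K27ac", None, "CHIP_HISTONE", "H3K27ac"),
    FeatureSpec("gene_lfc", "GeneMaskLFCScorer", "RNA_SEQ", None),
    FeatureSpec("splice", "CenterMaskScorer", "SPLICE_SITES", None),
]


def by_output_type(features: List[FeatureSpec]) -> Dict[str, List[str]]:
    """Group feature names by the output type they are derived from."""
    grouped: Dict[str, List[str]] = {}
    for feature in features:
        grouped.setdefault(feature.output_type, []).append(feature.name)
    return grouped


if __name__ == "__main__":
    for output_type, names in by_output_type(FEATURES).items():
        print(output_type, names)
```

Grouping by output type makes it easy to see that two of the five features come from the same histone ChIP modality.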

For a compact ncRNA like RNU4-2, are these modalities and scorers expected to carry meaningful signal?
And would you recommend additional AlphaGenome OutputTypes or scorer configurations that might better capture ncRNA-specific constraints or highly local structural effects?
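
To make the "highly local" concern concrete, here is a toy sketch of a center-mask style score on made-up tracks. It is not AlphaGenome's implementation: `center_mask_log2_ratio`, the pseudocount, and the example values are all assumptions for illustration. It only shows how a change confined to a few bases could be diluted when the aggregation window is much wider than the affected region.

```python
import math
from typing import Sequence


def center_mask_log2_ratio(
    ref: Sequence[float],
    alt: Sequence[float],
    variant_index: int,
    width: int,
    pseudocount: float = 1.0,
) -> float:
    """log2 ratio of summed ALT vs REF signal in a window centred on the variant.

    `width` is assumed to be odd; the window is clipped at the track edges.
    """
    half = width // 2
    start = max(0, variant_index - half)
    end = min(len(ref), variant_index + half + 1)
    ref_sum = sum(ref[start:end])
    alt_sum = sum(alt[start:end])
    return math.log2((alt_sum + pseudocount) / (ref_sum + pseudocount))


# Made-up tracks: flat REF signal, ALT signal lost over 5 bases around the variant.
ref_track = [10.0] * 201
alt_track = list(ref_track)
for i in range(98, 103):
    alt_track[i] = 0.0

narrow = center_mask_log2_ratio(ref_track, alt_track, variant_index=100, width=5)
wide = center_mask_log2_ratio(ref_track, alt_track, variant_index=100, width=201)

if __name__ == "__main__":
    print(f"width=5:   {narrow:.3f}")
    print(f"width=201: {wide:.3f}")
```

In this toy case the narrow window gives a much larger magnitude than the wide one, which is the kind of dilution we are wondering about for a gene as short as RNU4-2.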

Thanks!

Do you have a hypothesis about the mechanism of these variants? AlphaGenome (AG) can predict effects on expression, splicing, and chromatin, but it cannot predict the effect of a missense variant or of a variant that alters RNA structure.