Seeking Guidance on Relevant AlphaGenome Features for RNU4-2 Variant Interpretation

Hi all,

We are working on variant interpretation for RNU4-2, a very small spliceosomal snRNA where pathogenic variants cluster within very short structural elements. Our current AlphaGenome-based model uses only five features and fails to correctly classify known ClinVar-pathogenic variants in this gene.

From the scoring script, the features we currently derive are:

  1. center_mask_atac (CenterMaskScorer on OutputType.ATAC)
  2. H3K4me3 (from OutputType.CHIP_HISTONE, histone_mark = "H3K4me3")
  3. H3K27ac (from OutputType.CHIP_HISTONE, histone_mark = "H3K27ac")
  4. gene_lfc (GeneMaskLFCScorer on OutputType.RNA_SEQ)
  5. splice (CenterMaskScorer on OutputType.SPLICE_SITES)

For a compact ncRNA like RNU4-2, are these modalities and scorers expected to carry meaningful signal?
And would you recommend additional AlphaGenome OutputTypes or scorer configurations that might better capture ncRNA-specific constraints or highly local structural effects?

Thanks!

Do you have a hypothesis about the mechanism of these variants? AlphaGenome can predict effects on expression, splicing, and chromatin, but it cannot predict the effect of missense variants or of variants that alter RNA structure.

Our working hypothesis is that RNU4-2 pathogenic variants act primarily through disruption of RNA structure and spliceosome assembly, rather than through classical regulatory mechanisms like transcriptional control.

We fully agree that AlphaGenome does not directly model RNA structural disruption or ncRNA-specific molecular mechanisms, and that this is likely why the current features fail for RNU4-2.

What we are trying to understand is whether any indirect or proxy signals within AlphaGenome (e.g. local chromatin accessibility, promoter-associated histone marks, RNA expression perturbation, or splicing-related outputs in nearby genes) are expected to carry any meaningful signal for such a gene — or whether RNU4-2 should be considered largely out of scope for AlphaGenome-based variant scoring.

If the latter is the case, that would already be a valuable conclusion for us in terms of defining model boundaries and motivating complementary approaches.
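
One way to make this boundary check concrete would be to compute a per-feature AUROC between known pathogenic and benign variants, and to see whether any feature separates the two groups better than chance. Below is a minimal sketch of that idea in plain Python. It does not call AlphaGenome; the function names, feature names, and numbers are all hypothetical placeholders, and it assumes that effect magnitude (the absolute score) is what matters rather than direction.

```python
from typing import Dict, List, Sequence


def auroc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Rank-based AUROC: P(pos score > neg score), ties counted as 0.5."""
    if not pos or not neg:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def feature_signal(
    scores: Dict[str, List[float]], is_pathogenic: List[bool]
) -> Dict[str, float]:
    """Per-feature AUROC on absolute scores (effect direction is ignored)."""
    result = {}
    for name, values in scores.items():
        if len(values) != len(is_pathogenic):
            raise ValueError(f"{name}: length mismatch with labels")
        pos = [abs(v) for v, y in zip(values, is_pathogenic) if y]
        neg = [abs(v) for v, y in zip(values, is_pathogenic) if not y]
        result[name] = auroc(pos, neg)
    return result


# Placeholder numbers for illustration only; these are NOT real
# AlphaGenome outputs and the feature names are hypothetical.
toy_scores = {
    "feature_without_signal": [0.10, -0.20, 0.15, 0.12, -0.18, 0.11],
    "feature_with_signal": [0.90, -0.80, 0.70, 0.10, -0.20, 0.05],
}
toy_labels = [True, True, True, False, False, False]

for feature, value in feature_signal(toy_scores, toy_labels).items():
    print(f"{feature}: AUROC = {value:.2f}")
```

If every real feature came out close to 0.5 on the RNU4-2 variants, that would support treating the gene as out of scope for this kind of scoring. With only a handful of labelled variants the estimate is noisy, so it is a sanity check rather than a formal test.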