Hi all,
We are working on variant interpretation for RNU4-2, a very small spliceosomal snRNA where pathogenic variants cluster within very short structural elements. Our current AlphaGenome-based model uses only five features and fails to correctly classify known ClinVar-pathogenic variants in this gene.
From the scoring script, the features we currently derive are:
- center_mask_atac (CenterMaskScorer on OutputType.ATAC)
- H3K4me3 (from OutputType.CHIP_HISTONE, histone_mark = "H3K4me3")
- H3K27ac (from OutputType.CHIP_HISTONE, histone_mark = "H3K27ac")
- gene_lfc (GeneMaskLFCScorer on OutputType.RNA_SEQ)
- splice (CenterMaskScorer on OutputType.SPLICE_SITES)
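For reference, here is a minimal sketch of the same five features written as a plain Python table. It does not call AlphaGenome at all; `FeatureSpec`, `FEATURES` and `by_output_type` are hypothetical names used only for illustration, and the scorer for the two histone features is left as `None` because it is not named in the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureSpec:
    """One model feature (hypothetical container, not an AlphaGenome type)."""
    name: str
    scorer: Optional[str]        # scorer class name as listed; None if not stated
    output_type: str             # OutputType member name as listed
    histone_mark: Optional[str] = None


# The five features exactly as listed in this post.
FEATURES = [
    FeatureSpec("center_mask_atac", "CenterMaskScorer", "ATAC"),
    FeatureSpec("H3K4me3", None, "CHIP_HISTONE", histone_mark="H3K4me3"),
    FeatureSpec("H3K27ac", None, "CHIP_HISTONE", histone_mark="H3K27ac"),
    FeatureSpec("gene_lfc", "GeneMaskLFCScorer", "RNA_SEQ"),
    FeatureSpec("splice", "CenterMaskScorer", "SPLICE_SITES"),
]


def by_output_type(features):
    """Group feature names by the output type they are derived from."""
    grouped = {}
    for f in features:
        grouped.setdefault(f.output_type, []).append(f.name)
    return grouped


if __name__ == "__main__":
    for output_type, names in by_output_type(FEATURES).items():
        print(output_type, names)
```

Grouping by output type makes it easy to see that two of the five features come from the same histone ChIP output, so the model effectively draws on four modalities.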
For a compact ncRNA like RNU4-2, are these modalities and scorers expected to carry meaningful signal?
Would you also recommend additional AlphaGenome OutputTypes or scorer configurations that might better capture ncRNA-specific constraints or highly local structural effects?
Thanks!