Hi there! In my project I have a set of a priori genes of interest and a collection of variants aggregated across a whole-genome sequencing cohort. My goal is to find out whether any affected individuals carry variants predicted to lower expression of these genes. For each gene of interest, I ran `score_variant` with the RNA_SEQ scorer on the ~10k variants within the 1 Mb window surrounding that gene.
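For anyone trying to reproduce the setup, here is a minimal sketch of the window-selection step in plain Python. It does not call AlphaGenome at all. The `Gene`/`Variant` records, the helper names, 1-based inclusive coordinates, and centring the window on the gene midpoint are all assumptions for illustration, not the code behind the plot.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical minimal gene record; 1-based inclusive coordinates assumed.
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # '+' or '-'


@dataclass(frozen=True)
class Variant:
    # Hypothetical minimal variant record.
    chrom: str
    pos: int
    ref: str
    alt: str


def window_around_gene(gene: Gene, window_bp: int = 1_000_000) -> tuple[int, int]:
    """Return (start, end) of a window_bp-long window centred on the gene midpoint.

    The start is clamped to position 1, so windows near a chromosome start
    are shifted right rather than shortened.
    """
    mid = (gene.start + gene.end) // 2
    start = max(1, mid - window_bp // 2)
    end = start + window_bp - 1
    return start, end


def variants_in_window(gene: Gene, variants: list[Variant],
                       window_bp: int = 1_000_000) -> list[Variant]:
    """Keep only variants on the gene's chromosome that fall inside its window."""
    start, end = window_around_gene(gene, window_bp)
    return [v for v in variants if v.chrom == gene.chrom and start <= v.pos <= end]
```

Each gene's filtered variant list would then be passed to the scorer one variant at a time.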
I plotted the raw scores against genomic position, expecting to see peaks at promoters and enhancers. Instead, what I see looks like some sort of artefact.
A gene on the minus strand (black lines indicate exons):
The forum only lets new users include one image, but the same pattern appears over and over, across a couple dozen genes. The high-impact variants always sit to the left of the gene (at lower genomic coordinates): for plus-strand genes they are upstream, and for minus-strand genes they are downstream. I can share my code to help reproduce the issue if needed.
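To make the strand logic explicit: "left of the gene" means lower coordinates, which is upstream for a plus-strand gene and downstream for a minus-strand gene. A small helper like the one below could label each variant relative to the gene's direction of transcription, to check whether high-scoring variants group by coordinate side rather than by upstream/downstream. This is a hypothetical plain-Python sketch, not part of AlphaGenome; the `Gene` record and 1-based inclusive coordinates are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical minimal gene record; 1-based inclusive coordinates assumed.
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # '+' or '-'


def position_relative_to_gene(pos: int, gene: Gene) -> str:
    """Label a position as 'upstream', 'downstream' or 'within' the gene,
    taking the gene's strand into account."""
    if gene.start <= pos <= gene.end:
        return "within"
    left_of_gene = pos < gene.start  # lower genomic coordinate
    if gene.strand == "+":
        # Plus strand: transcription runs left to right, so left = upstream.
        return "upstream" if left_of_gene else "downstream"
    if gene.strand == "-":
        # Minus strand: transcription runs right to left, so left = downstream.
        return "downstream" if left_of_gene else "upstream"
    raise ValueError(f"unexpected strand {gene.strand!r} for gene {gene.name}")
```

If the high-impact variants are all labelled "upstream" for plus-strand genes and "downstream" for minus-strand genes, they share a coordinate side rather than a biological direction, which is the pattern described above.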
