Validity of raw vs. quantile scores for promoter variant effects when gene body extends beyond 16 kb sequence window

Clementine_Chen · May 20, 2026, 11:05am

We are using the batch variant scoring function to predict the effects of promoter variants. For our analyses, we set sequence_length = 16 kb, so each variant is evaluated within a 16 kb sequence window. Our goal is to estimate the effect of a promoter variant on the closest gene, where the gene’s promoter and TSS are within the 16 kb window. We are mainly using the RNA-seq predictions for this purpose.

However, in some cases, the target gene’s full gene body extends beyond the 16 kb input window. Given this setup, we wanted to ask:

Are the raw RNA-seq scores or the quantile scores still valid for interpreting the variant’s effect on the closest gene, even if the full gene body is not contained within the 16 kb sequence window?

More specifically, should we be concerned that the RNA-seq prediction may be incomplete or biased when the promoter/TSS is inside the window but the gene body extends outside of it? In this case, is one score type, the raw score or the quantile score, more appropriate or reliable for interpretation?

Thank you very much for your guidance.

Nicolene_Pillay · June 2, 2026, 4:22pm

Hi there,

Thank you for reaching out. Regarding your questions on sequence length and scoring interpretation, please see the guidance below:

I recommend using the full 1 Mb context window for your analyses. The model can only make predictions for bases inside the context window, so to predict transcription of a gene reliably, the full gene body should fall within the window whenever possible. Model performance is also best with the full 1 Mb context.
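To make the window point concrete, here is a minimal sketch that checks how much of a gene body falls inside a window centered on the variant. It does not call the scoring library at all. The `Interval` class, the helper names, the coordinates, and the exact window lengths (taken here as 2^14 and 2^20 bases) are assumptions for illustration only.

```python
from dataclasses import dataclass

# Assumed window lengths in bases (powers of two close to 16 kb and 1 Mb).
WINDOW_16KB = 2**14  # 16,384
WINDOW_1MB = 2**20   # 1,048,576


@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int

    def contains(self, other: "Interval") -> bool:
        """True if `other` lies entirely inside this interval."""
        return (self.chrom == other.chrom
                and self.start <= other.start
                and other.end <= self.end)


def window_around(chrom: str, position: int, length: int) -> Interval:
    """Window of `length` bases centered on `position`, clamped at position 0."""
    start = max(0, position - length // 2)
    return Interval(chrom, start, start + length)


def covered_fraction(window: Interval, gene: Interval) -> float:
    """Fraction of the gene body that falls inside the window."""
    if window.chrom != gene.chrom:
        return 0.0
    overlap = min(window.end, gene.end) - max(window.start, gene.start)
    return max(0, overlap) / (gene.end - gene.start)


# Hypothetical example: a variant 200 bp upstream of the TSS of a 60 kb gene.
gene = Interval("chr1", 1_000_000, 1_060_000)
variant_pos = 999_800

for name, length in [("16 kb", WINDOW_16KB), ("1 Mb", WINDOW_1MB)]:
    window = window_around("chr1", variant_pos, length)
    print(name, window.contains(gene), round(covered_fraction(window, gene), 3))
```

With these made-up coordinates, the 16 kb window covers only about 13% of the gene body, while the 1 Mb window contains it fully. A check like this can flag which promoter variants have target genes that extend past the window before any scores are interpreted.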

The raw score is a measure of the difference between the reference and alternative predictions. It is calculated using scorer methods that are specific to the modality of interest. The quantile score is a standardized measure of predicted impact across all modalities and biosamples, calculated by calibrating raw variant scores against a background distribution of 348k common human SNPs. It represents a variant’s percentile rank relative to this common-variant baseline and can be used to compare variant effects between biosamples and modalities.
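As a rough illustration of the percentile-rank idea, the sketch below ranks a raw score against a sorted background of raw scores. This is not the library's actual calibration procedure, which may differ in details such as sign handling or per-track backgrounds. The function name, the mid-rank tie convention, and all background values are made up for the example.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def empirical_quantile(raw_score: float, background: Sequence[float]) -> float:
    """Percentile rank of `raw_score` in a sorted background, in [0, 1].

    Ties are handled with the mid-rank convention (an assumption here).
    """
    if not background:
        raise ValueError("background must not be empty")
    lo = bisect_left(background, raw_score)   # background values strictly below
    hi = bisect_right(background, raw_score)  # background values at or below
    return (lo + hi) / (2 * len(background))


# Made-up background raw scores for two modalities with different scales.
background_rna = sorted([-0.30, -0.10, -0.05, 0.00, 0.02,
                         0.04, 0.08, 0.15, 0.40, 0.90])
background_atac = sorted([-3.0, -1.5, -0.8, -0.2, 0.0,
                          0.3, 0.9, 1.4, 2.5, 6.0])

# The same raw value ranks differently against each background.
rna_q = empirical_quantile(0.5, background_rna)
atac_q = empirical_quantile(0.5, background_atac)
print(f"RNA quantile: {rna_q:.2f}, ATAC quantile: {atac_q:.2f}")
```

In this toy setup, a raw score of 0.5 sits near the top of the RNA background but closer to the middle of the ATAC background. That is why raw scores are hard to compare across modalities, while the rank against a shared set of background variants is comparable.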

For example, if you are interested in how a variant affects transcription, you can use the recommended RNA scorer, which yields a raw score and a quantile score for each gene within your context window, across biosamples. If you include other scoring methods (e.g., for ATAC), the quantile score allows comparison between modalities.

Kind regards,

Nicolene
