Hi,
I am evaluating the effects of a set of variants across all available transcription factors using the chip_tf scorer. For each variant, I obtained raw_score values across multiple transcription factors and biosamples.
I would like to ask whether these raw_score values are directly comparable:
across different biosamples,
across different transcription factors, and
across different variants.
Or are there recommended normalization or comparison strategies for these scenarios?
Best regards,
Chao Li
Hi there!
Thanks for reaching out.
You should avoid using the raw_score for comparisons across biosamples or modalities. Instead, use the quantile_score, which was specifically implemented to make raw variant scores more interpretable and comparable across different assays and genomic contexts.
The quantile_score is calibrated by calculating the raw effect scores for a reference set of ~350,000 common SNPs to establish an empirical background distribution for every biosample and modality. Your input variant’s raw score is then mapped to its percentile rank within that distribution.
A quantile_score of 0.9 means that the predicted change for your variant is greater than 90% of the effects predicted for the reference set. This value can be compared against quantile_score results from other modalities and biosamples.
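To illustrate the idea of mapping a raw score to its percentile rank, here is a minimal sketch in plain Python. It is not the actual implementation behind quantile_score: the function name `empirical_quantile`, the synthetic Gaussian backgrounds, and the background size of 10,000 are all assumptions made for illustration only.

```python
import bisect
import random


def empirical_quantile(raw_score, background_scores):
    """Return the fraction of background scores strictly below raw_score.

    Hypothetical helper for illustration; not part of any library API.
    """
    background = sorted(background_scores)
    # bisect_left gives the number of background values < raw_score.
    rank = bisect.bisect_left(background, raw_score)
    return rank / len(background)


# Synthetic backgrounds standing in for two tracks whose raw scores
# live on very different scales (assumed data, not real predictions).
rng = random.Random(0)
background_a = [rng.gauss(0.0, 0.01) for _ in range(10_000)]
background_b = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# The same raw score is extreme on track A but unremarkable on track B.
raw = 0.05
q_a = empirical_quantile(raw, background_a)
q_b = empirical_quantile(raw, background_b)
print(f"track A quantile: {q_a:.3f}, track B quantile: {q_b:.3f}")
```

In this toy setup the same raw value lands at very different quantiles on the two tracks, which is why raw scores on different scales are hard to compare directly while percentile ranks are not.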
Kind regards,
Tumi