GeneMaskLFCScorer AggregationType

Joshua_Park · October 10, 2025, 4:21pm

[CenterMaskScorer(requested_output=ATAC, width=501, aggregation_type=DIFF_LOG2_SUM),
CenterMaskScorer(requested_output=DNASE, width=501, aggregation_type=DIFF_LOG2_SUM),
CenterMaskScorer(requested_output=CHIP_TF, width=501, aggregation_type=DIFF_LOG2_SUM),
CenterMaskScorer(requested_output=CHIP_HISTONE, width=2001, aggregation_type=DIFF_LOG2_SUM),
CenterMaskScorer(requested_output=CAGE, width=501, aggregation_type=DIFF_LOG2_SUM),
CenterMaskScorer(requested_output=PROCAP, width=501, aggregation_type=DIFF_LOG2_SUM),
GeneMaskLFCScorer(requested_output=RNA_SEQ),
GeneMaskSplicingScorer(requested_output=SPLICE_SITES, width=None),
GeneMaskSplicingScorer(requested_output=SPLICE_SITE_USAGE, width=None),
SpliceJunctionScorer(),
PolyadenylationScorer()]

These are the recommended scorers. Notably, the last five don’t have an aggregation_type. Do you have any recommendations on which aggregation_type is best? I ask because when I look at the contingency table of gene strand against track strand, it’s not immediately obvious what is going on within each track. I would also like to know what track_strand = ‘.’ indicates.

tward · October 11, 2025, 6:31pm

Hi @Joshua_Park, welcome to the community!

So the GeneMaskLFCScorer, SpliceJunctionScorer and PolyadenylationScorer don’t have any configurable aggregation, which is why none is specified: only the aggregations we used in the paper are available (see supplementary figures 12 and 13 for schematics of the aggregation applied by each scorer).

As for the contingency table, the . strand indicates an unstranded track. If you’re using the tidy_scores function, by default it removes scores where the gene strand and track strand are mismatched, which would be why you have zeros for the mismatched +/- entries.
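To make the strand filtering concrete, here is a minimal pandas sketch of the behaviour described above. It is not the tidy_scores implementation: the DataFrame, its column names (gene_strand, track_strand, raw_score) and the helper drop_mismatched_strands are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical tidy-format scores; column names and values are illustrative only.
scores = pd.DataFrame({
    "gene_id": ["g1", "g1", "g1", "g2", "g2", "g2"],
    "gene_strand": ["+", "+", "+", "-", "-", "-"],
    "track_name": ["t_plus", "t_minus", "t_unstranded"] * 2,
    "track_strand": ["+", "-", ".", "+", "-", "."],
    "raw_score": [0.4, -0.1, 0.2, 0.05, -0.6, 0.3],
})


def drop_mismatched_strands(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose track is unstranded ('.') or matches the gene strand."""
    unstranded = df["track_strand"] == "."
    matched = df["track_strand"] == df["gene_strand"]
    return df[unstranded | matched]


filtered = drop_mismatched_strands(scores)

# Contingency table of gene strand (rows) against track strand (columns).
table = pd.crosstab(filtered["gene_strand"], filtered["track_strand"])
print(table)
```

After filtering, the (+, -) and (-, +) cells of the table are zero, while the '.' column keeps one entry per gene strand.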

Hope that helps!

Joshua_Park · October 15, 2025, 9:40am

Hi @tward,

Thank you so much for your answer. I understand why aggregation methods aren’t provided. But, if I did still want to aggregate, do you have a suggested approach? Where I’m having trouble is extending the batch variant scoring tutorial: Batch variant scoring — AlphaGenome

After scoring multiple variants, I’m not sure how to compare them to each other. My initial thought was to get a single scalar value per output_type for every variant, which is why I am interested in aggregating all gene scores for each track into a single track score, and ultimately computing a single ‘output_type’ score. Is this something that’s been attempted or done previously?

tward · October 18, 2025, 5:23pm

So I guess you could take the ABS_MAX over the gene axis to obtain the largest score for a gene region, and then do the same over the track axis? It’s not something we typically do, but it should give you roughly a single score per output type.

You might also want to consider using the quantile scores in order to better compare scores across tracks and variant scorers, depending on the kind of analysis you’re performing.
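As a rough sketch of the two-step reduction suggested above (genes first, then tracks), something like the following could work on a tidy scores table. The DataFrame and its column names (variant_id, output_type, gene_id, track_name, raw_score) are assumptions for illustration, and signed_abs_max is a hypothetical helper, not part of the library. It keeps the sign of the largest-magnitude score; apply .abs() first if only the magnitude matters.

```python
import pandas as pd

# Hypothetical tidy-format scores for two variants; values are made up.
scores = pd.DataFrame({
    "variant_id": ["v1"] * 4 + ["v2"] * 4,
    "output_type": ["RNA_SEQ"] * 8,
    "gene_id": ["g1", "g2", "g1", "g2"] * 2,
    "track_name": ["tA", "tA", "tB", "tB"] * 2,
    "raw_score": [0.2, -0.9, 0.5, 0.1, 0.3, 0.1, -0.2, 0.05],
})


def signed_abs_max(values: pd.Series) -> float:
    """Return the value with the largest magnitude, keeping its sign."""
    values = values.dropna()
    if values.empty:
        return float("nan")
    return float(values.iloc[values.abs().to_numpy().argmax()])


# Step 1: collapse the gene axis, giving one score per variant, output type and track.
per_track = scores.groupby(
    ["variant_id", "output_type", "track_name"]
)["raw_score"].agg(signed_abs_max)

# Step 2: collapse the track axis, giving one score per variant and output type.
per_output = per_track.groupby(level=["variant_id", "output_type"]).agg(signed_abs_max)
print(per_output)
```

The same reduction could be run on a quantile score column instead of raw_score (assuming the table has one), which may make the resulting values easier to compare across tracks and scorers.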

Hope this helps!
