Raw score calculation mismatch

Sanjana_Tule · April 22, 2026, 3:02am

Hi, I want to calculate raw score for ATAC using different window width. The scorer function in AlphaGenome uses 501 width, so I thought to write my own calculation. But I want to check my logic (using 501 width) against what the scorer returns but it does not match.

Using score variant:

# @title Score variant { run: "auto" }
organism = 'human'  # @param ["human", "mouse"] {type:"string"}
organism_map = {
    'human': dna_client.Organism.HOMO_SAPIENS,
    'mouse': dna_client.Organism.MUS_MUSCULUS,
}
organism = organism_map[organism]

# @markdown Specify the variant:
variant_chromosome = 'chr22'  # @param { type:"string" }
variant_position = 36201698  # @param { type:"integer" }
variant_reference_bases = 'A'  # @param { type:"string" }
variant_alternate_bases = 'C'  # @param { type:"string" }

variant = genome.Variant(
    chromosome=variant_chromosome,
    position=variant_position,
    reference_bases=variant_reference_bases,
    alternate_bases=variant_alternate_bases,
)

# @markdown Specify length of sequence around variant to predict:
sequence_length = '1MB'  # @param ["2KB", "16KB", "100KB", "500KB", "1MB"] { type:"string" }
sequence_length = dna_client.SUPPORTED_SEQUENCE_LENGTHS[
    f'SEQUENCE_LENGTH_{sequence_length}'
]

interval = variant.reference_interval.resize(sequence_length)

# @markdown Additional settings:
variant_scores = dna_model.score_variant(
    interval=interval,
    variant=variant,
    variant_scorers=list(variant_scorers.RECOMMENDED_VARIANT_SCORERS.values()),
)
df_scores = variant_scorers.tidy_scores(variant_scores)
row_series = df_scores.iloc[[0]]
row_df = row_series.reset_index()
row_df[['variant_id','raw_score','ontology_curie','output_type']]

	variant_id	raw_score	ontology_curie	output_type
0	chr22:36201698:A>C	-0.100227	CL:0000084	ATAC

This is gives raw_score = -0.100227

Using Predict Variant

# @markdown Additional settings:
sequence_length = '1MB'  # @param ["2KB", "16KB", "100KB", "500KB", "1MB"] { type:"string" }
sequence_length = dna_client.SUPPORTED_SEQUENCE_LENGTHS[
    f'SEQUENCE_LENGTH_{sequence_length}'
]

interval = variant.reference_interval.resize(sequence_length)

# @markdown Specify the variant:
variant_chromosome = 'chr22'  # @param { type:"string" }
variant_position = 36201698  # @param { type:"integer" }
variant_reference_bases = 'A'  # @param { type:"string" }
variant_alternate_bases = 'C'  # @param { type:"string" }

variant = genome.Variant(
    chromosome=variant_chromosome,
    position=variant_position,
    reference_bases=variant_reference_bases,
    alternate_bases=variant_alternate_bases,
)

variant_output = dna_model.predict_variant(
    interval=interval,
    variant=variant,
    requested_outputs = [dna_client.OutputType.ATAC],
    ontology_terms=['CL:0000084']
)
import numpy as np
variant_output = dna_model.predict_variant(
    interval=interval,
    variant=variant,
    requested_outputs = [dna_client.OutputType.ATAC],
    ontology_terms=['CL:0000084']
)

ref = variant_output.reference.atac.values
alt = variant_output.alternate.atac.values

snp_pos = 36201698
half = 501 // 2  # 250
idx = snp_pos - interval.start

start = max(0, idx - half)
end   = min(len(ref), idx + half +1)

start,end
ref_window = ref[start:end]
alt_window = alt[start:end]
# Sum over spatial axis (axis=0 if multi-track)
ref_sum = np.sum(ref_window, axis=0)
alt_sum = np.sum(alt_window, axis=0)

score = np.log2((alt_sum)) - np.log2((ref_sum))
print(score)

This is gives raw_score = -0.10555601

Have I done something incorrectly when I calculate this manually in the second step ? I want to make sure my calculation matches. Is there an eps value used?

Thanks,

Sanjana

tward · April 22, 2026, 9:34am

Hi @Sanjana_Tule,

You can see how our variant scoring works in our open-source model.

For ATAC center-masking, take a look at alphagenome_research/src/alphagenome_research/model/variant_scoring/center_mask.py at main · google-deepmind/alphagenome_research · GitHub where we do np.log2(1+alt) - np.log2(1+ref).

Adding the +1 to each log2 gets to within 1e-4 of the API

Hope that helps!

Sanjana_Tule · April 22, 2026, 12:19pm

Thanks that works now.

Topic		Replies	Views
Can we change the width in the scorer function? Help & Support	4	50	April 29, 2026
Reproducing TraitGym results Help & Support	7	1882	August 20, 2025
REF and ALT tracks look identical even with high raw_score and quantile_score Help & Support	1	90	April 22, 2026
Predicted ALT expression and the REF expression separately Help & Support	1	134	February 24, 2026
How to get signed (non-absolute) raw scores for splicing variants? Help & Support	2	160	January 27, 2026

Raw score calculation mismatch

Related topics