Raw score calculation mismatch

Hi, I want to calculate raw score for ATAC using different window width. The scorer function in AlphaGenome uses 501 width, so I thought to write my own calculation. But I want to check my logic (using 501 width) against what the scorer returns but it does not match.

  1. Using score variant:
# @title Score variant { run: "auto" }
organism = 'human'  # @param ["human", "mouse"] {type:"string"}
organism_map = {
    'human': dna_client.Organism.HOMO_SAPIENS,
    'mouse': dna_client.Organism.MUS_MUSCULUS,
}
organism = organism_map[organism]

# @markdown Specify the variant:
variant_chromosome = 'chr22'  # @param { type:"string" }
variant_position = 36201698  # @param { type:"integer" }
variant_reference_bases = 'A'  # @param { type:"string" }
variant_alternate_bases = 'C'  # @param { type:"string" }

variant = genome.Variant(
    chromosome=variant_chromosome,
    position=variant_position,
    reference_bases=variant_reference_bases,
    alternate_bases=variant_alternate_bases,
)

# @markdown Specify length of sequence around variant to predict:
sequence_length = '1MB'  # @param ["2KB", "16KB", "100KB", "500KB", "1MB"] { type:"string" }
sequence_length = dna_client.SUPPORTED_SEQUENCE_LENGTHS[
    f'SEQUENCE_LENGTH_{sequence_length}'
]

interval = variant.reference_interval.resize(sequence_length)

# @markdown Additional settings:
variant_scores = dna_model.score_variant(
    interval=interval,
    variant=variant,
    variant_scorers=list(variant_scorers.RECOMMENDED_VARIANT_SCORERS.values()),
)
df_scores = variant_scorers.tidy_scores(variant_scores)
row_series = df_scores.iloc[[0]]
row_df = row_series.reset_index()
row_df[['variant_id','raw_score','ontology_curie','output_type']]
variant_id raw_score ontology_curie output_type
0 chr22:36201698:A>C -0.100227 CL:0000084 ATAC

This is gives raw_score = -0.100227

  1. Using Predict Variant
# @markdown Additional settings:
sequence_length = '1MB'  # @param ["2KB", "16KB", "100KB", "500KB", "1MB"] { type:"string" }
sequence_length = dna_client.SUPPORTED_SEQUENCE_LENGTHS[
    f'SEQUENCE_LENGTH_{sequence_length}'
]

interval = variant.reference_interval.resize(sequence_length)

# @markdown Specify the variant:
variant_chromosome = 'chr22'  # @param { type:"string" }
variant_position = 36201698  # @param { type:"integer" }
variant_reference_bases = 'A'  # @param { type:"string" }
variant_alternate_bases = 'C'  # @param { type:"string" }

variant = genome.Variant(
    chromosome=variant_chromosome,
    position=variant_position,
    reference_bases=variant_reference_bases,
    alternate_bases=variant_alternate_bases,
)

variant_output = dna_model.predict_variant(
    interval=interval,
    variant=variant,
    requested_outputs = [dna_client.OutputType.ATAC],
    ontology_terms=['CL:0000084']
)
import numpy as np
variant_output = dna_model.predict_variant(
    interval=interval,
    variant=variant,
    requested_outputs = [dna_client.OutputType.ATAC],
    ontology_terms=['CL:0000084']
)

ref = variant_output.reference.atac.values
alt = variant_output.alternate.atac.values

snp_pos = 36201698
half = 501 // 2  # 250
idx = snp_pos - interval.start

start = max(0, idx - half)
end   = min(len(ref), idx + half +1)

start,end
ref_window = ref[start:end]
alt_window = alt[start:end]
# Sum over spatial axis (axis=0 if multi-track)
ref_sum = np.sum(ref_window, axis=0)
alt_sum = np.sum(alt_window, axis=0)

score = np.log2((alt_sum)) - np.log2((ref_sum))
print(score)

This is gives raw_score = -0.10555601

Have I done something incorrectly when I calculate this manually in the second step ? I want to make sure my calculation matches. Is there an eps value used?

Thanks,

Sanjana

Hi @Sanjana_Tule,

You can see how our variant scoring works in our open-source model.

For ATAC center-masking, take a look at alphagenome_research/src/alphagenome_research/model/variant_scoring/center_mask.py at main · google-deepmind/alphagenome_research · GitHub where we do np.log2(1+alt) - np.log2(1+ref).

Adding the +1 to each log2 gets to within 1e-4 of the API :smiley:

Hope that helps!