Reproducing TraitGym results

Gonzalo_Benegas · July 7, 2025, 9:12pm

Thank you for your great contribution to the field! I’m trying to reproduce the TraitGym results:

I was able to score variants using score_variant and the recommended scorers, but did not get close to the reported performance.
I then tried to reproduce the TraitGym protocol described in the supplement using the lower-level predict_sequence. However, it was very slow and threw errors related to data transfer quotas.

It would be amazing if you could support the “TraitGym” protocol (e.g. L2 norm, reverse-complement averaging) in a manner similar to the score_variant API.
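As a rough illustration of what such a protocol computes, here is a minimal pure-Python sketch for a single track. It assumes you already have per-bin REF and ALT predictions for both the forward and reverse-complement inputs, and that "L2 norm" means the L2 norm of per-bin log1p differences. The function names are hypothetical helpers, not part of the AlphaGenome API.

```python
import math
from typing import Sequence


def l2_log1p_diff(ref: Sequence[float], alt: Sequence[float]) -> float:
    """L2 norm over bins of log1p(alt) - log1p(ref) for one track (hypothetical)."""
    if len(ref) != len(alt):
        raise ValueError("REF and ALT tracks must have the same length")
    return math.sqrt(
        sum((math.log1p(a) - math.log1p(r)) ** 2 for r, a in zip(ref, alt))
    )


def strand_averaged_score(
    ref_fwd: Sequence[float],
    alt_fwd: Sequence[float],
    ref_rev: Sequence[float],
    alt_rev: Sequence[float],
) -> float:
    """Average the forward and reverse-complement L2 scores (hypothetical)."""
    fwd = l2_log1p_diff(ref_fwd, alt_fwd)
    rev = l2_log1p_diff(ref_rev, alt_rev)
    return 0.5 * (fwd + rev)
```

Because the L2 norm sums over every bin, the score does not depend on bin order, so the reverse-complement tracks do not need to be flipped back into forward coordinates before scoring.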

tward · July 11, 2025, 8:46pm

Hi @Gonzalo_Benegas,

Thanks for the question! Yes, we will add support for the L2_DIFF_LOG1P aggregation type, which should be sufficient to reproduce the paper’s TraitGym results (reverse complement should already be supported by passing an interval on the negative strand).

Gonzalo_Benegas · July 11, 2025, 10:13pm

Great to hear, thank you very much!

Gonzalo_Benegas · July 25, 2025, 6:19pm

Hello!

Thanks for the recent addition of the L2_DIFF_LOG1P aggregation! I still have some questions about the scorer:

“Specifically, for each track predicted by a model, we first computed the predicted log-fold change in activity per position (or bin) due to the variant and then calculated the L2 norm across the sequence.”
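Written out for a single track with REF predictions $r_i$ and ALT predictions $a_i$ over bins $i = 1, \dots, N$, and assuming the per-bin log-fold change is taken with log1p (as the L2_DIFF_LOG1P name suggests), the description above amounts to the following score $s$:

```latex
s = \sqrt{\sum_{i=1}^{N} \bigl( \log(1 + a_i) - \log(1 + r_i) \bigr)^2}
```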

Does this mean you took the L2 norm across the entire 1 Mb window for all assay types? Or did you use a center mask for some assays?

I’m guessing the protocol cannot be reproduced with CenterMaskScorer, as it seems to support a maximum width of 200 kb, which would probably lead to underperformance for RNA-seq.
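To make the width question concrete, here is a hypothetical pure-Python sketch of a center-masked score: only bins inside a window of `width` base pairs centred on the variant contribute, and `width=None` keeps every bin. The bin size, the centring convention (variant in the middle bin) and the function name are assumptions for illustration, not AlphaGenome's actual implementation.

```python
import math
from typing import Optional, Sequence


def center_masked_l2(
    ref: Sequence[float],
    alt: Sequence[float],
    bin_size: int,
    width: Optional[int],
) -> float:
    """L2 norm of log1p(alt) - log1p(ref), restricted to a central window.

    Hypothetical sketch: assumes the variant sits in the middle bin and that
    `width` is in base pairs; width=None scores the whole sequence.
    """
    if len(ref) != len(alt):
        raise ValueError("REF and ALT tracks must have the same length")
    n_bins = len(ref)
    if width is None:
        start, end = 0, n_bins
    else:
        half_bins = width // (2 * bin_size)
        center = n_bins // 2
        start = max(0, center - half_bins)
        end = min(n_bins, center + half_bins)
    return math.sqrt(
        sum(
            (math.log1p(alt[i]) - math.log1p(ref[i])) ** 2
            for i in range(start, end)
        )
    )
```

With a finite width, any effect on bins far from the variant is simply dropped; that is the concern for long-range readouts such as RNA-seq, where the relevant signal may lie well outside a 200 kb window.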

Thank you for your help!

tward · August 13, 2025, 12:57pm

Apologies for the slow reply; I’ve been out of office.

Yes, I believe we took the L2 norm over the entire 1 Mb region. I hadn’t noticed that we don’t support requests for full center mask scores; I’ll get that added ASAP.

Gonzalo_Benegas · August 13, 2025, 3:37pm

That would be great, thanks a lot!

tward · August 19, 2025, 8:29pm

Hi @Gonzalo_Benegas,

With v0.2.0 you should now be able to compute full center mask scores by passing width=None to CenterMaskScorer, e.g.:

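```python
from alphagenome.data import genome
from alphagenome.models import dna_client
from alphagenome.models import variant_scorers

# Replace 'API_KEY' with your own AlphaGenome API key.
model = dna_client.create('API_KEY')

variant = genome.Variant.from_str('chr1:10000:A>G')
# Resize the interval around the variant to the 1 Mb (2**20 bp) input length.
interval = variant.reference_interval.resize(2**20)

scores = model.score_variant(
    interval,
    variant,
    variant_scorers=[
        # width=None disables the center mask, so the aggregation is
        # taken over the full interval rather than a central window.
        variant_scorers.CenterMaskScorer(
            requested_output=dna_client.OutputType.ATAC,
            width=None,
            aggregation_type=variant_scorers.AggregationType.L2_DIFF_LOG1P,
        )
    ],
)
```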

Do let us know if we’ve missed anything else!

Gonzalo_Benegas · August 20, 2025, 6:23pm

Perfect, thank you very much!
