Val and test in FOLD_1 compared to Borzoi

bio_decode · February 9, 2026, 6:57am

Hi AlphaGenome team,

Quick clarification question regarding the train/val/test splits exposed by the AlphaGenome API versus the Borzoi splits used in benchmarks.

Using the API, I get the following fold assignments:

```python
from alphagenome.data import fold_intervals
from alphagenome.models import dna_client

for mv in [dna_client.ModelVersion.FOLD_0,
           dna_client.ModelVersion.FOLD_1,
           dna_client.ModelVersion.FOLD_2,
           dna_client.ModelVersion.FOLD_3]:
    train = fold_intervals.get_fold_names(mv, fold_intervals.Subset.TRAIN)
    valid = fold_intervals.get_fold_names(mv, fold_intervals.Subset.VALID)
    test = fold_intervals.get_fold_names(mv, fold_intervals.Subset.TEST)
    print(f"{mv.name}: train={train}, valid={valid}, test={test}")
```

which yields:

```
FOLD_0: train=[fold2–fold7], valid=['fold0'], test=['fold1']
FOLD_1: train=[fold0,1,2,5,6,7], valid=['fold3'], test=['fold4']
FOLD_2: train=[fold0,1,3,4,6,7], valid=['fold2'], test=['fold5']
FOLD_3: train=[fold0–fold5], valid=['fold6'], test=['fold7']
```

In Borzoi, the convention (per the paper / released setup) is split3 = test and split4 = validation, whereas from the API it appears these are swapped (e.g. FOLD_1 uses fold3 as val and fold4 as test).
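A minimal sketch of the check being described, assuming that Borzoi's "split3"/"split4" correspond one-to-one to AlphaGenome's fold3/fold4 (an assumption, not confirmed in this thread). The FOLD_1 fold names are copied from the API output above; `AG_FOLD_1`, `BORZOI_FOLD_1` and `is_swapped` are hypothetical names used only for illustration.

```python
# Held-out folds for the FOLD_1 model, as returned by the API call above.
AG_FOLD_1 = {"valid": "fold3", "test": "fold4"}

# Borzoi convention as described above: split3 = test, split4 = validation.
# Assumption: Borzoi "splitN" is the same genomic partition as AlphaGenome "foldN".
BORZOI_FOLD_1 = {"valid": "fold4", "test": "fold3"}


def is_swapped(a, b):
    """Return True if a's validation fold is b's test fold and vice versa."""
    return a["valid"] == b["test"] and a["test"] == b["valid"]


print(is_swapped(AG_FOLD_1, BORZOI_FOLD_1))  # True
```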

My question is simply:

Is this inversion intentional in AlphaGenome, or should the API folds exactly match Borzoi’s test/validation assignment?

Thanks a lot in advance!

bio_decode · February 10, 2026, 5:15am

More precisely: for the test evaluation of the fold-wise model against Borzoi, was the reported AG test score obtained on fold4 or fold3? (I'd guess it should have been fold3 to be comparable with Borzoi, but based on what the API returns, it seems it might have been fold4.)
Thanks a lot in advance!

Amanda_Stafford · February 24, 2026, 5:21pm

It is possible that the validation and test sets were swapped between Borzoi and AlphaGenome (fold3 vs. fold4).

I don’t think this changes any of the results in the paper. The most important thing is that the training set was the same. Furthermore, we have performed model iterations using the fold0 model. The fold1 model was trained without any consideration of the metrics in either validation or test set (we also haven’t used any early stopping). This means that both should be representative of the test-set performance.
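To illustrate why swapping validation and test leaves the training set unchanged: assuming the genome is partitioned into eight folds (fold0 to fold7) and training uses every fold not held out for validation or test, as the API output above suggests, then exchanging the two held-out folds produces the same training set. A small sketch under that assumption; `train_folds` is a hypothetical helper, not part of the AlphaGenome API.

```python
# Assumption: eight folds, with training = all folds minus the two held out.
ALL_FOLDS = [f"fold{i}" for i in range(8)]


def train_folds(valid, test):
    """Return the training folds left after holding out `valid` and `test`."""
    held_out = {valid, test}
    return [f for f in ALL_FOLDS if f not in held_out]


ag = train_folds(valid="fold3", test="fold4")      # AlphaGenome FOLD_1 assignment
borzoi = train_folds(valid="fold4", test="fold3")  # Borzoi convention
assert ag == borzoi  # the swap only changes which held-out fold is called "test"
print(ag)  # ['fold0', 'fold1', 'fold2', 'fold5', 'fold6', 'fold7']
```

The resulting list matches the FOLD_1 training folds reported by the API in the original question.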

Hope this helps!
