Hi AlphaGenome team,
Quick clarification question regarding the train/val/test splits exposed by the AlphaGenome API versus the Borzoi splits used in benchmarks.
Using the API, I get the following fold assignments:
from alphagenome.data import fold_intervals
from alphagenome.models import dna_client
for mv in [dna_client.ModelVersion.FOLD_0,
           dna_client.ModelVersion.FOLD_1,
           dna_client.ModelVersion.FOLD_2,
           dna_client.ModelVersion.FOLD_3]:
    train = fold_intervals.get_fold_names(mv, fold_intervals.Subset.TRAIN)
    valid = fold_intervals.get_fold_names(mv, fold_intervals.Subset.VALID)
    test = fold_intervals.get_fold_names(mv, fold_intervals.Subset.TEST)
    print(f"{mv.name}: train={train}, valid={valid}, test={test}")
which yields:
FOLD_0: train=[fold2–fold7], valid=['fold0'], test=['fold1']
FOLD_1: train=[fold0,1,2,5,6,7], valid=['fold3'], test=['fold4']
FOLD_2: train=[fold0,1,3,4,6,7], valid=['fold2'], test=['fold5']
FOLD_3: train=[fold0–fold5], valid=['fold6'], test=['fold7']
In Borzoi, the convention (per the paper / released setup) is that split3 is the test set and split4 is the validation set, whereas the API appears to swap them: FOLD_1 uses fold3 for validation and fold4 for test.
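To make the comparison concrete, here is a minimal sketch of the check I have in mind. The FOLD_1 values are hard-coded from the output above rather than fetched from the API, the Borzoi mapping reflects my reading of the paper / released setup (so treat it as an assumption), and the helper name `is_swapped` is purely illustrative.

```python
# FOLD_1 assignment as printed by the API (copied from the output above).
api_fold_1 = {"valid": "fold3", "test": "fold4"}

# Borzoi convention as I understand it (assumption): split3 = test, split4 = validation.
borzoi = {"valid": "fold4", "test": "fold3"}


def is_swapped(a, b):
    """Return True when a's validation fold is b's test fold and vice versa."""
    return a["valid"] == b["test"] and a["test"] == b["valid"]


print(is_swapped(api_fold_1, borzoi))
```

If my reading of the Borzoi convention is right, this reports that the two assignments are exact mirror images of each other.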
My question is simply:
- Is this inversion intentional in AlphaGenome, or should the API folds exactly match Borzoi’s test/validation assignment?
Thanks a lot in advance!