Recapitulating tissue-specific patterns

Hi everyone!

I'm trying to understand why it's challenging for AlphaGenome to capture tissue-level patterns really well, especially considering the training signals are available. I had the following questions:

  1. If training were extended beyond 15,000 batches, what would the impact be on cell/tissue-level tracks vs. other tasks? Could it be that more training is required to fit tissue-level patterns specifically?

  2. If additional training doesn't help, is it possible that the model is latching onto causally irrelevant signals to minimize loss? Perhaps due to simplicity bias, or because the data is insufficient to learn the causally relevant (but more complex) signals?

Thank you for your time!


Hello,

Thank you for reaching out, and apologies for the delayed response.

  1. The training duration of 15,000 steps was selected to balance performance on reference genome prediction tasks against zero-shot variant effect prediction tasks, as evaluated on a validation data subset. Training for longer would improve reference genome predictions, including the tissue-level patterns you mention.
  2. It is very plausible that the model spends some representation capacity reproducing causally irrelevant signals. We have not investigated this, but, for example, removing assay-specific enzyme biases from the training data is a promising and exciting research direction (see e.g. ChromBPNet). As you mention, increasing the amount of data could also help.
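One way to probe this kind of question empirically is to compare per-track agreement between predictions and held-out targets, grouped by track type (tissue-level vs. other assays): if some tracks plateau at lower correlation while overall loss keeps improving, that capacity may be going into confound signals. Below is a minimal, hypothetical sketch of a per-track Pearson correlation, using NumPy on toy data; it is not part of AlphaGenome's tooling, and the array shapes and names are assumptions for illustration.

```python
import numpy as np


def per_track_pearson(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Pearson r for each track, given (positions, tracks) arrays."""
    pred_c = pred - pred.mean(axis=0)
    targ_c = target - target.mean(axis=0)
    num = (pred_c * targ_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (targ_c ** 2).sum(axis=0))
    return num / den


# Toy example: 1,000 genomic positions, 3 hypothetical tracks with
# increasing amounts of prediction noise.
rng = np.random.default_rng(0)
target = rng.normal(size=(1000, 3))
noise = rng.normal(size=(1000, 3))
pred = target + np.array([0.1, 1.0, 3.0]) * noise

r = per_track_pearson(pred, target)  # one correlation per track
```

Grouping such per-track correlations by tissue vs. assay type would let you see whether tissue-level tracks lag the others on validation data.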

Thank you!
