Recapitulating tissue-specific patterns

Hi everyone!

I'm trying to understand why it's challenging for AlphaGenome to pick up on tissue-level patterns extremely well, especially considering that the training signals are available. I have the following questions:

  1. What was the impact of training beyond 15,000 batches on cell/tissue-level tracks vs. other tasks? Could it be that more training is required specifically to fit tissue-level patterns?

  2. If additional training doesn't help, could the model be keying in on causally irrelevant signals to minimize loss, perhaps because of simplicity bias or because the data is too limited to learn the causally relevant (but more complex) signals?

Thank you for your time!