I have a question about the preprocessing of the training data for AlphaGenome, particularly around blacklisted regions.
From what I understand, the sequence regions come from Borzoi’s splits (extended to 1 Mb), and any that intersect an unmappable region are removed. Borzoi also excluded certain sequences that heavily overlapped unmappable regions.
What I’m less clear on is how blacklisted regions are handled. In Borzoi’s preprocessing, the output tracks in blacklisted regions were set to a background baseline, but I couldn’t find any reference to a similar step for AlphaGenome. Was something like this done, or is it perhaps not necessary when training these types of models?
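
For concreteness, here is a rough sketch of the kind of blacklist step I mean. This is only my reading of the idea, not Borzoi's or AlphaGenome's actual code: the function name `mask_blacklist`, the half-open interval convention, and the use of the window's 25th percentile as the "background baseline" are all assumptions on my part.

```python
import numpy as np

def mask_blacklist(coverage, blacklist, seq_start, baseline_pct=25.0):
    """Return a copy of `coverage` with blacklisted positions set to a baseline.

    coverage:     1D per-base coverage for one sequence window.
    blacklist:    iterable of (start, end) half-open genomic intervals,
                  assumed to be on the same chromosome as the window.
    seq_start:    genomic coordinate of coverage[0].
    baseline_pct: percentile of the window's coverage used as the baseline
                  (hypothetical choice, for illustration only).
    """
    masked = np.asarray(coverage, dtype=np.float32).copy()
    baseline = np.percentile(masked, baseline_pct)
    seq_end = seq_start + masked.shape[0]

    for bl_start, bl_end in blacklist:
        # Clip the blacklist interval to the window, then convert to array indices.
        lo = max(bl_start, seq_start) - seq_start
        hi = min(bl_end, seq_end) - seq_start
        if lo < hi:  # skip intervals that do not overlap the window
            masked[lo:hi] = baseline

    return masked


# Example: a window starting at position 1000 with a spurious pileup
# at positions 1004-1005 that falls inside a blacklisted interval.
coverage = [1, 1, 1, 1, 100, 100, 1, 1]
masked = mask_blacklist(coverage, blacklist=[(1004, 1006)], seq_start=1000)
```

The question is whether an equivalent of this step was applied to the AlphaGenome targets, or whether the raw signal in blacklisted regions was left as-is.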