Dear AlphaGenome team,
Thank you for the impressive work on AlphaGenome. I have a question regarding the processing of contact maps during training.
In the paper, it is mentioned that Micro-C contact maps at 1000 bp resolution were interpolated to 2048 bp resolution to align with the pairwise representation blocks. However, I was wondering why the training pipeline does not instead reprocess the original Micro-C .pairs files directly into contact maps at 2048 bp resolution.
From a data fidelity perspective, would contact maps generated by interpolating 1000 bp data be equivalent to contact maps directly aggregated from .pairs files at 2048 bp resolution? Specifically:
- Are there known differences in signal quality, noise, or smoothing artifacts between the interpolated maps and those binned natively at 2048 bp?
- Would reprocessing .pairs files potentially improve accuracy or consistency in downstream model predictions?
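To make the comparison concrete, here is a minimal sketch of what I have in mind, using synthetic pairs rather than real Micro-C data. The area-weighted rebinning below is only my assumption of what the interpolation step might look like, not necessarily what the AlphaGenome pipeline does, and the helper names `bin_pairs` and `overlap_weights` are hypothetical.

```python
import numpy as np

def bin_pairs(pos1, pos2, bin_size, region_len):
    """Aggregate pair coordinates directly into a symmetric contact map."""
    n_bins = region_len // bin_size
    i, j = pos1 // bin_size, pos2 // bin_size
    mat = np.zeros((n_bins, n_bins))
    np.add.at(mat, (i, j), 1.0)  # unbuffered add, so repeated indices accumulate
    np.add.at(mat, (j, i), 1.0)  # mirror to keep the map symmetric
    return mat

def overlap_weights(src_bin, dst_bin, region_len):
    """W[a, b] = fraction of source bin b that falls inside destination bin a.

    Assumes counts are spread uniformly within each source bin.
    """
    assert region_len % src_bin == 0 and region_len % dst_bin == 0
    src = np.arange(0, region_len + 1, src_bin)
    dst = np.arange(0, region_len + 1, dst_bin)
    lo = np.maximum(dst[:-1, None], src[None, :-1])
    hi = np.minimum(dst[1:, None], src[None, 1:])
    return np.clip(hi - lo, 0, None) / src_bin

# Synthetic pairs on a 256 kb region (divisible by both 1000 and 2048).
rng = np.random.default_rng(0)
region_len, n_pairs = 256_000, 200_000
pos1 = rng.integers(0, region_len, n_pairs)
pos2 = np.clip(pos1 + rng.exponential(20_000, n_pairs).astype(np.int64),
               0, region_len - 1)

map_1000 = bin_pairs(pos1, pos2, 1000, region_len)   # 256 x 256
native = bin_pairs(pos1, pos2, 2048, region_len)     # 125 x 125, binned from pairs

# Resample the 1000 bp map onto the 2048 bp grid along both axes.
W = overlap_weights(1000, 2048, region_len)
resampled = W @ map_1000 @ W.T

corr = np.corrcoef(native.ravel(), resampled.ravel())[0, 1]
print(f"Pearson r (native vs. resampled): {corr:.4f}")
print(f"Max abs difference per bin: {np.abs(native - resampled).max():.2f}")
```

In this toy setup the total counts are preserved, but individual bins differ because 1000 bp bins straddling a 2048 bp boundary have their counts split proportionally rather than assigned by exact pair position. I am curious whether a difference of this kind matters in practice for training.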
I would greatly appreciate any insights into this design choice. Thank you again for making your work available to the community.
Best regards,
Yusen