NN Architecture

Given the long sequences involved in DNA modeling, I think it would be worthwhile to try the SigFormer architecture instead of a pure Transformer. It combines Mamba and Transformer components, which should keep the computational cost manageable.
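
To make the cost argument concrete, here is a rough back-of-the-envelope sketch. It is not the SigFormer implementation. It only compares the sequence-mixing cost of a self-attention layer, which grows quadratically with sequence length, against a Mamba-style state-space scan, which grows linearly. The function names, the layer counts, the model width, and the state size of 16 are all illustrative assumptions, and the projections and MLP blocks are ignored.

```python
def attention_cost(seq_len: int, d_model: int) -> int:
    """Rough multiply-add count for the QK^T and attention-times-V products
    of one self-attention layer: 2 * L^2 * d (projections ignored)."""
    return 2 * seq_len * seq_len * d_model


def ssm_cost(seq_len: int, d_model: int, d_state: int = 16) -> int:
    """Rough multiply-add count for one linear-time state-space scan:
    L * d * N, where N is the (assumed) state size per channel."""
    return seq_len * d_model * d_state


def stack_cost(seq_len: int, d_model: int, n_attn: int, n_ssm: int,
               d_state: int = 16) -> int:
    """Sequence-mixing cost of a stack with n_attn attention layers
    and n_ssm state-space layers."""
    return (n_attn * attention_cost(seq_len, d_model)
            + n_ssm * ssm_cost(seq_len, d_model, d_state))


if __name__ == "__main__":
    d_model = 256  # hypothetical model width
    for seq_len in (1_024, 16_384, 131_072):
        pure = stack_cost(seq_len, d_model, n_attn=8, n_ssm=0)
        hybrid = stack_cost(seq_len, d_model, n_attn=1, n_ssm=7)
        print(f"L={seq_len}: pure/hybrid cost ratio = {pure / hybrid:.1f}")
```

Under these assumptions the gap between the pure and hybrid stacks widens as the sequence gets longer, which is the regime that matters for DNA. The actual ratio would depend on how many attention layers the hybrid keeps and on the real layer implementations.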