Run AlphaGenome locally

Hi, thanks for developing AlphaGenome. The GPU servers in our lab do not have internet access, so we cannot call AlphaGenome through the API. I am therefore trying to run AlphaGenome locally.

I have downloaded the model parameter files and the reference files, and constructed the model with dna_model.create using the code below.

import jax  # needed for jax.local_devices() below

from alphagenome_research.model import dna_model

model = dna_model.create(
    checkpoint_path='/home/alphagenome_research-main/model/alphagenome-all-folds/',
    organism_settings={
        dna_model.Organism.HOMO_SAPIENS: dna_model.OrganismSettings(
            fasta_path='/home/alphagenome_research-main/Hs_ref/GRCh38.p13.genome.fa',
            gtf_feather_path='/home/alphagenome_research-main/Hs_ref/gencode.v46.annotation.gtf.gz.feather',
            pas_feather_path='/home/alphagenome_research-main/Hs_ref/polyadb_human_v3_exon3_contiguous_gtfv46.feather',
            splice_site_starts_feather_path='/home/alphagenome_research-main/Hs_ref/gencode.v46.splice_sites_starts.feather',
            splice_site_ends_feather_path='/home/alphagenome_research-main/Hs_ref/gencode.v46.splice_sites_ends.feather',
        )
    },
    device=jax.local_devices()[0],
)

Then I ran the example code:

interval = genome.Interval(chromosome='chr22', start=35677410, end=36725986)
variant = genome.Variant(
    chromosome='chr22',
    position=36201698,
    reference_bases='A',
    alternate_bases='C',
)

outputs = model.predict_variant(
    interval=interval,
    variant=variant,
    ontology_terms=['UBERON:0001157'],
    requested_outputs=[dna_model.OutputType.RNA_SEQ],
)

However, I encountered the following error:

ValueError                                Traceback (most recent call last)
Cell In[9], line 9
      1 interval = genome.Interval(chromosome='chr22', start=35677410, end=36725986)
      2 variant = genome.Variant(
      3     chromosome='chr22',
      4     position=36201698,
      5     reference_bases='A',
      6     alternate_bases='C',
      7 )
----> 9 outputs = model.predict_variant(
     10     interval=interval,
     11     variant=variant,
     12     ontology_terms=['UBERON:0001157'],
     13     requested_outputs=[dna_model.OutputType.RNA_SEQ],
     14 )

File ~/softwares/alphagenome_research-main/src/alphagenome_research/model/dna_model.py:625, in AlphaGenomeModel.predict_variant(self, interval, variant, organism, requested_outputs, ontology_terms)
    615 alternate_sequence = jax.device_put(
    616     np.asarray(self._one_hot_encoder.encode(alternate_sequence))[
    617         np.newaxis
    618     ],
    619     device,
    620 )
    621 organism_indices = jax.device_put(
...
    692       f"{fq_name!r} with retrieved shape {param.shape!r} does not match "
    693       f"shape={shape!r} dtype={dtype!r}")
    695 return param

ValueError: 'alphagenome/embed/embeddings' with retrieved shape (2, 1536) does not match shape=[1, 1536] dtype=<class 'jax.numpy.float32'>

Could you please advise what might be causing this issue? Thank you very much.

Hi @chujie_sun, welcome to the community!

Ah, this is because our model parameters assume that you have 2 organisms (hence the retrieved shape (2, 1536) in the error), whereas you’re only passing 1 organism into the settings (hence the requested shape [1, 1536]). We’ll look to make this better, but in the meantime, can you try this:

from alphagenome_research.model import dna_model

model = dna_model.create(
    checkpoint_path='/path/to/checkpoint',
    organism_settings={
        dna_model.Organism.HOMO_SAPIENS: dna_model.OrganismSettings(),
        # Add empty mouse settings
        dna_model.Organism.MUS_MUSCULUS: dna_model.OrganismSettings(),
    },
)

Specifically, adding an empty MUS_MUSCULUS organism setting entry will ensure the model params are loaded correctly.
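
To make the mismatch concrete, here is a toy sketch in plain NumPy. It is not the library's actual loading code. It assumes, based on the error message and the explanation above, that the model requests one embedding row per configured organism while the checkpoint stores two rows. The helper names requested_shape and load_param, and the plain dicts standing in for the organism settings, are hypothetical.

```python
import numpy as np

EMBED_DIM = 1536  # Embedding width taken from the error message.
PARAM_NAME = "alphagenome/embed/embeddings"

# Stand-in for the checkpoint parameter, which the error reports as (2, 1536).
checkpoint_embeddings = np.zeros((2, EMBED_DIM), dtype=np.float32)


def requested_shape(organism_settings):
    """Shape requested under the assumption of one row per configured organism."""
    return (len(organism_settings), EMBED_DIM)


def load_param(name, stored, shape):
    """Return the stored array, raising if its shape differs from the request."""
    if stored.shape != tuple(shape):
        raise ValueError(
            f"{name!r} with retrieved shape {stored.shape!r} "
            f"does not match shape={list(shape)!r}"
        )
    return stored


# Hypothetical stand-ins for the organism_settings dict passed to the model.
human_only = {"HOMO_SAPIENS": {}}
human_and_mouse = {"HOMO_SAPIENS": {}, "MUS_MUSCULUS": {}}

# One organism: the request has one row, the checkpoint has two, so loading fails.
try:
    load_param(PARAM_NAME, checkpoint_embeddings, requested_shape(human_only))
    human_only_ok = True
except ValueError as err:
    human_only_ok = False
    print(err)

# Two organisms: the request matches the checkpoint, so loading succeeds.
params = load_param(
    PARAM_NAME, checkpoint_embeddings, requested_shape(human_and_mouse)
)
print(params.shape)
```

Under that assumption, the empty MUS_MUSCULUS entry only has to exist so that the requested shape has two rows; it does not need any mouse reference files to get past parameter loading.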

Hopefully this unblocks you!

Tom

Hi Tom, thanks for your reply. Now it works correctly.

Chujie