Predict gene-level expression given sequences

Hi, thank you for the great work!
I’m interested in how mutated sequences affect gene expression, and I read the post showing how to predict RNA expression for a given sequence (Uploading sequences to AlphaGenome). I found that the RNA-seq expression predictions are given per nucleotide.

Is it possible to predict RNA-seq expression at the gene level for a given sequence? If not, could you share how to aggregate the per-base-pair predictions to the gene level?

Thank you!

Hi @pumpkinguagua,

Assuming you don’t need to provide a custom sequence, you can calculate expression per gene with score_interval. It takes an interval and returns a “score”; with the GeneMaskScorer, you get one score per gene in the region. Example:

from alphagenome.data import genome
from alphagenome.models import dna_client
from alphagenome.models import interval_scorers

# Replace 'MY_KEY' with your own API key.
model = dna_client.create('MY_KEY')

# A 2**20 bp (about 1 Mb) interval on chr1.
interval = genome.Interval('chr1', start=2**20, end=2**20 + 2**20)

scores = model.score_interval(
    interval,
    interval_scorers=[
        interval_scorers.GeneMaskScorer(
            requested_output=dna_client.OutputType.RNA_SEQ,
            width=200_001,
            aggregation_type=interval_scorers.IntervalAggregationType.MEAN,
        )
    ],
)[0]

If you want to do this on a custom sequence, unfortunately you’ll have to manually aggregate the predictions for each gene region in the prediction. We don’t have much in the way of helpers here, but you’d make a predict_sequence request, get the RNA_SEQ response and then slice and aggregate for each gene.
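
As a rough illustration of the "slice and aggregate" step, here is a minimal sketch. It is not an AlphaGenome helper: aggregate_by_gene, the gene names, and the gene coordinates are all hypothetical, and a random NumPy array stands in for the RNA_SEQ predictions, which are assumed to be shaped (sequence_length, num_tracks). It averages over each gene's full span, which is not necessarily the same masking that GeneMaskScorer applies.

```python
import numpy as np


def aggregate_by_gene(predictions, gene_spans, reduce=np.mean):
    """Aggregate per-base predictions into one value per gene and track.

    predictions: array of shape (sequence_length, num_tracks).
    gene_spans: dict mapping gene name -> (start, end), 0-based and
        end-exclusive, in the coordinates of the custom sequence.
    Returns a dict mapping gene name -> array of shape (num_tracks,).
    """
    predictions = np.asarray(predictions)
    seq_len = predictions.shape[0]
    result = {}
    for gene, (start, end) in gene_spans.items():
        # Clip to the predicted window so partially covered genes still work.
        start, end = max(start, 0), min(end, seq_len)
        if start >= end:
            continue  # Gene lies entirely outside the predicted window.
        result[gene] = reduce(predictions[start:end], axis=0)
    return result


# Stand-in for the RNA_SEQ predictions: 1,000 bp x 2 tracks (assumed shape).
rng = np.random.default_rng(0)
preds = rng.random((1000, 2))

# Hypothetical gene positions within the custom sequence.
genes = {'GENE_A': (100, 400), 'GENE_B': (600, 1200)}

gene_scores = aggregate_by_gene(preds, genes)
for gene, per_track in gene_scores.items():
    print(gene, per_track)
```

To compare a reference and a mutated sequence, you could run the same aggregation on both sets of predictions and take the difference per gene. Passing reduce=np.sum gives a total instead of a mean, and restricting the slice to exon positions would be a natural refinement if you have exon annotations for your sequence.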

Hope that helps!