Hi, thank you for the great work!
I’m interested in how mutated sequences affect gene expression, and I read the post showing how to predict RNA expression from a given sequence (Uploading sequences to AlphaGenome). I found that the RNA-seq expression predictions are given for each nucleotide.
Is it possible to predict RNA-seq expression at the gene level, given a sequence? If not, could you share how to aggregate the per-base-pair predictions to the gene level?
If you don’t need to provide a custom sequence, you can calculate expression per gene with score_interval, which takes an interval and returns a “score”. If you use the GeneMaskScorer, you get one score per gene in the region. Example:
from alphagenome.data import genome
from alphagenome.models import dna_client
from alphagenome.models import interval_scorers

model = dna_client.create('MY_KEY')
interval = genome.Interval('chr1', start=2**20, end=2**20 + 2**20)
scores = model.score_interval(
    interval,
    interval_scorers=[
        interval_scorers.GeneMaskScorer(
            requested_output=dna_client.OutputType.RNA_SEQ,
            width=200_001,
            aggregation_type=interval_scorers.IntervalAggregationType.MEAN,
        )
    ],
)[0]
If you want to do this on a custom sequence, unfortunately you’ll have to aggregate the predictions manually for each gene region. We don’t have much in the way of helpers here, but you’d make a predict_sequence request, get the RNA_SEQ output, and then slice and aggregate over the positions of each gene.
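As a rough sketch of that slice-and-aggregate step, the snippet below uses plain NumPy and does not call AlphaGenome. It assumes the RNA_SEQ predictions are already available as an array of shape (sequence_length, num_tracks), and that you know each gene's start and end as 0-based, end-exclusive positions within your custom sequence. The function name aggregate_by_gene and the toy gene coordinates are made up for illustration. It also averages over the whole gene span, which may not match the mask that GeneMaskScorer applies.

```python
import numpy as np


def aggregate_by_gene(values, genes):
    """Average per-base predictions over each gene span.

    values: array of shape (sequence_length, num_tracks).
    genes: dict mapping gene name -> (start, end), 0-based and end-exclusive,
        in coordinates of the predicted sequence (hypothetical input format).
    Returns a dict mapping gene name -> array of shape (num_tracks,).
    """
    length = values.shape[0]
    result = {}
    for name, (start, end) in genes.items():
        # Clip the gene span to the part covered by the prediction.
        start, end = max(start, 0), min(end, length)
        if start >= end:
            continue  # Gene lies entirely outside the predicted sequence.
        result[name] = values[start:end].mean(axis=0)
    return result


# Toy stand-in for a per-base RNA_SEQ prediction: 10 positions, 2 tracks.
values = np.arange(20, dtype=float).reshape(10, 2)
genes = {'geneA': (0, 4), 'geneB': (6, 10), 'off_sequence': (12, 15)}
gene_means = aggregate_by_gene(values, genes)
```

Swapping mean for sum, or restricting the slice to exon positions, would be small changes to the same loop; which aggregation is appropriate depends on how you want to compare genes of different lengths.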