Why are there different tracks for different cell lines

Lei_Xia · September 25, 2025, 12:46pm

Hi, I’m trying to predict gene expression using AlphaGenome. However, when I predicted expression for a specific gene in different cell lines, I got different numbers of output tracks. Here is an example:

For the gene “SLC40A1”, I tested PANC1 (“EFO:0002713”) and GM12878 (“EFO:0002784”), following the official “quick start”. In the section named “Predict outputs for a genome interval (reference genome)”, my commands were as follows:

# Assumes `gtf` and `dna_model` were set up as in the quick start.
interval = gene_annotation.get_gene_interval(gtf, gene_symbol='SLC40A1')
interval = interval.resize(dna_client.SEQUENCE_LENGTH_1MB)
# GM12878
output_1 = dna_model.predict_interval(
    interval=interval,
    requested_outputs=[dna_client.OutputType.RNA_SEQ],
    ontology_terms=['EFO:0002784'],
)
# PANC1
output_2 = dna_model.predict_interval(
    interval=interval,
    requested_outputs=[dna_client.OutputType.RNA_SEQ],
    ontology_terms=['EFO:0002713'],
)

However, the shapes of output.rna_seq.values were (1048576, 5) and (1048576, 3), respectively. Also, in the plotting part, the titles of the tracks were different:

[list of track titles missing] for the EFO:0002784 results, while only total (+), total (-), polyA+ (in that order) for EFO:0002713.

My questions are:

Why are there different output track types? I suppose it may be because different types of RNA-seq data were used for training in different cell lines?
How can I identify which tracks correspond to total RNA-seq (-) and total RNA-seq (+)? The order is not always the same across cell lines: in the example above, polyA+ (+) came first for EFO:0002784, while total (+) came first for EFO:0002713.

Thank you very much!

tward · September 26, 2025, 9:07am

Hi @Lei_Xia ,

Thanks for your questions! Please find responses below:

Why are there different output track types? I suppose it may be because different types of RNA-seq data were used for training in different cell lines?

I’m not sure about this specific cell line, but our paper methods section “ENCODE RNA-seq Data” details the selection process of RNA-seq tracks. Most likely the EFO:0002713 stranded polyA tracks didn’t pass our QC measures.

How can I identify which tracks correspond to total RNA-seq (-) and total RNA-seq (+)? The order is not always the same across cell lines: in the example above, polyA+ (+) came first for EFO:0002784, while total (+) came first for EFO:0002713.

The easiest way is to use the TrackData filter options to filter tracks. So for your example you can do:

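# Request RNA-seq predictions and keep only the RNA-seq TrackData.
predictions = dna_model.predict_interval(
    interval=interval,
    requested_outputs=[dna_client.OutputType.RNA_SEQ],
    ontology_terms=['EFO:0002713'],
).rna_seq
# Boolean mask over tracks: True where the assay is total RNA-seq.
predictions = predictions.filter_tracks(
    (predictions.metadata['Assay title'] == 'total RNA-seq').values
)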

This keeps only the “total RNA-seq” assays.
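As a rough illustration of why selecting by metadata is more robust than selecting by position, the sketch below uses a toy metadata table in plain Python rather than the real AlphaGenome objects. The `assay` and `strand` field names, the `.` marker for an unstranded track, and the `select_tracks` helper are all assumptions made for this example; check the columns of your own `predictions.metadata` before relying on them.

```python
def select_tracks(metadata, **criteria):
    """Return indices of tracks whose metadata rows match every criterion."""
    return [
        index
        for index, row in enumerate(metadata)
        if all(row.get(key) == value for key, value in criteria.items())
    ]


# Toy metadata, one row per output track, in track order.
# Mirrors the three EFO:0002713 tracks described in this thread;
# the strand marker of the polyA+ track is an assumption.
tracks_a = [
    {"assay": "total RNA-seq", "strand": "+"},
    {"assay": "total RNA-seq", "strand": "-"},
    {"assay": "polyA plus RNA-seq", "strand": "."},
]

# The same tracks in a different (made-up) order, to show that the
# lookup does not depend on position.
tracks_b = [tracks_a[2], tracks_a[1], tracks_a[0]]

for name, metadata in (("tracks_a", tracks_a), ("tracks_b", tracks_b)):
    plus = select_tracks(metadata, assay="total RNA-seq", strand="+")
    minus = select_tracks(metadata, assay="total RNA-seq", strand="-")
    print(name, "total (+) at", plus, "total (-) at", minus)
```

The returned indices can then be used to pick columns out of the values array, so downstream code never has to assume that total (+) is the first column.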

Hope that helps!

Lei_Xia · September 30, 2025, 10:26am

Hi,

Thank you very much! And by the way, could you please tell me whether this model could be used to predict gene expression of some other cell lines which are not recorded in the track metadata table (Suppl Table 2), such as HPDE, OE19 and so on?

Best,
Lei

tward · September 30, 2025, 8:16pm

Thank you very much! And by the way, could you please tell me whether this model could be used to predict gene expression of some other cell lines which are not recorded in the track metadata table (Suppl Table 2), such as HPDE, OE19 and so on?

Unfortunately not: the model can only make predictions for cell lines that it has seen during training.
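One way to catch this early is to check the requested ontology terms against the track metadata before calling the model. The sketch below does that with a toy table in plain Python. The `ontology_curie` field name and the `split_by_coverage` helper are assumptions for illustration, and the table lists only the two terms mentioned in this thread, not the model's real metadata. The string "OE19" is a placeholder for a cell line that has no entry in the table.

```python
def split_by_coverage(requested_terms, metadata):
    """Split requested ontology terms into (covered, missing) lists."""
    known = {row["ontology_curie"] for row in metadata}
    covered = [term for term in requested_terms if term in known]
    missing = [term for term in requested_terms if term not in known]
    return covered, missing


# Toy metadata table (hypothetical rows, one per track).
toy_metadata = [
    {"ontology_curie": "EFO:0002713", "assay": "total RNA-seq"},
    {"ontology_curie": "EFO:0002713", "assay": "polyA plus RNA-seq"},
    {"ontology_curie": "EFO:0002784", "assay": "total RNA-seq"},
]

# "OE19" is a placeholder label, not a real ontology term.
covered, missing = split_by_coverage(["EFO:0002784", "OE19"], toy_metadata)
print("covered:", covered)
print("missing:", missing)
```

Anything that ends up in `missing` has no trained track in the table, so no prediction can be requested for it.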

Lei_Xia · October 1, 2025, 7:34am

Thank you. I’m wondering whether we could fine-tune the foundation model ourselves by supplying gene expression levels as the output for a specific cell line.

For example, to predict gene expression in OE19, I would build my own dataset of important target genomic intervals (such as promoter regions, TSSs, enhancer regions, etc.) paired with the corresponding expression level of each gene, and then fine-tune the model.

Amanda_Stafford · March 6, 2026, 9:39am

Yes, you can fine-tune the model on your custom OE19 dataset, but not directly through the API. As the model’s source code and weights are public, you can download the model and train it on your custom expression data using your own computational resources. Have a look at this colab which demonstrates how to fine-tune AlphaGenome on custom genomic tracks. Please note that you’ll probably need at least one H100 to fine-tune the model.
