Estimate the functional impact of kb-scale deletions using AlphaGenome

Venkatbioinfo · March 17, 2026, 11:17am

Hi,
I am working with a ~1400 bp deletion identified from ONT long-read sequencing (hg38). I want to evaluate its functional impact on local genomic features such as regulatory elements and gene activity.

Given that score_variant is designed for single variants, would it be appropriate to model this deletion as a single contiguous structural variant, setting reference_bases to the full reference span and alternate_bases to the corresponding sequence with the deletion applied?
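The sketch below shows the single-variant encoding in plain Python, so it runs without the AlphaGenome client. It assumes VCF conventions: 1-based positions, with one left anchor base kept in both alleles. Whether genome.Variant expects that anchor base, or accepts an empty alternate_bases instead, is an assumption to check against the AlphaGenome documentation. DeletionRecord and deletion_as_ref_alt are hypothetical helpers, not part of the library.

```python
from typing import NamedTuple


class DeletionRecord(NamedTuple):
    """Hypothetical container mirroring VCF-style variant fields."""
    chromosome: str
    position: int         # 1-based genomic position of the left anchor base
    reference_bases: str  # anchor base followed by the deleted bases
    alternate_bases: str  # anchor base only


def deletion_as_ref_alt(chromosome: str, window_seq: str, window_start: int,
                        del_start: int, del_end: int) -> DeletionRecord:
    """Encode the deletion of 1-based inclusive [del_start, del_end] as one ref/alt pair.

    window_seq is reference sequence whose first base lies at the 1-based
    genomic coordinate window_start. One left anchor base is kept, as in VCF.
    """
    window_end = window_start + len(window_seq) - 1  # last covered coordinate
    if not (window_start < del_start <= del_end <= window_end):
        raise ValueError("deletion and its anchor base must lie inside the window")
    anchor_idx = (del_start - 1) - window_start  # 0-based index of the anchor base
    last_idx = del_end - window_start            # 0-based index of the last deleted base
    ref = window_seq[anchor_idx:last_idx + 1]
    alt = window_seq[anchor_idx]
    return DeletionRecord(chromosome, del_start - 1, ref, alt)


if __name__ == "__main__":
    window = "ACGTACGTAC"  # toy reference sequence starting at chr1:100
    rec = deletion_as_ref_alt("chr1", window, 100, 103, 105)
    print(rec)  # DeletionRecord(chromosome='chr1', position=102, reference_bases='GTAC', alternate_bases='G')
```

Under this convention, a 1400 bp deletion gives a 1401-character reference_bases and a one-character alternate_bases.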

Alternatively, would you recommend generating a custom sequence with the deletion applied and using predict_sequence() followed by manual comparison with the reference predictions?
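The custom-sequence route needs two extra steps: applying the deletion to the reference window, and mapping the shorter alternate prediction back onto reference coordinates before comparing. This sketch assumes both predictions are per-base tracks held as plain Python lists. The helper names are hypothetical. The log2 fold change with a pseudocount is one common choice of comparison, not AlphaGenome's own scorer. Padding the alternate sequence back to a supported input length with extra flank is left to the caller.

```python
import math
from typing import List, Optional, Sequence


def apply_deletion(seq: str, start: int, end: int) -> str:
    """Return seq with the 0-based half-open interval [start, end) removed."""
    if not 0 <= start < end <= len(seq):
        raise ValueError("invalid deletion interval")
    return seq[:start] + seq[end:]


def realign_alt_track(alt_track: Sequence[float], start: int, end: int,
                      ref_len: int) -> List[Optional[float]]:
    """Place per-base alt predictions back onto reference coordinates.

    Deleted reference positions have no alt counterpart and become None.
    Any alt values beyond ref_len (e.g. extra flank pulled in to keep the
    input length fixed) are dropped.
    """
    aligned = list(alt_track[:start]) + [None] * (end - start) + list(alt_track[start:])
    return aligned[:ref_len]


def log2_fold_change(ref_track: Sequence[float], alt_on_ref: Sequence[Optional[float]],
                     region_start: int, region_end: int,
                     pseudocount: float = 1e-3) -> float:
    """log2((mean alt + c) / (mean ref + c)) over positions present in both tracks."""
    pairs = [(r, a) for r, a in zip(ref_track[region_start:region_end],
                                    alt_on_ref[region_start:region_end])
             if a is not None]
    if not pairs:
        raise ValueError("region lies entirely inside the deletion")
    ref_mean = sum(r for r, _ in pairs) / len(pairs)
    alt_mean = sum(a for _, a in pairs) / len(pairs)
    return math.log2((alt_mean + pseudocount) / (ref_mean + pseudocount))


if __name__ == "__main__":
    ref_seq = "AAAACCCCGGGG"
    alt_seq = apply_deletion(ref_seq, 4, 8)        # the CCCC block is removed
    ref_track = [1.0] * 4 + [5.0] * 4 + [2.0] * 4  # toy per-base reference predictions
    alt_track = [2.0] * len(alt_seq)               # toy per-base predictions for alt_seq
    alt_on_ref = realign_alt_track(alt_track, 4, 8, len(ref_seq))
    print(log2_fold_change(ref_track, alt_on_ref, 0, len(ref_seq), pseudocount=0.0))
```

Positions inside the deletion are excluded from the comparison. A region lying entirely within the deleted span raises an error here and has to be handled separately.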

How reliable are these approaches for capturing the functional consequences of deletions of this size?

Tumi_Makgatho · March 18, 2026, 11:05am

Hi there!

Thanks for reaching out.

The main difference between the two approaches is whether you want to predict biological activity directly or run a comparative analysis that measures the impact of a variant; both work natively with your deletion.

You can define a genome.Variant whose alternate_bases encode the 1400 bp deletion and pass it to score_variant. Note that although score_variant only considers a single genome.Variant, a genome.Variant can itself be composed of multiple mutations, SNPs, or indels. Alternatively, you can construct a custom sequence with the deleted bases removed and predict on it using predict_sequence().
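You may need to score several edits together. One way to express them as a single variant is to collapse them into one ref/alt pair spanning all of them. This sketch assumes that genome.Variant treats such a combined pair like any other ref/alt. merge_edits is a hypothetical helper, and its offsets are 0-based positions within a reference window, not genomic coordinates.

```python
from typing import List, Tuple

# (0-based offset into the window, reference bases, alternate bases)
Edit = Tuple[int, str, str]


def merge_edits(window_seq: str, edits: List[Edit]) -> Tuple[int, str, str]:
    """Collapse non-overlapping edits into one (offset, ref, alt) spanning all of them."""
    if not edits:
        raise ValueError("no edits given")
    ordered = sorted(edits)
    first = ordered[0][0]
    cursor = first  # next reference base not yet copied or replaced
    alt_pieces = []
    for offset, ref, alt in ordered:
        if offset < cursor:
            raise ValueError("edits overlap")
        if window_seq[offset:offset + len(ref)] != ref:
            raise ValueError(f"reference mismatch at offset {offset}")
        alt_pieces.append(window_seq[cursor:offset])  # unchanged bases between edits
        alt_pieces.append(alt)
        cursor = offset + len(ref)
    return first, window_seq[first:cursor], "".join(alt_pieces)


if __name__ == "__main__":
    window = "ACGTACGTAC"
    # a C>T substitution at offset 1 plus a 2 bp deletion (anchor base kept) at offset 4
    print(merge_edits(window, [(4, "ACG", "A"), (1, "C", "T")]))  # (1, 'CGTACG', 'TGTA')
```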

AlphaGenome fully supports analyzing indels: random insertions, deletions, and inversions between 1 and 20 bp long were used during distillation. There is no limit on the size of SVs or indels that can be analyzed; however, because your 1400 bp deletion lies outside this training distribution, the model's predictions may be less reliable.
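As a quick sanity check, you can compare the net length change of a ref/alt pair with the 1 to 20 bp range quoted above. describe_size is a hypothetical helper. For complex variants, the net length change understates how much sequence is altered.

```python
MAX_DISTILLED_INDEL_BP = 20  # upper end of the 1-20 bp range quoted in the reply


def describe_size(ref: str, alt: str, max_trained: int = MAX_DISTILLED_INDEL_BP) -> str:
    """Summarise a ref/alt pair's net length change against the distillation range."""
    change = len(alt) - len(ref)
    if change == 0:
        return "length-preserving change (substitution or inversion)"
    size = abs(change)
    kind = "insertion" if change > 0 else "deletion"
    where = "within" if size <= max_trained else "outside"
    return f"{size} bp {kind}, {where} the 1-{max_trained} bp range used in distillation"


if __name__ == "__main__":
    print(describe_size("A" + "C" * 1400, "A"))
    # 1400 bp deletion, outside the 1-20 bp range used in distillation
```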

Kind regards,
Tumi

bb25 · March 20, 2026, 6:41pm

Hi Tumi,

Thank you for your answer regarding estimating the functional impact of large (kb-scale) deletions using AlphaGenome.

I am also interested in using AlphaGenome to predict the effects of a very large deletion (100 kb) on the expression of nearby genes. I tried score_variant and got conflicting RNA-seq predictions across different cell lines. I understand that the model's predictions may be less reliable given the size of the deletion, but I'm curious whether differences in RNA-seq data quality might also affect the predictions. For example, the supplementary methods state that “normalized RNA-seq tracks from ENCODE and GTEx were grouped by their ontology CURIE and assay type… Within each group, the normalized signals were averaged across all included experiments or individuals”.

Could this introduce bias? For example, if a cell type (such as fibroblasts) is represented by multiple high-quality experiments, would predictions for it be more accurate than for cell types with limited data in GTEx/ENCODE?
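This is a simplified sketch of the grouping step quoted from the supplementary methods. It assumes already-normalized, equal-length tracks. It is not AlphaGenome's actual pipeline, and the CURIEs (CL:0000057 for fibroblast, UBERON:0002048 for lung) and the assay label are illustrative. The sketch makes the concern concrete. Each group collapses to one mean track whether it came from one experiment or many, so you have to keep the experiment count alongside the mean to see how much data backs each track.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (ontology CURIE, assay type)


def group_and_average(experiments: Iterable[Tuple[str, str, List[float]]]
                      ) -> Dict[Key, Tuple[List[float], int]]:
    """Average equal-length normalized tracks per (CURIE, assay) group.

    Returns each group's mean track together with how many experiments
    contributed to it, since the mean alone hides that count.
    """
    groups: Dict[Key, List[List[float]]] = defaultdict(list)
    for curie, assay, track in experiments:
        groups[(curie, assay)].append(track)
    result = {}
    for key, tracks in groups.items():
        if len({len(t) for t in tracks}) != 1:
            raise ValueError(f"tracks in group {key} differ in length")
        n = len(tracks)
        mean = [sum(values) / n for values in zip(*tracks)]
        result[key] = (mean, n)
    return result


if __name__ == "__main__":
    toy = [
        ("CL:0000057", "total RNA-seq", [1.0, 3.0]),      # fibroblast, experiment 1
        ("CL:0000057", "total RNA-seq", [3.0, 5.0]),      # fibroblast, experiment 2
        ("UBERON:0002048", "total RNA-seq", [2.0, 2.0]),  # lung, single experiment
    ]
    for key, (mean, n) in group_and_average(toy).items():
        print(key, mean, f"n={n}")
```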

Tumi_Makgatho · March 25, 2026, 11:08am

Hi there!

Thanks for reaching out.

Yes, varying data availability across cell lines can affect prediction accuracy. AlphaGenome's training tracks are generated by averaging normalized signals across all available experiments or individuals for a given biological context. While the model predicts overall expression levels well, accurately capturing cell-type-specific expression deviations and condition-specific variant effects remains a known challenge and is a priority for future work. That said, as you pointed out, the conflicting results are likely due to the large loss of sequence context caused by a 100 kb deletion.
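Some rough arithmetic shows why a 100 kb deletion changes the model's input far more than a 1400 bp one. This sketch assumes a 1 Mb (2^20 bp) input window, 0-based positions, and a half-open deleted interval. It only measures how much reference sequence disappears and how much closer flanking elements move. It does not by itself explain the disagreement between cell lines.

```python
WINDOW_BP = 2 ** 20  # assumed 1 Mb model input (1,048,576 bp)


def removed_percent(del_len: int, window_len: int = WINDOW_BP) -> float:
    """Percentage of a fixed-length reference window removed by the deletion."""
    if not 0 < del_len < window_len:
        raise ValueError("deletion must be shorter than the window")
    return 100 * del_len / window_len


def distance_after_deletion(a: int, b: int, del_start: int, del_end: int) -> int:
    """Distance between reference positions a and b after deleting [del_start, del_end)."""
    lo, hi = sorted((a, b))
    if del_start <= lo < del_end or del_start <= hi < del_end:
        raise ValueError("an endpoint falls inside the deleted interval")
    # number of deleted bases lying between the two positions
    overlap = max(0, min(hi, del_end) - max(lo, del_start))
    return (hi - lo) - overlap


if __name__ == "__main__":
    for size in (1_400, 100_000):
        print(f"{size} bp deletion removes {removed_percent(size):.2f}% of the window")
    # toy enhancer 300 kb from a promoter, with a 100 kb deletion in between
    print(distance_after_deletion(0, 300_000, 100_000, 200_000))  # 200000
```

Beyond the roughly 9.5% of the window that disappears, elements on either side of the deletion end up 100 kb closer together. As a result, regulatory elements that were far from a gene in the reference sit much closer to it in the alternate sequence.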

Kind regards,
Tumi
