Hello everyone,

My research involves the regulatory mechanisms of promoters and enhancers. I am currently using AlphaGenome to predict the activity of a synthetic promoter and to optimize it with genetic algorithms.

In my approach, I virtually constructed a dual-luciferase reporter vector and used the mean signal over the CDS region as the readout. However, the predicted activity differed substantially from established literature data. This led me to question whether the RNA-seq signal strength provided by AlphaGenome is a relative score or an absolute value.

To address this, I switched to a dual-reporter system with internal reference normalization. Unfortunately, the results are still not ideal. I have read the original AlphaGenome paper in Nature and searched the community forums, but I couldn't find a definitive answer.

My core question is: how should we interpret the RNA-seq track output? Is it a relative measure, an absolute value, or a processed absolute value? Any insights from the community or the official team would be greatly appreciated.

For reference, here is the source code for my prediction tool based on the virtual dual-luciferase assay: https://github.com/JiGuzhai/VirDLA
There is also some preliminary test data under “supplyment” in the README.md.
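To make the readout concrete, the calculation described above can be sketched roughly as follows. This is a minimal illustration, not the VirDLA code: the function names, the toy track, and the CDS coordinates are all hypothetical, and the track is assumed to be a 1-D array of predicted per-base coverage over the virtual vector.

```python
import numpy as np

def mean_cds_signal(track, cds_start, cds_end):
    """Mean predicted coverage over a half-open CDS interval [cds_start, cds_end)."""
    track = np.asarray(track, dtype=float)
    if not 0 <= cds_start < cds_end <= track.size:
        raise ValueError("CDS interval must lie inside the track")
    return float(track[cds_start:cds_end].mean())

def normalized_reporter_activity(track, test_cds, ref_cds, eps=1e-9):
    """Ratio of test-reporter signal to internal-reference signal.

    eps only guards against division by zero when the reference is silent.
    """
    test = mean_cds_signal(track, *test_cds)
    ref = mean_cds_signal(track, *ref_cds)
    return test / (ref + eps)

# Toy track (made-up values): two reporter CDSs on an otherwise silent vector.
track = np.zeros(100)
track[10:30] = 4.0   # hypothetical test reporter CDS
track[60:80] = 2.0   # hypothetical internal reference CDS

ratio = normalized_reporter_activity(track, (10, 30), (60, 80))
print(f"test / reference ratio: {ratio:.3f}")
```

The ratio cancels any global scaling of the track, which is why I expected it to help if the output were only a relative measure.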
Hi there!
Thanks for reaching out.
The AlphaGenome RNA-seq track output is best described as a processed absolute value: base-resolution normalized read coverage (reads per million, scaled so that each track sums to 100 million). In that sense the predicted coverage is also relative, because it reflects the normalized RNA-seq coverage in the training data rather than a raw read count.
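As a rough illustration of why this kind of normalization makes the values relative, here is a small sketch. It assumes the simplest reading of the description above (rescale a coverage track so that it sums to a fixed total); the function name and the toy arrays are hypothetical, and this is not the actual AlphaGenome preprocessing code.

```python
import numpy as np

TARGET_SUM = 100_000_000  # "each track sums to 100 million", per the description above

def normalize_track(raw_coverage, target_sum=TARGET_SUM):
    """Rescale a per-base coverage track so that it sums to target_sum."""
    raw = np.asarray(raw_coverage, dtype=float)
    total = raw.sum()
    if total <= 0:
        raise ValueError("track has no coverage to normalize")
    return raw * (target_sum / total)

# Two made-up libraries with the same coverage profile but 10x different depth.
shallow = np.array([1.0, 2.0, 7.0])
deep = shallow * 10.0

shallow_norm = normalize_track(shallow)
deep_norm = normalize_track(deep)

# After normalization the two tracks are indistinguishable: sequencing depth is
# gone, and each position is expressed as a share of the track total.
print(np.allclose(shallow_norm, deep_norm))
```

Under this assumption, a value at one position only has meaning relative to the rest of the track, which is why comparing it directly with luciferase readings is difficult.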
Computing the mean signal across your CDS is the standard way to estimate expression, but discrepancies in virtual reporter assays are expected. Synthetic vectors lack the native chromatin context and long-range interactions that AlphaGenome relies on heavily across its 1-Mb input window. To predict synthetic promoter activity more accurately (as in MPRA benchmarks), the AlphaGenome team recommends using a LASSO regression model to aggregate predicted features across multiple modalities (e.g., DNase, ChIP-seq, and RNA-seq) rather than relying solely on RNA-seq.
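To give a feel for the LASSO idea, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the feature matrix stands in for per-construct summaries of predicted tracks (one column per hypothetical feature), the "measured activity" is simulated, and the solver is a plain coordinate-descent implementation rather than the team's pipeline. In practice a library implementation such as scikit-learn's `Lasso` would be the usual choice.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def fit_lasso(X, y, alpha=0.1, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - Xw - b||^2 + alpha * ||w||_1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering absorbs the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                      # constant feature: leave weight at 0
            # Residual with feature j's current contribution added back.
            residual = yc - Xc @ w + Xc[:, j] * w[j]
            rho = Xc[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    intercept = y_mean - x_mean @ w
    return w, intercept

# Synthetic stand-in: 200 constructs x 6 hypothetical predicted features
# (e.g. summaries of DNase, ChIP-seq, and RNA-seq tracks).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_w = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0])   # only two features matter
y = X @ true_w + 0.5 + 0.1 * rng.normal(size=200)    # simulated measured activity

w, intercept = fit_lasso(X, y, alpha=0.1)
print("weights:", np.round(w, 2), "intercept:", round(float(intercept), 2))
```

The L1 penalty drives the weights of uninformative features toward zero, so the fitted model indicates which predicted modalities actually track your measured reporter activity.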
Kind regards,
Tumi