Deletion of exon

Hello,

I am interested in looking at how RNA-seq expression changes when I delete the first exon entirely. I expected that on plotting, there should be a stark difference between the RNA-seq output of the ref vs variant, however, on plotting both the graphs of the reference and variant look the same.

I believe there is a problem in obtaining the sequence of that gene in that first exon region, which does not match with the ref genome. I have attached the code below. I would appreciate some input, as to how to get the sequence to match the ref genome/ where I can verify the sequence matches the ref genome.

The sequence I obtained for the beta actin gene on the negative strand on chr7’s first exon is from UCSC. I tried the following the code in the quickstart as much as I could.

Code:

// # Loading the gtf file

gtf = pd.read_feather(
https://storage.googleapis.com/alphagenome/reference/gencode/’
‘hg38/gencode.v46.annotation.gtf.gz.feather’
)
gtf_transcripts = gene_annotation.filter_protein_coding(gtf)
gtf_transcripts = gene_annotation.filter_to_longest_transcript(gtf_transcripts)
transcript_extractor = transcript_utils.TranscriptExtractor(gtf_transcripts)

// # declaring variables

exon1_start = 5530601
exon1_end = 5530524
exon1_sequence = ‘ACCGCCGAGACCGCGTCCGCCCCGCGAGCACAGAGCCTCGCCTTTGCCGATCCGCCGCCCGTCCACACCCGCCGCCAG’

variant = genome.Variant(
chromosome=“chr7”,
position=exon1_start,
reference_bases=exon1_sequence,
alternate_bases=“”
)

interval = variant.reference_interval.resize(dna_client.SEQUENCE_LENGTH_1MB)

variant_output = model.predict_variant(
interval=interval,
variant=variant,
requested_outputs=[dna_client.OutputType.RNA_SEQ],
ontology_terms=[‘UBERON:0001114’])

// # plotting the data
longest_transcripts = transcript_extractor.extract(interval)

plot_components.plot(
[
plot_components.TranscriptAnnotation(longest_transcripts),
plot_components.OverlaidTracks(
tdata={
‘REF’: variant_output.reference.rna_seq,
‘ALT’: variant_output.alternate.rna_seq,
},
colors={‘REF’: ‘dimgrey’, ‘ALT’: ‘red’},
),
],
interval=variant_output.reference.rna_seq.interval.resize(2**15),
annotations=[plot_components.VariantAnnotation([variant], alpha=0.8)],
)
plt.show()

Based on the start and end positions of your genomic region, it appears you’re working with the negative strand.
To ensure proper alignment with the forward reference sequence, follow these two steps:

  1. Swap the start and end positions, then reverse-complement the sequence.

  2. The variant must be left-aligned to ensure it is represented at the most upstream position possible.