Hello,
I am interested in looking at how RNA-seq expression changes when I delete the first exon entirely. I expected that on plotting, there should be a stark difference between the RNA-seq output of the ref vs variant, however, on plotting both the graphs of the reference and variant look the same.
I believe there is a problem in obtaining the sequence of that gene in that first exon region, which does not match with the ref genome. I have attached the code below. I would appreciate some input, as to how to get the sequence to match the ref genome/ where I can verify the sequence matches the ref genome.
The sequence I obtained for the beta actin gene on the negative strand on chr7’s first exon is from UCSC. I tried the following the code in the quickstart as much as I could.
Code:
// # Loading the gtf file
gtf = pd.read_feather(
‘https://storage.googleapis.com/alphagenome/reference/gencode/’
‘hg38/gencode.v46.annotation.gtf.gz.feather’
)
gtf_transcripts = gene_annotation.filter_protein_coding(gtf)
gtf_transcripts = gene_annotation.filter_to_longest_transcript(gtf_transcripts)
transcript_extractor = transcript_utils.TranscriptExtractor(gtf_transcripts)
// # declaring variables
exon1_start = 5530601
exon1_end = 5530524
exon1_sequence = ‘ACCGCCGAGACCGCGTCCGCCCCGCGAGCACAGAGCCTCGCCTTTGCCGATCCGCCGCCCGTCCACACCCGCCGCCAG’
variant = genome.Variant(
chromosome=“chr7”,
position=exon1_start,
reference_bases=exon1_sequence,
alternate_bases=“”
)
interval = variant.reference_interval.resize(dna_client.SEQUENCE_LENGTH_1MB)
variant_output = model.predict_variant(
interval=interval,
variant=variant,
requested_outputs=[dna_client.OutputType.RNA_SEQ],
ontology_terms=[‘UBERON:0001114’])
// # plotting the data
longest_transcripts = transcript_extractor.extract(interval)
plot_components.plot(
[
plot_components.TranscriptAnnotation(longest_transcripts),
plot_components.OverlaidTracks(
tdata={
‘REF’: variant_output.reference.rna_seq,
‘ALT’: variant_output.alternate.rna_seq,
},
colors={‘REF’: ‘dimgrey’, ‘ALT’: ‘red’},
),
],
interval=variant_output.reference.rna_seq.interval.resize(2**15),
annotations=[plot_components.VariantAnnotation([variant], alpha=0.8)],
)
plt.show()