How can ISM be used on the negative strand?

Hello, thank you for sharing such an excellent model. I have been experimenting a lot with it recently and have high expectations for ISM.

The gene I am interested in has the following interval format:
Interval(chromosome='chr17', start=, end=, strand='-', name='')
where the strand is (-).

When I try to run ISM using the following code:

variant_scores = dna_model.score_ism_variants(
    interval=sequence_interval,
    ism_interval=ism_interval,
    variant_scorers=[dnase_variant_scorer],
)

I get the following error:
ValueError: ISM interval must be on the positive strand.

Does this mean that the current ISM interval functionality only works for positive strand genes?
I am concerned that simply changing the strand to (+) might cause issues in the analysis, so I wanted to ask for clarification.

The ISM code requires you to specify two intervals: interval and ism_interval. ism_interval is the region of the genome that will be mutated. Because genetic variants are unstranded, we don’t allow ism_interval to be on the negative strand. I think interval, on the other hand, can be on the negative strand. In that case, the sequence from the negative strand will be used to make the prediction, but the results will still be reported relative to the positive strand.

Most importantly, variant scores are always returned for all genes, on both the positive and negative strands. Note that in your case you are using a DNase variant scorer, which is unstranded.
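To make the strand handling above concrete, here is a minimal, self-contained sketch. It does not use the library: SimpleInterval, positive_strand_copy, and offset_in_input are hypothetical names made up for illustration, and the coordinates and sequence are toy values. It assumes 0-based, end-exclusive coordinates. It shows an ISM interval that keeps the same coordinates but sits on the positive strand, and how a positive-strand position maps into a reverse-complemented (negative-strand) input sequence.

```python
from dataclasses import dataclass, replace

_COMPLEMENT = str.maketrans('ACGTN', 'TGCAN')


@dataclass(frozen=True)
class SimpleInterval:
    """Hypothetical stand-in for a genomic interval (not the library class)."""
    chromosome: str
    start: int  # 0-based, inclusive (assumption)
    end: int    # 0-based, exclusive (assumption)
    strand: str = '.'


def positive_strand_copy(interval):
    """Return the same coordinates on the positive strand."""
    return replace(interval, strand='+')


def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def offset_in_input(interval, position):
    """Index of a positive-strand genomic position in the strand-oriented input."""
    if not interval.start <= position < interval.end:
        raise ValueError('position is outside the interval')
    if interval.strand == '-':
        return interval.end - 1 - position
    return position - interval.start


# Toy example: a negative-strand sequence interval and an ISM region inside it.
gene = SimpleInterval('chr17', 100, 108, '-')
ism = positive_strand_copy(SimpleInterval('chr17', 102, 105, '-'))

plus_seq = 'AACGTTTG'                    # toy positive-strand sequence for `gene`
neg_seq = reverse_complement(plus_seq)   # what a negative-strand input would look like
idx = offset_in_input(gene, 102)         # where genomic position 102 lands in neg_seq
```

In this toy setup the mutated positions are still named by their positive-strand coordinates (102 to 104), even though the input sequence is read from the negative strand.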


Thank you for the helpful response. I just learned for the first time that the variant_scorer always evaluates both alleles. If the transcribed region of the protein I am interested in is relatively small, would it be correct to understand that the raw_score for RNA-seq may be smaller than the quantile_score because both alleles are being evaluated?

Both the raw and the quantile scores are computed from both alleles. In the case of RNA-seq, a log-fold change is computed. Quantile scores are a non-linear but monotonic transformation of the raw scores. So yes, there may be cases where one is smaller than the other.
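As a rough illustration of "non-linear but monotonic", here is a generic empirical-quantile sketch. This is not how the model calibrates its quantile scores (that is not described in this thread); empirical_quantile and the background values are made up for the example. It only shows that such a transform preserves the ranking of raw scores while changing their magnitudes.

```python
from bisect import bisect_right


def empirical_quantile(raw, background_sorted):
    """Fraction of background values that are <= raw (an empirical CDF)."""
    return bisect_right(background_sorted, raw) / len(background_sorted)


# Made-up background distribution of raw scores, sorted ascending.
background = sorted([-0.9, -0.2, -0.05, 0.0, 0.01, 0.03, 0.4, 1.5])

raw_scores = [-0.1, 0.02, 0.5]
quantiles = [empirical_quantile(r, background) for r in raw_scores]

for raw, q in zip(raw_scores, quantiles):
    print(f'raw={raw:+.2f} -> quantile={q:.3f}')
```

The ordering of the three scores is the same before and after the transform, but a small raw value such as 0.02 maps to a much larger quantile, while a raw value above the whole background (for example 2.0) would map to 1.0 and so be larger than its quantile.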
