Soon, I will celebrate my first anniversary: nearly a year of participating in this strange experiment, EteRNA, which is supposedly about "crowdsourcing the scientific method"... Yet it never ceases to amaze me how many fundamental questions are left unanswered, and how certain things simply go unquestioned. For instance, do we really understand what SHAPE is, how it works, and most importantly, what it means?
One of the most common simplifications in the EteRNAverse is this one: in lab results, blue means paired, yellow means unpaired. This is of course insufficient to explain the various shades that we observe in many results. Many will explain that our RNAs are synthesized by the millions in a test tube, that depending on (let's call it) the "quality" of the sequence, some of those strands may fold one way (the MFE, for instance) while others adopt a suboptimal shape, and that the collected results are simply a statistical average. Which is true. But it doesn't explain everything. There have been numerous cases where the results indicated a very low ensemble diversity, and still some bases tended to show shades. And there are other strange discrepancies, like those "mismatched" pairs with one base apparently paired and the other not, which make very little sense...
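The "statistical average" explanation can be made concrete with a minimal sketch. It assumes each structure in the population is weighted by its Boltzmann factor, and it deliberately adopts the "yellow = unpaired" simplification, so a base contributes full signal only in structures where it is unpaired. The two structures and their energies in the usage line are hypothetical, and only the listed structures are summed (a real tool such as ViennaRNA computes the full ensemble through its partition function).

```python
import math

# Gas constant in kcal/(mol*K); free energies below are in kcal/mol.
R_KCAL = 1.98720425864083e-3


def boltzmann_weights(energies, temp_c=37.0):
    """Population fraction of each structure, proportional to exp(-dG/RT)."""
    rt = R_KCAL * (temp_c + 273.15)
    e_min = min(energies)  # shift by the minimum to avoid overflow/underflow
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(factors)  # partition function over the listed structures only
    return [f / z for f in factors]


def unpaired_fraction(structures, energies, temp_c=37.0):
    """Per-base fraction of the population in which the base is unpaired ('.').

    Under the 'yellow = unpaired' simplification (an assumption), this is
    the signal a pure ensemble average would produce."""
    weights = boltzmann_weights(energies, temp_c)
    length = len(structures[0])
    return [
        sum(w if s[i] == "." else 0.0 for s, w in zip(structures, weights))
        for i in range(length)
    ]


# Hypothetical two-structure ensemble (dot-bracket, kcal/mol), for illustration only.
structures = ["((((....))))", ".(((....)))."]
energies = [-8.0, -3.0]
print([round(f, 3) for f in unpaired_fraction(structures, energies)])
```

With a 5 kcal/mol gap, one structure dominates and every base comes out essentially at 0 or 1. That is the point: in a low-diversity ensemble, averaging alone cannot produce intermediate shades.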
So what's the (real) deal with SHAPE? Well, inevitably there comes a moment in your EteRNA life when you get past the simplification and, thanks to some reading or a lucky chat with one of the few knowledgeable players, you come to understand that the SHAPE signal is not directly linked to the pairing status of the nucleobase. The conclusion is an inferred one. SHAPE actually tells us whether the base is rather mobile or rather static. When a base appears to be "protected" from the chemical probe, the conclusion is that it is "constrained" into a specific position, and it is assumed that this constraint on its mobility is caused exclusively by an actual canonical Watson-Crick pairing. And if the chemical probe manages to react with the 2'-OH (O2') of the ribose, the natural conclusion drawn from this reactivity is that the base was mobile, which automatically excludes the possibility of a canonical pairing. In the end, it would seem that the "system" still behaves just as the oversimplification states, even if things work indirectly, but... are you guys really sure?
Personally, something always bothered me about this causality chain ("constrained" implies "Watson-Crick pairing"). The question in my head was simply: how can we be sure that base pairing is the only factor capable of constraining a base's mobility? Or, put differently: are base-pairing interactions (basically, hydrogen bonds) the only stabilizing factors in folded RNA? And there was my problem, as I had known for a long time that, however small or large their influence actually is, stacking interactions do play an important role.
How did I know that? Very simple, actually. EteRNA blasted that incongruity in my face in my very first days of playing. Some brilliant scientists performed a large number of UV-melting experiments to come up with the free energy values that we see every day in our puzzles and labs. Together, these parameters form the Turner model, which is used by a large number of secondary structure prediction programs, including ViennaRNA, the engine EteRNA is currently based on.
This picture tells us two important things, immediately and very directly:
- there is more going on in each quad than just the hydrogen bonds of the two pairs enclosing it, otherwise quads built from the same pairs would have identical free energies
- this "more" is worth at least 30% of the free energy, which can hardly be considered negligible
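The arithmetic behind the second point is short. The two values below are the Turner 2004 Watson-Crick stacking free energies for the two quads that can be built from the same G-C and C-G pairs, quoted from memory and worth double-checking against the published tables; the sketch only expresses their gap as a fraction of the stronger stack.

```python
# Turner 2004 Watson-Crick stacking free energies at 37 °C, kcal/mol
# (assumed values, quoted from memory; verify against the published tables).
STACK_DG37 = {
    "5'GC3'/3'CG5'": -3.42,
    "5'CG3'/3'GC5'": -2.36,
}


def stacking_gap_fraction(strong: float, weak: float) -> float:
    """Share of the stronger stack's free energy that the weaker stack lacks.

    Both quads contain the same two G-C pairs, hence the same hydrogen
    bonds, so the difference has to come from something else."""
    return (strong - weak) / strong


frac = stacking_gap_fraction(STACK_DG37["5'GC3'/3'CG5'"], STACK_DG37["5'CG3'/3'GC5'"])
print(f"{frac:.0%}")  # prints 31%
```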
Apparently, nobody ever asks "how come?"...
The only logical culprit seems to be stacking. And while it is rather easy to find information online about hydrogen bonds (probably because their properties have been thoroughly studied in the context of water), it took me a very long time to find specific information about aromatic rings and pi stacking. If you just look at the Wikipedia article on the subject, you will realize that the topic is still quite debated in scientific circles, and you will also notice the absence of specific parameters describing in detail the possible geometries of these conformations, such as distances, plane angles, etc.
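Lacking published parameters, one way to at least detect a stacking contact in a PDB model is purely geometric: compute each base ring's centroid and plane normal, then check the centroid distance, the interplanar angle and the lateral offset. The sketch below does this in plain Python with Newell's method for the normal. The function names are hypothetical, and the cutoffs (4.5 Å, 30°, 2.0 Å) are illustrative assumptions, not values from any standard.

```python
import math


def centroid(ring):
    """Mean position of the ring atoms (list of (x, y, z) in Å)."""
    n = len(ring)
    return tuple(sum(p[k] for p in ring) / n for k in range(3))


def newell_normal(ring):
    """Unit normal of a (roughly planar) ring via Newell's method."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(ring):
        x2, y2, z2 = ring[(i + 1) % len(ring)]
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def stacking_geometry(ring_a, ring_b):
    """Return (centroid distance in Å, interplanar angle in degrees, lateral offset in Å)."""
    ca, cb = centroid(ring_a), centroid(ring_b)
    na, nb = newell_normal(ring_a), newell_normal(ring_b)
    d = [cb[k] - ca[k] for k in range(3)]
    dist = math.sqrt(sum(x * x for x in d))
    cos_angle = min(1.0, abs(sum(na[k] * nb[k] for k in range(3))))  # fold to 0-90°
    angle = math.degrees(math.acos(cos_angle))
    rise = abs(sum(d[k] * na[k] for k in range(3)))  # separation along ring A's normal
    offset = math.sqrt(max(0.0, dist * dist - rise * rise))
    return dist, angle, offset


def is_stacked(ring_a, ring_b, max_dist=4.5, max_angle=30.0, max_offset=2.0):
    """Hypothetical cutoffs: close, roughly parallel, and roughly on top of each other."""
    dist, angle, offset = stacking_geometry(ring_a, ring_b)
    return dist <= max_dist and angle <= max_angle and offset <= max_offset


def hexagon(center, radius=1.4, vertical=False):
    """Hypothetical test ring: regular hexagon in the xy-plane (xz-plane if vertical)."""
    cx, cy, cz = center
    pts = []
    for k in range(6):
        a = radius * math.cos(k * math.pi / 3)
        b = radius * math.sin(k * math.pi / 3)
        pts.append((cx + a, cy, cz + b) if vertical else (cx + a, cy + b, cz))
    return pts


print(is_stacked(hexagon((0, 0, 0)), hexagon((0.5, 0, 3.4))))  # parallel, 3.4 Å apart
print(is_stacked(hexagon((0, 0, 0)), hexagon((0, 0, 3.4), vertical=True)))  # perpendicular
```

Real bases would use the ring atoms read from the PDB file instead of the synthetic hexagons.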
A note about the following picture: there are about 4000 examples of stems containing a 5'-GCG--CGC-3' sequence in the PDB. I have no way to tell whether this one is representative. But I would argue that this particular model doesn't seem exotic in any way (the bases are cleanly parallel and coplanar), so it shouldn't be very far from the average.
As in the 2D rendering, the 5' end is on the right and the 3' end on the left. The purple sticks represent the pi stacking interactions (mind you, they are purely symbolic and make no attempt to depict the actual combined electron orbitals). This seems to explain a few things, wouldn't you say? In the lower quad, the local geometry is apparently reinforced by every base having at least one stacking interaction. In the upper quad, we can clearly observe that at least two bases, the two cytosines, lack those stabilizing interactions. Currently, the wall I'm banging my head against is quantifying the individual stacking interactions, or the lack thereof. If someone has published a related paper in the scientific literature, I'd rather wait until I gain access to it. But maybe nobody has done it, and in that case, I wonder whether, given enough different cases, we could come up with some sort of formula. That would certainly be an interesting study.
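If enough cases were available, the simplest "formula" to try would be an additive one: each case's total free energy modeled as a sum of per-interaction contributions, fitted by least squares. This is only a sketch under that linearity assumption. The interaction categories and the "true" weights in the usage example are hypothetical, and the synthetic totals are generated from them just to check that the fit recovers what went in.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: some interaction types cannot be separated")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_contributions(counts, totals):
    """Least-squares estimate of per-interaction free energy contributions.

    counts[k][j]: number of interactions of type j in case k
    totals[k]: measured free energy of case k (kcal/mol)
    Solves the normal equations (A^T A) w = A^T b."""
    n = len(counts[0])
    ata = [[sum(row[i] * row[j] for row in counts) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * t for row, t in zip(counts, totals)) for i in range(n)]
    return solve_linear(ata, atb)


# Hypothetical interaction types and weights, used only to build synthetic data.
types = ["h_bond", "stacked_base", "unstacked_base"]
true_w = [-0.5, -1.0, 0.3]
counts = [[3, 2, 0], [2, 1, 1], [5, 3, 0], [4, 4, 1], [2, 0, 2]]
totals = [sum(c * w for c, w in zip(row, true_w)) for row in counts]
fitted = fit_contributions(counts, totals)
for name, w in zip(types, fitted):
    print(f"{name}: {w:+.2f} kcal/mol")
```

With real measurements the totals would carry noise, and the fit is only meaningful if the chosen interaction types vary independently across the cases.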
Now, you probably wonder what all this has to do with SHAPE. Let me give you a recent example. I was looking at the results of my latest little experiments, and in the "Can we do it in 10" lab, I had submitted a sequence testing a multiloop. I had let Vinnie fill in the rest of the sequence, and it turned out that the multiloop wasn't actually the interesting part of the SHAPE data. For some strange reason (sometimes I suspect this son of a bot of being a little sentient :P), Vinnie had chosen to use the exact same tetraloop sequence, GUAA, twice, but with the GC closing pair swapped. And the SHAPE signatures of these tetraloops turned out to be a lot more interesting (at least to me) than the multiloop.
A few observations:
The hairpin loop with the better free energy is the one where both bases in the closing pair have stacking interactions, as opposed to the case where only one of them is stacked.
The signal on the first guanine of the hairpin loop seems directly related to the stacking interactions. Let me try an analogy: imagine a few platforms in front of you, large enough to stand on, but a little unstable and shaky. Now consider the following two cases. In the first, you have both feet and one hand on a single platform to help you keep your balance. In the second, each foot is on a different platform, no hands are helping, and the platforms move independently... Which case looks more stable to you?
Now, look at the overall SHAPE data. Clearly, the ensemble diversity was very low; in other words, the vast majority of the strands folded the exact same way (a fold with some flaws, like the terminal GU and the AA/GA 2-2 loop, hence the lower overall score). So we can be relatively confident that G25 did pair with C30. Looking at G26, can you still say "blue = paired"?
Update: an exchange in PM with Eli prompted me to provide a more detailed analysis. So this is what I believe is happening here:
G26 is very strongly stacked on G25, and it's almost the stacking alone that produces the SHAPE signal on it. Unfortunately, this position, solid as it is for G26 itself, is unfavorable both for C30, which lacks a stacking partner on one side, and for A29, which doesn't seem able to form a proper sheared G.A pair with G26 and only gets one weak H-bond. In the end, the tetraloop as a whole is actually less stable.
In the other tetraloop, the stacking position of G41 is less favorable, hence the fainter signal on this base, but the hairpin loop gains two advantages: both bases of the closing pair get a stacking interaction, and A44 manages to form a more solid sheared G.A pair with G41. Overall, the loop is much more stable. Notice, though, that the second H-bond between G41 and A44 does not compensate for the stability G41 itself loses through its less favorable stacking position in that hairpin loop...
Finally, if we take a look at the purple-marked quad, and given the pieces of knowledge I've presented above, I believe that I can find a likely explanation as to what happened there... Can you?