User talk:ElNando888/Blog/SHAPE?

== General ==

Great topic! I agree with the basic premise that the SHAPE results are affected by numerous things besides the presence or absence of the three base pairings that the game's energy model acknowledges. Working to figure out those additional factors is perhaps the coolest aspect of EteRNA.

Omei (talk) 20:34, 21 July 2013 (UTC)

== Toolset ==

What tools did you use to create these images? Something other than RNA Composer to predict the 3D structure? And the renderings don't look like options I have seen in Chimera, though there could be plugins I don't know about. I'm especially interested in what you used to predict stacking interactions.

Omei (talk) 20:34, 21 July 2013 (UTC)

----

The 3D renderings are not simulations; they are all segments of solved structures from the PDB. First, I used FRABASE to identify the PDB entries containing the sequences I was interested in. Then I fetched the structures in Chimera and worked on them there.

Atoms and "normal" bonds are easy to color any way you want, but you need to define your own color to get the translucent white I applied to the backbone (select/structure/backbone/full). For the hydrogen bonds, I currently like to make them look like springs, but that option is relatively hard to reach: tools/general controls/pseudobond panel/hydrogen bonds/attributes/component pseudobond attribute/bond style

And you're right, there is no built-in command to detect and represent stacking interactions. First, as I was saying, it took me a very long while (months) before I could find the parameters that validate a stacking interaction. I found them on the website of another 3D rendering tool, PyMol, at http://www.schrodinger.com/kb/1556

Then it's arduous work in Chimera, but at least it's possible. There are tools in Structure Analysis that let you define centroids, planes and axes, show or hide these objects, and calculate distances and plane angles. Once a pi stacking interaction is validated, I represent it by defining an axis based on the atoms of both aromatic rings.
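The geometric check described above (ring centroids, the distance between them, and the angle between the ring planes) can be sketched in a few lines of Python. This is only an illustration, not the procedure used in Chimera: the function names are made up, the ring atoms are assumed to be given as (x, y, z) tuples in ring order, and the two cutoffs are placeholder values, not the parameters from the PyMol page.

```python
import math

# Placeholder cutoffs, chosen only for illustration. They are NOT the
# parameters from the PyMol knowledge-base page mentioned above.
MAX_CENTROID_DISTANCE = 4.5   # angstroms (assumed)
MAX_PLANE_ANGLE = 30.0        # degrees (assumed)


def centroid(ring):
    """Average position of the ring atoms, each given as (x, y, z)."""
    n = len(ring)
    return tuple(sum(atom[i] for atom in ring) / n for i in range(3))


def ring_normal(ring):
    """Unit normal of a roughly planar ring (Newell's method).

    The atoms must be listed in the order they appear around the ring.
    """
    ring = list(ring)
    nx = ny = nz = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(ring, ring[1:] + ring[:1]):
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def stacking_geometry(ring_a, ring_b):
    """Return (centroid distance, angle between the ring planes in degrees)."""
    distance = math.dist(centroid(ring_a), centroid(ring_b))
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    # The sign of a normal is arbitrary, so use |cos| to get an angle in 0..90.
    cos_angle = min(1.0, abs(sum(a * b for a, b in zip(na, nb))))
    return distance, math.degrees(math.acos(cos_angle))


def looks_stacked(ring_a, ring_b,
                  max_distance=MAX_CENTROID_DISTANCE,
                  max_angle=MAX_PLANE_ANGLE):
    """True if both placeholder criteria are met for the two rings."""
    distance, angle = stacking_geometry(ring_a, ring_b)
    return distance <= max_distance and angle <= max_angle
```

In practice the ring coordinates would come from the PDB entry, and the cutoffs would be replaced by whatever validated parameters one trusts.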

The other difficulty I had on this topic was ascertaining whether the imidazole (the pentagonal ring in purines) is aromatic or not. I don't recall exactly where I read that it actually is, but considering that reference and the various examples I saw in solved structures, I believe it is.

-- ElNando888 (talk) 05:15, 22 July 2013 (UTC)

----

"The 3D renderings are not simulations; they are all segments of solved structures from the PDB. First, I used FRABASE to identify the PDB entries containing the sequences I was interested in. Then I fetched the structures in Chimera and worked on them there."

Does that mean the association of the specific 3D structures with the 2 hairpins is simply your choice? If so, could I equally well switch them, and attribute the lower SHAPE reactivity at 26 to the fact that the G has two hydrogen bonds with the opposing A?

I'm not claiming the latter is the right explanation; I'm just trying to better understand your chain of reasoning.

Omei (talk) 16:51, 22 July 2013 (UTC)

----

No, you could not switch them, because the sequences are clearly different: GGUAAC vs CGUAAG (closing pair swapped).

FRABASE returned a few hits in each case, all of them presenting a similar GNRA-like pattern (which is natural). I didn't verify whether the ones I selected were more representative than the others, but I'm fairly sure that all the GGUAAC hits looked very much alike, that all the CGUAAG hits also looked alike, and that the main difference between the two sets was indeed to be found in the position of the first G in the loop relative to the bases forming the closing pair, in other words, in their stacking relationships.

And using solved structures seems to me like the best possible option here. Would you rather trust a software simulation than whatever we can manage to measure with XRD or NMR?

-- ElNando888 (talk) 17:45, 22 July 2013 (UTC)

----

Sorry. As phrased, that was a really dumb question. You stated very clearly that you were comparing two different sequences, but somehow I managed to forget that as I was writing my comment.

What I was actually wondering was how you had figured out that the configuration in the two 3D models you show represented the configuration the sequence took in your design. I too have spent a lot of time looking at the 3D structure of tetraloop hairpins in PDB, and my general impression is that a 6-base sequence will take on quite different configurations (for reasons unknown), just as the SHAPE results for the same sequence in the same position in the same lab can vary widely. So while I have hypothesized that certain 3D configurations correspond to certain SHAPE score patterns, I really have no way of confirming or denying this. I was hoping you had come up with something to help do that.

Have you gathered many instances of the GGUAAC and CGUAAG loops in lab to see how consistent the SHAPE patterns are? If they are reasonably consistent, and the PDB structures are consistent for this sequence across distinct molecules (there are a lot of duplications of the same molecule in PDB), then your explanation bears a lot of weight.

----

Ah, I understand better now. In that case, I say ensemble diversity. When you say:

"the SHAPE results for the same sequence in the same position in the same lab can vary widely"

I would tend to think that the ensemble diversity of these designs was the cause of the discrepancies. When a shape is pretty stable, like a GNRA tetraloop for instance, specific sequences will stay stable in a very specific 3D conformation.

In the case of the design I'm presenting on the page we're discussing, I have only my experience, my intuition and a few clues (the familiar tetraloop signatures, and the multiloop) that tell me that the ensemble diversity had to be very low for this design. In which case, I can trust that there were no misfolds of any kind and that the SHAPE signatures are associated with only one possible 3D structure. (sidenote: I use Vinnie to "finish" designs, precisely for creating them with the lowest possible ensemble defect)

As for reproducibility and verifiability, I need time (always a rare resource) and tools, so I probably need to get my head into your software, sooner rather than later ;)

-- ElNando888 (talk) 21:37, 22 July 2013 (UTC)

----

As an example of what I consider to be typical, here are the SHAPE results from the barcode hairpin for a lab that I just happened to have open (Triloop Buffet).

I filtered the query to only include designs with an overall score of 85 or better, to rule out any gross misfolds. So this shows the SHAPE scores for the 14 designs that were reasonably good overall and used the same UC/GA assignments for the two pairs closing the barcode hairpin loop. As you can see, of the 8 positions displayed, position 76 is really the only one that is solidly consistent.
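The kind of filtering described above (keep designs scoring 85 or better with the same closing-pair assignment, then look at how much each position varies) could be sketched as follows. This is not the data mining tool itself: the record layout (`score`, `closing_pairs`, `shape`) and the function name are hypothetical, and the numbers in the usage lines are made up.

```python
from statistics import mean, pstdev


def position_spread(designs, min_score=85, closing_pairs="UC/GA"):
    """Per-position mean and spread of SHAPE values over the selected designs.

    `designs` is a hypothetical list of dicts with the keys "score",
    "closing_pairs" and "shape" (a dict mapping position -> SHAPE value).
    """
    selected = [
        d for d in designs
        if d["score"] >= min_score and d["closing_pairs"] == closing_pairs
    ]
    positions = sorted({pos for d in selected for pos in d["shape"]})
    spread = {}
    for pos in positions:
        values = [d["shape"][pos] for d in selected if pos in d["shape"]]
        # A small standard deviation means the position is consistent.
        spread[pos] = (mean(values), pstdev(values))
    return spread


# Made-up records, only to show the call.
example = [
    {"score": 91, "closing_pairs": "UC/GA", "shape": {75: 0.2, 76: 0.9}},
    {"score": 87, "closing_pairs": "UC/GA", "shape": {75: 0.7, 76: 0.9}},
    {"score": 62, "closing_pairs": "UC/GA", "shape": {75: 0.5, 76: 0.1}},
]
for pos, (avg, sd) in position_spread(example).items():
    print(pos, round(avg, 2), round(sd, 2))
```

Sorting positions by the standard deviation would then single out the consistent ones.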

Now I have been thinking that ensemble diversity couldn't account for this variation, because (I presumed) there must be large numbers of copies (thousands? -- I really don't know) of the RNA molecules present for each design. So while any one molecule might stay in a particular configuration for many seconds, there would be enough molecules that when averaged over all the instances, the error due to ensemble diversity would be small.
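A toy version of this averaging argument, under assumptions that are not part of the discussion: each molecule is independently either in a conformation where a given base reacts (probability `p`) or in one where it does not, and the measured signal is simply the reactive fraction. The function names and all numbers are illustrative.

```python
import math
import random


def sampling_error(p, n_molecules):
    """Standard error of the observed reactive fraction for n independent molecules."""
    return math.sqrt(p * (1.0 - p) / n_molecules)


def simulate_measurement(p, n_molecules, rng):
    """Reactive fraction observed in one simulated experiment."""
    reactive = sum(1 for _ in range(n_molecules) if rng.random() < p)
    return reactive / n_molecules


rng = random.Random(0)  # fixed seed so that repeated runs give the same numbers
for n in (10, 1000, 100000):
    observed = simulate_measurement(0.3, n, rng)
    print(n, round(observed, 4), round(sampling_error(0.3, n), 4))
```

In this toy model the scatter between repeated measurements shrinks as 1/sqrt(N), so with thousands of molecules the sampling error is small. What the measurement then reports is the population-weighted average `p`; the model says nothing about whether `p` differs from one design to the next.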

Do I need to rethink this?

----

I don't pretend to know everything there is to know about SHAPE and RNA, and for instance, I'm a lot less familiar with UUCG than I am with GNRA. But examining the example(s) you're providing, I would observe that it's probably extremely difficult to draw conclusions when you're lacking part of the data. In the case discussed on this wiki page, the whole stacks and the multiloop are clearly visible in the SHAPE data. For the hairpin loop you're presenting, you're missing a significant amount of information, namely an entire half of a (presumed) stack.

This picture includes all results with the pattern UCUUCGGA, and it also seems to me that many records show clear signs that the stack didn't form as planned, even for highly scored designs...

Restricting the view to the pattern itself may also miss important global clues. In this lab for instance, which I haven't analyzed in detail, I know that the basic 2D structure is supposed to create a specific 3D pattern, namely a double coaxial stacking. Instead of a cross, the result is typically 2 helices which are bound by their center. Often, these helices are even parallel and engage in a large number of tertiary interactions, which cannot exclude the formation of pseudoknots or kissing hairpins. Given the proximity of the barcode to this structure, I believe we also can't exclude that the barcode may have interacted with the central shape in some cases.

For all these reasons, I would be extremely careful before drawing conclusions about the results of this lab.

-- ElNando888 (talk) 07:27, 23 July 2013 (UTC)

For the case of the UUCG tetraloop in the barcode hairpin, there is typically a higher error rate in the barcode of the design as you near the end of the sequencer limit. This was a problem in the first few rounds of the cloud lab, as the SHAPE reactivity signals could range all the way from fully protected to fully exposed. This problem may be minimized by the recent three-base extension for the sequencer, which moves the last three, most error-prone sequenced bases into a stack.

As for the GUAA tetraloop, I have two examples of their use in other designs (Talk about lack of GUAA tetraloops)

It looks like a complete role reversal, doesn't it? The CGUAAG tetraloop that is supposed to be considered a bit more stable has fallen apart. Then, we see that ViennaUCT used a GGUAAC tetraloop in a different design, and from what it appears, both 35 and 38 are showing minor hints of protection. The conclusion to all of this? Hairpin loops and their closing base pairs are notoriously difficult to predict and duplicate reactivities for.

For instance, let's take your average GAAA tetraloop, and let us say that when a base is protected, we'll mark it as P, and when it is exposed, we'll mark it as R for reactive. I have observed GAAA tetraloops with PRRR, PRRP, PRPR, PRPP, and very rarely you will see other, far more confusing signals, like RRRP. Repeatability of loop motifs is difficult, and a frustrating ordeal, to say the least.
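The P/R notation above is easy to generate and tally automatically. A minimal sketch, assuming the SHAPE values are normalized so that a single cutoff separates "protected" from "reactive"; the cutoff of 0.5, the function names and the sample values are all made up for illustration.

```python
from collections import Counter

REACTIVE_THRESHOLD = 0.5  # assumed cutoff on normalized SHAPE values


def pr_pattern(reactivities, threshold=REACTIVE_THRESHOLD):
    """Turn per-base SHAPE values into a string of P (protected) / R (reactive)."""
    return "".join("R" if value >= threshold else "P" for value in reactivities)


def tally_patterns(loops, threshold=REACTIVE_THRESHOLD):
    """Count how often each P/R pattern occurs among several observed loops."""
    return Counter(pr_pattern(loop, threshold) for loop in loops)


# Made-up values for the four loop bases of three GAAA tetraloops.
observed = [
    [0.1, 0.9, 0.8, 0.7],
    [0.2, 0.8, 0.3, 0.9],
    [0.0, 0.7, 0.9, 0.6],
]
print(tally_patterns(observed).most_common())
```

A real comparison would also need to deal with values close to the cutoff, which this sketch simply assigns to one side.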

Granted, these screenshots do have identical helices to those in the Can we do it in 10 lab, so their reliability can be called into question. However, can't every lab with a different length/classification helix be called into question then?

-- Brourd, 00:41:12, 2013-07-23

----

First of all, welcome aboard! :)

As I mentioned in my preceding comment, I do have specific reasons to call these examples into question, namely because the 3D structure resulting from such a 2D one is typically prone to create tertiary interactions.

Maybe I should have mentioned the "crazy stack" story in the blog, as an example of a fully protected duplex composed exclusively of non-canonical pairs. Since such pairs are known to be weak and could not alone explain the SHAPE data, it hints that stacking is at least partially, if not largely, responsible.

That said, I won't dispute that there is a lot of noise in SHAPE data in the loop areas and closing pairs. I believe that the answers are to be found in the way SHAPE actually works. I'm rather hesitant about posting a blog about that though, as it would plunge very deep into physics and chemistry... Maybe reading the scientific paper I mention in my CBPP page would be enough...

-- ElNando888 (talk) 08:37, 23 July 2013 (UTC)

----

Nando and Brourd,

I wasn't attempting to draw any conclusion from the Triloop Buffet example other than that there is a lot of variability in the SHAPE data that isn't taken into account when we try to interpret the results of a single design in a single round of a single lab. (I don't even have a guess yet as to how much of that variability is due to experimental noise and how much is due to non-local interactions within the RNA molecule.) I can tell you though, that this variability is not restricted to the barcode hairpin. I picked the barcode hairpin from a lab that went through 2 rounds because I knew there were a lot of instances where the same local assignments were used, so there were a lot of data points.

Nando's conclusion that the leading guanine in the G(GUAA)C tetraloop will be extra well protected from the SHAPE probe seems like a plausible hypothesis, but clearly needs more data to substantiate it. (As Brourd's counter-example illustrates.) I'm sure that when the article is published on the blog, there will be more people who try it out in the current labs. Better yet, we could propose a lab for explicitly testing it.

BTW, I managed to get rid of the Greasemonkey dependency in the data mining tool. It's still beta quality (i.e. missing both important functionality and documentation), so I don't want to advertise it to a wide player audience, but you're welcome to try it out. I'm finding it useful for many things.

Also, on the topic of variability in the possible 3D structures a sequence can take on (as measured in X-ray diffraction experiments), here's another example of the G(GUAA)C hairpin with a slightly different twist. This is from the PDB entry 3O5H (a yeast ribosome).

I tried to make the orientation of the 5' G as close as I could to the one in Nando's image. Note that in this case, the closing G and C are closer to being co-planar (tending to strengthen the H bonds), and the two guanines are less parallel (tending to weaken the stacking interaction). You can see lots of other differences once you start looking for them. Which, if any, have a significant effect on the SHAPE score? I would love to help figure it out.

----

This is from the same model, same region, same nucleotides; I just expanded the "view" around it a little. As you can see, the main reason for the differing conformation is to be found in the constraints placed on the hairpin loop while it interacts with another stack.

By the way, it is a known fact that GNRA loops tend to engage in a number of tertiary motifs, and certain receptor motifs have been studied thoroughly. Also, I seem to recall someone from the lab at Stanford saying that the choice of UUCG for the barcode wasn't an accident: it's a very stable tetraloop, which supposedly tends to interact less (than GNRA-like ones do) with the rest of the surrounding RNA.

-- ElNando888 (talk) 09:44, 24 July 2013 (UTC)