User:ElNando888/Blog/Models

From Eterna Wiki


Revision as of 10:50, 20 February 2014 by ElNando888 (talk | contribs) (added link to follow-up story)

What does one need to run a simulation on a computer? Well, quite trivially, you need a computer to begin with (duh!). Then you need to describe the starting point of your simulation, in other words, tell the computer what the state of the system is at time T0. And finally, the computer needs to know how the system can (and/or must) evolve over time, and for that job, it needs a model.

You have probably heard this strange word in conversations in the chat room, or read it in various places. You possibly even know that there are various models for RNA folding, and that the one we're using in EteRNA is just one of them. And if you're really well informed, you even know that the model in use at EteRNA is actually a little outdated.

What's an RNA folding model? Well, I'm not exactly sure, to be honest, because the word is often used to describe different things. Here is my understanding of it; take it with a grain of salt.

 

  • First, there is what I call a principle, or the fundamental idea if you prefer. I believe this is what should properly be referred to as the 'model'. For RNA, this basic principle is called nearest neighbor. In short, it stems from the idea that an RNA secondary structure naturally delimits spaces (loops) between the backbone and the base pairs, and that a free energy contribution can be determined for each of these spaces, depending exclusively on the nucleobases bordering that space, hence the 'nearest neighbor' name. The free energy of the whole structure is then the sum of these contributions.
  • Then there are the numerical details. Basically, it's a long list of numbers describing these free energy contributions in various cases, for instance the free energy of a GAAA tetraloop closed by a CG pair (which is often not the same as with a GC pair). In my opinion these lists should be called parameters, but you will often see them referred to as a model too. For instance, the parameters in use at EteRNA are called Turner 1999.
  • Finally, I want to mention the term dynamic programming: the computational technique that makes it possible to determine the MFE (the minimum free energy structure) in polynomial time.
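To give a feel for what dynamic programming means here, below is a minimal sketch in Python. Be aware that it is a deliberately simplified stand-in: it is the classic Nussinov-style recursion that maximizes the number of nested base pairs, not the energy minimization with nearest neighbor parameters that EteRNA or ViennaRNA actually perform. The function name and the `min_loop` argument are my own choices for illustration. The point is only to show how filling a table of sub-problems keeps the work polynomial (roughly n³ steps for n bases) instead of enumerating every possible structure.

```python
def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs in seq.

    Simplified illustration only; real folding engines minimize free
    energy with nearest neighbor parameters instead of counting pairs.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] = best pair count for the sub-sequence seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    # Solve short sub-sequences first, then reuse them for longer ones.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some base k, leaving at least
            # min_loop unpaired bases between them (hairpin constraint).
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0
```

For example, `max_base_pairs("GGGAAAUCC")` returns 3: the three bases on the left pair with the three on the right, around an AAA loop.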

 

What about using an example?

GGAAAGAGACUGAGACACUCAGACAGCGAAAGCAAUAAAGUCAAUAAGAGAAAAGUCAAACAGAACUCGUCACUCUUCGGAGUGACAAAAGAAACAACAACAACAAC

I ran this sequence through 3 different 'models' (let's call them that for now). This resulted in 3 different predicted MFE structures, so I also asked each model to tell me what it thought of the 2 other structures.

 

Free energies in kcal/mol. Each row is one predicted structure; each column is the model evaluating it.

Structure (thumbnail)             Model A   Model B   Model C
MFE of model A (B1X_A_t.png)       -25.0     -17.5    -21.06
MFE of model B (B1X_B_t.png)       -23.7     -17.6    -21.18
MFE of model C (B1X_C_t.png)       -22.8     -16.7    -23.11
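A quick way to read these numbers: every model should score its own MFE structure lower than the two others. The small Python sketch below simply re-types the values above and checks that. The dictionary layout and the function name are mine, and I'm assuming the usual kcal/mol units reported by ViennaRNA.

```python
# Outer key: the structure (the MFE structure predicted by that model).
# Inner key: the model doing the energy evaluation.
models = ["A", "B", "C"]
energies = {
    "A": {"A": -25.0, "B": -17.5, "C": -21.06},
    "B": {"A": -23.7, "B": -17.6, "C": -21.18},
    "C": {"A": -22.8, "B": -16.7, "C": -23.11},
}


def favourite_structure(model):
    """Return the structure that `model` scores lowest (most stable)."""
    return min(models, key=lambda s: energies[s][model])


for m in models:
    best = favourite_structure(m)
    # Energy difference between the favourite and the runner-up structure.
    others = [energies[s][m] for s in models if s != best]
    gap = min(others) - energies[best][m]
    print(f"model {m}: prefers structure {best}, "
          f"gap to runner-up {gap:.2f} kcal/mol")
```

Each model does prefer its own structure, but note how thin the margin can be: model B separates its favourite from the runner-up by only 0.1 kcal/mol.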

 

Of course, you have already recognized the tails of this sequence, so you already know that it is a sequence meant to be tested in the Cloud Lab.

And it was.

 

[Image: B1X SHAPE.png]

 

Ok, I've been trying to think of a smart comment about this picture for the past 5 minutes, and failed. Which model is the most accurate in this case is pretty obvious, right? Be careful before jumping to conclusions though. This is one case. The relative performance of models obviously cannot be measured with a single structure.

 

 

 

 

----

  • The sequence is a design by Brourd that fell just short of being a winner (here is its SHAPE data).
  • In all instances, I used ViennaRNA version 2.1.3, so in all cases the underlying model was nearest neighbor.
  • For A, the engine was using the Turner 1999 parameters and dangles=1, which means the prediction should be identical or extremely similar to what we get in EteRNA.
  • For B, the engine was using its current default settings, which are based on the Turner 2004 parameters.
  • For C, it was using the Andronescu 2007 parameters (those differ from the Turner ones essentially by the method used to obtain them, which is quite interesting and may be the topic of a future post).
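If you want to try something similar yourself, here is a rough sketch of how the three settings could be passed to the RNAfold command line tool from Python. Treat it as an assumption-laden illustration rather than a record of what I typed: the parameter file names are the ones usually shipped with ViennaRNA, but their location depends on your installation, and the `SETTINGS` table and helper functions are hypothetical names of mine. The options used are `-P` (parameter file), `-d1` (dangles=1) and `--noPS` (skip the drawing).

```python
import subprocess

SEQ = ("GGAAAGAGACUGAGACACUCAGACAGCGAAAGCAAUAAAGUCAAUAAGAGAAAAGUCAAACAG"
       "AACUCGUCACUCUUCGGAGUGACAAAAGAAACAACAACAACAAC")

# Assumed parameter file names; adjust the paths to your ViennaRNA install.
SETTINGS = {
    "A": ["-P", "rna_turner1999.par", "-d1"],  # Turner 1999, dangles=1
    "B": [],                                   # engine defaults
    "C": ["-P", "rna_andronescu2007.par"],     # Andronescu 2007
}


def rnafold_command(model):
    """Build the RNAfold argument list for one of the three settings."""
    return ["RNAfold", "--noPS", *SETTINGS[model]]


def fold(model, sequence=SEQ):
    """Run RNAfold (must be installed) and return its raw text output."""
    result = subprocess.run(
        rnafold_command(model),
        input=sequence + "\n",
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `fold("A")` should then print the sequence followed by the predicted structure and its energy. Scoring one model's structure with another model's parameters is a job for RNAeval, which as far as I know accepts the same `-P` and `-d` options.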

----

 

See also this followup.