User:Dennis9600

From EteRNA WiKi

I have not run my lab analysis scripts in a while.  Sorry about that.  I am in the process of relocating from Arizona to Northern California.  Kind of busy with life outside of EteRNA.  I hope to get back in the game soon.

 

I have written some Python scripts that help me analyze EteRNA labs.  One of them collects data on all past labs and looks for factors that correlate well with synthesis scores.  The other analyzes the active labs and produces a report and a Vienna 2.1.1 dot plot for every submitted design.  I post these reports on Google Drive in text and spreadsheet format.


Analysis Reports and Spreadsheets produced by my Python script can be found in ReportsActiveLabs.  You can bookmark this folder in your browser.  I have decided to stick with one folder rather than change with each publication cycle. 

 

I have updated the statistical forecasting tool that predicts synthesis scores of designs in the active labs, based on a factor analysis of all past labs.  A new factor, ensemble diversity, has been added, and the weights of the other factors have changed a little.  The predictive power of the tool, as measured by the correlation with past synthesis scores and the standard deviation of the prediction error, has improved.  (There is still quite a bit of room for improvement, though.  Some of the worst outliers are laughably wrong.)

 

 

The lab entitled "A Codon Riboswitch" has multiple sub-labs (or sub-projects, or whatever the correct terminology is).  My analysis script now navigates the hierarchy and generates outputs for all of them.  I think so, anyway. ;-)
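A recursive walk is a natural way to navigate a lab hierarchy of unknown depth. Below is a minimal sketch. It assumes each lab is a dict with a `title` and an optional list of `sublabs`; both key names are hypothetical, and the sub-lab titles in the example are placeholders.

```python
from typing import Iterator


def iter_labs(lab: dict, path: tuple = ()) -> Iterator[tuple]:
    """Depth-first walk over a lab and all of its sub-labs.

    Yields (title_path, lab) pairs, where title_path runs from the top-level
    lab down to the current one. The "title" and "sublabs" keys are
    hypothetical; adapt them to however the lab data is actually stored.
    """
    here = path + (lab["title"],)
    yield here, lab
    for child in lab.get("sublabs", []):
        yield from iter_labs(child, here)


if __name__ == "__main__":
    # Placeholder sub-lab titles, for illustration only.
    codon_lab = {
        "title": "A Codon Riboswitch",
        "sublabs": [{"title": "Sub-lab 1"}, {"title": "Sub-lab 2"}],
    }
    for title_path, _ in iter_labs(codon_lab):
        print(" / ".join(title_path))
```

A lab without sub-labs yields just itself, so flat and nested labs go through the same code path.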

 

I have added Vienna 2.1.1 dot plots to my publications.  I wanted to incorporate them into the report files, but it was much more convenient to make a separate file for each submitted design.  Giving each one a unique file name that was acceptable to Python, the Windows operating system, Ghostscript, and Google Drive required a little name mangling.  There are a huge number of files this time, and I haven't had time to look at everything I uploaded to the ReportsActiveLabs folder.  Please PM me if you spot any problems.
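One way to do this kind of name mangling is to whitelist a small set of safe characters and append an ID. Below is a minimal sketch that assumes each design has a unique integer ID. `mangle_name`, the `.ps` default extension and the 80-character limit are illustrative choices, not the scheme the script actually uses.

```python
import re


def mangle_name(title: str, design_id: int, ext: str = ".ps", max_len: int = 80) -> str:
    """Turn a design title into a conservative, unique file name.

    Anything outside [A-Za-z0-9._-] becomes "_". That removes characters
    Windows forbids in file names (<>:"/\\|?*), spaces that need quoting on
    a command line, and '%', which Ghostscript's -sOutputFile treats as a
    page-number format specifier. Appending the (assumed unique) design id
    keeps names distinct and means the result is never a bare reserved
    Windows device name such as CON or NUL.
    """
    # Collapse each run of unsafe characters into a single underscore,
    # then trim separators left dangling at either end.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", title).strip("._-")
    stem = stem[:max_len] or "design"  # fall back if nothing safe is left
    return f"{stem}_{design_id}{ext}"
```

For example, `mangle_name('Bob\'s "Best" Design: v2?', 12345)` gives `Bob_s_Best_Design_v2_12345.ps`.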

 


 

In this edition of my analysis reports and spreadsheets (new items are bolded):

  1) Every lab has its own subfolder now.

  2) Vienna 2.1.1 dot plots for each submitted design!!!

  3) If the target structure contains locked non-canonical pairs, the Vienna tools are called with the --nsp option, which allows those pairs and assigns them an energy of 0.  Zero may not be the best value to use, but it seems to be better than doing nothing.  Please PM me if you find a better way to treat non-canonical pairs...

  4) Some of the labs now have a starting sequence of 'GG' instead of 'GGAAA'.  It looks like the devs have left things open for additional "tails" to be used in future labs.  For now, my script recognizes both of the sequences that have appeared and uses the correct one for each lab.

  5) There is a field for the Vienna 2.1.1 melt point of each submitted design.  This new field appears in both the text reports and the spreadsheets.

  6) There is a field for the Vienna 2.1.1 ensemble diversity.  Lower is better.  This factor has the highest correlation with synthesis scores of all the factors I have examined.

 

  7) My lab tool includes a forecasting tool that uses factors that have correlated well with past synthesis scores.  The current version of the forecasting tool looks at the following factors:

    a) Whether or not the design folded correctly in EteRNA's energy model (Vienna 1.8.5)

    b) Whether or not the design folded correctly in the Vienna 2.1.1 energy model

    c) The frequency of the design's MFE structure in the Vienna 2.1.1 ensemble.

    d) The Vienna 2.1.1 ensemble diversity.   This factor is given the highest weight in the forecasting tool.

    e) The melt point of the design (as reported by the EteRNA server)

    f) The percentages of C, U, and G in the design, and how far they differ from 13%, 10%, and 21%, respectively (the Berex Strategy).

    g) log10(designer's EteRNA points).  This factor is given a very low weight because it is only weakly correlated with synthesis score.
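Item 3 above relies on knowing which locked pairs in the target are non-canonical. Below is a minimal sketch that finds them from the sequence and the dot-bracket target, then builds the extra argument. It assumes the RNAfold `--nsp` syntax of a comma-separated list of pairs (e.g. `--nsp=GA,AG`). `nsp_args` and `base_pairs` are hypothetical helper names.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Return (i, j) index pairs from a dot-bracket string, with i < j."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def nsp_args(sequence: str, target: str) -> list[str]:
    """Extra RNAfold arguments allowing the target's non-canonical pairs.

    Returns [] when every pair in the target is AU, GC or GU. Both
    orientations of each pair type are listed, so the order of the two
    bases does not matter.
    """
    if len(sequence) != len(target):
        raise ValueError("sequence and target structure differ in length")
    seq = sequence.upper().replace("T", "U")
    found = {seq[i] + seq[j] for i, j in base_pairs(target)} - CANONICAL
    both_ways = sorted(found | {p[::-1] for p in found})
    return [f"--nsp={','.join(both_ways)}"] if both_ways else []
```

The result can be spliced into a command line such as `["RNAfold", "-p", *nsp_args(seq, target)]`. When the target has only AU, GC and GU pairs, nothing is added.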
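The factor list above can be turned into a prediction in many ways, and the simplest is a weighted sum. Below is a minimal sketch that assumes a linear model, hypothetical dict keys for each design, and weights supplied by the caller. The real weights and functional form of the forecasting tool are not given here. Summing absolute percentage differences for the Berex factor is also an assumption.

```python
import math

# Target percentages for C, U and G, taken from factor (f) above.
BEREX_TARGETS = {"C": 13.0, "U": 10.0, "G": 21.0}


def berex_deviation(sequence: str) -> float:
    """Total distance of the C, U and G percentages from the Berex targets.

    Summing absolute differences is an assumption; the text only says
    "how far they differ".
    """
    seq = sequence.upper().replace("T", "U")
    if not seq:
        raise ValueError("empty sequence")
    return sum(abs(100.0 * seq.count(base) / len(seq) - target)
               for base, target in BEREX_TARGETS.items())


def factors(design: dict) -> dict:
    """Map one design (hypothetical dict keys) to factors (a) through (g)."""
    return {
        "folds_185": 1.0 if design["folds_vienna_185"] else 0.0,  # (a)
        "folds_211": 1.0 if design["folds_vienna_211"] else 0.0,  # (b)
        "mfe_frequency": design["mfe_frequency"],                 # (c)
        "ensemble_diversity": design["ensemble_diversity"],       # (d)
        "melt_point": design["melt_point"],                       # (e)
        "berex_deviation": berex_deviation(design["sequence"]),   # (f)
        # (g); clamped at 1 point so log10 is defined for new players.
        "log10_points": math.log10(max(design["designer_points"], 1)),
    }


def forecast(design: dict, weights: dict, intercept: float = 0.0) -> float:
    """Weighted sum of factors. Factors without a weight contribute nothing;
    a misspelled weight name raises KeyError instead of being ignored."""
    values = factors(design)
    return intercept + sum(w * values[name] for name, w in weights.items())
```

Because the weights are passed in, they can be refit as new lab results arrive without touching the factor code.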

 

Things I have looked at:

I have also looked at the temperature setting of the energy model as a possible factor for my forecasting tool.  I was surprised to find that the default setting of 37°C is the best setting for my forecasting tool.

 

 

Things to do:

1. I am (still) thinking about how to add Vienna 2.1.1 melt curves to these reports.

2. Investigate what changes EteRNA made to Vienna 1.8.5 dot plots.

3. Look at "upgrading" to Vienna 2.1.3 (from Vienna 2.1.1) in my toolset.

 

Please let me know if you have any other things you would like to see in my lab reports.

 

Happy folding,

---Dennis9600
