User:Omei/Cloud Lab Data Mining Tool

From Eterna Wiki


Revision as of 06:06, 4 July 2013 by ElNando888 (talk | contribs) (added a few internal links)

Cloud Lab Data Mining Tool

At this point, the biggest need I see in EteRNA is a way to extract meaning from the results of the thousands of lab designs that have been synthesized.  Ideally, something addressing this need will be built into the EteRNA GUI, but I decided to see what I could do to contribute ideas and a sample implementation.

 

I envision three major parts.

  1. A method for finding the labs that are relevant to a particular question.  As a starting point, this might take the form of searching by sequence or structure motif.  Existing examples are <a href="http://cossmos.slu.edu/search.php">CoSSMoS</a> and <a href="http://rmdb.stanford.edu/repository/advanced_search/">RMDB</a>.
  2. A method of interactively viewing the SHAPE results for all the synthesized designs in a single lab.  This is the part I am actively working on.
  3. A method for integrating the results of the previous two steps.  For example, after finding a relevant lab (step 1) and developing a hypothesis based on it (step 2), it would be nice to be able to gather up all "analogous" data from any other relevant labs, to see if the hypothesis is consistent with them.  This is the step whose workings I am least clear about.
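To make step 1 a little more concrete, here is a rough sketch of what searching by sequence or structure motif could look like. It is an illustration only, not code from the tool: the function name `findMotif`, the design record layout (`id`, `sequence`, `structure` in dot-bracket notation), and the choice to treat `N` as an any-base wildcard are all assumptions made for the example.

```javascript
// Hypothetical design records: { id, sequence, structure }, where structure
// is a dot-bracket string. None of these names come from EteRNA itself.
// Returns, for each matching design, the 0-based positions of the motif.
function findMotif(designs, field, motif) {
  // Escape regex metacharacters so "(", ")" and "." in dot-bracket motifs
  // are matched literally.
  var escaped = motif.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Assumption: in sequence motifs, "N" stands for any base.
  var pattern = field === "sequence" ? escaped.replace(/N/g, "[ACGU]") : escaped;
  // A lookahead keeps matches zero-width, so overlapping hits are all found.
  var regex = new RegExp("(?=" + pattern + ")", "g");
  var hits = [];
  designs.forEach(function (design) {
    var positions = [];
    var match;
    regex.lastIndex = 0;
    while ((match = regex.exec(design[field])) !== null) {
      positions.push(match.index);
      regex.lastIndex = match.index + 1; // zero-width match: advance manually
    }
    if (positions.length > 0) {
      hits.push({ id: design.id, positions: positions });
    }
  });
  return hits;
}

// Example with made-up designs.
var sampleDesigns = [
  { id: "a", sequence: "GGAAACC", structure: "((...))" },
  { id: "b", sequence: "GCGAAAGC", structure: "((....))" }
];
var sequenceHits = findMotif(sampleDesigns, "sequence", "GNAA");
var structureHits = findMotif(sampleDesigns, "structure", "(...)");
```

A real search would presumably need richer wildcards and some way to match a sequence motif and a structure motif at the same positions, which this sketch does not attempt.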

As for my approach to step 2, I recently wrote up a short <a href="https://docs.google.com/document/d/1rFJSsYaCn1ZP1DnZ8fDGUiUdtt2FUV9sTrSiS4qudck/edit">preview article</a>.  If you have any comments, you should be able to make them in that document.

The tool is written in JavaScript.  Because it needs to make RESTful queries to the EteRNA domain, I'm currently using Greasemonkey to get around the browser's cross-site scripting restrictions.  Once it gets to the point where it is worth releasing to general users, I'm thinking I'll set up a Google Apps server to act as a proxy between the player and the EteRNA servers.  Or perhaps I could work out something with the EteRNA people so that players could get the lab data directly from the EteRNA server.
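For anyone unfamiliar with the Greasemonkey approach, the general pattern looks something like the sketch below. This is not the tool's actual code: the function name `fetchLabData` is made up, the URL would be whatever endpoint serves the lab data (none is given here), and the response is assumed to be JSON. The only real API used is Greasemonkey's `GM_xmlhttpRequest`, which a user script may call to make cross-origin requests.

```javascript
// Sketch only. `request` is injectable so the function can be exercised
// outside a user script; inside one it falls back to GM_xmlhttpRequest.
function fetchLabData(url, callback, request) {
  var send = request ||
    (typeof GM_xmlhttpRequest !== "undefined" ? GM_xmlhttpRequest : null);
  if (!send) {
    callback(new Error("No cross-origin request function available"));
    return;
  }
  send({
    method: "GET",
    url: url,
    onload: function (response) {
      if (response.status !== 200) {
        callback(new Error("HTTP " + response.status));
        return;
      }
      var data;
      try {
        // Assumption: the server answers with a JSON document.
        data = JSON.parse(response.responseText);
      } catch (err) {
        callback(err);
        return;
      }
      callback(null, data);
    },
    onerror: function () {
      callback(new Error("Network error"));
    }
  });
}
```

Inside a user script, the metadata block would also need a `// @grant GM_xmlhttpRequest` line for the call to be available. A proxy server would make this workaround unnecessary, since the page would then only talk to its own origin.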


In the meantime, if there are any user/developers who know how to set up and use Greasemonkey without a lot of support on my part, I would be happy to share a snapshot of my current work in progress.  Just PM me.