User:Omei/Cloud Lab Data Mining Tool: Difference between revisions

From Eterna Wiki

Revision as of 05:41, 21 August 2013

Cloud Lab Data Mining Tool

At this point, the biggest need I feel in EteRNA is for a way to extract meaning from the results of thousands of lab designs that have been synthesized.  Ideally, something addressing this need will be built into the EteRNA GUI.  But I decided to just see what I could do to contribute ideas and a sample implementation.

 

I envision three major parts.

  1. A method for finding the labs that are relevant to a particular question.  As a starting point, this might take the form of searching by sequence or structure motif.  Existing examples are <a href="http://cossmos.slu.edu/search.php">CoSSMoS</a> and <a href="http://rmdb.stanford.edu/repository/advanced_search/">RMDB</a>.
  2. A method of interactively viewing the SHAPE results for all the synthesized designs in a single lab.  This is the part I am actively working on.
  3. A method for integrating the results of the previous two steps.  For example, after finding a relevant lab (step 1) and developing a hypothesis based on it (step 2), it would be nice to be able to gather up all "analogous" data from any other relevant labs, to see if the hypothesis is consistent with them.  This is the step whose workings I am least clear about.
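
To make the sequence-motif search in step 1 a little more concrete, here is a minimal sketch.  It is not part of the tool: the function name, the convention that "N" matches any base, and the shape of the design records are all assumptions made for illustration.

```javascript
// Hypothetical helper: return the 0-based start positions at which `motif`
// occurs in `sequence`. An "N" in the motif matches any base (an assumption).
function findSequenceMotif(sequence, motif) {
  const hits = [];
  for (let start = 0; start + motif.length <= sequence.length; start++) {
    let match = true;
    for (let i = 0; i < motif.length; i++) {
      if (motif[i] !== "N" && motif[i] !== sequence[start + i]) {
        match = false;
        break;
      }
    }
    if (match) hits.push(start);
  }
  return hits;
}

// Hypothetical design records; real lab data would carry many more fields.
const designs = [
  { name: "design A", sequence: "GGAAAGCUUCGGCUAA" },
  { name: "design B", sequence: "GGAAACCCCCCCCUAA" },
];

// Keep only the designs that contain the motif somewhere in their sequence.
const withMotif = designs.filter(
  (d) => findSequenceMotif(d.sequence, "UUCG").length > 0
);
```

A structure-motif search could follow the same pattern over dot-bracket strings, though matching paired positions properly would need more than a character-by-character comparison.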

As for my approach to step 2, I recently wrote up a short <a href="https://docs.google.com/document/d/1rFJSsYaCn1ZP1DnZ8fDGUiUdtt2FUV9sTrSiS4qudck/edit">preview article</a>.  If you have any comments, you should be able to make them in that document.

 

The tool is written in HTML/JavaScript and is loaded from the user's local disk.  Because it needs to make RESTful queries to the EteRNA domain, I'm using YQL's JSONP capability to get around the browser's cross-origin restrictions.  I'm collecting ideas on what to do about this in the longer term.
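
For readers unfamiliar with JSONP, the general technique looks roughly like the sketch below.  This is not the tool's actual code: the function names are hypothetical, the target URL passed in is a placeholder, and the endpoint and `select * from json` query are assumptions based on YQL's public service as it existed at the time of writing.

```javascript
// Public YQL endpoint (assumption: as offered when this page was written).
const YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";

// Build a URL asking YQL to fetch `targetUrl` and wrap the JSON result
// in a call to the global function named `callbackName`.
function buildYqlJsonpUrl(targetUrl, callbackName) {
  const query = 'select * from json where url="' + targetUrl + '"';
  return YQL_ENDPOINT +
    "?q=" + encodeURIComponent(query) +
    "&format=json" +
    "&callback=" + encodeURIComponent(callbackName);
}

// Browser-only: load the URL through a <script> tag, which is not subject
// to the same-origin policy. The returned script calls our callback.
function fetchViaJsonp(targetUrl, onData) {
  const callbackName = "jsonpCallback_" + Date.now();
  const script = document.createElement("script");
  window[callbackName] = function (data) {
    // Clean up the temporary global and the script tag, then hand off the data.
    delete window[callbackName];
    script.remove();
    onData(data);
  };
  script.src = buildYqlJsonpUrl(targetUrl, callbackName);
  document.head.appendChild(script);
}
```

In a page loaded from local disk, a call such as `fetchViaJsonp("https://example.org/lab-data.json", function (data) { console.log(data); })` would request the (placeholder) URL by way of YQL.  The trade-off is that every query passes through a third-party service, which is one reason a longer-term solution is worth looking for.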


You can download the most recent version of the Database Mining Tool .zip file from <a href="https://drive.google.com/folderview?id=0Bzf0qUriSfzWUmlvTU5mUTl0Rlk&usp=sharing">https://drive.google.com/folderview?id=0Bzf0qUriSfzWUmlvTU5mUTl0Rlk&usp=sharing</a>.  Eli and I are writing more documentation; you can see the current progress at <a href="https://docs.google.com/document/d/1f_jR9ydQWtMCZoKCSkhv-GSyZFnwoV50M9mIJqqj1T0/edit?usp=sharing">https://docs.google.com/document/d/1f_jR9ydQWtMCZoKCSkhv-GSyZFnwoV50M9mIJqqj1T0/edit?usp=sharing</a>