User:Omei/Making Cloud Lab Data Accessible

From Eterna Wiki
<h1 id="firstHeading" class="firstHeading"><span>Making Cloud Lab Data More Accessible</span></h1>
<p>Cloud lab data is currently freely available to anyone on the web.&nbsp; For example, the URL http://eterna.cmu.edu//get/?puznid=2426188&amp;type=solutions returns all of the result data that is displayed, in a more user-friendly way, at http://eterna.cmu.edu/game/browse/2426188/.&nbsp; In fact, the first URL is exactly how the ActionScript application downloaded from the second URL gets its data.</p>
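<p>The following Javascript is a minimal sketch of the query pattern above. It is not code from the Eterna site or from the data mining tool. It builds the /get/ URL for a puzzle id and fetches the result, assuming that the response body is JSON and that <code>fetch</code> is available (modern browsers, Node 18 or later). The function names are illustrative, and the endpoint or its format may have changed since this page was written.&nbsp; Run from Node, which has no same origin policy, the request goes through. Run from a page opened on a local drive, it is blocked for the reason described in the next section.</p>

```javascript
// Hypothetical helpers illustrating the /get/ query described above.

// Build the solutions query for one puzzle.  The path, including the doubled
// slash, is copied from the example URL; puznid is the puzzle id.
function solutionsUrl(puznid) {
  const query = new URLSearchParams({ puznid: String(puznid), type: "solutions" });
  return `http://eterna.cmu.edu//get/?${query}`;
}

// Fetch and parse the result data.  Assumes the server answers with JSON.
async function fetchSolutions(puznid) {
  const url = solutionsUrl(puznid);
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} for ${url}`);
  }
  return response.json();
}
```

<p>For example, <code>fetchSolutions(2426188)</code> would request the same data that the browse page for puzzle 2426188 displays.</p>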
<p>The [[User:Omei/Cloud Lab Data Mining Tool|Cloud Lab Data Mining Tool]] I am currently developing gets the same data, but allows it to be filtered, consolidated and displayed in a variety of ways, so that a player can focus on whatever aspect of the results s/he is interested in.&nbsp; The tool is implemented entirely in HTML and Javascript, so it is inherently open source.&nbsp; If a significant number of players find it useful, I would be happy to have it eventually integrated into the Eterna site.</p>
<h2>Current Situation</h2>
<p>However, there is a complication in writing this kind of application.&nbsp; All standard Web browsers enforce a <a href="http://en.wikipedia.org/wiki/Same_origin_policy">same origin policy</a> that, by default, prevents scripts in an HTML page opened from a local drive from directly reading data from the Web.&nbsp; As a developer, I can easily use developer tools to work around this restriction.&nbsp; (I use Greasemonkey for this purpose.)&nbsp; But now that I am ready for a few beta testers to try the tool out, it isn't realistic to expect the typical player to install and configure Greasemonkey.</p>
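<p>The core of the rule can be sketched as an origin comparison: two URLs share an origin only when their scheme, host and port all match.&nbsp; The function below is a simplified illustration built on the standard <code>URL</code> class. It is not how any browser actually implements the check. It treats every <code>file:</code> page as having an opaque origin that never matches anything, which glosses over browser-specific handling of local files.</p>

```javascript
// Simplified same-origin check (illustration only; real browsers apply
// additional rules, especially for pages opened from local files).
function sameOrigin(pageUrl, requestUrl) {
  // URL.origin is "scheme://host[:port]", with default ports omitted.
  const a = new URL(pageUrl).origin;
  const b = new URL(requestUrl).origin;
  // Opaque origins, such as those of file: URLs, serialize as "null"
  // and are never the same origin as anything else.
  if (a === "null" || b === "null") return false;
  return a === b;
}
```

<p>Applied to the URLs on this page, the browse page and the /get/ URL share the origin http://eterna.cmu.edu. That is why the ActionScript served from eterna.cmu.edu can read the data while a page opened from a local drive cannot.</p>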
<h2>Things EteRNA could do to make the data more accessible</h2>
<p>The default <a href="http://en.wikipedia.org/wiki/Same_origin_policy">same origin policy</a> was adopted by browser makers as a quick way to close off many vulnerabilities created by "dynamic HTML".&nbsp; But by itself, it is excessively broad: if there were no mechanism for allowing more open data sharing, the browser-centric Web as we know it wouldn't exist.&nbsp; The most straightforward way to expand what user-developed scripts can do in a general-purpose browser (as opposed to the very limited in-game scripting capability) would be for the EteRNA server to acknowledge that there is nothing proprietary about the lab data.&nbsp; It could then "give permission" for the data to be used by browser-based, user-written scripts that aren't served from the eterna.cmu.edu domain.&nbsp; There are two standard ways of doing this:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/JSONP">JSONP</a>: With this technique, the server accepts an additional (optional) callback parameter on the GET query, e.g. callback=foo.&nbsp; The server responds to such a query by returning the JSON data wrapped in a function call: "foo(" + &lt;data JSON&gt; + ")".&nbsp; Because this response is executable Javascript, a page can load it with a &lt;script&gt; tag, which the same origin policy does not restrict.&nbsp; When the script runs, it calls the page's foo function with the query result.</li>
<li><a href="http://en.wikipedia.org/wiki/Cross-origin_resource_sharing">CORS</a>: With this technique, the server sends the HTTP header "Access-Control-Allow-Origin: *" as part of the response to the GET query.&nbsp; This header tells the browser that Javascript from any origin may read the results of an XMLHttpRequest.</li>
</ul>
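<p>The following is a hypothetical sketch in Node-style Javascript of a /get/ handler that does both. It is not EteRNA's actual server code. The handler always sends the CORS header and, when a <code>callback</code> parameter is present, wraps the JSON for JSONP.&nbsp; The names <code>buildResponse</code>, <code>createLabDataServer</code>, <code>lookupLabData</code>, <code>loadJsonp</code> and <code>loadWithCors</code> are invented for illustration.&nbsp; The check on the callback name is a common precaution, not part of the JSONP convention itself.</p>

```javascript
const http = require("http");

// ----- Server side (hypothetical; not EteRNA's actual server code) -----

// Accept only plain or dotted Javascript identifiers as JSONP callback
// names, so a crafted callback value cannot inject arbitrary script.
const CALLBACK_RE = /^[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*$/;

// Build the response to a /get/ query whose result is `data`.
// `params` holds the query string as a URLSearchParams object.
function buildResponse(params, data) {
  // CORS: allow pages from any origin to read this response.
  const headers = { "Access-Control-Allow-Origin": "*" };
  const json = JSON.stringify(data);
  const callback = params.get("callback");

  if (callback === null) {
    // Plain request: return the JSON unchanged.
    headers["Content-Type"] = "application/json";
    return { status: 200, headers, body: json };
  }
  if (!CALLBACK_RE.test(callback)) {
    headers["Content-Type"] = "text/plain";
    return { status: 400, headers, body: "invalid callback name" };
  }
  // JSONP: wrap the JSON in a call to the requested function.
  headers["Content-Type"] = "text/javascript";
  return { status: 200, headers, body: `${callback}(${json})` };
}

// Wire buildResponse into a Node HTTP server.  `lookupLabData` is a
// hypothetical stand-in for however the server finds the data for a query.
function createLabDataServer(lookupLabData) {
  return http.createServer((req, res) => {
    // Parse the query string by hand; the /get/ path may start with "//".
    const q = req.url.indexOf("?");
    const params = new URLSearchParams(q === -1 ? "" : req.url.slice(q + 1));
    const { status, headers, body } = buildResponse(params, lookupLabData(params));
    res.writeHead(status, headers);
    res.end(body);
  });
}

// ----- Browser side: how a user-written page could consume each form -----

// JSONP: load the response through a script element, which the same origin
// policy does not restrict.  Assumes `url` already has a query string.
function loadJsonp(url, callbackName) {
  return new Promise((resolve, reject) => {
    const script = document.createElement("script");
    window[callbackName] = (data) => {
      delete window[callbackName];
      script.remove();
      resolve(data);
    };
    script.onerror = () => {
      delete window[callbackName];
      script.remove();
      reject(new Error("JSONP request failed: " + url));
    };
    script.src = url + "&callback=" + encodeURIComponent(callbackName);
    document.head.appendChild(script);
  });
}

// CORS: an ordinary fetch succeeds once the server sends the header above.
async function loadWithCors(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error("HTTP " + response.status);
  return response.json();
}
```

<p>With either change in place, a page opened from a local drive could call <code>loadJsonp</code> or <code>loadWithCors</code> with the /get/ URL shown at the top of this page.&nbsp; Of the two, CORS is generally the safer choice, because with JSONP the page executes whatever script the server returns.</p>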
<p>If, for some reason, the EteRNA developers would prefer not to do this, another option would be:</p>
<ul>
<li>Host user-written HTML and scripts on eterna.cmu.edu (or on some other domain that also serves the /get/ requests).</li>
</ul>
<h2>Things that could be done without EteRNA involvement</h2>
<ul>
<li>Find (or write) a Flash/Flex object that exposes Flash's own counterpart to XMLHttpRequest to page Javascript.&nbsp; Since Flash has its own security model, this seems like it should be workable.</li>
<li>Create a server application, such as a Google App Engine server, that acts as a proxy for lab data queries.&nbsp; This was my original long-term plan.</li>
<li>Use Yahoo's YQL Open Data Tables to cache the data for all the labs and allow SQL-like queries over all labs at once.&nbsp; This would be perfect for analyzing the barcode hairpin.</li>
</ul>
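<p>The proxy option can be sketched in a few lines.&nbsp; The version below uses plain Node (18 or later, for the built-in <code>fetch</code>) rather than Google App Engine, and it is a hypothetical illustration only. It forwards only /get/ queries to eterna.cmu.edu, using the path exactly as in the example URL at the top of this page, and adds the CORS header to whatever it relays.&nbsp; Because the proxy server, not the browser, makes the request to EteRNA, the same origin policy does not apply to it.&nbsp; Caching, rate limiting and error handling beyond a 502 response are left out.</p>

```javascript
const http = require("http");

// Assumed upstream host, taken from the example URLs on this page.
const UPSTREAM = "http://eterna.cmu.edu";

// Map an incoming path such as "/get/?puznid=2426188&type=solutions" to the
// EteRNA URL.  Anything outside /get/ is refused (null), so the proxy
// cannot be used to fetch arbitrary pages.
function upstreamUrl(reqUrl) {
  const q = reqUrl.indexOf("?");
  const path = q === -1 ? reqUrl : reqUrl.slice(0, q);
  const query = q === -1 ? "" : reqUrl.slice(q);
  if (!["/get", "/get/", "//get/"].includes(path)) return null;
  // Doubled slash copied from the example URL.
  return UPSTREAM + "//get/" + query;
}

// Create (but do not start) the proxy server.
function createProxy() {
  return http.createServer(async (req, res) => {
    const target = upstreamUrl(req.url);
    if (target === null) {
      res.writeHead(404, { "Content-Type": "text/plain" });
      res.end("only /get/ queries are proxied");
      return;
    }
    try {
      // Server-side request: no same origin policy applies here.
      const upstream = await fetch(target);
      const body = await upstream.text();
      res.writeHead(upstream.status, {
        "Content-Type": upstream.headers.get("content-type") || "application/json",
        "Access-Control-Allow-Origin": "*",
      });
      res.end(body);
    } catch (err) {
      res.writeHead(502, { "Content-Type": "text/plain", "Access-Control-Allow-Origin": "*" });
      res.end("upstream request failed: " + err.message);
    }
  });
}
```

<p>For example, after <code>createProxy().listen(8080)</code>, a page opened from a local drive could fetch http://localhost:8080/get/?puznid=2426188&amp;type=solutions. The CORS header added by the proxy lets the browser hand the response to the page's script.</p>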

Revision as of 23:12, 18 July 2013
