User:ElNando888/WikiGetSat/Ideas/Sequences sorting in labs

From Eterna Wiki
Revision as of 19:29, 12 June 2018

Below is the discussion.

Compare with https://getsatisfaction.com/eternagame/topics/sequences_sorting_in_labs

A nice feature would be the ability to wikify this content.

----

If I'm not mistaken, sequences in labs can be sorted, and the metric currently used for sorting seems to be the Hamming distance (http://en.wikipedia.org/wiki/Hamming_distance).
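
A minimal Python sketch of what Hamming-distance sorting might look like, assuming the designs are ranked by their distance to one chosen reference sequence of the same length. The function names and sequences are hypothetical; this is not Eterna's actual implementation.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def sort_by_hamming(reference: str, sequences: list[str]) -> list[str]:
    """Order sequences from most to least similar to the reference.

    sorted() is stable, so sequences at equal distance keep their input order.
    """
    return sorted(sequences, key=lambda s: hamming(reference, s))


if __name__ == "__main__":
    ref = "GGAAACC"  # hypothetical reference design
    designs = ["GGAAUCC", "CCAAAGG", "GGAAACC", "GAAAACC"]
    print(sort_by_hamming(ref, designs))
    # ['GGAAACC', 'GGAAUCC', 'GAAAACC', 'CCAAAGG']
```

Note that every mismatch counts as 1 here, whichever two nucleotides are involved.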

I'd like to propose a new sorting algorithm (which I dubbed “LDq9”), based on a Lee distance (http://en.wikipedia.org/wiki/Lee_distance) metric with a pseudo-alphabet of size 9 (or more). An example mapping would be:

A.G.-.-.U.C.-.-.-
0.1.2.3.4.5.6.7.8

This would result in the following distances:

A:G = 1
U:C = 1

G:C = 4
G:U = 3
A:U = 4
A:C = 4
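
These values follow from the Lee distance on a ring of q = 9 symbols, d(x, y) = min(|x − y|, q − |x − y|), using the example mapping A=0, G=1, U=4, C=5. The short sketch below reproduces the table; the names Q, POSITION and lee are hypothetical.

```python
from itertools import combinations

# Example LDq9 mapping from the proposal: nucleotides sit at fixed positions on a
# ring of Q = 9 symbols; positions 2, 3, 6, 7 and 8 are unused padding.
Q = 9
POSITION = {"A": 0, "G": 1, "U": 4, "C": 5}


def lee(x: str, y: str, q: int = Q) -> int:
    """Lee distance between two nucleotides: the shorter way around the ring."""
    d = abs(POSITION[x] - POSITION[y])
    return min(d, q - d)


if __name__ == "__main__":
    for x, y in combinations("AGUC", 2):
        print(f"{x}:{y} = {lee(x, y)}")
```

A:C comes out as 4 rather than 5 because of the wrap-around (9 - 5 = 4), which is what makes this a Lee distance rather than a plain difference of positions.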

The basic idea is simply that changes within the same nucleotide class (purines or pyrimidines) represent a short distance, while a change of class represents a larger jump.

I believe this would give a somewhat better view of sequence similarity, especially in the context of switches.
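
To illustrate, here is a sketch that scores whole sequences by summing the per-position Lee distances and compares the result with Hamming. Summing per position is an assumption about how LDq9 would extend to sequences (the proposal only gives nucleotide-level values), and the reference and designs are made-up examples.

```python
from __future__ import annotations

# Hypothetical helper names; the mapping (A=0, G=1, U=4, C=5, ring size 9)
# is the example LDq9 mapping from the proposal.
Q = 9
POSITION = {"A": 0, "G": 1, "U": 4, "C": 5}


def lee(x: str, y: str, q: int = Q) -> int:
    """Per-nucleotide Lee distance on a ring of size q."""
    d = abs(POSITION[x] - POSITION[y])
    return min(d, q - d)


def ldq9(a: str, b: str) -> int:
    """Assumed sequence-level LDq9: sum of per-position Lee distances."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(lee(x, y) for x, y in zip(a, b))


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    ref = "GGAAACC"
    # Three single-point mutants: A->U (class change), G->A and A->G (same class).
    designs = ["GGAAUCC", "GAAAACC", "GGAAGCC"]
    for s in designs:
        print(s, "hamming:", hamming(ref, s), "ldq9:", ldq9(ref, s))
    # Hamming ties all three; LDq9 moves the purine/pyrimidine change to the end.
    print(sorted(designs, key=lambda s: ldq9(ref, s)))
```

All three mutants are one Hamming step from the reference, so Hamming cannot order them; under this assumed summation, LDq9 scores the two within-class changes as 1 and the purine-to-pyrimidine change as 4.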

-- ElNando888

----

Nice idea.

-- jandersonlee

----

Hmm. Suppose I change a GC bond to CG. How does that get scored? And should it differ if it's a switch lab or not?

-- jandersonlee

----

GC to CG would be a +8 step (G→C costs 4 and C→G costs 4).
I don't think the metric should change between static labs and switch labs, but this idea of mine may turn out to make little difference compared with a simple Hamming distance for static target structures.
For switches, I'm almost convinced that this sorting would be a lot more accurate.

-- ElNando888

----

Worth a try if the coding isn't too much.

-- eternacac

----