User:ElNando888/WikiGetSat/Ideas/Sequences sorting in labs

Revision as of 19:29, 12 June 2018 by LFP6 (talk | contribs) (Formatting fix)

Bottom-level, the discussion.

Compare with https://getsatisfaction.com/eternagame/topics/sequences_sorting_in_labs

One nice feature would be the ability to wikify this content.

----

If I'm not mistaken, sequences in labs can be sorted, and the sorting currently in use seems to be based on the <a rel="nofollow" href="http://en.wikipedia.org/wiki/Hamming_distance">Hamming distance</a>.

I'd like to propose a new sorting algorithm (which I dubbed “LDq9”), based on a <a rel="nofollow" href="http://en.wikipedia.org/wiki/Lee_distance">Lee distance</a> metric with a pseudo-alphabet of size 9 (or more). An example mapping would be:

A.G.-.-.U.C.-.-.-
0.1.2.3.4.5.6.7.8

This would result in the following distances:

A:G = 1
U:C = 1

G:C = 4
G:U = 3
A:U = 4
A:C = 4 (wrapping around the alphabet: 9 - 5 = 4)

The basic idea is simply that a change within the same nucleotide class (purine or pyrimidine) represents a short distance, while a change of class represents a larger jump.
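To make the metric concrete, here is a minimal Python sketch of the per-base Lee distance and its sum over two sequences. It assumes the example mapping above (A=0, G=1, U=4, C=5, alphabet size 9) and sequences of equal length; the names `LDQ9_INDEX`, `base_distance` and `ldq9_distance` are made up for illustration and are not taken from any Eterna code.

```python
# Illustrative sketch only: mapping and alphabet size come from the example above.
# Slots 2, 3, 6, 7 and 8 of the pseudo-alphabet are intentionally unused.
LDQ9_INDEX = {"A": 0, "G": 1, "U": 4, "C": 5}
ALPHABET_SIZE = 9


def base_distance(x, y, q=ALPHABET_SIZE):
    """Lee distance between two bases: the shorter way around a ring of size q."""
    d = abs(LDQ9_INDEX[x] - LDQ9_INDEX[y])
    return min(d, q - d)


def ldq9_distance(seq_a, seq_b):
    """Sum of per-position Lee distances between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(base_distance(a, b) for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    # Print the pairwise table for comparison with the distances listed above.
    for pair in ("AG", "UC", "GC", "GU", "AU", "AC"):
        print(f"{pair[0]}:{pair[1]} = {base_distance(pair[0], pair[1])}")
```

Running the script prints the same six distances as the list above, with A:C coming out as 4 because of the wrap-around in `min(d, q - d)`.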

I believe that this would give a somewhat better view of the similarity of sequences, especially in the context of switches.

-- ElNando888

----

Nice Idea.

-- jandersonlee

----

Hmm. Suppose I change a GC bond to CG. How does that get scored? And should it differ if it's a switch lab or not?

-- jandersonlee

----

GC to CG would be a +8 step (G to C counts 4, and C to G counts 4).
I don't think the metric should change between static labs and switch labs, but this idea of mine may turn out to make little difference compared with a simple Hamming distance in the case of static target structures.
For switches, I'm almost convinced that this sorting would be a lot more accurate.
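As a rough illustration of the difference, the sketch below compares a plain Hamming distance with the LDq9 sum on a few made-up 5-base sequences, then sorts them by distance to a reference. The mapping is the example one from the proposal; the sequences, the function names, and the choice to sort by distance to a single reference are assumptions for the sake of the example, not a description of how lab sorting actually works.

```python
# Illustrative sketch only: example mapping from the proposal, hypothetical sequences.
LDQ9_INDEX = {"A": 0, "G": 1, "U": 4, "C": 5}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def ldq9(a, b, q=9):
    """Sum of per-position Lee distances (ring of size q) for equal-length sequences."""
    total = 0
    for x, y in zip(a, b):
        d = abs(LDQ9_INDEX[x] - LDQ9_INDEX[y])
        total += min(d, q - d)
    return total


reference = "GAAAC"  # first and last base stand in for a G-C pair
candidates = [
    "CAAAG",  # pair flipped: GC -> CG
    "AAAAU",  # pair changed within classes: GC -> AU
    "GAAAU",  # single change within a class: GC -> GU
]

if __name__ == "__main__":
    for seq in candidates:
        print(seq, "hamming =", hamming(reference, seq), "ldq9 =", ldq9(reference, seq))
    # Sort candidates by each metric relative to the reference.
    print(sorted(candidates, key=lambda s: hamming(reference, s)))
    print(sorted(candidates, key=lambda s: ldq9(reference, s)))
```

With these inputs, Hamming cannot tell the GC to CG flip apart from the GC to AU change (both differ in two positions), whereas LDq9 scores them 8 and 2 respectively, so the flipped pair sorts farthest from the reference.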

-- ElNando888

----

Worth a try if the coding isn't too much.

-- eternacac

----