I chose to implement this tool as an applet because it
enables easy access of anyone from anywhere to this tool. This is the first
implementation of this algorithm (as far as I know) and since this algorithm
is the fastest for this problem, I consider this easy access as one of
the most important properties of this tool. The main disadvantages of applets
(related to applications) are the security limitations as file access (no
file access) or committing dos command lines. For this reason I added the
statistical mode in order to enable good estimation of the reversal distance
of a permutation as a function of its size.
The implementation was in Java and the code is available
here. The main reasons for choosing Java is the comfortable GUI implementation
and the simplicity of generating an applet. The most problematic aspect
of Java, the efficiency, became almost irrelevant by the efficiency of
the algorithm and the (relatively) small size of the expected input (not
genomic pure data of millions or billions of base pairs but hundreds or
thousands of genomic regions represented as a signed permutation). The
running time of the algorithm on these input sizes is measured in seconds
(for the fast mode).
Graphical User Interface classes
These classes aren't very interesting. They contain the
GUI implementation and are large and simple)
Algorithm and its auxiliary classes
- This class contains the implementation of the user interface for using
the algorithm. This class presents the main dialog window and enable the
high level functionality of the tool (choose a mode, choose a permutation,
run the algorithm, etc.)
- This class contains the implementation of the "low" functionality of
the tool, i.e., the detailed analysis of a specific permutation that includes
a list of different properties and a drawing of the breakpoints graph.
Not in any other group classes
- a high level implementation of the Genome Rearrangement algorithm. Most
of the interaction between the GUI classes and the algorithm classes is
through this class. This class "knowledge" is the main flow of the algorithm
(like clear the hurdles or find a happy clique). It uses OVGraph class
to run the algorithm.
- The Overlap Graph class. This is the main class of the algorithm (low
level implementation of the algorithm). This class does most of the work
and has very specific functionality. This class isn't independent and it
can be used only with its "interface" - GR class.
- A connected component in the Overlap Graph
- a permutation
- a small class of a reversal
- implementation of the statistical mode of the program (used by GR_applet)
As I stressed before, the tool can't run in batch mode
because of the security restrictions that applets have (after all, no one
wants that something (an applet in our case) that exists in a web page
that he accidentally loaded, will run the command line "del C:\*.*" or
even read files and send them to its creator). The way to convert this
tool to a batch mode tool (that, for example, reads 10,000 signed permutations
in a specific format and run the algorithm on each one of them) is not
very complicated. It contains the following stages:
Build a reader class with the following methods: Reader(String
filename) that opens the file for reading, Readline() that reads the line
next to the one that was read in the last call and close() that closes
the file. The file can be built by a simple script in any script language.
Understand the interaction of the GR_applet class with GR
class. This is the most important stage of this conversion. GR supplies
a good interface and GR_applet code uses this interface. Getting a reversals
sequence is easy by constructing GR object and invoking GR::run(). Getting
properties of a permutation can be done by some simple methods of Permutation
class. Most of the available methods are used in the Statistics class (with
GR::getPermutation and then Permutation::getBreakpointsNumber, Permutation::getCyclesNumber,
...) and it is recommended to take a look on this section too.
Write a Java application that gets at least one parameter
(the file name) and contains a main loop that gets a permutation (through
the Reader) and handles it. This handling usually will contain running
the algorithm (with GR::run()) and get the Reversals arrays or analysis
of the properties of the overlap graph (with GR::getPermutation and then
Permutation::getBreakpointsNumber, Permutation::getCyclesNumber, ...).
The tool supplies all the reasonable functionality I
could think of. Most of the possible improvements are in the GUI aspect
and some in the efficiency aspect.
As always when building a program without specific targets
most of the functionality of the tool was added in the last stage of the
programming. Since the program shield was well formed (I spent most of
the time building and improving it) it became pretty easy to improve the
functionality in short time, especially in the GUI side. Unfortunately,
I didn't have an unlimited amount of time and despite improving the project
became easy the project entered its submission stage I made only a part
of the possible improvement I could think of.
The tool has 2 main windows with partially overlapping functionality.
As I see it the best way to implement the tool is to divide the functionality
to 2 parts. The second one will contain all the functionality that is available
on one permutation (including user reversal) and will be the in the Analysis
Frame. The first one will contain the rest (i.e., the functionality until
there is an initial permutation) and will be in the Applet. The reason
I didn't do it myself is that the analysis frame started as a simple report
text form and later I added the drawing and the buttons. When I found it
is very comfortable to run the algorithm through this frame it was already
too late to change it.
Enable the user to choose a reversal by clicking on 2 vertices.
Show the breakpoints graph.
In the current version the graph is drawn from scratch in
each change. I used double buffering mechanism in order to avoid flickering
but it takes too much time, especially for big graphs. It can be improved
by drawing the "delta" (symmetric differenc) instead of drawing from scratch.
The delta can be achieved by a new method of the OVGraph object (better
but more complicated) or by comparing the old graph to the new one (handle
only the GUI classes but less efficient). The second one will probably
be enough. When having the delta it is easy to "draw" white edges to erase
edges and colored edges for the new (and color changed) edges.
The same is true for the pop-up menus that show the legal
reversals the user can perform. In the current version the menus are emptied
and filled from scratch. It will be better if the menus will be updated
with the delta of the changes. I'm not sure it is possible in all cases
but at least reducing the elements in the menu will be quicker this way.
I used a simple algorithm (DFS) for dividing the overlap
graph to connected components. In the article of Berman and Hannenalli
there is a description of a faster sub algorithm for this problem. Implementing
this sub algorithm might reduce the worst case complexity of the algorithm
(from O(N^2) to O(N*a(N)+r*N) when r is the reversal
distance and a is Ackermann function). From my analysis of the algorithm's
work this improvement won't make a significant improvement of the tool
expected efficiency because most of the permutations (you can check it
by the statistic mode) os size N have a reversal distance that is very
close to N. Since it is very rare to find a permutation with a reversal
distance lower than N/2 the complexity isn't expected to improve significantly.