This software is an efficient implementation of seriation, which 'finds a suitable linear order for a set of objects'. It has been used to order a network of proteins so that 'related' nodes are closer in the order.
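To make "related nodes are closer in the order" concrete, one common way to score an order is the linear-arrangement cost: the sum, over all interactions, of the distance between the positions of the two interacting nodes. This page does not describe the energy function cfm-seriation actually minimizes, so treat the sketch below as an illustrative stand-in, not the package's energy. `edge_t` and `arrangement_cost` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical edge type: indices of the two interacting nodes. */
typedef struct {
    int u;
    int v;
} edge_t;

/*
 * Linear-arrangement cost of an order (illustrative stand-in, NOT
 * necessarily the energy used by cfm-seriation).
 * pos[k] is the position of node k in the linear order; smaller
 * totals mean interacting nodes sit closer together.
 */
long long arrangement_cost(const edge_t *edges, size_t n_edges, const int *pos)
{
    long long cost = 0;
    for (size_t e = 0; e < n_edges; e++) {
        assert(edges[e].u >= 0 && edges[e].v >= 0);
        cost += llabs((long long)pos[edges[e].u] - pos[edges[e].v]);
    }
    return cost;
}
```

For the path 0-1-2-3, the identity order scores 3, while swapping the positions of nodes 1 and 2 raises the cost to 5.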
The Seriation Package was developed by Felipe Kuentzer, in collaboration with Douglas G. Ávila, Alexandre Pereira, Gabriel Perrone, Samoel da Silva, Alexandre Amory, and Rita de Almeida.
Contact information: Alexandre Amory (amamory @ gmail com)
The input file is a text file describing an undirected network of nodes (in our examples, the nodes are protein names). Example:
L7007 L7008
L7008 L7007
L7010 L7011
L7011 L7010
L7014 L7015
L7015 L7014
L7017 Z1275
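A minimal C sketch of reading this format, assuming whitespace-separated pairs of names and names shorter than 64 characters. `network_t`, `read_network` and `free_network` are hypothetical names, not part of the package. It collects the distinct names and counts the pairs; note that in the example each interaction appears in both directions. Linear search keeps the sketch short, but a network the size of Homo sapiens (9684 proteins) would call for a hash table.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 64 /* assumption: names have at most 63 characters */

/* Hypothetical container: distinct names plus the number of pairs read. */
typedef struct {
    char (*names)[NAME_LEN]; /* distinct names, in order of first appearance */
    int n_nodes;
    int cap;
    long n_lines;            /* number of "A B" pairs read */
} network_t;

/* Returns the index of name, appending it if unseen; -1 if out of memory.
   Linear search: O(n) per lookup. */
static int intern(network_t *net, const char *name)
{
    for (int k = 0; k < net->n_nodes; k++)
        if (strcmp(net->names[k], name) == 0)
            return k;
    if (net->n_nodes == net->cap) {
        int cap = net->cap ? 2 * net->cap : 64;
        char (*grown)[NAME_LEN] = realloc(net->names, (size_t)cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        net->names = grown;
        net->cap = cap;
    }
    strcpy(net->names[net->n_nodes], name); /* fits: fscanf caps at 63 chars */
    return net->n_nodes++;
}

/* Reads whitespace-separated name pairs until EOF.
   Returns 0 on success, -1 if memory runs out (call free_network either way). */
int read_network(FILE *fp, network_t *net)
{
    char a[NAME_LEN], b[NAME_LEN];
    assert(fp != NULL && net != NULL);
    *net = (network_t){0};
    while (fscanf(fp, "%63s %63s", a, b) == 2) {
        if (intern(net, a) < 0 || intern(net, b) < 0)
            return -1;
        net->n_lines++;
    }
    return 0;
}

/* Releases the name table and resets the struct. */
void free_network(network_t *net)
{
    free(net->names);
    *net = (network_t){0};
}
```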
In the Files tab you can find networks for several species, including Escherichia coli, Mus musculus, Saccharomyces cerevisiae and Homo sapiens.
The output is a text file with the order of the network nodes. Example:
Protein dim1
Z5822 0
Z5823 1
Z2911 2
Z2910 3
Z2909 4
Z4123 5
Z4124 6
Z3105 7
Z3106 8
...
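Because the output lists each node followed by its position, it can be loaded back to look up where a given protein ended up. The C sketch below assumes exactly the layout in the example (a two-word header line such as "Protein dim1", then one "name position" pair per line); `order_entry_t`, `read_order` and `position_of` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 64 /* assumption: names have at most 63 characters */

typedef struct {
    char name[NAME_LEN];
    int  pos;
} order_entry_t;

/* Reads an order file; returns the number of entries, or -1 on error.
   On success *out holds a malloc'd array the caller must free. */
long read_order(FILE *fp, order_entry_t **out)
{
    char h1[NAME_LEN], h2[NAME_LEN];
    order_entry_t *v = NULL, e;
    long n = 0, cap = 0;
    assert(fp != NULL && out != NULL);
    *out = NULL;
    if (fscanf(fp, "%63s %63s", h1, h2) != 2) /* header, e.g. "Protein dim1" */
        return -1;
    while (fscanf(fp, "%63s %d", e.name, &e.pos) == 2) {
        if (n == cap) {
            cap = cap ? 2 * cap : 256;
            order_entry_t *grown = realloc(v, (size_t)cap * sizeof *grown);
            if (grown == NULL) {
                free(v);
                return -1;
            }
            v = grown;
        }
        v[n++] = e;
    }
    *out = v;
    return n;
}

/* Position of a node in the order, or -1 if the name is absent. */
int position_of(const order_entry_t *v, long n, const char *name)
{
    for (long k = 0; k < n; k++)
        if (strcmp(v[k].name, name) == 0)
            return v[k].pos;
    return -1;
}
```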
The following image represents the Homo sapiens network with a random ordering.
The next image represents the Homo sapiens network 'seriated'.
The Seriation Package is developed in C and tested on Ubuntu 14.04.
- Download the package (cfm-seriation_1.0-1_amd64.deb).
- It is recommended to update your package lists before the installation:
sudo apt-get update
- To install, you can double-click it or execute:
sudo dpkg -i cfm-seriation_1.0-1_amd64.deb
- In case of missing dependencies, try:
sudo apt-get install -f
- To uninstall:
sudo dpkg -r cfm-seriation
- This distribution installs the following directories:
/usr/share/cfm-seriation/bin/    Executable file
/usr/share/cfm-seriation/etc/    Auxiliary files used to plot charts with GNUPLOT
/usr/share/cfm-seriation/data/   Biological input networks
/usr/share/cfm-seriation/src/    Source code in C
Alternatively, to build from the source code:
sudo apt-get install git
git clone https://github.com/amamory/seriation.git
cd seriation
gcc cfm-seriation.c -lm -lpthread -lrt -o cfm-seriation
Type 'cfm-seriation' without arguments to show the options:
cfm-seriation
Usage: cfm-seriation [OPTION...]
Seriation Parameters:
  f=[NETWORK FILE].dat   Network file path name
  o=[ORDER FILE].dat     Apply initial order
  i=[INTERVAL]           Number of isothermal steps
  m=[STEPS]              Number of steps
  c=[FACTOR]             Cooling factor
  a=[ALPHA]              Alpha value
  p=[PERCENTUAL]         Percentual energy for initial temperature
  s=[SEDD]               Random seed
  P                      Plot graphs
  v                      Generate video
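The options use a `key=value` form rather than GNU-style `--flags`, plus two bare switches (`P` and `v`). The following C sketch shows one way such arguments can be parsed; it is not the package's own parser, and `seriation_opts_t` and `parse_opts` are hypothetical names. Unset options are left at zero/NULL because the package's defaults are not documented here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set mirroring the usage text; 0/NULL = "not given". */
typedef struct {
    const char *network_file; /* f= */
    const char *order_file;   /* o= */
    long   isothermal_steps;  /* i= */
    long   steps;             /* m= */
    double cooling;           /* c= */
    double alpha;             /* a= */
    double percent_energy;    /* p= */
    long   seed;              /* s= */
    int    plot;              /* P  */
    int    video;             /* v  */
} seriation_opts_t;

/* Returns 0 on success, or the argv index of the first unrecognised argument. */
int parse_opts(int argc, char **argv, seriation_opts_t *o)
{
    assert(o != NULL);
    *o = (seriation_opts_t){0};
    for (int k = 1; k < argc; k++) {          /* argv[0] is the program name */
        const char *arg = argv[k];
        if (strcmp(arg, "P") == 0) { o->plot = 1; continue; }
        if (strcmp(arg, "v") == 0) { o->video = 1; continue; }
        if (arg[0] == '\0' || arg[1] != '=')  /* expect a one-letter key then '=' */
            return k;
        const char *val = arg + 2;
        switch (arg[0]) {
        case 'f': o->network_file = val; break;
        case 'o': o->order_file = val; break;
        case 'i': o->isothermal_steps = strtol(val, NULL, 10); break;
        case 'm': o->steps = strtol(val, NULL, 10); break;
        case 'c': o->cooling = strtod(val, NULL); break;
        case 'a': o->alpha = strtod(val, NULL); break;
        case 'p': o->percent_energy = strtod(val, NULL); break;
        case 's': o->seed = strtol(val, NULL, 10); break;
        default: return k;
        }
    }
    return 0;
}
```

Note that the switches are case-sensitive in this sketch: `P` enables plotting, while a lowercase `p=` sets the initial-temperature percentage.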
Type the following command to execute the seriation of the Homo sapiens network. This process can take about 12 minutes, depending on the CPU.
./cfm-seriation f=data/Homo_sapiens.dat m=3000 P
If you also want a video of the process, add the v option:
./cfm-seriation f=/usr/share/cfm-seriation/data/Homo_sapiens.dat m=3000 P v
This will consume some extra time.
Usage: cfm-seriation [OPTION...]
Seriation Parameters:
  f=[NETWORK FILE].dat   Network file path name
  o=[ORDER FILE].dat     Apply initial order
  i=[INTERVAL]           Number of isothermal steps
  m=[STEPS]              Number of steps
  c=[FACTOR]             Cooling factor
  a=[ALPHA]              Alpha value
  p=[PERCENTUAL]         Percentual energy for initial temperature
  s=[SEDD]               Random seed
  P                      Plot graphs
  v                      Generate video
Reading file...
Proteins: 9684
Interactions: 163509
Applying random order...
Saving and plotting initial order...
INITIAL Energy: 4123514310
Ordering...
100% [====================================================================================================]
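The descriptions of the i=, c= and p= options (isothermal steps, cooling factor, percentual energy for initial temperature) suggest a simulated-annealing style temperature schedule. The C sketch below assumes a geometric schedule: the starting temperature is p percent of the INITIAL energy, and the temperature is held for i steps and then multiplied by c. This is our reading of the option names, not code from the package; `initial_temperature` and `temperature_at` are hypothetical names.

```c
#include <assert.h>

/* Assumed reading of p=: the starting temperature is a percentage of the
   energy of the initial (random) order. */
double initial_temperature(double initial_energy, double percent)
{
    assert(initial_energy > 0.0 && percent > 0.0);
    return initial_energy * percent / 100.0;
}

/* Assumed reading of i= and c=: the temperature is held constant for
   'interval' steps (an isothermal plateau), then multiplied by 'c'. */
double temperature_at(double t0, double c, long interval, long step)
{
    assert(c > 0.0 && c < 1.0 && interval > 0 && step >= 0);
    double t = t0;
    for (long k = 0; k < step / interval; k++) /* one cooling per finished plateau */
        t *= c;
    return t;
}
```

With an initial energy of 1000 and p=10, this sketch starts at temperature 100; with c=0.5 and i=10 it drops to 50 at step 10 and to 25 at step 20. Under this reading, m / i gives the number of plateaus in a run.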
For a quicker test, you can run a smaller dataset, such as the Escherichia coli network:
./cfm-seriation f=Escherichia_coli.dat
Reading file...
Proteins: 3598
Interactions: 13687
Applying random order...
Saving initial order...
INITIAL Energy: 129449102
Ordering...
100% [====================================================================================================]
FINAL Energy: 129025784
Saving final order...
Done!
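A convenient single figure for comparing runs is the relative drop from the INITIAL to the FINAL energy printed in the log. A small C helper for that arithmetic (`energy_reduction_percent` is a hypothetical name):

```c
#include <assert.h>

/* Percentage by which the FINAL energy is lower than the INITIAL energy. */
double energy_reduction_percent(long long initial, long long final_energy)
{
    assert(initial > 0);
    return 100.0 * (double)(initial - final_energy) / (double)initial;
}
```

For the Escherichia coli run above, energy_reduction_percent(129449102, 129025784) is about 0.327: the energy drops by 423318, roughly 0.33% of the initial value.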
The results are saved in a separate directory for each execution.
- Optimization and Analysis of Seriation Algorithm for Ordering Protein Networks. This paper describes the optimizations implemented in this package.
- Felipe's Master's thesis, Otimização e análise de algoritmos de ordenamento de redes proteicas (Optimization and analysis of algorithms for ordering protein networks). Full description of the optimizations implemented in this package (in Portuguese).
The source code is distributed under the terms of the GNU General Public License v3 (GPLv3).
If you use this package in your research, please cite our paper:
KUENTZER, Felipe A. et al. Optimization and analysis of seriation algorithm for ordering protein networks. In: IEEE International Conference on Bioinformatics and Bioengineering (BIBE), 2014. p. 231-237.
If you are using the Seriation Package, please send an email to alexandre.amory at pucrs.br so we can update this list of users: