Python package for working with the AAIndex database (https://www.genome.jp/aaindex/).
As stated on the AAIndex website, "The AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. AAindex consists of three sections now: AAindex1 for the amino acid index of 20 numerical values, AAindex2 for the amino acid mutation matrix and AAindex3 for the statistical protein contact potentials. All data are derived from published literature" [1].
The aaindex Python package is a lightweight way of accessing the data held in the various AAIndex databases. It has minimal requirements and external dependencies, and any record and its associated data can be accessed in a single line. The package currently supports the AAIndex1 database, with plans to add AAIndex2 and AAIndex3 in the future. The format of an AAIndex1 record can be seen below.
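In the AAindex1 flat file, a record's 20 values follow an I line whose header pairs residues as A/L, R/K, N/M, D/F, C/P, Q/S, E/T, G/W, H/Y, I/V: the first row of ten numbers belongs to the residues before each slash and the second row to those after it. The sketch below shows how such a block could be turned into a residue-to-value dict. parse_values_block is a hypothetical helper, not part of the aaindex API, and the two rows are simply the CHOP780206 values from the usage example laid out in that order.

```python
# Residue pairs from the AAindex1 "I" header line: the first value row holds
# the residue before each slash, the second row the residue after it.
HEADER = "A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V"


def parse_values_block(row1: str, row2: str) -> dict:
    """Hypothetical helper: map two AAindex1 value rows to {residue: value}."""
    pairs = [pair.split("/") for pair in HEADER.split()]
    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    values = {}
    for residues, row in ((first, row1), (second, row2)):
        tokens = row.split()
        if len(tokens) != len(residues):
            raise ValueError(f"expected {len(residues)} values, got {len(tokens)}")
        for aa, token in zip(residues, tokens):
            # AAindex marks missing values as "NA"; keep them as None
            values[aa] = None if token == "NA" else float(token)
    return values


row1 = "0.7 0.34 1.42 0.98 0.65 0.75 1.04 1.41 1.22 0.78"
row2 = "0.85 1.01 0.83 0.93 1.1 1.55 1.09 0.62 0.99 0.75"
values = parse_values_block(row1, row2)
print(values["A"], values["V"])  # 0.7 0.75
```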
Install the latest version of aaindex using pip:
pip3 install aaindex
Install by cloning the repository:
git clone https://github.com/amckenna41/aaindex.git
cd aaindex
python3 setup.py install
The aaindex module is built around the AAIndex class, which provides all of the package's functions and attributes. To use it, import aaindex from the aaindex.aaindex module:
from aaindex.aaindex import aaindex
The AAIndex class provides functions for retrieving any element of any record in the database. Each record is stored in JSON format in a class attribute called aaindex_json. You can search for a particular record by its index/record code, description or reference, and you can also get the record's category and, most importantly, its associated amino acid values.
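To illustrate what looking up a record by code, description or reference involves, here is a minimal sketch over a plain dict shaped like the records returned by aaindex['CHOP780206']. The records dict, get_record and search are hypothetical stand-ins for illustration only, not the package's own functions; DEMO000001 is an invented placeholder entry, not a real AAIndex record.

```python
# Hypothetical stand-in for aaindex_json: a dict keyed by record code.
# CHOP780206 copies fields from the record shown in the usage example;
# DEMO000001 is an invented placeholder, not a real AAIndex entry.
records = {
    "CHOP780206": {
        "description": "Normalized frequency of N-terminal non helical region (Chou-Fasman, 1978b)",
        "refs": "Chou, P.Y. and Fasman, G.D. 'Prediction of the secondary structure of proteins "
                "from their amino acid sequence' Adv. Enzymol. 47, 45-148 (1978)",
    },
    "DEMO000001": {
        "description": "Hypothetical placeholder index",
        "refs": "",
    },
}


def get_record(code: str) -> dict:
    """Exact lookup by record code; codes are upper case, so normalise the query."""
    return records[code.upper()]


def search(query: str, field: str) -> list:
    """Case-insensitive substring search over one text field; returns matching codes."""
    q = query.lower()
    return [code for code, rec in records.items() if q in rec.get(field, "").lower()]


print(search("non helical", "description"))  # ['CHOP780206']
print(search("fasman", "refs"))              # ['CHOP780206']
```

A linear scan is fine at the size of AAIndex1; an exact code lookup stays a single dict access.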
from aaindex.aaindex import aaindex
full_record = aaindex['CHOP780206'] #get full AAI record
''' Above statement will return ->
{'description': 'Normalized frequency of N-terminal non helical region (Chou-Fasman, 1978b)', 'notes': '', 'refs': "Chou, P.Y. and Fasman, G.D. 'Prediction of the secondary structure of proteins from their amino acid sequence' Adv. Enzymol. 47, 45-148 (1978); Kawashima, S. and Kanehisa, M. 'AAindex: amino acid index database.' Nucleic Acids Res. 28, 374 (2000).", 'values': {'-': 0, 'A': 0.7, 'C': 0.65, 'D': 0.98, 'E': 1.04, 'F': 0.93, 'G': 1.41, 'H': 1.22, 'I': 0.78, 'K': 1.01, 'L': 0.85, 'M': 0.83, 'N': 1.42, 'P': 1.1, 'Q': 0.75, 'R': 0.34, 'S': 1.55, 'T': 1.09, 'V': 0.75, 'W': 0.62, 'Y': 0.99}}
'''
#get individual elements of AAIndex record
record_values = aaindex['CHOP780206']['values']
record_description = aaindex['CHOP780206']['description']
record_references = aaindex['CHOP780206']['refs']
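A common use of a record's values is to score a protein sequence. The sketch below averages one index over a sequence; with the package installed, the values dict would come from aaindex['CHOP780206']['values'], but here the 20 standard-residue values are copied from the record shown above (the "-" gap entry is left out) so the example runs on its own. mean_property is a hypothetical helper, not part of the package.

```python
# CHOP780206 values copied from the record shown above ("-" gap entry omitted).
CHOP780206 = {
    'A': 0.7, 'C': 0.65, 'D': 0.98, 'E': 1.04, 'F': 0.93, 'G': 1.41, 'H': 1.22,
    'I': 0.78, 'K': 1.01, 'L': 0.85, 'M': 0.83, 'N': 1.42, 'P': 1.1, 'Q': 0.75,
    'R': 0.34, 'S': 1.55, 'T': 1.09, 'V': 0.75, 'W': 0.62, 'Y': 0.99,
}


def mean_property(sequence: str, values: dict) -> float:
    """Average an index over a sequence, skipping residues the index has no value for."""
    scores = [values[aa] for aa in sequence.upper() if aa in values]
    if not scores:
        raise ValueError("no residue in the sequence has a value in this index")
    return sum(scores) / len(scores)


print(round(mean_property("ACDE", CHOP780206), 4))  # 0.8425
```

Skipping unknown residues (e.g. X) rather than raising keeps the helper usable on sequences with ambiguous positions.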
"""
Categories:
Each AAIndex record is classified into one of nine categories: Charge, Composition, Flexibility, Geometry, Hydrophobic, Meta, Observable, Polar and Secondary Structure. The record categories are parsed from the aaindex_categories.txt file and can be accessed for each record via the get_category_from_record() function.
"""
category = aaindex.get_category_from_record('CHOP780206')
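The layout of aaindex_categories.txt is not described here, so the sketch below assumes a hypothetical format of one tab-separated "record code, category" pair per line, and shows how such a file could be loaded and used to group records by category. The REC codes are invented placeholders, and load_categories and group_by_category are hypothetical helpers, not package functions.

```python
import io
from collections import defaultdict

# Assumed (hypothetical) file layout: "<record code>\t<category>" per line.
SAMPLE = "REC0001\tHydrophobic\nREC0002\tCharge\nREC0003\tHydrophobic\n"


def load_categories(handle) -> dict:
    """Read code/category pairs, skipping blank lines and '#' comments."""
    mapping = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        code, category = line.split("\t", 1)
        mapping[code] = category
    return mapping


def group_by_category(mapping: dict) -> dict:
    """Invert code -> category into category -> [codes], keeping file order."""
    groups = defaultdict(list)
    for code, category in mapping.items():
        groups[category].append(code)
    return dict(groups)


cats = load_categories(io.StringIO(SAMPLE))
print(group_by_category(cats))
# {'Hydrophobic': ['REC0001', 'REC0003'], 'Charge': ['REC0002']}
```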
#get total number of records in AAI database
print(aaindex.get_num_records())
#get list of all AAIndex record names
print(aaindex.get_record_names())
/tests - unit and integration tests for the aaindex package.
/aaindex - source code and all required external data files for the package.
/images - images used throughout the README.
To run all tests, from the main aaindex folder run:
python3 -m unittest discover
To run the main test module, from the main aaindex folder run:
python3 -m unittest tests.test_aaindex -v
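python3 -m unittest discover collects files matching test*.py by default, so a new check can be added as a unittest.TestCase in such a file (for example a hypothetical tests/test_record_format.py). The minimal sketch below checks the shape of a sample record copied from the CHOP780206 output above; a real test in this repository would fetch the record from the package instead, and the test names are illustrative only.

```python
import unittest

# Sample record copied from the CHOP780206 output shown earlier ("-" entry omitted).
SAMPLE_RECORD = {
    "description": "Normalized frequency of N-terminal non helical region (Chou-Fasman, 1978b)",
    "values": {
        'A': 0.7, 'C': 0.65, 'D': 0.98, 'E': 1.04, 'F': 0.93, 'G': 1.41, 'H': 1.22,
        'I': 0.78, 'K': 1.01, 'L': 0.85, 'M': 0.83, 'N': 1.42, 'P': 1.1, 'Q': 0.75,
        'R': 0.34, 'S': 1.55, 'T': 1.09, 'V': 0.75, 'W': 0.62, 'Y': 0.99,
    },
}

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


class TestRecordFormat(unittest.TestCase):
    def test_has_expected_fields(self):
        self.assertIn("description", SAMPLE_RECORD)
        self.assertIn("values", SAMPLE_RECORD)

    def test_values_cover_standard_residues(self):
        self.assertEqual(set(SAMPLE_RECORD["values"]), STANDARD_RESIDUES)

    def test_values_are_numeric(self):
        for aa, value in SAMPLE_RECORD["values"].items():
            with self.subTest(residue=aa):
                self.assertIsInstance(value, float)
```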
If you have any questions or comments, please contact the author by email or raise an issue on the repository's Issues tab.