IBD Microbiome Knowledge Graph using Neo4j

Knowledge graphs are increasingly used to capture biological data because of their flexibility, their adaptability to inter-related datasets, and their ability to handle large amounts of data. Microbiome data analysis is a key area of application, since microbiome data is usually generated in massive amounts. To explore the construction of microbiome knowledge graphs, we use data from the Inflammatory Bowel Disease (IBD) study done as part of the Human Microbiome Project (Phase 2).

IBD, a chronic inflammatory disease of the human gut, is a growing cause for concern in Europe. It is characterized by symptoms such as ulcers, diarrhea and gut inflammation, and a key factor known to be strongly affected in IBD patients is the diversity of the gut microbiota. There is substantial evidence that a few bacterial families are present in markedly different proportions in healthy and IBD patient samples. This variation in microbial populations could be linked to differential interaction between the microbiota and the mucosal immune system, causing dysbiosis.

Here, we explore the relationships between bacterial families and their involvement in a few pathways and enzymes in healthy and IBD cohorts, based on differential abundances. We built a knowledge graph linking taxonomic and functional factors to visualize and analyze the diversity of bacterial families and the expression of pathways and enzymes across cohorts. Abundance matrix tables for pathways, enzymes and taxonomy were extracted, pre-processed and loaded into Neo4j. The knowledge graph was then explored along the following lines:

Microbial diversity among cohorts
Microbial diversity among cohorts stratified by metadata
Microbial expression profile for pathways
Enzyme expression in most abundant pathway
Enzyme expression in butyrate production pathway
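
As a rough illustration of what comparing abundances between cohorts involves, the sketch below averages relative abundance per bacterial family within each cohort and reports the difference. The record layout, the cohort labels, the family names (`Family_A`, `Family_B`) and all numbers are invented for this example; they are not results from the HMP2 data and this is not necessarily how the project computes its comparisons.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (sample_id, cohort, family, relative_abundance).
# All names and values are invented for illustration only.
RECORDS = [
    ("S1", "healthy", "Family_A", 0.30),
    ("S2", "healthy", "Family_A", 0.26),
    ("S3", "IBD", "Family_A", 0.12),
    ("S4", "IBD", "Family_A", 0.16),
    ("S1", "healthy", "Family_B", 0.02),
    ("S2", "healthy", "Family_B", 0.04),
    ("S3", "IBD", "Family_B", 0.10),
    ("S4", "IBD", "Family_B", 0.14),
]


def mean_abundance_by_cohort(records):
    """Return {(family, cohort): mean relative abundance}."""
    grouped = defaultdict(list)
    for _sample, cohort, family, abundance in records:
        grouped[(family, cohort)].append(abundance)
    return {key: mean(values) for key, values in grouped.items()}


def abundance_difference(records, cohort_a, cohort_b):
    """Return {family: mean(cohort_a) - mean(cohort_b)} for families seen in both cohorts."""
    means = mean_abundance_by_cohort(records)
    families = {family for family, _cohort in means}
    return {
        family: means[(family, cohort_a)] - means[(family, cohort_b)]
        for family in sorted(families)
        if (family, cohort_a) in means and (family, cohort_b) in means
    }


differences = abundance_difference(RECORDS, "IBD", "healthy")
for family, diff in differences.items():
    print(f"{family}: {diff:+.2f}")
```

A negative value means the family is, on average, less abundant in the first cohort than in the second. A real analysis would also need a statistical test and correction for multiple comparisons, which this sketch leaves out.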

Data Source:

Data from the Human Microbiome Project, specifically the IBD cohort, was selected. Abundance matrix files were downloaded from https://portal.hmpdacc.org/.

Data Model:

An Entity-Relationship model was constructed as below:

Data Extraction & Pre-processing:

The abundance matrix files were provided in .biom format, so they were first converted to .tsv format. Python scripts were then written to pre-process them into a form suitable for loading into Neo4j.
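
One common pre-processing step for graph loading is reshaping a wide feature-by-sample matrix into one row per (feature, sample) pair, so that each row can become a relationship. The sketch below shows that idea using only the standard library. It is an assumption about what such a step might look like, not a copy of the repository's scripts: the `#OTU ID` header, the sample and feature names, and the output column names are all hypothetical.

```python
import csv
import io

# Hypothetical wide-format table: one row per feature, one column per sample.
# The "#OTU ID" header is an assumption about the converted .tsv layout;
# the project's actual files may differ.
WIDE_TSV = (
    "#OTU ID\tSample_1\tSample_2\n"
    "Family_A\t0.25\t0.0\n"
    "Family_B\t0.10\t0.40\n"
)


def wide_to_long(wide_text, drop_zeros=True):
    """Turn a feature-by-sample matrix into (feature, sample, abundance) rows."""
    reader = csv.reader(io.StringIO(wide_text), delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]  # first column holds the feature name
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        feature, values = record[0], record[1:]
        for sample_id, value in zip(sample_ids, values):
            abundance = float(value)
            if drop_zeros and abundance == 0.0:
                continue  # a zero abundance would only add an empty edge
            rows.append({"feature": feature, "sample": sample_id, "abundance": abundance})
    return rows


def write_long_csv(rows):
    """Serialize the long-format rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["feature", "sample", "abundance"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


long_rows = wide_to_long(WIDE_TSV)
long_csv = write_long_csv(long_rows)
print(long_csv)
```

Dropping zero entries keeps the graph sparse, but it is a design choice: if the absence of a feature in a sample matters for a query, those rows should be kept.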

Knowledge Graph:

The final schema of the KG built on Neo4j looked like:
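
To give a feel for how long-format abundance rows could be turned into nodes and relationships, the sketch below prepares batched parameters for a Cypher `UNWIND ... MERGE` statement. The node labels (`Sample`, `Family`), the relationship type (`HAS_ABUNDANCE`) and the property names are placeholders chosen for this example; the schema actually used in this project is not reproduced here and may be quite different.

```python
# Hypothetical Cypher statement. Labels, relationship type and property
# names are assumptions for illustration, not the project's real schema.
LOAD_QUERY = """
UNWIND $rows AS row
MERGE (s:Sample {id: row.sample})
MERGE (f:Family {name: row.feature})
MERGE (s)-[r:HAS_ABUNDANCE]->(f)
SET r.value = row.abundance
"""

# Invented example rows in the long format: one row per (feature, sample) pair.
EXAMPLE_ROWS = [
    {"feature": "Family_A", "sample": "Sample_1", "abundance": 0.25},
    {"feature": "Family_B", "sample": "Sample_1", "abundance": 0.10},
    {"feature": "Family_B", "sample": "Sample_2", "abundance": 0.40},
]


def batches(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_load_calls(rows, batch_size=2):
    """Pair the query with one parameter dict per batch of rows."""
    return [(LOAD_QUERY, {"rows": batch}) for batch in batches(rows, batch_size)]


calls = build_load_calls(EXAMPLE_ROWS)
for query, parameters in calls:
    print(len(parameters["rows"]), "rows in this batch")
```

With the official Neo4j Python driver, each pair could then be passed to `session.run(query, parameters)` inside an open session. That step needs a running database and is left out of the sketch. Using `MERGE` rather than `CREATE` means re-running the load does not duplicate nodes.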

For more information on any of the above steps, please contact me ([email protected]) or Natalja Kurbatova ([email protected]) from Zifo RnD Solutions.

Folders and files:

data
import_data
Cypher queries
Enzyme_preprocessing.py
Master_List.xlsx
Pathway_preprocessing.py
README.md
Steps-to-recreate-MicrobiomeKG.docx
Taxonomy_preprocessing.py
microbiome_env.yaml
prepare_biom_file_from_url.py
