adhal007/OmixHub

Overview

OmixHub is a Python platform that interfaces with the Genomic Data Commons (GDC) to help users apply ML-based analysis to sequencing data. Currently, only RNA-Seq datasets from GDC are supported. The main features are:

  1. Cohort creation of bulk RNA-Seq tumor and normal samples from GDC.

  2. Bioinformatics analysis:

    1. Application of PyDESeq2 and GSEA in a single pipeline.
  3. Classical ML analysis:

    1. Applying clustering, supervised ML and outlier sum statistics.
  4. Custom API Connections:

    1. Search and retrieval of cancer data cohorts from GDC using complex JSON filters (see the methods in src.Connectors for GDC API search and retrieval with custom queries).
    2. Interacting with a MongoDB database in a Pythonic manner (docs coming soon).
    3. Interacting with Google Cloud BigQuery in a Pythonic manner (docs coming soon).
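
To illustrate what a "complex JSON filter" for the GDC API looks like, here is a minimal sketch that builds query parameters for the public GDC files endpoint. It does not use the methods in src.Connectors; the helper names (`in_filter`, `build_rna_seq_query`) and the chosen fields are assumptions made for illustration, and the exact values GDC expects (for example the spelling of a primary site) should be checked against the GDC data dictionary.

```python
import json

# Public GDC REST endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def in_filter(field, values):
    """Hypothetical helper: one GDC 'in' clause for a field and its allowed values."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def build_rna_seq_query(primary_site, size=100):
    """Hypothetical helper: query parameters for RNA-Seq expression files of one primary site."""
    # Nested filter: all three clauses must hold ("and").
    filters = {
        "op": "and",
        "content": [
            in_filter("cases.primary_site", [primary_site]),
            in_filter("files.experimental_strategy", ["RNA-Seq"]),
            in_filter("files.data_type", ["Gene Expression Quantification"]),
        ],
    }
    return {
        # For GET requests the filter is sent as a JSON-encoded string.
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,cases.samples.sample_type",
        "format": "JSON",
        "size": str(size),
    }


params = build_rna_seq_query("Kidney")
print(json.dumps(json.loads(params["filters"]), indent=2))
```

The resulting `params` dictionary could then be passed to an HTTP client, for example `requests.get(GDC_FILES_ENDPOINT, params=params)`, to retrieve matching file records.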

GETTING STARTED:

  1. Clone the repository: git clone https://github.com/adhal007/OmixHub.git
  2. Create the conda environment for OmixHub: conda env create -f environment.yaml

Applications

  1. RNA Seq Cohort Creation of tumor and normal samples by primary site
    1. Example Jupyter Notebook
    2. Code:
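import grequests
import src.Engines.gdc_engine as gdc_engine
from importlib import reload
reload(gdc_engine)  # pick up local edits to the module without restarting the kernel

# NOTE: gdc_eng_inst must be an engine instance created from gdc_engine beforehand;
# its construction is not shown in this snippet (see the example notebook).

## Create dataset for differential gene expression
rna_seq_DGE_data = gdc_eng_inst.run_rna_seq_data_matrix_creation(primary_site='Kidney', downstream_analysis='DE')

## Create dataset for machine learning analysis
rna_seq_ML_data = gdc_eng_inst.run_rna_seq_data_matrix_creation(primary_site='Kidney', downstream_analysis='ML')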
  2. Differential gene expression (DGE) and gene set enrichment analysis (GSEA) for tumor vs. normal samples
    1. Example Jupyter Notebook
  3. Using the Gradio app for DGE and GSEA:
    1. Currently this is restricted to Users. If you want to try this out or contribute, reach out to me via [email protected] with your interest.
    2. Running the app:
      1. Complete the steps in Getting Started.
      2. Run the Gradio app: python3 app_gradio.py
      3. Check out the app navigation documentation.

ADDITIONAL CODE DOCS:

References:

  1. Characterizing tumor toxicity in Gene therapy targets from Bulk RNA-Sequencing
  2. Bayesian Framework for identifying gene expression outliers in individual sample of RNA-Seq data
