Source code and data presented in "Causal reasoning over knowledge graphs leveraging drug-perturbed and disease-specific transcriptomic signatures for drug discovery".
data
: contains the four transcriptomic datasets used in the paper as well as the two KGs. Furthermore, it contains the drug-disease pairs studied in clinical trials, which are used as positive pairs for the validation of the RPath algorithm.
notebooks
: contains the code to run the algorithm, its validation, and its analysis.
Datasets are publicly available and can be directly downloaded from
Data requirements:
- Knowledge graphs - You need a TSV file listing the triples, with the columns source, target, and polarity, as shown here
- Transcriptomic data - You need the data in the form of a Python dictionary in which the key is the gene name or identifier and the value is 1, -1, or 0, depending on whether the gene is over-expressed, under-expressed, or unchanged. Here, we use a dictionary of dictionaries: the key of the outer dictionary is the chemical or disease causing the effect, and the inner dictionary maps each gene identifier to its expression value, as shown here.
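The two input formats described above can be sketched as follows. This is only an illustration under stated assumptions: the node names, gene names, the inline TSV content, and the load_triples helper are hypothetical and are not taken from the repository's files. The polarity value is kept as the raw string read from the file, since its exact encoding is not specified here.

```python
import csv
import io

# Hypothetical TSV content; a real file would be opened from disk instead.
KG_TSV = (
    "source\ttarget\tpolarity\n"
    "drugA\tgene1\t-1\n"
    "gene1\tgene2\t1\n"
    "gene2\tdiseaseX\t1\n"
)


def load_triples(handle):
    """Read (source, target, polarity) triples from a TSV with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["source"], row["target"], row["polarity"]) for row in reader]


triples = load_triples(io.StringIO(KG_TSV))

# Outer key: chemical or disease causing the effect.
# Inner key: gene identifier; value: 1, -1, or 0.
transcriptomic_data = {
    "drugA": {"gene1": -1, "gene2": 1},
    "diseaseX": {"gene1": 1, "gene2": -1, "gene3": 0},
}
```

With a file on disk, the same helper could be called inside a with open(path, newline="") block in place of the io.StringIO wrapper.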
Model usage:
- First, load the datasets mentioned above, i.e., the knowledge graph and the transcriptomic data (Cell #3 and #4 from notebook)
- Ensure that the transcriptomic datasets contain entries that are in your knowledge graph (Cell #5 from notebook)
- Call the get_validated_paths() method and pass the following arguments:
  - directed graph = knowledge graph.
  - source = source node (here, the chemical node).
  - target = target node (here, the disease node).
  - all paths = list of all paths between the source and target nodes. Ensure that the path list contains only the nodes and no relation information.
  - drug_dict = dictionary of the transcriptomic data for the drug/source of interest.
  - disease_dict = dictionary of the transcriptomic data for the disease/target of interest.
(Cell #8 from notebook)
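The preparation steps above can be sketched as follows, up to the point where get_validated_paths() would be called. This is a sketch under assumptions, not the notebook's code: the toy triples and expression values are invented, the build_adjacency, all_simple_paths, and filter_to_graph helpers and the max_length cutoff are hypothetical, and a plain dictionary stands in for the directed graph object. The call to get_validated_paths() itself is left out because its exact signature is not given here.

```python
def build_adjacency(triples):
    """Map each node to the list of its direct successors (directed edges)."""
    adjacency = {}
    for source, target, _polarity in triples:
        adjacency.setdefault(source, []).append(target)
        adjacency.setdefault(target, [])
    return adjacency


def all_simple_paths(adjacency, source, target, max_length):
    """Enumerate node-only simple paths with at most max_length edges."""
    paths = []
    stack = [[source]]
    while stack:
        path = stack.pop()
        last = path[-1]
        if last == target:
            paths.append(path)
            continue
        if len(path) - 1 >= max_length:
            continue  # path is already at the edge limit
        for neighbor in adjacency.get(last, []):
            if neighbor not in path:  # no repeated nodes
                stack.append(path + [neighbor])
    return paths


def filter_to_graph(signature, nodes):
    """Keep only the genes that are present as nodes in the graph."""
    return {gene: value for gene, value in signature.items() if gene in nodes}


# Toy data, invented for illustration.
triples = [
    ("drugA", "gene1", "-1"),
    ("gene1", "gene2", "1"),
    ("gene2", "diseaseX", "1"),
    ("drugA", "gene2", "1"),
]
adjacency = build_adjacency(triples)

# gene9 is not in the graph, so it is dropped from the drug signature.
drug_dict = filter_to_graph({"gene1": -1, "gene2": 1, "gene9": 1}, adjacency)
disease_dict = filter_to_graph({"gene1": 1, "gene2": -1}, adjacency)

# Each path is a list of node names only, with no relation information.
paths = all_simple_paths(adjacency, "drugA", "diseaseX", max_length=3)
```

The resulting paths, drug_dict, and disease_dict correspond to the "all paths", drug_dict, and disease_dict arguments listed above. On large graphs the number of simple paths grows quickly, which is why the sketch bounds the path length.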
A complete model usage example is shown in notebook 4.
If you have found RPath useful in your work, please consider citing:
Domingo-Fernandez, D., Gadiya, Y., Patel, A., Mubeen, S., Rivas-Barragan, D., Diana, C., Misra, B. B., Healey, D., Rokicki, J., Colluru, V. (2022). Causal reasoning over knowledge graphs leveraging drug-perturbed and disease-specific transcriptomic signatures for drug discovery. PLoS Computational Biology, 18(2): e1009909.