Abraxos/fim_cancer

fim_cancer

A data mining project focused on discovering complex protein expression correlations using Frequent Itemset Mining on data from cancer patients.

Controller

The overarching controller is responsible for reading the data, preprocessing it, and feeding our subroutines. This means we need to take the raw data files as presented by TCPA or cBioPortal and scrub them so that they can be fed into our modules. The pipeline is as follows. We parse the proteins, patients, and transactions out of the reduced expression data (we remove irrelevant columns). From here we run fim on the transactions and map the result from indices to names. Fim gives us our discretized transactions, which then need to be formatted into a table structure. We also need to produce a table from our clinical data. Once we have our in-memory tables, we simply run the confirmer of our finder in a loop over all the solutions (frequent item sets). As we discover z-scores, we can rank the sets in place or after we are done looping. We can either write our results to permanent storage or keep them in memory.

c = controller('examples/data/TCGA-THCA-L3-S54_reduced.csv', 'examples/data/thca_tcga_clinical_data.tsv', [4, 5, 6])
print(c.ranks)
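
To make the flow described above more concrete, here is a minimal, self-contained sketch of the first half of the pipeline: parse, discretize, mine, and map indices back to names. It is not the project's code. Every name in it (`parse_expressions`, `discretize`, `frequent_itemsets`, `run_pipeline`, `SAMPLE`) is hypothetical, the input layout (patient ID in the first column, one column per protein) is an assumption, a brute-force counter stands in for the fim call, thresholding at zero stands in for the real discretization, and support stands in for the z-score when ranking.

```python
import csv
import io
from itertools import combinations

# Hypothetical toy input: one row per patient, one column per protein.
SAMPLE = "\n".join([
    "patient,PROT_A,PROT_B,PROT_C",
    "p1,1.2,0.8,-0.5",
    "p2,0.9,1.1,0.3",
    "p3,-0.4,0.7,0.6",
    "p4,1.5,0.2,-1.0",
])


def parse_expressions(text):
    """Split CSV text into protein names, patient IDs, and numeric rows."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header, body = rows[0], rows[1:]
    proteins = header[1:]
    patients = [row[0] for row in body]
    values = [[float(v) for v in row[1:]] for row in body]
    return proteins, patients, values


def discretize(values, threshold=0.0):
    """One transaction per patient: indices of proteins above the threshold.

    The threshold rule is an assumption made for this sketch.
    """
    return [frozenset(i for i, v in enumerate(row) if v > threshold)
            for row in values]


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force stand-in for a real frequent itemset miner."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if t.issuperset(candidate))
            if support >= min_support:
                found[candidate] = support
    return found


def run_pipeline(text, min_support):
    """Parse, discretize, mine, map indices to names, then rank.

    Ranking uses support as a placeholder; the real pipeline ranks by z-score.
    """
    proteins, _patients, values = parse_expressions(text)
    transactions = discretize(values)
    itemsets = frequent_itemsets(transactions, min_support)
    named = [(tuple(proteins[i] for i in itemset), support)
             for itemset, support in itemsets.items()]
    return sorted(named, key=lambda pair: (-pair[1], pair[0]))


for names, support in run_pipeline(SAMPLE, min_support=3):
    print(names, support)
```

Keeping the itemsets as index tuples until the very end mirrors the "map the result from indices to names" step in the description. A real run would replace `frequent_itemsets` with the miner the project uses and the support-based sort with the z-scores produced in the loop over solutions.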
