fim_cancer

A data mining project focused on discovering complex protein expression correlations using Frequent Itemset Mining on data from cancer patients.

Controller

The overarching controller is responsible for reading the data, preprocessing it, and feeding our subroutines. This means we take the raw data files as presented by TCPA or cBioPortal and scrub them so that they can be fed into our modules. The pipeline is as follows. We parse the proteins, patients, and transactions out of the reduced expression data (the raw data with irrelevant columns removed). We then run FIM on the transactions and map the result from indices to names. FIM gives us our discretized transactions, which then need to be formatted into a table structure. We also need to produce a table from our clinical data. Once we have our in-memory tables, we simply run the confirmer of our finder in a loop over all the solutions (frequent itemsets). As we discover z-scores, we can rank the sets in place or after the loop finishes. We can write our results to permanent storage or keep them in memory.
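The first half of that pipeline (parse, mine, map indices back to names) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: the CSV layout (a patient ID column followed by one column per protein), the `threshold` used to discretize, the helper names `parse_expressions`, `frequent_itemsets`, and `to_names`, and the sample values are all made up, and a brute-force miner stands in for the real FIM step.

```python
import csv
import io
from itertools import combinations


def parse_expressions(text, threshold=0.0):
    """Split expression CSV text into proteins, patients, and transactions.

    Assumed layout: the first column is a patient ID, every other column is
    a protein. A protein index joins a patient's transaction when its value
    exceeds `threshold` (a hypothetical discretization rule).
    """
    rows = list(csv.reader(io.StringIO(text)))
    proteins = rows[0][1:]
    patients, transactions = [], []
    for row in rows[1:]:
        patients.append(row[0])
        transactions.append(
            {i for i, value in enumerate(row[1:]) if float(value) > threshold}
        )
    return proteins, patients, transactions


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force stand-in for the FIM step: count every small candidate set."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                found[candidate] = support
    return found


def to_names(itemsets, proteins):
    """Map itemsets of column indices back to protein names."""
    return {tuple(proteins[i] for i in idx): s for idx, s in itemsets.items()}


# Made-up sample data, for illustration only.
sample = """patient,AKT,EGFR,TP53
P1,1.2,0.8,-0.5
P2,0.9,1.1,0.3
P3,-0.4,0.7,0.6
"""

proteins, patients, transactions = parse_expressions(sample)
named = to_names(frequent_itemsets(transactions, min_support=2), proteins)
print(named)
```

Working with integer indices during mining and translating to names only at the end keeps the transactions compact, which is the reason for the separate mapping step.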

c = controller('examples/data/TCGA-THCA-L3-S54_reduced.csv', 'examples/data/thca_tcga_clinical_data.tsv', [4, 5, 6])
print(c.ranks)
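The two ranking strategies mentioned above (rank in place while looping, or rank once after the loop) might look like the sketch below. It is a hypothetical illustration: `confirm` stands in for whatever the confirmer returns, the z-scores are invented, and sorting by descending z-score is an assumption about how `ranks` is ordered.

```python
import bisect


def rank_after(itemsets, confirm):
    """Score every itemset first, then sort once at the end."""
    scored = [(confirm(s), s) for s in itemsets]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def rank_in_place(itemsets, confirm):
    """Keep the list ordered as each z-score is discovered."""
    ranked = []
    for s in itemsets:
        z = confirm(s)
        # Store the negated score so ascending insertion yields descending z.
        bisect.insort(ranked, (-z, s))
    return [(-neg_z, s) for neg_z, s in ranked]


# Invented z-scores standing in for the confirmer's output.
toy_scores = {
    ("AKT", "EGFR"): 2.5,
    ("EGFR", "TP53"): 0.7,
    ("AKT",): 1.3,
}

after = rank_after(toy_scores, toy_scores.get)
in_place = rank_in_place(toy_scores, toy_scores.get)
print(after)
```

Both approaches give the same ordering here. Ranking in place is useful when partial results should be readable before the loop finishes, while sorting once afterwards is simpler.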

