Data science and machine learning tutorial using heart disease as an application. Work in progress.
Author: Chris Kennedy (ck37.com)
Run interactively in RStudio Cloud
- Analysis
  - 1-clean-merge.Rmd - import and merge raw data, handle extreme values, remove constant predictors (output)
  - 2-clean-impute.Rmd - analyze missingness, impute missing values, histogram condense (output)
  - 3-clean-finalize.Rmd - convert factors to indicators, address collinearity, check invertibility (output)
  - 4-explore.Rmd - exploratory data analysis
  - 5-model.Rmd - random forest convergence, SuperLearner ensembling, and nested SuperLearner evaluation (output)
  - 6-calibration.Rmd - calibration of predictions
  - 7-interpret.Rmd - variable importance and accumulated local effects (output)
- Directories
To do: ideally integrate renv, use Slido, and add drake.
Kennedy, C. J., Mark, D., Huang, J., & Reed, M. (2020). Development of a nested ensemble machine learning prognostic model for predicting 60-day risk of major adverse cardiac events in adults with chest pain. Google Slides
Janosi, A., Steinbrunn, W., Pfisterer, M., & Detrano, R. (1988). Heart disease data set. The UCI KDD Archive.
Boehmke, B., & Greenwell, B. M. (2019). Hands-On Machine Learning with R. CRC Press. (Free online)
Molnar, C. (2020). Interpretable machine learning. Lulu.com. (Free online)
Riley, R. D., van der Windt, D., Croft, P., & Moons, K. G. (Eds.). (2019). Prognosis Research in Healthcare: concepts, methods, and impact. Oxford University Press. (Amazon)
Steyerberg, E. W. (2019). Clinical prediction models. Springer International Publishing. (Amazon)
If you find this tutorial useful, please cite it as follows:
Kennedy, Chris J. (2020). "Tutorial on predictive modeling in R." GitHub repository. https://github.com/ck37/Predictive-Modeling-in-R