Dirty Data Project

Introduction

This project consists of tasks that aim to test approaches to data wrangling and cleaning. Each task has its own analysis document which answers various questions using cleaned data obtained by programmatically processing the raw data.

Two tasks were chosen:

Task 1 - Decathlon Data
Task 4 - Halloween Candy Data

Languages

The code is written in R and both tasks contain RStudio .Rproj files.

How to run

Both tasks require the cleaning script to be run before the analysis.

The cleaning script for each task is located at data_cleaning_scripts/cleaning.R

Open the task's RStudio .Rproj file and run cleaning.R. This script generates new clean CSV data files in the clean_data folder. Once this step has completed successfully, open the relevant task analysis .Rmd file in the documentation_and_analysis folder.
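
For anyone who prefers the R console to the RStudio interface, the same step could be sketched roughly as below. The helper name `run_cleaning` is hypothetical and not part of this repository. The sketch assumes the cleaning script resolves its paths relative to the task folder, which may not hold if the script relies on the project root detected by `here`.

```r
# Hypothetical helper (not in the repository): run a task's cleaning script
# and list the CSV files it leaves in the clean_data folder.
run_cleaning <- function(task_dir) {
  script <- file.path(task_dir, "data_cleaning_scripts", "cleaning.R")
  if (!file.exists(script)) {
    stop("Cleaning script not found: ", script)
  }

  # Assumption: the script expects the task folder as the working directory.
  old_wd <- setwd(task_dir)
  on.exit(setwd(old_wd), add = TRUE)

  # Run the script in its own environment so it does not clutter the workspace.
  source(file.path("data_cleaning_scripts", "cleaning.R"), local = new.env())

  # Report the clean CSV files that now exist.
  list.files("clean_data", pattern = "\\.csv$", full.names = TRUE)
}

# Example call, run from the repository root:
# run_cleaning("task1")
```

If the returned vector is empty, the cleaning step has probably not completed and the analysis .Rmd is unlikely to knit.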

Data used

CodeClan provided test data to students, but due to file size concerns the original source data for task 4 is not included in this repository. Similarly, the clean data generated by the cleaning script is not uploaded but can be regenerated by running the code.

CodeClan staff can find the source data files in the CodeClan repository dr22_classnotes/week_03/day_5/dirty_data_project_raw_data/candy_ranking_data

For those outside CodeClan, the data can be obtained from the following sources:

Task 1 - Decathlon

decathlon: Performance in decathlon (data).
Department of statistics and computer science, Agrocampus Rennes

N.B. A copy of this file is in task1/raw_data/decathlon.rds
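
As a quick way to inspect the bundled file before running the cleaning script, something like the following could be used. The function name `load_decathlon` is hypothetical, and the step that moves row names into an `athlete` column is an assumption about how the data is laid out, not a description of what cleaning.R does.

```r
# Hypothetical helper (not in the repository): read the raw decathlon data
# and keep any row names as an ordinary column.
load_decathlon <- function(path = "task1/raw_data/decathlon.rds") {
  raw <- readRDS(path)

  # Assumption: athlete names are stored as row names in the raw data frame.
  data.frame(
    athlete = rownames(raw),
    raw,
    row.names = NULL,
    check.names = FALSE,      # keep original column names unchanged
    stringsAsFactors = FALSE
  )
}

# Example call, run from the repository root:
# str(load_decathlon())
```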

Task 4 - Halloween Candy

So Much Candy Data, Seriously. University of British Columbia

N.B. Before running any cleaning or analysis scripts, copy the three required source data files to the folder task4/raw_data. Full details are in the analysis document.
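
A small pre-flight check along these lines may help confirm the files are in place before running the cleaning script. The function `check_raw_data` is hypothetical. It assumes the source files are Excel workbooks (inferred only from `readxl` being a required package) and counts files rather than checking names, since the exact file names are given in the analysis document rather than here.

```r
# Hypothetical helper (not in the repository): confirm the raw data folder
# holds at least the expected number of Excel files.
check_raw_data <- function(raw_dir = "task4/raw_data", expected = 3) {
  # Assumption: the source files have an .xls or .xlsx extension.
  files <- list.files(raw_dir, pattern = "\\.xlsx?$", ignore.case = TRUE)

  if (length(files) < expected) {
    stop(
      "Expected at least ", expected, " Excel files in ", raw_dir,
      " but found ", length(files), "."
    )
  }
  invisible(files)
}

# Example call, run from the repository root:
# check_raw_data()
```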

Packages

The following R packages are required to run the code. The version numbers used at the time of the original project are shown.

Task 1

Package	Version used for analysis
`janitor`	"2.2.0"
`tidyverse`	"2.0.0"

Task 4

Package	Version used for analysis
`assertr`	"3.0.0"
`here`	"1.0.1"
`readxl`	"1.4.3"
`tidyverse`	"2.0.0"
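
To compare a local installation against the versions recorded above, a check like the one below could be run. The function `check_packages` is hypothetical and not part of this repository. It only reports versions and does not install or update anything.

```r
# Versions recorded in this README at the time of the original project.
recorded <- c(
  janitor   = "2.2.0",
  tidyverse = "2.0.0",
  assertr   = "3.0.0",
  here      = "1.0.1",
  readxl    = "1.4.3"
)

# Hypothetical helper (not in the repository): report the installed version
# of each package next to the recorded one (NA if the package is missing).
check_packages <- function(required) {
  installed <- vapply(
    names(required),
    function(pkg) {
      # system.file() returns "" when the package is not installed.
      if (nzchar(system.file(package = pkg))) {
        as.character(utils::packageVersion(pkg))
      } else {
        NA_character_
      }
    },
    character(1)
  )

  data.frame(
    package   = names(required),
    recorded  = unname(required),
    installed = unname(installed),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

print(check_packages(recorded))
```

A newer installed version will not necessarily break the code. The table is intended as a starting point if results differ from the original analysis.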

