
smagellan/getdata-08-project

Notes

  1. The sanitizer script requires the dplyr R package.
  2. The script expects to find the UCI HAR Dataset directory in the current working directory.
  3. The script writes the tidied data to UCI_HAR_tidied.txt.
  4. Reading the datasets is slow because read.table is used as the file reader (fread from data.table causes a SIGSEGV under Linux).

How it works

  1. Read and enrich the dataSet (readEnrichedDataSet function):
  • Read the dataSet (X_test/X_train) with the correct column (variable) names, taken from features.txt.
  • Attach a SUBJECT_ID column (subject_test.txt/subject_train.txt).
  • Attach an ACTIVITY_ID column (y_test.txt/y_train.txt).
  • Attach an ACTIVITY_NAME column with the corresponding activity names (activity_labels.txt).
  2. Filter out unneeded columns and transform the column names (extractMeasuresOfInterest function):
  • Retain only columns whose names contain 'std()' or 'mean()' (meanFreq() columns are dropped).
  • std() becomes standard_deviation, and mean() becomes mean_value.
  3. Group the dataSet by SUBJECT_ID and ACTIVITY_NAME, and calculate the mean of each remaining column within each group.
  4. Write the processed dataSet to UCI_HAR_tidied.txt (180 observations of 68 variables: SUBJECT_ID, ACTIVITY_NAME, and 66 sensor measurements).
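Step 1 (reading and enriching a split) can be sketched as follows. This is a minimal Python illustration of the idea, not the repository's R code: the function names parse_activity_labels and read_enriched_rows are hypothetical, the inputs are lists of text lines rather than file paths, and each row becomes a dict keyed by feature name, which assumes the feature names are unique (duplicated names would overwrite each other).

```python
# Sketch of step 1. Hypothetical names; the real script does this in R
# (readEnrichedDataSet). Inputs are lists of lines from the dataset files.
def parse_activity_labels(lines):
    """Map activity id -> activity name, from lines such as '1 WALKING'."""
    labels = {}
    for line in lines:
        if line.strip():
            activity_id, name = line.split()
            labels[int(activity_id)] = name
    return labels


def read_enriched_rows(x_lines, feature_names, subject_lines, y_lines, labels):
    """Combine measurement rows with SUBJECT_ID, ACTIVITY_ID and ACTIVITY_NAME."""
    if not len(x_lines) == len(subject_lines) == len(y_lines):
        raise ValueError("X, subject and y files must have the same number of rows")
    rows = []
    for x_line, subject, activity in zip(x_lines, subject_lines, y_lines):
        values = [float(v) for v in x_line.split()]
        # Name each value after its feature; assumes unique feature names.
        row = dict(zip(feature_names, values))
        row["SUBJECT_ID"] = int(subject)
        row["ACTIVITY_ID"] = int(activity)
        # Look up the human-readable name for the numeric activity id.
        row["ACTIVITY_NAME"] = labels[int(activity)]
        rows.append(row)
    return rows


labels = parse_activity_labels(["1 WALKING", "2 WALKING_UPSTAIRS"])
rows = read_enriched_rows(["0.5 -0.25"], ["tBodyAcc-mean()-X", "tBodyAcc-std()-X"],
                          ["3"], ["2"], labels)
print(rows[0]["ACTIVITY_NAME"])  # WALKING_UPSTAIRS
```

The length check guards against silently misaligned rows, since zip would otherwise stop at the shortest input.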
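The column filter and rename in step 2 reduce to a literal substring test. Because "meanFreq()" does not contain the substring "mean()", a literal match drops the meanFreq columns without any extra rule. The Python sketch below uses a hypothetical helper name and applies only the two substitutions listed in step 2; any further name cleanup the R script may perform is not documented here and is not reproduced.

```python
# Sketch of step 2 (hypothetical name; the R script uses extractMeasuresOfInterest).
def extract_measures_of_interest(feature_names):
    """Return (column index, new name) pairs for the mean()/std() features."""
    selected = []
    for index, name in enumerate(feature_names):
        # Literal substring match: "meanFreq()" does not contain "mean()",
        # so meanFreq columns are excluded automatically.
        if "mean()" in name or "std()" in name:
            new_name = (name.replace("mean()", "mean_value")
                            .replace("std()", "standard_deviation"))
            selected.append((index, new_name))
    return selected


features = ["tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-meanFreq()-X",
            "angle(tBodyAccMean,gravity)", "tBodyAcc-mad()-X"]
print(extract_measures_of_interest(features))
# [(0, 'tBodyAcc-mean_value-X'), (1, 'tBodyAcc-standard_deviation-X')]
```

Keeping the original column index alongside the new name makes it easy to pick the matching values out of each measurement row.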
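Step 3 is a group-and-average operation, which dplyr expresses with group_by() followed by summarise(). The plain-Python sketch below does the same over rows stored as dicts. The helper name average_by_group is hypothetical, and sorting the output by (SUBJECT_ID, ACTIVITY_NAME) is an assumption made for a deterministic result; the row order of the R script's output is not documented.

```python
from collections import defaultdict


# Sketch of step 3 (hypothetical name; the R script uses dplyr for this).
def average_by_group(rows, measure_names):
    """Mean of each measure within each (SUBJECT_ID, ACTIVITY_NAME) group."""
    sums = defaultdict(lambda: [0.0] * len(measure_names))
    counts = defaultdict(int)
    for row in rows:
        key = (row["SUBJECT_ID"], row["ACTIVITY_NAME"])
        for i, name in enumerate(measure_names):
            sums[key][i] += row[name]
        counts[key] += 1
    tidy = []
    # Sorted output is an assumption made for reproducibility.
    for key in sorted(sums):
        subject_id, activity_name = key
        tidy_row = {"SUBJECT_ID": subject_id, "ACTIVITY_NAME": activity_name}
        for i, name in enumerate(measure_names):
            tidy_row[name] = sums[key][i] / counts[key]
        tidy.append(tidy_row)
    return tidy


rows = [
    {"SUBJECT_ID": 1, "ACTIVITY_NAME": "WALKING", "tBodyAcc-mean_value-X": 1.0},
    {"SUBJECT_ID": 1, "ACTIVITY_NAME": "WALKING", "tBodyAcc-mean_value-X": 3.0},
    {"SUBJECT_ID": 2, "ACTIVITY_NAME": "SITTING", "tBodyAcc-mean_value-X": 0.5},
]
tidy = average_by_group(rows, ["tBodyAcc-mean_value-X"])
print(tidy[0])
# {'SUBJECT_ID': 1, 'ACTIVITY_NAME': 'WALKING', 'tBodyAcc-mean_value-X': 2.0}
```

With every subject performing every activity, the number of output rows is the number of distinct (subject, activity) pairs, which is how 30 subjects and 6 activities yield the 180 observations stated in step 4.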
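For step 4, the exact layout of UCI_HAR_tidied.txt (delimiter, quoting, header) is not described here. The sketch below assumes a space-separated table with a header line and uses a hypothetical helper, write_tidy, that writes to any text file object, so it can be exercised with an in-memory buffer.

```python
import csv
import io


# Sketch of step 4 (hypothetical name; assumed space-separated layout).
def write_tidy(rows, columns, out):
    """Write a header line, then one space-separated line per row."""
    writer = csv.writer(out, delimiter=" ", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[c] for c in columns])


buffer = io.StringIO()
write_tidy([{"SUBJECT_ID": 1, "ACTIVITY_NAME": "WALKING", "tBodyAcc-mean_value-X": 2.0}],
           ["SUBJECT_ID", "ACTIVITY_NAME", "tBodyAcc-mean_value-X"], buffer)
print(buffer.getvalue(), end="")
# SUBJECT_ID ACTIVITY_NAME tBodyAcc-mean_value-X
# 1 WALKING 2.0
```

To write the real file, pass a handle opened with `open("UCI_HAR_tidied.txt", "w", newline="")` instead of the buffer.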
