Overview

The focus of this project is the analysis of birth data in the United States, although this could be described in a variety of different ways depending upon the intent of the user. This is an R-Package complete with data sets, documentation, and custom functions for analyzing birth certificate records, which are generally available in their raw form but also generally difficult to work with due to size or non-machine friendly formatting.

The data sets included in this package are primarily drawn from the Birth Data files made available by the CDC through their Vital Stats Online portal. Data are derived from birth certificates reaching back to 1968.

Installation

This package can be easily installed in R using the devtools:install_github function.

# if you don't already have the devtools package install it
install.packages("devtools")

# then install this package, I recommend including dependencies=TRUE
devtools::install_github("Mikuana/vitalstatistics", dependencies=TRUE)

It is crucial to recognize that the births data set in this package is reduced from hundreds of millions of records using a simple dimensional record count strategy. This makes the table somewhat similar to a OLAP Cube, and accordingly you cannot perform simple arithmetic but instead must always account for the cases field to obtain the correct result.

Raw Data Processing

A key feature of this project is the ability to process the raw Birth Data files into analytic data sets that are more suitable for data science. This is achieved prior to the R package build, and therefore is unncessary to understand if the births data set is adequate for your purposes. If you need more you can get more, but it will require you to step outside of R and use several different tools. You can get started by looking at the raw data readme.

Disclaimer

Use of this data strictly prohibits any attempts to identify individuals. This is described in great detail on the CDC web page. I highly suggest you take this seriously, and don't try to identify individuals within this data set. Even yourself. Really.

http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm

Name		Name	Last commit message	Last commit date
Latest commit History 97 Commits
R		R
data-raw		data-raw
data		data
man		man
tests		tests
.Rbuildignore		.Rbuildignore
.gitattributes		.gitattributes
.gitignore		.gitignore
.travis.yml		.travis.yml
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
README.md		README.md
vitalstatistics.Rproj		vitalstatistics.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Installation

Raw Data Processing

Disclaimer

About

Releases

Packages

Languages

License

Mikuana/vitalstatistics

Folders and files

Latest commit

History

Repository files navigation

Overview

Installation

Raw Data Processing

Disclaimer

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages