Evaluation of some of the ethnicolr models on the NC Voter Registration Data. New Models Based on NC Voter Registration Data.

appeler/nc_race_ethnicity

Evaluating Ethnicolr

We evaluate some of the ethnicolr models on the NC Voter Registration Data (access limited to researchers with university affiliation). Evaluation is complicated because race and ethnicity are coded differently in the two states, Florida (FL) and North Carolina (NC).

Measuring Race and Ethnicity

North Carolina distinguishes between race and ethnicity and records them in two separate columns. Here's the codebook:

/ ***************************************************************************
Race codes
race               description
*******************************************************************************
A                  ASIAN
B                  BLACK or AFRICAN AMERICAN
I                  AMERICAN INDIAN or ALASKA NATIVE
M                  TWO or MORE RACES
O                  OTHER
P                  NATIVE HAWAIIAN or PACIFIC ISLANDER
U                  UNDESIGNATED
W                  WHITE
*************************************************************************** /

/ ***************************************************************************
Ethnic codes
ethnicity          description
*******************************************************************************
HL                 HISPANIC or LATINO
NL                 NOT HISPANIC or NOT LATINO
UN                 UNDESIGNATED
*************************************************************************** /

FL codebook is as follows:

Analyses

We start by presenting a full cross-tabulation of the predictions from the FL full name model against the NC concatenation of race and ethnicity, e.g., Asian--HL, Asian--NL, etc.
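
A minimal sketch of such a cross-tabulation with pandas. The column names `race_code` and `ethnic_code` follow the rules used later in this README; the `pred_race` column, the short race labels, and the toy rows are assumptions made for illustration, not the repository's actual data or script.

```python
import pandas as pd

# Short race labels adapted from the NC codebook above.
# The exact spellings are our choice, not part of the codebook.
RACE_LABELS = {
    "A": "Asian", "B": "Black", "I": "AmericanIndian", "M": "TwoOrMore",
    "O": "Other", "P": "PacificIslander", "U": "Undesignated", "W": "White",
}


def nc_label(df: pd.DataFrame) -> pd.Series:
    """Concatenate NC race and ethnicity into one label, e.g. 'Asian--HL'."""
    return df["race_code"].map(RACE_LABELS) + "--" + df["ethnic_code"]


# Toy rows; `pred_race` stands in for the FL full name model's prediction.
toy = pd.DataFrame({
    "race_code":   ["A", "A", "B", "W", "W"],
    "ethnic_code": ["NL", "HL", "NL", "NL", "HL"],
    "pred_race":   ["asian", "hispanic", "nh_white", "nh_white", "hispanic"],
})

# Rows: NC race--ethnicity label. Columns: predicted category.
xtab = pd.crosstab(
    nc_label(toy), toy["pred_race"],
    rownames=["nc_label"], colnames=["pred_race"],
)
print(xtab)
```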

Next we present three comparisons:

Comparison #1: Clean Commensurate Group

  1. (race_code == 'B') & (ethnic_code == 'NL') ==> nh_black
  2. (race_code == 'W') & (ethnic_code == 'NL') ==> nh_white

The overall accuracy is 82%, with accuracy for NH Black at 33% and NH White at 96%.
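
As a rough sketch, the two rules and the accuracy figures could be computed as follows. The `pred_race` column name and the toy rows are hypothetical, and treating per-group "accuracy" as the share of each true group that is predicted correctly is our reading, not something the README states.

```python
import pandas as pd


def map_clean_groups(df: pd.DataFrame) -> pd.Series:
    """Apply the two Comparison #1 rules; rows matching neither stay NaN."""
    truth = pd.Series(index=df.index, dtype="object")
    truth[(df["race_code"] == "B") & (df["ethnic_code"] == "NL")] = "nh_black"
    truth[(df["race_code"] == "W") & (df["ethnic_code"] == "NL")] = "nh_white"
    return truth


def accuracy_report(truth: pd.Series, pred: pd.Series):
    """Return overall accuracy and per-true-group accuracy on mapped rows."""
    keep = truth.notna()                      # drop rows outside the comparison
    correct = truth[keep] == pred[keep]
    return correct.mean(), correct.groupby(truth[keep]).mean()


# Toy rows, not NC data; the last row is excluded by the rules.
toy = pd.DataFrame({
    "race_code":   ["B", "B", "B", "W", "W", "A"],
    "ethnic_code": ["NL", "NL", "NL", "NL", "NL", "NL"],
    "pred_race":   ["nh_black", "nh_white", "nh_white",
                    "nh_white", "nh_white", "asian"],
})

overall, by_group = accuracy_report(map_clean_groups(toy), toy["pred_race"])
print(overall)
print(by_group)
```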

Comparison #2: Low FP

  1. (race_code == 'B') & (ethnic_code == 'NL') ==> nh_black
  2. (race_code == 'W') & (ethnic_code == 'NL') ==> nh_white
  3. ((race_code == 'W') & (ethnic_code == 'HL')) | ((race_code == 'B') & (ethnic_code == 'HL')) ==> hispanic
  4. (race_code == 'A') & (ethnic_code == 'NL') ==> asian

The overall accuracy is 81%, with accuracy for NH Black at 33%, NH White at 96%, Asians at 60%, and Hispanics at 59%.

Comparison #3: Low FN

  1. (race_code == 'B') & (ethnic_code == 'NL') ==> nh_black
  2. (race_code == 'W') & (ethnic_code == 'NL') ==> nh_white
  3. ethnic_code == 'HL' ==> hispanic
  4. (race_code == 'A') & (ethnic_code == 'NL') ==> asian

The overall accuracy is 81%, with accuracy for NH Black at 33%, NH White at 96%, Asians at 60%, and Hispanics at 71%.
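
Comparisons #2 and #3 differ only in the Hispanic rule. The sketch below makes that difference explicit on a few toy rows. The function name `map_four_groups` and its `hispanic_rule` argument are hypothetical; only the rule logic is taken from the lists above.

```python
import pandas as pd


def map_four_groups(df: pd.DataFrame, hispanic_rule: str = "low_fp") -> pd.Series:
    """Map NC codes to four groups; unmatched rows stay NaN."""
    race, eth = df["race_code"], df["ethnic_code"]
    if hispanic_rule == "low_fp":
        # Comparison #2: only White or Black voters coded HL.
        hispanic = race.isin(["W", "B"]) & (eth == "HL")
    elif hispanic_rule == "low_fn":
        # Comparison #3: anyone coded HL, whatever the race code.
        hispanic = eth == "HL"
    else:
        raise ValueError(f"unknown hispanic_rule: {hispanic_rule!r}")

    truth = pd.Series(index=df.index, dtype="object")
    truth[(race == "B") & (eth == "NL")] = "nh_black"
    truth[(race == "W") & (eth == "NL")] = "nh_white"
    truth[hispanic] = "hispanic"
    truth[(race == "A") & (eth == "NL")] = "asian"
    return truth


# Toy rows, not NC data.
toy = pd.DataFrame({
    "race_code":   ["W", "O", "A", "B", "A", "W"],
    "ethnic_code": ["HL", "HL", "NL", "NL", "HL", "UN"],
})

low_fp = map_four_groups(toy, "low_fp")
low_fn = map_four_groups(toy, "low_fn")
print(pd.DataFrame({"low_fp": low_fp, "low_fn": low_fn}))
```

Under the Low FN rule, the Other--HL and Asian--HL rows count as Hispanic, whereas the Low FP rule leaves them out of the comparison.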

NC Ethnicolr Model(s)

We build new LSTM models based on NC data. We start by assuming y = concatenation of ethnic code and race code. We remove U and UN, assuming they are 'missing at random.' This gives us 12 categories.

We build a separate model that predicts only the race_code and takes out 'U', again assuming it to be 'missing at random.' We also build a model that predicts only the ethnic_code and takes out 'UN'.
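
A minimal sketch of how the three target variables could be constructed with pandas. The function name `build_targets`, the `_` separator in the joint label, and the toy rows are assumptions; the repository's scripts may do this differently.

```python
import pandas as pd


def build_targets(df: pd.DataFrame):
    """Build the joint, race-only, and ethnicity-only targets.

    Undesignated codes ('U' for race, 'UN' for ethnicity) are dropped,
    which treats them as missing at random.
    """
    known_race = df["race_code"] != "U"
    known_eth = df["ethnic_code"] != "UN"

    # Joint target: both codes must be known. The '_' separator is our choice.
    both = df[known_race & known_eth]
    y_joint = both["ethnic_code"] + "_" + both["race_code"]

    # Single-column targets only need their own code to be known.
    y_race = df.loc[known_race, "race_code"]
    y_eth = df.loc[known_eth, "ethnic_code"]
    return y_joint, y_race, y_eth


# Toy rows, not NC data.
toy = pd.DataFrame({
    "race_code":   ["W", "U", "B", "A"],
    "ethnic_code": ["NL", "HL", "UN", "HL"],
})

y_joint, y_race, y_eth = build_targets(toy)
print(y_joint.tolist(), y_race.tolist(), y_eth.tolist())
```

Each target keeps a different subset of rows, so the three models would not necessarily train on the same voters.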

Scripts

  1. Download NC Data
  2. FL Model Evaluation on NC Data
  3. 12 category Model
  4. Race code model
  5. Latino model
  6. NC Model Evaluation on FL Data

Authors

Suriyan Laohaprapanon and Gaurav Sood
