brownsarahm/wass-repair

wass-repair

The main goal of this code is to show how to transform distributions in Wasserstein space. The method that does this is bcmap.geometric_adjustment.

Some important notes before using the code

  • The methods assume your data (training/testing) is passed in as a pandas.DataFrame.
  • CDFs are a common object in the code and are represented as discrete empirical CDFs stored in an np.ndarray. This means that, given continuous scores between 0 and 1 (e.g. 0.07, 0.32, 0.9), we must also provide the bins used to discretize them. For example, with bins [.33, .66, 1], the scores above have the count vector [2, 0, 1] and the CDF representation [2/3, 2/3, 1]. For binary classification and regression, equal-width bins between the min score and the max score (usually 100 bins between 0 and 1, i.e. np.linspace(0,1,100)) are sufficient.
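The binning example above can be reproduced with a few lines of NumPy. This is only a minimal sketch: the helper name `empirical_cdf` is hypothetical and not part of this repository, and it assumes each entry of `bins` is the right (inclusive) edge of a bin, which is what the example in the note implies.

```python
import numpy as np


def empirical_cdf(scores, bins):
    """Discrete empirical CDF of `scores`; `bins` are assumed to be right bin edges."""
    scores = np.asarray(scores, dtype=float)
    bins = np.asarray(bins, dtype=float)
    # For each score, find the index of the first edge that is >= the score.
    idx = np.searchsorted(bins, scores, side="left")
    # Count scores per bin; scores above the last edge are dropped by the slice.
    counts = np.bincount(idx, minlength=len(bins))[: len(bins)]
    # Cumulative share of scores at or below each edge.
    cdf = np.cumsum(counts) / scores.size
    return counts, cdf


counts, cdf = empirical_cdf([0.07, 0.32, 0.9], [0.33, 0.66, 1.0])
print(counts)  # [2 0 1]
print(cdf)     # approximately [0.667 0.667 1.0]
```

If any score lies above the last edge, the final CDF value will fall short of 1, so the bins should cover the full score range.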

Preparing Your Data

To pass a dataframe to the geometric repair method, it must contain three columns. Each row corresponds to one individual, and the three columns are:

  • score - the score denoting the likelihood that the instance belongs to the positive class. This is task specific, but it is likely a number between 0 and 1
  • label - a 0/1 flag denoting whether this item belongs to the positive class in the dataset
  • group - a 0/1 flag denoting the protected group membership
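As a rough illustration of the expected layout, the sketch below builds a small dataframe with the three columns and checks it before use. The data are synthetic random values, and `check_repair_frame` is a hypothetical helper written for this example; it is not provided by the repository.

```python
import numpy as np
import pandas as pd

# Synthetic example data; real scores would come from your own model.
rng = np.random.default_rng(0)
n = 8
df = pd.DataFrame(
    {
        "score": rng.uniform(0.0, 1.0, size=n),  # likelihood of the positive class
        "label": rng.integers(0, 2, size=n),     # 0/1 positive-class flag
        "group": rng.integers(0, 2, size=n),     # 0/1 protected-group flag
    }
)


def check_repair_frame(frame):
    """Hypothetical sanity check for the three-column layout described above."""
    missing = {"score", "label", "group"} - set(frame.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for col in ("label", "group"):
        if not frame[col].isin([0, 1]).all():
            raise ValueError(f"column {col!r} must contain only 0/1 flags")
    return frame


check_repair_frame(df)
```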

A Note on Train/Test Splitting

Here, the naming convention of training data vs. testing data simply distinguishes the data used to compute the repaired scores for postprocessing (training data) from the dataframe that the repair is applied to (testing data). Unlike in other machine learning applications, these data may be the same or may be different. The dataframe that is returned is the testing data with the postprocessing applied to it.
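To make the training/testing roles concrete, here is a minimal one-dimensional sketch. It is a stand-in and not the implementation of bcmap.geometric_adjustment: the function `sketch_repair`, its `lam` parameter, and the quantile-averaging rule are assumptions made for illustration. Per-group quantile functions are estimated from the training dataframe only, and the returned dataframe is a copy of the testing dataframe with adjusted scores.

```python
import numpy as np
import pandas as pd


def sketch_repair(train_df, test_df, lam=1.0):
    """Illustrative 1-D quantile-averaging repair (not the repository's method)."""
    qs = np.linspace(0.0, 1.0, 101)
    # Per-group quantile functions, estimated from the training data only.
    group_quantiles = {
        g: np.quantile(sub["score"].to_numpy(), qs)
        for g, sub in train_df.groupby("group")
    }
    # Group-size-weighted average of the quantile functions
    # (the Wasserstein barycenter in one dimension).
    weights = train_df["group"].value_counts(normalize=True)
    bary = sum(weights[g] * q for g, q in group_quantiles.items())

    out = test_df.copy()
    repaired = out["score"].to_numpy(dtype=float).copy()
    for g, q in group_quantiles.items():
        mask = (out["group"] == g).to_numpy()
        s = repaired[mask]
        # Rank of each test score within its group's training distribution.
        u = np.interp(s, q, qs)
        # Score at the same rank in the averaged distribution.
        target = np.interp(u, qs, bary)
        # lam=0 leaves scores unchanged; lam=1 moves them fully to the target.
        repaired[mask] = (1.0 - lam) * s + lam * target
    out["score"] = repaired
    return out


# Synthetic data: group 0 scores lie in [0, 0.5], group 1 scores in [0.5, 1].
rng = np.random.default_rng(0)
train_df = pd.DataFrame(
    {
        "score": np.concatenate(
            [rng.uniform(0.0, 0.5, 200), rng.uniform(0.5, 1.0, 200)]
        ),
        "label": rng.integers(0, 2, 400),
        "group": np.repeat([0, 1], 200),
    }
)
test_df = train_df  # the two dataframes may be the same, as noted above
repaired_df = sketch_repair(train_df, test_df, lam=1.0)
```

In this sketch the testing dataframe is never used to estimate anything; it only receives the adjustment, which mirrors the naming convention described above.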
