Original authors: Jinsung Yoon, James Jordon, Mihaela van der Schaar
Paper: Jinsung Yoon, James Jordon, Mihaela van der Schaar, "GAIN: Missing Data Imputation using Generative Adversarial Nets," International Conference on Machine Learning (ICML), 2018.
- Original Github repository: https://github.com/jsyoon0823/GAIN
- Paper Link: http://proceedings.mlr.press/v80/yoon18a/yoon18a.pdf
- Supplementary material: http://proceedings.mlr.press/v80/yoon18a/yoon18a-supp.pdf
This directory contains implementations of the GAIN framework for imputation on the five main datasets used in the original paper:
- UCI Letter (https://archive.ics.uci.edu/ml/datasets/Letter+Recognition)
- UCI Spam (https://archive.ics.uci.edu/ml/datasets/Spambase)
- UCI Credit (https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients)
- UCI Breast Cancer (https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic))
- UCI Online News Popularity (https://archive.ics.uci.edu/ml/datasets/Online+News+Popularity)
.
├── data # Contains the raw data
├── docker # Contains the files to create a Docker container
├── src # Source files
│ ├── data # Scripts to load and preprocess data
│ └── models # Scripts that define the GAIN model, the training loop and the MLP base for GAIN
├── reports # Generated at run time; contains the results of the experiments (TensorBoard logs, etc.)
├── main.py # Main script to run an experiment
├── replicate_table1_paper.py # Script to replicate the results of Table 1 of the original paper; saves the results in the reports folder
├── setup.sh # Script that creates a Docker container
├── requirements.txt # Requirements file
├── logo.png # Logo used in the README
├── LICENSE
└── README.md
To run the training and evaluation pipeline for the GAIN framework, run:
$ python3 main.py
Note that any model architecture, such as a multi-layer perceptron or a CNN, can be used as the generator and the discriminator.
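Because the generator only needs to map a data vector and its mask to a vector of the same length, it can be treated as a plain callable. The sketch below illustrates that idea in pure Python. The names `impute_row` and `mean_generator` are hypothetical and are not part of this repository, and the toy generator stands in for a trained network.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical interface: a generator maps (data, mask) to a full vector.
Generator = Callable[[Vector, Vector], Vector]


def impute_row(x: Vector, m: Vector, generator: Generator) -> Vector:
    """Keep observed entries (m == 1) and fill missing ones (m == 0)."""
    g = generator(x, m)
    return [mi * xi + (1.0 - mi) * gi for xi, mi, gi in zip(x, m, g)]


def mean_generator(x: Vector, m: Vector) -> Vector:
    """Toy stand-in for a trained network: predicts the mean of the observed entries."""
    observed = [xi for xi, mi in zip(x, m) if mi == 1.0]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean] * len(x)


row = [0.2, 0.0, 0.6]   # the 0.0 at index 1 is a placeholder for a missing value
mask = [1.0, 0.0, 1.0]  # 1 = observed, 0 = missing
imputed = impute_row(row, mask, mean_generator)
```

Swapping `mean_generator` for an MLP or CNN forward pass leaves `impute_row` unchanged, which is the sense in which the architecture is interchangeable.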
If you want to run the code in a Docker container, you can use the following commands:
- Give execution permissions to the setup.sh file:
$ chmod +x setup.sh
- Run the setup.sh file:
$ ./setup.sh
If you exit the container, you can re-enter it by running the setup.sh file again.
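Command-line arguments:
- data_name: letter, spam, credit, breast or news
- miss_rate: probability that a component is missing
- batch_size: batch size
- hint_rate: probability that a mask component is revealed through the hint
- alpha: hyperparameter weighting the reconstruction loss in the generator objective
- iterations: number of training iterations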
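To make `miss_rate` and `hint_rate` concrete, here is a minimal sketch of how a missingness mask and a hint matrix could be sampled. It assumes that entries go missing independently with probability `miss_rate` and that the hint is the mask multiplied by a Bernoulli(`hint_rate`) variable. The function names are hypothetical, and the sampling code in this repository may differ.

```python
import random
from typing import List

Matrix = List[List[float]]


def sample_mask(rows: int, cols: int, miss_rate: float, rng: random.Random) -> Matrix:
    """1.0 = observed, 0.0 = missing; each entry is missing with probability miss_rate."""
    return [[0.0 if rng.random() < miss_rate else 1.0 for _ in range(cols)]
            for _ in range(rows)]


def sample_hint(mask: Matrix, hint_rate: float, rng: random.Random) -> Matrix:
    """Reveal each mask entry with probability hint_rate; hidden entries become 0.0.

    Assumption: hint = b * mask with b ~ Bernoulli(hint_rate),
    so only observed entries can carry a 1.0 in the hint.
    """
    return [[mi if rng.random() < hint_rate else 0.0 for mi in row] for row in mask]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
mask = sample_mask(rows=1000, cols=10, miss_rate=0.2, rng=rng)
hint = sample_hint(mask, hint_rate=0.9, rng=rng)

# Fraction of observed entries; should be close to 1 - miss_rate.
observed_fraction = sum(map(sum, mask)) / (1000 * 10)
```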
Example command:
$ python3 main.py --data_name spam --miss_rate 0.2 --batch_size 128 \
    --hint_rate 0.9 --alpha 100 --iterations 10000
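For reference, a sketch of how these flags could be parsed with `argparse` follows. The types and default values are assumptions taken from the example command, not read from `main.py`, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented flags (types and defaults are assumed)."""
    parser = argparse.ArgumentParser(description="GAIN imputation experiment (sketch)")
    parser.add_argument("--data_name", type=str, default="spam",
                        choices=["letter", "spam", "credit", "breast", "news"])
    parser.add_argument("--miss_rate", type=float, default=0.2,
                        help="probability that a component is missing")
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--hint_rate", type=float, default=0.9)
    parser.add_argument("--alpha", type=float, default=100.0)
    parser.add_argument("--iterations", type=int, default=10000)
    return parser


# Parse the example command's arguments explicitly instead of reading sys.argv.
args = build_parser().parse_args([
    "--data_name", "spam", "--miss_rate", "0.2", "--batch_size", "128",
    "--hint_rate", "0.9", "--alpha", "100", "--iterations", "10000",
])
```

Note that each flag is followed directly by its value, separated by a space, with no colon after the flag name.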
To replicate the results of Table 1 of the original paper, run:
$ python3 replicate_table1_paper.py
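The original paper reports imputation quality as root-mean-square error (RMSE). A minimal sketch of that metric, computed only over the entries that were missing, is shown here. `rmse_on_missing` is a hypothetical helper; whether `replicate_table1_paper.py` normalizes the data or aggregates runs in the same way is an assumption.

```python
import math
from typing import List

Matrix = List[List[float]]


def rmse_on_missing(original: Matrix, imputed: Matrix, mask: Matrix) -> float:
    """RMSE between original and imputed values over entries where mask == 0."""
    squared_error = 0.0
    n_missing = 0
    for o_row, i_row, m_row in zip(original, imputed, mask):
        for o, i, m in zip(o_row, i_row, m_row):
            if m == 0.0:  # only missing entries contribute to the score
                squared_error += (o - i) ** 2
                n_missing += 1
    if n_missing == 0:
        raise ValueError("mask has no missing entries")
    return math.sqrt(squared_error / n_missing)


original = [[1.0, 2.0], [3.0, 4.0]]
imputed = [[1.0, 2.5], [2.0, 4.0]]
mask = [[1.0, 0.0], [0.0, 1.0]]  # 1 = observed, 0 = missing
score = rmse_on_missing(original, imputed, mask)
```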
This project is licensed under the Apache License 2.0; see the LICENSE file for details.