Experimental Implementation of DP-WGAN
Differentially Private Synthetic Data Generation

For continuous data with binary targets, using the Differentially Private Wasserstein GAN (DP-WGAN)

DP-WGAN Synthetic Data for "Health care: Heart attack possibility" Kaggle Dataset --> view Notebook
DP-WGAN Synthetic Data for "BankNote Authentication UCI" Kaggle Dataset --> view Notebook

Metrics achieved for DP-WGAN on the Heart Disease Dataset

*Achieved after multiple attempts using normalized input data, with epsilon ≈ 3.4 and delta = 1e-5.

Process Steps & Key Concepts

The data must be in CSV format and must be partitioned into train and test sets before it is fed to the models.
Missing values are not supported and must be replaced appropriately by the user beforehand.
If the data has both continuous and categorical attributes, it must be pre-processed
(discretization for continuous values, encoding for categorical attributes).
The generative GAN-based ML models are trained on the training dataset.
The generative model is used to create a synthetic version of the training dataset.
To compensate for irregularities, multiple GAN generator models are trained.
To compensate for irregularities, multiple synthetic datasets are generated,
and the best-performing dataset, i.e. the one that yields the maximum AUC, is selected.
Logistic regression classifiers are trained on the real data as well as on the synthetically generated dataset.
Both classifiers are evaluated on the held-out real test dataset (preserved for evaluation).
Relevant metrics (mainly AUC) and visualizations of the correlation matrices of the synthetic datasets are generated.
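The data preparation steps listed above (CSV input, encoding of categorical attributes, train/test partition, replacement of missing values) could look roughly like the following sketch. It is not taken from the notebooks in this repository: the column names `age`, `chest_pain` and `target`, the inline CSV text and all helper functions are hypothetical, and median imputation is just one possible way to replace missing values. Only the Python standard library is used.

```python
import csv
import io
import random
import statistics

# Hypothetical CSV content; the real datasets are not reproduced here.
CSV_TEXT = """age,chest_pain,target
63,typical,1
,atypical,0
41,typical,1
56,none,0
57,atypical,1
"""

def load_rows(csv_text):
    """Parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def one_hot(rows, column):
    """Replace a categorical column by one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for c in categories:
            r[f"{column}_{c}"] = 1 if value == c else 0
    return categories

def train_test_split(rows, test_fraction, seed):
    """Shuffle a copy of the rows and cut off the first part as test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def column_median(rows, column):
    """Median of the non-missing values of a numeric column."""
    return statistics.median(float(r[column]) for r in rows if r[column] != "")

def fill_missing(rows, column, value):
    """Convert a numeric column to float, replacing empty cells by `value`."""
    for r in rows:
        r[column] = float(r[column]) if r[column] != "" else value

rows = load_rows(CSV_TEXT)
for r in rows:
    r["target"] = int(r["target"])
categories = one_hot(rows, "chest_pain")
train, test = train_test_split(rows, test_fraction=0.2, seed=0)

# The imputation value is computed on the training rows only,
# so that no information from the test rows leaks into training.
age_median = column_median(train, "age")
fill_missing(train, "age", age_median)
fill_missing(test, "age", age_median)

print(len(train), "training rows,", len(test), "test rows")
```

Computing the median on the training partition only is a design choice of this sketch; the original notebooks may handle missing values differently.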
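The selection and evaluation steps above (train one classifier per synthetic dataset, score each on the held-out real test set, keep the dataset with the highest AUC, and compare against a classifier trained on real data) can be sketched as follows. This is a minimal illustration under assumptions, not the code used in the notebooks: the data is a toy two-feature problem, the "synthetic" candidates are simply noisier toy samples standing in for DP-WGAN output, and the logistic regression and AUC are written out with the standard library so the block runs without dependencies. All function names are hypothetical.

```python
import math
import random

def predict_proba(w, b, x):
    """Logistic model: probability of class 1 for a feature vector x."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit logistic regression with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def auc(y_true, scores):
    """AUC as the probability that a random positive is scored above
    a random negative (ties count one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def auc_on_real_test(train_X, train_y, test_X, test_y):
    """Train on the given data, evaluate on the real test set."""
    w, b = fit_logistic(train_X, train_y)
    return auc(test_y, [predict_proba(w, b, x) for x in test_X])

def select_best_synthetic(candidates, test_X, test_y):
    """Return (index, AUC) of the candidate dataset with the highest AUC."""
    scores = [auc_on_real_test(X, y, test_X, test_y) for X, y in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

def toy_data(n, noise, rng):
    """Toy binary data: class 1 is centred at +1, class 0 at -1."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 1.0 if label else -1.0
        X.append([rng.gauss(centre, noise), rng.gauss(centre, noise)])
        y.append(label)
    return X, y

rng = random.Random(0)
real_train = toy_data(200, 1.0, rng)
real_test = toy_data(100, 1.0, rng)
# Stand-ins for several generated datasets (NOT actual DP-WGAN output).
candidates = [toy_data(200, noise, rng) for noise in (1.0, 2.0, 3.0)]

real_auc = auc_on_real_test(*real_train, *real_test)
best_idx, synth_auc = select_best_synthetic(candidates, *real_test)
print(f"AUC trained on real data:      {real_auc:.3f}")
print(f"AUC trained on best synthetic: {synth_auc:.3f} (candidate {best_idx})")
```

In practice a library implementation such as scikit-learn's `LogisticRegression` and `roc_auc_score` would normally replace the hand-written helpers. Note also that choosing the best synthetic dataset by its score on the test set makes the reported AUC for that dataset optimistic.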

Acknowledgements & Sources

Major parts of this summary notebook were extracted from this BOREALIS Private Data Generation GitHub repository by BorealisAI. Note that this Jupyter notebook covers only one (DP-WGAN) of the various possible generative models and datasets for differentially private synthetic data generation. The analysis approaches described above yielded the results shown, as extracted from the original notebook. For more information regarding the differential-privacy-specific privacy parameters delta and epsilon, please refer to this info page by Microsoft.
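For orientation, the two privacy parameters can be read off the standard definition of (epsilon, delta)-differential privacy, stated here as general background rather than as something specific to this repository. A randomized mechanism M (here, the training procedure that produces the generator) satisfies the definition if, for all pairs of datasets D and D' that differ in a single record and for all sets S of possible outputs:

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \, \Pr[\,M(D') \in S\,] + \delta
```

Smaller values of both parameters mean stronger privacy. With the reported epsilon of approximately 3.4, the multiplicative factor is e^3.4, roughly 30, and delta = 1e-5 bounds the probability mass for which that multiplicative guarantee may fail.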

stefanrmmr/differentially_private_synthetic_data