radekBednarik/bq_anonymization_public

Big Query Anonymization Test Tool

PUBLIC VERSION: Testing solution for BQ GDPR anonymization use case.

🧐 About

IMPORTANT: This is a public version of the project. Feature files and SQL templates were anonymized, and connecting to the BigQuery API is not possible. The rest of the codebase is intact.

This project implements a testing solution that uses the python-behave framework to check whether ID fields in the tables of BQ datasets were anonymized successfully.

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

What you need to install before running the software:

  • Python 3.6 with these external packages:
    • behave
    • allure-behave
    • pandas
    • openpyxl
    • tqdm
    • pyhamcrest
    • google
    • google-cloud-bigquery
    • protobuf
  • linux (Ubuntu)/Win10 OS
  • allure reporting tool
    • on Win10 install using scoop
    • on Ubuntu/linux install using linuxbrew
  • access to tested BQ data project
  • access to the BQ API, set up with the proper roles
  • access to this repository

Get familiar with the documentation of the external tools listed above to really understand what is going on.

The google and protobuf packages had to be listed in the setup.py file to ensure that the BQ API library package works properly.

Installing

  1. Install Python (refer to the documentation for how to do that on your OS).
  2. Open your command line tool of choice and go to the directory where you want to clone the project from GitHub.
  3. Clone this repo.
  4. Run "python3 setup.py install" if on Ubuntu, or "py setup.py install" if on Win10. On Win10, the "pandas" package will not be installed, so you will have to install it manually. See the comment in the setup.py file for the link. Download the package and run the command "pip install [path to package]/packagefile".

🔧 Running the tests

  1. In the console, go to the root folder of the project.
  2. Run the command "behave -f allure_behave.formatter:AllureFormatter -f pretty -o allure-results ./test/features" if on Ubuntu, or "behave -f allure_behave.formatter:AllureFormatter -f pretty -o allure-results .\test\features" if on Win10.
  3. Wait until the tests are finished.
  4. Failed tests have their BQ data saved in an XLSX file with a timestamped name in the ./reports folder.
  5. You can also display an interactive HTML report. To do this, run the "allure serve" command in your console and the report will open in your default browser. It should be Firefox or Chrome.
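The command from step 2 differs between the two systems only in the path separator. As a minimal sketch (the helper name build_behave_command is hypothetical and not part of this repository), the same invocation can be assembled in Python so that the separator is chosen automatically:

```python
import os
from typing import List


def build_behave_command() -> List[str]:
    """Assemble the behave invocation from step 2 as an argument list.

    Hypothetical helper for illustration; the repository may build
    the command differently.
    """
    # os.path.join picks "/" on Linux and "\\" on Windows.
    features_dir = os.path.join(".", "test", "features")
    return [
        "behave",
        "-f", "allure_behave.formatter:AllureFormatter",
        "-f", "pretty",
        "-o", "allure-results",
        features_dir,
    ]


# Show the command; it could be passed to subprocess.run() to execute it.
print(" ".join(build_behave_command()))
```

Passing the list to subprocess.run() avoids shell quoting issues on both systems.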

Pseudo-random feature file test running

All datasets are divided into 5 feature files, with a few exceptions. You can run them as described above or, if needed, let the tool pick a feature file pseudo-randomly.

To do that, run "python3 manage.py -r" (or "py manage.py -r" on Windows) in the console.

This picks one of the tags stored in the list in the "functions.py" file and then runs the behave test framework as usual, but only the feature file carrying that tag is actually run.

This process can be repeated as long as there are tags that have not been picked yet. Once all tags are "exhausted", a ValueError exception is caught, and you have to clear the "config.json" file manually.

To do that, use the utility "py manage.py -c".

You can also run the utility with both parameters at once, so that the next time the pseudo-random function can choose from the full set of tags again. In this case, run the command like this: "py manage.py -r -c".
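The pick-without-repetition behaviour described above can be sketched roughly as follows. This is an illustration only: the function names, the "used_tags" key and the layout of the config file are assumptions, and the actual code in "functions.py" may work differently.

```python
import json
import os
import random
from typing import List


def load_used_tags(config_path: str) -> List[str]:
    """Read the already picked tags; a missing or empty file means none."""
    if not os.path.exists(config_path):
        return []
    with open(config_path, "r", encoding="utf-8") as handle:
        content = handle.read().strip()
    # "used_tags" is an assumed key name, not taken from the repository.
    return json.loads(content).get("used_tags", []) if content else []


def pick_tag(all_tags: List[str], config_path: str) -> str:
    """Pick a not-yet-used tag at random and record it in the config file."""
    used = load_used_tags(config_path)
    remaining = [tag for tag in all_tags if tag not in used]
    if not remaining:
        # Mirrors the ValueError mentioned in the text above.
        raise ValueError("all tags exhausted; clear the config file")
    tag = random.choice(remaining)
    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump({"used_tags": used + [tag]}, handle)
    return tag


def clear_used_tags(config_path: str) -> None:
    """Reset the record so every tag can be picked again."""
    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump({"used_tags": []}, handle)
```

Storing the picked tags in a file is what lets the selection survive between separate runs of the utility.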

Manage.py utility

Because the console command for behave coupled with the allure reporting tool can be quite long, the manage.py utility makes the work easier and faster. It covers these scenarios:

  • py manage.py -r will run one randomly picked feature file from all tagged feature files. This feature file will not be run again until config.json is cleared.
  • py manage.py -c will clear the config.json file, which stores the tags of feature files that were already randomly run.
  • py manage.py -b will run all feature files, as the command "behave -f allure_behave.formatter:AllureFormatter -f pretty -o allure-results .\test\features" would do.
  • py manage.py -t "@tag1" -t "@tag2" etc. will run all feature files, or just some of their scenarios, tagged with the provided tags. Make sure to wrap each tag in double quotes.
  • py manage.py -h is always available by default and will display all available commands with short descriptions.
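A command-line interface with the flags listed above could be declared with argparse along these lines. This is a sketch under assumptions: the dest names and help texts are invented for illustration, and the real manage.py may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the -r, -c, -b and -t flags; -h is added by argparse itself."""
    parser = argparse.ArgumentParser(
        prog="manage.py",
        description="Helper for running behave with allure reporting.",
    )
    # The dest names below are hypothetical.
    parser.add_argument("-r", dest="random_run", action="store_true",
                        help="run one randomly picked tagged feature file")
    parser.add_argument("-c", dest="clear_config", action="store_true",
                        help="clear config.json")
    parser.add_argument("-b", dest="run_all", action="store_true",
                        help="run all feature files")
    # action="append" collects every -t occurrence into one list.
    parser.add_argument("-t", dest="tags", action="append", default=[],
                        metavar="TAG", help="run only items with this tag")
    return parser


args = build_parser().parse_args(["-t", "@tag1", "-t", "@tag2"])
print(args.tags)
```

Using action="append" is what allows -t to be repeated, matching the "-t "@tag1" -t "@tag2"" usage shown in the list.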

✍️ Authors