glopezv95/ipynb-rndforest-wine


Machine Learning classification model analysis

This project analyses how well several machine learning models from Python's scikit-learn package classify wines.

The dataset used throughout the project is the UCI ML Wine Dataset, imported through the sklearn.datasets module.
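
As an illustration, the dataset can be loaded with scikit-learn's `load_wine` helper. This is a minimal sketch of one way to do it; the project's own loading code may differ (for example, it may read the raw data from the URL stored in const.py).

```python
from sklearn.datasets import load_wine

# Load the Wine dataset bundled with scikit-learn as a feature matrix X
# and a target vector y (one class label per sample).
X, y = load_wine(return_X_y=True)

n_samples, n_features = X.shape
n_classes = len(set(y.tolist()))

print(f"samples: {n_samples}, features: {n_features}, classes: {n_classes}")
```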

Table of contents

Project configuration

Project structure

Running the project
Deploying the project

Project configuration

Before the project can run, complete the following setup steps:

Create a virtual environment

python -m venv venv

Activate the virtual environment

Using PowerShell

venv\Scripts\Activate.ps1

Using bash

source venv/bin/activate

Install dependencies

pip install -r requirements.txt

Project structure

PROJECT DIRECTORY
├─ auxiliary
│  ├─ const.py
│  └─ functions.py
├─ main.py
├─ optimisation.ipynb
├─ README.md
├─ requirements.txt
└─ venv
   └─ *

auxiliary folder

const.py

This file contains the miscellaneous constants used throughout the project, including the random seed, the title and subtitle of the main script, and the URL of the dataset's raw data.

functions.py

This file contains the functions that the main script uses to extract and transform the dataset, and to make predictions on data imported through the main script.

main.py script

This is the Python script from which the project is run.

optimisation.ipynb notebook

This notebook contains the analysis and preprocessing carried out to select the most suitable machine learning model for the dataset. It covers:

  • Data analysis. Data loading and description. Feature analysis.
  • Data preparation. PCA dimensionality reduction, train/test split and feature normalisation.
  • Model selection. Performance testing of K Nearest Neighbors, Ridge Classifier and Random Forest Classifier.
  • Selected model description.
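
The steps listed above can be sketched roughly as follows. This is an illustrative outline, not the notebook's actual code: the seed value, the test split size and the number of PCA components are assumptions chosen for the example. The sketch also scales the features before applying PCA and fits both steps on the training split only, which may differ from the order used in the notebook.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # assumed value; the project keeps its own seed in const.py

X, y = load_wine(return_X_y=True)

# Hold out a stratified test split (25% is an assumption for this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=SEED
)

# The three candidate classifiers named in the list above, with default settings.
models = {
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Ridge Classifier": RidgeClassifier(random_state=SEED),
    "Random Forest Classifier": RandomForestClassifier(random_state=SEED),
}

scores = {}
for name, model in models.items():
    # Scale, reduce to two principal components (assumed), then classify.
    pipeline = make_pipeline(
        StandardScaler(),
        PCA(n_components=2, random_state=SEED),
        model,
    )
    pipeline.fit(X_train, y_train)
    scores[name] = pipeline.score(X_test, y_test)  # accuracy on the test split
    print(f"{name}: {scores[name]:.3f}")
```

Wrapping the preprocessing in a pipeline keeps the scaler and PCA from seeing the test data, so the reported accuracies are comparable across the three models.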

requirements.txt

This file lists the packages that the main script needs in order to run.

Running the project

To use the model generated by the project, run the main.py script from the shell:

python main.py

Deploying the project

To deploy the project, run the deploy.py script from the shell. It automatically creates a virtual environment, activates it, installs the dependencies and runs the project:

python deploy.py
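
For readers curious about what such a deployment script might look like, here is a minimal sketch using only the Python standard library. It is not the contents of the project's deploy.py; the function names `venv_python` and `deploy` are hypothetical. Instead of activating the environment, the sketch calls the environment's own interpreter directly, which has the same effect for the commands it runs.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path, platform: str = sys.platform) -> Path:
    """Return the path of the Python interpreter inside a virtual environment."""
    if platform.startswith("win"):
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def deploy(project_dir: Path) -> None:
    """Create a venv, install requirements.txt into it and run main.py."""
    project_dir = project_dir.resolve()
    env_dir = project_dir / "venv"

    # Create the virtual environment with pip available.
    venv.EnvBuilder(with_pip=True).create(env_dir)
    python = str(venv_python(env_dir))

    # Install the dependencies with the environment's interpreter.
    subprocess.run(
        [python, "-m", "pip", "install", "-r", str(project_dir / "requirements.txt")],
        check=True,
    )

    # Run the main script from the project directory.
    subprocess.run([python, "main.py"], cwd=project_dir, check=True)
```

Calling `deploy(Path.cwd())` from the project root would then perform all the steps in sequence, stopping with an error if any command fails.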
