- Gathering the data, Notebook
- Make an exploratory data analysis, Notebook
- Cleanse and prepare the data, Notebook
- Cluster the data, Notebook
- Interpret the data, Notebook
The goal of this project is to go through an end-to-end clustering project, from getting the data to putting the generated insights to use. An imaginary use case would be to build a recommender system for chocolate on top of the clusters: "If you like this chocolate, you might like those as well." I chose the chocolate context because I have much more contact with chocolates than with customers or end users.
The project is built upon data from the public API of the U.S. Department of Agriculture; the API documentation can be found at https://fdc.nal.usda.gov/api-guide.html. The data is accessed via the REST API and stored in a local PostgreSQL database. You can also see the raw data from the API here.
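As a minimal sketch of the data-gathering step, the snippet below builds a search request against the FoodData Central API using only the standard library. The query parameter names follow the public API guide; the search term and page handling are illustrative assumptions, and you would need your own API key.

```python
import json
import urllib.parse
import urllib.request

# Search endpoint of the FoodData Central API (see the API guide).
API_URL = "https://api.nal.usda.gov/fdc/v1/foods/search"

def build_search_url(api_key: str, query: str,
                     page_number: int = 1, page_size: int = 200) -> str:
    """Assemble the URL for one page of search results."""
    params = {
        "api_key": api_key,
        "query": query,
        "pageNumber": page_number,
        "pageSize": page_size,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)

def fetch_page(api_key: str, query: str, page_number: int = 1) -> dict:
    """Request one page of results and return the parsed JSON payload."""
    with urllib.request.urlopen(build_search_url(api_key, query, page_number)) as resp:
        return json.load(resp)
```

From there, each JSON payload can be written into the local PostgreSQL database with any client library before cleansing begins.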
The next steps are performing an exploratory data analysis as a foundation for the data cleansing step, and the data cleansing itself, which includes a wide range of adjustments, e.g. extracting data stored in lists. The cleansed and fully prepared data is stored in the database again. A cleansed but non-encoded version of the data, suitable for visualization or other projects, can be found here as CSV.
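One cleansing step mentioned above, extracting data stored in lists, can be sketched as follows: the API nests nutrient records per food, which end up as lists of dicts inside a single DataFrame column. The column and nutrient names below are assumptions for illustration, not the project's actual schema.

```python
import pandas as pd

# Toy stand-in for the raw API data: one list-valued column per food.
raw = pd.DataFrame({
    "description": ["Dark chocolate", "Milk chocolate"],
    "foodNutrients": [
        [{"nutrientName": "Sugars", "value": 24.0},
         {"nutrientName": "Protein", "value": 7.8}],
        [{"nutrientName": "Sugars", "value": 51.5},
         {"nutrientName": "Protein", "value": 6.3}],
    ],
})

def extract_nutrient(nutrients: list, name: str):
    """Pull a single nutrient value out of the list of nutrient dicts."""
    for entry in nutrients:
        if entry["nutrientName"] == name:
            return entry["value"]
    return None

# Flatten each wanted nutrient into its own numeric column.
for nutrient in ["Sugars", "Protein"]:
    raw[nutrient] = raw["foodNutrients"].apply(extract_nutrient, args=(nutrient,))

clean = raw.drop(columns="foodNutrients")
```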
The last part is the clustering itself. Two algorithms, KMeans and DBSCAN, were applied to different subsets of the dataset based on the first clustering results. For both, comprehensive hyperparameter tuning was implemented in order to obtain an optimal result.
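The kind of hyperparameter search described above can be sketched like this: scan the number of clusters for KMeans and keep the setting with the best silhouette score. The synthetic data and the scanned range are illustrative, not the project's actual setup.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with a known cluster structure.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
X = StandardScaler().fit_transform(X)

# Scan k and keep the best silhouette score (higher is better, max 1.0).
best_k, best_score = None, -1.0
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))
```

The same scan-and-score pattern carries over to DBSCAN, only with `eps` and `min_samples` as the tuned parameters.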
Side note: I made a Python script that includes all steps for requesting and cleaning the data in a straightforward way here.
Unfortunately, neither clustering algorithm was able to find meaningful clusters. In summary, there is one big cluster, i.e. no cluster at all. For more information, please see the clustering notebook.
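The "one big cluster" outcome is easy to detect programmatically: count the distinct labels a clusterer returns (for DBSCAN, label -1 marks noise and is excluded). The data and `eps` value below are illustrative assumptions.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# A single dense blob, mimicking data without real cluster structure.
X, _ = make_blobs(n_samples=200, centers=1, cluster_std=0.5, random_state=0)

labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
# Number of clusters, ignoring the DBSCAN noise label -1.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)
```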
Nevertheless, I still believe one can find interesting information through visualization.