From the course: AWS Certified Data Analytics – Specialty (DAS-C01) Cert Prep: 4 Analysis and Visualization

Cleaning data - Amazon Web Services (AWS) Tutorial


Cleaning data


One of the more common tasks you'll run into with dirty data is needing to preprocess it before you can do anything useful with it. Let's take this first scenario, where you have some data and you want to do some exploratory data analysis on it. Before you even get to that step, one of the things you could do is look for missing values. For example, in pandas you could check for null values with something like df.isna(). If you did find them, it would be up to you as the data scientist to decide what to do. Do you want to drop the row? Maybe you have so much data that it's better to just drop those rows, or you have a small amount of data and you may want to impute the value, for example by taking the median value of that column and putting it in place of the missing one.

In a second scenario, natural language processing, it's really common to do preprocessing like tokenization and removing stopwords. One situation a data scientist would encounter here is preprocessing before you do sentiment analysis: removing words like "the," "and," and "or," because they'll reduce the effectiveness of what you're doing for NLP.
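For the missing-value scenario, the check and the two choices (drop or impute) might look roughly like this in pandas. This is a minimal sketch: the toy DataFrame and its column names `age` and `income` are made up for illustration and are not from the course.

```python
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
df = pd.DataFrame({
    "age": [34, None, 29, 41, None],
    "income": [52000, 61000, None, 58000, 49000],
})

# Step 1: count the missing values in each column.
missing_counts = df.isna().sum()

# Option A: plenty of data, so drop every row that has a missing value.
dropped = df.dropna()

# Option B: little data, so fill each gap with that column's median.
imputed = df.fillna(df.median(numeric_only=True))

print(missing_counts)
print(dropped)
print(imputed)
```

Which option fits depends on how much data you have and how the gaps are distributed; the median is just one possible fill value.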
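For the NLP scenario, tokenization and stopword removal can be sketched with the standard library alone. The stopword set below is a tiny hypothetical list and the helper names `tokenize` and `remove_stopwords` are invented for this example; in practice you would likely use a fuller list from an NLP library.

```python
import re

# Hypothetical, deliberately small stopword list for illustration.
STOPWORDS = {"the", "and", "or", "a", "an", "is", "of", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]

tokens = tokenize("The food was great and the service was friendly")
cleaned = remove_stopwords(tokens)
print(cleaned)
```

The cleaned token list is what you would then pass on to a sentiment analysis step.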
