Data Prep and Cleaning For Machine Learning
AN OVERVIEW
Why?
Data Preparation & Cleaning is an extremely important
part of the overall Machine Learning process, one that must
be considered before ever looking to build or train a model.
1. Missing Values
4. Categorical Data
5. Outliers
6. Feature Scaling
7. Feature Engineering/Selection
8. Validation Split
Note: Which steps apply depends on the data you’re using, the
problem you’re solving, and the type of model you’re applying!
However, in the vast majority of cases you can’t go wrong by at
least considering these 8 steps!
1. Missing Values

8. Validation Split
You train the model with the training set only. The
validation and/or test sets are held back from training and
are used to assess model performance. They provide a
true understanding of how accurate predictions are on
new or unseen data.
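To make the validation split concrete, here is a minimal sketch in plain Python. The helper name train_val_test_split, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration; they do not come from the original material, and in practice a library routine would usually do this job.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows and split them into train, validation, and test lists.

    Hypothetical helper for illustration only; the default fractions and
    seed are assumptions, not values taken from the source.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")

    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility

    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)

    test = shuffled[:n_test]                  # held back, never used for training
    val = shuffled[n_test:n_test + n_val]     # held back, used to assess the model
    train = shuffled[n_test + n_val:]         # the only rows the model is fit on
    return train, val, test


rows = list(range(100))  # stand-in for 100 rows of real data
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))
```

The key property is that the three sets never overlap, so performance measured on the validation or test rows reflects data the model has not seen.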
https://data-science-infinity.teachable.com
Taught by former Amazon
& Sony PlayStation Data
Scientist Andrew Jones