Week 15
Data Preparation
Feature Selection
Model Training
Model Evaluation
Predictions
Step 1: Define the Problem
• We need to define our objectives and evaluate the problem
that we are facing.
• Generally, ML problems fall into three main categories,
depending on the question we need to answer:
• Classification
• Clustering
• Regression
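To make the three categories concrete, here is a minimal Python sketch on made-up one-dimensional data. The data and function names are illustrative assumptions, not a prescribed method: classification returns a discrete label (1-nearest neighbour), regression returns a continuous number (a least-squares line), and clustering groups unlabeled values (a single-linkage split into two groups, which in one dimension amounts to cutting at the largest gap).

```python
from statistics import mean

# Classification: predict a discrete label.
# Sketch: 1-nearest neighbour over hypothetical (x, label) pairs.
def classify(x, labeled_points):
    """Return the label of the training point whose x is closest to x."""
    nearest = min(labeled_points, key=lambda p: abs(p[0] - x))
    return nearest[1]

# Regression: predict a continuous value.
# Sketch: ordinary least-squares line y = slope * x + intercept.
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the squared error."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

# Clustering: group unlabeled values (no target is given).
# Sketch: single-linkage split into two clusters, which in 1-D means
# cutting the sorted values at the largest gap.
def two_clusters(values):
    s = sorted(values)
    gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return s[:cut], s[cut:]

points = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
print(classify(2.2, points))                     # small
print(fit_line([1, 2, 3, 4], [2, 4, 6, 8]))      # (2.0, 0.0)
print(two_clusters([1.0, 9.0, 1.5, 8.5, 2.0]))   # ([1.0, 1.5, 2.0], [8.5, 9.0])
```

Note that only the clustering function receives no labels or targets; that absence of a target is what separates clustering from the other two categories.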
Step 2: Data Collection and Integration
• Data is everywhere, so it can be collected from multiple
sources such as the internet, databases, or other types of storage.
• Chances are that some of the data you collect is going to be
noisy: it may be incomplete or even irrelevant.
• So, wherever it comes from, the data needs to be compiled and
integrated into a single dataset.
• The quality and quantity of data that you gather will directly
determine how accurate your model can be.
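A minimal sketch of the integration step, assuming two made-up sources (a CSV export and rows from a database query) that share an `id` key. The data, the function names `integrate` and `incomplete_ids`, and the outer-join strategy are illustrative choices, not a fixed recipe. Records that appear in only one source end up with missing fields, which is exactly the incompleteness mentioned above.

```python
import csv
import io

# Hypothetical source 1: a CSV export (an empty cell means a missing value).
csv_export = "id,age\n1,34\n2,\n3,29\n"

# Hypothetical source 2: rows returned by a database query.
db_rows = [
    {"id": "1", "income": 4200},
    {"id": "3", "income": 3900},
    {"id": "4", "income": 5100},
]

def integrate(csv_text, rows, key="id"):
    """Outer-join two sources on `key`; fields absent from a source become None."""
    merged = {}
    for rec in csv.DictReader(io.StringIO(csv_text)):
        # Empty CSV cells are treated as missing values.
        merged[rec[key]] = {k: (v if v != "" else None) for k, v in rec.items()}
    for rec in rows:
        merged.setdefault(rec[key], {key: rec[key]}).update(rec)
    # Give every record the same set of fields so the result is one table.
    fields = sorted({f for rec in merged.values() for f in rec})
    return [{f: rec.get(f) for f in fields} for _, rec in sorted(merged.items())]

def incomplete_ids(records, key="id"):
    """List the keys of records that still contain missing values."""
    return [r[key] for r in records if any(v is None for v in r.values())]

records = integrate(csv_export, db_rows)
for r in records:
    print(r)
print("incomplete:", incomplete_ids(records))   # incomplete: ['2', '4']
```

Values read from the CSV stay as strings here; converting types and handling the missing entries belongs to the data preparation step.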
Step 3: Data Preparation
• The data you collect is often
• Noisy
• Incomplete
• Irrelevant
• You must clean it.
• You also need to normalize the data, for example by rescaling
numeric features to a common range such as [0, 1].
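A minimal sketch of the cleaning and normalization steps on a made-up table. It assumes missing values are stored as `None`, treats a free-text `note` column as irrelevant, fills missing ages with the column mean (mean imputation), and applies min-max scaling to [0, 1]. These are common choices, not the only ones; standardization (zero mean, unit variance) is a frequent alternative to min-max scaling.

```python
from statistics import mean

def drop_columns(rows, irrelevant):
    """Remove fields that are irrelevant to the problem."""
    return [{k: v for k, v in row.items() if k not in irrelevant} for row in rows]

def impute_mean(values):
    """Fill missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw data: one missing age and an irrelevant free-text column.
rows = [
    {"id": 1, "age": 34, "note": "call back"},
    {"id": 2, "age": None, "note": ""},
    {"id": 3, "age": 29, "note": "n/a"},
    {"id": 4, "age": 45, "note": ""},
]
rows = drop_columns(rows, {"note"})
ages = impute_mean([r["age"] for r in rows])
print(ages)            # [34, 36, 29, 45]
print(min_max(ages))   # [0.3125, 0.4375, 0.0, 1.0]
```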
Step 4: Data Visualization and Analysis
• Perform Exploratory Data Analysis (EDA)
• EDA is a technique that helps you understand the relationships
within your dataset.
• This leads to better features and, in turn, better models.
• Seeing the data in a chart or plot can help unveil previously
unseen patterns.
• It can also reveal corrupt data or unwanted outliers, as well as
properties that could be very significant in your analysis.
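EDA is usually done with plots (histograms, scatter plots, box plots), typically through a plotting library. As a plot-free sketch, the code below computes a few numbers a first look would include: a summary statistic, the Pearson correlation between two columns, and outliers flagged by Tukey's rule (values more than 1.5 times the interquartile range beyond the quartiles, the same rule a box plot uses to draw individual points). The columns are made up for illustration.

```python
from statistics import mean, median, quantiles

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

# Hypothetical columns from a dataset.
hours_studied = [2, 4, 5, 7, 8, 10]
exam_score = [52, 60, 63, 71, 75, 84]
response_ms = [10, 12, 11, 13, 12, 95]

print("median score:", median(exam_score))
print("r(hours, score) =", round(pearson(hours_studied, exam_score), 3))
print("outliers:", iqr_outliers(response_ms))   # outliers: [95]
```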
Step 5: Feature Selection
• A feature is a characteristic that might help when solving the
problem.
• We look for features that correlate with our desired output.
• A crucial part of this step is Feature Engineering: the
process of transforming the original data into new and
potentially much more useful features.
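One simple way to look for features that correlate with the output is to rank them by the absolute value of their Pearson correlation with the target. The sketch assumes a made-up housing table, adds one engineered feature (`area_per_room`, derived from two existing columns), and prints the ranking. Correlation only captures linear relationships, so treat this as a first filter rather than a complete feature-selection method.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def rank_features(columns, target):
    """Return (name, |r|) pairs sorted by absolute correlation with the target."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical housing data.
columns = {
    "area_m2": [50, 80, 120, 65, 100],
    "rooms": [2, 3, 4, 3, 3],
    "house_no": [6, 3, 6, 5, 5],   # likely irrelevant to the price
}
price = [150, 245, 355, 200, 300]

# Feature engineering: derive a new feature from existing ones.
columns["area_per_room"] = [a / r for a, r in zip(columns["area_m2"], columns["rooms"])]

ranking = rank_features(columns, price)
for name, score in ranking:
    print(f"{name:14s} |r| = {score:.2f}")
```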
Step 6: Training
• Training is often considered the main part of machine
learning.
• Overfitting: the model learns the particulars of the training
dataset too well and fails to generalize to new data.
• Underfitting: the model does not have enough features (or is
too simple) to model the data properly.
• Bias: the gap between the predicted values and the actual
values.
• Variance: how dispersed your predicted values are.
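The bias and variance definitions above can be estimated by training the same kind of model on several different training sets and predicting one fixed input. In this sketch the training sets, the query point, and both models are made up: a constant model that ignores the input, and a nearest-neighbour model that copies the closest training target. With this data the constant model is far off on average (high bias) but stable, while the nearest-neighbour model is right on average but its predictions spread out more (higher variance).

```python
from statistics import mean, pvariance

def bias_and_variance(predictions, actual):
    """Bias: average prediction minus the actual value.
    Variance: spread of the predictions around their own mean."""
    return mean(predictions) - actual, pvariance(predictions)

def fit_constant(data):
    """Very simple model: always predicts the mean training target."""
    m = mean(y for _, y in data)
    return lambda x: m

def fit_nearest(data):
    """Very flexible model: copies the target of the nearest training point."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

# Three hypothetical training sets: noisy samples of y = 2x.
training_sets = [
    [(1, 2.5), (3, 5.0), (5, 10.5)],
    [(1, 1.5), (3, 7.0), (5, 9.5)],
    [(1, 2.0), (3, 6.5), (5, 10.0)],
]
x_query, y_true = 5, 10.0

results = {}
for name, fit in [("constant", fit_constant), ("nearest", fit_nearest)]:
    # One model per training set, all asked about the same input.
    preds = [fit(data)(x_query) for data in training_sets]
    results[name] = bias_and_variance(preds, y_true)
    bias, var = results[name]
    print(f"{name:8s} bias={bias:+.2f} variance={var:.3f}")
```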
Step 7: Model Evaluation
• One of the most effective ways to evaluate your model's
accuracy, precision, and recall is to look at a
confusion matrix.
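A minimal sketch of a binary confusion matrix and the metrics derived from it, assuming labels are `1` (positive) and `0` (negative); the labels below are made up. Accuracy is the share of all predictions that are correct, precision is TP / (TP + FP), and recall is TP / (TP + FN).

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count (TP, FP, FN, TN) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, really positive
        elif p == positive:
            fp += 1          # predicted positive, really negative
        elif a == positive:
            fn += 1          # predicted negative, really positive
        else:
            tn += 1          # predicted negative, really negative
    return tp, fp, fn, tn

def scores(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Convention assumed here: report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real positives, how many were found
    return accuracy, precision, recall

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 1, 1, 0]
tp, fp, fn, tn = confusion_matrix(actual, predicted)
print("          pred 1  pred 0")
print(f"actual 1  {tp:6d}  {fn:6d}")
print(f"actual 0  {fp:6d}  {tn:6d}")
print(scores(tp, fp, fn, tn))   # (0.625, 0.6, 0.75)
```

In practice, scikit-learn provides the same computations (`sklearn.metrics.confusion_matrix`, `precision_score`, `recall_score`); the hand-written version just makes each count explicit.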
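Overfitting
• Suppose you always practice a game with the same friend. One
fine day, your friend is not around and you have to play with
someone else, and you lose horribly. You begin to wonder what
went wrong: you had learned your friend's particular habits
rather than the game itself. When this happens in machine
learning, people call it overfitting.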
Underfitting
• Underfitting refers to a model that can neither model the
training dataset nor generalize to new data.
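The contrast between overfitting and underfitting shows up when you compare the error on the training data with the error on unseen data. The sketch below uses made-up noisy samples of roughly y = 2x, mean squared error (MSE) as the error measure, and three hypothetical models: a constant predictor (too simple), a memorizer that copies the nearest training point (too flexible), and a least-squares line. With this data the memorizer reaches zero training error but does worse than the line on the test points (overfitting), while the constant model is poor on both (underfitting).

```python
from statistics import mean

# Hypothetical noisy samples of y ≈ 2x (made-up numbers for illustration).
train = [(1, 2.3), (2, 3.8), (3, 6.4), (4, 7.7), (5, 10.2), (6, 11.9)]
test = [(1.5, 3.0), (2.5, 5.0), (3.5, 7.0), (4.5, 9.0), (5.5, 11.0)]

def fit_constant(data):
    """Too simple: ignores x and always predicts the training mean (underfits)."""
    m = mean(y for _, y in data)
    return lambda x: m

def fit_memorizer(data):
    """Too flexible: returns the y of the closest training x (memorizes the training set)."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def fit_linear(data):
    """Least-squares line y = a*x + b."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, data):
    """Mean squared error of a model on (x, y) pairs."""
    return mean((model(x) - y) ** 2 for x, y in data)

results = {}
for name, fit in [("constant", fit_constant), ("memorizer", fit_memorizer), ("linear", fit_linear)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, test))

for name, (tr, te) in results.items():
    print(f"{name:10s} train MSE={tr:.3f} test MSE={te:.3f}")
```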