Week 6 - Lecture 11-1
Machine Learning
Week 6 - Lecture 11
COSC 202 Data Science and AI
Menatalla Abououf
Fall 2024
Outline – Week 6
• Introduction to Machine Learning
1. What is Machine Learning?
2. Why Machine Learning?
3. What is a Model?
• Types of Machine Learning
• Types of Supervised Machine Learning and Its Framework
• Linear Regression
• Understanding model parameters
• Train-test split
• Evaluating regression models
Life Cycle of Data Science
Problem Statement: What problem are we trying to solve?
Data Cleaning: How should we clean our data so our model can use it?
Exploratory Data Analysis: What insights can we gain from the data?
Data Transformation: How should we prepare the data so our model can use it?
Life Cycle of Data Science – Modeling & Validation
1. Selecting Appropriate Machine Learning Model
2. Splitting Data
3. Model Training
4. Model Validation
5. Model Evaluation
6. Final Model Selection
What is Machine Learning (ML)?
“A computational method that is a subfield of artificial
intelligence and that enables a computer to learn to perform tasks
by analyzing a large dataset without being explicitly programmed”
Classify
Cluster
Predict
Do you want to predict a category? That’s Classification.
Example: Classify the picture
Do you want to discover structure in unexplored data? That’s Clustering.
Example: Cluster the following
Model
Do you want to predict a value? That’s Regression.
Example: Predict the salary of an employee with 8 years of experience.
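The salary example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the (years, salary) pairs are made up, the helper name fit_line is hypothetical, and a straight line fitted by ordinary least squares is one possible regression model, not necessarily the one used later in the course.

```python
# Hypothetical data, not from the lecture: years of experience and salary (in thousands).
years = [1, 2, 3, 5, 6, 10]
salary = [40, 45, 50, 60, 65, 85]

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(years, salary)

# Predict a value (regression) for an employee with 8 years of experience.
predicted_salary = intercept + slope * 8
print(f"Predicted salary for 8 years of experience: {predicted_salary:.1f}k")
```

The made-up points lie exactly on a line, so the fit recovers that line; real salary data would be noisy and the prediction only approximate.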
Exercise:
4. Predicting an apartment rent using the location and the number of rooms.
What is a Model?
• ML can perform a task by being 'trained' with a large dataset.
• During training, an algorithm is optimized to find certain patterns or outputs
from the dataset, depending on the task.
• The output of this process is called a machine learning model.
Input (Data) → AI Model (Algorithm) → Output (Predictions/Classification/Clustering)
Types of Machine Learning
Supervised Learning
• Supervised learning is a machine learning approach that’s defined by its use
of labeled data sets.
• These data sets are designed to train or “supervise” algorithms into
classifying data or predicting outcomes accurately.
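A minimal sketch of learning from labeled data, assuming a tiny made-up data set and a 1-nearest-neighbour rule (one simple supervised method among many). The feature values, the labels and the function name predict_label are hypothetical.

```python
# Hypothetical labeled data set: (height_cm, weight_kg) features, each with a known label.
labeled_data = [
    ((20.0, 4.0), "cat"),
    ((25.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
]

def predict_label(point, training_data):
    """1-nearest-neighbour: return the label of the closest labeled observation."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest_features, nearest_label = min(
        training_data, key=lambda pair: squared_distance(pair[0], point)
    )
    return nearest_label

# New, unlabeled observations are classified using the labeled examples.
print(predict_label((22.0, 4.5), labeled_data))
print(predict_label((58.0, 28.0), labeled_data))
```

The labels are what "supervise" the prediction: without them the function would have nothing to return.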
Unsupervised Learning
• Unsupervised learning uses machine learning algorithms to analyze and
cluster unlabeled data sets.
• These algorithms discover hidden patterns in data without the need for
human intervention (hence, they are “unsupervised”).
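A minimal sketch of clustering unlabeled data, assuming one-dimensional values and a bare-bones k-means loop. The data, the starting centers and the function name k_means_1d are hypothetical; k-means is only one of several clustering algorithms.

```python
def k_means_1d(values, centers, iterations=10):
    """Tiny k-means for one-dimensional data; returns (centers, assignments)."""
    centers = list(centers)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        assignments = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: move each center to the mean of the values assigned to it.
        for c in range(len(centers)):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, assignments

# Hypothetical unlabeled data: only the values are given, no labels.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, assignments = k_means_1d(values, centers=[0.0, 5.0])
print(centers, assignments)
```

No label is ever supplied; the two groups emerge only from how close the values are to each other.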
Supervised vs Unsupervised Learning
Supervised vs unsupervised learning: Which is best for you?
You need to:
1- Define your goals
2- Evaluate your input data
3- Review your options for algorithms
Reinforcement learning
• Reinforcement learning (RL) is a machine learning (ML) technique that trains software to make decisions that achieve the best possible results.
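A minimal sketch of the idea, assuming a very simplified setting that the lecture does not specify: a single situation with three possible actions, each paying a fixed reward (a deterministic multi-armed bandit). The agent uses an epsilon-greedy rule; the function name run_agent, the reward values and the parameters are hypothetical.

```python
import random

def run_agent(rewards, steps=100, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: learns action values from reward feedback alone."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    estimates = [0.0] * n_actions   # current value estimate per action
    counts = [0] * n_actions        # how often each action has been tried
    for step in range(steps):
        if step < n_actions:
            action = step                          # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)      # explore: pick a random action
        else:
            # Exploit: pick the action with the highest estimated value.
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = rewards[action]                   # feedback from the environment
        counts[action] += 1
        # Incremental average of the rewards observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    best_action = max(range(n_actions), key=lambda a: estimates[a])
    return best_action, estimates

# Hypothetical environment: action 1 pays the most, but the agent is not told so.
best_action, estimates = run_agent(rewards=[1.0, 5.0, 2.0])
print(best_action, estimates)
```

The agent is never given the right answer; it only sees rewards for the decisions it makes and shifts towards the decision that pays best.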
Reinforcement learning - Example
Example: Recommendation System
Machine Learning: Supervised and Unsupervised
Linear Regression
Classification vs Regression: Which is best for you?
Supervised ML Framework
• A supervised ML framework estimates a relationship between the features
and the label.
y_p = f(Ω, x)
• x → Input
• Ω → Fit parameters: the aspects of the model that are estimated (fitted) from the data
• y_p → Output (values predicted by the model)
• f(·) → Prediction function (model) that generates predictions from the input x and the learned parameters Ω
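The notation above maps directly onto a function call. The sketch below assumes, purely for illustration, that f is a straight line and that Ω holds an intercept and a slope; the parameter values are hypothetical and not fitted from any data.

```python
def f(omega, x):
    """Prediction function: combines the learned parameters omega with the input x."""
    intercept, slope = omega
    return intercept + slope * x

omega = (2.0, 0.5)   # hypothetical fit parameters (intercept, slope)
x = 4.0              # input
y_p = f(omega, x)    # output predicted by the model
print(y_p)
```

Changing omega changes the predictions without changing the code of f, which is why training amounts to searching for good values of Ω.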
• Data scientists will train the model to find the best parameters by looking at
past data.
• Each observation x relates to some outcome variable (label) y.
• The more of these (x, y) pairs we have, the better the machine learning model can generally learn the parameters.
• New observations are fed into the model, with the parameters (Ω) it learned from the training set, to predict the output (y_p).
• A model should generalize to new data → we train on only a subset of our past data and evaluate on a holdout set that the model was not trained on, to see how it will perform in the real world.
• The model is evaluated by measuring how close the predictions y_p are to the true labels y.
Training data: x → known y. New data: x → y_p = f(Ω, x).
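The holdout idea can be sketched as follows. Everything specific here is an assumption made for illustration: the six (x, y) pairs are made up, the model is a least-squares line, the split simply takes the first four pairs for training, and closeness is measured with the mean absolute error (one of several possible metrics).

```python
def fit(xs, ys):
    """Learn omega = (intercept, slope) by least squares on the training data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def f(omega, x):
    """Prediction function y_p = f(omega, x) for a straight-line model."""
    intercept, slope = omega
    return intercept + slope * x

def mean_absolute_error(y_true, y_pred):
    """Average distance between true labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical past data as (x, y) pairs.
data = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.1), (6, 12.9)]

# Ordered split kept simple for reproducibility; in practice the data is usually shuffled first.
train, holdout = data[:4], data[4:]

omega = fit([x for x, _ in train], [y for _, y in train])   # train on the subset only
y = [label for _, label in holdout]                          # true labels, unseen in training
y_p = [f(omega, x) for x, _ in holdout]                      # predictions on the holdout set
error = mean_absolute_error(y, y_p)
print(f"Holdout mean absolute error: {error:.2f}")
```

Because the holdout pairs played no part in fitting omega, the error gives a rough indication of how the model might behave on genuinely new observations.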
Supervised ML Framework - Summary
fit: Labeled data + Model → Fitted model
predict: Unlabeled data + Fitted model → Predicted label
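The two steps of the summary can be mirrored by an object with a fit method and a predict method. The class below is a hypothetical baseline written for illustration only: it predicts the mean training label for every input. The method names follow the fit/predict convention used by libraries such as scikit-learn, but no library is used here.

```python
class MeanModel:
    """Hypothetical baseline model: predicts the mean training label for any input."""

    def __init__(self):
        self.mean_ = None   # the single fitted parameter

    def fit(self, X, y):
        # fit: labeled data + model -> fitted model
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # predict: unlabeled data + fitted model -> predicted labels
        if self.mean_ is None:
            raise ValueError("call fit before predict")
        return [self.mean_ for _ in X]

# Hypothetical labeled data (inputs with known labels).
labeled_X, labeled_y = [1, 2, 3], [10.0, 20.0, 30.0]
model = MeanModel().fit(labeled_X, labeled_y)

# Unlabeled data: inputs only.
predictions = model.predict([4, 5])
print(predictions)
```

A real model would use the inputs X when predicting; the baseline ignores them so that the fit and predict steps stay easy to follow.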
Recommended Reading
• Artificial Intelligence with Python, by Alberto Artasanchez and Prateek Joshi. Publisher: Packt Publishing Ltd, 2nd Edition, 2020. ISBN-10: 183921953X. ISBN-13: 978-1839219535. Pages 93–95 and page 117.