Machine Learning
Overview
Techniques and Applications
By
Firas Gerges
Outline
• What is Machine Learning?
• Machine Learning vs. Traditional Computing
• Steps in Machine Learning
• Machine Learning Types and Algorithms
• Machine Learning Use Case
Machine Learning
• Situation: Features / Attributes / Predictors / Input
• Outcome: Target / Class / Prediction / Predictand / Output
The Learning Problem
The Learning Problem: How to estimate f
• Estimate f:
1. Construct an observed set of "training" data: {(x1, y1), (x2, y2), …, (xn, yn)}
2. Use the "training" data as input to an ML technique to construct an estimate f̂ of f
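The two steps above can be sketched in code: collect training pairs, hand them to a learning procedure, and get back an estimate f̂ of the unknown f. The toy data and the deliberately trivial "predict the mean" learner below are illustrative assumptions, not a real technique:

```python
# Sketch of the estimation recipe: training pairs in, f_hat out.
# The "algorithm" here is intentionally trivial (always predict the
# mean of y) just to show the shape of the workflow.
def learn(training_data):
    ys = [y for _, y in training_data]
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y  # f_hat ignores x: a baseline, not a good model

train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy "training" data
f_hat = learn(train)
```

Any real ML technique follows this same contract: it consumes the training set and returns a function f̂ that can be applied to new inputs.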
ML vs. Traditional Computing
Machine Learning Types
• Supervised Learning:
• Regression
• Classification
• Unsupervised Learning:
• Clustering
• Association Analysis
• Reinforcement Learning
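The supervised/unsupervised contrast can be made concrete: supervised learning sees (x, y) pairs, while unsupervised learning sees only x and must find structure itself. Below is a minimal 1-D k-means (k=2) clustering unlabeled points; the toy data and the simple extreme-point initialization are illustrative assumptions:

```python
# Minimal 1-D k-means sketch (k=2): assign each point to its nearest
# centroid, recompute centroids, repeat. Assumes both groups stay
# non-empty for this toy data.
def kmeans_1d(xs, iters=10):
    c0, c1 = min(xs), max(xs)  # initialize centroids at the extremes
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0 = sum(g0) / len(g0)
        c1 = sum(g1) / len(g1)
    return c0, c1

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]  # two obvious groups, no labels
c0, c1 = kmeans_1d(points)
```

No labels were provided, yet the algorithm recovers the two groups: this is the clustering flavor of unsupervised learning listed above.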
Supervised vs. Unsupervised Learning
How to Perform Machine Learning
Steps to Perform Machine Learning
ML Algorithm Selection → Model Training/Learning → Model Testing → Model Deployment
Objective and Goals Definition
We often don't just start with the training data set and plug in a learning algorithm to find the predictor.
Data Acquisition
• Define what data to use as input to the model:
• Data type: textual, images, numerical, etc. (or a combination)
• Cost: data collection is often costly; researchers should think about what kind of data would be most useful for solving the problem. This requires "domain knowledge".
• Data availability
Data Preprocessing:
• Extracting features from images
• Textual features: embedding and encoding
• Numerical features: transformation
• Data analysis to keep/drop features
• Normalization/Standardization
• Data cleaning: handling missing values, removing outliers, etc.
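Two of the steps above, handling missing values and standardization, can be sketched with the standard library alone. The single-feature toy column below is an illustrative assumption:

```python
# Preprocessing sketch: fill a missing value with the feature mean,
# then standardize (zero mean, unit variance).
import math

raw = [2.0, 4.0, None, 6.0]                      # one numeric feature, one gap
observed = [v for v in raw if v is not None]
mean = sum(observed) / len(observed)
filled = [v if v is not None else mean for v in raw]

std = math.sqrt(sum((v - mean) ** 2 for v in filled) / len(filled))
standardized = [(v - mean) / std for v in filled]
```

In practice the statistics (mean, std) are computed on the training split only and reused on the test split, so no information leaks from unseen data.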
The problem is defined; next, select an ML algorithm:
Linear/Logistic Regression
Linear Regression
• Linear Regression assumes that there is a linear relationship between the input features and the output.
• It aims to find the best-fitting line (or plane) that describes two or more variables.
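The best-fitting line for one feature has a closed-form least-squares solution. A minimal stdlib-only sketch, with toy data chosen to lie exactly on y = 2x + 1:

```python
# Simple linear regression via least squares: fit y = a*x + b.
def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - a * sx) / n                          # intercept
    return a, b

xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]  # toy data on the line y = 2x + 1
a, b = fit_line(xs, ys)
```

With several features the same idea generalizes to fitting a plane (or hyperplane) via the normal equations.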
K-Nearest Neighbors
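K-Nearest Neighbors predicts the majority label among the k training points closest to the query. A minimal sketch on a toy one-feature data set (an illustrative assumption):

```python
# k-nearest-neighbors classifier sketch: sort training points by
# distance to x, keep the k closest, return their majority label.
from collections import Counter

def knn_predict(train, x, k=3):
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

train = [(1.0, "a"), (1.5, "a"), (2.0, "a"),
         (8.0, "b"), (9.0, "b"), (9.5, "b")]
```

Note there is no training step at all: the model is the data, which is why k is the only parameter to tune.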
Support Vector Machines
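A linear SVM looks for a separating rule sign(w·x + b) with a wide margin. The sub-gradient-descent trainer on the hinge loss below is a simplified sketch of that idea (not a production solver); the one-feature toy data and hyperparameters are assumptions:

```python
# Simplified linear SVM: sub-gradient descent on the hinge loss with
# L2 regularization. Labels are +/-1; the learned rule is sign(w*x + b).
def train_linear_svm(data, lr=0.01, lam=0.01, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * (w * x + b) < 1:          # inside the margin: hinge active
                w += lr * (y * x - lam * w)
                b += lr * y
            else:                            # correct with margin: only decay
                w -= lr * lam * w
    return w, b

data = [(-2.0, -1), (-1.5, -1), (1.5, 1), (2.0, 1)]  # separable toy data
w, b = train_linear_svm(data)
```

Real SVMs extend this with kernels to handle non-linear boundaries; the margin-maximizing objective is the same.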
Decision Tree Learning
Decision tree learning is a method for approximating target functions in which the learned function is represented by a decision tree.
These learning methods are among the most popular, due to their efficiency and their white-box nature.
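The core step of tree learning is choosing the split that best separates the classes; full learners apply it recursively. A one-split "stump" on a toy one-feature data set (an illustrative assumption) shows the idea:

```python
# Decision-stump sketch: try each observed value as a threshold and
# keep the one with the fewest training errors when each side
# predicts its majority label (labels are 0/1).
def best_stump(data):
    best = None
    for thr, _ in data:
        left = [y for x, y in data if x <= thr]
        right = [y for x, y in data if x > thr]
        err = sum(min(side.count(0), side.count(1))
                  for side in (left, right) if side)
        if best is None or err < best[1]:
            best = (thr, err)
    return best  # (threshold, training errors)

data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1)]
thr, err = best_stump(data)
```

The split is human-readable ("is x ≤ 3?"), which is exactly the white-box quality noted above.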
Random Forest
Random Forest is a tree-based algorithm that leverages multiple decision trees when making a prediction.
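The forest idea in miniature: train several simple trees (here, one-split stumps) on bootstrap resamples of the data and take a majority vote. The toy data and tree count are assumptions, and real random forests also subsample features at each split:

```python
# Mini random forest: bagging over decision stumps (labels are 0/1,
# each stump predicts 1 above its threshold).
import random

def fit_stump(data):
    def errs(thr):
        return sum((1 if x > thr else 0) != y for x, y in data)
    return min((x for x, _ in data), key=errs)

def fit_forest(data, n_trees=15, seed=0):
    rng = random.Random(seed)
    stumps = [fit_stump([rng.choice(data) for _ in data])
              for _ in range(n_trees)]
    def predict(x):
        votes = sum(1 if x > thr else 0 for thr in stumps)
        return 1 if votes * 2 > len(stumps) else 0
    return predict

data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1)]
forest = fit_forest(data)
```

Averaging many slightly different trees smooths out the quirks of any single tree, which is where the robustness to outliers comes from.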
Neural Networks
• A neural network consists of units connected through weighted edges.
• It learns weights that minimize the error between predictions and observations.
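The weight-learning idea can be shown with the smallest possible "network": a single unit with one weighted edge, trained by gradient descent to shrink the squared error between predictions and observations. The toy data and learning rate are assumptions; real networks stack many such units with nonlinear activations:

```python
# One-unit "network": prediction is w*x + b, trained by stochastic
# gradient descent on the squared error.
def train_unit(data, lr=0.05, epochs=300):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction minus observation
            w -= lr * err * x       # gradient step for the weight
            b -= lr * err           # gradient step for the bias
    return w, b

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data on y = 2x
w, b = train_unit(data)
```

Backpropagation is this same gradient step applied layer by layer through the whole network.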
How to Choose a ML Algorithm
• Linear Regression:
• Pros:
• Simple and effective
• No parameter tuning is necessary
• Feature importance (scale features first)
• Performs well on linear data
• Fast
• Cons:
• Poor performance on non-linear data
• Poor performance with irrelevant and highly correlated features
• Requires feature engineering to keep only relevant data
How to Choose a ML Algorithm
• K-Nearest Neighbors:
• Pros:
• Simple and easy to understand
• Only one parameter to tune
• Cons:
• Poor performance on data with many features
• Requires data scaling
• Very sensitive to outliers
• Poor performance on imbalanced data
How to Choose a ML Algorithm
• Support Vector Machines:
• Pros:
How to Choose a ML Algorithm
• Random Forests:
• Pros:
• Handles high dimensionality
• Impact of outliers is minimal
• Feature importance
• Irrelevant features won't affect performance (only for decision tree)
• Cons:
• A large number of trees can make the algorithm too slow and ineffective for real-time predictions
How to Choose a ML Algorithm
• Neural Networks:
• Pros:
Now it is time to train the model
Model Testing/Evaluation
• Once a model shows good performance in the training/validation step, testing data is used.
• This is called the evaluation phase: testing the performance of the model on new, unseen data.
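The evaluation phase reduces to one measurement: apply the trained model to held-out points it has never seen and score the predictions. A stdlib-only sketch; the threshold rule and toy test set below are illustrative assumptions standing in for any fitted model:

```python
# Evaluation sketch: accuracy of a trained rule on unseen test data.
def accuracy(model, test_set):
    correct = sum(model(x) == y for x, y in test_set)
    return correct / len(test_set)

# a "trained" threshold rule standing in for any fitted model
model = lambda x: 1 if x > 5 else 0

# held-out points; the last one is deliberately on the wrong side
test_set = [(1, 0), (4, 0), (6, 1), (9, 1), (5.5, 0)]
acc = accuracy(model, test_set)
```

Scoring on unseen data is what distinguishes generalization from memorization; high training accuracy with low test accuracy signals overfitting.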
ML Advantages, Challenges, and Limitations
Advantages:
• Identifies trends and patterns
• Often requires no human intervention
• Always learning and improving
• Handles multi-dimensional and heterogeneous data
• Wide applications
Limitations:
• Domain expertise is often required
• Challenges in interpreting results
• Requires time and resources
• High error susceptibility
• High data dependency
Recommended Readings
Maini, V., & Sabri, S. (2017). Machine Learning for Humans. Online: https://medium.com/machine-learning-for-humans.
Géron, A. (2022). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media, Inc.
Mitchell, T. M. (1997). Machine Learning. New York: McGraw-Hill.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.