
Machine Learning

Overview
Techniques and Applications
By
Firas Gerges

1
Outline
• What is Machine Learning
• Machine Learning vs. Traditional Model
• Steps in Machine Learning
• Machine Learning Types and Algorithms
• Machine Learning Use Case

2
Machine Learning

• Machine Learning (ML), a subfield of Artificial Intelligence (AI), is a set of methods and approaches that leverage algorithms and data to imitate intelligent human behavior: learning from experience.

• ML learns from data (situation => outcome) to predict the outcome of a new given situation.

• Situation: Features/Attributes/Predictors/Input
• Outcome: Target/Class/Prediction/Predictand/Output

3
The Learning Problem

• Suppose we observe the output space Y and the input space X

• Task is to find a relationship/mapping function f between X and Y:

• Y = f(X) + ε

• (ε is a random error (noise) term, independent of X)

4
The Learning Problem

• We cannot compute f, but we can estimate it by learning from data.

• Role of Machine Learning is to construct f̂ as an estimate of f by learning from the data.

• Two main objectives:

• Prediction: use f̂ on X to compute a prediction of Y (Ŷ = f̂(X))
• Inference: use f̂ to study the relationship between X and Y

5
The Learning Problem: How to estimate f

• ML techniques are used to estimate f. Different ML techniques have different formulations and assumptions around the form and type of f (linear, non-linear, decision tree, etc.)

• Estimate f:
1. Construct an observed set of “training” data: {(x₁, y₁), …, (xₙ, yₙ)}
2. Use “training” data as input to a ML technique to construct f̂
6
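The two steps above can be sketched in code. This is a minimal illustration using scikit-learn (from the deck's recommended readings) on made-up data; the linear form of f and all values here are assumptions, not part of the slides.

```python
# Step 1: construct a "training" set {(x_i, y_i)} from a known f plus noise.
# Step 2: feed it to an ML technique to construct f_hat, an estimate of f.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                 # input space X
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, 100)     # Y = f(X) + epsilon

f_hat = LinearRegression().fit(X, y)                  # f_hat learned from data
prediction = f_hat.predict([[5.0]])                   # prediction of Y for a new X
```

With enough data the learned coefficients approach the true f, so the prediction at X = 5 lands near 3·5 + 2 = 17.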
ML vs. Traditional Computing

7
Machine Learning Types

• Supervised Learning:
• Regression
• Classification
• Unsupervised Learning:
• Clustering
• Association Analysis
• Reinforcement Learning

8
Supervised vs.
Unsupervised Learning

Morimoto, J., & Ponton, F. Virtual reality in biology: could we become virtual naturalists? Evo Edu Outreach 14, 7 (2021). https://doi.org/10.1186/s12052-021-00147-x

9
How to Perform Machine Learning

10
Steps to Perform Machine Learning

1. Objectives and Goals Definition
2. Data Acquisition
3. Processing and Feature Engineering
4. ML Algorithm Selection
5. Model Training/Learning
6. Model Testing
7. Model Deployment

11
Objective and Goals Definition

• We often don't just start with the training data set and plug in a learning algorithm to find the predictor
• We must define and formulate the problem: Input and Output

• Problem statement: Forecast the river discharge

• Questions:
• Predicting what? Continuous or discrete? Quantitative or qualitative?
• Prediction/Forecasting or Inference?
• Predicting or Forecasting?
• Site-specific or general model?

• Example of prediction: Use temperature, wind speed, humidity, etc. at time t to predict river discharge at time t
• Example of forecasting: Use data known only in the past (t-x) to predict discharge at time t

12
Data Acquisition
• Define what data to use as input to the model:
• Data type: textual, images, numerical, etc. (or
combination)
• Cost: Data collection is often costly; researchers should
think of what kind of data would be most useful to solve
the problem. This requires “domain knowledge”.
• Data availability

• Be careful: Seasonality and temporal dependencies
13
Processing and Feature Engineering

Feature engineering is the process of selecting, manipulating, and transforming raw data into features that can be used in Machine Learning.

Some steps we often see in feature engineering:
• Extracting features from images
• Textual to numerical features transformation, embedding and encoding
• Normalization/Standardization
• Data cleaning, handling missing values, removing outliers, etc.
• Data analysis to keep/drop features

14
Machine Learning Algorithms: Supervised Learning

• Problem is defined
• Constraints are defined
• Data are collected and processed
• Next: Selecting ML technique to train

Classical ML techniques:
• Linear/Logistic Regression
• K-Nearest Neighbors (KNN)
• Support Vector Machines (SVM)
• Decision Trees (DT)
• Random Forest (RF)
• Neural Networks (NN) / Deep Learning
15
Linear Regression
• Linear Regression assumes that there is a linear relationship present between input features and the output.
• It aims to find the best-fitting line (or plane) that describes two or more variables.

• Linear Regression is used to predict continuous output


• Logistic Regression is used for classification

16
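The continuous-vs-discrete distinction above can be shown side by side: Linear Regression fitted to a continuous target and Logistic Regression fitted to 0/1 class labels. The toy data and scikit-learn usage are assumptions for illustration.

```python
# Linear Regression: continuous output; Logistic Regression: classification.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.arange(10).reshape(-1, 1)

y_cont = 2.0 * X[:, 0] + 1.0              # continuous target on an exact line
lin = LinearRegression().fit(X, y_cont)   # recovers slope 2 and intercept 1

y_class = (X[:, 0] >= 5).astype(int)      # discrete target: 0/1 labels
log = LogisticRegression().fit(X, y_class)
```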
K-Nearest Neighbors

• KNN is an instance-based ML technique that does not require an explicit training phase.
• Idea is to label the new instance (data case) based on the top K nearest instances:
• Nearest = Distance Measure

17
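The idea above, labeling a new instance by a vote of its K nearest neighbors, can be sketched as follows; the two toy clusters and the choice of scikit-learn are assumptions for illustration.

```python
# KNN: "training" just stores the data; prediction is a majority vote
# of the K nearest instances under a distance measure (Euclidean by default).
from sklearn.neighbors import KNeighborsClassifier

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
label = knn.predict([[4.5, 5.0]])   # its 3 nearest neighbors are all class 1
```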
Support Vector
Machines

• SVM maps the data into a high-dimensional space and tries to find a hyperplane that separates the cases based on their class label
• SVM works by maximizing the width of the margin separating the different classes, in order to minimize the generalization error of the chosen hyperplane

18
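A minimal sketch of the above: an SVM fitted to two small clusters. The RBF kernel (which performs the implicit mapping to a higher-dimensional space), the toy data, and scikit-learn are all assumptions here.

```python
# SVM: find a maximum-margin separating hyperplane; the kernel handles
# the mapping into a higher-dimensional space implicitly.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

svm = SVC(kernel="rbf", C=1.0).fit(X, y)
pred = svm.predict([[4.5, 4.5]])   # falls on the class-1 side of the boundary
```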
Decision Tree Learning
Decision tree learning is a method for approximating target functions, in which the learned function is represented by a decision tree.

These learning methods are among the most popular, due to their efficiency and their white-box nature.

A decision tree is simply a series of sequential decisions made to reach a specific result.

Mitchell, Tom M. Machine Learning. Vol. 1. No. 9. New York: McGraw-Hill, 1997.

Tree is built following concepts from information theory related to entropy and information gain.

19
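The "series of sequential decisions" above can be sketched with a tiny tree; the entropy criterion mirrors the information-gain idea mentioned on the slide. The one-feature toy data and scikit-learn are assumptions for illustration.

```python
# Decision tree: splits chosen to maximize information gain
# (criterion="entropy"), yielding a sequence of threshold decisions.
from sklearn.tree import DecisionTreeClassifier

X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
pred = tree.predict([[2], [11]])   # routed down the tree to leaf labels
```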
Random Forest vs. Decision Tree

• Random Forest is a tree-based algorithm that leverages multiple decision trees when making a prediction
• Random Forest combines the output of multiple (randomly created) Decision Trees to predict the final output
• Because of this randomness, it will generate different results each time (unless the random seed is fixed)
20
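The contrast above can be sketched as follows: one decision tree versus a forest that combines many randomized trees, with a fixed seed making the otherwise random results reproducible. The synthetic dataset and scikit-learn are assumptions for illustration.

```python
# Random Forest: many randomly created trees, outputs combined;
# fixing random_state removes the run-to-run randomness.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)        # a single tree
forest = RandomForestClassifier(n_estimators=50,
                                random_state=0).fit(X, y)      # 50 combined trees
same_seed = RandomForestClassifier(n_estimators=50,
                                   random_state=0).fit(X, y)   # identical seed
```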
Neural Networks

• It consists of units connected through weighted edges
• It learns weights that minimize the error between predictions and observations

21
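A minimal sketch of the above: a small network of units with weighted connections, trained so that its weights minimize prediction error. scikit-learn's MLPClassifier, the single hidden layer, and the toy data are all assumptions for illustration.

```python
# A tiny neural network: one hidden layer of 8 units; training adjusts
# the connection weights to reduce the error on the training data.
from sklearn.neural_network import MLPClassifier

X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 1, 1, 1]

nn = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                   max_iter=2000, random_state=0).fit(X, y)
```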
How to Choose a ML Algorithm

• Linear Regression:
• Pros:
• Simple and effective
• No parameter tuning is necessary
• Feature importance (scale features first)
• Performs well on linear data
• Fast
• Cons:
• Poor performance on non-linear data
• Poor performance with irrelevant and highly correlated features
• Requires feature engineering to only keep relevant data

22
How to Choose a ML Algorithm

• K-Nearest Neighbors:
• Pros:
• Simple and easy to understand
• Only one parameter to tune
• No assumption about the data
• Can easily be changed to handle new data
• Cons:
• Poor performance on data with a lot of features
• Requires data scaling
• Very sensitive to outliers
• Poor performance on imbalanced data

23
How to Choose a ML Algorithm

• Support Vector Machines:
• Pros:
• Suitable for data with high dimensions
• Impact of outliers is minimal
• Often outperforms Linear Regression
• Cons:
• Very slow to train
• Sensitive to noise (overlapped cases)
• Selection of hyperparameters (and kernel) is very important

24
How to Choose a ML Algorithm

• Random Forests:
• Pros:
• Good performance on imbalanced datasets
• Handles huge amounts of data with high dimensionality
• Impact of outliers is minimal
• Feature importance
• Irrelevant features won't strongly affect performance (unlike a single decision tree)
• Cons:
• Large number of trees can make the algorithm too slow and ineffective for real-time predictions

25
How to Choose a ML Algorithm

• Neural Networks:
• Pros:
• Good for nonlinear data with a large number of inputs
• Once trained, the predictions are very fast
• Can explore deep, hidden relationships in the data
• Cons:
• Black box
• Requires a lot of training data
• Computationally expensive
• Large space of possible architectures and parameters

26
Model Training

Now it is time to train the model.

• Original data set is divided into training and testing sets

• Training data is used to train and cross-validate the model:
• Feature analysis and scaling
• Model training
• Parameters and architecture tuning

27
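The split-then-cross-validate workflow above can be sketched as follows; the synthetic dataset, the Random Forest choice, and scikit-learn are assumptions for illustration.

```python
# Divide the original data into training and testing sets, then
# cross-validate on the training portion only; the test set stays unseen
# until the evaluation phase.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)               # hold out 20% for testing

scores = cross_val_score(RandomForestClassifier(random_state=0),
                         X_train, y_train, cv=5)       # 5-fold cross-validation
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
```

The held-out test set is only touched afterwards, in the evaluation phase.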
Model Testing/Evaluation

Once a model shows good performance in the training/validation step, testing data is used.

This is called the Evaluation Phase: testing the performance of the model on new, unseen data.

28
ML Advantages, Challenges, and Limitations

Advantages:
• Identifies trends and patterns
• Often needs no human intervention
• Always learning and improving
• Handles multi-dimensional and heterogeneous data
• Wide applications

Limitations:
• Domain expertise is often required
• Challenges in interpreting results
• Requires time and resources
• High error susceptibility
• High data dependency

29
Recommended Readings
Maini, V., & Sabri, S. (2017). Machine learning for humans. Online: https://medium.com/machine-learning-for-humans
Géron, A. (2022). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media, Inc.
Mitchell, T. M. (1997). Machine learning (Vol. 1, No. 9). New York: McGraw-Hill.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
30
