
L09-An Introduction To Machine Learning

This document discusses key concepts in machine learning including supervised learning, unsupervised learning, training and testing concepts, and overfitting and underfitting. It also covers different types of data that can be used in machine learning like text, numbers, images, and videos. Common machine learning algorithms for classification, regression, clustering, and pattern discovery are also mentioned.


10S3001 – Artificial Intelligence (Kecerdasan Buatan, Certan)

Samuel I. G. Situmeang, S.TI., M.Sc.


Odd Semester, Academic Year 2020/2021
Slides adapted from Ansaf Salleb-Aouissi,
Artificial Intelligence, Columbia University, 2018
▪ Machine Learning Concepts

▪ Supervised Learning and Unsupervised Learning

▪ Training-Testing Concepts

▪ Overfitting and Underfitting

10S3001-Certan | Gasal 20/21 | Institut Teknologi Del 2


Source: https://www.domo.com/
Data comes in different sizes and types:
▪ Texts
▪ Numbers
▪ Clickstreams
▪ Graphs
▪ Tables
▪ Images
▪ Transactions
▪ Videos
▪ Some or all of the above!


▪ Wherever we go, we are "datafied".

▪ Smartphones are tracking our locations.

▪ We leave a data trail in our web browsing.

▪ Interaction in social networks.

▪ Privacy is an important issue in Data Science.

▪ We all use machine learning on a daily basis. Examples:


▪ Spam filtering
▪ Credit card fraud detection
▪ Digit recognition on checks, zip codes
▪ Detecting faces in images
▪ MRI image analysis
▪ Recommendation systems
▪ Search engines
▪ Handwriting recognition
▪ Scene classification
▪ etc...



Research progress: image classification, audio synthesis, games.
Products: voice recognition, translation, self-driving cars.
▪ More compute
▪ More data
▪ Better algorithms → Need more people who understand the algorithms!

Statistics:
▪ Hypothesis testing
▪ Experimental design
▪ Analysis of variance (ANOVA)
▪ Linear regression
▪ Logistic regression
▪ Generalized Linear Models (GLM)
▪ Principal Component Analysis (PCA)

Machine Learning:
▪ Decision trees
▪ Rule induction
▪ Neural Networks
▪ Support Vector Machines (SVMs)
▪ Clustering method
▪ Association rules
▪ Feature selection
▪ Visualization
▪ Graphical models
▪ Genetic algorithm

http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf


Alan Turing proposed the concept of a learning machine in 1950 (in the same paper
that proposed the Turing test).

Idea: Divide the problem into two parts:


1. A machine that simulates a child’s brain (analogous to a blank notebook: should function
by simple mechanisms and have lots of blank sheets).
2. A way of teaching the child machine (should be simple since we know how to teach a
human child).

Teacher rewards good behaviour and penalizes bad behaviour.



“An important feature of a learning machine is that its teacher will often be very
largely ignorant of quite what is going on inside.”
Alan Turing

▪ While we don’t know how our brain converts input to output, we know what the
output should be for every input.
▪ We can use this knowledge to teach the machine.



“How do we create computer programs that improve with experience?”
Tom Mitchell
http://videolectures.net/mlas06_mitchell_itm/

"A computer program is said to learn from experience 𝐸 with respect to some class of
tasks 𝑇 and performance measure 𝑃, if its performance at tasks in 𝑇, as measured by
𝑃, improves with experience 𝐸."
Tom Mitchell. Machine Learning 1997.



▪ A branch of artificial intelligence, concerned with the design and development of
algorithms that allow computers to evolve behaviors based on empirical data.

▪ As intelligence requires knowledge, it is necessary for computers to acquire
knowledge.


There are some variations in how the types of machine learning algorithms are defined,
but they are commonly divided into categories according to their purpose. The main
categories are the following:
▪ Supervised learning (predictive model, "labeled" data).
  ▪ Classification
  ▪ Numeric prediction/forecasting/regression
▪ Unsupervised learning (descriptive model, "unlabeled" data).
  ▪ Clustering
  ▪ Pattern discovery
▪ Semi-supervised learning (mixture of "labeled" and "unlabeled" data).
▪ Reinforcement learning. The machine is trained to make specific decisions: it is
  exposed to an environment where it trains itself continually by trial and error. It
  learns from past experience and tries to capture the best possible knowledge to make
  accurate business decisions.


▪ Supervised learning
  ▪ Classification, e.g. Logistic Regression, Decision Tree, KNN, Random Forest, SVM, & Naive Bayes
  ▪ Numeric prediction/forecasting/regression, e.g. Linear Regression, KNN, Gradient Boosting & AdaBoost
▪ Unsupervised learning
  ▪ Clustering, e.g. K-Means
  ▪ Pattern discovery, e.g. Apriori, FP-Growth, & Eclat
▪ Semi-supervised learning
▪ Reinforcement learning
  ▪ e.g. Q-Learning, Temporal Difference (TD), & Deep Q-Networks (DQN)

Source: https://en.proft.me/


Given: Training data $(x_1, y_1), \dots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$ and $y_i$ is the label.

example $x_1$ →  $x_{11}$  $x_{12}$  …  $x_{1d}$   $y_1$ ← label
…
example $x_i$ →  $x_{i1}$  $x_{i2}$  …  $x_{id}$   $y_i$ ← label
…
example $x_n$ →  $x_{n1}$  $x_{n2}$  …  $x_{nd}$   $y_n$ ← label

fruit    length  width  weight  label
fruit 1  165     38     172     Banana
fruit 2  218     39     230     Banana
fruit 3  76      80     145     Orange
fruit 4  145     35     150     Banana
fruit 5  90      88     160     Orange
fruit n  …       …      …       …


Training data: "examples" $x$ with "labels" $y$.
$(x_1, y_1), \dots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$

▪ Classification: $y$ is discrete. To simplify, $y \in \{-1, +1\}$.

$f: \mathbb{R}^d \to \{-1, +1\}$, where $f$ is called a binary classifier.

Example: Approve credit yes/no, spam/ham, banana/orange.
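
As a minimal sketch of a binary classifier on the fruit table, the block below fits a nearest-centroid rule. The slides do not prescribe this method; it is chosen only because it is short. The encoding Banana = +1 and Orange = -1, the helper names, and the `new_fruit` measurements are assumptions made for illustration.

```python
# Fruit table from the slides: (length, width, weight) -> label.
# Assumption: Banana is encoded as +1 and Orange as -1.
X = [(165, 38, 172), (218, 39, 230), (76, 80, 145), (145, 35, 150), (90, 88, 160)]
y = [+1, +1, -1, +1, -1]


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fit_nearest_centroid(X, y):
    """Return a binary classifier f: R^d -> {-1, +1}."""
    pos = centroid([x for x, label in zip(X, y) if label == +1])
    neg = centroid([x for x, label in zip(X, y) if label == -1])

    def f(x):
        # Predict the class whose centroid is closer to x.
        return +1 if squared_distance(x, pos) <= squared_distance(x, neg) else -1

    return f


f = fit_nearest_centroid(X, y)
predictions = [f(x) for x in X]   # predictions on the training examples
new_fruit = (180, 40, 190)        # hypothetical unseen fruit: long and narrow
print(predictions, f(new_fruit))
```

Any of the methods named in the slides (SVM, decision tree, K-NN, and so on) could take the place of the centroid rule; only `fit_nearest_centroid` would change.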

Classification:

Methods: Support Vector Machines, neural networks, decision trees, K-nearest neighbors, naive Bayes, etc.

Non-linear classification


Training data: "examples" $x$ with "labels" $y$.
$(x_1, y_1), \dots, (x_n, y_n)$, where $x_i \in \mathbb{R}^d$

▪ Regression: $y$ is a real value, $y \in \mathbb{R}$.

$f: \mathbb{R}^d \to \mathbb{R}$, where $f$ is called a regressor.

Example: amount of credit, weight of fruit.
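
A regressor for the "weight of fruit" example can be sketched with ordinary least squares on a single feature. The block below predicts weight from length using the five rows of the fruit table; the choice of a straight-line model and the function names are assumptions for illustration, not something the slides specify.

```python
# Length and weight columns of the fruit table from the slides.
lengths = [165, 218, 76, 145, 90]
weights = [172, 230, 145, 150, 160]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(lengths, weights)


def f(x):
    """Regressor f: R -> R obtained from the fit above."""
    return slope * x + intercept


print(slope, intercept, f(200))
```

With only five examples the fitted line is illustrative at best; it shows the shape of $f$ rather than a reliable model.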

Regression:

Example: Income as a function of age, weight of the fruit as a function of its length.


Training data: "examples" $x$.
$x_1, \dots, x_n$, where $x_i \in X \subset \mathbb{R}^d$

▪ Clustering/segmentation:

$f: \mathbb{R}^d \to \{C_1, \dots, C_k\}$ (set of clusters)

Example: Find clusters in the population, fruits, species.
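
The following is a minimal K-means sketch on the (length, width) columns of the fruit table, with the labels deliberately left out. The initial centroids are passed in explicitly so the run is deterministic; that choice, like the function names, is an assumption for illustration, and practical implementations usually initialize at random.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[j] for p in points) / n for j in range(len(points[0])))


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until assignments stop changing."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignment, centroids


# (length, width) of the five fruits; no labels are used.
points = [(165, 38), (218, 39), (76, 80), (145, 35), (90, 88)]
assignment, centroids = k_means(points, initial_centroids=[points[0], points[2]])
print(assignment, centroids)
```

K-means can converge to different clusterings from different starting centroids, so the result depends on the initialization chosen here.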

Clustering/segmentation:

Methods: K-means, Gaussian mixtures, hierarchical clustering, spectral clustering, etc.


Training set → ML Algorithm → Model (f)

Input (income, gender, age, family status, zipcode) → Model (f) → Output (credit amount in $, or credit yes/no)

Question: How can we be confident about 𝒇?


▪ We calculate $E_{\mathrm{train}}$, the in-sample error (training error or empirical error/risk):

$E_{\mathrm{train}}(f) = \sum_{i=1}^{n} \mathrm{loss}(y_i, f(x_i))$

where $y_i$ is the true label and $f(x_i)$ is the predicted label.

▪ Examples of loss functions:
  ▪ Classification error:
    $\mathrm{loss}(y_i, f(x_i)) = \begin{cases} 1 & \text{if } \mathrm{sign}(y_i) \neq \mathrm{sign}(f(x_i)) \\ 0 & \text{otherwise} \end{cases}$
  ▪ Least square loss:
    $\mathrm{loss}(y_i, f(x_i)) = (y_i - f(x_i))^2$

▪ We aim to have $E_{\mathrm{train}}(f)$ small, i.e., minimize $E_{\mathrm{train}}(f)$.

▪ We hope that $E_{\mathrm{test}}(f)$, the out-sample error (test/true error), will be small too.
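
The two loss functions and the training error can be written out directly. This is a small sketch: the function names and the toy labels and predictions are made up for illustration, and `sign(0)` is taken to be +1 as an assumption, since the slides do not cover that case.

```python
def sign(v):
    """Sign of v; a value of exactly 0 is treated as +1 (an assumption)."""
    return 1 if v >= 0 else -1


def classification_loss(y, fx):
    """1 if the predicted sign differs from the true sign, 0 otherwise."""
    return 1 if sign(y) != sign(fx) else 0


def squared_loss(y, fx):
    """Least square loss (y - f(x))^2."""
    return (y - fx) ** 2


def empirical_error(loss, labels, predictions):
    """E_train(f): sum of the loss over all (true label, prediction) pairs."""
    return sum(loss(y, fx) for y, fx in zip(labels, predictions))


# Hypothetical classifier outputs f(x_i) for four training examples.
class_labels = [+1, -1, +1, -1]
class_scores = [0.8, 0.3, -0.2, -1.5]
e_class = empirical_error(classification_loss, class_labels, class_scores)

# Hypothetical regressor outputs f(x_i) for three training examples.
targets = [1.0, 2.0, 3.0]
estimates = [1.5, 2.0, 2.0]
e_square = empirical_error(squared_loss, targets, estimates)

print(e_class, e_square)
```

The same `empirical_error` function gives an estimate of the out-sample error when it is applied to examples that were not used for training.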



Example: Split the data randomly into 60% for training, 20% for validation and 20% for
testing.
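
A random 60/20/20 split can be sketched with the standard library alone. The function name, the fixed seed, and the use of the integers 0 to 99 as stand-in examples are assumptions for illustration.

```python
import random


def train_val_test_split(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the examples, then cut them into training, validation and test parts."""
    items = list(data)
    random.Random(seed).shuffle(items)   # seeded so the split is reproducible
    n_train = round(train_frac * len(items))
    n_val = round(val_frac * len(items))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]       # whatever remains goes to the test set
    return train, val, test


examples = range(100)                    # stand-in for 100 labeled examples
train, val, test = train_val_test_split(examples)
print(len(train), len(val), len(test))
```

Shuffling before cutting matters when the data is stored in some order (for example, sorted by label), because an unshuffled cut would give the three sets different distributions.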

Source: https://towardsdatascience.com/

Training set is a set of examples used for learning a model (e.g., a classification
model).
Validation set is a set of examples that cannot be used for learning the model but can
help tune model parameters (e.g., selecting K in K-NN). Validation helps control
overfitting.
Test set is used to assess the performance of the final model and provide an
estimation of the test error.

Note: Never use the test set in any way to further tune the parameters or revise the model.
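
The roles of the three sets can be sketched with the K-NN example mentioned above: K is chosen on the validation set, and the test set is touched exactly once at the end. The one-dimensional toy data, the candidate values of K, and the function names are all hypothetical.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D features)."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def error_rate(train, data, k):
    """Fraction of examples in `data` that K-NN (fit on `train`) misclassifies."""
    mistakes = sum(1 for x, label in data if knn_predict(train, x, k) != label)
    return mistakes / len(data)


def select_k(train, val, candidates):
    """Pick the candidate K with the lowest validation error."""
    return min(candidates, key=lambda k: error_rate(train, val, k))


# Hypothetical (feature, label) pairs; two training labels near 2.0-2.5 are noisy.
train = [(0.5, -1), (1.0, -1), (1.5, -1), (2.0, +1), (2.5, -1),
         (3.0, +1), (3.5, +1), (4.0, +1), (4.5, +1)]
val = [(0.8, -1), (1.9, -1), (2.6, +1), (3.2, +1), (4.2, +1)]
test = [(1.2, -1), (2.2, -1), (2.8, +1), (3.8, +1)]

best_k = select_k(train, val, candidates=[1, 3, 5])   # tuning uses train + val only
test_error = error_rate(train, test, best_k)          # test set used once, at the end
print("selected K:", best_k, "test error:", test_error)
```

If `test` were passed to `select_k` instead of `val`, the reported test error would no longer be an honest estimate, which is the situation the note above warns against.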



▪ Overfitting: the state in which a model performs well only on the training
data (seen examples) but does not perform well on unseen examples.
  ▪ Occurs when the model is too flexible (its capacity to approximate many
    functions is too high) or fits itself too closely to the training data.
▪ Underfitting: the state in which a model performs poorly on both the training data
and unseen examples.
  ▪ Occurs when the model is not flexible enough (its capacity to approximate a
    variety of functions is low).
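
Both failure modes can be reproduced with a K-nearest-neighbour regressor, where a small K gives a very flexible model and a large K a very rigid one. This is only a sketch: the noisy training points, the unseen points (taken from the line y = x), and the function names are invented for illustration.

```python
def knn_regress(train, x, k):
    """Average label of the k training points nearest to x (1-D features)."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in neighbours) / k


def total_squared_error(train, data, k):
    """Sum of squared errors of K-NN regression (fit on `train`) over `data`."""
    return sum((y - knn_regress(train, x, k)) ** 2 for x, y in data)


# Hypothetical noisy samples of y = x, and unseen points on the true line.
train = [(1, 1.2), (2, 1.8), (3, 3.3), (4, 3.7), (5, 5.4), (6, 5.6)]
unseen = [(1.5, 1.5), (2.5, 2.5), (3.5, 3.5), (4.5, 4.5), (5.5, 5.5)]

for k in (1, 2, len(train)):
    seen_error = total_squared_error(train, train, k)
    unseen_error = total_squared_error(train, unseen, k)
    print(f"k={k}: error on training data={seen_error:.3f}, on unseen data={unseen_error:.3f}")
```

With K = 1 every training point is its own nearest neighbour, so the training error is zero while the error on unseen points is not (overfitting). With K equal to the number of training points the model always predicts the overall mean and does poorly on both sets (underfitting). On this toy data K = 2 sits between the two extremes.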

[Figure: prediction error (training error and test error) plotted against the complexity of the model, from low to high. Low complexity: high bias, low variance, underfitting. High complexity: low bias, high variance, overfitting. Good models lie in between.]

▪ In general, use simple models!

▪ Reduce the number of features manually or do feature selection.

▪ Do model selection.

▪ Use regularization (keep the features but reduce their importance by setting small parameter values).

▪ Do cross-validation to estimate the test error.


We want to minimize:

Classification term + $C \times$ Regularization term

$\sum_{i=1}^{n} \mathrm{loss}(y_i, f(x_i)) + C \times R(f)$
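
To see the effect of $C$, the objective can be worked out for the simplest possible case. The sketch below assumes a one-parameter model $f(x) = w x$, the least square loss, and $R(f) = w^2$ (ridge-style regularization); none of these choices comes from the slides. Under those assumptions the minimizer has the closed form $w = \sum_i x_i y_i / (\sum_i x_i^2 + C)$.

```python
def objective(w, xs, ys, C):
    """Sum of squared losses plus C times the regularization term R(f) = w^2."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) + C * w * w


def fit_regularized(xs, ys, C):
    """Closed-form minimizer of `objective` over w for the model f(x) = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + C)


# Hypothetical data lying exactly on the line y = 2x.
xs = [1, 2, 3]
ys = [2, 4, 6]

for C in (0.0, 1.0, 14.0):
    w = fit_regularized(xs, ys, C)
    print(f"C={C}: w={w:.4f}, objective={objective(w, xs, ys, C):.4f}")
```

With $C = 0$ the fit recovers the unregularized slope; as $C$ grows the parameter shrinks toward zero, which is the "small parameter values" effect described above.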



Hint: Avoid high-degree polynomials.



A method for estimating test error using training data.
Algorithm:
Given a learning algorithm $\mathcal{A}$ and a dataset $\mathcal{D}$
Step 1: Randomly partition $\mathcal{D}$ into $k$ equal-size subsets $D_1, \dots, D_k$
Step 2:
  For $j = 1$ to $k$
    Train $\mathcal{A}$ on all $D_i$, $i \in \{1, \dots, k\}$ and $i \neq j$, and get $f_j$
    Apply $f_j$ to $D_j$ and compute $E_{D_j}$
Step 3: Average the error over all folds:

$\frac{1}{k} \sum_{j=1}^{k} E_{D_j}$
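
The three steps can be sketched as a short function that accepts any learner and any error measure. The toy learner (always predict the mean training label), the mean squared error, the data, and all names are assumptions for illustration.

```python
import random


def k_fold_cross_validation(learn, error, data, k, seed=0):
    """Estimate the test error of `learn` by k-fold cross-validation."""
    items = list(data)
    random.Random(seed).shuffle(items)           # Step 1: random partition ...
    folds = [items[j::k] for j in range(k)]      # ... into k folds (sizes differ by at most 1)
    fold_errors = []
    for j in range(k):                           # Step 2: hold out fold j
        training = [ex for i, fold in enumerate(folds) if i != j for ex in fold]
        f_j = learn(training)
        fold_errors.append(error(f_j, folds[j]))
    return sum(fold_errors) / k                  # Step 3: average over the folds


def learn_mean(training):
    """Toy learner: a model that always predicts the mean training label."""
    mean_y = sum(y for _, y in training) / len(training)
    return lambda x: mean_y


def mean_squared_error(f, fold):
    """Average squared error of model f on the held-out fold."""
    return sum((y - f(x)) ** 2 for x, y in fold) / len(fold)


data = [(x, 2.0 * x) for x in range(10)]         # hypothetical (x, y) examples
cv_error = k_fold_cross_validation(learn_mean, mean_squared_error, data, k=5)
print(cv_error)
```

Every example is used for evaluation exactly once and for training k - 1 times, which is why the estimate needs no separate held-out set.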



Review the concepts and terminology:

Instance, example, feature, label, supervised learning, unsupervised learning,
classification, regression, clustering, prediction, training set, validation set, test
set, K-fold cross validation, classification error, loss function, overfitting,
underfitting, regularization.


1. Tom Mitchell. Machine Learning.
2. Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin. Learning From Data. AMLBook.
3. T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
4. Christopher Bishop. Pattern Recognition and Machine Learning.
5. Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification. Wiley.


▪ Major journals/conferences: ICML, NIPS, UAI, ECML/PKDD, JMLR, MLJ, etc.
▪ Machine learning video lectures:
http://videolectures.net/Top/Computer_Science/Machine_Learning/
▪ Machine Learning (Theory):
http://hunch.net/
▪ LinkedIn ML groups: "Big Data" Scientist, etc.
▪ Women in Machine Learning:
https://groups.google.com/forum/#!forum/women-in-machine-learning
▪ KDnuggets:
http://www.kdnuggets.com/


▪ S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach (4th Edition),
Prentice Hall International, 2020.
  ▪ Chapter 19. Learning from Examples

▪ T. Mitchell, Machine Learning, 1997.

▪ T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining,
Inference, and Prediction (2nd Edition), 2009.