L09-An Introduction To Machine Learning

This document discusses key concepts in machine learning including supervised learning, unsupervised learning, training and testing concepts, and overfitting and underfitting. It also covers different types of data that can be used in machine learning like text, numbers, images, and videos. Common machine learning algorithms for classification, regression, clustering, and pattern discovery are also mentioned.

Uploaded by

Alda lumbangaol


10S3001 – Kecerdasan Buatan / Artificial Intelligence (Certan)

Samuel I. G. Situmeang, S.TI., M.Sc.

Odd Semester, Academic Year 2020/2021
Modified slides provided from Ansaf Salleb-Aouissi
Artificial Intelligence, Columbia University, 2018
▪ Machine Learning Concepts

▪ Supervised Learning and Unsupervised Learning

▪ Training-Testing Concepts

▪ Overfitting and Underfitting

10S3001-Certan | Odd Semester 20/21 | Institut Teknologi Del


Source: https://www.domo.com/
Data comes in different sizes and types:
▪ Texts
▪ Numbers
▪ Clickstreams
▪ Graphs
▪ Tables
▪ Images
▪ Transactions
▪ Videos
▪ Some or all of the above!


▪ Wherever we go, we are "datafied".

▪ Smartphones are tracking our locations.

▪ We leave a data trail in our web browsing.

▪ Our interactions on social networks leave data too.

▪ Privacy is an important issue in Data Science.

▪ We all use machine learning on a daily basis. Examples:

▪ Spam filtering
▪ Credit card fraud detection
▪ Digit recognition on checks and zip codes
▪ Detecting faces in images
▪ MRI image analysis
▪ Recommendation systems
▪ Search engines
▪ Handwriting recognition
▪ Scene classification
▪ etc.


Research progress: image classification, audio synthesis, games.

Products: voice recognition, translation, self-driving cars.

▪ More compute
▪ More data
▪ Better algorithms → Need more people who understand the algorithms!

Statistics:
▪ Hypothesis testing
▪ Experimental design
▪ Analysis of variance (ANOVA)
▪ Linear regression
▪ Logistic regression
▪ Generalized Linear Models (GLM)
▪ Principal Component Analysis (PCA)

Machine Learning:
▪ Decision trees
▪ Rule induction
▪ Neural Networks
▪ Support Vector Machines (SVMs)
▪ Clustering methods
▪ Association rules
▪ Feature selection
▪ Visualization
▪ Graphical models
▪ Genetic algorithms

http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf


Alan Turing proposed the concept of a learning machine in 1950 (in the same paper
that proposed the Turing test).

Idea: Divide the problem into two parts:

1. A machine that simulates a child’s brain (analogous to a blank notebook: it should function
by simple mechanisms and have lots of blank sheets).
2. A way of teaching the child machine (this should be simple, since we know how to teach a
human child).

The teacher rewards good behaviour and penalizes bad behaviour.


“An important feature of a learning machine is that its teacher will often be very
largely ignorant of quite what is going on inside.”
Alan Turing

▪ While we don’t know how our brain converts input to output, we know what the
output should be for every input.
▪ We can use this knowledge to teach the machine.


“How do we create computer programs that improve with experience?”
Tom Mitchell
http://videolectures.net/mlas06_mitchell_itm/


"A computer program is said to learn from experience 𝐸 with respect to some class of
tasks 𝑇 and performance measure 𝑃, if its performance at tasks in 𝑇, as measured by
𝑃, improves with experience 𝐸."
Tom Mitchell. Machine Learning 1997.


▪ A branch of artificial intelligence concerned with the design and development of
algorithms that allow computers to evolve behaviors based on empirical data.

▪ Since intelligence requires knowledge, computers need a way to acquire knowledge.


There are some variations in how the types of machine learning algorithms are defined, but
they are commonly divided into categories according to their purpose. The main categories
are the following:
▪ Supervised learning (predictive model, "labeled" data).
  ▪ Classification
  ▪ Numeric prediction/forecasting/regression
▪ Unsupervised learning (descriptive model, "unlabeled" data).
  ▪ Clustering
  ▪ Pattern Discovery
▪ Semi-supervised learning (mixture of "labeled" and "unlabeled" data).
▪ Reinforcement learning. Here the machine is trained to make specific decisions: it is
exposed to an environment in which it trains itself continually through trial and error,
learns from past experience, and tries to capture the best possible knowledge to make
accurate decisions.


▪ Supervised learning
  ▪ Classification, e.g. Logistic Regression, Decision Tree, KNN, Random Forest, SVM, & Naive Bayes
  ▪ Numeric prediction/forecasting/regression, e.g. Linear Regression, KNN, Gradient Boosting & AdaBoost
▪ Unsupervised learning
  ▪ Clustering, e.g. K-Means
  ▪ Pattern Discovery, e.g. Apriori, FP-Growth, & Eclat
▪ Semi-supervised learning
▪ Reinforcement learning
  ▪ e.g. Q-Learning, Temporal Difference (TD), & Deep Adversarial Networks


Source: https://en.proft.me/


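Given training data $\{(x_1, y_1), \dots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^d$ and $y_i$ is the label of example $x_i$:

$$
\begin{array}{lcccccl}
\text{example } x_1 \to & x_{11} & x_{12} & \cdots & x_{1d} & y_1 & \leftarrow \text{label} \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \\
\text{example } x_i \to & x_{i1} & x_{i2} & \cdots & x_{id} & y_i & \leftarrow \text{label} \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \\
\text{example } x_n \to & x_{n1} & x_{n2} & \cdots & x_{nd} & y_n & \leftarrow \text{label}
\end{array}
$$

fruit     length  width  weight  label
fruit 1   165     38     172     Banana
fruit 2   218     39     230     Banana
fruit 3   76      80     145     Orange
fruit 4   145     35     150     Banana
fruit 5   90      88     160     Orange
…
fruit n   …       …      …       …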
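A minimal sketch, in Python, of how the fruit table maps onto the notation: each row becomes a feature vector $x_i \in \mathbb{R}^3$ (length, width, weight) and a label $y_i$. Only the five listed fruits are used; the names `X` and `y` are illustrative choices, and no units are assumed because the table gives none.

```python
# Training data from the fruit table: each example x_i is a feature vector
# (length, width, weight) and y_i is its label. Units are not given in the table.
X = [
    (165, 38, 172),  # fruit 1
    (218, 39, 230),  # fruit 2
    (76, 80, 145),   # fruit 3
    (145, 35, 150),  # fruit 4
    (90, 88, 160),   # fruit 5
]
y = ["Banana", "Banana", "Orange", "Banana", "Orange"]

n = len(X)     # number of examples
d = len(X[0])  # number of features, so each x_i is in R^d

# Every example must have exactly d features and exactly one label.
assert all(len(x_i) == d for x_i in X)
assert len(y) == n

print(f"n = {n} examples, d = {d} features")
for i, (x_i, y_i) in enumerate(zip(X, y), start=1):
    print(f"x_{i} = {x_i} -> y_{i} = {y_i}")
```

Keeping features and labels in two parallel sequences mirrors the $(x_i, y_i)$ pairs in the definition.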


Training data: "examples" $x$ with "labels" $y$:

$$\{(x_1, y_1), \dots, (x_n, y_n)\}, \quad x_i \in \mathbb{R}^d$$

▪ Classification: $y$ is discrete. To simplify, $y \in \{-1, +1\}$.

$$f: \mathbb{R}^d \to \{-1, +1\}$$

$f$ is called a binary classifier.

Example: approve credit yes/no, spam/ham, banana/orange.
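As an illustration, here is one possible binary classifier $f: \mathbb{R}^d \to \{-1, +1\}$ on the fruit data, encoding Banana as $+1$ and Orange as $-1$. It is a linear classifier $f(x) = \mathrm{sign}(w \cdot x + b)$ whose weights `w = (1, -2, 0)` and `b = 0` are hypothetical values picked by hand (not learned, and not from the slides): it answers "banana" when the length exceeds twice the width.

```python
def sign(z: float) -> int:
    # Map a real score to a class in {-1, +1}; a score of exactly 0 goes to +1 here.
    return 1 if z >= 0 else -1


def f(x, w=(1.0, -2.0, 0.0), b=0.0) -> int:
    # Linear binary classifier: f(x) = sign(w . x + b).
    # w and b are hypothetical hand-picked values, not learned from data.
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return sign(score)


# Fruit table: (length, width, weight), with Banana -> +1 and Orange -> -1.
X = [(165, 38, 172), (218, 39, 230), (76, 80, 145), (145, 35, 150), (90, 88, 160)]
y = [+1, +1, -1, +1, -1]

predictions = [f(x) for x in X]
print(predictions)  # [1, 1, -1, 1, -1]
```

A learning algorithm would choose `w` and `b` from the training data instead of by hand.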


Classification:


Methods: Support Vector Machines, neural networks, decision trees, K-nearest neighbors, naive Bayes, etc.
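K-nearest neighbors is the simplest of the listed methods to write out. A minimal sketch, assuming Euclidean distance on the raw (unscaled) fruit features and `k = 3`; the function name `knn_predict` and the two query fruits are hypothetical. In practice, features are usually rescaled first so that no single feature dominates the distance.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training examples."""
    # Euclidean distance from x to every training example, paired with that example's label.
    distances = sorted(
        (math.dist(x, x_i), y_i) for x_i, y_i in zip(X_train, y_train)
    )
    # Majority vote among the labels of the k closest examples.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Fruit table: (length, width, weight) and labels.
X = [(165, 38, 172), (218, 39, 230), (76, 80, 145), (145, 35, 150), (90, 88, 160)]
y = ["Banana", "Banana", "Orange", "Banana", "Orange"]

print(knn_predict(X, y, (200, 40, 210)))  # Banana
print(knn_predict(X, y, (85, 85, 155)))   # Orange (2 of its 3 neighbors are oranges)
```

k-NN has no real training step: the "model" is the stored training set, and all the work happens at prediction time.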


Non-linear classification


Training data: "examples" $x$ with "labels" $y$:

$$\{(x_1, y_1), \dots, (x_n, y_n)\}, \quad x_i \in \mathbb{R}^d$$

▪ Regression: $y$ is a real value, $y \in \mathbb{R}$.

$$f: \mathbb{R}^d \to \mathbb{R}$$

$f$ is called a regressor.

Example: amount of credit, weight of fruit.


Regression:

Example: income as a function of age; weight of a fruit as a function of its length.
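The second example, weight of a fruit as a function of its length, can be fitted in a few lines. A minimal sketch, assuming a straight-line model $f(x) = a x + b$ fitted by ordinary least squares to the five fruits in the table; the model choice and the names `fit_line` and `f` are illustrative, not taken from the slides.

```python
def fit_line(xs, ys):
    """Fit y ≈ a * x + b by ordinary least squares (closed form, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


# Weight of each fruit as a function of its length (from the fruit table).
lengths = [165, 218, 76, 145, 90]
weights = [172, 230, 145, 150, 160]

a, b = fit_line(lengths, weights)


def f(length):
    # The learned regressor f: R -> R.
    return a * length + b


print(f"weight ≈ {a:.3f} * length + {b:.3f}")
print(f"predicted weight for length 180: {f(180):.1f}")
```

The closed form minimizes the sum of squared differences between predicted and true weights; a straight line mixing bananas and oranges is a crude fit, which is exactly the kind of modelling choice that later affects the training error.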


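Training data: "examples" $x$ (no labels):

$$\{x_1, \dots, x_n\}, \quad x_i \in X \subset \mathbb{R}^d$$

▪ Clustering/segmentation:

$$f: \mathbb{R}^d \to \{C_1, \dots, C_k\}$$

where $\{C_1, \dots, C_k\}$ is the set of clusters.

Example: find clusters in a population, in fruits, or in species.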


Methods: K-means, Gaussian mixtures, hierarchical clustering, spectral clustering, etc.
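K-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. A minimal sketch of that loop (Lloyd's algorithm), assuming Euclidean distance, $k = 2$, and a deterministic initialization from fruits 1 and 3 chosen by hand for this example; common implementations instead use random or k-means++ initialization with several restarts. Only the feature vectors are used, never the labels.

```python
import math


def kmeans(X, init_indices, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [X[i] for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the index of its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))
            for x in X
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Fruit table features (length, width, weight); the labels are not given to k-means.
X = [(165, 38, 172), (218, 39, 230), (76, 80, 145), (145, 35, 150), (90, 88, 160)]

labels, centroids = kmeans(X, init_indices=(0, 2))  # hypothetical init: fruits 1 and 3
print(labels)        # [0, 0, 1, 0, 1]
print(centroids[1])  # (83.0, 84.0, 152.5)
```

On this table, fruits 1, 2, and 4 (the bananas) end up in one cluster and fruits 3 and 5 (the oranges) in the other, even though the algorithm never saw the labels. With a poor initialization, K-means can converge to a worse grouping, which is why restarts are common.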


Training set → ML Algorithm → Model (f)

Input (income, gender, age, family status, zipcode) → Model (f) → Output: credit amount $ or credit yes/no
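The pipeline training set → ML algorithm → model $f$ can be sketched with any learner. A minimal sketch, assuming a nearest-centroid classifier as the ML algorithm and reusing the fruit table rather than the credit example, since the slides give no credit data; `train_nearest_centroid` is a hypothetical name. The point is the shape of the interface: the algorithm consumes labeled examples and returns a function $f$ that maps new inputs to predictions.

```python
import math


def train_nearest_centroid(training_set):
    """A toy ML algorithm: takes (x, y) pairs and returns a model f."""
    # Group the feature vectors by their label.
    by_label = {}
    for x, y in training_set:
        by_label.setdefault(y, []).append(x)
    # One centroid (mean feature vector) per label.
    centroids = {
        label: tuple(sum(col) / len(xs) for col in zip(*xs))
        for label, xs in by_label.items()
    }

    def f(x):
        # The learned model: predict the label whose centroid is closest to x.
        return min(centroids, key=lambda label: math.dist(x, centroids[label]))

    return f


training_set = [
    ((165, 38, 172), "Banana"),
    ((218, 39, 230), "Banana"),
    ((76, 80, 145), "Orange"),
    ((145, 35, 150), "Banana"),
    ((90, 88, 160), "Orange"),
]

f = train_nearest_centroid(training_set)  # ML algorithm -> model f
print(f((200, 40, 210)))  # Banana
print(f((85, 85, 155)))   # Orange
```

Once trained, `f` is an ordinary function; how much to trust its outputs is a separate question.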

Question: How can we be confident about 𝒇?


▪ We calculate $E_{train}$, the in-sample error (training error or empirical error/risk):

$$E_{train}(f) = \sum_{i=1}^{n} \mathrm{loss}\big(y_i, f(x_i)\big)$$

where $y_i$ is the true label and $f(x_i)$ is the prediction.

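▪ Examples of loss functions:
  ▪ Classification error:

$$\mathrm{loss}\big(y_i, f(x_i)\big) = \begin{cases} 1 & \text{if } \mathrm{sign}(y_i) \neq \mathrm{sign}\big(f(x_i)\big) \\ 0 & \text{otherwise} \end{cases}$$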

  ▪ Least square loss:

$$\mathrm{loss}\big(y_i, f(x_i)\big) = \big(y_i - f(x_i)\big)^2$$
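The in-sample error and both loss functions translate directly into code. A minimal sketch, assuming labels in $\{-1, +1\}$ for classification and following the slides in summing (rather than averaging) the losses; the models `f_class` and `f_reg` are hypothetical, deliberately crude rules chosen only so that the errors on the fruit table are non-zero.

```python
def sign(v):
    # Returns -1, 0 or +1 according to the sign of v.
    return (v > 0) - (v < 0)


def zero_one_loss(y_true, y_pred):
    # Classification error: 1 if sign(y_i) != sign(f(x_i)), 0 otherwise.
    return 1 if sign(y_true) != sign(y_pred) else 0


def squared_loss(y_true, y_pred):
    # Least square loss: (y_i - f(x_i))^2.
    return (y_true - y_pred) ** 2


def training_error(f, X, y, loss):
    # E_train(f) = sum over i = 1..n of loss(y_i, f(x_i)), summed as on the slides.
    return sum(loss(y_i, f(x_i)) for x_i, y_i in zip(X, y))


# Fruit table: (length, width, weight).
X = [(165, 38, 172), (218, 39, 230), (76, 80, 145), (145, 35, 150), (90, 88, 160)]


def f_class(x):
    # Hypothetical classifier score: positive (Banana, +1) if length > 150.
    return x[0] - 150


def f_reg(x):
    # Hypothetical regressor: predicts weight equal to length.
    return x[0]


y_class = [+1, +1, -1, +1, -1]  # Banana = +1, Orange = -1
y_weight = [x[2] for x in X]    # regression target: weight

e_class = training_error(f_class, X, y_class, zero_one_loss)
e_reg = training_error(f_reg, X, y_weight, squared_loss)
print(e_class)  # 1: fruit 4 (length 145) is classified as Orange by mistake
print(e_reg)    # 9879
```

Swapping the loss function changes what "error" means without changing the rest of the computation.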

▪ We aim to have $E_{train}(f)$ small, i.e., we minimize $E_{train}(f)$.

▪ We hope that $E_{test}(f)$, the out-of-sample error (test/true error), will be small too.
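Estimating the out-of-sample error requires data that the learner never saw during training. A minimal sketch, assuming a hand-picked split of the fruit table into three training fruits and two test fruits, a 1-nearest-neighbor learner, and the 0-1 loss averaged over each set so that sets of different sizes are comparable (the slides define $E_{train}$ as an unaveraged sum). The split and the helper names are hypothetical.

```python
import math


def one_nn(X_train, y_train):
    """Train a 1-nearest-neighbor model: it simply memorizes the training set."""
    def f(x):
        nearest = min(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))
        return y_train[nearest]
    return f


def mean_zero_one_error(f, X, y):
    # Fraction of misclassified examples (0-1 loss averaged over the set).
    return sum(1 for x_i, y_i in zip(X, y) if f(x_i) != y_i) / len(X)


# Fruit table, split by hand into a training set and a held-out test set.
X = [(165, 38, 172), (218, 39, 230), (76, 80, 145), (145, 35, 150), (90, 88, 160)]
y = ["Banana", "Banana", "Orange", "Banana", "Orange"]
train_idx, test_idx = [0, 2, 3], [1, 4]  # hypothetical split: fruits 1, 3, 4 vs 2, 5

X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

f = one_nn(X_train, y_train)
e_train = mean_zero_one_error(f, X_train, y_train)  # in-sample error
e_test = mean_zero_one_error(f, X_test, y_test)     # out-of-sample error
print(f"E_train = {e_train}, E_test = {e_test}")    # E_train = 0.0, E_test = 0.0
```

A 1-nearest-neighbor model always reaches a training error of 0 when no two identical inputs carry different labels, because each training example is its own nearest neighbor. That is why a small in-sample error alone cannot show that $E_{test}(f)$ will be small. Here both errors come out as 0.0, but with only two test fruits the estimate is very rough.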