Military AI-Week 02-Key Concept Machine Learning

Uploaded by

Adhi Kusumadjati

Key Concept of Machine Learning

Military Artificial Intelligence


Prof. Dr. Eng. Wisnu Jatmiko, S.T., M.Kom.
Dr. Ario Yudo Husodo, S.T., M.T.
Grafika Jati, S.Kom., M.Kom.
© Fasilkom UI - 2023
Discussion Topic
Outline
❑What is Machine Learning?
❑Supervised Learning
❑Decision Trees
❑Linear Regression
❑Logistic Regression
❑Naive Bayes Classifier
❑Support Vector Machine
❑Nearest Neighbor Methods
❑Dimensionality Reduction: Principal Component Analysis
❑K-Means Clustering
What is Machine Learning?
Current View of ML Disciplines
History of Machine Learning
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM

Slide credit: Ray Mooney
History of Machine Learning (cont.)
• 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
• 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
Slide credit: Ray Mooney
History of Machine Learning (cont.)
• 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics, Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
• 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
– ???
Based on slide by Ray Mooney
Machine Learning as Scientific Field
A scientific field that:
⚫ researches fundamental principles

⚫ for developing algorithms

⚫ to predict the future based on data

“Learning from Data”


Overwhelmed by Data
What is Machine Learning?
“Learning is any process by which a system improves
performance from experience.”
- Herbert Simon

Definition by Tom Mitchell (1998):


Machine Learning is the study of algorithms that
• improve their performance P
• at some task T
• with experience E.
A well-defined learning task is given by <P, T, E>.
Traditional Programming
Data + Program → Computer → Output

Machine Learning
Data + Output → Computer → Program
Slide credit: Pedro Domingos
When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)

Learning isn’t always useful:


• There is no need to “learn” to calculate payroll
Based on slide by E. Alpaydin
A classic example of a task that requires machine learning:
It is very hard to say what makes a 2

Slide credit: Geoffrey Hinton
Some more examples of tasks that are best
solved by using a learning algorithm
• Recognizing patterns:
– Facial identities or facial expressions
– Handwritten or spoken words
– Medical images
• Generating patterns:
– Generating images or motion sequences
• Recognizing anomalies:
– Unusual credit card transactions
– Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
– Future stock prices or currency exchange rates
Slide credit: Geoffrey Hinton
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]

Slide credit: Pedro Domingos
Samuel’s Checkers-Player
“Machine Learning: Field of study that gives
computers the ability to learn without being
explicitly programmed.” -Arthur Samuel (1959)

Defining the Learning Task
Improve on task T, with respect to
performance metric P, based on experience E
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words


P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors


P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while
observing a human driver.

T: Categorize email messages as spam or legitimate.


P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels
Slide credit: Ray Mooney
State of the Art Applications of
Machine Learning

Autonomous Cars

• Nevada made it legal for autonomous cars to drive on roads in June 2011
• As of 2013, four states (Nevada, Florida, California, and Michigan) have legalized autonomous cars
[Figure: Penn’s Autonomous Car (Ben Franklin Racing Team)]
Autonomous Car Sensors

Autonomous Car Technology
[Figure panels: Path Planning, Laser Terrain Mapping, Learning from Human Drivers, Adaptive Vision. Labels in the images: Sebastian, Stanley]
Images and movies taken from Sebastian Thrun’s multimedia website.

Amazing Deep Learning
Scene Labeling via Deep Learning
[Farabet et al. ICML 2012, PAMI 2013]
Types of Learning
• Supervised (inductive) learning
– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
Based on slide by Pedro Domingos
Machine Learning Types and Their Applications
Supervised Learning - Regression Questions
• Regression: models how a continuous target value depends on the attribute values of the samples in a dataset. The dependency is expressed as a function that maps each sample to its target value.
• How much will I gain from this stock next week?
• What will the temperature be on Tuesday?
Supervised Learning: Regression
• Given $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
• Learn a function $f(x)$ to predict $y$ given $x$
– $y$ is real-valued == regression
[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) plotted against Year, 1970 to 2020]
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)
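
As a minimal sketch of learning $f(x)$ for a real-valued $y$, the snippet below fits a straight line by ordinary least squares. The data points are made up for illustration (they are not the sea-ice measurements from the figure), and the name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training pairs (x_i, y_i) lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)


def f(x):
    """The learned function: predicts y for a new x."""
    return slope * x + intercept
```

Calling `f` on an unseen `x` then gives the regression prediction; with noisy real data the line would only approximate the points.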
Supervised Learning: Classification
[Diagram: samples, each with data features (Feature 1 ... Feature n) and a label (Goal), are fed to a supervised learning algorithm]

Weather   Temperature   Wind Speed   Enjoy Sports
Sunny     Warm          Strong       Yes
Rainy     Cold          Fair         No
Sunny     Cold          Weak         Yes
Supervised Learning: Classification
• Given $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$
• Learn a function $f(x)$ to predict $y$ given $x$
– $y$ is categorical == classification
[Figure: Breast Cancer label, 1 (Malignant) or 0 (Benign), plotted against Tumor Size]
Based on example by Andrew Ng
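
A rough sketch of the idea for a single feature: learn a cut point on tumor size that misclassifies as few training samples as possible. The sizes and labels are invented for illustration, and `learn_threshold` is a hypothetical helper, not a method named in the slides.

```python
def learn_threshold(sizes, labels):
    """Pick the cut point that misclassifies the fewest training samples.

    Prediction rule: 1 (malignant) if size >= threshold, else 0 (benign).
    """
    points = sorted(set(sizes))
    # Candidate thresholds: midpoints between consecutive distinct values
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best_t, best_err = None, len(sizes) + 1
    for t in candidates:
        err = sum((s >= t) != bool(y) for s, y in zip(sizes, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


# Hypothetical training data: tumor size and label (0 = benign, 1 = malignant)
sizes = [1.0, 1.5, 2.0, 3.5, 4.0, 5.0]
labels = [0, 0, 0, 1, 1, 1]
threshold = learn_threshold(sizes, labels)


def predict(size):
    """Classify a new tumor size with the learned threshold."""
    return 1 if size >= threshold else 0
```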
Supervised Learning: Classification (cont.)
[Figure: the same Breast Cancer (Malignant / Benign) versus Tumor Size plot, with the Tumor Size axis divided into a "Predict Benign" region and a "Predict Malignant" region]
Based on example by Andrew Ng
Supervised Learning
• x can be multi-dimensional
– Each dimension corresponds to an attribute

[Figure: samples plotted by Age and Tumor Size]
Further attributes:
- Clump Thickness
- Uniformity of Cell Size
- Uniformity of Cell Shape
Based on example by Andrew Ng
Unsupervised Learning - Clustering Questions
• Clustering: groups the samples in a dataset into several categories (clusters) based on a clustering model, so that samples belonging to the same category are highly similar.
• Which audiences like to watch movies of the same subject?
• Which of these components are damaged in a similar way?
Unsupervised Learning
• Given $x_1, x_2, \dots, x_n$ (without labels)
• Output hidden structure behind the x’s
– E.g., clustering
Unsupervised Learning
Genomics application: group individuals by genetic similarity
[Figure: axes labeled Genes and Individuals]
[Source: Daphne Koller]
Unsupervised Learning

[Figure panels: Organize computing clusters, Social network analysis, Market segmentation, Astronomical data analysis]
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)


Slide credit: Andrew Ng
Unsupervised Learning
• Independent component analysis – separate a
combined signal into its original sources

Image credit: statsoft.com Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html
Semi-Supervised Learning
[Diagram: samples with data features (Feature 1 ... Feature n) are fed to semi-supervised learning algorithms; only some samples carry a label (Goal), the others are Unknown]

Weather   Temperature   Wind Speed   Enjoy Sports
Sunny     Warm          Strong       Yes
Rainy     Cold          Fair         /
Sunny     Cold          Weak         /
Reinforcement Learning
• Given a sequence of states and actions with
(delayed) rewards, output a policy
– Policy is a mapping from states → actions that
tells you what to do in a given state
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
– Balance a pole on your hand

The Agent-Environment Interface

Agent and environment interact at discrete time steps: $t = 0, 1, 2, \dots$

Agent observes state at step $t$: $s_t \in S$
produces action at step $t$: $a_t \in A(s_t)$
gets resulting reward: $r_{t+1} \in \mathbb{R}$
and resulting next state: $s_{t+1}$

$$\cdots \; s_t \xrightarrow{a_t} r_{t+1},\, s_{t+1} \xrightarrow{a_{t+1}} r_{t+2},\, s_{t+2} \xrightarrow{a_{t+2}} r_{t+3},\, s_{t+3} \xrightarrow{a_{t+3}} \cdots$$
Slide credit: Sutton & Barto
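
The interaction loop can be sketched in a few lines. Everything here is a toy assumption made for illustration: the corridor environment, its reward of 1 at the goal, and the fixed hand-written policy (no learning takes place).

```python
def step(state, action):
    """Hypothetical toy environment: a corridor with states 0..3, goal at 3.

    Returns (reward r_{t+1}, next state s_{t+1}).
    """
    next_state = min(max(state + action, 0), 3)
    reward = 1.0 if next_state == 3 else 0.0
    return reward, next_state


# A policy maps each non-terminal state to an action (+1 = right, -1 = left)
policy = {0: +1, 1: +1, 2: +1}


def run_episode(start_state, max_steps=10):
    """Run the agent-environment loop and record (s_t, a_t, r_{t+1}, s_{t+1})."""
    state, total_reward, trajectory = start_state, 0.0, []
    for _ in range(max_steps):
        if state == 3:  # terminal state reached
            break
        action = policy[state]                    # agent picks a_t from s_t
        reward, next_state = step(state, action)  # environment responds
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
    return trajectory, total_reward


trajectory, total_reward = run_episode(0)
```

A reinforcement learning algorithm would adjust `policy` from the observed rewards instead of keeping it fixed.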
Reinforcement Learning

https://www.youtube.com/watch?v=4cgWya-wjgY
Inverse Reinforcement Learning
• Learn policy from user demonstrations

Stanford Autonomous Helicopter


http://heli.stanford.edu/
https://www.youtube.com/watch?v=VCdxqn0fcnE
Learning problems

Supervised learning problems:

• Classification aims to assign each input to one of a finite number of categories (target values
are discrete).
• Regression aims to assign each input to a value in a continuous set of possible target values.
• Probability estimation is a special case of regression where target values range between 0
and 1 and represent probability values.

Unsupervised learning problems:

• Clustering aims to discover groups of similar examples within the input space.
• Density estimation aims to determine the distribution of data within the input space.
• Projection/dimensionality reduction aims to obtain a representation of data in a dimension
different (typically lower) than its original dimension.

Reinforcement learning problem:

• Credit assignment aims to determine a way to reward (or punish) every action the algorithm
provides so that at the end of the action sequence, it arrives at the best/correct answer.

Machine Learning Process

Data collection → Data cleansing → Feature extraction and selection → Model training → Model evaluation → Model deployment and integration

Feedback and iteration loop back through these steps.


Basic Machine Learning Concept — Dataset
⚫ Dataset: a collection of data used in machine learning tasks. Each data record is called a
sample. Events or attributes that reflect the performance or nature of a sample in a
particular aspect are called features.
⚫ Training set: a dataset used in the training process, where each sample is referred to as a
training sample. The process of creating a model from data is called learning (training).
⚫ Test set: the dataset used for testing, that is, for making predictions with the model obtained after learning. Each of its samples is called a test sample.
Checking Data Overview

Set        No.  Area (Feature 1)  School Districts (Feature 2)  Direction (Feature 3)  House Price (Label)
Training   1    100               8                             South                  1000
Training   2    120               9                             Southwest              1300
Training   3    60                6                             North                  700
Training   4    80                9                             Southeast              1100
Test       5    95                3                             South                  850
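
A small sketch of how the overview table separates features from the label and training samples from the test sample. The variable and function names are illustrative choices; the values are copied from the table.

```python
# Rows copied from the table: (no, area, school_districts, direction, house_price)
rows = [
    (1, 100, 8, "South", 1000),
    (2, 120, 9, "Southwest", 1300),
    (3, 60, 6, "North", 700),
    (4, 80, 9, "Southeast", 1100),
    (5, 95, 3, "South", 850),
]


def split_features_label(row):
    """Separate the three features from the label (house price)."""
    _, area, districts, direction, price = row
    return (area, districts, direction), price


# Samples 1-4 form the training set; sample 5 is held out as the test set
train_rows, test_rows = rows[:4], rows[4:]
X_train, y_train = zip(*(split_features_label(r) for r in train_rows))
X_test, y_test = zip(*(split_features_label(r) for r in test_rows))
```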


Importance of Data Processing
Data is crucial to models. It is the ceiling of model capabilities. Without good data, there is no good model.

Data preprocessing
• Data cleansing: Fill in missing values, and detect and eliminate causes of dataset exceptions.
• Data normalization: Normalize data to reduce noise and improve model accuracy.
• Data dimension reduction: Simplify data attributes to avoid dimension explosion.
Data Cleansing

Most machine learning models process features, which are usually numeric representations of input variables that can be used in the model.
In most cases, the collected data can be used by algorithms only after being preprocessed. The preprocessing operations include the following:
• Data filtering
• Handling of missing data
• Handling of possible exceptions, errors, or abnormal values
• Combination of data from multiple data sources
• Data consolidation
Dirty Data (1)
• Generally, real data may have some quality problems:
– Incompleteness: contains missing values or lacks attributes.
– Noise: contains incorrect records or exceptions.
– Inconsistency: contains inconsistent records.
Dirty Data (2)
#   Id      Name    Birthday    Gender  IsTeacher  #Students  Country      City    Issue
1   111     John    31/12/1990  M       0          0          Ireland      Dublin
2   222     Mery    15/10/1978  F       1          15         Iceland      -       Missing value (City)
3   333     Alice   19/04/2000  F       0          0          Spain        Madrid
4   444     Mark    01/11/1997  M       0          0          France       Paris
5   555     Alex    15/03/2000  A       1          23         Germany      Berlin  Invalid value (Gender)
6   555     Peter   1983-12-01  M       1          10         Italy        Rome    Invalid duplicate item (Id); incorrect format (Birthday)
7   777     Calvin  05/05/1995  M       0          0          Italy        Italy   Value that should be in another column (City)
8   888     Roxane  03/08/1948  F       0          0          Portugal     Lisbon
9   999     Anne    05/09/1992  F       0          5          Switzerland  Geneva  Attribute dependency (IsTeacher vs. #Students)
10  101010  Paul    14/11/1992  M       1          26         Ytali        Rome    Misspelling (Country)
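
Several of these problems can be caught with simple rule checks. The sketch below uses a few records from the table; the rules themselves are assumptions made for illustration (gender must be "M" or "F", birthdays use DD/MM/YYYY, a non-teacher has no students), and `find_issues` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

# (id, name, birthday, gender, is_teacher, n_students, country, city)
records = [
    (111, "John", "31/12/1990", "M", 0, 0, "Ireland", "Dublin"),
    (222, "Mery", "15/10/1978", "F", 1, 15, "Iceland", None),
    (555, "Alex", "15/03/2000", "A", 1, 23, "Germany", "Berlin"),
    (555, "Peter", "1983-12-01", "M", 1, 10, "Italy", "Rome"),
    (999, "Anne", "05/09/1992", "F", 0, 5, "Switzerland", "Geneva"),
]


def is_valid_date(text, fmt="%d/%m/%Y"):
    """True if text parses with the assumed DD/MM/YYYY format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def find_issues(records):
    """Return (id, name, issue) tuples for each rule violation found."""
    id_counts = Counter(r[0] for r in records)
    issues = []
    for rid, name, birthday, gender, is_teacher, n_students, _country, city in records:
        if city is None:
            issues.append((rid, name, "missing value"))
        if gender not in ("M", "F"):
            issues.append((rid, name, "invalid value"))
        if not is_valid_date(birthday):
            issues.append((rid, name, "incorrect format"))
        if id_counts[rid] > 1:
            issues.append((rid, name, "duplicate id"))
        if is_teacher == 0 and n_students > 0:
            issues.append((rid, name, "attribute dependency"))
    return issues


issues = find_issues(records)
```

Misspellings such as "Ytali" need a reference list of valid values and are not covered by this sketch.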


Data Conversion

After being preprocessed, the data needs to be converted into a representation suitable for the machine learning model. Common data conversion forms include the following:
• For classification, category data is encoded into a corresponding numerical representation.
• Numeric data is converted to category data to reduce the number of distinct values a variable takes (for example, age segmentation).
• Other data
– In text, each word is converted into a word vector through word embedding (generally using the word2vec model, BERT model, etc.).
– Image data is processed (color space, grayscale, geometric change, Haar feature, and image enhancement).
• Feature engineering
– Normalize features to ensure the same value ranges for input variables of the same model.
– Feature expansion: Combine or convert existing variables to generate new features, such as the average.
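
Three of these conversions can be sketched in plain Python. One-hot encoding and min-max scaling are just one possible choice for each step, and the age bin edges and function names are illustrative assumptions.

```python
def one_hot(values):
    """Encode category data as 0/1 vectors, one position per distinct category."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return encoded, categories


def age_segment(age):
    """Convert a numeric age into a category (bin edges are assumptions)."""
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"


def min_max_normalize(values):
    """Rescale values linearly into [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


encoded, categories = one_hot(["South", "North", "South"])
segments = [age_segment(a) for a in (12, 30, 70)]
scaled = min_max_normalize([60, 80, 100])
```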
Machine Learning Model
Machine learning
• Supervised learning
– Classification: Logistic regression, SVM, Neural network, Decision tree, Random forest, GBDT, KNN, Naive Bayes
– Regression: Linear regression, SVM, Neural network, Decision tree, Random forest, GBDT
• Unsupervised learning
– Clustering: K-means, Hierarchical clustering, Density-based clustering
– Others: Correlation rule, Principal component analysis (PCA), Gaussian mixture model (GMM)
Linear Regression (1)
⚫ Linear regression: a statistical analysis method to determine the quantitative relationships between
two or more variables through regression analysis in mathematical statistics.
⚫ Linear regression is a type of supervised learning.

[Figures: unary linear regression; multi-dimensional linear regression]


Linear Regression (2)
⚫ The model function of linear regression is as follows, where $w$ indicates the weight parameter, $b$ indicates the bias, and $x$ indicates the sample attribute.

$$h_w(x) = w^T x + b$$

⚫ The relationship between the value predicted by the model and the actual value is as follows, where $y$ indicates the actual value, and $\varepsilon$ indicates the error.

$$y = w^T x + b + \varepsilon$$

⚫ The error $\varepsilon$ is influenced by many factors independently. According to the central limit theorem, the error $\varepsilon$ follows a normal distribution. According to the normal distribution function and maximum likelihood estimation, the loss function of linear regression is as follows, where the sum runs over the $m$ training samples:

$$J(w) = \frac{1}{2m} \sum \left( h_w(x) - y \right)^2$$

⚫ To make the predicted value close to the actual value, we need to minimize the loss value. We can use the gradient descent method to find the weight parameter $w$ at which the loss function reaches its minimum, and then complete model building.
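
A minimal sketch of batch gradient descent on this loss, assuming the bias $b$ is updated alongside $w$. The learning rate, step count, and noise-free toy data are illustrative assumptions, not values from the slides.

```python
def predict(w, b, x):
    """h_w(x) = w^T x + b for one sample x (a list of features)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def loss(w, b, X, y):
    """J(w) = 1/(2m) * sum of squared errors over the m samples."""
    m = len(X)
    return sum((predict(w, b, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


def gradient_descent(X, y, lr=0.1, steps=2000):
    """Repeatedly move w and b against the gradient of J."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(steps):
        errors = [predict(w, b, xi) - yi for xi, yi in zip(X, y)]
        # dJ/dw_j = (1/m) * sum(error * x_j),  dJ/db = (1/m) * sum(error)
        grad_w = [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(n)]
        grad_b = sum(errors) / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1 with no noise
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(X, y)
```

With a learning rate that is too large the updates diverge, so in practice `lr` is tuned and features are often normalized first.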
Discussion Topic
The Essence of ML
Traditional Programming
Data + Program → Computer → Output

Machine Learning
Data + Output → Computer → Program
ML Analogy

• Seeds = Algorithm
• Nutrients = Data
• Gardener = You
• Plants = Program
General Approach in ML

https://towardsdatascience.com/4-machine-learning-approaches-that-every-data-scientist-should-know-e3a9350ec0b9
Review Types of ML
Nearest Neighbor
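[Figure: scatter plot with axes Humidity and Pressure. Notation: Rain, No Rain, ? = Unknown]
k-Nearest-Neighbor (k-NN) Classification

k-Nearest-Neighbor is a classification method that maps an input to the most common class (the majority) among the “k” data points closest to that input.

[Figure: example with k = 3 on the same Humidity and Pressure plot; the three nearest neighbors of the unknown point split 1 to 2 between the two classes]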
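
The majority vote can be sketched as follows. The (humidity, pressure) readings are made up for illustration, Euclidean distance is an assumed choice of distance measure, and `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance is an assumption; other distance measures also work
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical (humidity, pressure) readings, scaled to [0, 1]
points = [(0.9, 0.2), (0.8, 0.3), (0.85, 0.25), (0.2, 0.9), (0.3, 0.8), (0.25, 0.85)]
labels = ["Rain", "Rain", "Rain", "No Rain", "No Rain", "No Rain"]

prediction = knn_predict(points, labels, query=(0.7, 0.4), k=3)
```

An odd `k` avoids ties when there are two classes.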
Defining Distance in k-NN
The Value of K in k-NN
Let’s Code K-NN

• Link Dataset → https://drive.google.com/file/d/1bjlaN7dug-kPggA2ed6wWZIZhJV01tiy/view?usp=sharing

Task:
Given some characteristics, determine the fruit class
Let’s Code K-NN (2)
Let’s Code K-NN (3)
Let’s Code K-NN (4)
Let’s Code K-NN (5)
Let’s Code K-NN (6)
Linear Regression

• Simple linear regression is a regression model that estimates the relationship between one independent variable and one dependent variable using a straight line.
Usage of Regression

• Stock market prediction

Let’s Code Regression

• Link Dataset → https://drive.google.com/file/d/1_r3huDi9I6Oj0BXfZBGJlqGjAxwDWRGy/view?usp=sharing

Task:
Given area data, determine the estimated price
Let’s Code Regression (2)
Let’s Code Regression (3)
Let’s Code Regression (4)
Clustering – K Means

• K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster.
• For the k-means algorithm, specify the final number of clusters (k). Then, divide the n data objects into k clusters. The clusters obtained meet the following conditions: (1) Objects in the same cluster are highly similar. (2) Objects in different clusters have low similarity.
• The k-means algorithm identifies k centroids and allocates every data point to the cluster of its nearest centroid, while keeping the distances between the points and their centroid as small as possible.
• The ‘means’ in k-means refers to averaging the data; that is, finding the centroid.
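
The alternation between assigning points and averaging them can be sketched as below. This is a plain version of the standard iteration (often called Lloyd's algorithm); starting from the first k points is an assumption made for reproducibility, and the points are invented for illustration.

```python
import math


def k_means(points, k, iterations=100):
    """Alternate assignment and centroid-update steps until nothing changes."""
    # Assumption for reproducibility: start from the first k points
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to the cluster with the nearest centroid
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment


# Hypothetical 2-D points forming two well-separated groups
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignment = k_means(points, k=2)
```

The result depends on the starting centroids, so practical implementations usually try several random initializations.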
Let’s Code K-Means
Let’s Code K-Means (2)
Thank You
