
Game Programming

A. Avinash, Ph.D.,
Assistant Professor
School of Computer Science and Engineering (SCOPE)
Vellore Institute of Technology (VIT), Chennai
Introduction to Evolutionary Computation

• Evolutionary Computation is the field of study devoted to the design,
  development, and analysis of problem solvers based on natural
  selection (simulated evolution).
• Evolution has proven to be a powerful search process.
• Evolutionary Computation has been successfully applied to a wide
range of problems including:
– Aircraft Design,
– Routing in Communications Networks,
– Tracking Windshear,
– Game Playing (Checkers [Fogel])
Introduction to Evolutionary Computation
(Applications cont.)
• Robotics,
• Air Traffic Control,
• Design,
• Scheduling,
• Machine Learning,
• Pattern Recognition,
• Job Shop Scheduling,
• VLSI Circuit Layout,
• Strike Force Allocation,
Introduction to Evolutionary Computation
(Applications cont.)
• Theme Park Tours (Disney Land/World)
• Market Forecasting,
• Egg Price Forecasting,
• Design of Filters and Barriers,
• Data-Mining,
• User-Mining,
• Resource Allocation,
• Path Planning,
• Etc.
Background: Evolutionary Computation
• Evolutionary computation (EC) methods are based on
population of solutions
– Each iteration involves propagating all elements of the
population
– Each member of the population (“chromosome”) corresponds
  to one candidate solution (one value of the parameter vector being optimized)
• Genetic algorithms (GAs) are most popular form of EC
• Early work in 1950s and 1960s; influential 1975 book by
John Holland laid foundation for modern implementations
• Population-based structure well suited to parallel
processing
– But infeasible in some real-time applications
Background: EC (cont’d)
• Motivation for EC: Evolution seems to work well in nature… perhaps it can be used in optimization
• Three main types of EC
  – Genetic Algorithms
  – Evolution Strategies
  – Evolutionary Programming
• Many other types of EC exist (ant colony, particle swarm, differential evolution, etc.)
[Figure: prototype EC method: Initial Population → Selection → Reproduction → Mutation → Next Iteration (Generation)]
Standard GA Operations
• Selection is the mechanism by which the “parents” are
chosen for producing offspring to be passed into next
generation
• Elitism passes best chromosome(s) to next generation
intact
• Crossover takes parent-pairs from selection step and
creates offspring
• Mutation makes “slight” random modifications to some or
all of the offspring in next generation
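A hedged sketch (not from the slides) of the crossover operation on bit-string chromosomes; the crossover probability and the example parents are made up for illustration:

```python
import random

def single_point_crossover(parent1, parent2, pc=0.9):
    """With probability pc, swap the tails of two equal-length
    bit-string chromosomes at a randomly chosen splice point."""
    assert len(parent1) == len(parent2)
    if random.random() < pc:
        point = random.randint(1, len(parent1) - 1)    # splice point
        return (parent1[:point] + parent2[point:],
                parent2[:point] + parent1[point:])
    return parent1[:], parent2[:]                      # no crossover: copy parents

# Example: two 6-bit parents
print(single_point_crossover([0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]))
```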
Selection
• Parent selection methods based on probability of
selection being increasing function of fitness
• Roulette-wheel selection is common method
– Probability an individual is selected is equal to its fitness
divided by the total fitness in the population
• Problem: Selection probability highly dependent on
units and scaling for fitness function
• Rank selection and tournament selection methods
reduce sensitivity to choice of fitness function
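A minimal sketch of roulette-wheel selection, assuming non-negative fitness values (the population and fitnesses below are invented):

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    r = random.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= r:
            return individual
    return population[-1]   # guard against floating-point round-off

# Higher-fitness chromosomes are selected more often
pop, fits = ["A", "B", "C"], [1.0, 3.0, 6.0]
print([roulette_wheel_select(pop, fits) for _ in range(10)])
```

Rank and tournament selection replace the raw fitness values with ranks or pairwise comparisons, which removes the sensitivity to fitness scaling noted above.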
Mutation
• Mutation operator introduces spontaneous variability (as in random search algorithms)
• Mutation generally makes only small changes to solution
• Bit-based coding and real (floating point) coding require different type of mutation
– Bit-based mutation generally involves “flipping” bit(s)
– Real-based mutation often involves adding small (Monte Carlo) random vector to
chromosomes
• Example below shows mutation on one element in chromosome in bit-based coding:
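The slide’s original figure is not reproduced here; as a substitute, a hedged sketch of both mutation styles (bit-flip for bit-based coding, a small Gaussian perturbation for real coding; the mutation rates are arbitrary):

```python
import random

def bit_flip_mutation(chromosome, pm=0.01):
    """Flip each bit independently with a small probability pm."""
    return [1 - bit if random.random() < pm else bit for bit in chromosome]

def real_valued_mutation(chromosome, sigma=0.1):
    """Add a small Gaussian (Monte Carlo) perturbation to each gene."""
    return [gene + random.gauss(0.0, sigma) for gene in chromosome]

print(bit_flip_mutation([1, 0, 1, 1, 0, 0], pm=0.2))
print(real_valued_mutation([0.5, -1.2, 3.0]))
```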
Essential Steps of Basic GA
(Noise-Free Measurements)
Step 0 (initialization) Randomly generate initial population
of N (say) chromosomes and evaluate fitness function.
Step 1 (parent selection) Set Ne = 0 if elitism strategy is not
used; 0 < Ne < N otherwise. Select with replacement
N − Ne parents from the full population.

Step 2 (crossover) For each pair of parents identified in step
1, perform crossover on the parents at a randomly chosen
splice point (or points if using multi-point crossover) with
probability Pc.

Essential Steps of GA (cont’d)

Step 3 (replacement and mutation) Replace the non-elite
N − Ne chromosomes with the current population of
offspring from step 2. Perform mutation on the bits with a
small probability Pm.

Step 4 (fitness and end test) Compute the fitness values for
the new population of N chromosomes. Terminate the
algorithm if stopping criterion or budget of fitness function
evaluations is met; else return to step 1.
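Putting Steps 0–4 together, here is a hedged end-to-end sketch of the basic GA. It maximises a toy count-the-ones fitness (not from the slides) and uses tournament selection for simplicity; elitism, crossover, and mutation follow the steps above:

```python
import random

def fitness(chrom):
    """Toy fitness to maximise: the number of 1-bits."""
    return sum(chrom)

def basic_ga(n_bits=20, N=30, Ne=2, Pc=0.9, Pm=0.01, generations=50):
    # Step 0: random initial population of N chromosomes
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(N)]
    for _ in range(generations):
        elites = [c[:] for c in sorted(pop, key=fitness, reverse=True)[:Ne]]
        # Step 1: select N - Ne parents with replacement (tournament of two)
        def select():
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        parents = [select() for _ in range(N - Ne)]
        # Step 2: single-point crossover on parent pairs with probability Pc
        offspring = []
        for i in range(0, len(parents) - 1, 2):
            p1, p2 = parents[i], parents[i + 1]
            if random.random() < Pc:
                cut = random.randint(1, n_bits - 1)
                offspring += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
            else:
                offspring += [p1[:], p2[:]]
        offspring += [p[:] for p in parents[len(offspring):]]   # unpaired leftover
        # Step 3: replace non-elites and mutate bits with probability Pm
        offspring = [[1 - b if random.random() < Pm else b for b in c]
                     for c in offspring]
        # Step 4: new population; loop until the generation budget is met
        pop = elites + offspring[:N - Ne]
    return max(pop, key=fitness)

best = basic_ga()
print(best, fitness(best))
```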
Machine Learning

Arthur Samuel, a pioneer in the field of artificial intelligence and
computer gaming, coined the term “Machine Learning” as
– “Field of study that gives computers the capability to learn
  without being explicitly programmed”.

How is it different from traditional programming?
 In traditional programming, we feed the input and the program logic,
  and run the program to get the output.
 In machine learning, we feed the input and the output during training,
  and the machine creates its own logic, which is then evaluated during
  testing.
Terminologies that one should know before starting
Machine Learning:

 Model: A model is a specific representation learned from data by
  applying some machine learning algorithm. A model is also
  called a hypothesis.

 Feature: A feature is an individual measurable property of our data. A
  set of numeric features can be conveniently described by a feature
  vector. Feature vectors are fed as input to the model. For example, in
  order to predict a fruit, there may be features like color, smell,
  taste, etc.

 Target (Label): A target variable or label is the value to be predicted
  by our model. For the fruit example discussed in the features section,
  the label for each set of inputs would be the name of the fruit, like
  apple, orange, banana, etc.

 Training: The idea is to give a set of inputs (features) and their expected
  outputs (labels), so that after training, we will have a model (hypothesis)
  that will then map new data to one of the categories it was trained on.
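As a small illustration of these terms, features and labels for the fruit example might be represented like this (the numeric values are invented):

```python
# Each row is a feature vector: [color score, smell score, taste score]
features = [
    [0.9, 0.2, 0.7],   # apple
    [0.1, 0.8, 0.9],   # banana
    [0.6, 0.5, 0.4],   # orange
]
labels = ["apple", "banana", "orange"]   # target (label) for each feature vector
```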
Supervised Learning: Supervised learning is when the model is
trained on a labelled dataset. A labelled dataset is one which has both input
and output parameters. In this type of learning, both the training and validation
datasets are labelled, as shown in the figures below.

[Figures: a labelled classification dataset and a labelled regression dataset]
Types of Supervised Learning:
• Classification
• Regression

Classification: A supervised learning task where the output has
defined (discrete) labels. For example, in the classification figure above,
the output “Purchased” has defined labels, i.e. 0 or 1; 1 means the
customer will purchase and 0 means the customer won’t purchase.
Classification can be either binary or multi-class.
In binary classification, the model predicts either 0 or 1 (yes or no), but
in multi-class classification, the model chooses among more than two
classes.
Example: Gmail classifies mail into more than two classes, like
social, promotions, updates, offers.
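A hedged sketch of binary classification for the “Purchased” example, using scikit-learn (assumed to be installed); the training data is invented:

```python
from sklearn.linear_model import LogisticRegression

# Features: [age, salary in thousands]; labels: 1 = will purchase, 0 = won't
X_train = [[25, 30], [40, 80], [35, 60], [22, 20], [50, 90]]
y_train = [0, 1, 1, 0, 1]

clf = LogisticRegression()
clf.fit(X_train, y_train)              # train on the labelled dataset
print(clf.predict([[30, 55]]))         # predicts 0 or 1 for a new customer
```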
Regression: A supervised learning task where the output has a
continuous value.
In the regression figure above, the output “Wind Speed” does not
have discrete values but is continuous within a particular range.
The goal is to predict a value as close to the actual output as our
model can, and evaluation is then done by calculating an error
value. The smaller the error, the greater the accuracy of our
regression model.
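A matching regression sketch, again with scikit-learn and invented data; the error value mentioned above is computed here as mean squared error:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Feature: temperature; continuous target: wind speed (values invented)
X_train = [[10], [15], [20], [25], [30]]
y_train = [3.1, 4.0, 5.2, 5.9, 7.1]

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_train)
print(mean_squared_error(y_train, y_pred))   # smaller error -> better model
```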
Supervised Learning Algorithms:

 Linear Regression

 Nearest Neighbor

 Gaussian Naive Bayes

 Decision Trees

 Support Vector Machine (SVM)

 Random Forest
Rewards

A reward Rt is a scalar feedback signal
Indicates how well the agent is doing at step t
The agent’s job is to maximise cumulative reward
Reinforcement learning is based on the reward hypothesis

Definition (Reward Hypothesis)
All goals can be described by the maximisation of
expected cumulative reward

Do you agree with this statement?
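A tiny sketch of what “cumulative reward” means in code (the reward sequence and the discount factor are invented; the discount factor γ reappears in the value-function slide later):

```python
def cumulative_reward(rewards, gamma=1.0):
    """Sum a sequence of scalar rewards, optionally discounted by gamma."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

print(cumulative_reward([-1, -1, -1, 10]))              # undiscounted: 7
print(cumulative_reward([-1, -1, -1, 10], gamma=0.9))   # discounted return
```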
Agent and Environment

[Figure: the agent-environment interaction loop, with observation Ot, action At, and reward Rt]

At each step t the agent:
  Executes action At
  Receives observation Ot
  Receives scalar reward Rt
The environment:
  Receives action At
  Emits observation Ot+1
  Emits scalar reward Rt+1
t increments at the environment step
History and State

The history is the sequence of observations, actions, rewards

    Ht = O1, R1, A1, ..., At−1, Ot, Rt

i.e. all observable variables up to time t
i.e. the sensorimotor stream of a robot or embodied agent
What happens next depends on the history:
  The agent selects actions
  The environment selects observations/rewards
State is the information used to determine what happens next
Formally, state is a function of the history:

    St = f(Ht)
Environment State

[Figure: the agent-environment loop, with the environment state Sᵉt shown inside the environment]

The environment state Sᵉt is the environment’s private representation,
i.e. whatever data the environment uses to pick the next observation/reward
The environment state is not usually visible to the agent
Even if Sᵉt is visible, it may contain irrelevant information
Agent State

[Figure: the agent-environment loop, with the agent state Sᵃt shown inside the agent]

The agent state Sᵃt is the agent’s internal representation,
i.e. whatever information the agent uses to pick the next action,
i.e. it is the information used by reinforcement learning algorithms
It can be any function of the history:

    Sᵃt = f(Ht)
Information State
An information state (a.k.a. Markov state) contains
all useful information from the history.
Definition
A state St is Markov if and only if

P[St+1 | St ] = P[St+1 | S1, ..., St ]

“The future is independent of the past given the


present”

H1:t → St → Ht+1:∞

Once the state is known, the history may be thrown away,
i.e. the state is a sufficient statistic of the future
The environment state Sᵉt is Markov
The history Ht is Markov
Major Components of an RL Agent

An RL agent may include one or more of these


components:
Policy: agent’s behaviour function
Value function: how good is each state and/or action
Model: agent’s representation of the environment
Policy

A policy is the agent’s behaviour


It is a map from state to action, e.g.
Deterministic policy: a = π(s)
Stochastic policy: π(a|s) = P[At = a | St = s]
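A hedged sketch of both policy types over a made-up state/action space:

```python
import random

# Deterministic policy: a = pi(s), a plain mapping from state to action
pi_det = {"s1": "left", "s2": "right"}

# Stochastic policy: pi(a|s) = P[At = a | St = s]
pi_stoch = {"s1": {"left": 0.8, "right": 0.2},
            "s2": {"left": 0.1, "right": 0.9}}

def sample_action(policy, state):
    """Sample an action from a stochastic policy."""
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

print(pi_det["s1"], sample_action(pi_stoch, "s1"))
```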
Value Function

Value function is a prediction of future reward


Used to evaluate the goodness/badness of states
And therefore to select between actions, e.g.

    vπ(s) = Eπ[ Rt+1 + γRt+2 + γ²Rt+3 + ... | St = s ]
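One way to read this formula in code: estimate vπ(s) by averaging the discounted returns of sampled episodes that start in s while following π (a Monte Carlo sketch; the episode rewards below are invented):

```python
def discounted_return(rewards, gamma=0.9):
    """G = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ..."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Reward sequences of episodes that started in state s under policy pi
episodes_from_s = [[-1, -1, 10], [-1, -1, -1, 10], [-1, 10]]
v_s = sum(discounted_return(ep) for ep in episodes_from_s) / len(episodes_from_s)
print(v_s)   # Monte Carlo estimate of v_pi(s)
```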


Model

A model predicts what the environment will


do next
P predicts the next state
R predicts the next (immediate) reward, e.g.

    Pᵃss′ = P[St+1 = s′ | St = s, At = a]
    Rᵃs = E[Rt+1 | St = s, At = a]
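A hedged sketch of a tabular model estimated from experience tuples (s, a, r, s′); counting observed transitions is one common choice, not the only one:

```python
from collections import defaultdict

transition_counts = defaultdict(lambda: defaultdict(int))   # (s, a) -> {s': count}
reward_sums = defaultdict(float)
visits = defaultdict(int)

def update_model(s, a, r, s_next):
    """Record one observed transition (s, a, r, s')."""
    transition_counts[(s, a)][s_next] += 1
    reward_sums[(s, a)] += r
    visits[(s, a)] += 1

def P(s, a, s_next):
    """Estimated P[S_{t+1} = s' | S_t = s, A_t = a]."""
    n = visits[(s, a)]
    return transition_counts[(s, a)][s_next] / n if n else 0.0

def R(s, a):
    """Estimated E[R_{t+1} | S_t = s, A_t = a]."""
    n = visits[(s, a)]
    return reward_sums[(s, a)] / n if n else 0.0

update_model("s1", "right", -1, "s2")
update_model("s1", "right", -1, "s2")
print(P("s1", "right", "s2"), R("s1", "right"))   # 1.0 -1.0
```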
Maze Example

[Figure: maze with Start and Goal cells]

Rewards: −1 per time-step
Actions: N, E, S, W
States: Agent’s location
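To make the maze setting concrete, here is a hedged sketch of a tiny grid environment with the same interface (reward −1 per time-step, actions N/E/S/W, state = agent’s location); the grid layout is invented, not the maze from the figure:

```python
# 0 = free cell, 1 = wall
GRID = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
GOAL = (2, 0)
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

def step(state, action):
    """Return (next_state, reward, done); every step costs -1."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] == 0:
        state = (nr, nc)        # legal move; otherwise stay in place
    return state, -1, state == GOAL

print(step((0, 0), "E"))   # ((0, 1), -1, False)
```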
Maze Example: Policy

[Figure: maze with an arrow in each cell showing the policy’s chosen action, from Start to Goal]

Arrows represent the policy π(s) for each state s

Maze Example: Value Function

[Figure: maze with the value vπ(s) written in each cell, ranging from −24 far from the Goal to −1 next to the Goal]

Numbers represent the value vπ(s) of each state s


Maze Example: Model

[Figure: the agent’s internal model of the maze, with an immediate reward of −1 written in each modelled cell]

The agent may have an internal model of the environment
Dynamics: how actions change the state
Rewards: how much reward comes from each state
The model may be imperfect
The grid layout represents the transition model Pᵃss′
The numbers represent the immediate reward Rᵃs from each state s (same for all actions a)
