
Game Programming

A. Avinash, Ph.D.,
Assistant Professor
School of Computer Science and Engineering (SCOPE)
Vellore Institute of Technology (VIT), Chennai
Introduction to Evolutionary Computation

• Evolutionary Computation is the field of study devoted to the design,
  development, and analysis of problem solvers based on natural
  selection (simulated evolution).
• Evolution has proven to be a powerful search process.
• Evolutionary Computation has been successfully applied to a wide
range of problems including:
– Aircraft Design,
– Routing in Communications Networks,
– Tracking Windshear,
– Game Playing (Checkers [Fogel])
Introduction to Evolutionary Computation
(Applications cont.)
• Robotics,
• Air Traffic Control,
• Design,
• Scheduling,
• Machine Learning,
• Pattern Recognition,
• Job Shop Scheduling,
• VLSI Circuit Layout,
• Strike Force Allocation,
Introduction to Evolutionary Computation
(Applications cont.)
• Theme Park Tours (Disney Land/World)
• Market Forecasting,
• Egg Price Forecasting,
• Design of Filters and Barriers,
• Data-Mining,
• User-Mining,
• Resource Allocation,
• Path Planning,
• Etc.
Background: Evolutionary Computation
• Evolutionary computation (EC) methods are based on
population of solutions
– Each iteration involves propagating all elements of the
population
– Each member of the population (“chromosome”) corresponds
  to one candidate solution (one value of the parameter vector being optimized)
• Genetic algorithms (GAs) are most popular form of EC
• Early work in 1950s and 1960s; influential 1975 book by
John Holland laid foundation for modern implementations
• Population-based structure well suited to parallel
processing
– But infeasible in some real-time applications
Background: EC (cont’d)
• Motivation for EC: Evolution seems to work well in nature… perhaps it can be used in optimization
• Three main types of EC
  – Genetic Algorithms
  – Evolution Strategies
  – Evolutionary Programming
• Many other types of EC exist (ant colony, particle swarm, differential evolution, etc.)
[Figure: prototype EC method: Initial Population → Selection → Reproduction → Mutation → Next Iteration (Generation)]
Standard GA Operations
• Selection is the mechanism by which the “parents” are
chosen for producing offspring to be passed into next
generation
• Elitism passes best chromosome(s) to next generation
intact
• Crossover takes parent-pairs from selection step and
creates offspring
• Mutation makes “slight” random modifications to some or
all of the offspring in next generation
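A hedged sketch (not from the slides) of the crossover operation on bit-string chromosomes; the crossover probability and the example parents are made up for illustration:

```python
import random

def single_point_crossover(parent1, parent2, pc=0.9):
    """With probability pc, swap the tails of two equal-length
    bit-string chromosomes at a randomly chosen splice point."""
    assert len(parent1) == len(parent2)
    if random.random() < pc:
        point = random.randint(1, len(parent1) - 1)    # splice point
        return (parent1[:point] + parent2[point:],
                parent2[:point] + parent1[point:])
    return parent1[:], parent2[:]                      # no crossover: copy parents

# Example: two 6-bit parents
print(single_point_crossover([0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]))
```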
Selection
• Parent selection methods based on probability of
selection being increasing function of fitness
• Roulette-wheel selection is common method
– Probability an individual is selected is equal to its fitness
divided by the total fitness in the population
• Problem: Selection probability highly dependent on
units and scaling for fitness function
• Rank selection and tournament selection methods
reduce sensitivity to choice of fitness function
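A minimal sketch of roulette-wheel selection, assuming non-negative fitness values (the population and fitnesses below are invented):

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    r = random.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= r:
            return individual
    return population[-1]   # guard against floating-point round-off

# Higher-fitness chromosomes are selected more often
pop, fits = ["A", "B", "C"], [1.0, 3.0, 6.0]
print([roulette_wheel_select(pop, fits) for _ in range(10)])
```

Rank and tournament selection replace the raw fitness values with ranks or pairwise comparisons, which removes the sensitivity to fitness scaling noted above.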
Mutation
• Mutation operator introduces spontaneous variability (as in random search algorithms)
• Mutation generally makes only small changes to solution
• Bit-based coding and real (floating point) coding require different type of mutation
– Bit-based mutation generally involves “flipping” bit(s)
– Real-based mutation often involves adding small (Monte Carlo) random vector to
chromosomes
• Example below shows mutation on one element in chromosome in bit-based coding:
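The slide’s original figure is not reproduced here; as a substitute, a hedged sketch of both mutation styles (bit-flip for bit-based coding, a small Gaussian perturbation for real coding; the mutation rates are arbitrary):

```python
import random

def bit_flip_mutation(chromosome, pm=0.01):
    """Flip each bit independently with a small probability pm."""
    return [1 - bit if random.random() < pm else bit for bit in chromosome]

def real_valued_mutation(chromosome, sigma=0.1):
    """Add a small Gaussian (Monte Carlo) perturbation to each gene."""
    return [gene + random.gauss(0.0, sigma) for gene in chromosome]

print(bit_flip_mutation([1, 0, 1, 1, 0, 0], pm=0.2))
print(real_valued_mutation([0.5, -1.2, 3.0]))
```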
Essential Steps of Basic GA
(Noise-Free Measurements)
Step 0 (initialization) Randomly generate initial population
of N (say) chromosomes and evaluate fitness function.
Step 1 (parent selection) Set Ne = 0 if elitism strategy is not
used; 0 < Ne < N otherwise. Select with replacement
N − Ne parents from the full population.

Step 2 (crossover) For each pair of parents identified in step
1, perform crossover on the parents at a randomly chosen
splice point (or points if using multi-point crossover) with
probability Pc.

Essential Steps of GA (cont’d)

Step 3 (replacement and mutation) Replace the non-elite
N − Ne chromosomes with the current population of
offspring from step 2. Perform mutation on the bits with a
small probability Pm.

Step 4 (fitness and end test) Compute the fitness values for
the new population of N chromosomes. Terminate the
algorithm if stopping criterion or budget of fitness function
evaluations is met; else return to step 1.
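Putting Steps 0–4 together, here is a hedged end-to-end sketch of the basic GA. It maximises a toy count-the-ones fitness (not from the slides) and uses tournament selection for simplicity; elitism, crossover, and mutation follow the steps above:

```python
import random

def fitness(chrom):
    """Toy fitness to maximise: the number of 1-bits."""
    return sum(chrom)

def basic_ga(n_bits=20, N=30, Ne=2, Pc=0.9, Pm=0.01, generations=50):
    # Step 0: random initial population of N chromosomes
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(N)]
    for _ in range(generations):
        elites = [c[:] for c in sorted(pop, key=fitness, reverse=True)[:Ne]]
        # Step 1: select N - Ne parents with replacement (tournament of two)
        def select():
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        parents = [select() for _ in range(N - Ne)]
        # Step 2: single-point crossover on parent pairs with probability Pc
        offspring = []
        for i in range(0, len(parents) - 1, 2):
            p1, p2 = parents[i], parents[i + 1]
            if random.random() < Pc:
                cut = random.randint(1, n_bits - 1)
                offspring += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
            else:
                offspring += [p1[:], p2[:]]
        offspring += [p[:] for p in parents[len(offspring):]]   # unpaired leftover
        # Step 3: replace non-elites and mutate bits with probability Pm
        offspring = [[1 - b if random.random() < Pm else b for b in c]
                     for c in offspring]
        # Step 4: new population; loop until the generation budget is met
        pop = elites + offspring[:N - Ne]
    return max(pop, key=fitness)

best = basic_ga()
print(best, fitness(best))
```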
Machine Learning

Arthur Samuel, a pioneer in the field of artificial intelligence and
computer gaming, coined the term “Machine Learning” as
– “Field of study that gives computers the capability to learn
  without being explicitly programmed”.

How is it different from traditional programming?
 In traditional programming, we feed the input and the program logic,
  and run the program to get the output.
 In machine learning, we feed the input and the output during training,
  and the machine creates its own logic, which is then evaluated during
  testing.
Terminologies that one should know before starting
Machine Learning:

 Model: A model is a specific representation learned from data by
  applying some machine learning algorithm. A model is also
  called a hypothesis.

 Feature: A feature is an individual measurable property of our data. A
  set of numeric features can be conveniently described by a feature
  vector. Feature vectors are fed as input to the model. For example, in
  order to predict a fruit, there may be features like color, smell,
  taste, etc.

 Target (Label): A target variable or label is the value to be predicted
  by our model. For the fruit example discussed in the features section,
  the label for each set of inputs would be the name of the fruit, like
  apple, orange, banana, etc.

 Training: The idea is to give a set of inputs (features) and their expected
  outputs (labels), so that after training, we will have a model (hypothesis)
  that will then map new data to one of the categories it was trained on.
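As a small illustration of these terms, features and labels for the fruit example might be represented like this (the numeric values are invented):

```python
# Each row is a feature vector: [color score, smell score, taste score]
features = [
    [0.9, 0.2, 0.7],   # apple
    [0.1, 0.8, 0.9],   # banana
    [0.6, 0.5, 0.4],   # orange
]
labels = ["apple", "banana", "orange"]   # target (label) for each feature vector
```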
Supervised Learning: Supervised learning is when the model is
trained on a labelled dataset. A labelled dataset is one which has both input
and output parameters. In this type of learning, both the training and validation
datasets are labelled, as shown in the figures below.

[Figures: a labelled classification dataset and a labelled regression dataset]
Types of Supervised Learning:
• Classification
• Regression

Classification: A supervised learning task where the output has
defined (discrete) labels. For example, in the classification figure above,
the output “Purchased” has defined labels, i.e. 0 or 1; 1 means the
customer will purchase and 0 means the customer won’t purchase.
Classification can be either binary or multi-class.
In binary classification, the model predicts either 0 or 1 (yes or no), but
in multi-class classification, the model chooses among more than two
classes.
Example: Gmail classifies mail into more than two classes, like
social, promotions, updates, offers.
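A hedged sketch of binary classification for the “Purchased” example, using scikit-learn (assumed to be installed); the training data is invented:

```python
from sklearn.linear_model import LogisticRegression

# Features: [age, salary in thousands]; labels: 1 = will purchase, 0 = won't
X_train = [[25, 30], [40, 80], [35, 60], [22, 20], [50, 90]]
y_train = [0, 1, 1, 0, 1]

clf = LogisticRegression()
clf.fit(X_train, y_train)              # train on the labelled dataset
print(clf.predict([[30, 55]]))         # predicts 0 or 1 for a new customer
```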
Regression: A supervised learning task where the output has a
continuous value.
In the regression figure above, the output “Wind Speed” does not
have discrete values but is continuous within a particular range.
The goal is to predict a value as close to the actual output as our
model can, and evaluation is then done by calculating an error
value. The smaller the error, the greater the accuracy of our
regression model.
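A matching regression sketch, again with scikit-learn and invented data; the error value mentioned above is computed here as mean squared error:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Feature: temperature; continuous target: wind speed (values invented)
X_train = [[10], [15], [20], [25], [30]]
y_train = [3.1, 4.0, 5.2, 5.9, 7.1]

reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_train)
print(mean_squared_error(y_train, y_pred))   # smaller error -> better model
```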
Supervised Learning Algorithms:

 Linear Regression

 Nearest Neighbor

 Gaussian Naive Bayes

 Decision Trees

 Support Vector Machine (SVM)

 Random Forest
Rewards

A reward Rt is a scalar feedback signal
Indicates how well the agent is doing at step t
The agent’s job is to maximise cumulative reward
Reinforcement learning is based on the reward hypothesis

Definition (Reward Hypothesis)
All goals can be described by the maximisation of
expected cumulative reward

Do you agree with this statement?
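A tiny sketch of what “cumulative reward” means in code (the reward sequence and the discount factor are invented; the discount factor γ reappears in the value-function slide later):

```python
def cumulative_reward(rewards, gamma=1.0):
    """Sum a sequence of scalar rewards, optionally discounted by gamma."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

print(cumulative_reward([-1, -1, -1, 10]))              # undiscounted: 7
print(cumulative_reward([-1, -1, -1, 10], gamma=0.9))   # discounted return
```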
Agent and Environment

[Figure: the agent-environment interaction loop, with observation Ot, action At, and reward Rt]

At each step t the agent:
  Executes action At
  Receives observation Ot
  Receives scalar reward Rt
The environment:
  Receives action At
  Emits observation Ot+1
  Emits scalar reward Rt+1
t increments at the environment step
History and State

The history is the sequence of observations, actions, rewards

    Ht = O1, R1, A1, ..., At−1, Ot, Rt

i.e. all observable variables up to time t
i.e. the sensorimotor stream of a robot or embodied agent
What happens next depends on the history:
  The agent selects actions
  The environment selects observations/rewards
State is the information used to determine what happens next
Formally, state is a function of the history:

    St = f(Ht)
Environment State

[Figure: the agent-environment loop, with the environment state Sᵉt shown inside the environment]

The environment state Sᵉt is the environment’s private representation,
i.e. whatever data the environment uses to pick the next observation/reward
The environment state is not usually visible to the agent
Even if Sᵉt is visible, it may contain irrelevant information
Agent State

[Figure: the agent-environment loop, with the agent state Sᵃt shown inside the agent]

The agent state Sᵃt is the agent’s internal representation,
i.e. whatever information the agent uses to pick the next action,
i.e. it is the information used by reinforcement learning algorithms
It can be any function of the history:

    Sᵃt = f(Ht)
Information State
An information state (a.k.a. Markov state) contains
all useful information from the history.
Definition
A state St is Markov if and only if

P[St+1 | St ] = P[St+1 | S1, ..., St ]

“The future is independent of the past given the


present”

H1:t → St → Ht+1:∞

Once the state is known, the history may be thrown away,
i.e. the state is a sufficient statistic of the future
The environment state Sᵉt is Markov
The history Ht is Markov
Major Components of an RL Agent

An RL agent may include one or more of these


components:
Policy: agent’s behaviour function
Value function: how good is each state and/or action
Model: agent’s representation of the environment
Policy

A policy is the agent’s behaviour


It is a map from state to action, e.g.
Deterministic policy: a = π(s)
Stochastic policy: π(a|s) = P[At = a | St = s]
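A hedged sketch of both policy types over a made-up state/action space:

```python
import random

# Deterministic policy: a = pi(s), a plain mapping from state to action
pi_det = {"s1": "left", "s2": "right"}

# Stochastic policy: pi(a|s) = P[At = a | St = s]
pi_stoch = {"s1": {"left": 0.8, "right": 0.2},
            "s2": {"left": 0.1, "right": 0.9}}

def sample_action(policy, state):
    """Sample an action from a stochastic policy."""
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

print(pi_det["s1"], sample_action(pi_stoch, "s1"))
```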
Value Function

Value function is a prediction of future reward


Used to evaluate the goodness/badness of states
And therefore to select between actions, e.g.

    vπ(s) = Eπ[ Rt+1 + γRt+2 + γ²Rt+3 + ... | St = s ]
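One way to read this formula in code: estimate vπ(s) by averaging the discounted returns of sampled episodes that start in s while following π (a Monte Carlo sketch; the episode rewards below are invented):

```python
def discounted_return(rewards, gamma=0.9):
    """G = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ..."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Reward sequences of episodes that started in state s under policy pi
episodes_from_s = [[-1, -1, 10], [-1, -1, -1, 10], [-1, 10]]
v_s = sum(discounted_return(ep) for ep in episodes_from_s) / len(episodes_from_s)
print(v_s)   # Monte Carlo estimate of v_pi(s)
```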


Model

A model predicts what the environment will


do next
P predicts the next state
R predicts the next (immediate) reward, e.g.

    Pᵃss′ = P[St+1 = s′ | St = s, At = a]
    Rᵃs = E[Rt+1 | St = s, At = a]
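A hedged sketch of a tabular model estimated from experience tuples (s, a, r, s′); counting observed transitions is one common choice, not the only one:

```python
from collections import defaultdict

transition_counts = defaultdict(lambda: defaultdict(int))   # (s, a) -> {s': count}
reward_sums = defaultdict(float)
visits = defaultdict(int)

def update_model(s, a, r, s_next):
    """Record one observed transition (s, a, r, s')."""
    transition_counts[(s, a)][s_next] += 1
    reward_sums[(s, a)] += r
    visits[(s, a)] += 1

def P(s, a, s_next):
    """Estimated P[S_{t+1} = s' | S_t = s, A_t = a]."""
    n = visits[(s, a)]
    return transition_counts[(s, a)][s_next] / n if n else 0.0

def R(s, a):
    """Estimated E[R_{t+1} | S_t = s, A_t = a]."""
    n = visits[(s, a)]
    return reward_sums[(s, a)] / n if n else 0.0

update_model("s1", "right", -1, "s2")
update_model("s1", "right", -1, "s2")
print(P("s1", "right", "s2"), R("s1", "right"))   # 1.0 -1.0
```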
Maze Example

[Figure: maze with Start and Goal cells]

Rewards: −1 per time-step
Actions: N, E, S, W
States: Agent’s location
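To make the maze setting concrete, here is a hedged sketch of a tiny grid environment with the same interface (reward −1 per time-step, actions N/E/S/W, state = agent’s location); the grid layout is invented, not the maze from the figure:

```python
# 0 = free cell, 1 = wall
GRID = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
GOAL = (2, 0)
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

def step(state, action):
    """Return (next_state, reward, done); every step costs -1."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] == 0:
        state = (nr, nc)        # legal move; otherwise stay in place
    return state, -1, state == GOAL

print(step((0, 0), "E"))   # ((0, 1), -1, False)
```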
Maze Example: Policy

[Figure: maze with an arrow in each cell showing the policy’s chosen action, from Start to Goal]

Arrows represent the policy π(s) for each state s

Maze Example: Value Function

[Figure: maze with the value vπ(s) written in each cell, ranging from −24 far from the Goal to −1 next to the Goal]

Numbers represent the value vπ(s) of each state s


Maze Example: Model

[Figure: the agent’s internal model of the maze, with an immediate reward of −1 written in each modelled cell]

The agent may have an internal model of the environment
Dynamics: how actions change the state
Rewards: how much reward comes from each state
The model may be imperfect
The grid layout represents the transition model Pᵃss′
The numbers represent the immediate reward Rᵃs from each state s (same for all actions a)
