
MACHINE LEARNING (CS30110)

6TH SEM CSE

Dr. Rojalina Priyadarshini


H.O.D, Comp.Sc. & Engg.
Mail: [email protected]
7008730761, 9437937546
Module-5 Overview

Topics covered: Ensemble Learning, Bagging, Boosting, One-class Classifier, Q-Learning
ENSEMBLE LEARNING

▪ Ensemble methods are machine learning techniques that combine several base models in order to produce one optimal predictive model.
▪ Multiple models are combined to improve the overall
performance of the predictive system.
▪ In learning models, noise, variance, and bias are the major
sources of error.
▪ The idea behind ensemble learning is that by combining the
predictions of multiple models, the resulting ensemble model
can achieve better accuracy and robustness than any of the
individual models.
SIMPLE ENSEMBLE LEARNING

1) Max Voting:

▪ The max voting method is generally used for classification problems.


▪ In this technique, multiple models are used to make predictions for each data point.
▪ The predictions by each model are considered as a ‘vote’. The predictions which we
get from the majority of the models are used as the final prediction.
▪ Example: suppose you ask 5 of your colleagues to rate your movie (out of 5); assume three of them rate it 4 while two give it a 5. Since the majority gave a rating of 4, the final rating is taken as 4. You can consider this as taking the mode of all the predictions.
Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
5             4             5             4             4             4
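As a rough illustration (a minimal sketch in Python, reusing the ratings from the table above), max voting simply takes the mode of the individual predictions:

```python
from statistics import mode

# Predictions ("votes") from five models/colleagues for one data point
votes = [5, 4, 5, 4, 4]

# Max voting: the most frequent prediction is the final prediction
final_prediction = mode(votes)
print(final_prediction)  # 4
```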
SIMPLE ENSEMBLE LEARNING

2) Averaging: In this method, we take an average of predictions from all the models and
use it to make the final prediction. Averaging can be used for making predictions in
regression problems or while calculating probabilities for classification problems.

For example, in the below case, the averaging method would take the average of all the
values.
i.e. (5+4+5+4+4)/5 = 4.4

Colleague 1 Colleague 2 Colleague 3 Colleague 4 Colleague 5 Final rating

5 4 5 4 4 4.4
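A minimal sketch of the same calculation in Python (illustrative only):

```python
# Predictions from five models/colleagues for one data point
ratings = [5, 4, 5, 4, 4]

# Averaging: the mean of all predictions is the final prediction
final_prediction = sum(ratings) / len(ratings)
print(final_prediction)  # 4.4
```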
SIMPLE ENSEMBLE LEARNING

3) Weighted Averaging: This is an extension of the averaging method. All models are
assigned different weights defining the importance of each model for prediction.

For instance, if two of the colleagues are critics, while others have no prior experience in
this field, then the answers by these two friends are given more importance as
compared to the other people.

The result is calculated as [(5*0.23) + (4*0.23) + (5*0.18) + (4*0.18) + (4*0.18)] = 4.41.

         Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
weight   0.23          0.23          0.18          0.18          0.18
rating   5             4             5             4             4             4.41
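A minimal sketch of the weighted-average calculation in Python (the weights are the illustrative ones from the table above):

```python
ratings = [5, 4, 5, 4, 4]
weights = [0.23, 0.23, 0.18, 0.18, 0.18]   # higher weight for the two critics

# Weighted averaging: each prediction is scaled by its model's weight
final_prediction = sum(w * r for w, r in zip(weights, ratings))
print(round(final_prediction, 2))  # 4.41
```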
ADVANCED ENSEMBLE TECHNIQUES:
BAGGING
1) Bagging: The idea behind bagging is to combine the results of multiple models to get a generalized result. Here’s a question:
▪ If you create all the models on the same set of data and combine them, there is a high chance that these models will give the same result. To solve this, the bootstrapping technique is used.
▪ Bootstrapping is a sampling technique in which we create
subsets of observations from the original dataset, with
replacement. The size of the subsets is the same as the
size of the original set.
▪ Bagging (or Bootstrap Aggregating) technique uses these
subsets (bags) to get a fair idea of the distribution
(complete set). The size of subsets created for bagging may
be less than the original set.
BOOTSTRAPPING

[Figure: simple random sampling vs. bootstrap sampling (with replacement)]
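A minimal sketch of bootstrap sampling with NumPy (the toy dataset is an assumption, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
data = np.array([10, 20, 30, 40, 50])   # toy "original dataset"

# Bootstrapping: sample with replacement, same size as the original set;
# some observations repeat while others are left out of the bag
bootstrap_sample = rng.choice(data, size=len(data), replace=True)
print(bootstrap_sample)
```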
TYPES OF ENSEMBLE TECHNIQUES:
BAGGING
1. Multiple subsets are created from the original dataset,
selecting observations with replacement.
2. A base model (weak model) is created on each of these
subsets.
3. The models run in parallel and are independent of each
other.
4. The final predictions are determined by combining the
predictions from all the models.
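A minimal sketch of these four steps using scikit-learn's BaggingClassifier; the synthetic dataset and parameter values are assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data, purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagging: 50 base models (decision trees by default) are trained in parallel,
# each on a bootstrapped subset, and their predictions are combined by voting
bagging = BaggingClassifier(
    n_estimators=50,    # number of bootstrapped subsets / base models
    max_samples=1.0,    # each bag has the same size as the training set
    bootstrap=True,     # sample with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)
print("test accuracy:", bagging.score(X_test, y_test))
```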
ADVANCED ENSEMBLE LEARNING:
BOOSTING ALGORITHM
Question: If a data point is incorrectly predicted by the first model, and then by the next (probably all models), will combining the predictions provide better results?

Such situations are taken care of by boosting.

Boosting:
▪ Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model.
▪ The succeeding models are dependent on the previous model.

Common Types:
1. AdaBoost (Adaptive Boosting)
2. Gradient Tree Boosting
3. XGBoost

Steps:
1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights.
3. A base model is created on this subset.
4. This model is used to make predictions on the whole dataset. Errors are calculated using the actual and predicted values.
5. The observations which are incorrectly predicted are given higher weights. Another model is then created and predictions are made on the dataset.
6. Similarly, multiple models are created, each correcting the errors of the previous model.
7. The final model (strong learner) is the weighted mean of all the models (weak learners).
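A minimal sketch of boosting using scikit-learn's AdaBoostClassifier (the dataset and parameter values are assumptions for illustration; AdaBoost fits shallow trees sequentially and up-weights the examples the previous trees misclassified):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data, purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost: decision stumps are fit one after another; each new stump gives
# higher weight to the observations the previous stumps predicted incorrectly
boosting = AdaBoostClassifier(n_estimators=100, learning_rate=1.0, random_state=42)
boosting.fit(X_train, y_train)
print("test accuracy:", boosting.score(X_test, y_test))
```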
ADVANCED ENSEMBLE LEARNING:
BOOSTING ALGORITHM
Box 1:
▪ Equal weights are assigned to each data point, and a decision boundary is applied to classify them as + (plus) or – (minus).
▪ The first decision boundary (D1) generates a vertical line on the left side to classify the data points.
▪ This vertical line has incorrectly predicted three + (plus) as – (minus).
▪ In such a case, we assign higher weights to these three + (plus) and apply another decision boundary.

Box 2:
▪ The size of the three incorrectly predicted + (plus) is bigger compared to the rest of the data points.
▪ In this case, the second decision boundary (D2) will try to predict them correctly.
▪ Now, a vertical line (D2) on the right side of this box has classified the three misclassified + (plus) correctly.
▪ But again, it has caused misclassification errors, this time with three – (minus). Again, we assign higher weights to the three – (minus) and apply another decision stump.
ADVANCED ENSEMBLE LEARNING:
BOOSTING ALGORITHM
Box 3:

▪ Here, the three – (minus) are given higher weights.
▪ A decision boundary (D3) is applied to predict these misclassified observations correctly.
▪ This time a horizontal line is generated to classify the + (plus) and – (minus) based on the higher weights of the misclassified observations.

Box 4:
▪ Here, the decision boundaries D1, D2 and D3 are combined to form a strong prediction with a more complex rule than any individual weak learner.
▪ Finally, it can be observed that this combined model classifies the observations quite well compared to any individual weak learner.
Reinforcement Learning Revisited

Q-Learning is a type of reinforcement learning


Terms used in Reinforcement Learning

▪ Agent: An entity that can perceive/explore the environment and act upon it.

▪ Environment: The situation in which the agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.

▪ Action: Actions are the moves taken by the agent within the environment.

▪ State: The situation returned by the environment after each action taken by the agent.

▪ Reward: Feedback returned to the agent by the environment to evaluate the agent’s action.

▪ Policy: A strategy applied by the agent to decide the next action based on the current state.
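A minimal sketch of how these terms fit together in the agent–environment interaction loop; the toy environment, its states, and its rewards are made up purely for illustration:

```python
import random

class ToyEnvironment:
    """Toy environment: states 0..4 on a line, with the goal at state 4."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):                      # action: -1 (left) or +1 (right)
        self.state = max(0, min(4, self.state + action))
        reward = 10 if self.state == 4 else -1   # reward returned by the environment
        done = self.state == 4
        return self.state, reward, done

env = ToyEnvironment()
state = env.reset()
done = False
while not done:
    action = random.choice([-1, +1])         # a (very naive) policy picks the next action
    state, reward, done = env.step(action)   # the environment returns a new state and reward
    print(f"action={action:+d}, state={state}, reward={reward}")
```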
Q-Learning

Real-World Examples Of Modeling A Reinforcement Learning Task


Determining the Placement of Ads on a Web Page

Agent: The program making decisions on how many ads are appropriate for a page.
Environment: The web page.

Action: One of three: (1) putting another ad on the page; (2) dropping an ad from
the page; (3) neither adding nor removing.

Reward: Positive when revenue increases; negative when revenue drops.


In this scenario, the agent observes the environment and gets its current status. The
status can be how many ads there are on the web page and whether or not there is
room for more.
The agent then chooses which of the three actions to take at each step. If it is programmed to get positive rewards whenever revenue increases, and negative rewards whenever revenue falls, it can develop an effective policy.
Q-Learning

Real-World Examples Of Modeling A Reinforcement Learning Task


Controlling A Walking Robot

Agent: The program controlling a walking robot.

Environment: The real world.

Action: One out of four moves (1) forward; (2) backward; (3) left; and (4) right.

Reward: Positive when it approaches the target destination; negative when it


wastes time, goes in the wrong direction or falls down.
In this example, a robot can teach itself to move more effectively by adapting its policy based on the rewards it receives.
Markov Decision Process

▪ A Markov Decision Process, or MDP, is used to formalize reinforcement learning problems.
▪ The dynamics of an environment can be modeled as
a Markov Process.
▪ In MDP, the agent constantly interacts with the environment
and performs actions;
▪ at each action, the environment responds and generates a
new state.

MDP contains a tuple of four elements (S, A, Pa, Ra):

• A finite set of states S
• A finite set of actions A
• Ra: the reward received after transitioning from state S to state S' due to action a
• Pa: the probability of transitioning from state S to state S' due to action a

MDP uses the Markov property, and to better understand MDP, we need to learn about it.
Markov Property

It says: "If the agent is present in the current state s1, performs an action a1 and moves to state s2, then the state transition from s1 to s2 depends only on the current state; future actions and states do not depend on past actions, rewards, or states."
In other words, under the Markov property the current state transition does not depend on any past action or state. Hence, an MDP is an RL problem that satisfies the Markov property.
Example: In a chess game, the players focus only on the current state and do not need to remember past actions or states.
Finite MDP:
A finite MDP is one with finite states, finite rewards, and finite actions. In RL, we consider only finite MDPs.
Markov Process:
A Markov process is a memoryless process with a sequence of random states S1, S2, ..., St that satisfies the Markov property.
A Markov process is also known as a Markov chain, which is a tuple (S, P) of a state set S and a transition function P. These two components (S and P) can define the dynamics of the system.
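A minimal sketch of the (S, A, Pa, Ra) tuple written out as plain Python data structures; the two-state example is an assumption, just to make the pieces concrete:

```python
# A finite set of states S and a finite set of actions A
S = ["s0", "s1"]
A = ["stay", "move"]

# Pa: transition probabilities, P[(s, a)] -> {s': probability of reaching s'}
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "move"): {"s1": 0.9, "s0": 0.1},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s0": 0.9, "s1": 0.1},
}

# Ra: reward received after transitioning from s to s' due to action a
R = {
    ("s0", "move", "s1"): 5.0,
    ("s1", "move", "s0"): 0.0,
}

# Markov property: the next-state distribution depends only on (s, a)
print(P[("s0", "move")])
```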
Q-Learning

Example: A robot has to cross a maze and reach the end point. There are mines, and the robot can move only one tile at a time. If the robot steps onto a mine, the robot is dead. The robot has to reach the end point in the shortest time possible.

The scoring/reward system is as below:

▪ The robot loses 1 point at each step. This is done so that the robot
takes the shortest path and reaches the goal as fast as possible.
▪ If the robot steps on a mine, the point loss is 100 and the game ends.
▪ If the robot gets power ⚡️, it gains 1 point.
▪ If the robot reaches the end goal, the robot gets 100 points.
Q-Learning

Training of Robot:

Step-1 Create a Q-Table

▪ A Q-table is a simple lookup table where we calculate the maximum expected future reward for each action at each state. Basically, this table will guide us to the best action at each state.

▪ There are four possible actions at each non-edge tile: when the robot is at a state, it can move up, down, left, or right.

▪ In the Q-table, the columns are the actions and the rows are the states.

▪ Each Q-table score will be the maximum expected future reward that the robot will get if it takes that action at that state. Filling in the table is an iterative process.
Q-Learning

To learn each value of the Q-table, we use the Q-Learning algorithm.


Mathematics of the Q-Learning algorithm:

Q-function

The Q-function uses the Bellman equation and takes two inputs:
state (s) and action (a).
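The update rule that appears on the next slides can be written, in the same notation, as:

\[
Q(s, a) \;=\; r \;+\; \gamma \,\max_{a'} Q(s', a')
\]

where r is the immediate reward received for taking action a in state s, s' is the resulting new state, and γ (gamma) is the discount factor.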
Q-Learning

Q-Learning Algorithm

1. For each s, a initialize the table entry Q(s, a) to zero
2. Observe the current state s
3. Do:
   ▪ Select an action a and execute it
   ▪ Receive immediate reward r
   ▪ Observe the new state s'
   ▪ Update the table entry Q(s, a) as follows:
        Q(s, a) = r + γ max_a' Q(s', a')
   ▪ s ← s'
Q-learning algorithm process:
Step-1: Initialization of the Q-Table:

There are n columns, where n = the number of actions, and m rows, where m = the number of states. We initialize all the values to 0.
Q-Learning

Q-learning algorithm process:


Step-2: Choose and perform an action (iterative process)
[Exploration and exploitation]: the epsilon-greedy strategy

Exploration: Initially the epsilon rate is high. The robot will explore the environment and randomly choose actions. The logic behind this is that the robot does not yet know anything about the environment.

Exploitation: As the robot explores the environment, the epsilon rate decreases and the robot starts to exploit the environment.

There are four actions to choose from: up, down, left, and right.
Initially the robot chooses a random action: Right.
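A minimal sketch of the epsilon-greedy choice described above; the decay schedule and the four-action assumption are illustrative:

```python
import numpy as np

def epsilon_greedy_action(Q, state, epsilon, n_actions=4):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if np.random.random() < epsilon:
        return np.random.randint(n_actions)   # exploration: random move (up/down/left/right)
    return int(np.argmax(Q[state]))           # exploitation: best Q-value so far

# Typical schedule: explore a lot at first, exploit more as training proceeds
epsilon = 1.0
for episode in range(1000):
    # ... run one episode, choosing actions with epsilon_greedy_action(Q, state, epsilon) ...
    epsilon = max(0.05, epsilon * 0.995)      # gradually reduce exploration
```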
Q-Learning

To update the Q-table, the Bellman equation is used.

The key elements are:
▪ The action performed by the agent is referred to as “a”
▪ The state in which the action occurs is “s”
▪ The reward/feedback obtained for each action is “R”
▪ The discount factor is γ

Q(s, a) = r + γ * max_a' Q(s', a')
Q(state, action) = R(state, action) + gamma * max(Q values of the next state over all actions)

Q-Learning Algorithm
1. For each (s, a) initialize the table entry Q(s, a) to zero
2. Observe the current state s
3. Do:
   ▪ Select an action a and execute it
   ▪ Receive immediate reward r
   ▪ Observe the new state s'
   ▪ Update the table entry Q(s, a) as follows:
        Q(s, a) = r + γ max_a' Q(s', a')
   ▪ s ← s'
Problem Solving on Q-Learning

▪ Suppose we have five rooms in a building, numbered 0 to 4, which are connected by doors. The area outside the building is treated as one big room, numbered 5. The doors of rooms 1 and 4 lead outside the building, i.e., into room 5.
▪ The rooms are represented as a graph: each room is a node and each door is a link.
Problem Solving on Q-Learning

▪ Goal: The goal is room number 5.
▪ Reward: Doors that lead immediately to the goal have an instant reward of 100; doors not directly connected to the target room have a reward of zero.
Problem Solving on Q-Learning

▪ The goal is room number 5.
▪ Doors that lead immediately to the goal have an instant reward of 100; doors not directly connected to the target room have a reward of zero.
▪ Generate the reward matrix, where ‘-1’ represents a null value (no link present between the rooms).
Problem Solving on Q-Learning

▪ Set the discount factor gamma = 0.8 and assume the initial state is Room 1.
▪ Initialize the Q matrix to zero.
Problem Solving on Q-Learning

▪ Set the discount factor gamma = 0.8 and consider the initial state as Room 1.
▪ Initialize the Q matrix to zero.
▪ Suppose our agent’s current state = 1 and it moves to next state = 5. From state 5, three possible actions can be taken (go to 1, 4 or 5).
▪ Q(state, action) = R(state, action) + gamma * max(Q values of the next state over all actions)

▪ We have reached the goal state, which ends one iteration.
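As a worked example (assuming the reward matrix above gives R(1,5) = 100, since the door of room 1 leads directly to the goal, and the Q matrix is still all zeros):

\[
Q(1,5) = R(1,5) + 0.8 \times \max\big[Q(5,1),\, Q(5,4),\, Q(5,5)\big] = 100 + 0.8 \times 0 = 100
\]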


Problem Solving on Q-Learning

▪ Repeat the same for every initial state and update the Q-table.
▪ Let’s consider the current state as 3.
▪ The possible actions are [1, 2, 4].
▪ Let’s choose action 1 from state 3 (current state = 3, next state = 1).
▪ The possible actions from the next state are [3, 5].
▪ Compute the Q-value.
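A possible worked calculation (assuming R(3,1) = 0, since this door does not lead directly to the goal, and Q(1,5) = 100 from the earlier episode):

\[
Q(3,1) = R(3,1) + 0.8 \times \max\big[Q(1,3),\, Q(1,5)\big] = 0 + 0.8 \times 100 = 80
\]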
Problem Solving on Q-Learning

▪ By considering each of the initial states and iterating over all of them, we get the updated Q-table.
▪ The best sequence is formed by following the links with the highest values at each state.
