ML Module 5 2022 PDF
Topics: Ensemble Learning, Boosting, Q-Learning
SIMPLE ENSEMBLE LEARNING
1) Max Voting: In this method, multiple models make a prediction for each data point, and each prediction is counted as a vote. The prediction given by the majority of the models is used as the final prediction. It is generally used for classification problems.
For example, five colleagues rate a movie as 5, 4, 5, 4 and 4; the majority rating is 4, so the final prediction is 4.
Ratings: 5, 4, 5, 4, 4 → Final rating: 4
2) Averaging: In this method, we take an average of predictions from all the models and
use it to make the final prediction. Averaging can be used for making predictions in
regression problems or while calculating probabilities for classification problems.
For example, in the below case, the averaging method would take the average of all the
values.
i.e. (5+4+5+4+4)/5 = 4.4
Ratings: 5, 4, 5, 4, 4 → Final rating (average): 4.4
3) Weighted Averaging: This is an extension of the averaging method. All models are
assigned different weights defining the importance of each model for prediction.
For instance, if two of the colleagues are critics while the others have no prior experience in this field, then the answers given by these two colleagues are given more importance than those of the other people.
Ratings: 5, 4, 5, 4, 4 → Final rating (weighted average): 4.41
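A minimal sketch of these three techniques in Python is shown below; the ratings are taken from the example above, while the weights (0.23 for the two critics, 0.18 for the rest) are an assumption chosen so that the weighted average reproduces the 4.41 on the slide.

# Minimal sketch of the three simple ensemble techniques.
# The weights are assumed values chosen to reproduce 4, 4.4 and 4.41 above.
import numpy as np
from statistics import mode

ratings = [5, 4, 5, 4, 4]                    # predictions from five "models" (colleagues)

# 1) Max voting: the most frequent prediction wins
max_vote = mode(ratings)                     # -> 4

# 2) Averaging: simple mean of all predictions
average = np.mean(ratings)                   # -> (5 + 4 + 5 + 4 + 4) / 5 = 4.4

# 3) Weighted averaging: the two critics get a higher (assumed) weight
weights = [0.23, 0.23, 0.18, 0.18, 0.18]
weighted_average = np.dot(weights, ratings)  # -> 4.41

print(max_vote, average, weighted_average)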
ADVANCED ENSEMBLE TECHNIQUES:
BAGGING
1) Bagging: The idea behind bagging is combining the results
of multiple models to get a generalized result. Here’s a
question:
▪ If you create all the models on the same set of data and combine them, there is a high chance that these models will give the same result, since they receive the same input. To solve this, the bootstrapping technique is used.
▪ Bootstrapping is a sampling technique in which we create
subsets of observations from the original dataset, with
replacement. The size of the subsets is the same as the
size of the original set.
▪ Bagging (or Bootstrap Aggregating) technique uses these
subsets (bags) to get a fair idea of the distribution
(complete set). The size of subsets created for bagging may
be less than the original set.
BOOTSTRAPPING
[Figure: simple sampling vs. bootstrap sampling (with replacement)]
TYPES OF ENSEMBLE TECHNIQUES:
BAGGING
1. Multiple subsets are created from the original dataset,
selecting observations with replacement.
2. A base model (weak model) is created on each of these
subsets.
3. The models run in parallel and are independent of each
other.
4. The final predictions are determined by combining the
predictions from all the models.
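A minimal sketch of these four steps, assuming a synthetic dataset and scikit-learn decision trees as the base (weak) models:

# Bagging sketch: bootstrap subsets -> independent weak models -> combined prediction.
# The dataset and base model are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
n_models = 10
models = []

rng = np.random.default_rng(0)
for _ in range(n_models):
    # 1. bootstrap subset: sample observations with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # 2. fit a base (weak) model on this subset; 3. the models are independent
    models.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))

# 4. combine: majority vote over the individual predictions
votes = np.stack([m.predict(X) for m in models])    # shape (n_models, n_samples)
final = (votes.mean(axis=0) >= 0.5).astype(int)     # majority vote for 0/1 labels
print("training accuracy:", (final == y).mean())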
ADVANCED ENSEMBLE LEARNING:
BOOSTING ALGORITHM
Question: If a data point is incorrectly predicted by the first model, and then by the next (probably by all models), will combining the predictions provide better results?
Such situations are taken care of by boosting.
Boosting:
▪ Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model.
▪ The succeeding models are dependent on the previous model.
Steps:
1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights.
3. A base model is created on this subset.
4. This model is used to make predictions on the whole dataset. Errors are calculated using the actual values and predicted values.
5. The observations which are incorrectly predicted are given higher weights. Another model is created and predictions are made on the dataset.
6. Similarly, multiple models are created, each correcting the errors of the previous model.
7. The final model (strong learner) is the weighted mean of all the models (weak learners).
Common Types:
1. AdaBoost (Adaptive Boosting)
2. Gradient Tree Boosting
3. XGBoost
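A minimal sketch of the first two of these boosting families via scikit-learn; XGBoost has its own xgboost package with a similar interface. The dataset here is an illustrative assumption.

# Boosting sketch: sequential weak learners, each correcting the previous one.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=50).fit(X_tr, y_tr)           # AdaBoost
gbt = GradientBoostingClassifier(n_estimators=50).fit(X_tr, y_tr)   # Gradient Tree Boosting

print("AdaBoost accuracy:", ada.score(X_te, y_te))
print("Gradient boosting accuracy:", gbt.score(X_te, y_te))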
ADVANCED ENSEMBLE LEARNING:
BOOSTING ALGORITHM
Box 1:
▪ Assign equal weights to each data point and apply a decision boundary to classify them as + (plus) or – (minus).
▪ The first decision boundary (D1) is a vertical line on the left side of the box.
▪ This vertical line has incorrectly predicted three + (plus) as – (minus).
▪ In such a case, we assign higher weights to these three + (plus) and apply another decision boundary.
Box 2:
▪ The size of the three incorrectly predicted + (plus) points is now bigger than that of the rest of the data points, reflecting their higher weights.
▪ In this case, the second decision boundary (D2) will try to predict them correctly.
▪ Now, a vertical line (D2) on the right side of this box has classified the three mis-classified + (plus) correctly.
▪ But again, it has caused mis-classification errors, this time with three – (minus). Again, we assign higher weights to the three – (minus) and apply another decision stump.
ADVANCED ENSEMBLE LEARNING:
BOOSTING ALGORITHM
Box 3:
▪ Higher weights have been assigned to the three mis-classified – (minus), and a third decision boundary (D3) is applied to predict these observations correctly.
Box 4:
▪ Here, all decision boundaries D1, D2 and D3 are combined to form a strong prediction with a more complex rule than any individual weak learner.
▪ Finally, it can be observed that this combination classifies the observations quite well compared to any individual weak learner.
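A minimal sketch of the combination step in Box 4, assuming AdaBoost-style learner weights; the three stump prediction vectors below are made-up placeholders standing in for D1, D2 and D3.

# Combining weak learners into a strong learner (Box 4).
# Labels are +1 / -1; the stump predictions are placeholder examples.
import numpy as np

y_true = np.array([+1, +1, +1, -1, -1, -1])          # true labels
stump_preds = [
    np.array([-1, +1, +1, -1, -1, -1]),              # D1: misclassifies one +
    np.array([+1, +1, +1, +1, -1, -1]),              # D2: misclassifies one -
    np.array([+1, +1, +1, -1, -1, +1]),              # D3: misclassifies a different -
]

# AdaBoost-style weight for each weak learner: alpha = 0.5 * ln((1 - err) / err)
alphas = [0.5 * np.log((1 - np.mean(p != y_true)) / np.mean(p != y_true))
          for p in stump_preds]

# Strong learner: sign of the weighted sum of the weak predictions
strong = np.sign(sum(a * p for a, p in zip(alphas, stump_preds)))
print(strong)                            # classifies every point correctly,
print((strong == y_true).mean())         # although each stump makes a mistake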
Reinforcement Learning Revisited
▪ Agent(): An entity that can perceive/explore the environment and act upon it.
▪ Action(): Actions are the moves taken by an agent within the environment.
▪ State(): State is a situation returned by the environment after each action taken
by the agent.
▪ Policy(): Policy is a strategy applied by the agent for the next action based on the
current state.
Q-Learning
Example (ad placement):
▪ Agent: The program making decisions on how many ads are appropriate for a page.
▪ Environment: The web page.
▪ Action: One of three: (1) putting another ad on the page; (2) dropping an ad from the page; (3) neither adding nor removing.
Example (a robot moving on a grid):
▪ Action: One out of four moves: (1) forward; (2) backward; (3) left; and (4) right.
Markov Property: "If the agent is in the current state s1, performs an action a1 and moves to the state s2, then the state transition from s1 to s2 depends only on the current state; future actions and states do not depend on past actions, rewards, or states."
In other words, the current state transition does not depend on any past action or state. Hence, an MDP (Markov Decision Process) is an RL problem that satisfies the Markov property.
Example: In a chess game, the players focus only on the current board position and do not need to remember past actions or states.
Finite MDP:
A finite MDP is when there are finite states, finite rewards, and
finite actions. In RL, we consider only the finite MDP.
Markov Process:
A Markov process is a memoryless process with a sequence of random states S1, S2, ..., St that satisfies the Markov property.
A Markov process is also known as a Markov chain, which is a tuple (S, P) of a state space S and a transition function P. These two components (S and P) define the dynamics of the system.
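A minimal sketch of a Markov chain as a tuple (S, P); the states and transition probabilities below are made-up assumptions.

# Markov chain (S, P) sketch: the next state depends only on the current state.
import numpy as np

S = ["sunny", "rainy"]                      # state space S (assumed)
P = np.array([[0.8, 0.2],                   # transition function P: row = current state,
              [0.4, 0.6]])                  # column = next state (assumed probabilities)

rng = np.random.default_rng(0)
state = 0                                   # start in "sunny"
for t in range(5):
    state = rng.choice(len(S), p=P[state])  # sample next state from P[current]
    print(t, S[state])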
Q-Learning
Example: A robot has to cross a maze and reach the end point. There are mines, and the robot can only move one tile at a time. If the robot steps onto a mine, it is dead. The robot has to reach the end point in the shortest time possible.
▪ The robot loses 1 point at each step. This is done so that the robot
takes the shortest path and reaches the goal as fast as possible.
▪ If the robot steps on a mine, the point loss is 100 and the game ends.
▪ If the robot gets power ⚡️, it gains 1 point.
▪ If the robot reaches the end goal, the robot gets 100 points.
Q-Learning
Training of Robot:
▪ A Q-table is a simple lookup table where we calculate the maximum expected future reward for each action at each state. Basically, this table guides us to the best action at each state.
▪ There are four possible actions at each non-edge tile: when the robot is at a state, it can move up, down, right, or left.
▪ In the Q-table, the columns are the actions and the rows are the states.
▪ Each Q-table score is the maximum expected future reward the robot will get if it takes that action at that state. These scores are computed through an iterative process.
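A minimal sketch of this Q-table layout, assuming a 5×5 grid of tiles (25 states) and the four actions:

# Q-table sketch: one row per state (tile), one column per action.
import numpy as np

actions = ["up", "down", "left", "right"]
n_states = 5 * 5                           # assumed 5x5 maze -> 25 states
Q = np.zeros((n_states, len(actions)))     # rows = states, columns = actions, all zero at start

# the best action at a state is the column with the highest Q-value in that row
best_action = actions[int(np.argmax(Q[0]))]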
Q-Learning
Q-function
The Q-function uses the Bellman equation and takes two inputs: state (s) and action (a):
Q(s, a) = r + γ · max Q(s′, a′)
where r is the immediate reward, s′ is the next state, and γ is the discount factor.
Q-Learning
Q-Learning Algorithm
Exploration: Initially, the epsilon rate is high. The robot explores the environment and randomly chooses actions. The logic behind this is that the robot does not yet know anything about the environment.
Exploitation: As the robot explores the environment, the epsilon rate decreases and the robot starts to exploit what it has learned, choosing the actions with the highest Q-values.
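A minimal epsilon-greedy sketch of this trade-off; the starting epsilon, its decay rate and the lower bound are assumptions.

# Epsilon-greedy selection: explore with probability epsilon, otherwise exploit the Q-table.
import numpy as np

rng = np.random.default_rng(0)

def choose_action(Q, state, epsilon):
    if rng.random() < epsilon:               # explore: pick a random action
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[state]))          # exploit: pick the best known action

epsilon = 1.0                                # start fully exploratory
for episode in range(100):
    # ... run one episode, selecting actions with choose_action(Q, state, epsilon) ...
    epsilon = max(0.05, epsilon * 0.99)      # decay epsilon toward exploitation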
Q-Learning Algorithm
The key elements are:
▪ The action performed by the agent is referred to as "a"
▪ The state reached by performing the action is "s"
▪ The reward/feedback obtained for each action is "R"
▪ The discount factor is γ

Q(state, action) = R(state, action) + γ · max(Q(next state, all actions))
i.e. Q(s, a) = r + γ · max Q(s′, a′)

Steps:
1. For each (s, a), initialize the table entry Q(s, a) to zero.
2. Observe the current state s.
3. Do repeatedly:
▪ Select an action a and execute it
▪ Receive the immediate reward r
▪ Observe the new state s′
▪ Update the table entry: Q(s, a) = r + γ · max Q(s′, a′)
▪ s ← s′
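A minimal tabular sketch of these steps, written against a hypothetical environment object env with reset() and step(action) methods (a gymnasium-style interface is assumed); gamma and the random action selection are also assumptions.

# Tabular Q-learning sketch following the steps above.
import numpy as np

def q_learning(env, n_states, n_actions, gamma=0.8, episodes=1000):
    Q = np.zeros((n_states, n_actions))             # 1. initialize every Q(s, a) to zero
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = env.reset()                             # 2. observe the current state s
        done = False
        while not done:                             # 3. do:
            a = int(rng.integers(n_actions))        #    select an action a (randomly here) and execute it
            s_next, r, done = env.step(a)           #    receive reward r, observe the new state s'
            Q[s, a] = r + gamma * np.max(Q[s_next]) #    Q(s, a) = r + γ · max Q(s', a')
            s = s_next                              #    s <- s'
    return Q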
Problem Solving on Q-Learning
▪ Suppose we have five rooms in a building connected by doors. Let's number the rooms from 0 to 4. The area outside the building is numbered 5, and the doors of rooms 1 and 4 lead outside, i.e. to 5.
▪ The rooms are represented as a graph: each room is a node and each door is a link.
Problem Solving on Q-Learning
▪ Repeat the same for every initial state and update the Q-table.
▪ Let's consider the current state as 3.
▪ The possible actions are [1, 2, 4].
▪ Let's choose action 1 from state 3 [current state = 3, next state = 1].
▪ The possible actions from the next state are [3, 5].
▪ Compute the Q-value: Q(3, 1) = R(3, 1) + γ · max[Q(1, 3), Q(1, 5)].
Problem Solving on Q-Learning
▪ By considering each of the initial states and iterating over all of them, we get the updated Q-table.
▪ The best sequence of moves follows the links with the highest Q-values at each state.
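A sketch of this rooms problem, assuming the reward-matrix formulation commonly used with this example (-1 = no door, 0 = door, 100 = door into the goal state 5) and γ = 0.8; the training loop samples random (state, action) pairs and applies the update rule from the algorithm slide.

# Q-table for the rooms example (rooms 0-4, outside = 5 = goal).
# The reward matrix and gamma = 0.8 are assumptions based on the usual
# formulation of this example.
import numpy as np

R = np.array([
    [-1, -1, -1, -1,  0, -1],   # room 0 connects to 4
    [-1, -1, -1,  0, -1, 100],  # room 1 connects to 3 and to outside (5)
    [-1, -1, -1,  0, -1, -1],   # room 2 connects to 3
    [-1,  0,  0, -1,  0, -1],   # room 3 connects to 1, 2, 4
    [ 0, -1, -1,  0, -1, 100],  # room 4 connects to 0, 3 and to outside (5)
    [-1,  0, -1, -1,  0, 100],  # outside (5) connects to 1, 4 and itself (goal)
])
gamma = 0.8
Q = np.zeros_like(R, dtype=float)
rng = np.random.default_rng(0)

for _ in range(5000):
    s = rng.integers(6)                        # random initial state
    valid = np.where(R[s] >= 0)[0]             # possible actions = rooms with a door
    a = rng.choice(valid)                      # e.g. state 3 -> actions [1, 2, 4]
    Q[s, a] = R[s, a] + gamma * np.max(Q[a])   # Q(s, a) = R(s, a) + γ · max(Q(next state, all actions))

print((Q / Q.max() * 100).round())             # normalized Q-table
# From any room, the best path follows argmax(Q[state]) at each step.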