ML Unit-5
Reinforcement Learning: overview, example: getting lost, State and Action Spaces, The
Reward Function, Discounting, Action Selection, Policy, Markov decision processes, Q-
learning, uses of Reinforcement learning,
Applications of Machine Learning in various fields: Text classification, Image
Classification, Speech Recognition.
---------------------------------------------------------------------------------------------------------------
o The agent continues doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing so it learns and explores the environment.
o The agent learns which actions lead to positive feedback (rewards) and which lead to negative feedback (penalties). For a good action the agent gets a positive point as a reward, and for a bad action it gets a negative point as a penalty.
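A minimal sketch of this interaction loop, using a made-up two-state environment (the state names, actions, and reward values are assumptions for illustration, not taken from the text):

import random

# Hypothetical environment: two states and two actions, for illustration only.
STATES = ["safe", "danger"]
ACTIONS = ["move", "stay"]

def step(state, action):
    """Return (next_state, reward) for one interaction with the environment."""
    if state == "danger" and action == "move":
        return "safe", +1          # good action -> positive point (reward)
    if state == "danger" and action == "stay":
        return "danger", -1        # bad action -> negative point (penalty)
    return random.choice(STATES), 0

state = "safe"
for t in range(10):
    action = random.choice(ACTIONS)         # 1. take an action
    state, reward = step(state, action)     # 2. change state (or remain), 3. get feedback
    print(t, action, state, reward)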
--------------------------------------------------------------------------------------------------------
Example:
The problem is as follows: We have an agent and a reward, with many hurdles in between.
The agent is supposed to find the best possible path to reach the reward. The following
example illustrates the problem more clearly.
The above image shows the robot, the diamond, and the fire. The goal of the robot is to get the reward, which is the diamond, while avoiding the hurdles, which are the fire. The robot learns by trying all the possible paths and then choosing the path that leads to the reward with the fewest hurdles. Each correct step gives the robot a reward, and each wrong step subtracts from the robot's reward. The total reward is calculated when it reaches the final goal, the diamond.
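A toy version of this grid world can be sketched as follows; the cell coordinates and the +10/-10/-1 reward values are assumptions for illustration, not taken from the figure:

# Toy grid world for the robot/diamond/fire example (all values are made up).
# Reaching the diamond gives +10, stepping into fire gives -10, and every other
# move costs -1, so a shorter safe path earns a higher total reward.
DIAMOND = (0, 3)
FIRE = {(1, 3)}

def step_reward(cell):
    if cell == DIAMOND:
        return +10
    if cell in FIRE:
        return -10
    return -1

def total_reward(path):
    # Sum of the rewards collected along a candidate path of grid cells.
    return sum(step_reward(cell) for cell in path)

safe_path  = [(2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 3)]   # ends at the diamond
risky_path = [(2, 0), (2, 1), (2, 2), (2, 3), (1, 3)]           # ends in the fire
print(total_reward(safe_path), total_reward(risky_path))        # 5  -14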
-------------------------------------------------------------------------------------------
Getting Lost
Loss is the penalty for a bad prediction. That is, loss is a number indicating how bad the
model's prediction was on a single example. If the model's prediction is perfect, the loss is
zero; otherwise, the loss is greater.
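For instance, using a squared-error loss on a single example (the numbers are made up):

# Squared-error loss on a single example (illustrative numbers only).
def squared_loss(y_true, y_pred):
    return (y_true - y_pred) ** 2

print(squared_loss(5.0, 5.0))   # perfect prediction -> loss is 0.0
print(squared_loss(5.0, 3.0))   # worse prediction   -> loss is 4.0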
----------------------------------------------------------------------------------------------------------
The state space S is the set of all states the agent can be in or transition to, and the action space A is the set of all actions the agent can take in a given environment. There are also partially observable cases, in which the agent cannot observe the complete state information of the environment.
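As a small sketch, the two sets for a hypothetical grid environment (the grid size and action names below are assumptions) can be written out explicitly:

# Hypothetical state space S and action space A for a small 3x4 grid environment.
S = {(row, col) for row in range(3) for col in range(4)}   # every cell the agent can occupy
A = {"up", "down", "left", "right"}                        # every action the agent can take

# In a partially observable case, the agent sees only an observation derived
# from the state (here, just its row), not the complete state itself.
def observe(state):
    row, col = state
    return row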
----------------------------------------------------------------------------------------------------------
Reward function:
The goal of reinforcement learning is defined by the reward signal. At each state, the environment sends an immediate signal to the learning agent, and this signal is known as the reward signal. Rewards are given according to the good and bad actions taken by the agent. The agent's main objective is to maximize the total reward it collects for good actions. The reward signal can change the policy: if an action selected by the agent leads to a low reward, then the policy may change to select other actions in the future.
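A reward function can be sketched as a simple mapping from (state, action) pairs to an immediate scalar signal; the state names, actions, and values below are invented for illustration:

# Illustrative reward function R(s, a): an immediate scalar signal returned by
# the environment for the action the agent just took in a given state.
R = {
    ("near_goal", "advance"): +10,   # good action -> high reward
    ("near_goal", "retreat"):  -1,   # bad action  -> low reward
    ("start", "advance"):       0,
}

def reward(state, action):
    return R.get((state, action), 0)

# A low reward is the signal that may push the policy to select a different
# action in this state in the future.
print(reward("near_goal", "retreat"))   # -1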
---------------------------------------------------------------------------------------------------------
MDP is used to describe the environment for RL, and almost all RL problems can be formalized using MDPs.
MDP uses the Markov property, and to better understand MDPs, we need to learn about this property.
Markov Property:
It says that "if the agent is present in the current state s1, performs an action a1, and moves to the state s2, then the state transition from s1 to s2 depends only on the current state and action, and not on past actions, rewards, or states."
Or, in other words, as per the Markov property, the current state transition does not depend on any past action or state. Hence, an MDP is an RL problem that satisfies the Markov property. For example, in a game of Chess, the players only focus on the current state and do not need to remember past actions or states.
Finite MDP:
A finite MDP is one in which the states, rewards, and actions are all finite sets. In RL, we consider only the finite MDP.
Markov Process:
A Markov process is a memoryless process with a sequence of random states S1, S2, ..., St that satisfies the Markov property. A Markov process is also known as a Markov chain, which is a tuple (S, P) of a state set S and a transition function P. These two components (S and P) can define the dynamics of the system.
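A Markov chain (S, P) can be sketched with a finite state set and a transition function; the weather states and probabilities below are invented for illustration:

import random

# Hypothetical Markov chain: state set S and transition probabilities P.
S = ["sunny", "rainy"]
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(s):
    # The distribution over the next state depends only on the current state s
    # (the Markov property), not on how the chain arrived at s.
    states, probs = zip(*P[s].items())
    return random.choices(states, weights=probs)[0]

s = "sunny"
chain = [s]
for _ in range(5):
    s = next_state(s)
    chain.append(s)
print(chain)   # one sampled sequence of states S1, S2, ..., St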
-----------------------------------------------------------------------------------------------------------
Q-Learning
o The main difference between the Q-learning and SARSA algorithms is that, unlike Q-learning, SARSA does not use the maximum Q-value of the next state when updating the Q-value in the table (the two update rules are compared in the sketch after this list).
o In SARSA, the new action and reward are selected using the same policy that determined the original action.
o SARSA is so named because it uses the quintuple (s, a, r, s', a'), where:
s: original state
a: original action
r: reward observed while following the states
s' and a': new state, action pair.
o Deep Q Neural Network (DQN):
o As the name suggests, DQN is Q-learning using neural networks.
o For an environment with a big state space, it is a challenging and complex task to define and update a Q-table.
o To solve such an issue, we can use the DQN algorithm, where, instead of defining a Q-table, a neural network approximates the Q-values for each action and state.
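The contrast between the two tabular update rules can be sketched as follows; the learning rate alpha, discount factor gamma, and the dict-of-dicts Q-table layout are assumptions, not taken from the text:

# Sketch of the tabular update rules for Q-learning vs SARSA.
# Q is assumed to be a dict of dicts: Q[state][action] -> value.
alpha, gamma = 0.1, 0.9   # assumed learning rate and discount factor

def q_learning_update(Q, s, a, r, s_next):
    # Q-learning (off-policy): the target uses the maximum Q-value of the next state.
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next):
    # SARSA (on-policy): the target uses the Q-value of the action a' actually
    # chosen by the same policy in the next state -- the quintuple (s, a, r, s', a').
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])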
Q-Learning Explanation:
o Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation.
o The main objective of Q-learning is to learn a policy that tells the agent which actions should be taken, and under which circumstances, to maximize the reward.
o It is an off-policy RL algorithm that attempts to find the best action to take in the current state.
o The goal of the agent in Q-learning is to maximize the value of Q.
o The Q-value can be derived from the Bellman equation. Consider the Bellman equation given below:
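A standard form of this equation, written in the usual value-function notation (the original figure is not reproduced here), is:
V(s) = max_a [ R(s, a) + γ Σ_s' P(s' | s, a) · V(s') ]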
In the equation, we have various components, including the reward, the discount factor (γ), the transition probability, and the end state s'. But no Q-value appears in it yet, so first consider the image below:
In the above image, we can see an agent that has three value options, V(s1), V(s2), and V(s3). As this is an MDP, the agent only cares about the current state and the future state. The agent can go in any direction (Up, Left, or Right), so it needs to decide where to go to follow the optimal path. Here the agent will make a move on a probability basis and change its state. But if we want certain exact moves, then we need to make some changes in terms of the Q-value. Consider the image below:
Q represents the quality of the actions at each state. So instead of using a value at each state, we will use a pair of state and action, i.e., Q(s, a). The Q-value specifies which action is more lucrative than the others, and according to the best Q-value, the agent takes its next move. The Bellman equation can be used for deriving the Q-value.
To perform any action, the agent will get a reward R(s, a), and it will also end up in a certain state, so the Q-value equation will be:
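A standard form of this Q-value equation (written here for the simple case in which taking action a in state s leads to the next state s'; the original figure is not reproduced) is:
Q(s, a) = R(s, a) + γ max_a' Q(s', a')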
The Q stands for quality in Q-learning, which means it specifies the quality of an action taken
by the agent.
Q-table:
A Q-table or matrix is created while performing Q-learning. The table is indexed by state and action pairs, i.e., [s, a], and all values are initialized to zero. After each action, the table is updated and the Q-values are stored in it.
The RL agent uses this Q-table as a reference table to select the best action based on the Q-values. A minimal sketch of such a table follows.
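A minimal sketch of such a Q-table and its update, with made-up states, actions, and hyperparameters:

# Minimal tabular Q-learning sketch (all environment details are hypothetical).
states  = ["s0", "s1", "s2"]
actions = ["left", "right"]

# Q-table initialized to zero for every [state, action] pair.
Q = {s: {a: 0.0 for a in actions} for s in states}

alpha, gamma = 0.1, 0.9   # assumed learning rate and discount factor

def update(s, a, r, s_next):
    # After each action, move the stored Q-value toward the Bellman target.
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

def best_action(s):
    # The agent consults the Q-table and picks the action with the highest Q-value.
    return max(Q[s], key=Q[s].get)

update("s0", "right", 1.0, "s1")
print(best_action("s0"))   # -> "right"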
----------------------------------------------------------------------------------------------------------