21ai020 & Reinforcement Learning UNIT 1-LM:1
UNIT 1-LM:1
TOPIC: THE REINFORCEMENT LEARNING PROBLEM
INTRODUCTION:
o Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.
o The agent learns automatically from this feedback, without any labeled data, unlike supervised learning.
o Example: Suppose an AI agent is present within a maze environment, and his goal is to find the diamond. The agent interacts with the environment by performing some actions, and based on those actions, the state of the agent gets changed, and it also receives a reward or penalty as feedback. The agent keeps doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing these actions, he learns and explores the environment (a short code sketch of this loop follows this list).
o The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or penalties. As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative point.
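The loop just described (take an action, the state changes, feedback comes back) can be written down directly. The sketch below is only an illustration: the GridMaze environment, its reward values, and the random action choice are assumptions invented for this example, not part of any particular library.

    import random

    class GridMaze:
        # Toy maze used above: the agent starts in cell 0 and the diamond is in cell 4.
        def __init__(self):
            self.state = 0

        def step(self, action):
            # action is -1 (move left) or +1 (move right)
            self.state = max(0, min(4, self.state + action))
            if self.state == 4:
                return self.state, +1.0, True    # positive feedback: diamond found
            return self.state, -0.1, False       # small penalty for every extra move

    env = GridMaze()
    state, done = env.state, False
    while not done:
        action = random.choice([-1, +1])          # no policy yet: act at random
        state, reward, done = env.step(action)    # environment returns new state and feedback
        print("state =", state, "reward =", reward)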
Terms used in Reinforcement Learning:
o Environment(): The situation in which the agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.
o Action(): Actions are the moves taken by an agent within the environment.
o Policy(): Policy is a strategy applied by the agent to decide the next action based on the current state (illustrated in the sketch below).
Key features of Reinforcement Learning:
o In RL, the agent is not instructed about the environment or which actions need to be taken.
o The agent takes the next action and changes states according to the feedback of the previous action.
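A policy, in the sense defined above, can be as simple as a table from states to actions. The five states and the "always move right" rule below are illustrative assumptions only, chosen to match the toy maze sketched earlier.

    STATES = [0, 1, 2, 3, 4]              # possible situations the environment can return
    policy = {s: +1 for s in STATES}      # strategy: in every state, move right

    def next_action(state):
        # The agent consults its policy to pick the next action for the current state.
        return policy[state]

    print(next_action(2))                 # -> 1, i.e. move right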
Approaches to implementing Reinforcement Learning: There are mainly three ways to implement reinforcement learning in ML, which are:
1. Value-based:
The value-based approach is about finding the optimal value function, which is the maximum value achievable at a state under any policy. Therefore, the agent expects the long-term return at any state under that policy (see the code sketch after this list).
2. Policy-based:
The policy-based approach is about finding the optimal policy for the maximum future reward without using a value function. In this approach, the agent tries to apply a policy such that the action performed at each step helps to maximize the future reward. The policy can be deterministic, where the same action is produced by the policy (π) at any state, or stochastic, where probability determines the produced action.
3. Model-based:
In this approach, a virtual model is created for the environment, and the agent explores that environment to learn it. There is no particular solution or algorithm for this approach because the model representation is different for each environment.
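To make the contrast between the first two approaches concrete, the sketch below works on the same hypothetical 5-cell maze (diamond in cell 4). The transition and reward model, the discount factor of 0.9, and the number of sweeps are all assumptions for illustration, not a prescribed algorithm from these notes.

    GAMMA = 0.9
    STATES = range(5)
    ACTIONS = (-1, +1)

    def next_state(s, a):
        # Clamp movement to the corridor of cells 0..4.
        return max(0, min(4, s + a))

    def reward(s, a):
        # +1 for reaching the diamond, a small penalty for any other move.
        return 1.0 if next_state(s, a) == 4 else -0.1

    # Value-based idea: iterate toward the optimal value function V*(s), the
    # maximum value attainable at each state under any policy, then act greedily.
    V = {s: 0.0 for s in STATES}
    for _ in range(50):                   # repeated sweeps until the values settle
        for s in STATES:
            if s == 4:                    # terminal state: diamond already found
                continue
            V[s] = max(reward(s, a) + GAMMA * V[next_state(s, a)] for a in ACTIONS)

    greedy_policy = {s: max(ACTIONS, key=lambda a: reward(s, a) + GAMMA * V[next_state(s, a)])
                     for s in STATES if s != 4}

    # Policy-based idea: represent the policy directly and improve it for future
    # reward, without consulting a value function when choosing actions.
    direct_policy = {s: +1 for s in STATES}   # e.g. the fixed rule "always move right"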
2. Negative Reinforcement:
Negative reinforcement strengthens behaviour by increasing the tendency that the specific behaviour will occur again by avoiding the negative condition. It can be more effective than positive reinforcement, depending on the situation and behaviour, but it provides reinforcement only to meet a minimum level of behaviour.
Examples:
o A gazelle calf struggles to its feet minutes after being born. Half an hour later it is running at 20 miles per hour.
To make these ideas concrete, consider the familiar child's game of tic-tac-toe. Two players take turns playing on a three-by-three board. One player plays Xs and the other Os until one player wins by placing three marks in a row, horizontally, vertically, or diagonally, as the X player has in this game:
X O O
O X X
    X
Figure 1.1: A sequence of tic-tac-toe moves. The solid lines
represent the moves taken during a game; the dashed lines
represent moves that we (our reinforcement learning player)
considered but did not make. Our second move was an
exploratory move, meaning that it was taken even though another
sibling move, the one leading to e∗, was ranked higher.
Exploratory moves do not result in any learning, but each of our
other moves does, causing backups as suggested by the curved
arrows and detailed in the text.
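The backups mentioned in the caption can be written as an update that moves the value of the earlier state a fraction of the way toward the value of the later state, V(s) ← V(s) + α[V(s′) − V(s)]. The sketch below is a minimal illustration of one such backup; the 9-character board encoding, the neutral initial value of 0.5, and the step size α = 0.1 are assumptions made here for illustration.

    ALPHA = 0.1                           # step-size: how far each backup moves the value

    values = {}                           # estimated chance of winning from each board state

    def value(board):
        # Look up a state's value; unseen states start at a neutral 0.5.
        return values.setdefault(board, 0.5)

    def backup(earlier, later):
        # Move the earlier state's value a fraction ALPHA toward the later state's value.
        values[earlier] = value(earlier) + ALPHA * (value(later) - value(earlier))

    # Example: a greedy (non-exploratory) move led from `before` to `after`,
    # and `after` turned out to be a win for X, so its value is 1.
    before = "XOO" "OXX" "   "            # 3x3 board flattened to a 9-character string
    after  = "XOO" "OXX" "  X"
    values[after] = 1.0
    backup(before, after)                 # exploratory moves would simply skip this call
    print(values[before])                 # about 0.55: moved 10% of the way from 0.5 toward 1.0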