
21AI020 & REINFORCEMENT LEARNING

UNIT 1-LM:1
TOPIC: THE REINFORCEMENT LEARNING PROBLEM

INTRODUCTION:

o DEFINITION:"Reinforcement learning is a type of machine learning method

where an intelligent agent (computer program) interacts with the environment

and learns to act within that."

o Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and observing the results of those actions. For each good action, the agent receives positive feedback, and for each bad action, it receives negative feedback or a penalty.

o The agent learns automatically from this feedback, without any labeled data.

o RL solves a specific type of problem in which decision making is sequential and the goal is long-term, such as game playing, robotics, etc.

o The primary goal of an agent in reinforcement learning is to improve its performance by obtaining the maximum positive reward.

o Example: Suppose an AI agent is placed within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing actions; based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.


o The agent keeps doing these three things (take an action, change state or remain in the same state, and receive feedback), and by repeating them it learns about and explores the environment.

o The agent learns which actions lead to positive feedback (rewards) and which actions lead to negative feedback (penalties). As a reward, the agent gets a positive point; as a penalty, it gets a negative point.
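
The interaction loop described above (take an action, observe the new state, receive feedback) can be summarized in a short Python sketch. This is a minimal illustration, not a full implementation; the env and agent objects and their method names are assumptions made for this example.

# Minimal sketch of the agent-environment loop described above.
# `env` and `agent` are hypothetical objects assumed for illustration.
def run_episode(env, agent):
    state = env.reset()                                  # start in an initial state
    done = False
    total_reward = 0.0
    while not done:
        action = agent.choose_action(state)              # take an action
        next_state, reward, done = env.step(action)      # state changes, feedback arrives
        agent.learn(state, action, reward, next_state)   # learn from the feedback
        state = next_state
        total_reward += reward
    return total_reward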


Terms used in Reinforcement Learning
o Agent: An entity that can perceive/explore the environment and act upon it.

o Environment: The situation in which the agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.

o Action: Actions are the moves taken by the agent within the environment.

o State: The situation returned by the environment after each action taken by the agent.

o Reward: Feedback returned to the agent by the environment to evaluate the agent's action.

o Policy: The strategy applied by the agent to choose the next action based on the current state.

o Value: The expected long-term return with the discount factor, as opposed to the short-term reward.

o Q-value: Mostly similar to the value, but it takes the current action (a) as an additional parameter.
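
As an illustrative sketch, the terms above can be mapped onto simple Python structures. All names and numbers below are made up for this example; they are not a standard API. A policy maps a state to an action, a value function maps a state to an expected long-term return, and a Q-value function additionally takes the current action.

# Hypothetical toy structures mirroring the definitions above.
policy = {"s0": "right", "s1": "up"}        # Policy: state -> action
value = {"s0": 0.8, "s1": 0.5}              # Value V(s): expected long-term return from s
q_value = {("s0", "right"): 0.9,            # Q-value Q(s, a): like V(s), but takes the
           ("s0", "left"): 0.1}             # current action a as an extra parameter

state = "s0"
action = policy[state]                           # the policy picks the next action
print(value[state], q_value[(state, action)])    # 0.8 0.9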


Key Features of Reinforcement Learning
o In RL, the agent is not instructed about the environment or about which actions need to be taken.

o It is based on a trial-and-error process.

o The agent takes the next action and changes state according to the feedback from the previous action.

o The agent may receive a delayed reward.

o The environment is stochastic, and the agent needs to explore it in order to obtain the maximum positive reward.
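
The delayed-reward feature is usually handled with a discount factor (gamma): rewards that arrive further in the future count for less. A minimal sketch of computing the discounted return for a list of rewards (the reward values are made up for illustration):

# Discounted return: G = r0 + gamma*r1 + gamma^2*r2 + ...
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):     # fold from the end: G_t = r_t + gamma * G_{t+1}
        g = r + gamma * g
    return g

print(discounted_return([0, 0, 0, 1]))   # reward delayed by three steps: 0.9**3 = 0.729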

Approaches to implement Reinforcement Learning


There are mainly three ways to implement reinforcement learning in ML, which are:

1. Value-based:

The value-based approach tries to find the optimal value function, i.e., the maximum value attainable at a state under any policy. In other words, the agent seeks the maximum expected long-term return at any state (s) under policy π. (A minimal value-based sketch follows this list.)

2. Policy-based:

The policy-based approach tries to find the optimal policy for the maximum future reward without using a value function. In this approach, the agent tries to apply a policy such that the action performed at each step helps to maximize the future reward.

The policy-based approach has mainly two types of policy:

o Deterministic: The policy (π) produces the same action for any given state.

o Stochastic: The produced action is determined by a probability distribution.

3. Model-based: In the model-based approach, a virtual model of the environment is created, and the agent explores that model to learn the environment. There is no single solution or algorithm for this approach, because the model representation differs for each environment.
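
As a concrete instance of the value-based approach (item 1 above), here is a minimal sketch of one tabular Q-learning update. The states, actions, and transition used here are made up for illustration; this is a sketch of the update rule, not a complete algorithm.

# One tabular Q-learning update: Q(s, a) is nudged toward
# reward + gamma * max over a' of Q(s', a').
from collections import defaultdict

Q = defaultdict(float)             # Q-table, default value 0.0
alpha, gamma = 0.1, 0.9            # learning rate and discount factor

def q_update(s, a, reward, s_next, actions):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

# Hypothetical transition: taking "right" in s0 yields reward 1 and leads to s1.
q_update("s0", "right", 1.0, "s1", actions=["left", "right"])
print(Q[("s0", "right")])          # 0.1 after one update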

Types of Reinforcement Learning


1. Positive Reinforcement:

Positive reinforcement means adding something in order to increase the tendency that the expected behavior will occur again.

 It has a positive impact on the agent's behavior and increases the strength of that behavior.
 This type of reinforcement can sustain changes for a long time, but too much positive reinforcement may lead to an overload of states, which can diminish the results.

2. Negative Reinforcement:

Negative reinforcement is the opposite of positive reinforcement: it increases the tendency that a specific behavior will occur again by avoiding a negative condition. Depending on the situation and behavior it can be more effective than positive reinforcement, but it provides reinforcement only up to the minimum required behavior.
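
To make the distinction concrete, here is a made-up reward function sketch: positive reinforcement adds a reward when the desired behavior occurs, while the negative-reinforcement case removes a standing penalty once the minimum behavior is met. The scenario and values are assumptions for illustration only.

# Illustrative (made-up) reward function contrasting the two types.
def reward(reached_goal, met_minimum_behavior):
    r = 0.0
    if reached_goal:
        r += 10.0    # positive reinforcement: something is added
    if not met_minimum_behavior:
        r -= 1.0     # a negative condition persists until the minimum
                     # behavior is met; meeting it removes the penalty
    return r

print(reward(True, True))     # 10.0: rewarded, no penalty
print(reward(False, False))   # -1.0: penalized until the behavior is met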

Examples

 A gazelle calf struggles to its feet minutes after being born. Half an hour later it is running at 20 miles per hour.

 A master chess player makes a move. The choice is informed both by planning (anticipating possible replies and counterreplies) and by immediate, intuitive judgments of the desirability of particular positions and moves.


 Consider the familiar child's game of tic-tac-toe. Two players take turns playing on a three-by-three board. One player plays Xs and the other Os until one player wins by placing three marks in a row, horizontally, vertically, or diagonally, as the X player has in this game:

X O O
O X X
    X
Figure 1.1: A sequence of tic-tac-toe moves. The solid lines represent the moves taken during a game; the dashed lines represent moves that we (our reinforcement learning player) considered but did not make. Our second move was an exploratory move, meaning that it was taken even though another sibling move, the one leading to e∗, was ranked higher. Exploratory moves do not result in any learning, but each of our other moves does, causing backups as suggested by the curved arrows and detailed in the text.
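
The "backups" mentioned in the caption update the estimated value of an earlier position toward the value of the position that follows it. A minimal sketch of this temporal-difference style update (the state names and value estimates below are made up for illustration):

# Back up the value of the earlier state toward the value of the later one.
V = {"state_before": 0.5, "state_after": 0.9}   # made-up value estimates
alpha = 0.1                                     # step-size parameter

V["state_before"] += alpha * (V["state_after"] - V["state_before"])
print(V["state_before"])                        # 0.54: nudged toward 0.9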
