What Is Reinforcement Learning

Reinforcement learning is a machine learning method where an agent learns to take actions in an environment to maximize rewards. It works by evaluating actions using a reward system, rather than relying on examples. Some key applications of reinforcement learning include robotics, business optimization, and educational training systems. It is well-suited for problems involving sequential decision-making and human interaction, but may not be the best approach if sufficient labeled data is available for supervised learning.

Uploaded by

ranamzeeshan
Copyright
© All Rights Reserved

Reinforcement Learning: What is, Algorithms, Applications, Example
What is Reinforcement Learning?
Reinforcement Learning is defined as a Machine Learning method that is concerned with how software agents should
take actions in an environment so as to maximize the cumulative reward they receive.

This learning method helps an agent learn how to attain a complex objective or maximize a specific quantity over many
steps. When the agent is built on neural networks, the approach is usually called Deep Reinforcement Learning.

In this Reinforcement Learning tutorial, you will learn:

• What is Reinforcement Learning?
• Important terms used in Deep Reinforcement Learning method
• How Reinforcement Learning works?
• Reinforcement Learning Algorithms
• Characteristics of Reinforcement Learning
• Types of Reinforcement Learning
• Learning Models of Reinforcement
• Reinforcement Learning vs. Supervised Learning
• Applications of Reinforcement Learning
• Why use Reinforcement Learning?
• When Not to Use Reinforcement Learning?
• Challenges of Reinforcement Learning

Important terms used in Deep Reinforcement Learning method


Here are some important terms used in Reinforcement Learning:

• Agent: An entity that performs actions in an environment to gain some reward.
• Environment (e): The scenario that an agent has to face.
• Reward (R): An immediate return given to an agent when it performs a specific action or task.
• State (s): The current situation returned by the environment.
• Policy (π): The strategy that the agent applies to decide the next action based on the current state.
• Value (V): The expected long-term return with discount, as opposed to the short-term reward.
• Value Function: It specifies the value of a state, that is, the total amount of reward an agent can expect to
accumulate starting from that state.
• Model of the environment: This mimics the behavior of the environment. It lets you make inferences about how the
environment will behave.
• Model-based methods: Methods for solving reinforcement learning problems that make use of a model of the
environment.
• Q value or action value (Q): Q value is quite similar to value. The only difference between the two is that it takes
an additional parameter, the current action.
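
To make the Reward and Value terms more concrete, here is a minimal Python sketch of a discounted return, the quantity that a value function estimates in expectation. The function name `discounted_return`, the discount factor `gamma`, and the reward sequence are illustrative assumptions, not something defined in this tutorial.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting the reward at step t by gamma**t."""
    total = 0.0
    # Work backwards so each step applies: G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: no reward for two steps, then a reward of 10.
episode_rewards = [0.0, 0.0, 10.0]
print(discounted_return(episode_rewards, gamma=0.9))
```

With `gamma` below 1, a reward that arrives later counts for less than the same reward received immediately, which is what "long-term return with discount" refers to.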

How Reinforcement Learning works?


Let's look at a simple example that illustrates the reinforcement learning mechanism.

Consider the scenario of teaching new tricks to your cat:

• As the cat doesn't understand English or any other human language, we can't tell her directly what to do. Instead, we
follow a different strategy.
• We set up a situation, and the cat tries to respond in many different ways. If the cat's response is the desired
one, we give her fish.
• Now, whenever the cat is exposed to the same situation, she performs a similar action even more enthusiastically,
in expectation of getting more reward (food).
• In this way, the cat learns "what to do" from positive experiences.
• At the same time, the cat also learns what not to do when faced with negative experiences.

Explanation about the example:


[Figure: How Reinforcement Learning works]

In this case:

• Your cat is an agent that is exposed to the environment, which here is your house. An example of a state could
be your cat sitting while you use a specific word to make the cat walk.
• The agent reacts by performing an action, which moves it from one "state" to another "state". For example, your
cat goes from sitting to walking.
• The reaction of an agent is an action, and the policy is a method of selecting an action given a state in expectation
of better outcomes.
• After the transition, the agent may get a reward or a penalty in return.
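
The loop described above (observe a state, choose an action, receive a reward or penalty) can be sketched in Python as follows. This is only a toy illustration: the class `CatEnvironment`, the action names, and the reward values of +1 and -1 are all hypothetical choices made for this sketch.

```python
import random


class CatEnvironment:
    """Toy environment: the cat is either 'sitting' or 'walking'."""

    def __init__(self):
        self.state = "sitting"

    def step(self, action):
        """Apply an action and return (next_state, reward)."""
        self.state = "walking" if action == "walk" else "sitting"
        # Hypothetical reward: fish (+1) for walking, a penalty (-1) otherwise.
        reward = 1.0 if self.state == "walking" else -1.0
        return self.state, reward


def random_policy(state):
    """Placeholder policy: ignore the state and pick any action."""
    return random.choice(["sit", "walk"])


def run_episode(env, policy, steps=5):
    """Run the observe -> act -> reward loop and return the total reward."""
    total_reward = 0.0
    state = env.state
    for _ in range(steps):
        action = policy(state)            # the agent selects an action
        state, reward = env.step(action)  # the environment changes state
        total_reward += reward            # the agent gets a reward or penalty
    return total_reward


print(run_episode(CatEnvironment(), random_policy))
```

A learning agent would replace `random_policy` with a policy that is adjusted over time so that rewarded actions become more likely.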

Reinforcement Learning Algorithms


There are three approaches to implementing a Reinforcement Learning algorithm.

Value-Based:
In a value-based Reinforcement Learning method, you try to maximize a value function V(s). In this method, the
agent expects a long-term return from the current state under policy π.
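
One common way to write this value function is shown below. It is a sketch under the usual discounted-return convention: the discount factor \gamma (between 0 and 1) and the time-indexed reward R_{t+k+1} are assumptions, as this tutorial does not define them explicitly.

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s \right]
```

In words: the value of state s is the expected sum of future rewards, each discounted by how far in the future it arrives, when the agent starts in s and follows policy π.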

Policy-based:
In a policy-based RL method, you try to come up with a policy such that the action performed in every state helps you to
gain maximum reward in the future.

Two types of policy-based methods are:

• Deterministic: For any state, the same action is produced by the policy π.
• Stochastic: Every action has a certain probability, which is determined by the following equation.

Stochastic Policy:

$$\pi(a \mid s) = P[A_t = a \mid S_t = s]$$
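
The difference between the two policy types can be sketched in Python. The state names, action names, and probabilities below are made up for illustration only.

```python
import random

# Deterministic policy: each (hypothetical) state maps to exactly one action.
deterministic_policy = {"s0": "left", "s1": "right"}

# Stochastic policy: each state maps to a probability for every action,
# so the table stores pi(a | s).
stochastic_policy = {
    "s0": {"left": 0.8, "right": 0.2},
    "s1": {"left": 0.1, "right": 0.9},
}


def act_deterministic(state):
    """Always return the same action for a given state."""
    return deterministic_policy[state]


def act_stochastic(state, rng=random):
    """Sample an action a with probability pi(a | state)."""
    probs = stochastic_policy[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


print(act_deterministic("s0"))
print(act_stochastic("s0"))
```

Calling `act_deterministic("s0")` always gives the same action, while repeated calls to `act_stochastic("s0")` return "left" roughly 80% of the time under the probabilities assumed here.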

Model-Based:
In this Reinforcement Learning method, you need to create a virtual model for each environment. The agent learns to
perform in that specific environment.

Characteristics of Reinforcement Learning


Here are important characteristics of reinforcement learning:
• There is no supervisor, only a real number or reward signal
• Decision making is sequential
• Time plays a crucial role in Reinforcement problems
• Feedback is often delayed rather than instantaneous
• The agent's actions determine the subsequent data it receives

Types of Reinforcement Learning


Two kinds of reinforcement learning methods are:

Positive:
It is defined as an event that occurs because of a specific behavior. It increases the strength and the frequency of the
behavior and has a positive impact on the action taken by the agent.

This type of Reinforcement helps you to maximize performance and sustain change for a more extended period.
However, too much Reinforcement may lead to over-optimization of state, which can affect the results.

Negative:
Negative Reinforcement is defined as the strengthening of a behavior that occurs because a negative condition is
stopped or avoided. It helps you to define the minimum standard of performance. However, the drawback of
this method is that it provides only enough to meet the minimum behavior.

Learning Models of Reinforcement


There are two important learning models in reinforcement learning:

• Markov Decision Process
• Q learning

Markov Decision Process


The mathematical approach for mapping a solution in Reinforcement Learning is known as a Markov Decision Process
(MDP). The following parameters are used to get a solution:
• Set of actions - A
• Set of states - S
• Reward - R
• Policy - π
• Value - V
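
To show how these parameters fit together, here is a minimal Python sketch that solves a tiny made-up MDP with value iteration, a standard technique that this tutorial does not itself describe. The two states, the transition table, the reward numbers, and the discount factor `GAMMA` are all assumptions introduced for the example.

```python
# A tiny, made-up MDP. All names and numbers here are illustrative.
STATES = ["A", "B"]
ACTIONS = ["stay", "move"]
GAMMA = 0.5  # discount factor (an assumption, not part of the list above)

# Deterministic transitions: TRANSITIONS[state][action] -> next state
TRANSITIONS = {
    "A": {"stay": "A", "move": "B"},
    "B": {"stay": "B", "move": "A"},
}
# REWARDS[state][action] -> immediate reward R
REWARDS = {
    "A": {"stay": 0.0, "move": 1.0},
    "B": {"stay": 2.0, "move": 0.0},
}


def backup(state, action, values):
    """Immediate reward plus the discounted value of the next state."""
    return REWARDS[state][action] + GAMMA * values[TRANSITIONS[state][action]]


def value_iteration(tolerance=1e-9):
    """Repeat the backup over all states until the values V stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {
            s: max(backup(s, a, values) for a in ACTIONS) for s in STATES
        }
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < tolerance:
            return values


def greedy_policy(values):
    """Derive the policy: in each state pick the action with the best backup."""
    return {
        s: max(ACTIONS, key=lambda a: backup(s, a, values)) for s in STATES
    }


optimal_values = value_iteration()
print(optimal_values)
print(greedy_policy(optimal_values))
```

In this toy problem the values settle near 3 for state A and 4 for state B, and the resulting policy moves from A to B and then stays in B.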

Q-Learning
Q learning is a value-based method that supplies information about which action an agent should take.

Let's understand this method by the following example:

• There are five rooms in a building which are connected by doors.
• The rooms are numbered 0 to 4
• The outside of the building is treated as one big outside area (5)
• Rooms 1 and 4 have doors that lead into the building from the outside area (5)

Next, you need to associate a reward value with each door:

• Doors which lead directly to the goal have a reward of 100
• Doors which are not directly connected to the target room give zero reward
• As doors are two-way, two arrows are assigned between each pair of connected rooms
• Every arrow in the image carries an instant reward value

Explanation:

In this image, each room represents a state, and the agent's movement from one room to another represents an action.

In the image below, a state is shown as a node, while the arrows show the actions.
For example, an agent traverses from room 2 to area 5:

• Initial state = state 2
• State 2 -> state 3
• State 3 -> state (2, 1, 4)
• State 4 -> state (0, 5, 3)
• State 1 -> state (5, 3)
• State 0 -> state 4
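
The rooms example can be sketched as a small Q-learning program in Python. The door layout and the rewards of 100 and 0 come from the lists above. Everything else is an assumption made for this sketch: the discount factor `GAMMA = 0.8`, the purely random exploration, the learning rate of 1, and treating area 5 as a terminal goal where each episode ends.

```python
import random

GOAL = 5
GAMMA = 0.8  # discount factor: an assumed value, not given in the text

# Doors between rooms, taken from the state list above; area 5 is the goal.
DOORS = {
    0: [4],
    1: [3, 5],
    2: [3],
    3: [1, 2, 4],
    4: [0, 3, 5],
}


def reward(next_room):
    """Doors leading directly to the goal pay 100, all others pay 0."""
    return 100.0 if next_room == GOAL else 0.0


def train(episodes=500, seed=0):
    """Fill a Q-table by exploring randomly until the goal is reached."""
    rng = random.Random(seed)
    q = {(room, nxt): 0.0 for room, exits in DOORS.items() for nxt in exits}
    for _ in range(episodes):
        room = rng.choice(list(DOORS))
        while room != GOAL:
            nxt = rng.choice(DOORS[room])
            # Best value obtainable from the next room (0 once at the goal).
            future = max((q[(nxt, n)] for n in DOORS.get(nxt, [])), default=0.0)
            # A learning rate of 1 suffices here because moves are deterministic.
            q[(room, nxt)] = reward(nxt) + GAMMA * future
            room = nxt
    return q


def greedy_path(q, start):
    """Follow the highest-valued door from each room until the goal."""
    path = [start]
    while path[-1] != GOAL:
        room = path[-1]
        path.append(max(DOORS[room], key=lambda n: q[(room, n)]))
    return path


q_table = train()
print(greedy_path(q_table, start=2))
```

After training, the Q value of a door reflects how quickly it leads to the goal: doors into area 5 are worth 100, doors one step further away are worth about 80, and so on. Following the highest Q value from room 2 therefore reaches the outside in three moves.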

Reinforcement Learning vs. Supervised Learning


Parameters | Reinforcement Learning | Supervised Learning

Decision style | Reinforcement learning helps you to take your decisions sequentially. | In this method, a decision is made on the input given at the beginning.

Works on | Works on interacting with the environment. | Works on examples or given sample data.

Dependency on decision | In the RL method, learning decisions are dependent. Therefore, you should give labels to all the dependent decisions. | In supervised learning, the decisions are independent of each other, so labels are given for every decision.

Best suited | Supports and works better in AI, where human interaction is prevalent. | It is mostly operated with an interactive software system or applications.

Example | Chess game | Object recognition

Applications of Reinforcement Learning


Here are applications of Reinforcement Learning:

• Robotics for industrial automation
• Business strategy planning
• Machine learning and data processing
• Training systems that provide custom instruction and materials according to the requirements of students
• Aircraft control and robot motion control

Why use Reinforcement Learning?


Here are prime reasons for using Reinforcement Learning:

• It helps you to find which situation needs an action
• It helps you to discover which action yields the highest reward over the longer period
• Reinforcement Learning also provides the learning agent with a reward function
• It also allows the agent to figure out the best method for obtaining large rewards

When Not to Use Reinforcement Learning?


You can't apply a reinforcement learning model in every situation. Here are some conditions when you should not use a
reinforcement learning model:

• When you have enough data to solve the problem with a supervised learning method
• When compute or time is limited: Reinforcement Learning is computing-heavy and time-consuming, in particular when
the action space is large

Challenges of Reinforcement Learning


Here are the major challenges you will face while doing Reinforcement Learning:

• Feature/reward design, which can be very involved
• Parameters may affect the speed of learning
• Realistic environments can have partial observability
• Too much Reinforcement may lead to an overload of states, which can diminish the results
• Realistic environments can be non-stationary

Summary:
• Reinforcement Learning is a Machine Learning method
• It helps you to discover which action yields the highest reward over the longer period
• Three methods for reinforcement learning are 1) Value-based 2) Policy-based and 3) Model-based learning
• Agent, State, Reward, Environment, Value function, Model of the environment, and Model-based methods are some
important terms used in the RL method
• In the example of reinforcement learning, your cat is an agent that is exposed to the environment
• The biggest characteristic of this method is that there is no supervisor, only a real number or reward signal
• Two types of reinforcement learning are 1) Positive 2) Negative
• Two widely used learning models are 1) Markov Decision Process 2) Q learning
• The Reinforcement Learning method works on interacting with the environment, whereas the supervised learning
method works on given sample data or examples
• Applications of reinforcement learning methods include robotics for industrial automation and business strategy
planning
• You should not use this method when you have enough data to solve the problem with a supervised learning method
• One of the biggest challenges of this method is that parameters may affect the speed of learning
