What Is Reinforcement Learning
What is Reinforcement Learning?
Reinforcement Learning (RL) is a Machine Learning method concerned with how software agents should take actions in
an environment. It is a branch of machine learning, alongside supervised and unsupervised learning, in which the
agent learns to maximize its cumulative reward.
This learning method helps you learn how to attain a complex objective or maximize a specific dimension over many
steps. It is often combined with neural networks (deep reinforcement learning), but it does not require them.
Here are some important terms used in Reinforcement Learning:
Agent: An entity that performs actions in an environment to gain some reward.
Environment (e): The scenario that an agent has to face.
Reward (R): An immediate return given to an agent when it performs a specific action or task.
State (s): The current situation returned by the environment.
Policy (π): The strategy the agent applies to decide the next action based on the current state.
Value (V): The expected long-term return with discount, as opposed to the short-term reward.
Value Function: It specifies the value of a state, that is, the total amount of reward an agent can expect to
accumulate starting from that state.
Model of the environment: This mimics the behavior of the environment. It lets you make inferences about how the
environment will behave.
Model-based methods: Methods for solving reinforcement learning problems that use a model of the environment.
Q value or action value (Q): The Q value is quite similar to value. The only difference between the two is that it
takes the current action as an additional parameter.
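The Value (V) entry above refers to a long-term return with discount. A minimal sketch of that computation follows. The discount factor of 0.9, the reward sequence, and the function name discounted_return are illustrative assumptions, not taken from the text.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Work backwards so that each step computes G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: no reward for two steps, then a reward of 1.
rewards = [0.0, 0.0, 1.0]
print(discounted_return(rewards))
```

A smaller gamma makes the agent care more about immediate rewards, while a gamma close to 1 makes distant rewards count almost as much as immediate ones.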
Consider teaching your cat a new trick. Since a cat doesn't understand English or any other human language, we can't
tell her directly what to do. Instead, we follow a different strategy.
We set up a situation, and the cat tries to respond in many different ways. If the cat's response is the desired
one, we give her fish.
Now whenever the cat is exposed to the same situation, she performs a similar action even more enthusiastically, in
expectation of getting more reward (food).
In this way, the cat learns "what to do" from positive experiences.
At the same time, the cat also learns what not to do when faced with negative experiences.
In this case:
Your cat is an agent that is exposed to the environment, which here is your house. An example of a state could be
your cat sitting while you use a specific word to get her to walk.
Our agent reacts by performing an action that takes it from one "state" to another "state."
For example, your cat goes from sitting to walking.
The reaction of an agent is an action, and the policy is a method of selecting an action given a state in expectation
of better outcomes.
After the transition, the agent may get a reward or a penalty in return.
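The interaction described in the cat example can be sketched as a simple loop in which the agent picks an action, the environment returns the next state and a reward, and the cycle repeats. The two states, two actions, reward values, and function names below are hypothetical choices made for illustration only.

```python
import random

# Hypothetical two-state environment modelled on the cat example.
STATES = ["sitting", "walking"]
ACTIONS = ["stay", "walk"]


def step(state, action):
    """Return (next_state, reward) for one interaction with the environment."""
    if state == "sitting" and action == "walk":
        return "walking", 1.0   # desired response: the cat gets a fish
    if state == "sitting" and action == "stay":
        return "sitting", -1.0  # undesired response: a small penalty
    return state, 0.0           # already walking: nothing more happens


def random_policy(state, rng):
    """Pick an action uniformly at random, ignoring the state."""
    return rng.choice(ACTIONS)


def run_episode(policy, rng, max_steps=5):
    """Run one episode and return the list of (state, action, reward) steps."""
    state = "sitting"
    history = []
    for _ in range(max_steps):
        action = policy(state, rng)
        next_state, reward = step(state, action)
        history.append((state, action, reward))
        state = next_state
    return history


rng = random.Random(0)
for state, action, reward in run_episode(random_policy, rng):
    print(state, action, reward)
```

A learning agent would replace random_policy with a policy that is adjusted over time based on the rewards it observes.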
Value-Based:
In a value-based Reinforcement Learning method, you try to maximize a value function V(s). In this method, the
agent expects a long-term return from the current state under policy π.
Policy-based:
In a policy-based RL method, you try to come up with a policy such that the action performed in every state helps you
gain the maximum reward in the future. The policy can be one of two types:
Deterministic: For any state, the policy π always produces the same action.
Stochastic: Every action has a certain probability, given by the stochastic policy:
π(a|s) = P[A_t = a | S_t = s]
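The difference between the two kinds of policy can be sketched in a few lines. This assumes the usual reading of a stochastic policy as a probability distribution over actions for each state. The states, actions, and probabilities are made up for illustration.

```python
import random

# Hypothetical states and actions, borrowed from the cat example.
ACTIONS = ["stay", "walk"]

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"sitting": "walk", "walking": "stay"}

# Stochastic policy: each state maps to a probability for every action,
# so stochastic_policy[s][a] plays the role of pi(a|s).
stochastic_policy = {
    "sitting": {"stay": 0.2, "walk": 0.8},
    "walking": {"stay": 0.9, "walk": 0.1},
}


def act_deterministic(state):
    """Always return the single action assigned to this state."""
    return deterministic_policy[state]


def act_stochastic(state, rng):
    """Sample an action according to the probabilities for this state."""
    probs = stochastic_policy[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
print(act_deterministic("sitting"))
print([act_stochastic("sitting", rng) for _ in range(5)])
```

Calling act_deterministic twice with the same state always gives the same action, while act_stochastic can return different actions on different calls.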
Model-Based:
In this Reinforcement Learning method, you need to create a virtual model for each environment. The agent learns to
perform in that specific environment.
Positive:
Positive Reinforcement is defined as an event that occurs because of a specific behavior. It increases the strength
and frequency of the behavior and has a positive impact on the action taken by the agent.
This type of reinforcement helps you maximize performance and sustain change for a more extended period.
However, too much reinforcement may lead to over-optimization of a state, which can affect the results.
Negative:
Negative Reinforcement is defined as the strengthening of a behavior that occurs because a negative condition is
stopped or avoided. It helps you define the minimum standard of performance. However, the drawback of this method
is that it provides only enough to meet the minimum behavior.
The mathematical framework for mapping a solution in Reinforcement Learning is known as a Markov Decision Process
(MDP).
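To make the idea of an MDP concrete, here is a minimal sketch of one written as a Python dictionary, together with value iteration, a standard way to compute state values when the transition model is known. Value iteration is not discussed in the text above; it is used here only as an illustration. The states, probabilities, rewards, and discount factor are hypothetical.

```python
# A tiny hypothetical MDP: transitions[state][action] is a list of
# (probability, next_state, reward) tuples.
transitions = {
    "sitting": {
        "stay": [(1.0, "sitting", 0.0)],
        "walk": [(0.8, "walking", 1.0), (0.2, "sitting", 0.0)],
    },
    "walking": {
        "stay": [(1.0, "walking", 0.0)],
        "walk": [(1.0, "walking", 0.0)],
    },
}
GAMMA = 0.9  # assumed discount factor


def value_iteration(transitions, gamma, sweeps=100):
    """Repeatedly apply the Bellman optimality backup to every state."""
    values = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        # Each new value is the best expected (reward + discounted next value)
        # over the actions available in that state.
        values = {
            s: max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            for s, actions in transitions.items()
        }
    return values


print(value_iteration(transitions, GAMMA))
```

In this sketch the value of "sitting" is positive because walking from there can earn a reward, while "walking" has value zero because no further reward is available.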
Q-Learning
Q-learning is a value-based method that supplies information about which action an agent should take.
Explanation:
In the image for this example, each state is drawn as a node, while the arrows show the actions.
For example, an agent traverses from room number 2 to room number 5.
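A minimal sketch of tabular Q-learning on a room-to-room problem of this kind is given below. The door layout, the reward of 100 for reaching room 5, and the hyperparameters are assumptions made for illustration, since the original image is not reproduced here. The update line is the standard Q-learning rule: move Q(s, a) toward reward + gamma * max Q(s', a').

```python
import random

# Hypothetical room layout: DOORS[room] lists the rooms reachable from it.
# Room 5 is the goal. This layout is an assumption for illustration only.
DOORS = {
    0: [4],
    1: [3, 5],
    2: [3],
    3: [1, 2, 4],
    4: [0, 3, 5],
    5: [1, 4],
}
GOAL = 5
ALPHA, GAMMA, EPSILON = 0.5, 0.8, 0.2  # assumed hyperparameters


def train(episodes=500, seed=0):
    """Learn a Q-table where q[room][next_room] estimates long-term reward."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in nxt} for s, nxt in DOORS.items()}
    for _ in range(episodes):
        state = rng.choice([s for s in DOORS if s != GOAL])
        while state != GOAL:
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < EPSILON:
                action = rng.choice(DOORS[state])
            else:
                action = max(q[state], key=q[state].get)
            reward = 100.0 if action == GOAL else 0.0
            # The goal is terminal, so nothing is bootstrapped from it.
            best_next = 0.0 if action == GOAL else max(q[action].values())
            target = reward + GAMMA * best_next
            q[state][action] += ALPHA * (target - q[state][action])
            state = action  # moving through a door puts the agent in that room
    return q


def greedy_path(q, start, max_steps=10):
    """Follow the highest-valued action from start until the goal is reached."""
    path = [start]
    while path[-1] != GOAL and len(path) <= max_steps:
        path.append(max(q[path[-1]], key=q[path[-1]].get))
    return path


q_table = train()
print(greedy_path(q_table, start=2))
```

Here an action is simply the room the agent moves into, so the Q-table is indexed by the current room and the next room.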
Reinforcement Learning vs. Supervised Learning:
Decision style
  Reinforcement Learning: Helps you take your decisions sequentially.
  Supervised Learning: A decision is made on the input given at the beginning.
Works on
  Reinforcement Learning: Works on interacting with the environment.
  Supervised Learning: Works on examples or given sample data.
Dependency on decision
  Reinforcement Learning: Decisions are dependent on each other. Therefore, you should give labels to all the
  dependent decisions.
  Supervised Learning: Decisions are independent of each other, so labels are given for every decision.
Best suited
  Reinforcement Learning: Supports and works better in AI, where human interaction is prevalent.
  Supervised Learning: Mostly operated with an interactive software system or applications.
You should not use Reinforcement Learning when you have enough data to solve the problem with a supervised learning
method.
You also need to remember that Reinforcement Learning is computing-heavy and time-consuming, in particular when
the action space is large.
Summary:
Reinforcement Learning is a Machine Learning method.
It helps you discover which action yields the highest reward over the longer period.
Three methods for reinforcement learning are 1) Value-based 2) Policy-based and 3) Model-based learning.
Agent, State, Reward, Environment, Value function, Model of the environment, and Model-based methods are some
important terms used in RL.
An example of reinforcement learning: your cat is an agent that is exposed to the environment.
The biggest characteristic of this method is that there is no supervisor, only a real number or reward signal.
Two types of reinforcement are 1) Positive 2) Negative.
Two widely used learning models are 1) Markov Decision Process 2) Q-learning.
The Reinforcement Learning method works on interacting with the environment, whereas the supervised learning
method works on given sample data or examples.
Applications of reinforcement learning methods include robotics for industrial automation and business strategy
planning.
You should not use this method when you have enough data to solve the problem with a supervised learning method.
The biggest challenge of this method is that parameters may affect the speed of learning.