What Is Reinforcement Learning
In the above diagram, the agent is shown in a particular state. The agent takes an action in an
environment to accomplish a particular task. As a result of that action, the agent receives feedback
as a reward or a punishment.
The way reinforcement learning works can be understood through the approach of a master chess player.
Exploration − Just as a chess player considers various possible moves and their outcomes,
the agent explores different actions to understand their effects and learns which actions
lead to better results.
Exploitation − The chess player also relies on intuition built from past experience to make
decisions that seem right. Similarly, the agent uses knowledge gained from previous
experiences to make the best choices.
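The balance between exploration and exploitation is often illustrated with an epsilon-greedy rule. The following is a minimal sketch, not a method named in this tutorial: the function name `epsilon_greedy`, the `epsilon` parameter, and the sample value estimates are all assumptions made for illustration.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Exploration: try a uniformly random action to learn its effect.
        return rng.randrange(len(q_values))
    # Exploitation: choose the action with the highest estimated value so far.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical value estimates for three actions, learned from past experience.
q_values = [0.1, 0.7, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0)   # never explores
mixed_action = epsilon_greedy(q_values, epsilon=0.2)    # explores about 20% of the time
```

With `epsilon=0.0` the agent always picks the best-known action; a larger `epsilon` trades some immediate reward for more information about the other actions.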
Policy − It defines the learning agent's way of behaving at a given time. A policy is a mapping
from perceived states of the environment to actions to be taken when in those states.
Reward Signal − It defines the goal of a reinforcement learning problem. It is a numerical score
sent to the agent by the environment. This reward signal defines which events are good and
which are bad for the agent.
Value function − It specifies what is good in the long run. The value of a state is the total
amount of reward an agent can expect to accumulate over the future, starting from that state.
Model − Models are used for planning, which means deciding on a course of action by
considering possible future situations before they are actually experienced.
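To make the difference between a reward and a value concrete, the sketch below sums a sequence of rewards into a single return. It assumes a discount factor `gamma`, which the text above does not mention; discounting is a common convention that weights near-term rewards more heavily, and `gamma=1.0` recovers the plain total. The function name and the sample rewards are hypothetical.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards r0 + gamma*r1 + gamma^2*r2 + ... for one observed trajectory."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future total.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical rewards received after three consecutive actions.
rewards = [1.0, 0.0, 2.0]
undiscounted = discounted_return(rewards, gamma=1.0)   # plain sum of rewards
discounted = discounted_return(rewards, gamma=0.5)     # later rewards count less
```

A value function estimates the expected value of this quantity for each state, averaged over the trajectories the agent might follow from that state.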
States(S) − Represents all the situations in which an agent can find itself.
Actions(A) − The choices available to the agent in a given state.
Transition Probabilities(P) − The likelihood of moving from one state to another as a result of a
specific action.
Rewards(R) − Feedback received after transitioning to a new state due to an action, indicating
the outcome's desirability.
Policy(π) − A strategy that defines the action to take in each state in order to obtain reward.
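The components listed above can be written down as plain Python data. This is a small sketch under stated assumptions: the two-state problem, the state and action names, the numbers, and the helper `expected_reward` are all invented for illustration and do not come from any library.

```python
# States (S) and actions (A) of a hypothetical two-state problem.
states = ["cool", "hot"]
actions = ["slow", "fast"]

# Transition probabilities (P) and rewards (R):
# transitions[state][action] is a list of (probability, next_state, reward).
transitions = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "hot", 2.0)],
    },
    "hot": {
        "slow": [(1.0, "cool", 0.0)],
        "fast": [(1.0, "hot", -1.0)],
    },
}

# Policy: a mapping from each state to the action taken in that state.
policy = {"cool": "fast", "hot": "slow"}

def expected_reward(state, action):
    """Average immediate reward for taking `action` in `state`."""
    return sum(prob * reward for prob, _next_state, reward in transitions[state][action])

# Immediate reward the policy expects to collect in each state.
policy_rewards = {s: expected_reward(s, policy[s]) for s in states}
```

Each list of outcomes for a state-action pair has probabilities that sum to 1, which is what makes the table a valid transition model.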
Steps in Reinforcement Learning Process
Here are the major steps involved in reinforcement learning methods −
Step 1 − First, prepare an agent with an initial set of strategies.
Step 3 − Next, select the best policy for the current state of the environment and
perform the corresponding action.
Step 4 − The agent then receives a reward or penalty according to the
action it took in the previous step.
Step 6 − Finally, repeat steps 2-5 until the agent has learned and adopted the optimal policy.
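The loop described by these steps can be sketched with tabular Q-learning, one common algorithm among many and not necessarily the one this tutorial has in mind. Everything here is an assumption made for illustration: the five-cell corridor environment, the reward values, the hyperparameters `alpha`, `gamma` and `epsilon`, and the choice to observe the state and update value estimates inside the loop.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = [-1, +1]    # move one cell left or right

def step(state, action):
    """Hypothetical environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01   # small penalty per move, bonus at the goal
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Initial strategy: every action in every state starts with value 0.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Choose an action for the current state (epsilon-greedy).
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            # Act and receive the reward or penalty from the environment.
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy action for each non-terminal cell after training.
greedy_policy = [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])]
                 for row in q_table[:-1]]
```

After training, `greedy_policy` holds the move the agent currently prefers in each cell; in this corridor the useful behaviour is to keep moving toward the goal cell.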
Reinforcement learning models can adapt to a wide range of environments, both static and
dynamic.
Reinforcement learning can be used to solve a wide range of problems, including decision
making, prediction, and optimization.
Reinforcement learning depends on the quality of the reward function; if the reward is poorly
designed, the model may fail to improve its performance.
Designing and tuning a reinforcement learning system can be complex and requires expertise.
1. Robotics
In Natural Language Processing (NLP), reinforcement learning is used to enhance the performance of
chatbots by managing complex dialogues and improving user interactions. It is also used to train
models for tasks such as summarization.