Exp-14 Reinforcement Learning
Marwadi University
Faculty of Technology
Department of Information and Communication Technology
Subject: Artificial Intelligence (01CT0616)
Experiment No: 14 Date: Enrolment No: 92210133009
Aim: To train an agent based on the environment to find the shortest path using Reinforcement Learning
Theory:
Reinforcement learning is an area of machine learning concerned with taking suitable actions to maximize
reward in a particular situation. It is employed by various software systems and machines to find the best
possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised
learning: in supervised learning the training data comes with an answer key, so the model is trained on the
correct answers, whereas in reinforcement learning there is no answer key and the agent itself decides what
to do to perform the given task. In the absence of a training dataset, it must learn from its own experience.
Reinforcement Learning (RL) is the science of decision making. It is about learning the optimal behavior in an
environment to obtain maximum reward. In RL, the data is gathered by the learning system itself through
trial and error; it is not supplied as part of the input, as it would be in supervised or unsupervised
machine learning.
Reinforcement learning uses algorithms that learn from outcomes and decide which action to take next. After
each action, the algorithm receives feedback that helps it determine whether the choice it made was correct,
neutral or incorrect. It is a good technique to use for automated systems that have to make a lot of small
decisions without human guidance.
Reinforcement learning is an autonomous, self-teaching system that essentially learns by trial and error. It
performs actions with the aim of maximizing rewards, or in other words, it is learning by doing in order to achieve
the best outcomes.
Example:
The problem is as follows: We have an agent and a reward, with many hurdles in between. The agent is supposed
to find the best possible path to reach the reward. The following example illustrates the problem.
The above image shows the robot, diamond, and fire. The goal of the robot is to get the reward, which is the
diamond, and avoid the hurdles, which are fire. The robot learns by trying the possible paths and then
choosing the path that gives it the reward with the fewest hurdles. Each right step gives the robot a
reward and each wrong step subtracts from its reward. The total reward is calculated when it
reaches the final reward, the diamond.
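The reward bookkeeping in this example can be sketched in a few lines of Python. The grid layout, the reward values and the function name total_reward below are all hypothetical, chosen only to illustrate how a path that avoids the fire ends up with a higher total than one that crosses it.

```python
# Hypothetical 3x3 grid: 'S' start, 'F' fire, 'D' diamond, '.' free cell.
GRID = [
    ["S", ".", "."],
    [".", "F", "."],
    [".", ".", "D"],
]

# Assumed reward values, chosen only for illustration.
REWARDS = {"S": 0, ".": 1, "F": -10, "D": 100}


def total_reward(path):
    """Sum the reward of every cell entered along the path (start cell excluded)."""
    return sum(REWARDS[GRID[r][c]] for r, c in path[1:])


safe_path = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]   # goes around the fire
risky_path = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)]  # steps into the fire

print(total_reward(safe_path))   # 103
print(total_reward(risky_path))  # 92
```

Both paths have the same length, so the difference in total reward comes entirely from the penalty for entering the fire cell.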
Main points in Reinforcement learning –
• Input: The input should be an initial state from which the model will start.
• Output: There are many possible outputs, as there are a variety of solutions to a particular problem.
• Training: The training is based upon the input. The model returns a state, and the user
decides whether to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
Types of Reinforcement:
There are two types of Reinforcement:
1. Positive: Positive reinforcement occurs when an event that follows a particular
behavior increases the strength and the frequency of that behavior. In other words, it has a
positive effect on behavior.
Advantages of positive reinforcement:
• Maximizes performance
• Sustains change for a long period of time
Disadvantage of positive reinforcement:
• Too much reinforcement can lead to an overload of states, which can diminish the
results
2. Negative: Negative reinforcement is defined as the strengthening of a behavior because a negative
condition is stopped or avoided.
Advantages of negative reinforcement:
• Increases behavior
• Provides defiance to a minimum standard of performance
Disadvantage of negative reinforcement:
• It only provides enough to meet the minimum behavior
Reinforcement learning elements are as follows:
1. Policy
2. Reward function
3. Value function
4. Model of the environment
Policy: A policy defines the learning agent's behavior at a given time. It is a mapping from perceived states
of the environment to actions to be taken when in those states.
Reward function: A reward function defines the goal in a reinforcement learning problem. It provides a
numerical score based on the state of the environment.
Value function: A value function specifies what is good in the long run. The value of a state is the total
amount of reward an agent can expect to accumulate over the future, starting from that state.
Model of the environment: Models are used for planning.
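The idea of a state's value as reward accumulated over the future can be illustrated with a short calculation. The sketch below is not part of the experiment's program: the function name discounted_return and the reward sequence are hypothetical, and weighting later rewards by a discount factor is an assumption (the text above speaks only of total reward). The factor 0.75 is borrowed from the program later in this experiment.

```python
def discounted_return(rewards, gamma=0.75):
    """Return r0 + gamma*r1 + gamma^2*r2 + ... for a list of future rewards."""
    g = 0.0
    # Work backwards so each step only needs one multiply and one add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical episode: two steps with no reward, then the goal reward of 100.
future_rewards = [0, 0, 100]
print(discounted_return(future_rewards))  # 0 + 0.75*0 + 0.75**2 * 100 = 56.25
```

With gamma set to 1.0 the same function returns the plain total reward, which matches the wording of the definition above.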
Credit assignment problem: Reinforcement learning algorithms learn to generate an internal value for the
intermediate states, indicating how good they are in leading to the goal.
The learning decision maker is called the agent. The agent interacts with the environment, which includes
everything outside the agent. The agent has sensors to determine its state in the environment and takes
actions that modify its state.
The reinforcement learning problem is modeled as an agent continuously interacting with an environment. The
agent and the environment interact in a sequence of time steps. At each time step t, the agent receives the
state of the environment and a scalar numerical reward for the previous action, and then selects an action.
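The interaction loop described above can be sketched as follows. This is a minimal illustration, not the experiment's program: the Corridor environment, its reward values and the run_episode helper are hypothetical names introduced only to show the state, action, reward cycle at each time step.

```python
import random


class Corridor:
    """Hypothetical environment: states 0..4 on a line, goal at state 4."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the state is clipped to [0, 4].
        self.state = min(4, max(0, self.state + action))
        done = self.state == 4
        reward = 100 if done else -1  # assumed rewards: goal bonus, step cost
        return self.state, reward, done


def run_episode(policy, max_steps=1000):
    """Run one episode and return (total reward, number of time steps)."""
    env = Corridor()
    state, total, steps, done = env.state, 0, 0, False
    while not done and steps < max_steps:
        action = policy(state)                  # agent selects an action
        state, reward, done = env.step(action)  # environment returns state and reward
        total += reward
        steps += 1
    return total, steps


def always_right(state):
    return 1


def random_policy(state):
    return random.choice([-1, 1])


random.seed(0)
print(run_episode(always_right))   # (97, 4): three steps at -1, then +100
print(run_episode(random_policy))  # usually takes longer and earns less
```

The agent here is just the policy function; a learning agent would also use the returned reward to improve that policy between steps.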
Reinforcement learning is a technique for solving Markov decision problems. Reinforcement learning uses a
formal framework defining the interaction between a learning agent and its environment in terms of states,
actions, and rewards. This framework is intended to be a simple way of representing essential features of the
artificial intelligence problem.
Various Practical Applications of Reinforcement Learning –
Reinforcement learning can be used in the following situations:
1. A model of the environment is known, but an analytic solution is not available.
2. Only a simulation model of the environment is given (the subject of simulation-based optimization).
3. The only way to collect information about the environment is to interact with it.
2. What are the key components of a reinforcement learning system and explain them.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
Program (Code):
import numpy as np
import pandas as pd
import pylab as pl
import networkx as nx  # used to draw the graph

# Undirected edges of the graph; node 10 is the goal.
edges=[(0,1),(1,2),(1,5),(5,6),(1,3),(8,9),(4,7),(9,10),(2,4),(0,6),(6,7),
       (8,9),(7,8),(1,7),(10,8),(10,10)]
goal=10
G=nx.Graph()
G.add_edges_from(edges)
fig=nx.spring_layout(G)
nx.draw_networkx_nodes(G,fig)
nx.draw_networkx_edges(G,fig)
nx.draw_networkx_labels(G,fig)
pl.show()
Matrix_size=11
R=np.matrix(np.ones((Matrix_size,Matrix_size)))  # square reward matrix
R*=-1  # -1 marks "no edge between these two nodes"
for edge in edges:
    # 100 for a move that reaches the goal, 0 for any other valid move
    if(edge[1]==goal):
        R[edge]=100
    else:
        R[edge]=0
    # same for the reverse direction, since the graph is undirected
    if(edge[0]==goal):
        R[edge[::-1]]=100
    else:
        R[edge[::-1]]=0
print(R)
Q=np.matrix(np.zeros([Matrix_size,Matrix_size]))
def next_action(state):
    current_option=R[state,]  # whole row of rewards for this state
    available_actions=np.where(current_option>-1)[1]  # columns with an edge
    print(available_actions)
    next_action=int(np.random.choice(available_actions))  # pick one at random
    return next_action
next_action(0)
def update_Qmatrix(current_state,action,gamma):
    # best follow-up action from the node we move to (ties broken at random)
    max_index=np.where(Q[action,]==np.max(Q[action,]))[1]
    if(max_index.shape[0]>1):
        max_index=int(np.random.choice(max_index))
    else:
        max_index=int(max_index[0])
    Q[current_state,action]=R[current_state,action]+(gamma*Q[action,max_index])
    # score: sum of the Q matrix normalized to a 0-100 scale
    if(np.max(Q)>0):
        return(np.sum(Q/np.max(Q)*100))
    else:
        return 0
scores=[]
for i in range(1000):
    initial_state=np.random.randint(0,int(Q.shape[0]))
    action=next_action(initial_state)
    score=update_Qmatrix(initial_state,action,0.75)
    scores.append(score)
print(Q/np.max(Q)*100)
pl.plot(scores)
pl.show()
# follow the largest Q value from node 2 until the goal is reached
current_state=2
step=[current_state]
while current_state!=10:
    next_step=np.where(Q[current_state,]==np.max(Q[current_state,]))[1]
    if(next_step.shape[0]>1):
        next_step=int(np.random.choice(next_step))
    else:
        next_step=int(next_step[0])
    step.append(next_step)
    current_state=next_step
print(step)
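The update performed in update_Qmatrix can be written as a single rule. In this graph setting an action is the index of the node the agent moves to, so the next state s' is the same as the action a. The notation below is added for illustration and is not part of the original program.

```latex
Q(s, a) \leftarrow R(s, a) + \gamma \max_{a'} Q(s', a'),
\qquad s' = a, \quad \gamma = 0.75
```

As a worked example, assume Q is still all zeros and the move from node 9 to the goal node 10 is updated first: Q(9, 10) = 100 + 0.75 x 0 = 100. If the move from node 8 to node 9 is updated next, then Q(8, 9) = 0 + 0.75 x 100 = 75. Repeating such updates spreads the goal reward backwards through the graph, with the value shrinking by a factor of 0.75 per step, which is why following the largest Q value from any node tends to lead along a short path to the goal.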
Results:
Observation:
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
_____________________________________________________________________________
CODE:-
import numpy as np
import pandas as pd
import pylab as pl
import networkx as nx #give output as graph
maze = [
["1", "1", "1", "1", "1", "1", "1"],
["0", "0", "0", "0", "0", "0", "1"],
["1", "1", "1", "0", "1", "0", "1"],
["1", "0", "0", "0", "0", "0", "1"],
["1", "0", "1", "1", "1", "0", "1"],
["1", "0", "0", "0", "0", "0", "G"],
]
maze = np.array(maze)
rewards = {
'goal': 100,
'obstacle': -100,
'step': -1
}
if action == 'U':
    next_state = (max(0, i-1), j)
elif action == 'D':
    next_state = (min(maze.shape[0]-1, i+1), j)
elif action == 'L':
    next_state = (i, max(0, j-1))
elif action == 'R':
    next_state = (i, min(maze.shape[1]-1, j+1))
def q_learning(num_episodes):
    for episode in range(num_episodes):
        state = (5, 0) # starting state
        done = False
            # Update Q-value
            Q[state[0], state[1], actions.index(action)] += alpha * (
                reward + gamma * np.max(Q[next_state[0], next_state[1]])
                - Q[state[0], state[1], actions.index(action)])
            state = next_state
q_learning(num_episodes=1000)
def find_optimal_path():
    state = (5, 0)
    path = [state]
    return path
optimal_path = find_optimal_path()
print("Optimal Path:")
for i, j in optimal_path:
    print(f"({i}, {j}) -> ", end='')
print("Goal")
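The listing above does not show the action selection, the full state transition or the hyperparameter values. The following is a minimal, self-contained sketch of how a tabular Q-learning solver for this maze could look; it is not the original program. Several points are assumptions: "1" is read as an obstacle that blocks the move, "0" as a free cell, the values of alpha and gamma are hypothetical, the step helper is an invented name, and actions are chosen purely at random during training (as in the first program of this experiment).

```python
import numpy as np

# Same layout as the maze above. Assumed reading (not stated in the source):
# "1" is an obstacle, "0" is a free cell and "G" is the goal.
maze = np.array([
    ["1", "1", "1", "1", "1", "1", "1"],
    ["0", "0", "0", "0", "0", "0", "1"],
    ["1", "1", "1", "0", "1", "0", "1"],
    ["1", "0", "0", "0", "0", "0", "1"],
    ["1", "0", "1", "1", "1", "0", "1"],
    ["1", "0", "0", "0", "0", "0", "G"],
])
rewards = {"goal": 100, "obstacle": -100, "step": -1}
actions = ["U", "D", "L", "R"]
moves = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
start = (5, 0)

# Hypothetical hyperparameters; the source does not give their values.
alpha, gamma = 0.5, 0.9
Q = np.zeros(maze.shape + (len(actions),))
rng = np.random.default_rng(0)


def step(state, action):
    """Return (next_state, reward, done) for one move, clipping at the border."""
    di, dj = moves[action]
    i = min(maze.shape[0] - 1, max(0, state[0] + di))
    j = min(maze.shape[1] - 1, max(0, state[1] + dj))
    if maze[i, j] == "G":
        return (i, j), rewards["goal"], True
    if maze[i, j] == "1":
        # Assumption: an obstacle blocks the move, so the agent stays put.
        return state, rewards["obstacle"], False
    return (i, j), rewards["step"], False


def q_learning(num_episodes, max_steps=500):
    for _ in range(num_episodes):
        state = start
        for _ in range(max_steps):
            # Purely random exploration. Q-learning is off-policy, so the
            # values of the greedy policy are still what gets learned.
            a = int(rng.integers(len(actions)))
            next_state, reward, done = step(state, actions[a])
            target = reward + gamma * np.max(Q[next_state[0], next_state[1]])
            Q[state[0], state[1], a] += alpha * (target - Q[state[0], state[1], a])
            state = next_state
            if done:
                break


def find_optimal_path(max_len=50):
    """Follow the largest Q value from the start; max_len guards against loops."""
    state, path = start, [start]
    while maze[state] != "G" and len(path) < max_len:
        a = int(np.argmax(Q[state[0], state[1]]))
        state, _, _ = step(state, actions[a])
        path.append(state)
    return path


q_learning(num_episodes=300)
optimal_path = find_optimal_path()
print(" -> ".join(f"({i}, {j})" for i, j in optimal_path))
```

Under these assumptions the only way out of the start cell (5, 0) is to the right, and the shortest route to the goal runs along the bottom row. An epsilon-greedy action choice could be substituted for the random one, but it may need more episodes before the shortest route is found.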
OUTPUT:-