AI Lab
SOLDIER INSTITUTE OF
ENGINEERING & TECHNOLOGY
SESSION – (2019-2023)
PRACTICAL FILE OF ARTIFICIAL
INTELLIGENCE LAB
(BTCS 605-18)
INDEX
SR. NO.  AIM                                                               SIGNATURE
1.       Write a program to conduct uninformed and informed search.
2.       Write a program to conduct game search.
3.       Write a program to construct a Bayesian network from given data.
4.       Write a program to infer from the Bayesian network.
5.       Write a program to run value and policy iteration in a grid world.
6.       Write a program to do reinforcement learning in a grid world.
return 0;
}
2. Depth-first Search
Depth-first search isa recursive algorithm for traversing a tree or graph
data structure. It is called the depth-first search because it starts from the
root node and follows each path to its greatest depth node before
moving to the next path. DFS uses a stack data structure for its
implementation. The process of the DFS algorithm is similar to the BFS
algorithm
EXPERIMENT NO. 1
Output of BFS:
Following is Breadth First Traversal (starting from vertex 2)
2 0 3 1
Code for DFS:
#include <bits/stdc++.h>
using namespace std;
// Graph class represents a directed graph
// using adjacency list representation
class Graph {
public:
map<int, bool> visited;
map<int, list<int> > adj;
// function to add an edge to graph
void addEdge(int v, int w);
// DFS traversal of the vertices
// reachable from v
void DFS(int v);
};
void Graph::addEdge(int v, int w)
{
adj[v].push_back(w); // Add w to v's list.
}
void Graph::DFS(int v)
{
// Mark the current node as visited and
// print it
visited[v] = true;
cout << v << " ";
// Recurse for all the vertices adjacent
// to this vertex
list<int>::iterator i;
for (i = adj[v].begin(); i != adj[v].end(); ++i)
if (!visited[*i])
DFS(*i);
}
// Driver code
int main()
{
// Create a graph given in the above diagram
Graph g;
g.addEdge(0, 1);
g.addEdge(0, 2);
g.addEdge(1, 2);
g.addEdge(2, 0);
g.addEdge(2, 3);
g.addEdge(3, 3);
cout << "Following is Depth First Traversal (starting from vertex 2)\n";
g.DFS(2);
return 0;
}
Informed Search:
Informed search algorithms have information about the goal state, which
helps in more efficient searching. This information is obtained from a
heuristic function that estimates how close a state is to the goal state.
The main informed search algorithm is given below:
1. A* Search Algorithm
A* search is the most commonly known form of best-first search. It uses a
heuristic function h(n), the estimated cost to reach the goal from node n,
and g(n), the cost to reach node n from the start state. It combines
features of UCS and greedy best-first search, which lets it solve the
problem efficiently. The A* search algorithm finds the shortest path
through the search space using the heuristic function. This search
algorithm expands a smaller search tree and provides the optimal result
faster. The A* algorithm is similar to UCS except that it evaluates nodes
by g(n) + h(n) instead of g(n).
Code for A* algorithm
#include <iostream>
#include "source/AStar.hpp"
int main()
{
AStar::Generator generator;
// Set 2d map size.
generator.setWorldSize({25, 25});
// You can use a few heuristics : manhattan, euclidean or octagonal.
generator.setHeuristic(AStar::Heuristic::euclidean);
generator.setDiagonalMovement(true);
std::cout << "Generate path ... \n";
// This method returns vector of coordinates from target to source.
auto path = generator.findPath({0, 0}, {20, 20});
for(auto& coordinate : path) {
std::cout << coordinate.x << " " << coordinate.y << "\n";
}
}
OUTPUT OF A* algorithm

OUTPUT of DFS:
Following is Depth First Traversal (starting from vertex 2)
2 0 1 3
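Because the C++ A* listing above depends on the external source/AStar.hpp header, it will not compile on its own. For reference, here is a small self-contained Python sketch of the same idea, A* on a 2-D grid with a Euclidean heuristic; the grid, start and goal values are illustrative assumptions, not part of the original program.

import heapq
import math

def astar(grid, start, goal):
    # A* on a 2-D grid; cells with value 1 are walls, moves are 4-connected.
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Euclidean distance heuristic to the goal
        return math.hypot(goal[0] - cell[0], goal[1] - cell[1])

    open_heap = [(h(start), 0.0, start)]   # entries are (f = g + h, g, cell)
    parent = {start: None}
    best_g = {start: 0.0}

    while open_heap:
        f, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Reconstruct the path by walking the parent links backwards
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if g > best_g.get(cell, float("inf")):
            continue  # stale queue entry
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nbr = (cell[0] + dr, cell[1] + dc)
            if 0 <= nbr[0] < rows and 0 <= nbr[1] < cols and grid[nbr[0]][nbr[1]] == 0:
                ng = g + 1
                if ng < best_g.get(nbr, float("inf")):
                    best_g[nbr] = ng
                    parent[nbr] = cell
                    heapq.heappush(open_heap, (ng + h(nbr), ng, nbr))
    return None  # no path found

# Illustrative 5x5 map with a small wall; start at top-left, goal at bottom-right
grid = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 1, 0],
        [1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0]]
print(astar(grid, (0, 0), (4, 4)))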
EXPERIMENT NO. 2
AIM : Write a program to conduct game search.
THEORY : Game playing was one of the first tasks undertaken in Artificial
Intelligence. Game theory in AI dates back to around 1950, almost from the
days when computers became programmable. The very first game tackled in AI
was chess. Pioneers in the field of game theory in AI were Konrad Zuse (the
inventor of the first programmable computer and the first programming
language), Claude Shannon (the inventor of information theory), Norbert
Wiener (the creator of modern control theory), and Alan Turing. Since then,
there has been steady progress in the standard of play, to the point that
machines have defeated human champions (although not every time) in chess
and backgammon, and are competitive in many other games.
Types of Game
1. Perfect Information Game: a game in which the player knows all the
possible moves of himself and the opponent, and their results.
E.g. Chess.
2. Imperfect Information Game: a game in which the player does not know all
the possible moves of the opponent.
E.g. Bridge, since all the cards are not visible to the player.
Mini-Max Algorithm in Artificial Intelligence:
The mini-max algorithm is a recursive, backtracking algorithm used in
decision-making and game theory. It provides an optimal move for the player
assuming that the opponent is also playing optimally. The mini-max
algorithm uses recursion to search through the game tree and is mostly used
for game playing in AI, such as Chess, Checkers, Tic-Tac-Toe, Go, and
various other two-player games. The algorithm computes the minimax decision
for the current state. In this algorithm two players play the game: one is
called MAX and the other is called MIN. The two players are opponents of
each other; each tries to obtain the maximum benefit for itself while
leaving the opponent the minimum, so MAX selects the maximized value and
MIN selects the minimized value. The minimax algorithm performs a
depth-first search to explore the complete game tree: it proceeds all the
way down to the terminal nodes of the tree and then backs the values up the
tree as the recursion unwinds.
Code for minimax algorithm:
// A simple C++ program to find the maximum score that the maximizing player can get.
#include <bits/stdc++.h>
using namespace std;

// Returns the optimal value the maximizer can obtain.
// depth is the current depth in the game tree, nodeIndex the index of the
// current node in scores[], isMax is true when the maximizer moves, and h
// is the height of the game tree.
int minimax(int depth, int nodeIndex, bool isMax, int scores[], int h)
{
    // Terminating condition: a leaf node is reached
    if (depth == h)
        return scores[nodeIndex];

    // Maximizer's move: take the maximum attainable value
    if (isMax)
        return max(minimax(depth + 1, nodeIndex * 2, false, scores, h),
                   minimax(depth + 1, nodeIndex * 2 + 1, false, scores, h));

    // Minimizer's move: take the minimum attainable value
    else
        return min(minimax(depth + 1, nodeIndex * 2, true, scores, h),
                   minimax(depth + 1, nodeIndex * 2 + 1, true, scores, h));
}

// A utility function to find log base 2 of n
int log2(int n)
{
    return (n == 1) ? 0 : 1 + log2(n / 2);
}

// Driver code
int main()
{
    // Example leaf scores (illustrative values); the count must be a power of 2.
    int scores[] = { 3, 5, 2, 9, 12, 5, 23, 23 };
    int n = sizeof(scores) / sizeof(scores[0]);
    int h = log2(n);
    int res = minimax(0, 0, true, scores, h);
    cout << "The optimal value is : " << res << endl;
    return 0;
}
EXPERIMENT NO. 2
Output:

EXPERIMENT NO. 3
AIM: Write a program to construct a Bayesian network from given data.
THEORY: A Bayesian network models the probabilistic relationships between a
set of causes and the evidence they produce (for example, the possible
causes of a computer failure). The goal is to calculate the posterior
conditional probability distribution of each of the possible unobserved
causes given the observed evidence, i.e. P[Cause | Evidence].
Data Set:
Title: Heart Disease Databases
The Cleveland database contains 76 attributes, but all published
experiments refer to using a subset of 14 of them. In particular, the
Cleveland database is the only one that has been used by ML researchers to
this date. The "Heartdisease" field refers to the presence of heart disease
in the patient. It is integer valued from 0 (no presence) to 4.

Database     0    1   2   3   4   Total
Cleveland    164  55  36  35  13  303

Attribute Information:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type (1 = typical angina; 2 = atypical angina;
   3 = non-anginal pain; 4 = asymptomatic)
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
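A sketch of how such a network could be constructed from this data with pgmpy is shown below. The file name heart.csv, the column names and the chosen edge structure are illustrative assumptions, not the original program.

import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Load the Cleveland data (assumed to be saved locally as heart.csv with the
# commonly used attribute names, including a 'heartdisease' column).
data = pd.read_csv('heart.csv')

# An assumed, hand-picked structure over the categorical attributes; continuous
# attributes such as age or chol would need to be discretized before use.
model = BayesianNetwork([('sex', 'heartdisease'),
                         ('cp', 'heartdisease'),
                         ('fbs', 'heartdisease'),
                         ('heartdisease', 'restecg')])

# Learn the conditional probability tables from the data.
model.fit(data, estimator=MaximumLikelihoodEstimator)
print(model.get_cpds('heartdisease'))

# Query the fitted network: distribution of heart disease given chest pain type 2.
infer = VariableElimination(model)
print(infer.query(['heartdisease'], evidence={'cp': 2}))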
Joint Probability
The probability of two (or more) events occurring together is known as the
joint probability. The joint probability distribution over two or more
random variables gives the probability of each combination of their values.
For example, the joint probability of events A and B is expressed formally
as
P(A ^ B)
P(A, B)
where P denotes probability, and the conjunction "and" is written either
with the wedge operator "^" (or the intersection symbol) or, in some
notations, with a comma ",". For independent events, the joint probability
of A and B is calculated by multiplying the probability of event A by the
probability of event B.
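As a small worked illustration (the numbers are made up for this example only), the snippet below computes the joint probability of two independent events and reads a joint probability out of a full joint table:

# Joint probability of two independent events: P(A, B) = P(A) * P(B)
p_a = 0.6          # e.g. P(rain), illustrative value
p_b = 0.5          # e.g. P(coin shows heads), independent of rain
print("P(A, B) =", p_a * p_b)   # 0.3

# With a full joint distribution table, P(A, B) is read off directly.
# Keys: (A, B) with A, B in {True, False}.
joint = {(True, True): 0.2, (True, False): 0.4,
         (False, True): 0.3, (False, False): 0.1}
print("P(A=True, B=True) =", joint[(True, True)])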
Posterior Probability
In Bayesian statistics, the posterior probability of a random event or an
uncertain proposition is its conditional probability given the relevant
evidence or background. "Posterior", in this context, means "after taking
into account the evidence relevant to the particular case being examined".
The probability distribution of an unknown quantity, treated as a random
variable and conditioned on the data obtained from an experiment or survey,
is known as the posterior probability distribution.
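A short numerical sketch of Bayes' rule, P(Cause | Evidence) = P(Evidence | Cause) P(Cause) / P(Evidence), with prior and likelihood values chosen only for illustration:

# Bayes' rule: posterior = likelihood * prior / evidence
prior = 0.01            # P(disease), assumed prevalence
likelihood = 0.95       # P(positive test | disease), assumed sensitivity
false_positive = 0.05   # P(positive test | no disease), assumed

# Total probability of observing a positive test
evidence = likelihood * prior + false_positive * (1 - prior)

posterior = likelihood * prior / evidence
print("P(disease | positive test) =", round(posterior, 4))  # about 0.161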
Inferencing with Bayesian Network
In this demonstration, we'll use Bayesian networks to solve the well-known
Monty Hall problem. Let me explain the Monty Hall problem to those of you
who are unfamiliar with it:
This problem involves a game show in which a contestant must choose one of
three doors, one of which conceals a prize. After the contestant has chosen
a door, the show's host (Monty) opens one of the remaining doors that is
empty and asks the contestant if he wants to switch to the other unopened
door. The decision is whether to keep the current door or switch to the
other one. It is preferable to switch, because the prize is more likely to
be behind the other door. To resolve this ambiguity, let's model the
problem with a Bayesian network.
CODE :
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
import networkx as nx
import pylab as plt
# Defining Bayesian Structure
model = BayesianNetwork([('Guest', 'Host'), ('Price', 'Host')])
# Defining the CPDs:
cpd_guest = TabularCPD('Guest', 3, [[0.33], [0.33], [0.33]])
cpd_price = TabularCPD('Price', 3, [[0.33], [0.33], [0.33]])
cpd_host = TabularCPD('Host', 3, [[0, 0, 0, 0, 0.5, 1, 0, 1, 0.5],
[0.5, 0, 1, 0, 0, 0, 1, 0, 0.5],
[0.5, 1, 0, 1, 0.5, 0, 0, 0, 0]],
evidence=['Guest', 'Price'], evidence_card=[3, 3])
# Associating the CPDs with the network structure.
model.add_cpds(cpd_guest, cpd_price, cpd_host)
model.check_model()
# Inferring the posterior probability
from pgmpy.inference import VariableElimination
infer = VariableElimination(model)
posterior_p = infer.query(['Host'], evidence={'Guest': 2, 'Price': 2})
print(posterior_p)
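Given the evidence Guest = 2 and Price = 2, the host can open neither the guest's door nor the prize door, so the query above assigns probability 0.5 each to Host = 0 and Host = 1, and probability 0 to Host = 2.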
EXPERIMENT NO. 4
AIM: Write a program to infer from the Bayesian network.
OUTPUT:
EXPERIMENT NO. 5
AIM: Write a program to run value and policy iteration in a grid world
THEORY : Value Iteration
With the tools we have explored until now, a new question arises: why do we
need to consider an initial policy at all? The idea of the value iteration
algorithm is that we can compute the value function without a policy.
Instead of letting a policy \pi dictate which actions are selected, we
select the actions that maximize the expected reward:

v_{k+1}(s) = \max_a \sum_{s', r} p(s', r \mid s, a) \, [ r + \gamma \, v_k(s') ]
A policy is a mapping from states to probabilities of selecting each possible action. If the
agent is following policy \pi at time t, then \pi(a \mid s) is the probability that A_t = a if S_t = s.

The value function of a state s under a policy \pi, denoted v_\pi(s), is the expected return when
starting in s and following \pi thereafter:

v_\pi(s) = \mathbb{E}_\pi \left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \mid S_t = s \right]

Similarly, the action value function gives the expected return when taking an action a in
state s and following \pi thereafter:

q_\pi(s, a) = \mathbb{E}_\pi \left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \mid S_t = s, A_t = a \right]

The Bellman equation expresses the value of each state in terms of the values of its successor states:

v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \, [ r + \gamma \, v_\pi(s') ]

The Bellman optimality equations give the optimal policy of choosing specific actions in
specific states to achieve the maximum reward and reach the goal efficiently. They are given as

v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a) \, [ r + \gamma \, v_*(s') ]

The Bellman equations cannot be used directly in goal-directed problems, and dynamic
programming is used instead, where the value functions are computed iteratively.
In the problem below, the grid has 2 terminal states, shown at the corners. There are four
possible actions in each state: up, down, right and left. If an action in a state would take the
agent out of the grid, the agent remains in the same state. The reward for any transition is
R_t = -1, except transitions into the terminal states at the corners, which have a reward of 0.
The policy is a uniform random policy with all actions being equiprobable, each with a
probability of 1/4 = 0.25.
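The cells that follow implement policy iteration. As a companion to the theory above, here is a minimal self-contained value-iteration sketch for the same 4x4 grid (two terminal corner states, reward -1 per move), using the Bellman optimality update. It is an illustration, not part of the original notebook.

import numpy as np

gridSize = 4
gamma = 1.0
actions = [[-1, 0], [1, 0], [0, 1], [0, -1]]       # up, down, right, left
terminationStates = [[0, 0], [gridSize - 1, gridSize - 1]]

def step(state, action):
    # Terminal states absorb with zero reward
    if state in terminationStates:
        return state, 0
    nxt = [state[0] + action[0], state[1] + action[1]]
    # Moves that leave the grid keep the agent in place
    if -1 in nxt or gridSize in nxt:
        nxt = state
    return nxt, -1

V = np.zeros((gridSize, gridSize))
theta = 1e-4
while True:
    delta = 0.0
    for i in range(gridSize):
        for j in range(gridSize):
            # Bellman optimality update: V(s) = max_a [ r + gamma * V(s') ]
            best = max(r + gamma * V[s[0], s[1]]
                       for s, r in (step([i, j], a) for a in actions))
            delta = max(delta, abs(best - V[i, j]))
            V[i, j] = best
    if delta < theta:
        break

print(V)   # optimal state values: 0 at the corners, -1, -2, -3 elsewhere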
1. Gridworld-1
In [1]:
import numpy as np
import random
In [2]:
gamma = 1 # discounting rate
gridSize = 4
rewardValue = -1
terminationStates = [[0,0], [gridSize-1, gridSize-1]]
actions = [[-1, 0], [1, 0], [0, 1], [0, -1]]
numIterations = 1000
The actionValue function returns the next state for a given action in a state, together with the accrued reward.
In [3]:
def actionValue(initialPosition, action):
    if initialPosition in terminationStates:
        finalPosition = initialPosition
        reward = 0
    else:
        # Compute final position
        finalPosition = np.array(initialPosition) + np.array(action)
        reward = rewardValue
        # If the action moves the finalPosition out of the grid, stay in same cell
        if -1 in finalPosition or gridSize in finalPosition:
            finalPosition = initialPosition
            reward = rewardValue
    # print(finalPosition)
    return finalPosition, reward
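The cells that define the list of states and the helper functions policy_evaluate and greedify_policy are not reproduced here. A minimal version consistent with the calls in the cells below could look like the following (the exact definitions in the original notebook may differ):

# All grid positions, including the two terminal corner states
states = [[i, j] for i in range(gridSize) for j in range(gridSize)]

# Characters used to display the greedy action for each state,
# in the same order as actions = [up, down, right, left]
actionChars = ['u', 'd', 'r', 'l']

def policy_evaluate(states, actions, gamma, valueMap):
    # One synchronous sweep of iterative policy evaluation for the
    # equiprobable random policy (probability 0.25 for each action)
    newValueMap = np.zeros((gridSize, gridSize))
    for state in states:
        value = 0
        for action in actions:
            finalPosition, reward = actionValue(state, action)
            value += 0.25 * (reward + gamma * valueMap[finalPosition[0], finalPosition[1]])
        newValueMap[state[0], state[1]] = value
    return newValueMap

def greedify_policy(state, pi, pi1, gamma, valueMap):
    # Make the policy greedy with respect to the current value function
    qValues = []
    for action in actions:
        finalPosition, reward = actionValue(state, action)
        qValues.append(reward + gamma * valueMap[finalPosition[0], finalPosition[1]])
    best = int(np.argmax(qValues))
    pi[state[0], state[1]] = best
    pi1[state[0], state[1]] = actionChars[best]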
In [10]:
def improve_policy(pi, pi1, gamma, valueMap):
    policy_stable = True
    for state in states:
        old = pi[state].copy()
        # Greedify policy for state
        greedify_policy(state, pi, pi1, gamma, valueMap)
        if not np.array_equal(pi[state], old):
            policy_stable = False
    print(pi)
    print(pi1)
    return pi, pi1, policy_stable
In [11]:
def policy_iteration(gamma, theta):
    valueMap = np.zeros((gridSize, gridSize))
    pi = np.ones((gridSize, gridSize)) / 4
    pi1 = np.chararray((gridSize, gridSize))
    pi1[:] = 'a'
    policy_stable = False
    print("here")
    while not policy_stable:
        valueMap = policy_evaluate(states, actions, gamma, valueMap)
        pi, pi1, policy_stable = improve_policy(pi, pi1, gamma, valueMap)
    return valueMap, pi, pi1
In [12]:
theta=0.1
valueMap, pi,pi1 = policy_iteration(gamma, theta)
[[0. 3. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 1.]
[0. 0. 2. 0.]]
[[b'u' b'l' b'u' b'u']
[b'u' b'u' b'u' b'u']
[b'u' b'u' b'u' b'd']
[b'u' b'u' b'r' b'u']]
[[0. 3. 3. 0.]
[0. 0. 0. 1.]
[0. 0. 1. 1.]
[0. 2. 2. 0.]]
[[b'u' b'l' b'l' b'u']
[b'u' b'u' b'u' b'd']
[b'u' b'u' b'd' b'd']
[b'u' b'r' b'r' b'u']]
[[0. 3. 3. 1.]
[0. 0. 1. 1.]
[0. 0. 1. 1.]
[0. 2. 2. 0.]]
[[b'u' b'l' b'l' b'd']
[b'u' b'u' b'd' b'd']
[b'u' b'u' b'd' b'd']
[b'u' b'r' b'r' b'u']]
[[0. 3. 3. 1.]
[0. 0. 1. 1.]
[0. 0. 1. 1.]
[0. 2. 2. 0.]]
[[b'u' b'l' b'l' b'd']
[b'u' b'u' b'd' b'd']
[b'u' b'u' b'd' b'd']
[b'u' b'r' b'r' b'u']]
EXPERIMENT NO. 6
AIM : Write a program to do reinforcement learning in a grid world.
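One common way to do this is tabular Q-learning on the same 4x4 grid world used above. The sketch below is an illustration, not the original program; the learning rate, exploration rate and episode count are assumed values.

import numpy as np
import random

gridSize = 4
gamma = 1.0
alpha = 0.1          # learning rate (illustrative)
epsilon = 0.1        # exploration rate (illustrative)
actions = [[-1, 0], [1, 0], [0, 1], [0, -1]]      # up, down, right, left
terminationStates = [[0, 0], [gridSize - 1, gridSize - 1]]

def step(state, action):
    # Same dynamics as the grid world above: -1 per move, stay in place at walls
    nxt = [state[0] + action[0], state[1] + action[1]]
    if -1 in nxt or gridSize in nxt:
        nxt = state
    return nxt, -1

Q = np.zeros((gridSize, gridSize, len(actions)))

for episode in range(5000):
    state = [random.randint(0, gridSize - 1), random.randint(0, gridSize - 1)]
    while state not in terminationStates:
        # Epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.randrange(len(actions))
        else:
            a = int(np.argmax(Q[state[0], state[1]]))
        nxt, reward = step(state, actions[a])
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + gamma * np.max(Q[nxt[0], nxt[1]])
        Q[state[0], state[1], a] += alpha * (target - Q[state[0], state[1], a])
        state = nxt

# The value map is the maximum Q-value in each state; terminal states stay 0
valueMap = Q.max(axis=2)
print(valueMap)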
Output:
The valueMap shows the optimal value of each state, from which the optimal path from any state can be read off.