
Syllabus

• Game Theory, Optimal Decisions in Games, Heuristic Alpha–Beta Tree Search, Monte Carlo Tree Search, Stochastic Games, Partially Observable Games, Limitations of Game Search Algorithms, Constraint Satisfaction Problems (CSP), Constraint Propagation: Inference in CSPs, Backtracking Search for CSPs
• Examples of CSPs
• Sudoku: Filling a 9×9 grid with digits so that each
row, column, and 3×3 subgrid contains all digits
from 1 to 9 without repetition.
• Map Coloring: Coloring a map with a limited
number of colors so that no adjacent regions
share the same color.
• N-Queens: Placing N queens on an N×N
chessboard so that no two queens threaten each
other.
Unit 3 : Adversarial Search and Game
Theory
• Adversarial search examines the problems that arise when we try to plan ahead in a world where other agents are planning against us.

• There may be situations where more than one agent is searching for a solution in the same search space; this situation typically occurs in game playing.

• An environment with more than one agent is termed a multi-agent environment, in which each agent is an opponent of the others and plays against them. Each agent needs to consider the actions of the other agents and the effect of those actions on its own performance.

• So, searches in which two or more players with conflicting goals explore the same search space for a solution are called adversarial searches, often known as games.
• Games are modeled as a search problem together with a heuristic evaluation function; these are the two main factors that help to model and solve games in AI.
Game Theory
• Game theory is a branch of mathematics used to model strategic interaction between different players (agents), all of which are equally rational, in a context with predefined rules (of playing or maneuvering) and outcomes.
• Every player or agent is a rational entity that is selfish and tries to maximize its reward by using a particular strategy.
• All the players abide by certain rules in order to receive a predefined payoff: a reward tied to a certain outcome.
• Hence, a GAME can be defined as a set of players, actions, strategies, and a final payoff for which all the players are competing.
• Perfect information: A game with perfect information is one in which agents can see the complete board. Agents have all the information about the game and can also see each other's moves. Examples are Chess, Checkers, Go, etc.
• Imperfect information: If agents do not have all the information about the game and are not aware of everything that is going on, the game is called a game with imperfect information, such as Battleship, blind tic-tac-toe, Bridge, etc.
• Deterministic games: Deterministic games follow a strict pattern and set of rules, and there is no randomness associated with them. Examples are Chess, Checkers, Go, tic-tac-toe, etc.
• Non-deterministic games: Non-deterministic games involve unpredictable events and a factor of chance or luck, introduced by dice or cards. These games are random, and the response to each action is not fixed. Such games are also called stochastic games.
Example: Backgammon, Monopoly, Poker, etc.
Significance of game theory in AI

• The significance of game theory in AI lies in its ability to model complex decision-making scenarios, enabling AI systems to strategize and optimize their actions based on the potential decisions of other entities.
• In the context of AI, game theory serves as a
fundamental tool for understanding and
predicting behavior in competitive or cooperative
environments, allowing AI to adapt its strategies
to achieve optimal outcomes in diverse settings.
Optimal Decision Making in Games

• Let us start with games with two players, whom we'll refer to as MAX and MIN for obvious reasons. MAX is the first to move, and then they take turns until the game is finished.
• At the conclusion of the game, the victorious
player receives points, while the loser receives
penalties.
• A game can be formalized as a type of search
problem that has the following elements:
Key elements in game theory
• S0: The initial state, which specifies how the game is set up at the start.
• PLAYER(s): Defines which player has the move in a state.
• ACTIONS(s): Returns the set of legal moves in a state.
• RESULT(s, a): The transition model, which defines the result of a move.
• TERMINAL-TEST(s): A terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.
• UTILITY(s, p): A utility function (also called an objective function or payoff
function), defines the final numeric value for a game that ends in terminal state
s for a player p.
In chess, the outcome is a win, loss, or draw, with values +1, 0, or 1/2.
Some games have a wider variety of possible outcomes; the payoffs in
backgammon range from 0 to +192.
A zero-sum game is (confusingly) defined as one where the total payoff to all
players is the same for every instance of the game.
• In a zero-sum game, the financial gains of one party cause an equal amount of loss for the other party. The net change in wealth in these situations is zero: one party's loss is the other party's gain.
• The initial state, ACTIONS function, and
RESULT function define the game tree for the
game—a tree where the nodes are game states
and the edges are moves.
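As a concrete illustration, here is a minimal, hypothetical Python sketch of how these six elements could be represented; the class and method names are illustrative assumptions, not a standard API.

```python
class Game:
    """Abstract formalization of a two-player game (illustrative sketch)."""

    def initial_state(self):        # S0: how the game is set up at the start
        raise NotImplementedError

    def player(self, s):            # PLAYER(s): which player has the move in state s
        raise NotImplementedError

    def actions(self, s):           # ACTIONS(s): set of legal moves in state s
        raise NotImplementedError

    def result(self, s, a):         # RESULT(s, a): transition model
        raise NotImplementedError

    def terminal_test(self, s):     # TERMINAL-TEST(s): True when the game is over
        raise NotImplementedError

    def utility(self, s, p):        # UTILITY(s, p): final numeric payoff for player p
        raise NotImplementedError
```

Later sketches in this unit assume a game object with this interface.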
Game’s Utility Function

• Given a game tree, the optimal strategy can be determined from the minimax value of each node n, which we write as MINIMAX(n).
• The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game.
• The minimax value of a terminal state is simply its utility.
• Furthermore, if given the option, MAX prefers to shift
to a maximum value state, whereas MIN wants to move
to a minimum value state. So here’s what we’ve got:
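The definition the slide is leading up to, reconstructed here in its standard textbook form, is:

```latex
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s) & \text{if } \mathrm{TERMINAL\text{-}TEST}(s)\\[2pt]
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MAX}\\[2pt]
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MIN}
\end{cases}
```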
• Minimax is a kind of backtracking algorithm that is
used in decision making and game theory to find the
optimal move for a player, assuming that your
opponent also plays optimally.
It is widely used in two player turn-based games such as
Tic-Tac-Toe, Backgammon, Mancala, Chess, etc.
In Minimax the two players are called maximizer and
minimizer.
The maximizer tries to get the highest score possible
while the minimizer tries to do the opposite and get the
lowest score possible.
• Time complexity: O(b^d), where b is the branching factor and d is the maximum depth (number of plies) of the game tree.
• Space complexity: O(b·d), where b is the branching factor and d is the maximum depth of the tree (similar to DFS).
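A minimal Python sketch of the minimax procedure, assuming a game object with the interface sketched earlier (the method names are assumptions for illustration):

```python
def minimax_value(game, state, max_player):
    """Return the minimax value of `state` from max_player's point of view."""
    if game.terminal_test(state):
        return game.utility(state, max_player)
    if game.player(state) == max_player:                      # MAX node: take the best child
        return max(minimax_value(game, game.result(state, a), max_player)
                   for a in game.actions(state))
    else:                                                     # MIN node: take the worst child
        return min(minimax_value(game, game.result(state, a), max_player)
                   for a in game.actions(state))

def minimax_decision(game, state):
    """Choose the action leading to the successor with the highest minimax value."""
    player = game.player(state)
    return max(game.actions(state),
               key=lambda a: minimax_value(game, game.result(state, a), player))
```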
• Let’s use these definitions to analyze the game tree shown in the figure
above.
• The game’s UTILITY function provides utility values to the terminal nodes
on the bottom level.
• Because the first MIN node, B, has three successor states with values of 3, 12, and 8, its minimax value is 3.
• The other two MIN nodes each have a minimax value of 2.
• The root node is a MAX node; its successors have minimax values of 3, 2, and 2, so its own minimax value is 3.
• We can also identify the minimax decision at the root: action a1 is the best option for MAX because it leads to the state with the highest minimax value.

• This concept of optimal MAX play requires that MIN plays optimally as
well—it maximizes MAX’s worst-case outcome.
• What happens if MIN isn’t performing at its best?
• Then it is a simple matter of demonstrating that MAX can perform even better. Other strategies may outperform the minimax strategy against suboptimal opponents, but they necessarily do worse against optimal opponents.

Heuristic alpha beta tree
search(Alpha-Beta Pruning)
• Alpha-Beta pruning is not actually a new algorithm, but
rather an optimization technique for the minimax
algorithm.
• It reduces the computation time by a huge factor. This
allows us to search much faster and even go into
deeper levels in the game tree.
• It cuts off branches in the game tree which need not be
searched because there already exists a better move
available.
• It is called Alpha-Beta pruning because it passes 2 extra
parameters in the minimax function, namely alpha and
beta.
• Let’s define the parameters alpha and beta.
• Alpha is the best value that
the maximizer currently can guarantee at that
level or above.
Beta is the best value that
the minimizer currently can guarantee at that
level or below.
• While backtracking, node values are passed to upper nodes instead of alpha and beta values.
• We only pass alpha and beta values down to child nodes.
So far this is how our game tree looks in the slides' worked example (figure not reproduced here); the 9 is crossed out because, thanks to pruning, it never had to be computed.
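A hedged Python sketch of minimax with alpha-beta pruning added, reusing the assumed game interface from the earlier sketches (here utilities are taken from MAX's point of view):

```python
import math

def alphabeta(game, state, alpha=-math.inf, beta=math.inf):
    """Minimax value of `state` with alpha-beta pruning (illustrative sketch)."""
    if game.terminal_test(state):
        return game.utility(state, 'MAX')          # assumed: utility expressed for MAX
    if game.player(state) == 'MAX':
        value = -math.inf
        for a in game.actions(state):
            value = max(value, alphabeta(game, game.result(state, a), alpha, beta))
            alpha = max(alpha, value)               # best value MAX can guarantee so far
            if alpha >= beta:                       # MIN will never let play reach here
                break                               # prune remaining successors
        return value
    else:
        value = math.inf
        for a in game.actions(state):
            value = min(value, alphabeta(game, game.result(state, a), alpha, beta))
            beta = min(beta, value)                 # best value MIN can guarantee so far
            if beta <= alpha:
                break                               # prune remaining successors
        return value
```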
MCTS
• It is a probabilistic and heuristic driven search algorithm
that combines the classic tree search implementations
alongside machine learning principles of reinforcement
learning.
In tree search, there’s always the possibility that the current
best action is actually not the most optimal action.
In such cases, the MCTS algorithm becomes useful, as it continues to evaluate other alternatives periodically during the learning phase by executing them, instead of only following the currently perceived optimal strategy. This is known as the "exploration-exploitation trade-off" (the dilemma that arises when choosing between exploring new options and exploiting existing knowledge).
Here are some reasons why MCTS is
commonly used:
• Handling Complex and Strategic Games
• Unknown or Imperfect Information
• Optimizing Exploration and Exploitation
• Scalability and Parallelization
• Monte Carlo Tree Search (MCTS) algorithm:
In MCTS, nodes are the building blocks of the
search tree.
• These nodes are formed based on the outcome of
a number of simulations.
• The process of Monte Carlo Tree Search can be broken down into four distinct steps: selection, expansion, simulation, and backpropagation.
• Each of these steps is explained in detail below.
Selection
• In this process, the MCTS algorithm traverses the
current tree from the root node using a specific strategy.
• The strategy uses an evaluation function to optimally
select nodes with the highest estimated value.
• MCTS uses the Upper Confidence Bound (UCB)
formula applied to trees as the strategy in the selection
process to traverse the tree.
• It balances the exploration-exploitation trade-off.
During tree traversal, a node is selected based on parameters that return the maximum value; the formula typically used for this purpose is given below.
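The selection score usually applied here is the UCB1 (UCT) formula, reconstructed below in its standard form since the slide's figure is not reproduced:

```latex
\mathrm{UCT}(v_i) = \frac{w_i}{n_i} + c \sqrt{\frac{\ln N}{n_i}}
```

where w_i is the total reward (e.g., number of wins) recorded at child node v_i, n_i is the number of simulations through v_i, N is the number of simulations through its parent, and c is an exploration constant (often √2). The first term favors exploitation of good nodes; the second favors exploration of rarely visited ones.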
• Expansion: In this process, a new child node is added to the
tree to that node which was optimally reached during the
selection process.
• Simulation: In this process, a simulation is performed by
choosing moves or strategies until a result or predefined
state is achieved.
• Backpropagation: After determining the value of the newly added node, the remaining tree must be updated. The backpropagation process propagates from the new node back to the root node. During this process, the number of simulations stored in each node is incremented. Also, if the new node's simulation results in a win, the number of wins is incremented as well.
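A compact, hedged Python sketch of the four MCTS steps using the UCT score above; the node fields, the 'MAX'-perspective reward, and the game interface are assumptions for illustration (a full two-player implementation would also alternate the reward's perspective during backpropagation):

```python
import math, random

class MCTSNode:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.wins, self.visits = [], 0, 0

def uct(node, c=math.sqrt(2)):
    if node.visits == 0:
        return math.inf                          # always try unvisited children first
    return node.wins / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def mcts(game, root_state, iterations=1000):
    root = MCTSNode(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend with UCT while the current node is fully expanded
        while node.children and len(node.children) == len(list(game.actions(node.state))):
            node = max(node.children, key=uct)
        # 2. Expansion: add one child for an untried action (states assumed hashable)
        if not game.terminal_test(node.state):
            tried = {child.state for child in node.children}
            untried = [a for a in game.actions(node.state)
                       if game.result(node.state, a) not in tried]
            if untried:
                child = MCTSNode(game.result(node.state, random.choice(untried)), node)
                node.children.append(child)
                node = child
        # 3. Simulation: play random moves until a terminal state is reached
        state = node.state
        while not game.terminal_test(state):
            state = game.result(state, random.choice(list(game.actions(state))))
        reward = game.utility(state, 'MAX')      # assumed: 1 for a MAX win, 0 otherwise
        # 4. Backpropagation: update visit and win counts up to the root
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
    return max(root.children, key=lambda c: c.visits).state
```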
Stochastic Games
• Stochastic games represent dynamic
interactions in which the environment changes
in response to players' behavior.
• Scientist Shapley says, "In a stochastic game
the play proceeds by steps from position to
position, according to transition probabilities
controlled jointly by the two players"
• A stochastic game is played by a set of players. In
each stage of the game, the play is in a given state
(or position, in Shapley's language), taken from a
set of states, and every player chooses an action
from a set of available actions.
• The collection of actions that the players choose, together with the current state, determines the stage payoff that each player receives, as well as a probability distribution according to which the new state is selected.
https://youtu.be/xXE5AwzNQ2s?si=Yo
O2HN-PnCvay8ea
• Partially Observable Games, often referred to as
Partially Observable Markov Decision Processes
(POMDPs), are a class of problems and models in
artificial intelligence that involve decision-making in
situations where an agent's observations do not provide
complete information about the underlying state of the
environment.
• POMDPs are an extension of Markov Decision
Processes (MDPs) to scenarios where uncertainty and
partial observability are significant factors. They are
commonly used to model and solve problems in various
domains, including robotics, healthcare, finance, and
game playing.
Key Characteristics of Partially
Observable Games (POMDPs):

• Partial Observability: In POMDPs, the agent's observations are incomplete and do not directly reveal the true state of the environment. This introduces uncertainty, as the agent must reason about the possible states given its observations.
• Hidden States: The environment's true state, also known as the hidden state, evolves according to a probabilistic process. The agent's observations provide noisy or incomplete information about this hidden state.
• Belief State: To handle partial observability, the agent maintains a belief state, which is a probability distribution over possible hidden states. The belief state captures the agent's uncertainty about the true state of the environment.
• Action and Observation: The agent takes actions based on its
belief state, and it receives observations that depend on the hidden
state. These observations help the agent update its belief state and
make decisions.
Objective and Policy: The agent's goal is to find a policy—a
mapping from belief states to actions—that maximizes a specific
objective, such as cumulative rewards or long-term expected utility.
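For the belief state described above, the standard Bayesian update after taking action a and observing o is (reconstructed here in its usual textbook form):

```latex
b'(s') = \eta \; O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s)
```

where T is the transition model, O is the observation model, and η is a normalizing constant chosen so that b' sums to 1 over all hidden states.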
Solving Partially Observable Games (POMDPs):
Solving POMDPs is challenging due to the added complexity of
partial observability. Traditional techniques used for MDPs, such as
dynamic programming and value iteration, are not directly
applicable to POMDPs. Instead, specialized algorithms and
techniques are developed to address the partial observability:
• Belief Space Methods: These methods work directly in the space of belief states and involve updating beliefs based on observations and actions. Techniques like the POMDP forward algorithm and backward induction are used to compute optimal policies.
• Particle Filtering: Particle filters are used to maintain an approximation of the belief state using a set of particles, each representing a possible state hypothesis.
• Point-Based Methods: These methods focus on selecting a subset of belief states (points) that are critical for decision-making. Techniques like PBVI (Point-Based Value Iteration) and POMCP (Partially Observable Monte Carlo Planning) fall under this category.
• Approximate Solutions: Due to the complexity of exact solutions, approximate methods such as online planning, heuristic-based policies, and reinforcement learning techniques are often employed to find near-optimal solutions.
Applications of Partially Observable
Games:

Partially Observable Games have numerous real-world applications, including:

• Robotics: Robot navigation, exploration, and manipulation tasks in uncertain and partially observable environments.
• Healthcare: Optimal patient treatment scheduling and management under uncertainty.
• Financial Planning: Portfolio optimization, trading, and risk management in financial markets.
• Game Playing: Modeling opponents in games with hidden information, such as poker and strategic board games.
Partially Observable Games (POMDPs) are a powerful framework for modeling
decision-making under uncertainty and partial observability. They provide a way to
represent and solve problems where agents must reason about hidden states and make
optimal decisions based on incomplete observations.
Limitations of Game Search Algorithms
• Because calculating optimal decisions in
complex games is intractable (hard to deal
with), all game search algorithms must make
assumptions and approximations.
• Alpha-beta search uses the heuristic evaluation function as an approximation.
• Monte Carlo search computes an approximate
average over a random selection of playouts.
• The choice of which algorithm to use depends in
part on the features of each game. That is to say,
each game may select different options in game
search. When the branching factor is high or it is
difficult to define an evaluation function, Monte
Carlo search is preferred. But both algorithms
suffer from fundamental limitations.
• For alpha-beta search:
• The first limitation of alpha-beta search is its vulnerability to errors in the heuristic function.
• The second limitation is that it is designed to calculate the values of all legal moves. However, sometimes there is one move that is obviously best; in that case, there is no reason to waste computation time figuring out the precise value of that move. It is better to simply make the move.
• The third limitation is that it does all its reasoning at the level of individual moves. That is to say, humans play games differently, reasoning at a more abstract level than individual moves.
• The fourth limitation is the inability to incorporate machine learning into the game search process.
• For Monte Carlo search:
• The first limitation of Monte Carlo search is that it is designed to calculate the values of all legal moves. However, sometimes there is one move that is obviously best; in that case, there is no reason to waste computation time figuring out the precise value of that move. It is better to simply make the move.
• The second limitation is that it does all its reasoning at the level of individual moves. That is to say, humans play games differently, reasoning at a more abstract level than individual moves.
• The third limitation is the inability to incorporate machine learning into the game search process.
Constraint Satisfaction Problems
(CSP) in Artificial Intelligence
• Constraint satisfaction problems (CSPs) are a class of AI problems whose goal is to find a solution that meets a set of constraints.
• The aim of a constraint satisfaction problem is to find values for a group of variables that satisfy a set of restrictions or rules.
• CSPs are frequently employed in AI for tasks including resource allocation, planning, scheduling, and decision-making.
• There are mainly three basic components in
the constraint satisfaction problem:
• Variables: The things that need to be determined are variables. Variables in a CSP are the objects that must have values assigned to them in order to satisfy a particular set of constraints. Boolean, integer, and categorical variables are just a few examples of the various types of variables. In a Sudoku puzzle, for instance, the variables could stand for the cells that need to be filled with numbers.
• Domains: The range of potential values that a variable can take is represented by its domain. Depending on the problem, a domain may be finite or infinite. For instance, in Sudoku, the set of numbers from 1 to 9 can serve as the domain of a variable representing a puzzle cell.
• Constraints: The guidelines that control how
variables relate to one another are known as
constraints. Constraints in a CSP define the ranges
of possible values for variables. Unary
constraints, binary constraints, and higher-order
constraints are only a few examples of the various
sorts of constraints. For instance, in a sudoku
problem, the restrictions might be that each row,
column, and 3×3 box can only have one instance
of each number from 1 to 9.
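To make the three components concrete, here is a small hedged Python sketch of the map-coloring example mentioned earlier; the specific regions (Australian territories) and names are illustrative assumptions, not part of the slides:

```python
# Variables: one per region; Domains: the available colors;
# Constraints: adjacent regions must not share a color.
variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domains = {v: {"red", "green", "blue"} for v in variables}
neighbors = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}

def constraint_satisfied(region_a, color_a, region_b, color_b):
    """Binary constraint: adjacent regions may not receive the same color."""
    return region_a not in neighbors[region_b] or color_a != color_b
```

The later sketches for constraint propagation and backtracking reuse these structures.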
Constraint Propagation in AI

• Introduction to Constraint Propagation


• Constraint propagation is a fundamental
concept in constraint satisfaction problems
(CSPs). A CSP involves variables that must be
assigned values from a given domain while
satisfying a set of constraints. Constraint
propagation aims to simplify these problems
by reducing the domains of variables, thereby
making the search for solutions more efficient.
• Key Concepts
• Variables: Elements that need to be assigned values.
• Domains: Possible values that can be assigned to the
variables.
• Constraints: Rules that define permissible combinations of
values for the variables.
• How Constraint Propagation Works
• Constraint propagation works by iteratively narrowing
down the domains of variables based on the constraints.
This process continues until no more values can be
eliminated from any domain. The primary goal is to reduce
the search space and make it easier to find a solution.
• Steps in Constraint Propagation
• Initialization: Start with the initial domains of
all variables.
• Propagation: Apply constraints to reduce the
domains of variables.
• Iteration: Repeat the propagation step until a
stable state is reached, where no further
reduction is possible.
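A minimal sketch of one common propagation procedure, arc consistency (AC-3, also mentioned in the later slides); the `domains`, `neighbors`, and binary `constraint` arguments follow the map-coloring sketch above and are assumptions for illustration:

```python
from collections import deque

def revise(domains, xi, xj, constraint):
    """Remove values of xi that have no consistent partner value in xj's domain."""
    revised = False
    for x in set(domains[xi]):
        if not any(constraint(xi, x, xj, y) for y in domains[xj]):
            domains[xi].discard(x)
            revised = True
    return revised

def ac3(variables, domains, neighbors, constraint):
    """Prune domains until every arc (xi, xj) is consistent (illustrative sketch)."""
    queue = deque((xi, xj) for xi in variables for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj, constraint):
            if not domains[xi]:
                return False                      # a domain became empty: no solution
            for xk in neighbors[xi]:
                if xk != xj:
                    queue.append((xk, xi))        # re-check arcs pointing at xi
    return True
```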
• Scheduling
• In scheduling problems, tasks must be assigned to time slots without
conflicts. Constraint propagation helps by reducing the possible time slots
for each task based on constraints like availability and dependencies.
• Planning
• AI planning involves creating a sequence of actions to achieve a goal.
Constraint propagation simplifies the planning process by reducing the
possible actions at each step, ensuring that the resulting plan satisfies all
constraints.
• Resource Allocation
• In resource allocation problems, resources must be assigned to tasks in a
way that meets all constraints, such as capacity limits and priority rules.
Constraint propagation helps by narrowing down the possible assignments,
making the search for an optimal allocation more efficient.
constraint Satisfaction Problems (CSP)
algorithms:
• The backtracking algorithm is a depth-first search
algorithm that methodically investigates the search space
of potential solutions up until a solution is discovered that
satisfies all the restrictions.
• The method begins by choosing a variable and giving it a
value before repeatedly attempting to give values to the
other variables.
• The method returns to the prior variable and tries a
different value if at any time a variable cannot be given a
value that fulfills the requirements.
• Once all assignments have been tried or a solution that
satisfies all constraints has been discovered, the algorithm
ends.
• Steps in Backtracking
• Initialization: Start with an empty assignment.
• Selection: Choose an unassigned variable.
• Assignment: Assign a value to the chosen variable.
• Consistency Check: Check if the current assignment is
consistent with the constraints.
• Recursion: If the assignment is consistent, recursively try
to assign values to the remaining variables.
• Backtrack: If the assignment is not consistent, or if further
assignments do not lead to a solution, undo the last
assignment (backtrack) and try the next possible value.
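A hedged Python sketch of these six steps for a binary CSP; it assumes the `variables`, `domains`, `neighbors`, and `constraint` structures from the earlier map-coloring sketch:

```python
def backtracking_search(variables, domains, neighbors, constraint, assignment=None):
    """Depth-first assignment with backtracking (illustrative sketch)."""
    if assignment is None:
        assignment = {}                                        # 1. Initialization
    if len(assignment) == len(variables):
        return assignment                                      # all variables assigned: solution
    var = next(v for v in variables if v not in assignment)    # 2. Selection
    for value in domains[var]:                                 # 3. Assignment
        consistent = all(constraint(var, value, other, assignment[other])
                         for other in neighbors[var] if other in assignment)  # 4. Consistency check
        if consistent:
            assignment[var] = value
            result = backtracking_search(variables, domains, neighbors,
                                         constraint, assignment)              # 5. Recursion
            if result is not None:
                return result
            del assignment[var]                                # 6. Backtrack
    return None
```

Called with the map-coloring structures from before, e.g. `backtracking_search(variables, domains, neighbors, constraint_satisfied)`, it returns a color assignment or None if no solution exists.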
• Role of Backtracking in Solving CSPs
• Advantages
• Simplicity: The algorithm is easy to implement
and understand.
• Effectiveness: It works well for many practical
CSPs, especially when combined with heuristics.
• Flexibility: Can be adapted and optimized with
various strategies like variable ordering and
constraint propagation.
• Step 1: Define "is_safe" function
• This function checks if it's safe to place a
queen at the position board[row][col].
• Step 2: Define the solve_n_queens Function
• This function attempts to solve the N-Queens
problem by placing queens one column at a
time.
• It uses recursion to place queens and
backtracks if a solution cannot be found.
• Step 3: Define the print_board Function
• This function prints the board configuration
with queens placed.
• Step 4: Define the n_queens Function
• This function initializes the board and calls
the solve_n_queens function to solve the
problem.
• If a solution is found, it prints the board.
Otherwise, it indicates that no solution exists.
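The slides outline these four functions without showing the code itself; a minimal Python reconstruction consistent with that outline (placing one queen per column) might look like this:

```python
def is_safe(board, row, col, n):
    """Check whether a queen at board[row][col] is attacked by queens in earlier columns."""
    for c in range(col):                                          # same row to the left
        if board[row][c] == 1:
            return False
    for r, c in zip(range(row, -1, -1), range(col, -1, -1)):      # upper-left diagonal
        if board[r][c] == 1:
            return False
    for r, c in zip(range(row, n), range(col, -1, -1)):           # lower-left diagonal
        if board[r][c] == 1:
            return False
    return True

def solve_n_queens(board, col, n):
    """Place queens column by column, backtracking when no safe row exists."""
    if col >= n:
        return True
    for row in range(n):
        if is_safe(board, row, col, n):
            board[row][col] = 1
            if solve_n_queens(board, col + 1, n):
                return True
            board[row][col] = 0                                   # backtrack
    return False

def print_board(board):
    for row in board:
        print(" ".join("Q" if cell else "." for cell in row))

def n_queens(n):
    board = [[0] * n for _ in range(n)]
    if solve_n_queens(board, 0, n):
        print_board(board)
    else:
        print("No solution exists")

n_queens(8)
```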
Issues in Solving CSP Efficiently

• Exponential Complexity: The search space for CSPs grows exponentially with the number of variables, making it computationally expensive.
• Constraint Propagation Overhead: The overhead of
enforcing constraints and maintaining consistency can be
significant, impacting efficiency.
• Inefficient Variable and Value Ordering: Poor choices in
variable and value ordering can lead to unnecessary search
and backtracking.
• Symmetry: Symmetry in the problem can lead to
redundant search, increasing computational burden.
• Large Domains: CSPs with large domains can make it
challenging to find feasible solutions efficiently.
Solutions to These Problems

• Constraint Propagation Techniques: Utilize techniques like arc consistency (AC-3, AC-4) and path consistency to reduce the search space by enforcing constraints more efficiently.
• Variable and Value Ordering Heuristics: Implement
intelligent heuristics like Minimum Remaining Values
(MRV), Degree Heuristic, and Least Constraining Value
(LCV) to guide the search process effectively.
• Symmetry Breaking: Apply symmetry-breaking
constraints to eliminate redundant search paths and
reduce the overall search space.
• Domain Reduction Techniques: Use domain
reduction techniques like forward checking and
constraint propagation to prune the search space
and focus on promising regions.
• Intelligent Search Algorithms: Employ advanced
search algorithms like backjumping, constraint
learning, and intelligent variable selection to
minimize unnecessary search and backtracking.
• By addressing these issues through the suggested
solutions, the efficiency of solving CSPs can be
significantly improved.
What are Heuristic Functions?

• Heuristic functions are strategies or methods that guide the search process in AI algorithms by providing estimates of the most promising path to a solution. They are often used in scenarios where finding an exact solution is computationally infeasible. Instead, heuristics provide a practical approach by narrowing down the search space, leading to faster and more efficient problem-solving.
• Heuristic functions transform complex problems into
more manageable subproblems by providing estimates
that guide the search process. This approach is
particularly effective in AI planning, where the goal is
to sequence actions that lead to a desired outcome.
Search Algorithm

• Search algorithms are fundamental to AI, enabling systems to navigate through problem spaces to find solutions. These algorithms can be classified into uninformed (blind) and informed (heuristic) searches. Uninformed search algorithms, such as breadth-first and depth-first search, do not have additional information about the goal state beyond the problem definition. In contrast, informed search algorithms use heuristic functions to estimate the cost of reaching the goal, significantly improving search efficiency.
• A* Algorithm
• The A* algorithm is one of the most widely
used heuristic search algorithms. It uses both
the actual cost from the start node to the
current node (g(n)) and the estimated cost
from the current node to the goal (h(n)). The
total estimated cost (f(n)) is the sum of these
two values:
• f(n)=g(n)+h(n)
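A minimal Python sketch of A* over an explicit graph, assuming `graph` maps each node to a dict of {neighbor: edge_cost} and `h` is the heuristic function (both names are illustrative assumptions):

```python
import heapq
from itertools import count

def a_star(graph, start, goal, h):
    """A* search: always expand the node with the lowest f(n) = g(n) + h(n)."""
    tie = count()                                      # tie-breaker so the heap never compares nodes
    frontier = [(h(start), next(tie), start, 0, [start])]   # (f, tie, node, g, path)
    best_g = {start: 0}
    while frontier:
        f, _, node, g, path = heapq.heappop(frontier)
        if node == goal:
            return path, g                             # path found and its total cost g(n)
        for neighbor, cost in graph.get(node, {}).items():
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(frontier, (new_g + h(neighbor), next(tie),
                                          neighbor, new_g, path + [neighbor]))
    return None, float("inf")                          # goal unreachable
```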
• Greedy Best-First Search
• The Greedy Best-First Search algorithm selects
the path that appears to be the most
promising based on the heuristic function
alone. It prioritizes nodes with the lowest
heuristic cost (h(n)), but it does not
necessarily guarantee the shortest path to the
goal
• Hill-Climbing Algorithm
• The Hill-Climbing algorithm is a local search
algorithm that continuously moves towards
the neighbor with the lowest heuristic cost. It
resembles climbing uphill towards the goal
but can get stuck in local optima.
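A short hedged sketch of hill-climbing, assuming `neighbors(state)` yields candidate successors and `h(state)` is the heuristic cost to minimize (both are illustrative assumptions):

```python
def hill_climbing(start, neighbors, h):
    """Repeatedly move to the lowest-cost neighbor; stop at a (possibly local) optimum."""
    current = start
    while True:
        best = min(neighbors(current), key=h, default=None)
        if best is None or h(best) >= h(current):
            return current              # no improving neighbor: possibly only a local optimum
        current = best
```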
What is a Constraint?
• The guidelines that control how variables
relate to one another are known as
constraints.
• Constraints in a CSP define the ranges of
possible values for variables.
• Unary constraints, binary constraints, and
higher-order constraints are only a few
examples of the various sorts of constraints.
• Types of Constraints in CSP
• Several types of constraints can be used in a Constraint
satisfaction problem in artificial intelligence, including:
• Unary Constraints:
A unary constraint is a constraint on a single variable.
For example, Variable A not equal to “Red”.
• Binary Constraints:
A binary constraint involves two variables and specifies
a constraint on their values. For example, a constraint
that two tasks cannot be scheduled at the same time
would be a binary constraint.
• Global Constraints:
Global constraints involve more than two
variables and specify complex relationships
between them. For example, a constraint that
no two tasks can be scheduled at the same
time if they require the same resource would
be a global constraint.
