
Artificial Intelligence - Stuart Russell
1 Basic concepts
Agent – something that acts

Rational agent

 One that acts so as to achieve the best outcome


 Or when there is uncertainty – the best expected outcome

2 Intelligent agents
2.1 Agents and Environments
Agent – anything that can be viewed as

1. perceiving its environment – through sensors –


2. and acting on its environment – through actuators

Percept – an agent's perceptual inputs at any given instant

Percept sequence – the complete history of everything the agent has ever perceived

Agent's choice of action at any given instant

 Can depend on the entire percept sequence observed to date


 But not on anything it hasn’t perceived

Agent's behaviour

 Described by the agent function that maps any percept sequence to an
action
 If we can specify the agent's choice of action for every possible percept
sequence – then we have said everything there is to say about the agent

Agent function – abstract mathematical function

Agent program – concrete implementation – running within some system
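As a small illustration of that distinction, here is a minimal sketch (my own, not from the book) of a table-driven agent: the table plays the role of the abstract agent function, and the Python function that looks the percept sequence up in it is the concrete agent program.

```python
# Hypothetical sketch: a table-driven agent program.
# The table is a (partial) tabulation of the abstract agent function;
# the closure below is the concrete agent program that implements it.

def make_table_driven_agent(table):
    percepts = []                      # the percept sequence observed so far

    def agent_program(percept):
        percepts.append(percept)
        # Look up the action for the entire percept sequence to date.
        return table.get(tuple(percepts), "NoOp")

    return agent_program

# Usage with a toy two-cell vacuum world (locations A and B):
table = {
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"),): "Right",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}
agent = make_table_driven_agent(table)
print(agent(("A", "Clean")))   # -> Right
print(agent(("B", "Dirty")))   # -> Suck
```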

But what are the consequences of the agent's behaviour?

 When an agent is put into an environment – it generates a sequence of


actions –
 According to the percepts it receives
 This sequence of actions – cause the environment to go through a
sequence of states
 If this sequence of states is desirable – then the agent has performed well
 Need to capture the notion of desirability using a performance measure
 That evaluates any given sequence of environmental states

NB – we consider environment states – not agent states

 If we define success in terms of an agent's opinion of its own performance



 the agent could achieve perfect rationality simply by deluding itself that its
performance was perfect
 NB – derive performance measures appropriate for the circumstance

Design performance measures according to what you want in the environment

 Not how you think the agent should behave – NB


What is rational depends on four things

1. The performance measure that defines the criterion for success


2. The agent's prior knowledge of the environment
3. The actions that the agent can perform
4. The agent's percept sequence to date
Def of rational agent

 For each possible percept sequence


 A rational agent should select an action that is expected to maximize its
performance measure
 Given the evidence provided by the percept sequence
 And whatever built in knowledge the agent has

What is the performance measure?

What is known about the environment?

What sensors and actuators does the agent have?

Rationality maximizes expected performance

 Perfection maximizes actual performance

Rational choice depends only on percept sequence to date

Must not allow the agent to inadvertently engage in unintelligent activities

 Example – if the agent doesn't look both ways before crossing the road


 its percept sequence won't tell it that a car is coming from the left
 Need an informative percept sequence
 Need to do things to maximize expected performance

Can also do things to modify future percepts

 i.e. information gathering

Need to gather information

And to learn from what it perceives

Initial configuration – has prior knowledge

 As agent gains experience


 May modify and augment that knowledge

If know everything – don’t need to learn/perceive

 Just need to act correctly

The sphex wasp is unable to learn that its plan is failing – and so does not change it.

To the extent that the agent relies on the prior knowledge of its designer – rather than on its
own percepts – the agent lacks autonomy - NB

Rational agent should be autonomous – should learn what it can to compensate for
partial or incorrect prior knowledge.

Incorporation of learning allows us to design a rational agent that will succeed in a vast
variety of environments.

Task environments – the problems to which rational agents are solutions

The flavour of the task environment directly affects the appropriate design for the
agent program

2.2 Specifying the task environment


PEAS – categorized using the PEAS framework

1. Performance measure
2. Environment
3. Actuators
4. Sensors

First step – specify the task environment as fully as possible

Example for automated taxi driver:


Agent Type | Performance Measure | Environment | Actuators | Sensors

First Question – what is the performance measure to which we would like our agent
to aspire?

For an automated taxi driver:

 Getting to the correct destination


 Minimizing
- fuel consumption
- wear and tear
- trip time
- violations of traffic laws
 maximizing
- safety and passenger comfort
- profits

Q2 – what is the environment that the agent will face?

 E.g. roads
 Traffic, pedestrians, stray animals, road works

Actuators

 Control over engine through accelerator


 Control over steering and braking

Sensors

Q – what actions can the agent take?

 Behavior
 What percepts does the environment generate?
 Performance measure?

The dimensions of the task environment determine the appropriate agent design – and


the applicability of each of the principal families of techniques for agent implementation

Dynamic environments are continuously asking the agent what it wants to do

 If it hasn’t decided what it wants to do


 That counts as deciding to do nothing

Taxi driving is dynamic

 The other cars and the taxi itself keep moving


 While the driving algorithm deliberates what to do next

The agent needs to keep executing while it is deliberating

 Otherwise it ends up executing plans made for a world that has already changed

Behaviour of an agent – the action that is performed after any given sequence of
percepts

Basic variables that an Agent holds


 The constructor/init function
 The pointer to the program function – i.e. a program pointer variable
 The program function must take a percept variable as input

program function – specifies the actions of the agent based on its percept variable
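A minimal sketch of that agent container (my own illustration; the class and method names are assumptions, not from the book):

```python
# Hypothetical sketch of the agent container described above.
class Agent:
    def __init__(self, program):
        # The constructor stores a pointer to the agent program:
        # a function that takes a percept variable and returns an action.
        self.program = program

    def act(self, percept):
        return self.program(percept)

# Usage: an agent whose program ignores its percepts and always moves right.
always_right = Agent(lambda percept: "Right")
print(always_right.act(("A", "Clean")))   # -> Right
```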
2.3 Properties of task environments:
Want dimensions along which task environments can be categorized

2.3.1 1. Fully observable vs partially observable

Fully Observable - When the agent's sensors give it access to the complete state of the
environment at each point in time.

 i.e. if the sensors detect all aspects that are relevant to the choice of action
 relevance depends on the performance measure

Partially observable –

 sensors may be noisy or inaccurate (e.g. sensors just can't perceive


everything)
 or because parts of the state are missing from the sensor data

Unobservable –

 If the agent has no sensors at all


 Or if the sensors can't give any information about the environment

2.3.2 2. Deterministic vs Non-deterministic

Deterministic – if the next state of the environment is completely determined by the


current state and action executed by the agent – then environment is deterministic

Non-deterministic – next state of environment is not completely determined by the


state and the action executed by the agent in that state

Stochastic – if the model explicitly deals with probabilities

2.3.3 3. Episodic vs Sequential

Episodic – the agent's experience is divided into atomic episodes

 In each episode, agent receives a percept – then performs a single action


 And the next episode does not depend on the actions taken in previous
episodes

Sequential – the current decision could affect all future decisions

 E.g. chess

2.3.4 4. Static vs Dynamic

Dynamic – the environment can change while the agent is deliberating – we say the
environment is dynamic for the agent

 Continuously asking the agent what the agent wants to do


 If hasn’t yet decided – counts as deciding to do nothing

Static – can't change while the agent is deliberating

 Agent doesn’t have to keep looking at the world while it is deciding on an


action
 And also doesn’t need to worry about the passage of time

Semi-dynamic –

 Environment itself does not change with the passage of time


 But the agent's performance score does (e.g. chess played with a clock)

2.3.5 5. Discrete vs Continuous

Discrete vs continuous distinction applies to:

 the state of the environment


 the way time is handled
 the percepts and actions of the agent

Discrete – a finite number of distinct states, percepts and actions (e.g. chess); continuous – states, time, percepts or actions range over continuous values (e.g. taxi driving)
2.3.6 6. Known vs unknown

We call it a known (or unknown) environment but really it refers to the knowledge of
the agent (or the programmer/designers) about the laws of the environment

Known environment:

 outcomes (or outcome probabilities) for all actions are given

Unknown – agent will have to learn how it works in order to make good decisions

2.3.7 7. Single vs Multi-agent

When must we treat an object as an agent?

 Is object B's behaviour best described as maximizing a performance


measure whose value depends on agent A's behaviour?
 If yes, we must treat it as an agent

Competitive – if B is trying to maximize its own performance measure, which


minimizes A’s performance measure

2.4 The structure of agents


Behaviour – the action that is performed after any given sequence of percepts

The job of AI – is to design the agent program that implements the agent
function

Agent function – mapping from percepts to actions

 May depend on the entire percept history


e.g. of an agent function:

Agent architecture – the computing device, with physical sensors and actuators,
that the agent program will run on.

 architecture makes the percepts from the sensors available to the program
 runs the program
 feeds the program’s action choices to the actuators as they are generated

2.5 Agent Programs


Agent program – same structure throughout – takes the current percept as input
from the sensors – returns an action to the actuators.

 Only looks at current percept?


 No choice but to take just the current percept, because nothing more is
available from the environment
 If the agent's actions need to depend on the entire percept sequence – then the
agent will have to remember the percepts

2.5.1 1. Simple Reflex agent:

Select actions on the basis of the current percept – ignoring the rest of the percept
history.

Decision of what to do is based only on its current percept

 Need a mapping from percepts to actions


 i.e. it's based on condition (percept) – action rules
General structure of a simple reflex agent:

 takes a percept as input


 maintains a static mapping – rules, a dict – set of condition action rules
 Interpret-input(percept) – interprets the input from the sensor (because
sensor could be some other data type, e.g. a distance gauge for collision
objects)
 We store the interpreted version in a variable called state
 We then use the RULE-MATCH(state, rules) – to return the correct rule for
dealing with that state / the first rule in the set of rules that matches the given
state description
 We then call the rule's recommended action and store it inside a variable
called action
 Return the action the agent should take – see the sketch below
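A minimal Python sketch of that structure, assuming dictionary-based rules and a trivial INTERPRET-INPUT (my own illustration of the pseudocode described above):

```python
# Hypothetical sketch of a simple reflex agent, following the structure above.
def make_simple_reflex_agent(rules, interpret_input):
    def agent_program(percept):
        state = interpret_input(percept)       # abstracted description of the current state
        action = rules.get(state, "NoOp")      # RULE-MATCH: look up the matching rule
        return action                          # the rule's recommended action
    return agent_program

# Usage with the two-cell vacuum world: the percept is already (location, status),
# so INTERPRET-INPUT is just the identity function.
rules = {
    ("A", "Dirty"): "Suck",
    ("B", "Dirty"): "Suck",
    ("A", "Clean"): "Right",
    ("B", "Clean"): "Left",
}
agent = make_simple_reflex_agent(rules, interpret_input=lambda p: p)
print(agent(("A", "Dirty")))   # -> Suck
```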

Reflex behaviour – look at the scenario – do an action

INTERPRET-INPUT function

 Generates an abstracted description of the current state from the percept

Rules-and-matching conceptual – NB

 Can be replaced with different implementations


 Such as Neural networks
 So you need to know state – and just choose the action that must be carried
out
 Based on whichever technique
Limitation

 Can only make decisions on the basis of the current percept


 i.e. need to have a fully observable environment

2.5.2 2. Model Based reflex agents

To handle partially observable environments

 agent can keep track of the parts of the environment it can't see now
 i.e. agent maintains an internal state, that depends on the percept history,
 and thereby reflects at least some of the unobserved aspects of the current
state

Diagram

To update this internal state as time goes by – need two types of knowledge to be
encoded inside the agent program:

1. Information about how the world changes over time – divided into two
parts (the transition model)
a. the effects of the agent's actions on the state of the environment
b. how the world evolves independently of the agent
2. Need information about how the state of the world is reflected in the agent’s
percepts – sensor model
a. What indicators in the environment does the agent perceive that allow it
to conclude it is in a certain state

Transition model and sensor model together allow the agent to keep track of the
state of the world

 Agent that uses these two models is called a model based agent

The idea:

 Agent maintains an internal state


 Then the agent receives its latest (current) percept
 This percept is combined with the old internal state –
 To generate the updated description of the current state
 Based on the agents model of how the world works

Mostly not possible to know exactly what the world is like

 So agent doesn’t keep track of it exactly


 Maintains a best guess
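A minimal sketch of that update loop (my own illustration; the transition and sensor models are assumed to be baked into an update_state function passed in by the designer):

```python
# Hypothetical sketch of a model-based reflex agent.
def make_model_based_reflex_agent(rules, update_state, initial_belief):
    memory = {"belief": initial_belief, "last_action": None}

    def agent_program(percept):
        # Combine the old internal state, the last action and the new percept,
        # using the transition/sensor models inside update_state.
        memory["belief"] = update_state(memory["belief"], memory["last_action"], percept)
        action = rules.get(memory["belief"], "NoOp")   # RULE-MATCH on the inferred state
        memory["last_action"] = action
        return action

    return agent_program
```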

2.5.3 3. Goal based agent

Knowing something about the current state is not always enough to decide what to do.
Agent needs some sort of goal information – that describes the situations that are
desirable

 Agent program can combine this with the model (of how the world works)
 To choose actions that achieve the goal

Different to condition-action rules, because now asking:

1. What will happen if I do a specific sequence of actions


2. And will this make me happy?

Reflex agent

 Brakes when it sees a red light – because that’s what its condition-action rule
specifies
 It has no idea why

Goal based agent

 Brakes when it sees a red light


 Because that’s the only action that it predicts will achieve its goal of not
hitting other cars
2.5.4 4. Utility based agents

Goals alone – not enough to generate high quality behaviour – because they only provide
a crude binary distinction between happy and unhappy states

You need a performance measure

 Because some actions are better than others


 Need to compare different world states according to exactly how happy they
would make the agent
 Use the word "utility" instead of "happy"

Performance measure – assigns a score to any given sequence of environment


states

 To distinguish between more or less desirable states

Agent's utility function – an internalization of its performance measure

 Provided that the internal performance measure and the external


performance measure are in agreement
 Agent who chooses to maximize its internal performance measure will
maximize its external performance measure

Utility function needs to specify appropriate tradeoff – when goals conflict

Most environments are partially observable and non-deterministic

 Rational utility based agent chooses the action that maximizes the expected
utility of the action outcomes
 i.e. the utility the agent expects to derive, on average, given the probabilities
and utilities of each outcome

5. Learning Agents

In general – learning in an intelligent agent – can be summarized as a process of


modification of each component to bring the components into closer agreement with
the available feedback information – thereby improving the overall performance of the
agent.

How do agent programs come into being?

 Build learning machines – first build an abstract learning machine


 Then teach them
 Preferred method for creating state of the art systems
Need to make a distinction between the learning element and the performance
element.

2.5.4.1 Component One – The performance element:

The performance element is responsible for selecting external actions


 It takes percepts and decides on actions
 what we previously thought of as the entire agent – example the reflex agent

Performance element encodes the performance standard (or external performance


standard – because we think of it as separate from the agent) – so we receive a
percept – but we need to know whether this is a good or bad thing
Performance standard

 distinguishes part of the incoming percept as a reward (or penalty)


 that provides direct feedback on the quality of the agent’s behavior

For humans

 human choices can provide information about human preferences


 i.e. look at the choices that the human makes – indicated by the action it
takes – this will allow you to see what its preferences are

e.g. chess – sensors might tell us that we have checkmated an opponent

 but it is the performance standard that tells us that this is a good thing
 percept itself does not say it is good – just that it is what has happened in
the environment
 performance standard must be fixed
 must be thought of as outside the agent – because agent must not be able
to modify the performance standard to fit its own behaviour

Example for taxis

 external performance standard must inform the agent that the loss of tips
is a negative contribution to its overall performance
 then agent might be able to learn which actions contribute to its utility

2.5.4.2 Component two - Learning element:

Learning agent can be divided into four conceptual components:

Responsible for making improvements

Design of learning element depends on the design of the performance element

When designing an agent that learns a certain capability – question is

 NOT – how am I going to learn this


 The right question is – what kind of performance measure will my agent
use once it has learned how?

Learning elements – use feedback from the critic – on how the agent is doing and
determines how the performance element should be modified to do better in future

2.5.4.2.1 A. The Critic:

The critic tells the learning element how well the agent is doing with respect to a fixed
performance standard
 Necessary – because percepts themselves provide no indication of success
 NB – must think of the critic as being outside of the agent altogether – because the
agent must not modify it to fit the agent's own behaviour

So the agent perceives the environment via its senses – it passes that information to
the critic which tells it whether its performance is good or bad depending on the
external performance measure

2.5.4.2.2 B. Problem generator

 Responsible for suggesting actions that will lead to new and informative
experiences
 If the performance element had its way – it would keep doing the actions that are
best, given what it knows – but it is not willing to explore – i.e. not willing to do
some suboptimal actions to discover better actions for the long run.
 Problem generators job – suggest these exploratory actions

Scientists do this – the objective is to modify their own brains to identify a better theory

General learning agent:

Simplest cases – involve learning from the percept sequence

 We observe pairs of successive states of the environment


 So after we do an action – we learn "What do my actions do?"
 And “How the world evolves” in response to my actions
E.g for the taxi driver:

 Does an action – exerts a certain braking pressure when driving on a wet


road
 Soon finds out how much deceleration is achieved
 And whether it skids off the road

Problem generator would – identify parts of the model that are in need of
improvement – and suggest experiments –

 such as trying out the brakes on different road surfaces under different
conditions

Example:

 external performance standard must inform the agent that the loss of tips is a
negative contribution to its overall performance

3 Solving problems by search
Solution is always a fixed sequence of actions.

Intelligent agents – want to maximize performance measure

 To achieve this – can simplify by adopting a goal


 And aim at satisfying it

Goal formulation –

 Based on the current situation


 And the agent's performance measure
 first step to problem solving

Goal

 Set of world states


 Those states where the goal is satisfied

Agent's task

 Find out how to act


 Now and in the future
 So that it reaches its goal state

Problem formulation

 Process of deciding what actions and states to consider given a goal

Map – provides agent with information on states it might get itself into and actions it
can take
 Use this information to consider subsequent stages of a hypothetical
journey
 Via each choice of action

In general

 An agent with several immediate options of unknown value can decide


what to do by first examining future actions that eventually lead to states of
known value

Assume environment is observable

 Agent always knows the current state

Discrete – for any given state, only finitely many actions to choose from

Assume environment is known

 Agent knows which states are reached by each action

Deterministic – each action has exactly one outcome

 I.e. will be only one possible percept after that action


 So can recommend only one possible action again

Under these assumptions – the solution is a fixed sequence of actions

Search – process of looking for a sequence of actions

Search algorithm

 Takes a problem as input


 Returns a solution in the form of a sequence of actions

General procedure:

1. Formulate problem
2. Use search algorithm to find solution
3. Then execute actions that solution recommends
- Execution phase

While agent is executing a solution – it ignores its percepts because it knows in


advance what they will be

3.1 Well defined problems and solutions

Problem can be defined formally by 5 components:

1. Initial state that the agent starts in


2. Description of possible actions available to the agent

 Given a particular state s


 ACTIONS(s) returns the set of actions that can be executed in s
 We say that each of these actions is applicable in s

3. Description of what each action does

 Formal name – transition model – specified by a function RESULT(s, a) –


returns the state that results from doing action a in state s
 Successor – any state reachable from a given state by a single action
 Together – the initial state, actions and transition model, implicitly define
the state space
- Set of all states reachable from the initial state by any sequence of
actions
- The state space forms a directed network or graph in which the nodes
are states and the links between the nodes are actions
- A path in the state space is a sequence of states connected by a
sequence of actions
 Goal test – determines whether a given state is a goal state
- The goal can be an explicit set of possible states – the test checks whether the
given state is one of them
- Sometimes the goal is an abstract property
 Path cost function
- Assigns a numeric cost to each path
- Problem-solving agent chooses a cost function that reflects its own
performance measure
- Assumption – cost of a path can be described as the sum of the costs
of the individual actions along the path
- The step cost of taking action a in state s to reach state s' is denoted
c(s, a, s')

Can use these elements to define a problem that can be gathered into a single data
structure – that is given as input to a problem-solving algorithm
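A minimal sketch of such a data structure (my own illustration, loosely following the five components above):

```python
# Hypothetical sketch of a problem definition gathered into one data structure.
class Problem:
    def __init__(self, initial):
        self.initial = initial                     # 1. initial state

    def actions(self, s):                          # 2. actions applicable in s
        raise NotImplementedError

    def result(self, s, a):                        # 3. transition model RESULT(s, a)
        raise NotImplementedError

    def is_goal(self, s):                          # 4. goal test
        raise NotImplementedError

    def action_cost(self, s, a, s_prime):          # 5. step cost c(s, a, s')
        return 1
```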

Solution to a problem – an action sequence that leads from the initial state to the
goal state

Solution quality – measured by path cost function

 Optimal solution has lowest path cost among all solutions

So the problem formulation consists of:

1. States
2. Initial state
3. Actions
4. Transition model
5. Goal test

3.2 Search algorithms


Record what has been done in the goal test

 State space will also record what has been done

3.3 Measuring problem solving performance


Four categories for evaluating problem solving performance:

1. Completeness:

 Is the algorithm always guaranteed to find a solution when there is one


 And to correctly report failure when there isn’t one

2. Cost optimality:

 Does it find a solution with the lowest path cost of all solutions

3. Time complexity:

 How long does it take to find a solution


 Can be measured in seconds
 Or by number of states and actions

4. Space complexity:

 How much memory is needed to perform the search

3.4 Uninformed Search


3.4.1 Breadth first search

Suitable when all actions have the same cost

Way it works:

1. First expand root node


2. Then each successor of the root node
3. Then the group of successor of those nodes
4. You never expand nodes at depth d + 1 before expanding all the nodes at
depth d

Can also implement as a call to the best-first-search – where the evaluation function
is the depth of the node

 Then all nodes of depth d will occupy the same priority in the queue

Instead of doing this – use a dedicated implementation of breadth-first search, as


opposed to adjusting the evaluation function of the best-first-search algorithm

Implementation details:

 Implement as a normal FIFO queue


 So nodes added first will be expanded first

Reached can be a set of states instead of a mapping from state to node

 Because if we have reached a state at depth d –


 We can never later find a better path to that state (any later path has depth ≥ d)

Always finds optimal solution because

 If it is generating nodes at depth d


 It has already generated all the nodes at depth d – 1
 So if a solution exists at depth d – 1, it would already have been found; the algorithm
would have returned success – with a shorter path cost than d
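A minimal breadth-first-search sketch along these lines (my own illustration; it assumes a Problem object with initial, is_goal, actions and result as sketched earlier, and returns the list of actions):

```python
from collections import deque

# Hypothetical sketch of breadth-first search over a Problem object.
def breadth_first_search(problem):
    if problem.is_goal(problem.initial):
        return []
    frontier = deque([(problem.initial, [])])    # FIFO queue of (state, actions so far)
    reached = {problem.initial}                  # set of states already reached
    while frontier:
        state, path = frontier.popleft()
        for a in problem.actions(state):
            child = problem.result(state, a)
            if problem.is_goal(child):           # early goal test is safe for BFS
                return path + [a]
            if child not in reached:
                reached.add(child)
                frontier.append((child, path + [a]))
    return None                                  # failure
```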

4 Search in Complex environments


4.1 Local Search and Optimization problems
Local Search – operate by searching from a start state to neighbouring states

 Without keeping track of the paths (no need for a function called getPath)
– but also may allow cycles
 Nor the set of states that have been reached (no need for a reached map
or set)
 i.e. just looking for the goal?
 Its about getting a snapshot of the state we want

Advantages:

 low memory usage

Disadvantages:

 because they don't keep track of states reached – they might allow cycles

Local search can be used to solve optimization problems

 in which aim is to find the best state according to an objective function

To think of local search – use a state landscape

 each state corresponds to a point on this landscape (along the x axis)


 and the objective function maps the states to numerical values
 The output value is called the elevation – the elevation is produced by the
objective function
If elevation corresponds to the objective function (i.e. we are maximizing the objective function)

 Then the aim is to find the highest peak – global maximum


 We call the process hill climbing

If aim is to minimize cost

 Then the aim is to find the lowest valley – global minimum


 We call the process gradient descent

4.1.1 Hill Climbing

At each step the current node is replaced by the best neighbour

 So current is a node – not a state


The algorithm works by keeping track of one current state (via a node) at each step

 And moves to the neighboring state with the highest value


 It heads in the direction that provides the steepest ascent
 It terminates when it reaches a “peak” (because might not be an actual
peak) – just means nothing close by is higher than we are – i.e. might be
in a local peak
 But hill climbing does not look beyond the immediate neighbours of the
current state

To implement

 Can use the negative of the heuristic function as the objective function
 i.e. climb locally to the state with the smallest heuristic distance to the goal
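A minimal steepest-ascent hill-climbing sketch (my own illustration; neighbours and value are assumed helper functions supplied by the problem):

```python
# Hypothetical sketch of steepest-ascent hill climbing.
def hill_climbing(initial, neighbours, value):
    current = initial
    while True:
        # Pick the best (highest-value) neighbour of the current state.
        best = max(neighbours(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current          # reached a "peak": no neighbour is higher
        current = best
```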

Hill climbing is sometimes called greedy local search

 because it grabs a good neighbour (an immediate successor state)


 without considering what would happen after that

Hill climbing can make rapid progress toward a solution

But can get stuck for any one of the following reasons
1. Local maxima – a local maximum is a peak that is higher than any of its
neighbouring states but lower than the global maximum
- Hill-climbing algorithms that start "close" to a local maximum – will
be drawn towards the peak of that local maximum
- But then won't be able to get beyond it
- For the eight-queens problem – this means every move of a single queen
produces a worse state
2. Ridges

Figure 1. Visual Representation of Ridges in Hill-Climbing Search.

- Ridges result in a sequence of local maxima – that is difficult for greedy


algorithms to navigate
- From each local maximum – all the available neighbours point
downwards.
3. Plateaus
- Plateau – a flat area of the state space landscape
- Can be a flat local maximum – from which no uphill exit exists –
- Hill climbing can get lost on the plateau

Possible solutions:

1. Allow sideways movements


- i.e. try to keep going, hoping that the plateau is a shoulder – i.e. that
uphill progress resumes a little further on
- can limit the number of sideways moves to e.g. 100
2. Stochastic hill-climbing
- Chooses a random uphill move from the set of uphill moves
- Probability of selecting that move can vary with the steepness of the
move
3. First choice hill-climbing
- Generates successors randomly – until one is found that is better than
the current state
- i.e. out of the set of neighbours – randomly pick one – if it's better than the
current state – move to it
- good strategy when state has many successors / neighbours
4. random restart hill-climbing
- if at first you don’t succeed, try, try and try again
- i.e. keep doing hill-climbing algorithm from randomly generated initial
states until you find a solution

Let each hill-climbing search (on a specific problem) have probability p of success

 then the expected number of restarts required is 1/p


 e.g. if it succeeds in only 14 of 100 attempts, p ≈ 1/7
 meaning, roughly every 7th attempt is a success
 so we need roughly 7 iterations to find a solution – 6 failures and 1 success

Expected number of steps (meaning how many nodes must we expand/how many
states we must test) is

 number of iterations until a successful attempt ≈ 7 – i.e. 6 failures and 1


success
 6 failures * 3 steps per failure = 18 expansions during the failed attempts
 1 success * 4 steps per success = 4 expansions in the successful attempt
 Total steps ≈ 18 + 4 = 22
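A random-restart wrapper around the hill-climbing sketch above (my own illustration; random_state and is_goal are assumed helpers, e.g. a random 8-queens board generator and a conflict-free test):

```python
# Hypothetical sketch of random-restart hill climbing.
def random_restart_hill_climbing(random_state, neighbours, value, is_goal, max_restarts=100):
    for _ in range(max_restarts):
        result = hill_climbing(random_state(), neighbours, value)
        if is_goal(result):          # e.g. a complete, conflict-free 8-queens board
            return result
    return None                      # give up after max_restarts attempts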

Success of hill-climbing depends on the shape of the state-space landscape – i.e.


the geometric picture

4.1.2 Simulated Annealing

For hill-climbing

 It never makes downhill moves toward lower value (or higher cost)
 Therefore vulnerable to getting stuck in a local maximum

Extreme opposite:

 A random walk – moves to a randomly chosen successor without considering its value


 It will eventually stumble on the global max (if the search is complete)
 But will be extremely inefficient

Try to combine hill climbing with random walk – to mix completeness and efficiency

Switch viewpoint from hill climbing (maximizing value) to gradient descent

 i.e. minimizing cost

Good explanation:

 imagine task of getting a ping-pong ball into the deepest crevice in a very
bumpy surface
 if we just roll the ball – will land up in a local minimum
 but if we shake the surface – we can bounce the ball out of the local
minimum – perhaps in a deeper local minimum

In short – simulated- annealing solution:

 start by shaking hard (i.e. at a high temperature)


 then gradually reduce the temperature of the shaking (lower temperature)

General structure of simulated annealing:

 instead of picking the best move, pick a random move


 if it improves the situation, it is always accepted.
 Otherwise, the algorithm accepts the move with some probability less than
1

Explanation:

Remember in simulated annealing – we are trying to minimize cost – why?


In simulated annealing – the decision to accept/reject a move that worsens the
current solution is influenced by:

1. How much the move worsens the situation ("badness of the move"), quantified
by ΔE;
- ΔE represents the change in the evaluation function due to a move
- If ΔE is positive the move worsens the situation,
- If negative – it improves the situation
2. The current temperature T, of the system

The probability of accepting a worse move

 decreases exponentially with the increase in ΔE and


 decreases with the temperature

Express this probability mathematically as:

P(ΔE, T) = e^(−ΔE / T)

For ΔE > 0 (the move increases the cost, i.e. worsens the situation):

P(ΔE, T) = 1 / e^(ΔE / T)

The larger ΔE is, the greater e^(ΔE / T) is, so the smaller e^(−ΔE / T) is.

 So the worse the move, the less likely it is to be accepted.


 The move is accepted only with this probability.

For ΔE < 0 (the move decreases the cost, i.e. improves the situation):

 e^(−ΔE / T) ≥ 1


 So we accept the move outright – we don't need to compute the probability at all

Cooling down
 As the algorithm progresses, T is gradually reduced according to a
cooling schedule
 With a smaller T we divide ΔE by a smaller number, which increases the exponent
ΔE / T, so e^(ΔE / T) increases, and e^(−ΔE / T) = 1 / e^(ΔE / T) becomes smaller
 This decreases the likelihood of accepting worse solutions.

(In the book's value-maximizing formulation: ΔE = VALUE(next) − VALUE(current).)
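A minimal simulated-annealing sketch in the cost-minimizing view used above (my own illustration; cost, neighbours and the exponential cooling schedule are assumptions):

```python
import math
import random

# Hypothetical sketch of simulated annealing, minimizing a cost function.
def simulated_annealing(initial, neighbours, cost, t0=1.0, cooling=0.995, t_min=1e-4):
    current = initial
    t = t0
    while t > t_min:
        nxt = random.choice(neighbours(current))
        delta_e = cost(nxt) - cost(current)        # > 0 means the move is worse
        if delta_e < 0 or random.random() < math.exp(-delta_e / t):
            current = nxt                          # accept improvements outright,
                                                   # worse moves with prob e^(-ΔE/T)
        t *= cooling                               # cooling schedule: gradually lower T
    return current
```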

4.1.3 Local beam search

Local beam search – keeps track of k states rather than just one.

 Begins with k randomly generated states


 At each step – all the successors of all k states are generated
- If any of those successors is the goal – returns
 Otherwise – selects the k best successors from the complete list of the
successors of the k current states – then repeats

Downfall

 Can become clustered in a small region of state space


 Can apply stochastic beam search – analogous to stochastic hill climbing
 Stochastic beam search – chooses successors with probability
proportional to the successor's value
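A minimal local-beam-search sketch (my own illustration; random_state, neighbours, value and is_goal are assumed helpers):

```python
import heapq

# Hypothetical sketch of local beam search with k states.
def local_beam_search(k, random_state, neighbours, value, is_goal, max_steps=1000):
    states = [random_state() for _ in range(k)]        # k randomly generated states
    for _ in range(max_steps):
        successors = [s2 for s in states for s2 in neighbours(s)]
        for s2 in successors:
            if is_goal(s2):
                return s2
        if not successors:
            return None
        # Keep the k best successors from the pooled list, then repeat.
        states = heapq.nlargest(k, successors, key=value)
    return max(states, key=value)
```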
4.2 Search with non-deterministic actions
4.2.1 Introduction

Chapter 3 assumed:

 Fully observable – the agent has the ability to know all information from
the environment relevant to the choice being made
 Deterministic – next state is completely determined by the current state
and the agents choice of action in that state
- I.e. no external factors will affect the state at all

Allowed the agent to:

 Look at the initial state


 calculate a sequence of actions that would lead to the goal
 end execute the actions without ever checking for new information

But now – partially observable

 sensors don’t give complete information on the state the environment is


actually in
 i.e. the agent does not know for sure what state it's in

and Non-deterministic – agent doesn’t know what state it transitions to after taking
an action

so now – the result of performing an action isn't a single state – it's a set of possible states

 if we think of an action as a relation called A then:


 A(s) = {a set of possible states that might result}
 The A-relative set of s – now called the Belief State
 Set of physical states that the agent believes are possible
4.2.2 Erratic vacuum world

Now we generalize the results function:

 Use a RESULTS function that returns a set of possible outcome states


 Instead of a single state

e.g

Result(1, suck) = { 5, 7 }

Now if we start in 1 then no single sequence of actions solves the problem

 Because if we land up in state 5 we haven’t achieved the goal state

So need a conditional plan:

Starting in state 1:

[suck, if State = 5 then [right, Suck] else [] ]

 Because there’s dirt in 1 – need to suck


 If we land up in 7 – do nothing we are done
 Otherwise if we land up in 5 – then go the right and suck – which leads us
to 8

So now the solution is if-else-then steps

 So solutions are trees rather than sequences (think discrete math)


 The conditional (if) tests to see what the current state is
 Something the agent will know at runtime but doesn’t know at planning
time

4.2.3 And-or-search trees

How do we find these contingent solutions to non-deterministic problems?

Start – by constructing search trees


 But different to search trees in deterministic environments

In deterministic environments

 The only branching is produced by the agent's own choices – in each
state
 i.e. the agent can do this OR that action – which will produce an exact result
 called OR nodes

In a non-deterministic environment branching is also introduced by the unknown (or


environmental) result of performing an action

 i.e. the outcome of an action performed in a given state is now also partly dictated
by the environment
 we call these AND nodes
 i.e. the set of results that might occur

These two kinds of nodes alternate leading to an AND-OR tree


In the diagram

 state nodes are OR nodes – because in a state an agent chooses an


action
 i.e I can do this OR I can do that
 the green nodes are called AND nodes
 because this, AND, that , AND , this could happen (could be the result
state)

A solution of an AND-OR search problem is a sub-tree of the complete search tree,
that:

1. has a goal at every leaf


2. specifies only one action at each of its OR nodes
3. includes every outcome branch at each of its AND nodes (i.e. accounts for
every possibility)
If the current state is identical to a state on the path from the root – then return failure (this avoids infinite loops in cyclic state spaces)

Recursive depth-first search of AND-OR tree
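A minimal sketch of that recursive depth-first AND-OR search (my own illustration; the problem is assumed to expose initial, actions, is_goal and a results function that returns the set of possible outcome states):

```python
# Hypothetical sketch of AND-OR search: returns a conditional plan, or None on failure.
def and_or_search(problem):
    return or_search(problem, problem.initial, path=[])

def or_search(problem, state, path):
    if problem.is_goal(state):
        return []                                    # empty plan: nothing left to do
    if state in path:
        return None                                  # state repeats on this path -> failure
    for action in problem.actions(state):
        plan = and_search(problem, problem.results(state, action), path + [state])
        if plan is not None:
            return [action, plan]                    # only one action chosen at this OR node
    return None

def and_search(problem, states, path):
    plans = {}
    for s in states:                                 # every outcome branch must be covered
        plan = or_search(problem, s, path)
        if plan is None:
            return None
        plans[s] = plan
    return plans                                     # read as: "if state == s then plans[s]"
```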

4.3 Search in partially observable environments


Partial observability – the agent's percepts are not enough to know the exact state of
the environment

So some actions must be used to reduce the uncertainty

4.3.1 Searching with no observation:

Situation – agents percepts provide no information at all

 Called a sensorless problem (conformant problem)

The solutions don’t rely on sensors working properly

Produces a sensorless plan?


e.g in vacuum world (deterministic)

 If the agent doesn’t know where it is – its initial belief state is { 1, 2 , 3, 4 ,


5, 6, 7, 8 }
 i.e. it could be anywhere
 but can still execute an action, e.g. right – and know that will land up in { 2,
4, 6, 8}
 so the agent gains information without perceiving anything
 i.e. you can use logic to figure out where you are
 if the state space is finite and deterministic
 after [right, suck] , agent will land up in one of { 4, 8}
 and after -> [Right, suck, left, suck] – agent will land up in goal state 7
 say that the agent can coerce the environment into state 7

The solution to a sensorless problem is sequence of actions – not a conditional plan


(because there is no sensing)

 i.e. looking for a sequence of actions that coerces the environment into
the goal state
 we search in the space of belief states – instead of the physical states

So the solution to a sensorless problem is always a sequence of actions

 because the percepts always return nothing


 there is no sensing occurring

4.3.1.1 Formulation of the sensorless problem

1. States

The belief-state space – is every subset of the physical states – i.e. the power set of
the set of physical states

2. Initial State

Typically – the entire set of physical states – since the agent could be anywhere,
provided it knows absolutely nothing about where it currently is
3. Actions

Problem is that the belief state is a set that could contain multiple different states e.g.

b = {s1, s2 }

Now the agent is either in s1 or s2

But the possible actions that could be performed while in state s1 may not be the
same as the set of possible actions that could be performed in state s2.

 So how do we know which actions can be performed while in belief state


b?
 i.e. what is Actions(b) ? – the ACTIONS-relative set of the belief state b

Depends on what the effect will be of applying an illegal action to a certain state:

 if no effect – take the Union of ACTIONS(s1) and ACTIONS(s2)


 if catastrophic effect – take intersection

4. Transition Model: (RESULT function)

For deterministic actions – result of applying an action is a single belief state:

b' = RESULT(b, a) = { s' : s' ∈ RESULT(s, a) and s ∈ b }

5. Goal Test

The agent possibly achieves the goal if some state s in the belief state satisfies the
goal test of the underlying problem.

 i.e. IS-GOAL(s) returns true

The agent necessarily achieves the goal if every state in the belief state is the goal
state

6. Action cost

Same action might have different costs in different states


After constructing the problem in this way – can solve the problem with any search
algorithm.

We can also keep track of belief states that have been reached:

 if a belief state, say {5, 7}, has already been reached, and we later reach a


superset of it, say {1, 3, 5, 7}
 then we can discard the superset belief state, i.e. prune it
 because any action sequence that solves {1, 3, 5, 7} solves every state in
that set – so it is also a solution for any subset, such as {5, 7}
 thus we don't need to work on the more difficult superset
 we can just continue the search from the easier subset instead
 similarly, if a superset is found to be solvable, then any subset of that
superset is guaranteed to be solvable

To solve the memory problem –

 use incremental belief-state search


 i.e. build up the solution one state at a time
 so you start with the first element of the belief state – try to find
a solution for it – then check whether it works for the next element
 if not – try to find a different solution
 need to find a particular solution that works for each element of the belief
state

4.3.2 Searching in partially observable environments

Partially observable environments – means we can gather information on some of


the environment, but not all of it.

We implement what the agent can perceive while in a given state as

 PERCEPT(s)
 Returns the percept received by the agent in a given state

If non-deterministic
 Then use PERCEPTS(s)
 Which returns a set of possible states

If fully observable – then PERCEPT(s) = s

 i.e. what the agent can see is exactly the state of the environment

In a sensorless problem – PERCEPTS(s) = null

So we perceive – and try to guess which state we are in?

 perceiving returns some info – and we use it to update which state we believe
we are in

4.3.2.1 Can think of the transition model between belief states in partially
observable problems as occurring in three stages:

1. Prediction stage

Prediction stage computes the belief state resulting from an action.

i.e. RESULT(b, a)

 To emphasize that it's a prediction – use the notation b̂ ("b hat")


 b̂ = RESULT(b, a)
 Where the hat over the b means 'estimated' – so we are saying the estimated
belief state
 Use the term PREDICT(b, a) as a synonym for RESULT(b, a)
 We apply an action to the belief state – and get out the set of states that we
could land up in

2. Possible Percepts

Computes the set of percepts that could be observed in the predicted belief state

 so using the letter o for observation


 POSSIBLE-PERCEPTS(b̂) = { o : o = PERCEPT(s) and s ∈ b̂ }
 i.e. go through each element of the belief state (that new state)
 and we think about – what would we observe if we were in that state?
 for each s in the belief state – compute PERCEPT(s) i.e. compute the
percept that we would see if we were in the state
 this is POSSIBLE-PERCEPTS(b̂) – the set of all percepts we could observe across the
states in the new belief state

3. Update Stage:

The update stage computes, for each possible percept, the belief state that would
result from the percept

The new belief state b_o is the set of states in b̂ that could have produced the
percept o (computed for each percept in the possible-percepts set):

b_o = UPDATE(b̂, o) = { s : o = PERCEPT(s) and s ∈ b̂ }

 the set of all states in b̂


 such that PERCEPT(s) equals the given percept o

Putting these three stages together, we get:

RESULTS(b, a) = { b_o : b_o = UPDATE(PREDICT(b, a), o) and o is an element of


POSSIBLE-PERCEPTS(PREDICT(b, a)) }
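A minimal sketch of the three stages as code (my own illustration; percept_fn and results_fn stand in for PERCEPT and the underlying physical RESULTS function, and belief states are plain Python sets):

```python
# Hypothetical sketch of the belief-state transition model in three stages.
def predict(belief, action, results_fn):
    # Stage 1: all states we could land in after doing the action.
    return {s2 for s in belief for s2 in results_fn(s, action)}

def possible_percepts(b_hat, percept_fn):
    # Stage 2: all percepts we could observe in the predicted belief state.
    return {percept_fn(s) for s in b_hat}

def update(b_hat, percept, percept_fn):
    # Stage 3: keep only the states that could have produced the given percept.
    return {s for s in b_hat if percept_fn(s) == percept}

def results(belief, action, results_fn, percept_fn):
    b_hat = predict(belief, action, results_fn)
    return [update(b_hat, o, percept_fn) for o in possible_percepts(b_hat, percept_fn)]
```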

4.3.3 Solving partially observable problems


4.3.4 An agent for partially observable environments

An agent for a partially observable environment:

1. Formulates a problem
2. Calls a search algorithm to solve it
3. And executes the solution

Solution is a conditional plan


5 Adversarial search and games
5.1 Game theory
5.1.1 Two-Player zero-sum games

Games most studied called:

1. Deterministic
- A single action by a player produces a single, predictable resulting
state
2. Two-player
3. Turn-taking
4. Perfect information
- Fully observable – means each player knows the complete state they are currently in
5. Zero-sum
- Means what is good for one player is just as bad for the other player
- No win-win outcome (a competitive situation)

For games we have different terminology

 Use the word “move” as a synonym for action


 And “position” as a synonym for state

Basic concepts – call our players:

 MAX
- MAX moves first – then the players take turns until the game stops
- At end of game points are awarded to the winning player
- And penalties are given to the loser
 MIN

Formally define a game as follows:

1. S0 – the initial state – specifies how the game is set up at the start
2. TO-MOVE(s) – returns the player whose turn it is to move in state s
3. ACTIONS(s) – the set of legal moves in state s
4. RESULT(s, a) – The transition model
- Defines the state that results from taking action a in state s
5. IS-TERMINAL(s) – a terminal test – true when the game is over – and false
otherwise
- States where the game has ended – called terminal states
6. UTILITY(s, p)
- A utility function – also called objective function or payoff function
- Defines the final numeric value to player p – when the game ends in
terminal state s
- In chess – outcome is a win, loss, or draw

5.1.1.1 Basic properties:

The initial state, ACTIONS function, and RESULT function


 Define a state space graph
 Graph where the vertices are states
 And edges are possible actions that can be performed while in that state
 And a state may be reached by multiple paths (sequences of actions)

Can superimpose a search tree over the search graph

Define a complete game tree

 A search tree that follows every sequence of moves all the way to the
terminal state
 i.e. try to find the conclusion of every possible sequence of moves
 Game tree may be infinite

Leaf nodes in the search tree correspond to terminal states

 Each leaf node must have a number


 Corresponding to the utility value of the terminal state – from the point of
view of MAX
 High numbers – good for MAX (conversely bad for MIN)
 Low number – bad for MAX (conversely good for MIN)
5.2 Optimal Decisions in games
Two players, max and min

Max strategy:

 A conditional plan – contingent strategy specifying a response to each of


MIN’s possible moves

For games that have a binary outcome (i.e. a 1 if MAX wins, and 0 if MAX loses –
with only two players) – can use AND-OR search

 Then we generate a conditional plan


 i.e. for these games – def of a winning strategy for the game – the same
as the solution to a non-deterministic planning problem
 i.e. do x, if in s0 then do y, else nothing
 desirable outcome – must be guaranteed – no matter what the other side does –
i.e. no matter which move MIN makes

But for games with multiple outcome scores

 need a more general search than AND-OR -> the more general algorithm is
called the minimax search algorithm
 i.e. each search problem can be implemented using some or other search
algorithm
 we can either use one that we know – or we must define a new one

Everything has a tight meaning.

 “move” – in some contexts means – both players have taken action , but
could also mean, a single player has moved

For now – define Ply – to mean – one move, by one player

 Resulting in moving – one level deeper in the game tree


 Because we model a game tree, with each level (depth) corresponding to
one ply – a move by one of the players
Utility – corresponds only to a terminal state

Minimax value – the utility (for Max) of being in that state

 Assuming that both players will play optimally from that state to the
end of the game
 The minimax value of terminal state – just its utility

In a non-terminal state – MAX prefers to move to a state of maximum value

 i.e. max chooses the action that will result in the highest minimax value

MIN – prefers a state of minimum value

 i.e. MIN chooses the action that results in the minimum value for MAX

So we define mimimax as:

MINIMAX(s) =

 UTILITY(s, MAX)                                     if IS-TERMINAL(s)

 max over a ∈ ACTIONS(s) of MINIMAX(RESULT(s, a))    if TO-MOVE(s) = MAX

 min over a ∈ ACTIONS(s) of MINIMAX(RESULT(s, a))    if TO-MOVE(s) = MIN

So the minimax value of each non-terminal node is taken from the minimax values of its
successors – the max if MAX is to move, the min if MIN is to move

To find the optimal strategy

 given that we have game tree


 to find the optimal strategy – we work out the minimax value of each state
in the tree
 then the optimal actions is the action that leads to the state with the
highest minimax value
The terminal nodes – get their utility values (minimax value for a terminal state) –
from the UTILITY function

5.2.1 The minimax search algorithm

Once we can compute the minimax value for a state i.e. MINIMAX(s)

 Can turn it into a search algorithm


 That finds the best move for max by:
- Computing the resulting state of all the actions MAX can try
- Then computing the minimax value for each of those states
- And choosing the action that leads to the state that has the highest
MINIMAX value

MINIMAX algorithm is a recursive algorithm

 It proceeds to the leaves of the tree – computes the utility


 Then backs up the minimax value through the tree as the recursion
unwinds

The optimal move – the move that leads to a terminal state with max utility
Minimax algorithm – performs a complete depth-first exploration of the game tree
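A minimal recursive minimax sketch (my own illustration; the game object is assumed to expose actions, result, is_terminal and utility as defined above):

```python
# Hypothetical sketch of minimax search: returns the best action for MAX.
def minimax_search(game, state):
    value, move = max_value(game, state)
    return move

def max_value(game, state):
    if game.is_terminal(state):
        return game.utility(state, "MAX"), None
    best_v, best_a = float("-inf"), None
    for a in game.actions(state):
        v, _ = min_value(game, game.result(state, a))
        if v > best_v:
            best_v, best_a = v, a
    return best_v, best_a

def min_value(game, state):
    if game.is_terminal(state):
        return game.utility(state, "MAX"), None
    best_v, best_a = float("inf"), None
    for a in game.actions(state):
        v, _ = max_value(game, game.result(state, a))
        if v < best_v:
            best_v, best_a = v, a
    return best_v, best_a
```

The two functions call each other in alternation, mirroring the way MAX and MIN take turns down the tree.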

If max depth of the game tree is m

And there are b legal moves

Time complexity for MINIMAX – O(b^m)

Space complexity –

 O(bm) – for algorithm that generates all the actions at once


 O(m) – for an algorithm that generates actions one at a time

Exponential complexity – makes MINIMAX impractical for complex games

 E.g. chess has branching factor of 35


 Average depth of 80 ply
 35^80 states – not feasible
 But by approximating minimax analysis – can derive more practical
algorithms

Algorithm for calculating minimax decisions

 It returns the action corresponding to the best possible move


 i.e. move that leads to the outcome with the best utility
 under assumption that opponent plays to minimize utility

The functions MAX-VALUE and MIN-VALUE

 go through the whole search tree


 all the way to the leaves – to determine the back up value of the state
 they do this through alternating recursion

Notation: argmax over a ∈ S of f(a) -> computes the element a of set S that has the
maximum value of f(a)

 so argmax returns an element of the set, not a value

So the way it works:


 When it is MAX’s chance to play – he has options – he can act in different
ways
 His goal is to maximize the end value
 So he wants to know the value of each end value – and pick the greatest
one
 But after each action that MAX takes – it is MIN's turn to play – and
MAX's actions lead to states – MIN will choose an action based on those
states
 So MAX does an action – it leads to a state – then MIN plays based on
those states – choosing the action that will result in the best state for MIN
– and MAX wants to know those states in advance – after MIN plays, what
will the resulting states be – and what will those values be for MAX?

Problem becomes – what move will MIN make in the states that result from MAX
actions

 In this case – MIN will pick the smallest value for each node

5.3 Alpha-beta search


Problem with minimax – it's good for computing a move

 But it becomes intractable when the number of game states gets bigger


 Because – the algorithm needs to search all the leaves of the tree – whose
number increases exponentially with the tree's depth

Pruning – means removing parts of the tree that we do not need

Alpha-beta – is a type of pruning

Search using alpha-beta pruning – called alpha-beta search

Essentially a simplification of MINIMAX

Example
MINIMAX(root) = max(3, min(2, x, y), 2)
              = max(3, z, 2), where z = min(2, x, y) ≤ 2
              = 3

In this case – z was the only unknown – so we could assert that.

i.e. Value of the root is independent of the values of the leaves x and y

 So we can prune those leaves

General principle for alpha-beta pruning:

 If we have a node n somewhere in the tree


 And the current player has a choice of moving to n
 i.e. either n is a direct successor that Player is considering (within its min
or max calc) – or it's somewhere further down
 if Player has a better choice m – either at the parent of n

 or at any point higher up in the tree – then n will never be reached in actual play,
so we don't need to consider n


 better choice meaning:
- For MAX, who is computing the max of the mins
- A better choice means MAX has already discovered a move whose value is
higher than any value node n can offer (since MIN will pick the lowest of
n's successors, a single successor lower than the better choice already
guarantees that n's minimax value is lower than the better choice)
So if we have enough info about n – by examining some of its descendants

 We can prune it

Algorithm:

Same as the min-max algorithm – except – that we maintain bounds in the variables
alpha and beta

 And use them to cutoff search when a value is outside the bounds
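A minimal alpha-beta sketch extending the minimax sketch above (my own illustration; same assumed game interface):

```python
# Hypothetical sketch of alpha-beta search: minimax with (alpha, beta) bounds.
def alpha_beta_search(game, state):
    value, move = ab_max_value(game, state, float("-inf"), float("inf"))
    return move

def ab_max_value(game, state, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, "MAX"), None
    v, move = float("-inf"), None
    for a in game.actions(state):
        v2, _ = ab_min_value(game, game.result(state, a), alpha, beta)
        if v2 > v:
            v, move = v2, a
            alpha = max(alpha, v)        # alpha: best value found so far for MAX ("at least")
        if v >= beta:
            return v, move               # cutoff: MIN above would never allow this branch
    return v, move

def ab_min_value(game, state, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, "MAX"), None
    v, move = float("inf"), None
    for a in game.actions(state):
        v2, _ = ab_max_value(game, game.result(state, a), alpha, beta)
        if v2 < v:
            v, move = v2, a
            beta = min(beta, v)          # beta: best value found so far for MIN ("at most")
        if v <= alpha:
            return v, move               # cutoff: MAX above would never allow this branch
    return v, move
```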

5.3.1 Diversion:

Or below – RESULT(s, A_p) = { s1, s2, s3, … } = S (where A_p is the


set of actions that p can play in that state)

Then MAX(S_result,p,a) = { min(x1), min(x2), min(x3), … }

 the minimax-relative set of S_result,p,a

Now MAX – wants to pick the highest value from this set – but if …

Implementation of alpha-beta uses two extra parameters

 In the MAX-VALUE function


 Signature is now – MAX-VALUE(state, α, β)

α = the value of the best (highest-value) choice we have found so far – at any
choice point along the path for MAX – i.e. the maxes of the mins

 Think: α = "at least"

β = the value of the best (lowest-value) choice we have found so far at any choice
point along the path for MIN

 Think: β = "at most"

We’re considering a single path

e.g.
Now we view the search as bounds

 And we limit to specific intervals


 The first step is to understand if we’re dealing with a max node or a min
node
 Min node allows anything below what it has discovered – but nothing
more
 Max node allows anything above what it has already discovered – but
nothing less

We explore the successors – and update the bounds for the parents based on the
successors

5.4 Move Ordering


Effectiveness of alpha-beta pruning is highly dependent on the order in which the
states are examined

Success of alpha-beta depends on the order in which nodes are explored


 Say we have 10 nodes – but min node is the last one
 If we explore the other 9 before the one we explore last – which is the min
 Then we don’t achieve anything – we have searched the entire tree
anyways
 But if we had explored the min one first – we save ourselves from
exploring the rest

5.4.1 Strategies to find the best move

Add dynamic move ordering – first try moves that have been the best in the past

Or can try moves based on previous exploration of the current move

 Through iterative deepening


 First search one ply deep – and record the ranking of moves based on
their evaluations
 Then search 1 ply deeper – using the previous ranking to inform move
ordering etc.
 NB so try iterative deepening

Best moves – known as killer moves

 To try the killer moves first


 Called the killer move heuristic

In game search, redundant paths can be caused by transpositions

 A transposition – is a different permutation of the move sequence that


ends up in the same position (state)
 Can address this problem with a transposition table that caches the
heuristic value of states
 In chess – using transposition tables – allows the reachable
search depth to be doubled – in the same amount of time

Even with alpha-beta pruning and clever move ordering –

 Minimax still doesn’t work for games with lots of search states
 Because too many states to explore with the available time

Two possible solutions to the problem of many states in the game tree:

1. Type A:

 Consider all possible moves to a certain depth in the tree


 Then use a heuristic evaluation function to estimate the utility of states at
that depth
 Explore wide but shallow portion of the tree

2. Type B

 Ignores moves that look bad


 And follows promising lines as far as possible
 Explores deep but narrow portion of the search tree
 Shown to have world-championship-chess level

5.5 Monte Carlo Tree Search
