Artificial Intelligence
Russell
1 Basic concepts
Agent – something that acts
Rational agent
2 Intelligent agents
2.1 Agents and Environments
Agent – anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators
Percept sequence – the complete history of everything the agent has ever perceived
Agent's behaviour
Sphex is unable to learn that its plan is failing – and so does not change it.
To the extent that the agent relies on the prior knowledge of its designer – rather than on its
own percepts – the agent lacks autonomy - NB
Rational agent should be autonomous – should learn what it can to compensate for
partial or incorrect prior knowledge.
Incorporating learning allows us to design a rational agent that will succeed in a vast
variety of environments.
The flavour of the task environment directly affects the appropriate design for the
agent program. The task environment is specified by the PEAS description:
1. Performance measure
2. Environment
3. Actuators
4. Sensors
First Question – what is the performance measure to which we would like our agent
to aspire?
E.g. roads
Traffic, pedestrians, stray animals, road works
Actuators
Sensors
Behavior
What percepts does the environment generate?
Performance measure?
Behaviour of an agent – the action that is performed after any given sequence of
percepts
Agent program – specifies the actions of the agent based on its percepts (it implements the agent function)
2.3 Properties of task environments:
Want dimensions along which task environments can be categorized
Fully observable – when the agent's sensors give it access to the complete state of the
environment at each point in time.
i.e. if the sensors detect all aspects that are relevant to the choice of action
relevance depends on the performance measure
Partially observable – noisy or inaccurate sensors, or parts of the state are simply missing from the sensor data
Unobservable – the agent has no sensors at all
E.g. chess
Dynamic – the environment can change while the agent is deliberating – we say the
environment is dynamic for the agent
Semi-dynamic – the environment itself does not change with the passage of time, but the agent's performance score does (e.g. chess played with a clock)
Discrete – a finite number of distinct states, percepts and actions (e.g. chess); otherwise continuous (e.g. taxi driving)
2.3.6 Known vs unknown
We call it a known (or unknown) environment but really it refers to the knowledge of
the agent (or the programmer/designers) about the laws of the environment
Known environment: the outcomes (or outcome probabilities) for all actions are given
Unknown – agent will have to learn how it works in order to make good decisions
The job of AI – is to design an agent program that implements the agent
function
Agent architecture – the computing device, with physical sensors and actuators,
that the agent program will run on.
architecture makes the percepts from the sensors available to the program
runs the program
feeds the program’s action choices to the actuators as they are generated
Simple reflex agents – select actions on the basis of the current percept, ignoring the rest of the percept
history.
INTERPRET-INPUT function
The description in terms of rules and rule-matching is purely conceptual – NB
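A minimal Python sketch of such an agent, assuming a toy vacuum-world rules table (the percept format and rule entries are illustrative, not from the notes):

    # Simple reflex agent: acts only on the current percept, ignoring percept history.
    # The rules dictionary and interpret_input are illustrative placeholders.

    def interpret_input(percept):
        # Abstract the raw percept into a state description the rules can match.
        location, status = percept
        return (location, status)

    rules = {
        ("A", "Dirty"): "Suck",
        ("B", "Dirty"): "Suck",
        ("A", "Clean"): "Right",
        ("B", "Clean"): "Left",
    }

    def simple_reflex_agent(percept):
        state = interpret_input(percept)
        return rules[state]          # rule matching: condition -> action

    print(simple_reflex_agent(("A", "Dirty")))   # -> "Suck"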
agent can keep track of the parts of the environment it can't see now
i.e. the agent maintains an internal state that depends on the percept history,
and thereby reflects at least some of the unobserved aspects of the current
state
Diagram
To update this internal state as time goes by – need two types of knowledge to be
encoded inside the agent program:
1. Information about how the world changes over time – divided roughly into two
parts (the transition model)
a. the effects of the agents actions on the state of the environment
b. how the world evolves independently of the agent
2. Need information about how the state of the world is reflected in the agent’s
percepts – sensor model
a. What indicators in the environment does the agent perceive, that allows it
to conclude a certain state
Transition model and sensor model together allow the agent to keep track of the
state of the world
An agent that uses these two models is called a model-based agent
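A rough sketch of a model-based reflex agent in Python; the transition_model, sensor_model and rules arguments are assumed placeholders for the two kinds of knowledge described above:

    # Model-based reflex agent: maintains an internal state updated from
    # (previous state, last action, new percept) via a transition model and a sensor model.

    class ModelBasedReflexAgent:
        def __init__(self, transition_model, sensor_model, rules, initial_state):
            self.transition_model = transition_model  # how actions / the world change the state
            self.sensor_model = sensor_model          # how percepts constrain the state
            self.rules = rules                        # condition -> action
            self.state = initial_state
            self.last_action = None

        def __call__(self, percept):
            # Predict the new state from the old state and last action,
            # then correct the prediction using the percept.
            predicted = self.transition_model(self.state, self.last_action)
            self.state = self.sensor_model(predicted, percept)
            self.last_action = self.rules[self.state]
            return self.last_action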
The idea:
Knowing something about the current state is not always enough to decide what to do.
Agent needs some sort of goal information – that describes the situations that are
desirable
The agent program can combine this goal information with the model (of how the world works)
to choose actions that achieve the goal
Reflex agent
Brakes when it sees a red light – because that’s what its condition-action rule
specifies
It has no idea why
Goals alone are not enough to generate high-quality behaviour – because they provide
only a crude binary distinction between happy and unhappy states
Rational utility based agent chooses the action that maximizes the expected
utility of the action outcomes
i.e. the utility the agent expects to derive, on average, given the probabilities
and utilities of each outcome
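In symbols, EU(a) = sum over outcomes s' of P(s' | a) * U(s'). A small Python sketch of that choice rule, assuming hypothetical outcome_distribution(state, action) and utility(state) helpers:

    # Expected utility of an action: sum over outcomes of P(outcome) * U(outcome).
    def expected_utility(action, state, outcome_distribution, utility):
        return sum(p * utility(s_next)
                   for s_next, p in outcome_distribution(state, action).items())

    def best_action(state, actions, outcome_distribution, utility):
        # Pick the action that maximizes expected utility.
        return max(actions, key=lambda a: expected_utility(a, state, outcome_distribution, utility))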
5. Learning Agents
For humans
but it is the performance standard that tells us that this is a good thing
percept itself does not say it is good – just that it is what has happened in
the environment
performance standard must be fixed
must be thought of as outside the agent – because agent must not be able
to modify the performance standard to fit its own behaviour
external performance standard must inform the agent that the loss of tips
is a negative contribution to its overall performance
then agent might be able to learn which actions contribute to its utility
Learning element – uses feedback from the critic on how the agent is doing, and
determines how the performance element should be modified to do better in the future
The critic tells the learning element how well the agent is doing with respect to a fixed
performance standard
Necessary – because percepts themselves provide no indication of success
NB – must think of the critic as being outside of the agent altogether – because
the agent must not modify it to fit the agent's own behaviour
So the agent perceives the environment via its senses – it passes that information to
the critic which tells it whether its performance is good or bad depending on the
external performance measure
Responsible for suggesting actions that will lead to new and informative
experiences
If the performance element had its way – it would keep doing the actions that are
best, given what it knows, but it would not be willing to explore – i.e. not willing to do
some suboptimal actions in order to discover better actions for the long run.
The problem generator's job is to suggest these exploratory actions
Problem generator would – identify parts of the model that are in need of
improvement – and suggest experiments –
such as trying out the brakes on different road surfaces under different
conditions
Example:
external performance standard must inform the agent that the loss of tips is a
negative contribution to its overall performance
3 Solving problems by search
Solution is always a fixed sequence of actions.
Goal formulation – deciding what goal to aim for, based on the current situation and the agent's performance measure
Goal
Agent's task
Problem formulation – the process of deciding what actions and states to consider, given a goal
Map – provides agent with information on states it might get itself into and actions it
can take
Use this information to consider subsequent stages of a hypothetical
journey
Via each choice of action
In general
Discrete – for any given state, only finitely many actions to choose from
Search algorithm
General procedure:
1. Formulate problem
2. Use search algorithm to find solution
3. Then execute actions that solution recommends
- Execution phase
Can use these elements to define a problem that can be gathered into a single data
structure – which is given as input to a problem-solving algorithm (see the sketch after the list below)
Solution to a problem – an action sequence that leads from the initial state to the
goal state
1. States
2. Initial state
3. Actions
4. Transition model
5. Goal test
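A sketch of such a data structure in Python; the method names mirror the list above, but the exact interface is an assumption, not the book's code:

    # A search problem bundled into one object that a search algorithm can take as input.
    class Problem:
        def __init__(self, initial, goal):
            self.initial = initial      # initial state
            self.goal = goal            # goal state (used by the goal test)

        def actions(self, state):
            # Return the actions applicable in `state`.
            raise NotImplementedError

        def result(self, state, action):
            # Transition model: the state reached by doing `action` in `state`.
            raise NotImplementedError

        def is_goal(self, state):
            return state == self.goal

        def action_cost(self, state, action, next_state):
            return 1                    # default: every step costs 1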
1. Completeness:
2. Cost optimality:
Does it find a solution with the lowest path cost of all solutions?
3. Time complexity:
4. Space complexity:
Way it works:
Can also implement as a call to the best-first-search – where the evaluation function
is the depth of the node
Then all nodes of depth d will occupy the same priority in the queue
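A sketch of that idea, reusing the Problem interface above: a generic best-first search driven by a priority queue, with breadth-first search obtained by using the node depth as the evaluation function (the node representation is an assumption):

    import heapq
    from itertools import count

    # Breadth-first search expressed as best-first search with f(node) = depth.
    def best_first_search(problem, f):
        tiebreak = count()                       # unique counter so the heap never compares nodes
        root = (problem.initial, 0, [])          # node = (state, depth, path of actions)
        frontier = [(f(root), next(tiebreak), root)]
        reached = {problem.initial}
        while frontier:
            _, _, (state, depth, path) = heapq.heappop(frontier)
            if problem.is_goal(state):
                return path
            for action in problem.actions(state):
                child = problem.result(state, action)
                if child not in reached:
                    reached.add(child)
                    node = (child, depth + 1, path + [action])
                    heapq.heappush(frontier, (f(node), next(tiebreak), node))
        return None

    def breadth_first_search(problem):
        # All nodes of depth d share the same priority, so they are expanded
        # before any node of depth d + 1.
        return best_first_search(problem, f=lambda node: node[1])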
Implementation details:
Without keeping track of the paths (no need for a function called getPath)
– but also may allow cycles
Nor the set of states that have been reached (no need for a reached map
or set)
i.e. we are just looking for the goal state itself?
It's about getting a snapshot of the state we want – the path to it does not matter
Advantages:
Disadvantages:
To implement
Can use the negative of the heuristic function as the objective function
i.e. climb locally to the state with the smallest heuristic distance to the goal
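A minimal hill-climbing sketch along those lines (minimizing the heuristic h); neighbours(state) and h(state) are assumed helper functions:

    # Steepest-ascent hill climbing on objective(state) = -h(state):
    # repeatedly move to the best neighbour until no neighbour is better.
    def hill_climbing(start, neighbours, h):
        current = start
        while True:
            best = min(neighbours(current), key=h, default=None)
            if best is None or h(best) >= h(current):
                return current          # local minimum of h (local maximum of -h)
            current = best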
But can get stuck for any one of the following reasons
1. Local maxima – a local maximum is a peak that is higher than any of its
neighbouring states but lower than the global maximum
- Hill-climbing algorithms that end up "close" to a local maximum will
be drawn towards the peak of that local maximum
- But then won't be able to get beyond it
- For the eight-queens problem – this means every move of a single queen
produces a worse state
2. Ridges
Possible solutions:
Let each hill-climbing search (on a specific problem) have probability p of success
Then for random-restart hill climbing the expected number of restarts required is 1/p, and the
expected number of steps (nodes expanded / states tested) is the cost of one successful
iteration plus (1 - p)/p times the cost of a failed one
For hill-climbing
It never makes downhill moves toward lower value (or higher cost)
Therefore vulnerable to getting stuck in a local maximum
Extreme opposite: a purely random walk – complete, but extremely inefficient
Simulated annealing tries to combine hill climbing with a random walk – to get both completeness and efficiency
Good explanation:
imagine task of getting a ping-pong ball into the deepest crevice in a very
bumpy surface
if we just roll the ball – it will end up in a local minimum
but if we shake the surface – we can bounce the ball out of the local
minimum – perhaps into a deeper local minimum
Explanation:
1. How much the move worsens the situation ("badness of the move"), quantified
by ΔE:
- ΔE represents the change in the evaluation function due to a move
- If ΔE is positive the move worsens the situation
- If ΔE is negative – it improves the situation
2. The current temperature T, of the system
P(ΔE, T) = e^(-ΔE / T)
For ΔE > 0:
P(ΔE, T) = 1 / e^(ΔE / T)
The more positive ΔE is, the greater e^(ΔE / T) is, so the smaller e^(-ΔE / T) is.
For ΔE < 0: e^(-ΔE / T) > 1, so an improving move is always accepted
Cooling down
As the algorithm progresses, T is gradually reduced according to a
cooling schedule
For a worsening move we then divide ΔE by a smaller number, which increases the exponent
ΔE / T, so e^(ΔE / T) increases and 1 / e^(ΔE / T) = e^(-ΔE / T) becomes smaller
This decreases the likelihood of accepting worse solutions (a sketch of the full loop follows).
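A sketch of the whole simulated-annealing loop using the acceptance rule above (ΔE > 0 means a worse state, accepted with probability e^(-ΔE / T)); the cooling schedule and random_neighbour are illustrative assumptions:

    import math
    import random

    # Simulated annealing: always accept improving moves; accept worsening moves
    # with probability exp(-deltaE / T), where T falls according to a cooling schedule.
    def simulated_annealing(start, random_neighbour, value, schedule, max_steps=10_000):
        current = start
        for t in range(1, max_steps + 1):
            T = schedule(t)
            if T <= 0:
                return current
            candidate = random_neighbour(current)
            delta_e = value(candidate) - value(current)   # > 0 means the candidate is worse
            if delta_e <= 0 or random.random() < math.exp(-delta_e / T):
                current = candidate
        return current

    # Example cooling schedule: exponential decay (illustrative values).
    def exponential_schedule(t, T0=100.0, decay=0.95):
        return T0 * (decay ** t)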
Local beam search – keeps track of k states rather than just one.
Downfall – the k states can quickly become concentrated in a small region of the state space (lack of diversity), making the search little more than an expensive version of hill climbing
Chapter 3 assumed:
Fully observable – the agent has the ability to know all information from
the environment relevant to the choice being made
Deterministic – the next state is completely determined by the current state
and the agent's choice of action in that state
- I.e. no external factors will affect the state at all
and Non-deterministic – the agent doesn't know which state it transitions to after taking
an action
so now – the result of performing an action isn't a single state – it's a set of possible states
e.g
Result(1, suck) = { 5, 7 }
Starting in state 1:
In deterministic environments
The only branches are produced by the agent's own choices in each
state
i.e. the agent can do this or that action – each of which will produce an exact result
called OR nodes
i.e. the outcome of an action performed in a given state is now also partly dictated
by the environment
call these AND nodes
i.e. the set of results that might occur
A solution of an AND-OR search problem is a subtree of the complete search tree
that: has a goal node at every leaf, specifies one action at each of its OR nodes, and includes every outcome branch at each of its AND nodes
i.e. we are looking for a conditional plan that coerces the environment into
the goal state, whichever outcomes occur
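A compact sketch of AND-OR search for a nondeterministic problem, returning a conditional plan as nested [action, {outcome-state: subplan}] structures; the problem interface (results(s, a) returning a set of states) is an assumption in the spirit of these notes:

    # AND-OR search: at OR nodes the agent picks one action; at AND nodes the plan
    # must handle every state the environment might produce.
    def and_or_search(problem):
        return or_search(problem.initial, problem, path=[])

    def or_search(state, problem, path):
        if problem.is_goal(state):
            return []                                  # empty plan: already at the goal
        if state in path:
            return None                                # cycle: give up on this branch
        for action in problem.actions(state):
            subplans = and_search(problem.results(state, action), problem, [state] + path)
            if subplans is not None:
                return [action, subplans]              # OR node: one successful action suffices
        return None

    def and_search(states, problem, path):
        plans = {}
        for s in states:                               # AND node: must cover every outcome
            plan = or_search(s, problem, path)
            if plan is None:
                return None
            plans[s] = plan
        return plans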
we search in the space of belief states – instead of the physical states
1. States
The belief-state space is every subset of the physical states – i.e. the power set of
the set of physical states (so N physical states give 2^N possible belief states)
2. Initial State
Typically – the entire set of physical states, since the agent could be anywhere,
given that it knows absolutely nothing about where it currently is
3. Actions
Problem is that the belief state is a set that could contain multiple different states e.g.
b = {s1, s2 }
But the possible actions that could be performed while in state s1 may not be the
same as the set of possible actions that could be performed in state s2.
Depends on what the effect will be of applying an illegal action in a certain state: if illegal actions have no effect, take the union of the actions legal in any state of the belief state; if they could be harmful, take the intersection (only actions legal in every state)
5. Goal Test
The agent possibly achieves the goal if some state s in the belief state satisfies the
goal test of the underlying problem.
The agent necessarily achieves the goal if every state in the belief state satisfies the
goal test
6. Action cost
We can also keep track of belief states that have been reached:
We implement what the agent can perceive while in a given state as
PERCEPT(s)
which returns the percept received by the agent in that state
If sensing is non-deterministic
then use PERCEPTS(s)
which returns a set of possible percepts
(for fully observable problems PERCEPT(s) = s – i.e. what the agent can see is exactly the state of the environment)
the percept returns some information – and we use it to update which states we believe
we could be in
4.3.2.1 Can think of the transition model between belief states in partially
observable problems as occurring in three stages:
1. Prediction stage
The prediction stage computes the belief state resulting from an action.
i.e. RESULT(b, a)
2. Possible Percepts
Computes the set of percepts that could be observed in the predicted belief state
3. Update Stage:
The update stage computes, for each possible percept, the belief state that would
result from the percept
The new belief state b_o is the set of states in the predicted belief state b̂ that could have produced the
percept o (one updated belief state for each percept in the possible-percepts set)
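A sketch of the three stages in Python; results(s, a) and percepts(s) are assumed to return sets of states and sets of percepts respectively, mirroring the PERCEPTS(s) idea above:

    # Transition between belief states in three stages: predict, enumerate possible
    # percepts, then update the predicted belief state for each percept.

    def predict(b, action, results):
        # Prediction stage: all states reachable by doing `action` from some state in b.
        return {s2 for s in b for s2 in results(s, action)}

    def possible_percepts(b_hat, percepts):
        # All percepts that could be observed in the predicted belief state.
        return {o for s in b_hat for o in percepts(s)}

    def update(b_hat, o, percepts):
        # Update stage: keep only the states in b_hat that could have produced percept o.
        return {s for s in b_hat if o in percepts(s)}

    def belief_results(b, action, results, percepts):
        # The set of belief states that could result from doing `action` in belief state b.
        b_hat = predict(b, action, results)
        return [update(b_hat, o, percepts) for o in possible_percepts(b_hat)]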
1. Formulates a problem
2. Calls a search algorithm to solve it
3. And executes the solution
1. Deterministic
- A single action by a player produces a single, predictable resulting
state
2. Two-player
3. Turn-taking
4. Perfect information
- Fully observable – means each player can see the complete state of the game, i.e. knows exactly which state they are currently in
5. Zero-sum
- Means what is good for one player is just as bad for the other player
- No win-win outcome (a competitive situation)
MAX
- MAX moves first – then the players take turns until the game stops
- At end of game points are awarded to the winning player
- And penalties are given to the loser
MIN
1. S0 – the initial state – specifies how the game is set up at the start
2. TO-MOVE(s) – returns the player whose turn it is to move in state s
3. ACTIONS(s) – the set of legal moves in state s
4. RESULT(s, a) – The transition model
- Defines the state that results from taking action a in state s
5. IS-TERMINAL(s) – a terminal test – true when the game is over – and false
otherwise
- States where the game has ended – called terminal states
6. UTILITY(s, p)
- A utility function – also called objective function or payoff function
- Defines the final numeric value to player p when the game ends in
terminal state s
- In chess – the outcome is a win, loss, or draw (values 1, 0, ½) – see the interface sketch after this list
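These six components can be written as a small interface; a Python sketch (the names follow the list above, the class itself is an assumption):

    # A two-player, turn-taking game described by the six components above.
    class Game:
        def __init__(self, initial_state):
            self.initial_state = initial_state        # S0

        def to_move(self, s):
            raise NotImplementedError                 # whose turn it is in state s

        def actions(self, s):
            raise NotImplementedError                 # legal moves in state s

        def result(self, s, a):
            raise NotImplementedError                 # transition model

        def is_terminal(self, s):
            raise NotImplementedError                 # True when the game is over

        def utility(self, s, player):
            raise NotImplementedError                 # payoff to `player` in terminal state s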
A search tree that follows every sequence of moves all the way to the
terminal state
i.e. try to find the conclusion of every possible sequence of moves
Game tree may be infinite
Max strategy:
For games that have a binary outcome (i.e. 1 if MAX wins and 0 if MAX loses –
with only two players) – can use AND-OR search
need a more general search than AND-OR -> the more general algorithm is
called the minimax search algorithm
i.e. each search problem can be implemented using some or other search
algorithm
we can either use one that we know – or we must define a new one
“move” – in some contexts means – both players have taken action , but
could also mean, a single player has moved
Assuming that both players will play optimally from that state to the
end of the game
The minimax value of a terminal state is just its utility
i.e. MAX chooses the action that will result in the highest minimax value
i.e. MIN chooses the action that results in the minimum value for MAX
MINIMAX(s) =
    UTILITY(s, MAX)                                      if IS-TERMINAL(s)
    max over a in ACTIONS(s) of MINIMAX(RESULT(s, a))    if TO-MOVE(s) = MAX
    min over a in ACTIONS(s) of MINIMAX(RESULT(s, a))    if TO-MOVE(s) = MIN
So the minimax value of each node is taken from the minimax values of its
successors – the maximum of them at MAX nodes and the minimum at MIN nodes
Once we can compute the minimax value for a state i.e. MINIMAX(s)
The optimal move – the move that leads to a terminal state with max utility
Minimax algorithm – performs a complete depth-first exploration of the game tree
Space complexity – O(bm) (b = branching factor, m = maximum depth); time complexity is O(b^m)
Notation: argmax over a in S of f(a) -> computes the element a of set S that has the
maximum value of f(a)
so argmax is a function? – yes, one that returns the maximizing element rather than the maximum value
Problem becomes – what move will MIN make in the states that result from MAX's
actions
In this case – MIN will pick the smallest value for each node
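A minimal minimax sketch against the Game interface sketched earlier; it assumes MAX is the player to move at the root and answers the argmax question above by taking the max (or min) over the successors' values:

    # Minimax: a depth-first exploration of the game tree. MAX picks the child with
    # the highest minimax value, MIN the one with the lowest.
    def minimax_search(game, state):
        player = game.to_move(state)
        # argmax over the legal actions of the minimax value of the resulting state.
        return max(game.actions(state),
                   key=lambda a: min_value(game, game.result(state, a), player))

    def max_value(game, state, player):
        if game.is_terminal(state):
            return game.utility(state, player)
        return max(min_value(game, game.result(state, a), player)
                   for a in game.actions(state))

    def min_value(game, state, player):
        if game.is_terminal(state):
            return game.utility(state, player)
        return min(max_value(game, game.result(state, a), player)
                   for a in game.actions(state))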
Example
MINIMAX(root) = max( min(3, 12, 8), min(2, x, y), min(14, 5, 2) )
             = max( 3, min(2, x, y), 2 )
             = max( 3, z, 2 )   where z = min(2, x, y) ≤ 2
             = 3
i.e. Value of the root is independent of the values of the leaves x and y
We can prune it
Algorithm:
Same as the minimax algorithm – except that we maintain bounds in the variables
alpha and beta
and use them to cut off search when a value falls outside the bounds
5.3.1 Diversion:
Now MAX – wants to pick the highest value from this set – but if
α = the value of the best (highest-value) choice we have found so far – at any
choice point along the path for MAX – i.e. the max of the mins
β = the value of the best (lowest-value) choice we have found so far at any choice
point along the path for MIN
e.g.
Now we view the search in terms of bounds
We explore the successors – and update the bounds for the parents based on the
successors
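The same search with alpha-beta bounds; again a sketch over the assumed Game interface, with alpha the best value found so far for MAX along the path and beta the best for MIN:

    import math

    # Alpha-beta pruning: same values as minimax, but branches that cannot affect
    # the final decision are cut off using the bounds alpha and beta.
    def alpha_beta_search(game, state):
        player = game.to_move(state)               # assumed to be the maximizer at the root
        best_value, best_action = -math.inf, None
        alpha, beta = -math.inf, math.inf
        for a in game.actions(state):
            v = ab_min_value(game, game.result(state, a), player, alpha, beta)
            if v > best_value:
                best_value, best_action = v, a
                alpha = max(alpha, v)
        return best_action

    def ab_max_value(game, state, player, alpha, beta):
        if game.is_terminal(state):
            return game.utility(state, player)
        v = -math.inf
        for a in game.actions(state):
            v = max(v, ab_min_value(game, game.result(state, a), player, alpha, beta))
            if v >= beta:
                return v                 # MIN already has a better option elsewhere: prune
            alpha = max(alpha, v)
        return v

    def ab_min_value(game, state, player, alpha, beta):
        if game.is_terminal(state):
            return game.utility(state, player)
        v = math.inf
        for a in game.actions(state):
            v = min(v, ab_max_value(game, game.result(state, a), player, alpha, beta))
            if v <= alpha:
                return v                 # MAX already has a better option elsewhere: prune
            beta = min(beta, v)
        return v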
Add dynamic move ordering – first try moves that have been the best in the past
Minimax still doesn’t work for games with very large state spaces
– because there are too many states to explore in the available time
Two possible solutions to the problem of many states in the game tree:
1. Type A: consider all moves, but only to a certain search depth – then cut off the search and apply a heuristic evaluation function to the resulting positions
2. Type B: consider only promising sequences of moves (forward pruning) and follow them out as deeply as possible