Chapter 2 - AI - Notes
INTELLIGENT AGENTS
2.1 Introduction
An agent is anything that can be viewed as perceiving its environment through sensors and
acting upon that environment through actuators. This simple idea is illustrated in Figure 1.
– A human agent has eyes, ears, and other organs for sensors and hands, legs,
mouth, and other body parts for actuators.
– A robotic agent might have cameras and infrared range finders for sensors and
various motors for actuators.
– A software agent receives keystrokes, file contents, and network packets as
sensory inputs and acts on the environment by displaying on the screen, writing
files, and sending network packets.
We use the term percept to refer to the agent's perceptual inputs at any given instant.
Percept sequence: An agent's percept sequence is the complete history of everything the agent
has ever perceived.
Agent function: Mathematically speaking, we say that an agent's behavior is described by the
agent function that maps any given percept sequence to an action.
Agent program: Internally, the agent function for an artificial agent will be implemented by an
agent program. It is important to keep these two ideas distinct. The agent function is an abstract
mathematical description; the agent program is a concrete implementation, running on the agent
architecture.
In general, an agent's choice of action at any given instant can depend on its entire percept
sequence to date, as well as:
– Past experiences of previous actions and observations, or other data, from which
it can learn; and
– Goals that it must try to achieve or preferences over states of the world.
As a simple example, consider the vacuum-cleaner world. This particular world has just two
locations: squares A and B. The vacuum agent perceives which square it is in and whether there
is dirt in the square. It can choose to move left, move right, suck up the dirt, or do nothing. One
very simple agent function is the following: if the current square is dirty, then suck; otherwise,
move to the other square. A partial tabulation of this agent function is shown in Figure 2.
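As a concrete illustration, here is a minimal Python sketch of this agent function for the two-square world; the square names A and B and the action names are assumptions made only for illustration, not part of the figure.

```python
# A minimal sketch of the simple vacuum agent function described above.
# The square names 'A'/'B' and the action names are illustrative assumptions.

def reflex_vacuum_agent(percept):
    """Map a single percept (location, status) to an action."""
    location, status = percept
    if status == 'Dirty':
        return 'Suck'
    elif location == 'A':
        return 'Right'   # move to the other square
    else:
        return 'Left'

# A partial tabulation of the agent function, analogous to Figure 2:
for percept in [('A', 'Clean'), ('A', 'Dirty'), ('B', 'Clean'), ('B', 'Dirty')]:
    print(percept, '->', reflex_vacuum_agent(percept))
```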
Rational agent: For each possible percept sequence, a rational agent should select an action that
is expected to maximize its performance measure, given the evidence provided by the percept
sequence and whatever built-in knowledge the agent has.
Performance measures
A performance measure embodies the criterion for success of an agent's behavior. When an agent
is plunked down in an environment, it generates a sequence of actions according to the percepts
it receives. This sequence of actions causes the environment to go through a sequence of states.
If the sequence is desirable, then the agent has performed well.
Consider the vacuum-cleaner agent from the preceding section. We might propose to measure
performance by the amount of dirt cleaned up in a single eight-hour shift. With a rational agent,
of course, what you ask for is what you get. A rational agent can maximize this performance
measure by cleaning up the dirt, then dumping it all on the floor, then cleaning it up again, and so
on. A more suitable performance measure would reward the agent for having a clean floor. For
example, one point could be awarded for each clean square at each time step (perhaps with a
penalty for electricity consumed and noise generated). As a general rule, it is better to design
performance measures according to what one actually wants in the environment, rather than
according to how one thinks the agent should behave.
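As a small illustration, here is a Python sketch of such a performance measure; the state representation (a set of clean squares per time step) and the electricity penalty value are assumptions made only for this example.

```python
# Sketch of the suggested performance measure: one point per clean square at
# each time step, with an assumed small penalty for electricity consumed.
# The state representation and the penalty value are illustrative assumptions.

def performance(clean_squares_per_step, energy_used, penalty_per_unit=0.1):
    score = 0.0
    for clean_squares in clean_squares_per_step:   # one entry per time step
        score += len(clean_squares)                # +1 for each clean square
    return score - penalty_per_unit * energy_used  # penalty for electricity

# Example: two squares, three time steps, 5 units of energy used.
history = [{'A'}, {'A'}, {'A', 'B'}]
print(performance(history, energy_used=5))         # 4 - 0.5 = 3.5
```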
An omniscient agent knows the actual outcome of its actions and can act accordingly; but
omniscience is impossible in reality.
Our definition of rationality does not require omniscience, then, because the rational choice
depends only on the percept sequence to date. We must also ensure that we haven't inadvertently
allowed the agent to engage in decidedly unintelligent activities. For example, if an agent
does not look both ways before crossing a busy road, then its percept sequence will not tell it that
there is a large truck approaching at high speed. Does our definition of rationality say that it's
now OK to cross the road? Far from it! First, it would not be rational to cross the road given this
uninformative percept sequence: the risk of accident from crossing without looking is too great.
Second, a rational agent should choose the "looking" action before stepping into the street,
because looking helps maximize the expected performance. Doing actions in order to modify
future percepts (sometimes called information gathering) is an important part of rationality. A
second example of information gathering is provided by the exploration that must be undertaken
by a vacuum-cleaning agent in an initially unknown environment.
Our definition requires a rational agent not only to gather information, but also to learn as much
as possible from what it perceives. The agent's initial configuration could reflect some prior
knowledge of the environment, but as the agent gains experience this may be modified and
augmented. There are extreme cases in which the environment is completely known a priori. In
such cases, the agent need not perceive or learn; it simply acts correctly. Of course, such agents
are very fragile.
Successful agents split the task of computing the agent function into three different periods:
when the agent is being designed, some of the computation is done by its designers; when it is
deliberating on its next action, the agent does more computation; and as it learns from
experience, it does even more computation to decide how to modify its behavior.
To the extent that an agent relies on the prior knowledge of its designer rather than on its own
percepts, we say that the agent lacks autonomy. A rational agent should be autonomous: it should
learn what it can to compensate for partial or incorrect prior knowledge. For example, a vacuum-
cleaning agent that learns to foresee where and when additional dirt will appear will do better
than one that does not. As a practical matter, one seldom requires complete autonomy from the
start: when the agent has had little or no experience, it would have to act randomly unless the
designer gave some assistance.
We must think about task environments, which are essentially the "problems" to which rational
agents are the "solutions." Before we design an intelligent agent, we must specify its task
environment, that is, define the problem.
In our discussion of the rationality of the simple vacuum-cleaner agent, we had to specify the
performance measure, the environment, and the agent's actuators and sensors. We will group all
these together under the heading of the task environment.
• The environment consists of the surroundings from which the agent perceives information.
PEAS (Performance, Environment, Actuators, Sensors):
– Performance measure
– Environment
– Actuators
– Sensors
To illustrate, let us consider a more complex problem: an automated taxi driver. The full driving task is
extremely open-ended. There is no limit to the novel combinations of circumstances that can
arise.
First, what is the performance measure to which we would like our automated driver to aspire?
Desirable qualities include getting to the correct destination; minimizing fuel consumption and
wear and tear; minimizing the trip time and/or cost; minimizing violations of traffic laws and
disturbances to other drivers; maximizing safety and passenger comfort; maximizing profits.
Obviously, some of these goals conflict, so there will be tradeoffs involved.
Next, what is the driving environment that the taxi will face? Any taxi driver must deal with a
variety of roads. The roads contain other traffic, pedestrians, stray animals, road works, police
cars, puddles, and potholes. The taxi must also interact with potential and actual passengers.
The actuators available to an automated taxi will be more or less the same as those available to a
human driver: control over the engine through the accelerator and control over steering and
braking. In addition, it will need output to a display screen or voice synthesizer to talk back to
the passengers, and perhaps some way to communicate with other vehicles.
To achieve its goals in the driving environment, the taxi will need to know where it is, what else
is on the road, and how fast it is going. Its basic sensors should therefore include one or more
controllable TV cameras, the speedometer, and the odometer. To control the vehicle properly,
especially on curves, it should have an accelerometer; it will also need to know the mechanical
state of the vehicle, so it will need the usual array of engine and electrical system sensors. It
might have instruments that are not available to the average human driver: a satellite global
positioning system (GPS) to give it accurate position information with respect to an electronic
map, and infrared or sonar sensors to detect distances to other cars and obstacles. Finally, it will
need a keyboard or microphone for the passenger to request a destination.
PEAS description of the task environment for an automated taxi driver:
Agent type: Taxi driver
Performance measure: Safe, fast, legal, comfortable trip; maximize profits
Environment: Roads, other traffic, pedestrians, customers
Actuators: Steering, accelerator, brake, signal, horn, display
Sensors: Cameras, sonar, speedometer, GPS, odometer, accelerometer, engine sensors, keyboard
If an agent's sensors give it access to the complete state of the environment at each point
in time, then we say that the task environment is fully observable. A task environment is
effectively fully observable if the sensors detect all aspects that are relevant to the choice of
action; relevance, in turn, depends on the performance measure. An environment might be
partially observable because of noisy and inaccurate sensors or because parts of the state are
simply missing from the sensor data. For example, a vacuum agent with only a local dirt sensor
cannot tell whether there is dirt in other squares, and an automated taxi cannot see what other
drivers are thinking.
If the next state of the environment is completely determined by the current state and the action
executed by the agent, then we say the environment is deterministic; otherwise, it is stochastic. In
principle, an agent need not worry about uncertainty in a fully observable, deterministic
environment. If the environment is partially observable, however, then it could appear to be
stochastic. Taxi driving is clearly stochastic in this sense, because one can never predict the
behavior of traffic exactly; moreover, one's tires blow out and one's engine seizes up without
warning. If the environment is deterministic except for the actions of other agents, we say that
the environment is strategic.
In an episodic task environment, the agent's experience is divided into atomic episodes. Each
episode consists of the agent perceiving and then performing a single action. Crucially, the next
episode does not depend on the actions taken in previous episodes. In episodic environments, the
choice of action in each episode depends only on the episode itself. Many classification tasks are
episodic. In sequential environments, on the other hand, the current decision could affect all
future decisions. Chess and taxi driving are sequential: in both cases, short-term actions can have
long-term consequences. Episodic environments are much simpler than sequential environments
because the agent does not need to think ahead.
If the environment can change while an agent is deliberating, then we say the environment is
dynamic for that agent; otherwise, it is static. Static environments are easy to deal with because
the agent need not keep looking at the world while it is deciding on an action, nor need it worry
about the passage of time. Taxi driving is clearly dynamic: the other cars and the taxi itself keep
moving while the driving algorithm dithers about what to do next.
The distinction between single-agent and multiagent environments may seem simple enough: an
agent operating by itself in an environment is in a single-agent setting. The key question is
whether another entity's behavior interferes with the agent's performance measure.
There are, however, some subtle issues. First, we have described how an entity may be viewed
as an agent, but we have not explained which entities must be viewed as agents. Does an agent A
(the taxi driver for example) have to treat an object B (another vehicle) as an agent, or can it be
treated merely as a stochastically behaving object, analogous to waves at the beach or leaves
blowing in the wind? The key distinction is whether B's behavior is best described as
maximizing a performance measure whose value depends on agent A's behavior. For example, in
chess, the opponent entity B is trying to maximize its performance measure, which, by the rules
of chess, minimizes agent A's performance measure. Thus, chess is a competitive multiagent
environment. In the taxi-driving environment, on the other hand, avoiding collisions maximizes
the performance measure of all agents, so it is a partially cooperative multiagent environment.
It is also partially competitive because, for example, only one car can occupy a parking space.
If there are a limited number of distinct, clearly defined percepts and actions, we say the
environment is discrete.
The agent programs we will see all have the same skeleton: they take the current percept as input
from the sensors and return an action to the actuators. Notice the difference between the agent
program, which takes the current percept as input, and the agent function, which takes the entire
percept history. The agent program takes just the current percept as input because nothing more
is available from the environment; if the agent's actions depend on the entire percept sequence,
the agent will have to remember the percepts.
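A minimal Python sketch of this skeleton, assuming a table-driven design in which the program remembers every percept and looks up the whole sequence in a table; the table entries shown are a hypothetical fragment for the vacuum world.

```python
# Sketch of an agent program skeleton: it receives only the current percept,
# so if the action depends on the whole percept sequence, the program itself
# must remember past percepts. The lookup table is a hypothetical fragment.

percepts = []  # the remembered percept sequence

table = {
    (('A', 'Dirty'),): 'Suck',
    (('A', 'Clean'),): 'Right',
    (('A', 'Clean'), ('B', 'Dirty')): 'Suck',
    # ... a complete table needs one entry for every possible percept sequence
}

def table_driven_agent(percept):
    """Append the current percept, then look up the full sequence."""
    percepts.append(percept)
    return table.get(tuple(percepts), 'NoOp')

print(table_driven_agent(('A', 'Clean')))   # Right
print(table_driven_agent(('B', 'Dirty')))   # Suck
```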
In the remainder of this section, we outline four basic kinds of agent program that embody the
principles underlying almost all intelligent systems, together with learning agents, which can
improve any of them:
• Simple reflex agents;
• Model-based reflex agents;
• Goal-based agents;
• Utility-based agents; and
• Learning agents.
1. SIMPLE REFLEX AGENTS
The simplest kind of agent is the simple reflex agent. These agents select actions on the
basis of the current percept, ignoring the rest of the percept history. It works by finding a rule
whose condition matches the current situation (as defined by the percept) and then doing the
action associated with that rule.
E.g., if the car in front brakes and its brake lights come on, then the driver should notice this
and initiate braking.
• Some processing is done on the visual input to establish the condition "the car
in front is braking"; this then triggers some established connection in the agent
program to the action "initiate braking". We call such a connection a condition-
action rule, written as: if car-in-front-is-braking then initiate-braking.
A more general and flexible approach for building an agent program is first to build a general-
purpose interpreter for condition-action rules and then to create rule sets for specific task
environments. Figure 2.4 gives the structure of this general program in schematic form, showing
how the condition-action rules allow the agent to make the connection from percept to action.
We use rectangles to denote the current internal state of the agent's decision process and ovals to
represent the background information used in the process.
Figure 2.5 A simple reflex agent. It acts according to a rule whose condition matches the current
state, as defined by the percept.
The INTERPRET-INPUT function generates an abstracted description of the current state from
the percept, and the RULE-MATCH function returns the first rule in the set of rules that matches
the given state description.
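The following Python sketch mirrors that structure; the rule representation (a condition string paired with an action) and the single braking rule are assumptions made for illustration.

```python
# Sketch of the simple reflex agent program: INTERPRET-INPUT abstracts the
# percept into a state description, and RULE-MATCH returns the first rule
# whose condition matches it. The rule set and percept format are assumptions.

rules = [
    ('car-in-front-is-braking', 'initiate-braking'),
    # ... further condition-action rules for the task environment
]

def interpret_input(percept):
    """Build an abstracted state description from the raw percept."""
    return percept  # assume the percept already names the relevant condition

def rule_match(state, rules):
    """Return the action of the first rule whose condition matches the state."""
    for condition, action in rules:
        if condition == state:
            return action
    return 'NoOp'

def simple_reflex_agent(percept):
    state = interpret_input(percept)
    return rule_match(state, rules)

print(simple_reflex_agent('car-in-front-is-braking'))  # initiate-braking
```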
2. MODEL-BASED REFLEX AGENTS
• An agent that uses a description of how the next state depends on the current state and action
(a model of the world) is called a model-based agent.
• It works by finding a rule whose condition matches the current situation (as defined by
the percept and the stored internal state)
• If the car is a recent model, there is a centrally mounted brake light; with older
models there is no centrally mounted light, so the agent might get confused.
• The camera should detect whether two red lights at the edges of the vehicle go on or off
simultaneously.
• The driver should look in the rear-view mirror to check on the location of nearby
vehicles. In order to decide on a lane change, the driver needs to know whether or not
vehicles are there.
• The agent combines what it currently sees with its stored internal state and then does the
action associated with the matching rule.
Figure 6 gives the structure of the reflex agent with internal state, showing how the current
percept is combined with the old internal state to generate the updated description of the current
state. The agent program is shown in Figure 7. The interesting part is the function UPDATE-
STATE, which is responsible for creating the new internal state description.
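A Python sketch of this structure; the dictionary-based state, the naive UPDATE-STATE logic, and the single rule are placeholder assumptions rather than the program of Figure 7.

```python
# Sketch of a model-based reflex agent: UPDATE-STATE combines the old internal
# state, the last action, and the current percept to produce a new description
# of the world. The state representation and the rule set are assumptions.

state = {}           # the agent's current conception of the world state
last_action = None   # the most recently chosen action

def update_state(state, last_action, percept):
    """Fold the new percept (and the effect of the last action) into the state."""
    new_state = dict(state)
    new_state.update(percept)            # naive model: percepts overwrite beliefs
    new_state['last_action'] = last_action
    return new_state

def rule_match(state, rules):
    for condition, action in rules:
        if condition(state):
            return action
    return 'NoOp'

rules = [
    (lambda s: s.get('car_in_front_braking'), 'initiate-braking'),
]

def model_based_reflex_agent(percept):
    global state, last_action
    state = update_state(state, last_action, percept)
    last_action = rule_match(state, rules)
    return last_action

print(model_based_reflex_agent({'car_in_front_braking': True}))  # initiate-braking
```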
3. GOAL-BASED AGENTS
• Choose actions that achieve the goal (an agent with explicit goals).
– For example, at a road junction, the taxi can turn left, right or go straight.
The right decision depends on where the taxi is trying to get to. As well as
a current state description, the agent needs some sort of goal information,
which describes situations that are desirable. E.g. being at the passenger's
destination.
• The agent may need to consider long sequences, twists and turns to find a way to achieve
a goal.
Figure 8 A goal-based agent. It keeps track of the world state as well as a set of goals it is trying
to achieve, and chooses an action that will (eventually) lead to the achievement of its goals.
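A minimal Python sketch of the goal-based idea, assuming a hypothetical one-step model of what each action leads to at the junction; a real agent would typically search over longer action sequences.

```python
# Sketch of goal-based action selection: predict the result of each action
# with a model of the world and choose one whose result satisfies the goal.
# The junction example, the transition model, and the goal test are assumptions.

def predicted_result(state, action):
    """Hypothetical one-step model of what each action leads to."""
    model = {
        ('junction', 'turn-left'): 'side-street',
        ('junction', 'turn-right'): 'highway',
        ('junction', 'go-straight'): 'passenger-destination',
    }
    return model.get((state, action), state)

def goal_based_agent(state, actions, goal_test):
    for action in actions:
        if goal_test(predicted_result(state, action)):
            return action
    return actions[0]  # a real agent would search deeper rather than give up

action = goal_based_agent('junction',
                          ['turn-left', 'turn-right', 'go-straight'],
                          goal_test=lambda s: s == 'passenger-destination')
print(action)  # go-straight
```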
4. UTILITY-BASED AGENTS
– For example, there are many action sequences that will get the taxi to its destination,
thereby achieving the goal. Some are quicker, safer, more reliable, or cheaper
than others; we need to weigh considerations such as speed and safety.
• When there are several goals that the agent can aim for, none of which can be achieved
with certainty, utility provides a way in which the likelihood of success can be weighed
up against the importance of the goals.
• An agent that possesses an explicit utility function can make rational decisions.
A utility function maps a state (or a sequence of states) onto a real number, which describes the
associated degree of happiness.
Figure 9 A utility-based agent. It uses a model of the world, along with a utility function that
measures its preferences among states of the world. Then it chooses the action that leads to the
best expected utility, where expected utility is computed by averaging over all possible outcome
states, weighted by the probability of the outcome.
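A small Python sketch of the expected-utility computation described in the caption; the routes, outcome probabilities, and utility values are made up for illustration.

```python
# Sketch of utility-based action selection: the expected utility of an action
# is the average utility of its possible outcome states, weighted by their
# probabilities, and the agent chooses the action with the highest value.
# The routes, probabilities, and utility numbers below are invented.

outcomes = {
    'take-highway':   [(0.8, 'arrive-quickly'), (0.2, 'stuck-in-traffic')],
    'take-backroads': [(0.95, 'arrive-slowly'), (0.05, 'stuck-in-traffic')],
}

utility = {'arrive-quickly': 10, 'arrive-slowly': 6, 'stuck-in-traffic': 0}

def expected_utility(action):
    return sum(p * utility[state] for p, state in outcomes[action])

print({a: expected_utility(a) for a in outcomes})  # highway: 8.0, backroads: 5.7
print(max(outcomes, key=expected_utility))         # take-highway
```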
5. LEARNING AGENTS
• How does an agent improve over time? By monitoring its performance and suggesting
better modeling, new action rules, etc. Learning has the advantage that it allows the agent
to operate in initially unknown environments.
• Critic: gives feedback to the learning element on how the agent is doing with respect
to a fixed performance standard; the learning element then determines how the
performance element should be modified to do better in the future.
• Problem generator: responsible for suggesting actions that will lead to new
and informative experiences.
• E.g., the automated taxi: using the performance element, the taxi goes out on the road and
drives. The critic observes the shocking language used by other drivers. From this experience,
the learning element is able to formulate a rule saying this was a bad action, and the
performance element is modified by installing the new rule. The problem generator might
identify certain areas of behavior in need of improvement, such as trying out the brakes on
different roads under different conditions.
Figure 11 A general model of learning agents.
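To make the interaction of these components concrete, here is a very small Python sketch of one learning cycle; the percepts, the feedback signal, and the rule that gets installed are illustrative assumptions, not the taxi program itself.

```python
# Sketch of one learning-agent cycle: the performance element picks actions,
# the critic scores the outcome against a fixed performance standard, and the
# learning element modifies the performance element's rules when the score is
# poor. All concrete names below are illustrative assumptions.

rules = {}  # condition -> action, used by the performance element

def performance_element(percept):
    return rules.get(percept, 'default-drive')

def critic(feedback):
    """Judge the outcome against the fixed performance standard."""
    return -1 if feedback == 'other-drivers-honk' else +1

def learning_element(percept, action, score):
    """Install a new rule when the last action was judged bad."""
    if score < 0:
        rules[percept] = 'avoid-' + action

# One cycle: cutting across three lanes provokes honking, so a rule is learned.
percept, action = 'three-lanes-to-exit', 'cut-across-lanes'
score = critic(feedback='other-drivers-honk')
learning_element(percept, action, score)
print(performance_element('three-lanes-to-exit'))  # avoid-cut-across-lanes
```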