What Is Artificial Intelligence?
Intelligence: It is knowledge in operation towards a solution – how to do it, and how to apply
the solution.
Artificial Intelligence: Artificial intelligence is the study of how to make computers do things
which, at the moment, people do better. It refers to intelligence exhibited by a computing
machine.
One view of AI is:
About designing systems that are as intelligent as humans
Computers can acquire abilities nearly equal to human intelligence
How a system arrives at a conclusion, or the reasoning behind its selection of actions
How a system acts and performs, not so much its reasoning process
The AI Problem
These are some of the problems contained within AI.
1. Game Playing and theorem proving share the property that people who do them well are
considered to be displaying intelligence.
2. Another important foray into AI is focused on commonsense reasoning. It includes
reasoning about physical objects and their relationships to each other, as well as
reasoning about actions and their consequences.
3. To investigate this sort of reasoning, Newell, Shaw, and Simon built the General Problem
Solver (GPS), which they applied to several commonsense tasks as well as to the problem
of performing symbolic manipulations of logical expressions. However, no attempt was made
to create a program with a large amount of knowledge about a particular problem domain.
MODULE-1
4. The following figures show some of the tasks that are the targets of work in AI:
first, perceptual, linguistic, and commonsense skills are learned; later, expert skills such as
engineering, medicine, or finance are acquired.
This hypothesis is only a hypothesis: there appears to be no way to prove or disprove it on
logical grounds, so it must be subjected to empirical validation. We may find that it is false, or
we may find that the bulk of the evidence says it is true; the only way to determine its truth is by
experimentation. Computers provide the perfect medium for this experimentation, since they
can be programmed to simulate any physical symbol system we like. The importance of the
physical symbol system hypothesis is twofold: it is a significant theory of the nature of human
intelligence, and so is of great interest to psychologists.
What is an AI Technique?
Artificial intelligence problems span a very broad spectrum. They appear to have very little in
common except that they are hard. The results of AI research tell us that there are, nevertheless,
techniques appropriate for the solution of a variety of these problems.
Important AI Techniques:
Search: Provides a way of solving problems for which no more direct approach is
available as well as a framework into which any direct techniques that are
available can be embedded.
Use of Knowledge: Provides a way of solving complex problems by exploiting the
structures of the objects that are involved.
Abstraction: Provides a way of separating important features and variations from
the many unimportant ones that would otherwise overwhelm any process.
The goal of the machine is to fool the interrogator into believing that it is the person. If the
machine succeeds, we conclude that the machine can think. The machine is allowed to do
whatever it can to fool the interrogator.
For example, if asked the question “How much is 12,324 times 73,981?”, the machine
could wait several minutes and then respond with the wrong answer.
The interrogator receives two sets of responses, but does not know which set comes from the
human and which from the computer. After careful examination of the responses, if the
interrogator cannot definitely tell which set has come from the computer and which from the
human, then the computer has passed the Turing Test. The more serious issue is the amount of
knowledge that a machine would need to pass the Turing Test.
It was the ability of electronic machines to store large amounts of information and process it at
very high speeds that gave researchers the vision of building systems which could emulate
(imitate) some human abilities.
We will see the introduction of systems which equal or exceed human abilities, and see them
become an important part of most business and government operations, as well as of our daily
activities.
Definition of AI: Artificial Intelligence is a branch of computer science concerned with the study
and creation of computer systems that exhibit some form of intelligence such as systems that
learn new concepts and tasks, systems that can understand a natural language or perceive and
comprehend a visual scene, or systems that perform other types of feats that require human
types of intelligence.
AI is not the study and creation of conventional computer systems, nor is it the study of the
mind, the body, or language as customarily found in the fields of psychology, physiology,
cognitive science, or linguistics.
In AI, the goal is to develop working computer systems that are truly capable of performing tasks
that require high levels of intelligence.
Problem:
A problem can arise for different reasons and, if solvable, can usually be solved in a
number of different ways; it can likewise be defined in a number of different ways.
Problem solving is a structured method for tackling an unstructured problem. This approach
consists of a number of states. The starting point is the “Initial State” of the problem. The last
point in the problem is called the “Goal State” or “Final State” of the problem.
State space is a set of legal positions, starting at the initial state, using the set of rules to
move from one state to another and attempting to end up in a goal state.
Production System
The entire procedure for getting a solution for AI problem can be viewed as “Production
System”. It provides the desired goal. It is a basic building block which describes the AI problem
and also describes the method of searching the goal. Its main components are:
A Set of Rules, each consisting of a left side (a pattern) that determines the applicability
of the rule and right side that describes the operation to be performed if the rule is
applied.
Knowledge Base – It contains whatever information is appropriate for the particular task.
Some parts of the database may be permanent, while other parts may pertain only to
the solution of the current problem.
Control Strategy – It specifies the order in which the rules will be compared to the
database, and the way of resolving the conflicts that arise when several rules match at
once.
o The first requirement of a good control strategy is that it causes motion; a control
strategy that does not cause motion will never lead to a solution.
o The second requirement of a good control strategy is that it should be systematic.
A Rule Applier – A production rule has the form: if (condition) then (consequence or
action).
The answer for the first question can be considered with the following definitions of classes of
production systems:
A monotonic production system is a production system in which the application of a rule never
prevents the later application of another rule that could also have been applied at the time the
first rule was selected.
A partially commutative production system is a production system with the property that if the
application of a particular sequence of rules transforms state X into state Y, then any
permutation of those rules that is allowable also transforms state X into state Y.
A commutative production system is a production system that is both monotonic and partially
commutative.
In a formal sense, there is no relationship between kinds of problems and kinds of production
systems, since all problems can be solved by all kinds of systems. But in a practical sense, there
definitely is such a relationship between kinds of problems and the kinds of systems that lend
themselves naturally to describing those problems.
The following figure shows the four categories of production systems produced by the two
dichotomies, monotonic versus non-monotonic and partially commutative versus non-partially
commutative, along with some problems that can naturally be solved by each type of system.
                            Monotonic             Non-monotonic
Partially commutative       Theorem proving       Robot navigation
Not partially commutative   Chemical synthesis    Bridge

The four categories of Production Systems
Partially commutative, monotonic production systems are useful for solving ignorable
problems – problems that involve creating new things rather than changing old ones.
Theorem proving is one example of such a creative process. Partially commutative,
monotonic production systems are important from an implementation standpoint
because they can be implemented without the ability to backtrack to previous states
when it is discovered that an incorrect path has been followed.
Production systems that are not partially commutative are useful for many problems in
which changes occur, for example chemical synthesis.
Non-partially-commutative production systems are less likely to produce the same node
many times in the search process.
Problem Characteristics
In order to choose the most appropriate method (or a combination of methods) for a particular
problem, it is necessary to analyze the problem along several key dimensions:
• Is the problem decomposable?
• Can solution steps be ignored or undone?
• Is the universe predictable?
• Is a good solution absolute or relative?
• Is the solution a state or a path?
We can solve this problem by breaking it down into smaller problems, each of which we can
then solve using a small collection of specific rules. The following figure shows the problem
tree as it would be exploited by a simple recursive integration program that works as follows:
at each step it checks to see whether the problem it is working on is immediately solvable. If so,
the answer is returned directly. If the problem is not easily solvable, the integrator checks to
see whether it can decompose the problem into smaller problems. If it can, it creates those
problems and calls itself recursively on them. Using this technique of problem decomposition,
we can often solve very large problems easily.
Now consider the 8-puzzle game. A sample game using the 8-puzzle is shown below:
In attempting to solve the 8-puzzle, we might make a stupid move: for example, we slide tile 5
into the empty space when we actually wanted to slide tile 6. We can backtrack and undo the
first move, sliding tile 5 back to where it was, and then slide tile 6. Mistakes can thus be
recovered from, but not quite as easily as in the theorem-proving problem: an additional
step must be performed to undo each incorrect step.
Now consider the problem of playing chess. Suppose a chess-playing program makes a stupid
move and realizes it a couple of moves later. Here, solution steps cannot be undone.
The above three problems illustrate difference between three important classes of problems:
1) Ignorable: in which solution steps can be ignored.
Example: Theorem Proving
2) Recoverable: in which solution steps can be undone.
Example: 8-Puzzle
3) Irrecoverable: in which solution steps cannot be undone.
Example: Chess
The recoverability of a problem plays an important role in determining the complexity of the
control structure necessary for problem solution.
Ignorable problems can be solved using a simple control structure that never backtracks.
Recoverable problems can be solved by a slightly more complicated control strategy that can
recover from mistakes using backtracking. Irrecoverable problems must be solved by a system
that expends a great deal of effort in making each decision (for example, by planning), since
each decision is final.
We can do fairly well since we have available accurate estimates of the probabilities of each of
the possible outcomes. A few examples of such problems are:
Controlling a robot arm: The outcome is uncertain for a variety of reasons. Someone
might move something into the path of the arm. The gears of the arm might stick.
Helping a lawyer decide how to defend his client against a murder charge: Here we
probably cannot even list all the possible outcomes, which makes the outcome
uncertain.
For certain-outcome problems, planning can be used to generate a sequence of operators that
is guaranteed to lead to a solution.
For uncertain-outcome problems, a sequence of generated operators can only have a good
probability of leading to a solution.
Plan revision is made as the plan is carried out and the necessary feedback is provided.
Now consider the Travelling Salesman Problem. Our goal is to find the shortest route that
visits each city exactly once.
Suppose we find a path; it may not be a solution to the problem. We must also try all other
paths. The shortest (best) path is the solution to the problem. These types of problems are
known as “best-path” problems. Best-path problems are computationally harder than any-path
problems.
The above two problems illustrate the difference between the problems for which a lot of
knowledge is important only to constrain the search for a solution and those for which a lot of
knowledge is required even to be able to recognize a solution.
Problem Classification
When actual problems are examined from the point of view of all of these questions, it becomes
apparent that there are several broad classes into which the problems fall. Each class can be
associated with a generic control strategy that is appropriate for solving the problem.
There is a variety of problem-solving methods, but there is no one single way of solving all
problems. Not all new problems should be considered as totally new. Solutions of similar
problems can be exploited.
PROBLEMS
Water-Jug Problem
The problem is: “You are given two jugs, a 4-litre one and a 3-litre one. Neither has any
measuring markers on it. There is a pump that can be used to fill the jugs with water. How can
you get exactly 2 litres of water into the 4-litre jug?”
Solution:
The state space for the problem can be described as a set of states, where each state
represents the amount of water in each jug. The problem starts with the initial state, described
as an ordered pair of integers:
• State: (x, y)
– x = number of litres in the 4-litre jug
– y = number of litres in the 3-litre jug
where x = 0, 1, 2, 3, or 4 and y = 0, 1, 2, or 3
• Start state: (0, 0), i.e., the 4-litre and 3-litre jugs are both empty initially.
• Goal state: (2, n) for any n; that is, the 4-litre jug has 2 litres of water and the 3-litre jug
has any value from 0–3, since it is not specified.
• Attempting to end up in a goal state.
Production Rules: These rules are used as operators to solve the problem. They are
represented as rules whose left sides describe the current state to be matched and whose
right sides describe the new state that results from applying the rule.
The solution to the water-jug problem is:
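The state-space search described above can be sketched as a breadth-first search over (x, y) states. This is an illustrative Python sketch, not the only possible encoding; the pour-amount arithmetic and the successor list are assumptions consistent with the rules in the text.

```python
from collections import deque

def water_jug(goal=2, cap4=4, cap3=3):
    """Breadth-first search over (x, y) states; x = 4-litre jug, y = 3-litre jug."""
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if x == goal:                      # goal: 2 litres in the 4-litre jug
            path, state = [], (x, y)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        successors = [
            (cap4, y),                     # fill the 4-litre jug
            (x, cap3),                     # fill the 3-litre jug
            (0, y),                        # empty the 4-litre jug
            (x, 0),                        # empty the 3-litre jug
            # pour 3 -> 4 until the 4-litre jug is full or the 3-litre jug is empty
            (min(x + y, cap4), y - (min(x + y, cap4) - x)),
            # pour 4 -> 3 until the 3-litre jug is full or the 4-litre jug is empty
            (x - (min(x + y, cap3) - y), min(x + y, cap3)),
        ]
        for s in successors:
            if s not in parent:
                parent[s] = (x, y)
                queue.append(s)
    return None

solution = water_jug()
```

Because breadth-first search expands states level by level, the first path found to a (2, n) state is a shortest one.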
Chess Problem
Problem of playing chess can be defined as a problem of moving around in a state space where
each state represents a legal position of the chess board.
The game starts with an initial state described as an 8x8 array, each position of which contains
a symbol standing for the appropriate piece in the official chess opening position. A set of rules
is used to move from one state to another, attempting to end up in one of a set of final states,
described as any board position in which the opponent does not have a legal move and his/her
king is under attack.
The state space representation is natural for chess, since each state corresponds to a board
position; the positions are artificial but well organized.
Production Rules:
These rules are used to move around the state space. They can be described easily as a set of
rules consisting of two parts:
1. A left side that serves as a pattern to be matched against the current board position.
2. A right side that describes the change to be made to the board position to reflect the
move.
To describe these rules it is convenient to introduce a notation for patterns and substitutions.
E.g.:
White pawn at Square(file i, rank 2)
AND Square(file i, rank 3) is empty
AND Square(file i, rank 4) is empty
→ Move pawn from Square(file i, rank 2) to Square(file i, rank 4)
8-Puzzle Problem
The 8-puzzle is a square tray in which 8 square tiles are placed; the remaining ninth
square is uncovered. Each tile has a number on it. A tile that is adjacent to the blank space can
be slid into that space. The goal is to transform the starting position into the goal position by
sliding the tiles around.
Solution:
State Space: The state space for the problem can be written as a set of states, where each
state is a position of the tiles on the tray.
Initial State: A square tray having 3x3 cells with the 8 numbered tiles shuffled on it:
2 8 3
1 6 4
7 5
Goal State
1 2 3
8 4
7 6 5
Production Rules: These rules are used to move from the initial state to the goal state. They
are also defined in two parts: the left-side pattern should match the current position, and the
right side gives the resulting position after applying the rule.
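The tile-sliding rules above can be sketched as a breadth-first search; the flat 9-tuple encoding with 0 standing for the blank is an assumed representation, and the start and goal boards are the ones shown above.

```python
from collections import deque

def solve_8_puzzle(start, goal):
    """BFS over tile states; each state is a 9-tuple, 0 marks the blank."""
    def neighbours(state):
        i = state.index(0)                 # position of the blank
        row, col = divmod(i, 3)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = row + dr, col + dc
            if 0 <= r < 3 and 0 <= c < 3:
                j = r * 3 + c
                s = list(state)
                s[i], s[j] = s[j], s[i]    # slide the adjacent tile into the blank
                yield tuple(s)

    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for s in neighbours(state):
            if s not in parent:
                parent[s] = state
                queue.append(s)
    return None

start = (2, 8, 3, 1, 6, 4, 7, 0, 5)   # the shuffled tray above, row by row
goal  = (1, 2, 3, 8, 0, 4, 7, 6, 5)   # the goal tray above
path = solve_8_puzzle(start, goal)
```

For this particular start/goal pair the shortest solution takes five tile moves.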
Travelling Salesman Problem
Solution:
State Space: The state space for this problem represents the states in which the cities have
been traversed by the salesman, with the start state described as the salesman starting at any
city in the given list of cities. A set of rules is applied such that the salesman will not traverse
a city already visited. The rules result in states in which the salesman completes the round trip
and returns to his starting position.
Initial State
Salesman starting at any arbitrary city in the given list of cities
Goal State
Having visited all cities once and only once, and returned to the starting city
Production rules:
These rules are used as operators to move from one state to another. Since there is a path
between any pair of cities in the city list, we write the production rules for this problem as
• Visited(city[i]) AND Not Visited(city[j])
– Traverse(city[i],city[j])
• Visited(city[i],city[j]) AND Not Visited(city[k])
– Traverse(city[j],city[k])
• Visited(city[j],city[i]) AND Not Visited(city[k])
– Traverse(city[i],city[k])
• Visited(city[i],city[j],city[k]) AND Not Visited(Nil)
– Traverse(city[k],city[i])
Towers of Hanoi Problem
Initial State: Full(T1) | Empty(T2) | Empty(T3)
Goal State: Empty(T1) | Full(T2) | Empty(T3)
Production Rules:
These are rules used to reach the Goal State. These rules use the following operations:
POP(x): Remove the top element from stack x and update its top.
PUSH(x, y): Push element x onto stack y and update the top of y.
Now to solve the problem the production rules can be described as follows:
1. Top(T1)<Top(T2) PUSH(POP(T1),T2)
2. Top(T2)<Top(T1) PUSH(POP(T2),T1)
3. Top(T1)<Top(T3) PUSH(POP(T1),T3)
4. Top(T3)<Top(T1) PUSH(POP(T3),T1)
5. Top(T2)<Top(T3) PUSH(POP(T2),T3)
6. Top(T3)<Top(T2) PUSH(POP(T3),T2)
7. Empty(T1) PUSH(POP(T2),T1)
8. Empty(T1) PUSH(POP(T3),T1)
9. Empty(T2) PUSH(POP(T1),T2)
10. Empty(T3) PUSH(POP(T1),T3)
11. Empty(T2) PUSH(POP(T3),T2)
12. Empty(T3) PUSH(POP(T2),T3)
Solution: Example: 3 Disks, 3 Towers
1) T1 → T2
2) T1 → T3
3) T2 → T3
4) T1 → T2
5) T3 → T1
6) T3 → T2
7) T1 → T2
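The seven-move sequence above can be reproduced by the standard recursive decomposition of the problem; the tower names T1–T3 follow the text, and the function returns the moves as (source, target) pairs.

```python
def hanoi(n, source, target, spare):
    """Recursively move n disks from source to target using spare."""
    if n == 0:
        return []
    # move n-1 disks out of the way, move the largest disk, then stack the
    # n-1 disks back on top of it
    return (hanoi(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi(n - 1, spare, target, source))

moves = hanoi(3, "T1", "T2", "T3")
```

For n disks this produces exactly 2^n - 1 moves; for n = 3 the seven pairs match the listing above move for move.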
Monkey and Banana Problem
Solution: The state space for this problem is a set of states representing the position of the
monkey, the position of the chair, the position of the stick, and two flags: whether the monkey is
on the chair, and whether the monkey holds the stick. So there is a 5-tuple representation:
(M, C, S, F1, F2)
– M: position of the monkey
– C: position of the chair
– S: position of the stick
– F1: 0 or 1 depends on the monkey on the chair or not
– F2: 0 or 1 depends on the monkey holding the stick or not
7) (C,C,C,0,1) → (C,C,C,1,1)
• {monkey and stick at the chair position; the monkey climbs onto the chair holding the stick}
8) (S,C,S,0,1) → (C,C,C,0,1)
Solution:
1) (M,C,S,0,0)
2) (C,C,S,0,0)
3) (G,G,S,0,0)
4) (S,G,S,0,0)
5) (G,G,G,0,0)
6) (G,G,G,0,1)
7) (G,G,G,1,1)
Missionaries and Cannibals Problem
Solution:
The state space for the problem contains a set of states which represent the present numbers
of cannibals and missionaries on either bank of the river: (C, M, C1, M1, B)
– C and M are the numbers of cannibals and missionaries on the starting bank
– C1 and M1 are the numbers of cannibals and missionaries on the destination bank
– B is the position of the boat: either the left bank (L) or the right bank (R)
Production System: These are the operations used to move from one state to another. Since
at any bank the number of cannibals must be less than or equal to the number of missionaries,
we can write production rules for this problem as follows:
• (C,M,C1,M1,L / C=3, M=3) → (C-2,M,C1+2,M1,R)
• (C,M,C1,M1,L / C=3, M=3) → (C-1,M-1,C1+1,M1+1,R)
• (C,M,C1,M1,L / C=3, M=3) → (C-1,M,C1+1,M1,R)
• (C,M,C1,M1,R / C=1, M=3) → (C+1,M,C1-1,M1,L)
• (C,M,C1,M1,R / C=0, M=3, C1=3, M1=0) → (C+1,M,C1-1,M1,L)
A sample solution trace (columns: C, M, C1, M1; the boat position column is omitted):
C M C1 M1
1 3 2 0
2 3 1 0
0 3 3 0
1 3 2 0
1 1 2 2
2 2 1 1
2 0 1 3
3 0 0 3
1 0 2 3
2 0 1 3
0 0 3 3
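The rules above can be sketched as a breadth-first search over river-crossing states. The (m, c, b) encoding is an assumed simplification of the text's 5-tuple: it tracks only the start bank, since the destination-bank counts are implied by the totals, and b = 1 means the boat is on the start bank.

```python
from collections import deque

def missionaries_cannibals(total=3, boat_cap=2):
    """BFS over states (m, c, b): missionaries/cannibals on the start bank."""
    def safe(m, c):
        # on each bank, missionaries are never outnumbered (unless absent)
        other_m, other_c = total - m, total - c
        return (m == 0 or m >= c) and (other_m == 0 or other_m >= other_c)

    start, goal = (total, total, 1), (0, 0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        m, c, b = queue.popleft()
        if (m, c, b) == goal:
            path, state = [], (m, c, b)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for dm in range(boat_cap + 1):
            for dc in range(boat_cap + 1 - dm):
                if dm + dc == 0:
                    continue               # the boat cannot cross empty
                # boat moves away from the bank it is currently on
                nm, nc = (m - dm, c - dc) if b else (m + dm, c + dc)
                if 0 <= nm <= total and 0 <= nc <= total and safe(nm, nc):
                    s = (nm, nc, 1 - b)
                    if s not in parent:
                        parent[s] = (m, c, b)
                        queue.append(s)
    return None

path = missionaries_cannibals()
```

For three missionaries, three cannibals, and a two-person boat, the minimal solution takes eleven crossings.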
Control Strategy
The question arises
"How to decide which rule to apply next during the process of searching for a
solution to a problem?"
Requirements of a good search strategy:
1. It causes motion. It must reduce the difference between current state and goal state.
Otherwise, it will never lead to a solution.
2. It is systematic. Otherwise, it may use more steps than necessary.
3. It is efficient. Find a good, but not necessarily the best, answer.
Algorithm (Depth-First Search):
1) If the initial state is the goal state, quit return success.
2) Otherwise, do the following until success or failure is signaled
a. Generate a successor E of the initial state, if there are no more successors,
signal failure
b. Call Depth-First Search with E as the initial state
c. If success is returned, signal success. Otherwise continue in this loop.
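The recursive procedure above can be sketched as follows; the toy graph and its node names are hypothetical, used only for illustration.

```python
def depth_first_search(state, goal, successors, visited=None):
    """Recursive depth-first search; returns a path from state to goal, or None."""
    if visited is None:
        visited = set()
    if state == goal:
        return [state]
    visited.add(state)                     # avoid revisiting states (cycles)
    for s in successors(state):
        if s not in visited:
            path = depth_first_search(s, goal, successors, visited)
            if path is not None:           # success signalled from below
                return [state] + path
    return None                            # no more successors: signal failure

# hypothetical toy graph, for illustration only
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
path = depth_first_search("A", "D", lambda s: graph[s])
```

The recursion mirrors step 2b of the algorithm: each successor E becomes the initial state of a nested depth-first search.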
Another strategy is to begin generating complete paths, keeping track of the shortest path found
so far and neglecting paths whose partial length is greater than the shortest found. This method
is better than the first, but it is still inadequate.
Heuristic Function:
– It is a function applied to a state in a search space to indicate the likelihood of
success if that state is selected.
– It is a function that maps problem state descriptions to measures of desirability,
usually represented by numbers.
– A heuristic function is problem-specific.
The purpose of heuristic function is to guide the search process in the most profitable direction
by suggesting which path to follow first when more than one is available (best promising way).
With a good heuristic we can solve the Travelling Salesman Problem in far fewer than an
exponential number of steps. On average, heuristics improve the quality of the paths that are
explored. The following nearest-neighbour procedure solves the Travelling Salesman Problem:
– Select an arbitrary city as the starting city.
– To select the next city, look at all cities not yet visited, and select the one closest
to the current city.
– Repeat until all cities have been visited.
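The nearest-neighbour steps above can be sketched directly; the four-city distance table is hypothetical, used only to demonstrate the greedy choice.

```python
def nearest_neighbour_tour(start, cities, dist):
    """Greedy TSP heuristic: always travel to the closest unvisited city."""
    tour = [start]
    unvisited = set(cities) - {start}
    while unvisited:
        current = tour[-1]
        # greedy step: pick the closest city not yet visited
        nxt = min(unvisited, key=lambda c: dist[current][c])
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(start)                     # return to the starting city
    return tour

# hypothetical symmetric distance table, for illustration only
dist = {
    "A": {"B": 1, "C": 4, "D": 3},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"A": 3, "B": 5, "C": 1},
}
tour = nearest_neighbour_tour("A", dist.keys(), dist)
```

This runs in time proportional to the square of the number of cities, but it does not guarantee the shortest tour; it only tends to produce a good one.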
Heuristic search methods, the general-purpose control strategies for controlling search, are
often known as "weak methods" because of their generality and because they do not apply a
great deal of knowledge.
Weak Methods
a) Generate and Test
b) Hill Climbing
c) Best First Search
d) Problem Reduction
e) Constraint Satisfaction
f) Means-ends analysis
Generate and Test
The generate-and-test strategy is the simplest of all the approaches. It consists of the following
steps:
Algorithm:
1. Generate a possible solution. For some problems, this means generating a particular
point in the problem space. For others, it means generating a path from a start state.
2. Test to see if this is actually a solution by comparing the chosen point or the endpoint of
the chosen path to the set of acceptable goal states.
If there exists a solution, then this strategy definitely finds it, because a complete solution must
be generated before it can be tested. Thus the generate-and-test algorithm is a depth-first
search procedure, and it will take a long time if the problem space is very large. The strategy
can also operate by generating solutions randomly instead of systematically, but then we
cannot guarantee that a solution will ever be found.
To implement generate-and-test we usually use a depth-first tree; if there are cycles, we use a
graph rather than a tree. This is not an efficient technique when the problem is hard; it is
acceptable for simple problems. When combined with other techniques, it can restrict the
search space.
For example, one of the most successful AI programs is DENDRAL, which infers the structure
of organic compounds using mass spectrum and nuclear magnetic resonance data. It uses a
strategy called plan-generate-test, in which a planning process that uses constraint-satisfaction
techniques creates lists of recommended structures. The generate-and-test procedure then
uses those lists so that it explores only a limited set of structures, which has proved highly
effective.
Examples:
- Searching a ball in a bowl (Pick a green ball) - State
- Water Jug Problem – State and Path
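As a minimal sketch, generate-and-test is just a generator plus a goal test. Here the candidates are enumerated systematically over water-jug states (x, y); note that this illustration tests only the goal condition, not whether a state is actually reachable by legal moves.

```python
from itertools import product

def generate_and_test(candidates, is_solution):
    """Systematically generate candidates and test each until one passes."""
    for candidate in candidates:
        if is_solution(candidate):
            return candidate               # first candidate passing the test
    return None                            # candidates exhausted: failure

# every (x, y) state with x in 0..4 litres and y in 0..3 litres
solution = generate_and_test(product(range(5), range(4)),
                             lambda s: s[0] == 2)  # goal: 2 litres in 4-litre jug
```

Swapping `product(...)` for a random generator gives the randomized variant mentioned above, at the cost of losing the completeness guarantee.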
Hill Climbing
Hill climbing is a variant of generate-and-test in which feedback from the test procedure is used
to help the generator decide which direction to move in the search space. In generate-and-test,
the heuristic (test) function responds only yes or no; here it is extended to provide an estimate
of how close a given state is to a goal state.
The key difference between this algorithm and generate and test algorithm is the use of an
evaluation function as a way to inject task-specific knowledge into the control process.
Algorithm:
1. Evaluate the initial state. If it is also a goal state then return it and quit. Otherwise
continue with the initial state as the current state.
2. Loop until a solution is found or until a complete iteration produces no change to current
state:
a. Let SUCC be a state such that any possible successor of the current state will be
better than SUCC.
b. For each operator that applies to the current state do:
i. Apply the operator and generate a new state.
ii. Evaluate the new state. If it is a goal state, then return it and quit. If not
compare it to SUCC. If it is better, then set SUCC to this state. If it is not
better, leave SUCC alone.
c. IF the SUCC is better than current state, then set current state to SUCC.
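The steepest-ascent loop above can be sketched as follows. The one-dimensional objective f and the integer successor states are illustrative assumptions; any state space with an evaluation function would do.

```python
def steepest_ascent(state, value, successors):
    """Steepest-ascent hill climbing: move to the best successor while it improves."""
    current = state
    while True:
        neighbours = list(successors(current))
        if not neighbours:
            return current
        best = max(neighbours, key=value)  # SUCC: the best successor found
        if value(best) <= value(current):  # no successor is better: stop
            return current
        current = best                     # set current state to SUCC

# maximise f(x) = -(x - 3)^2 over integer states, for illustration
f = lambda x: -(x - 3) ** 2
peak = steepest_ascent(0, f, lambda x: [x - 1, x + 1])
```

On this toy objective the loop climbs 0 → 1 → 2 → 3 and halts at the maximum; on a bumpier objective it would halt at whichever local maximum it reaches first, which is exactly the failure mode discussed next.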
Both basic and steepest-ascent hill climbing may fail to find a solution. Either algorithm may
terminate not by finding a goal state but by reaching a state from which no better states can be
generated. This will happen if the program has reached a local maximum, a plateau, or a ridge.
A ridge is a special kind of local maximum. It is an area of the search space that is higher than
surrounding areas and that itself has a slope.
There are some ways of dealing with these problems, although these methods are by no means
guaranteed:
Backtrack to some earlier node and try going in a different direction. This is particularly
reasonable if at that node there was another direction that looked as promising or almost
as promising as the one that was chosen earlier. This is a fairly good way to deal with
local maxima.
Make a big jump in some direction to try to get to a new section of the search space.
This is a good way of dealing with plateaus.
Apply two or more rules before doing the test. This corresponds to moving in several
directions at once. This is a good strategy for dealing with ridges.
Simulated Annealing:
A variation of hill climbing in which, at the beginning of the process, some downhill moves may
be made.
In simulated annealing, at the beginning of the process, some downhill moves may be made.
The idea is to do enough exploration of the whole space early on so that the final solution is
relatively insensitive to the starting state. By doing so we can lower the chances of getting
caught at a local maximum, plateau, or ridge.
In this method we attempt to minimize rather than maximize the value of the objective function;
thus the process is one of valley descending, in which the objective function is the energy level.
Physical Annealing
• Physical substances are melted and then gradually cooled until some solid state is
reached.
• The goal is to produce a minimal-energy state.
• Annealing schedule: if the temperature is lowered sufficiently slowly, then the goal will be
attained.
• Nevertheless, there is some probability of a transition to a higher energy state:
p = e^(-ΔE/kT), where ΔE is the change in energy, T is the temperature, and k is
Boltzmann's constant.
The probability that a transition to a higher energy state will occur is given by this function.
(i) If the new state is a goal state, then return it and quit.
(ii) If it is not a goal state but is better than the current state, then make it the
current state. Also set BEST-SO-FAR to this new state.
(iii) If it is not better than the current state, then make it the current state with
the probability p' defined above. This step is usually implemented by
invoking a random number generator to produce a number in the range
[0,1]; if that number is less than p', the move is accepted, otherwise
do nothing.
c. Revise T as necessary according to the annealing schedule.
5. Return BEST-SO-FAR as the answer.
Note:
At each step we compare the successor with the current state. If it is better than the current
state, the move is accepted; otherwise the move is accepted only with the probability defined
above, and the search may continue in another direction.
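A minimal sketch of the procedure follows, assuming a geometric annealing schedule and a toy one-dimensional energy function; both are illustrative assumptions, not part of the text. Worse moves are accepted with probability e^(-ΔE/T), matching step (iii) above.

```python
import math
import random

def simulated_annealing(state, energy, neighbour, t0=10.0, cooling=0.95, steps=2000):
    """Minimise energy; worse moves are accepted with probability e^(-dE/T)."""
    random.seed(0)                          # fixed seed so the run is repeatable
    current, best = state, state
    t = t0
    for _ in range(steps):
        nxt = neighbour(current)
        delta = energy(nxt) - energy(current)
        # always accept improvements; accept worse moves with prob. e^(-delta/T)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current = nxt
        if energy(current) < energy(best):
            best = current                  # BEST-SO-FAR bookkeeping
        t *= cooling                        # annealing schedule: geometric cooling
    return best

# toy energy with a local minimum near x = -2 and a global minimum at x = 5
e = lambda x: min((x + 2) ** 2 + 1, (x - 5) ** 2)
best = simulated_annealing(0, e, lambda x: x + random.choice([-1, 1]))
```

Early on, T is high and almost any move is accepted; as T falls, the search settles into pure descent, so BEST-SO-FAR is never worse than the starting state.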
Best-First Search
Best-First Search (BFS) is a way of combining the advantages of both depth-first and
breadth-first search into a single method: follow a single path at a time, but switch paths
whenever some competing path looks more promising than the current one does.
The process is to select the most promising of the nodes we have generated so far. We then
expand the chosen node by using the rules to generate its successors. If one of them is a
solution, we can quit; otherwise we repeat the process until we reach a goal.
In best-first search, one move is selected, but the others are kept around so that they can be
revisited later if the selected path becomes less promising. This is not the case in
steepest-ascent hill climbing.
OR Graphs
A graph of this kind is called an OR graph, since each of its branches represents an alternative
problem-solving path.
To implement such a graph procedure, we will need to use lists of nodes:
1) OPEN: nodes that have been generated and have had the heuristic function applied to
them, but which have not yet been examined. It is a priority queue in which the elements
with the highest priority are those with the most promising value of the heuristic function.
2) CLOSED: nodes that have already been examined. Whenever a new node is generated,
we need to check whether it has been generated before.
3) A heuristic function f which will estimate the merits of each node we generate.
Algorithm:
1. Start with OPEN containing just the initial state
2. Until a goal is found or there are no nodes left on OPEN do:
a. Pick the best node on OPEN
b. Generate its successors
c. For each successor do:
i. If it has not been generated before, evaluate it, add it to OPEN, and record
its parent.
ii. If it has been generated before, change the parent if this new path is
better than the previous one. In that case update the cost of getting to
this node and to any successors that this node may already have.
Trace (each entry is node : parent):
Step 1: A : NIL
Step 2: A : NIL, B : A, C : A, D : A
Step 3: A : NIL, B : A, C : A, D : A, E : D, F : D
Step 4:
Step 5:
The element with the lowest cost is the first element; new states are added to OPEN according
to their cost values.
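The OPEN/CLOSED algorithm above can be sketched with a priority queue. This simplified version omits the re-parenting of step 2.c.ii (nodes are never re-evaluated once queued), and the graph and heuristic values are hypothetical, chosen only to show the best-first expansion order.

```python
import heapq

def best_first_search(start, goal, successors, h):
    """Expand the OPEN node with the best (lowest) heuristic value first."""
    open_list = [(h(start), start)]        # OPEN: priority queue ordered by h
    parent = {start: None}
    closed = set()                          # CLOSED: already-examined nodes
    while open_list:
        _, node = heapq.heappop(open_list)  # pick the best node on OPEN
        if node == goal:
            path = []
            while node is not None:         # recover the path via parent links
                path.append(node)
                node = parent[node]
            return path[::-1]
        closed.add(node)
        for s in successors(node):
            if s not in parent:             # not generated before
                parent[s] = node
                heapq.heappush(open_list, (h(s), s))
    return None

# hypothetical OR graph and heuristic values, for illustration only
graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
         "D": [], "E": ["G"], "F": ["G"], "G": []}
h = {"A": 5, "B": 3, "C": 4, "D": 6, "E": 1, "F": 2, "G": 0}
path = best_first_search("A", "G", lambda n: graph[n], lambda n: h[n])
```

On this graph the expansion order is A, B (h=3), E (h=1), G, so the search never bothers expanding C, D, or F.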
A* Algorithm:
The A* algorithm is a best-first graph search algorithm that finds a least-cost path from a given
initial node to a goal node. It is a specialization of best-first search.
This algorithm uses the evaluation function f', as well as the lists OPEN and CLOSED.
For many applications, it is convenient to define f' as the sum of two components
that we call g and h'.
• g:
– Measures the cost of getting from the initial state to the current node.
– It is not an estimate; it is the exact sum of the costs of the rules applied so far.
• h':
– An estimate of the additional cost of getting from the current node to a goal state.
Algorithm:
1) Start with OPEN containing only the initial node. Set that node's g value to 0, its h' value
to whatever it is, and its f' value to h' + 0, or h'. Set CLOSED to the empty list.
2) Until a goal node is found, repeat the following procedure: If there are no nodes on OPEN,
report failure. Otherwise, pick the node on OPEN with the lowest f' value. Call it BESTNODE.
Remove it from OPEN and place it on CLOSED. If BESTNODE is a goal node, exit and
report a solution. Otherwise, generate the successors of BESTNODE. For each
SUCCESSOR, do the following:
a) Set SUCCESSOR to point back to BESTNODE. These backwards links will make it
possible to recover the path once a solution is found.
b) Compute g(SUCCESSOR) = g(BESTNODE) + the cost of getting from BESTNODE
to SUCCESSOR.
c) If SUCCESSOR already exists on OPEN, call that node OLD, and decide
whether OLD's parent link should be reset to point to BESTNODE (this can happen
in graphs).
If the path through OLD's current parent is cheaper, we need do nothing. If the
path through BESTNODE is cheaper, reset OLD's parent link to point to BESTNODE,
record the new cheaper g(OLD), and update f'(OLD).
d) If SUCCESSOR was not on OPEN, see if it is on CLOSED. If so, call the node on
CLOSED OLD and add OLD to the list of BESTNODE's successors. Check whether
the new path is better, as in step c), and if so update the parent link and the
g and f' values.
To propagate the new cost downward, do a depth-first traversal of the tree
starting at OLD, changing each node's g value (and thus also its f' value),
terminating each branch when you reach either a node with no successors or
a node for which an equivalent or better path has already been found.
e) If SUCCESSOR was not already on either OPEN or CLOSED, then put it on OPEN
and add it to the list of BESTNODE's successors. Compute
f'(SUCCESSOR) = g(SUCCESSOR) + h'(SUCCESSOR).
The A* algorithm is often used to search for the lowest-cost path from a start to a goal location in
a visibility graph or quadtree. The algorithm solves problems like the 8-puzzle and the
Missionaries and Cannibals problem.
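A compact Python sketch of A* under these definitions. The graph, arc costs and h' values are made up for illustration, and stale OPEN entries are skipped lazily instead of being re-parented explicitly as in steps c and d:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A*: f'(n) = g(n) + h'(n). g is the exact cost from the start node;
    h' estimates the remaining cost from n to the goal."""
    g = {start: 0}
    parent = {start: None}
    open_list = [(h(start), start)]            # OPEN, ordered by f' = g + h'
    closed = set()
    while open_list:
        _, node = heapq.heappop(open_list)     # BESTNODE: lowest f' on OPEN
        if node == goal:                       # rebuild the path via parents
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g[goal], path[::-1]
        if node in closed:                     # stale duplicate entry: skip
            continue
        closed.add(node)
        for succ, cost in neighbors(node):
            new_g = g[node] + cost             # g(SUCCESSOR) = g(BESTNODE) + arc cost
            if succ not in g or new_g < g[succ]:   # new or cheaper path: re-parent
                g[succ] = new_g
                parent[succ] = node
                heapq.heappush(open_list, (new_g + h(succ), succ))
    return None                                # OPEN exhausted: report failure

# Hypothetical weighted graph with admissible h' estimates.
edges = {'A': [('B', 1), ('C', 4)], 'B': [('C', 1), ('D', 5)], 'C': [('D', 2)]}
h_vals = {'A': 3, 'B': 2, 'C': 2, 'D': 0}
cost, path = a_star('A', 'D', lambda n: edges.get(n, []), h_vals.get)
print(cost, path)  # 4 ['A', 'B', 'C', 'D']
```

Note the cheaper path to C found through B (g = 2 rather than 4); the re-parenting in the loop plays the role of step c of the algorithm.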
Problem Reduction:
• Planning how best to solve a problem that can be recursively decomposed into
subproblems in multiple ways.
• There can be more than one decomposition of the same problem. We have to decide
which is the best way to decompose the problem so that the total solution or cost of the
solution is good.
• Examples:
o Matrix Multiplication
o Towers of Hanoi
o Blocks World Problem
o Theorem Proving
• Formulations: (AND/OR Graphs)
o An OR node represents a choice between possible decompositions.
o An AND node represents a given decomposition.
The AND-OR graph (or tree) is useful for representing the solution of problems that can be
solved by decomposing them into a set of smaller problems, all of which must then be solved.
This decomposition, or reduction, generates arcs that we call AND arcs.
One AND arc may point to any number of successor nodes, all of which must be solved in
order for the arc to point to a solution. Just as in an OR graph, several arcs may emerge from a
single node, indicating a variety of ways in which the original problem might be solved.
In order to find solutions in an AND-OR graph, we need an algorithm similar to best-first search
but with the ability to handle the AND arcs appropriately.
To see why our Best-First search is not adequate for searching AND-OR graphs, consider Fig
(a).
– The top node A has been expanded, producing 2 arcs, one leading to B and one leading
to C and D. The numbers at each node represent the value of f' at that node.
– We assume for simplicity that every operation has a uniform cost, so each arc with a
single successor has a cost of 1 and each AND arc with multiple successors has a cost
of 1 for each of its components.
– If we look just at the nodes and choose for expansion the one with the lowest f' value, we
must select C. It would be better to explore the path going through B since to use C we
must also use D, for a total cost of 9 (C+D+2) compared to the cost of 6 that we get
through B.
– The choice of which node to expand next must depend not only on the f' value of that
node but also on whether that node is part of the current best path from the initial node.
In order to describe an algorithm for searching an AND-OR graph we need to exploit a value
that we call FUTILITY. If the estimated cost of a solution becomes greater than the value of
FUTILITY, then we abandon the search. FUTILITY should be chosen to correspond to a
threshold such that any solution with a cost above it is too expensive to be practical, even if it could
ever be found.
Algorithm:
1. Initialize the graph to the starting node.
2. Loop until the starting node is labeled SOLVED or until its cost goes above FUTILITY:
a. Traverse the graph, starting at the initial node following the current best path and
accumulate the set of nodes that are on that path and have not yet been
expanded or labeled solved.
b. Pick one of those unexpanded nodes and expand it. If there are no
successors, assign FUTILITY as the value of this node. Otherwise add the
successors to the graph and for each of them compute f’ (use only h’ and ignore g). If
f’ of any node is 0, mark that node as SOLVED.
c. Change the f’ estimate of the newly expanded node to reflect the new information
provided by its successors. Propagate this change backward through the graph.
If any node contains a successor whose descendants are all solved, label the
node itself as SOLVED. At each node that is visible while going up the graph,
decide which of its successors arcs is the most promising and mark it as part of
the current best path. This may cause the current best path to change. The
propagation of revised cost estimates backup the tree was not necessary in the
best-first search algorithm because only unexpanded nodes were examined. But
now expanded nodes must be reexamined so that the best current path can be
selected. Thus it is important that their f’ values be the best estimates available.
At Step 1, A is the only node, so it is at the end of the current best path. It is expanded,
yielding nodes B, C and D. The arc to D is labeled as the most promising one emerging from
A, since it costs 6 compared to B and C, which costs 9.
In Step 2, node D is chosen for expansion. This process produces one new arc, the AND arc
to E and F, with a combined cost estimate of 10. So we update the f' value of D to 10.
We see that the AND arc B-C is better than the arc to D, so it is labeled as the current best
path. At Step 3, we traverse that arc from A and discover the unexpanded nodes B and C. If
we are going to find a solution along this path, we will have to expand both B and C
eventually. So we explore B first.
This generates two new arcs, the ones to G and to H. Propagating their f' values backward,
we update f' of B to 6. This requires updating the cost of the AND arc B-C to 12 (6+4+2). Now
the arc to D is again the better path from A, so we record that as the current best path and
either node E or F will be chosen for the expansion at Step 4.
This process continues until either a solution is found or all paths have led to dead ends,
indicating that there is no solution.
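The arc comparison at a single AND-OR node can be reproduced with a short sketch. The node names come from Fig (a); the individual f' values for C and D (3 and 4) are assumed so that the AND arc totals 9, and the +1-per-component arc cost follows the uniform-cost assumption above:

```python
def best_arc(arcs, f):
    """Pick the most promising outgoing arc of an AND-OR node. Each arc is a
    tuple of successor nodes; with uniform operator costs, an arc costs 1 per
    component plus the f' values of all the successors it points to."""
    costs = {arc: sum(f[n] for n in arc) + len(arc) for arc in arcs}
    best = min(costs, key=costs.get)
    return best, costs[best]

# From A: an OR arc to B (f' = 5) and an AND arc to C and D (assumed 3 and 4).
f_vals = {'B': 5, 'C': 3, 'D': 4}
arc, cost = best_arc([('B',), ('C', 'D')], f_vals)
print(arc, cost)  # ('B',) 6
```

The path through B costs 6, while the AND arc costs 3 + 4 + 2 = 9, which is why B is the better choice even though C alone has the lowest f' value.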
Limitations
1. A longer path may be better. In Fig (a), the nodes were generated. Now suppose that node J is
expanded at the next step and that one of its successors is node E, producing the graph
shown in Fig (b). The new path to E is longer than the previous path to E going through C.
But since the path through C will only lead to a solution if there is also a solution to D, which
there is not, the path through J is better.
While solving any problem, do not re-traverse nodes that are already labeled as solved,
because an implementation may get stuck in a loop.
2. Interacting Sub-goals
Another limitation is that the algorithm fails to take into account any
interaction between sub-goals. Assume in the figure that both node
C and node E ultimately lead to a solution; our algorithm will
report a complete solution that includes both of them. The
AND-OR graph states that for A to be solved, both C and D
must be solved. But the algorithm considers the solution of D as a completely separate
process from the solution of C.
While moving to the goal state, keep track of all the sub-goals and check which
combination gives an optimal cost.
AO* Algorithm:
AO* is a generalized algorithm that will always find a minimum-cost solution. It is
used for solving cyclic AND-OR graphs. AO* uses a single structure GRAPH, representing
the part of the search graph that has been explicitly generated so far. Each node in the graph
points both down to its immediate successors and up to its immediate predecessors. Top-down
traversal of the best-known path guarantees that only nodes that are on the best
path will ever be considered for expansion. So h’ will serve as the estimate of goodness of a
node.
Algorithm (1):
1) Initialize: Set G* = {s}, f(s) = h(s).
If s ∈ T, label s as SOLVED, where T is the set of terminal nodes.
2) Terminate: If s is labeled SOLVED, stop.
3) Select: Select a non-terminal leaf node n from the marked sub-tree.
4) Expand: Make explicit the successors of n.
For each new successor m: Set f(m) = h(m).
If m is terminal, label m as SOLVED.
Means-Ends Analysis:
One general-purpose technique used in AI is means-end analysis, a step-by-step, or
incremental, reduction of the difference between the current state and the final goal. The
program selects actions from a list of means—in the case of a simple robot this might consist of
PICKUP, PUTDOWN, MOVEFORWARD, MOVEBACK, MOVELEFT, and MOVERIGHT—until
the goal is reached. This means we could solve major parts of a problem first and then return to
smaller problems when assembling the final solution.
Usually, we use search strategies that can reason either forward or backward. Often, however, a
mixture of the two directions is appropriate. Such a mixed strategy makes it possible to solve
the major parts of a problem first and then go back and solve the smaller problems that arise
when combining the parts together. This technique is called "Means-Ends Analysis".
This process centers on the detection of differences between the current state and the goal state.
Once a difference has been found, we must find an operator that reduces the difference.
But this operator may not be applicable to the current state. In that case, we set up a sub-
problem of getting to a state in which it can be applied. And if the operator does not produce
the exact goal state we want, we set up a second sub-problem of getting from the state it does
produce to the goal. If the difference was chosen correctly and the operator is effective, the two
sub-problems should be easier to solve than the original problem.
The means-ends analysis process can then be applied recursively to them. In order to focus the
system's attention on the big problems first, the differences can be assigned priority levels, with
higher-priority differences considered before lower-priority ones.
Like the other problem-solving methods, means-ends analysis relies on a set of rules that can
transform one problem state into another. These rules are not represented with complete state
descriptions. Instead, each rule is represented as a left side that describes the conditions that must
be met for the rule to be applicable and a right side that describes those aspects of the problem
state that will be changed by the application of the rule.
Consider the simple HOLD ROBOT DOMAIN. The available operators are as follows:
PUSH has four preconditions, two of which produce differences between the start and goal states.
Since the desk is already large, that precondition creates no difference. The robot can be
brought to the correct location by using WALK, and the surface can be cleared by two uses of
PICKUP. But after one PICKUP, the second results in another difference: the arm must be empty.
PUTDOWN can be used to reduce that difference.
Once PUSH is performed, the problem state is close to the goal state, but not quite. The objects
must be placed back on the desk. PLACE will put them there, but it cannot be applied
immediately: another difference must be eliminated first, since the robot is still holding the
objects. We then find the progress as shown above. The final difference between C and E can be
reduced by using WALK to get the robot back to the objects, followed by PICKUP and CARRY.
Algorithm:
1. Until the goal is reached or no more procedures are available:
– Describe the current state, the goal state and the differences between the two.
– Use the difference to select a procedure that will hopefully get us nearer to the goal.
– Apply the procedure and update the current state.
2. If the goal is reached, then success; otherwise fail.
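A toy sketch of this loop in Python, using a hypothetical robot domain. States are sets of facts, and each operator lists its preconditions, a delete list and an add list (all names are made up). When the chosen operator is not applicable, the sketch recursively sets up the sub-problem of satisfying its preconditions:

```python
def achieve(state, goal, operators, plan):
    """Means-ends analysis sketch: detect the difference between the current
    state and the goal, pick an operator that reduces it, and if the operator
    is not yet applicable, first solve the sub-problem of reaching a state
    where it is (its preconditions)."""
    while not goal <= state:
        diff = goal - state                        # detect the difference
        for name, (pre, dele, add) in operators.items():
            if add & diff:                         # this operator reduces it
                if not pre <= state:               # sub-problem: preconditions
                    state = achieve(state, pre, operators, plan)
                    if state is None:
                        return None
                state = (state - dele) | add       # apply the operator
                plan.append(name)
                break
        else:
            return None                            # no operator reduces the difference
    return state

# Hypothetical operators for a toy robot domain: walk to the box, push it to B.
ops = {
    'WALK-TO-BOX':   ({'at(robot,A)'}, {'at(robot,A)'}, {'at(robot,box)'}),
    'PUSH-BOX-TO-B': ({'at(robot,box)'},
                      {'at(box,A)', 'at(robot,box)'},
                      {'at(box,B)', 'at(robot,B)'}),
}
plan = []
achieve({'at(robot,A)', 'at(box,A)'}, {'at(box,B)', 'at(robot,B)'}, ops, plan)
print(plan)  # ['WALK-TO-BOX', 'PUSH-BOX-TO-B']
```

The sketch has no loop detection or priority levels for differences, so it is only suitable for small solvable toy problems.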
Constraint Satisfaction
• Search procedure operates in a space of constraint sets. Initial state contains the original
constraints given in the problem description.
• A goal state is any state that has been constrained enough – Cryptarithmetic: “enough”
means that each letter has been assigned a unique numeric value.
• Constraint satisfaction is a 2-step process:
o Constraints are discovered and propagated as far as possible.
o If there is still not a solution, then search begins: a guess about something is made and
added as a new constraint.
• To apply the constraint satisfaction in a particular problem domain requires the use of 2
kinds of rules:
o Rules that define valid constraint propagation
o Rules that suggest guesses when necessary
Goal State:
We have to assign a unique digit to each of the letters specified above.
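For illustration, here is a pure generate-and-test sketch of a cryptarithmetic solver in Python. It performs only the "guess" step, with no constraint propagation, and uses ODD + ODD = EVEN as a stand-in puzzle:

```python
from itertools import permutations

def solve_crypt(words, result):
    """Brute-force cryptarithmetic: assign a unique digit to each letter so
    that the words sum to the result (a goal state in the sense above).
    A real solver would propagate constraints before guessing."""
    letters = sorted(set(''.join(words) + result))
    assert len(letters) <= 10                    # one digit per letter
    first = {w[0] for w in words + [result]}     # leading letters can't be 0
    for digits in permutations(range(10), len(letters)):
        env = dict(zip(letters, digits))
        if any(env[l] == 0 for l in first):
            continue
        val = lambda w: int(''.join(str(env[c]) for c in w))
        if sum(val(w) for w in words) == val(result):
            return env                           # constrained enough: a goal state
    return None

env = solve_crypt(['ODD', 'ODD'], 'EVEN')
print(env)  # e.g. {'D': 5, 'E': 1, 'N': 0, 'O': 6, 'V': 3} (655 + 655 = 1310)
```

The generate-and-test loop is exponential in the number of letters, which is exactly why the two-step process above (propagate first, guess only when stuck) matters in practice.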
FREQUENTLY ASKED QUESTIONS
1) Define Intelligence, Artificial Intelligence.
2) List four things to build a system to solve a problem.
3) What is Production System?
4) Explain water Jug problem as a state space search.
5) Explain production system characteristics.
6) Explain A* algorithm with example.
7) What is Means-Ends Analysis? Explain with an example.
8) What do you mean by heuristic?
9) Write a heuristic function for travelling salesman problem.
10) What is heuristic search?
11) Explain problem characteristics.
12) Write AO* algorithm and explain the steps in it.
13) What is constraint satisfaction problem? Explain it.
14) Explain annealing schedule.
15) Explain Breadth-first search and depth-first search. List down the advantages and
disadvantages of both?
16) What do you mean by an AI technique?
17) Discuss the tic-tac-toe problem in detail and explain how it can be solved using AI
techniques.
18) What are the advantages of Heuristic Search?
19) Explain Turing Test as Criteria for success.
20) Explain Hill Climbing and give its disadvantages.
21) Define Control Strategy and requirements for good search strategy.
22) Define State Space Search. Write algorithm for state space.
MODULE-2
5. Predicate Logic
Introduction
Predicate logic is used to represent knowledge. It will be met again in knowledge
representation schemes and reasoning methods. There are other representations, but this form is popular.
Propositional Logic
It is simple to deal with and decision procedure for it exists. We can represent real-world facts as
logical propositions written as well-formed formulas.
To explore the use of predicate logic as a way of representing knowledge by looking at a specific
example.
Represented as propositions, the two statements become totally separate assertions; we would not be able to draw any
conclusions about similarities between Socrates and Plato.
These representations reflect the structure of the knowledge itself. These use predicates applied to
arguments.
Propositional logic fails to capture the relationship between any individual being a man and that individual being a
mortal.
We need variables and quantification unless we are willing to write separate statements.
Predicate:
A predicate is a statement about objects which is either true or false. To solve
common-sense problems by a computer system, we use predicate logic.
Predicate Logic
• Terms represent specific objects in the world and can be constants, variables or functions.
• Predicate Symbols refer to a particular relation among objects.
• Sentences represent facts, and are made of terms, quantifiers and predicate symbols.
• Functions allow us to refer to objects indirectly (via some relationship).
• Quantifiers and variables allow us to refer to a collection of objects without explicitly naming
each object.
• Some Examples
o Predicates: Brother, Sister, Mother , Father
o Objects: Bill, Hillary, Chelsea, Roger
o Facts expressed as atomic sentences a.k.a.
literals:
o Father(Bill,Chelsea)
o Mother(Hillary,Chelsea)
o Brother(Bill,Roger)
Nested Quantification
Functions
• Functions are terms - they refer to a specific object.
• We can use functions to symbolically refer to objects without naming them.
• Examples:
fatherof(x), age(x), times(x,y), succ(x)
• Using functions
If we use logical statements as a way of representing knowledge, then we have available a good
way of reasoning with that knowledge.
Representing facts with Predicate Logic
1) Marcus was a man: man(Marcus)
2) Marcus was a Pompeian: pompeian(Marcus)
Resolution:
Resolution is a procedure to prove a statement. It attempts to show that the negation of the
statement produces a contradiction with the known statements. It simplifies the proof
procedure by first converting the statements into a canonical form. It is a simple iterative
process: at each step, two clauses called the parent clauses are compared, yielding a new
clause that has been inferred from them.
Resolution refutation:
• Convert all sentences to CNF (conjunctive normal
form)
• Negate the desired conclusion (converted to CNF)
• Apply the resolution rule until either:
– Derive false (a contradiction)
– Can’t apply any more
Resolution refutation is sound and complete
• If we derive a contradiction, then the conclusion follows from the axioms
• If we can’t apply any more, then the conclusion cannot be proved from the axioms.
Sometimes from the collection of the statements we have, we want to know the answer of this
question - "Is it possible to prove some other statements from what we actually know?" In order
to prove this we need to make some inferences and those other statements can be shown true
using Refutation proof method i.e. proof by contradiction using Resolution. So for the asked goal
we will negate the goal and will add it to the given statements to prove the contradiction.
So resolution refutation for propositional logic is a complete proof procedure. So if the thing that
you're trying to prove is, in fact, entailed by the things that you've assumed, then you can prove it
using resolution refutation.
Clauses:
Resolution can be applied to certain class of wff called clauses.
A clause is defined as a wff consisting of disjunction of literals.
All of the following formulas in the variables A, B, C, D, and E are in conjunctive normal form:
Clause Form:
Algorithm:
1. Eliminate the implication (→) using the equivalence a → b ≡ ¬a ∨ b.
2. Reduce the scope of each ¬ to a single term, using De Morgan's laws and the quantifier
equivalences ¬∀x: P(x) ≡ ∃x: ¬P(x) and ¬∃x: P(x) ≡ ∀x: ¬P(x).
3. Standardize variables so that each quantifier binds a unique variable.
4. Move all quantifiers to the left of the formula without changing their relative order.
5. Eliminate existential quantifiers (Skolemization). We can eliminate the quantifier by substituting
for the variable a reference to a function that produces the desired value. For example,
∃y: President(y) is converted to President(S1), where S1 is a brand-new constant symbol (a
Skolem constant) that is not used in any other sentence. More generally, if the existential
quantifier is within the scope of universally quantified variables, we introduce a Skolem
function that depends on those variables: ∀x ∃y P(x,y) is converted to ∀x P(x, f(x)). The Skolem
function f must be a brand-new function name that does not occur in any other part of the
sentence. In general the function must have the same number of arguments as the number of
universal quantifiers in the current scope.
6. Drop the prefix. At this point, all remaining variables are universally quantified.
7. Convert the matrix into a conjunction of disjuncts by distributing ∨ over ∧.
8. Create a separate clause corresponding to each conjunct. In order for a well-formed formula to
be true, all the clauses that are generated from it must be true.
9. Standardize apart the variables in the set of clauses generated in step 8, i.e. rename the
variables so that no two clauses make reference to the same variable.
Basis of Resolution:
Resolution process is applied to pair of parent clauses to produce a derived clause. Resolution
procedure operates by taking 2 clauses that each contain the same literal. The literal must occur
in the positive form in one clause and negative form in the other. The resolvent is obtained by
combining all of the literals of the two parent clauses except the ones that cancel. If the clause that is
produced is the empty clause, then a contradiction has been found.
Eg: winter and ¬winter will resolve to produce the empty clause.
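The resolution step and the refutation loop can be sketched for propositional clauses. Clauses are frozensets of literals, with '~' marking negation, and the knowledge base below is a toy example:

```python
def resolve(c1, c2):
    """Resolve two parent clauses. Returns the resolvents obtained by
    cancelling one complementary pair of literals at a time."""
    out = []
    for lit in c1:
        comp = lit[1:] if lit.startswith('~') else '~' + lit
        if comp in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return out

def refute(clauses, negated_goal):
    """Resolution refutation: add the negated goal and resolve until the
    empty clause (a contradiction) appears or nothing new can be derived."""
    kb = set(clauses) | {negated_goal}
    while True:
        new = set()
        for a in kb:
            for b in kb:
                if a == b:
                    continue
                for r in resolve(a, b):
                    if not r:                 # empty clause: contradiction found
                        return True
                    new.add(r)
        if new <= kb:                         # nothing new: goal not provable
            return False
        kb |= new

# Toy KB: P, and P -> Q written in clause form as ~P v Q. Goal: Q, so add ~Q.
kb = [frozenset({'P'}), frozenset({'~P', 'Q'})]
print(refute(kb, frozenset({'~Q'})))  # True
```

The winter example above is the smallest case: resolving {winter} with {~winter} immediately yields the empty clause.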
Unification Algorithm
• In propositional logic it is easy to determine that two literals cannot both be true at the same
time.
• Simply look for L and ~L . In predicate logic, this matching process is more complicated, since
bindings of variables must be considered.
• In order to determine contradictions we need a matching procedure that compares two literals
and discovers whether there exist a set of substitutions that makes them identical.
• There is a recursive procedure that does this matching. It is called Unification algorithm.
• The process of finding a substitution for predicate parameters is called unification.
• We need to know:
– whether 2 literals can be matched.
– what the substitution is that makes the literals identical.
• There is a simple algorithm called the unification algorithm that does this.
• When attempting to match 2 literals, all substitutions must be made to the entire literal.
• There may be many substitutions that unify 2 literals; the most general unifier is always
desired.
Unification Example:
The object of the Unification procedure is to discover at least one substitution that causes two
literals to match. Usually, if there is one such substitution, there are many.
In the unification algorithm, each literal is represented as a list, where the first element is the
name of the predicate and the remaining elements are arguments. An argument may be a single
element (atom) or may be another list.
The unification algorithm recursively matches pairs of elements, one pair at a time. The matching
rules are:
• Different constants, functions or predicates cannot match, whereas identical ones can.
• A variable can match another variable, any constant, or a function or predicate
expression, subject to the condition that the function or predicate expression must not
contain any instance of the variable being matched (otherwise it will lead to infinite
recursion).
• The substitution must be consistent. Substituting y for x now and then z for x later is
inconsistent. (a substitution y for x written as y/x)
Example:
Suppose we want to unify p(X,Y,Y) with p(a,Z,b).
Initially E is {p(X,Y,Y)=p(a,Z,b)}.
The first time through the while loop, E becomes {X=a,Y=Z,Y=b}.
Suppose X=a is selected next.
Then S becomes{X/a} and E becomes {Y=Z,Y=b}.
Suppose Y=Z is selected.
Then Y is replaced by Z in S and E.
S becomes{X/a,Y/Z} and E becomes {Z=b}.
Finally Z=b is selected, Z is replaced by b, S becomes {X/a,Y/b,Z/b}, and
E becomes empty.
The substitution {X/a,Y/b,Z/b} is returned as an MGU.
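The matching rules above can be sketched in Python. As in the example, variables are strings starting with a capital letter; predicate and function applications are tuples:

```python
def is_var(t):
    """Variables are capitalized strings, as in the example above."""
    return isinstance(t, str) and t[0].isupper()

def subst(t, s):
    """Apply a substitution s to a term t."""
    if is_var(t) and t in s:
        return s[t]
    if isinstance(t, tuple):
        return tuple(subst(a, s) for a in t)
    return t

def occurs(v, t, s):
    """Occur check: does variable v occur in term t (under s)?"""
    if t == v:
        return True
    if is_var(t) and t in s:
        return occurs(v, s[t], s)
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t)

def bind(v, t, s):
    """Extend substitution s with v/t, keeping it consistent."""
    if v in s:
        return unify(s[v], t, s)
    if occurs(v, t, s):              # avoid X = f(X): infinite recursion
        return None
    return {**{k: subst(val, {v: t}) for k, val in s.items()}, v: t}

def unify(x, y, s=None):
    """Return a most general unifier of x and y, or None if they can't match."""
    if s is None:
        s = {}
    if x == y:
        return s                     # identical terms match trivially
    if is_var(x):
        return bind(x, y, s)
    if is_var(y):
        return bind(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):       # match pairs of elements, one at a time
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None                      # different constants cannot match

# The example above: unify p(X,Y,Y) with p(a,Z,b).
mgu = unify(('p', 'X', 'Y', 'Y'), ('p', 'a', 'Z', 'b'))
print(mgu)  # {'X': 'a', 'Y': 'b', 'Z': 'b'}
```

The returned substitution matches the MGU {X/a, Y/b, Z/b} derived step by step in the example.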
Example:
Consider the following statements:
1. John likes all kinds of food.
2. Apples are food.
3. Chicken is food.
4. Anything anyone eats and is not killed by is food.
5. Bill eats peanuts and is still alive.
6. Sue eats everything Bill eats.
(a) Convert all the above statements into predicate logic.
(b) Show that John likes peanuts using backward chaining.
(c) Convert the statements into clause form.
(d) Using resolution, show that “John likes peanuts”.
Answer:
(a) Predicate Logic:
Answering Questions
We can also use the proof procedure to answer questions such as “Who tried to assassinate
Caesar?” by proving:
– Tryassassinate(y, Caesar).
– Once the proof is complete, we need to find out what substitution was made for
y.
We show how resolution can be used to answer fill-in-the-blank questions, such as "When did
Marcus die?" or "Who tried to assassinate a ruler?” Answering these questions involves finding a
known statement that matches the terms given in the question and then responding with another
piece of the same statement that fills the slot demanded by the question.
From Clause Form to Horn Clauses
This operation converts clause form to Horn clauses; it is not always possible.
Horn clauses are clauses in normal form that have one or zero positive literals. The conversion
from a clause in normal form with one or zero positive literals to a Horn clause is done by using
the implication property (a → b ≡ ¬a ∨ b).
Introduction:
Knowledge plays an important role in AI systems. The kinds of knowledge might need to be
represented in AI systems:
Objects: Facts about objects in our world domain. e.g. Guitars have strings, trumpets are brass
instruments.
Events: Actions that occur in our world. e.g. Steve Vai played the guitar in Frank Zappa's
Band.
Performance: A behavior like playing the guitar involves knowledge about how to do things.
Meta-knowledge: Knowledge about what we know. e.g. Bobrow's robot that plans a trip: it
knows that it can read street signs along the way to find out where it is.
A variety of ways of representing knowledge have been exploited in AI problems. In this regard we
deal with two different kinds of entities:
Facts: truths about the real world; these are the things we want to represent.
Representations of the facts in some chosen formalism; these are the things we will
actually be able to manipulate.
The model in the above figure focuses on facts, representations and on the 2-way mappings that
must exist between them. These links are called Representation Mappings.
- Forward Representation mappings maps from Facts to Representations.
- Backward Representation mappings maps from Representations to Facts.
English or natural language is an obvious way of representing and handling facts. Regardless of
the representation for facts we use in a program, we also need to be concerned with an English
representation of those facts in order to facilitate getting information into and out of the system.
Then, using the deductive mechanisms of logic, we may generate the new
representation object hastail(Spot).
Using an appropriate backward mapping function, the English sentence “Spot has a
tail” can be generated.
Fact-representation mappings may not be one-to-one; rather they are many-to-many, which is a
characteristic of English representations. A good representation can make a reasoning program
simple.
Example:
“All dogs have tails”
“Every dog has a tail”
From the two statements we can conclude that “Each dog has a tail.” But from
statement 1 we might also conclude that “Each dog has more than one tail.”
When we try to convert English sentences into some other representation, such as logical propositions, we
first decode what facts the sentences represent and then convert those facts into the new
representations. When an AI program manipulates the internal representation of facts these new
representations should also be interpretable as new representations of facts.
The first representation does not directly suggest the answer to the problem. The second may
suggest. The third representation does, when combined with the single additional facts that each
domino must cover exactly one white square and one black square.
The puzzle is impossible to complete. A domino placed on the chessboard will always cover one
white square and one black square. Therefore a collection of dominoes placed on the board will
cover an equal numbers of squares of each color. If the two white corners are removed from the
board then 30 white squares and 32 black squares remain to be covered by dominoes, so this is
impossible. If the two black corners are removed instead, then 32 white squares and 30 black
squares remain, so it is again impossible.
In the above figure, the dotted line across the top represents the abstract reasoning process that
a program is intended to model. The solid line across the bottom represents the concrete
reasoning process that a particular program performs. This program successfully models the
abstract process to the extent that, when the backward representation mapping is applied to the
program’s output, the appropriate final facts are actually generated.
If no good mapping can be defined for a problem, then no matter how good the program to solve
the problem is, it will not be able to produce answers that correspond to real answers to the
problem.
Using Knowledge
Let us consider how knowledge may be used and in what applications.
Learning: acquiring knowledge. This is more than simply adding new facts to a knowledge
base. New data may have to be classified prior to storage for easy retrieval, etc.. Interaction
and inference with existing facts to avoid redundancy and replication in the knowledge and
also so that facts can be updated.
Retrieval: The representation scheme used can have a critical effect on the efficiency of the
method. Humans are very good at retrieval; many AI methods have tried to model it.
Reasoning: Infer facts from existing data.
However, a question like “Can Miles Davis play his instrument well?” requires reasoning. The
above are all related. For example, it is fairly obvious that learning and reasoning involve
retrieval etc.
• Inferential Efficiency: the ability to incorporate into the knowledge structure additional
information that can be used to focus the attention of the inference mechanisms in
the most promising directions.
• Acquisitional Efficiency: the ability to acquire new information easily. The simplest
case involves direct insertion, by a person of new knowledge into the database.
Ideally, the program itself would be able to control knowledge acquisition.
No single system that optimizes all of the capabilities for all kinds of knowledge has yet been found.
As a result, multiple techniques for knowledge representation exist.
Declarative Knowledge
– a statement in which knowledge is specified, but the use to which that knowledge is to
be put is not given.
– Example: laws, people’s name; there are facts which can stand alone, not dependent
on other knowledge
Procedural Knowledge
– a representation in which the control information, to use the knowledge is embedded in
the knowledge itself.
– Example: computer programs, directions and recipes; these indicate specific use or
implementation
Given the facts it is not possible to answer simple question such as "Who is the heaviest
player?" but if a procedure for finding heaviest player is provided, then these facts will enable
that procedure to compute an answer. We can ask things like who "bats - left" and "throws -
right".
Inheritable Knowledge
Here the knowledge elements inherit attributes from their parents. The knowledge is embodied in
the design hierarchies found in the functional, physical and process domains. Within the
hierarchy, elements inherit attributes from their parents, but in many cases not all attributes of
the parent elements need be prescribed to the child elements.
Inheritance is a powerful form of inference, but it is not adequate on its own; the basic knowledge
representation needs to be augmented with an inference mechanism.
Property inheritance: The objects or elements of specific classes inherit attributes and values from
more general classes. The classes are organized in a generalized hierarchy.
Baseball Knowledge
- isa: show class inclusion
- instance: show class membership
• The directed arrows represent attributes (isa, instance, team) originates at object being described
and terminates at object or its value.
• The box nodes represent objects and values of the attributes.
Isa: Adult-Male
Bats: EQUAL handed
Height: 6-1
Batting-average: 0.252
This algorithm is simple. It describes the basic mechanism of inheritance. It does not say what to do
if there is more than one value of the instance or “isa” attribute.
This can be applied to the example of knowledge base, to derive answers to the following queries:
team (Pee-Wee-Reese) = Brooklyn-Dodger
batting-average (Three-Finger-Brown) = 0.106
height (Pee-Wee-Reese) = 6.1
bats (Three-Finger-Brown) = right
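A sketch of this inheritance lookup in Python. The knowledge-base fragment below is hypothetical and only loosely follows the baseball example (the 0.262 value stored on Fielder is assumed for illustration):

```python
def get_value(obj, attr, kb):
    """Property inheritance: look for attr on the object itself, then follow
    `instance` and `isa` links up the class hierarchy until it is found."""
    node = obj
    while node is not None:
        if attr in kb[node]:
            return kb[node][attr]
        # climb one level: prefer the instance link, else the isa link
        node = kb[node].get('instance') or kb[node].get('isa')
    return None

# A hypothetical fragment of the baseball knowledge base.
kb = {
    'Person':          {'isa': None},
    'Adult-Male':      {'isa': 'Person', 'height': '6-1'},
    'Baseball-Player': {'isa': 'Adult-Male', 'batting-average': 0.252},
    'Fielder':         {'isa': 'Baseball-Player', 'batting-average': 0.262},
    'Pee-Wee-Reese':   {'instance': 'Fielder', 'team': 'Brooklyn-Dodgers'},
}
print(get_value('Pee-Wee-Reese', 'team', kb))             # Brooklyn-Dodgers
print(get_value('Pee-Wee-Reese', 'batting-average', kb))  # 0.262
print(get_value('Pee-Wee-Reese', 'height', kb))           # 6-1
```

Note how the 0.262 on Fielder shadows the 0.252 on Baseball-Player: the most specific value along the path wins, which is exactly the ambiguity the algorithm in the text leaves open when more than one "isa" or "instance" value exists.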
Inferential Knowledge:
This knowledge generates new information from the given information. The new information
does not require further data gathering from a source, but does require analysis of the given
information to generate new knowledge. Here we represent knowledge as formal logic.
Example:
- given a set of relations and values, one may infer other values or relations
- a predicate logic (a mathematical deduction) is used to infer from a set of attributes.
- inference through predicate logic uses a set of logical operations to relate individual data.
- the symbols used for the logic operations are: ∧ (and), ∨ (or), ¬ (not), → (implies), ∀ (for all) and ∃ (there exists)
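As a minimal sketch of inferential knowledge, a deduction rule can generate new facts from the given ones; the relation and names below are hypothetical illustrations, not from the text.

```python
# Sketch: derive grandparent(x, z) from parent(x, y) ^ parent(y, z).
parents = {("Tom", "Bob"), ("Bob", "Ann")}   # parent(x, y) facts

def grandparents(parent_pairs):
    """Apply the rule parent(x, y) ^ parent(y, z) -> grandparent(x, z)."""
    return {(x, z) for (x, y1) in parent_pairs
                   for (y2, z) in parent_pairs if y1 == y2}

print(grandparents(parents))   # {('Tom', 'Ann')}
```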
MODULE-2
Procedural Knowledge
Procedural knowledge can be represented in programs in many ways. The most common way is
simply as code for doing something. The machine uses the knowledge when it executes the code to
perform a task. Procedural knowledge is thus knowledge encoded in some procedure.
Unfortunately, this way of representing procedural knowledge gets low scores with respect to the
properties of inferential adequacy (because it is very difficult to write a program that can reason
about another program’s behavior) and acquisitional efficiency (because the process of updating
and debugging large pieces of code becomes unwieldy).
The most commonly used technique for representing procedural knowledge in AI programs is the
use of production rules.
Production rules, particularly ones that are augmented with information on how they are to be
used, are more procedural than are the other representation methods. But making a clean
distinction between declarative and procedural knowledge is difficult. The important difference is
in how the knowledge is used by the procedures that manipulate it.
The attributes are called a variety of things in AI systems, but the names do not matter. What
does matter is that they represent class membership and class inclusion and that class inclusion is
transitive. The predicates are used in Logic Based Systems.
The second way can be realized using semantic nets and frame-based systems. Inverses of
attributes are used in knowledge acquisition tools.
This also provides information about constraints on the values that the attribute can have and
mechanisms for computing those values.
There are several approaches to handling the values of single-valued attributes:
1. Introduce an explicit notation for the temporal interval. If two different values are ever asserted
for the same temporal interval, signal a contradiction automatically.
2. Assume that the only temporal interval that is of interest is now. So if a new value is asserted,
replace the old value.
3. Provide no explicit support. Logic-based systems are in this category. But in these systems,
knowledge base builders can add axioms that state that if an attribute has one value then it is
known not to have all other values.
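The second approach (only "now" is of interest) can be sketched in a few lines; the attribute and values are placeholders:

```python
# Sketch of the "only 'now' matters" approach to single-valued attributes:
# asserting a new value simply replaces the old one.
state = {}

def assert_value(attribute, value):
    state[attribute] = value   # old value for this attribute is discarded

assert_value("team", "Brooklyn-Dodger")
assert_value("team", "New-York-Giants")   # new assertion replaces the old
print(state["team"])   # New-York-Giants
```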
Choosing the Granularity of Representation
Primitives are fundamental concepts such as holding, seeing and playing. Since English is a very
rich language, with over half a million words, it is clearly difficult to decide which words to choose
as our primitives in a given situation. Different levels of understanding require different levels of
primitives, and these need many rules to link together similar primitives.
A declarative representation is one in which the knowledge is specified, but the use to which that
knowledge is to be put is not given.
• Declarative knowledge answers the question 'What do you know?'
• It is your understanding of things, ideas, or concepts.
• In other words, declarative knowledge can be thought of as the who, what, when, and
where of information.
• Declarative knowledge is normally discussed using nouns, like the names of people,
places, or things or dates that events occurred.
A procedural representation is one in which the control information necessary to use the
knowledge is considered to be embedded in the knowledge itself.
• Procedural knowledge answers the question 'What can you do?'
• While declarative knowledge is demonstrated using nouns,
• Procedural knowledge relies on action words, or verbs.
• It is a person's ability to carry out actions to complete a task.
The real difference between the declarative and procedural views of knowledge lies in where the
control information resides.
Example:
In both cases, the control strategy must cause motion and be systematic. The production
system model of the search process provides an easy way of viewing forward and backward
reasoning as symmetric processes.
Consider the problem of solving a particular instance of the 8-puzzle problem. The rules to be
used for solving the puzzle can be written as:
Generate the next level of the tree by finding all the rules whose right sides match the root
node. These are all the rules that, if only we could apply them, would generate the state we
want. Use the left sides of the rules to generate the nodes at this second level of the tree.
Generate the next level of the tree by taking each node at the previous level and finding all
the rules whose right sides match it. Then use the corresponding left sides to generate the
new nodes.
Continue until a node that matches the initial state is generated.
This method of reasoning backward from the desired final state is often called goal-directed
reasoning.
To reason forward, the left sides (preconditions) are matched against the current state and the
right sides (results) are used to generate new nodes until the goal is reached. To reason
backward, the right sides are matched against the current node and the left sides are used to
generate new nodes representing new goal states to be achieved.
Whether it is possible to use the same rules for both forward and backward reasoning also
depends on the form of the rules themselves. If both left sides and right sides contain pure
assertions, then forward chaining can match assertions on the left side of a rule and add to the
state description the assertions on the right side. But if arbitrary procedures are allowed as the
right sides of rules then the rules will not be reversible.
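When both sides are pure assertions, the forward-chaining loop can be sketched as below; the rules are hypothetical examples, not from the text.

```python
# Sketch of forward chaining with rules whose left and right sides are
# pure assertions; such rules could also be run backward.
rules = [
    ({"has_feathers"}, {"is_bird"}),
    ({"is_bird"}, {"can_fly"}),
]

def forward_chain(facts, rules):
    """Match left sides against the current state and add right sides
    until no rule produces anything new."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs <= facts and not rhs <= facts:
                facts |= rhs
                changed = True
    return facts

print(sorted(forward_chain({"has_feathers"}, rules)))
# ['can_fly', 'has_feathers', 'is_bird']
```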
Logic Programming
Logic Programming is a programming language paradigm in which logical assertions are
viewed as programs.
There are several logic programming systems in use today, the most popular of which is
PROLOG.
A PROLOG program is described as a series of logical assertions, each of which is a Horn
clause.
A Horn clause is a clause that has at most one positive literal. Thus p, ¬p ∨ q, and p → q are all
Horn clauses.
Example:
The first two of these differences arise naturally from the fact that PROLOG programs are actually
sets of Horn Clauses that have been transformed as follows:
1. If the Horn clause contains no negative literals (i.e., it contains a single literal, which is
positive), then leave it as it is.
2. Otherwise, rewrite the Horn clause as an implication, combining all of the negative literals
into the antecedent of the implication and leaving the single positive literal (if there is one)
as the consequent.
This procedure causes a clause, which originally consisted of a disjunction of literals (all but one
of which were negative), to be transformed into a single implication whose antecedent is a
conjunction of (what are now positive) literals.
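The two transformation steps can be sketched directly; here a clause is a list of literal strings with negation written as a leading "~", a representation chosen purely for illustration.

```python
# Sketch of the Horn clause -> implication transformation described above.
def to_implication(clause):
    positives = [l for l in clause if not l.startswith("~")]
    negatives = [l[1:] for l in clause if l.startswith("~")]
    if not negatives:                    # step 1: a lone positive literal
        return positives[0]
    antecedent = " ^ ".join(negatives)   # step 2: negatives -> antecedent
    consequent = positives[0] if positives else "FALSE"
    return antecedent + " -> " + consequent

print(to_implication(["p"]))              # p
print(to_implication(["~p", "~q", "r"]))  # p ^ q -> r
```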
Matching
We described the process of using search to solve problems as the application of appropriate
rules to individual problem states to generate new states to which the rules can then be applied
and so forth until a solution is found.
How do we extract, from the entire collection of rules, those that can be applied at a given point? To
do so requires some kind of matching between the current state and the preconditions of the
rules. How should this be done? The answer to this question can be critical to the success of a
rule-based system.
A more complex matching is required when the preconditions of a rule specify required properties
that are not stated explicitly in the description of the current state. In this case, a separate set of
rules must be used to describe how some properties can be inferred from others. An even more
complex matching process is required if rules should be applied when their preconditions only
approximately match the current situation. This is often the case in situations involving physical
descriptions of the world.
Indexing
One way to select applicable rules is to do a simple search through all the rules, comparing each
one's preconditions to the current state and extracting all the ones that match. There are two
problems with this simple solution:
i. A large number of rules will be necessary, and scanning through all of them at every step
would be inefficient.
ii. It is not always obvious whether a rule's preconditions are satisfied by a particular state.
Solution: Instead of searching through the rules, use the current state as an index into the rules and
select the matching ones immediately.
The matching process then becomes easy, but at the price of a complete lack of generality in the
statement of the rules. Despite some limitations of this approach, indexing in some form is very
important for the efficient operation of rule-based systems.
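Indexing can be sketched as a dictionary keyed on a feature of the current state; the 8-puzzle-flavoured keys and actions below are made up for illustration.

```python
# Sketch: index rules by a precondition key so that retrieval is a
# lookup on the current state instead of a scan over all rules.
from collections import defaultdict

rules = [
    ("blank-in-corner", "slide-adjacent-tile"),
    ("blank-in-edge",   "slide-one-of-three"),
    ("blank-in-center", "slide-one-of-four"),
]

index = defaultdict(list)
for precondition, action in rules:
    index[precondition].append(action)

# Matching is now a single dictionary lookup:
print(index["blank-in-corner"])   # ['slide-adjacent-tile']
```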
Backward-chaining systems usually use depth-first backtracking to select individual rules, but
forward-chaining systems generally employ sophisticated conflict resolution strategies to choose
among the applicable rules.
While it is possible to apply unification repeatedly over the cross product of preconditions and state
description elements, it is more efficient to consider the many-many match problem, in which many
rules are matched against many elements in the state description simultaneously. One efficient
many-many match algorithm is RETE.
INFERENCE ENGINE
The above cycle is repeated until no rules are put in the conflict set or until a stopping condition is
reached. Verifying the many rule conditions on each cycle is time consuming. To eliminate the
need to perform thousands of matches per cycle, an efficient matching algorithm called RETE is used.
The RETE algorithm is a many-many match algorithm (in which many rules are matched against many
elements). RETE is used in forward-chaining systems, which generally employ sophisticated conflict
resolution strategies to choose among applicable rules. RETE gains efficiency from 3 major
sources.
1. Temporal nature of data: RETE maintains a network of rule conditions and uses
changes in the state description to determine which new rules might apply. Full
matching is only pursued for candidates that could be affected by incoming or
outgoing data.
2. Structural similarity in rules: RETE stores the rules so that they share structure
in memory; sets of conditions that appear in several rules are matched once per
cycle.
3. Persistence of variable binding consistency: while all the individual preconditions
of a rule might be met, there may be variable binding conflicts that prevent the
rule from firing. By remembering the variable bindings already computed, this
recomputation can be minimized; RETE remembers its previous calculations and is able to merge
new binding information efficiently.
Approximate Matching:
Rules should be applied if their preconditions approximately match the current situation.
Example: a speech understanding program.
Rules: map a description of a physical waveform to phones.
Physical signal: varies because of differences in the way individuals speak and because of background noise.
Conflict Resolution:
When several rules match at once, we have a conflict; the process of choosing among the matching
rules is called conflict resolution. There are 3 approaches to the problem of conflict resolution in a
production system.
1. Preference based on rule match:
a. Physical order of rules in which they are presented to the system
b. Priority is given to rules in the order in which they appear
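Preference by physical order can be sketched as follows; the rule contents are hypothetical.

```python
# Sketch of conflict resolution by rule order: when several rules match,
# fire the one that appears first in the rule list.
rules = [
    ({"fever", "spots"}, "suspect-measles"),   # more specific, listed first
    ({"fever"},          "suspect-flu"),
]

def select_rule(facts, rules):
    """Return the action of the first rule (in presentation order)
    whose preconditions are all satisfied by the current facts."""
    for lhs, action in rules:
        if lhs <= facts:
            return action
    return None

print(select_rule({"fever", "spots"}, rules))   # suspect-measles
print(select_rule({"fever"}, rules))            # suspect-flu
```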
We have described techniques for reasoning with a complete, consistent and unchanging model
of the world. But in many problem domains, it is not possible to create such models. So here we
are going to explore techniques for solving problems with incomplete and uncertain models.
What is reasoning?
When we require any knowledge system to do something it has not been explicitly told
how to do, it must reason.
The system must figure out what it needs to know from what it already knows.
◦ Reasoning is the act of deriving a conclusion from certain premises using a given
methodology. (Process of thinking/ Drawing inference)
Uncertain Reasoning
Unfortunately the world is an uncertain place.
Any AI system that seeks to model and reasoning in such a world must be able to deal
with this.
In particular it must be able to deal with:
◦ Incompleteness – compensate for lack of knowledge.
◦ Inconsistencies – resolve ambiguities and contradictions.
◦ Change – it must be able to update its world knowledge base over time.
Clearly, in order to deal with this, some decisions that are made are more likely to be true (or
false) than others, and we must introduce methods that can cope with this uncertainty.
Monotonic Reasoning
Predicate logic and the inferences we perform on it are an example of monotonic reasoning. In
monotonic reasoning, if we enlarge the set of axioms we cannot retract any existing assertions or
axioms.
A monotonic logic cannot handle
Reasoning by default
◦ Because consequences may be derived only because of lack of evidence of the
contrary
Abductive Reasoning
◦ Because consequences are only deduced as most likely explanations.
Belief Revision
◦ Because new knowledge may contradict old beliefs.
Non-Monotonic Reasoning
Non monotonic reasoning is one in which the axioms and/or the rules of inference are
extended to make it possible to reason with incomplete information. These systems
preserve, however, the property that, at any given moment, a statement is either
believed to be true,
believed to be false, or
not believed to be either.
Statistical Reasoning: in which the representation is extended to allow some kind of
numeric measure of certainty (rather than true or false) to be associated with each
statement.
In a system doing non-monotonic reasoning the set of conclusions may either grow or
shrink when new information is obtained.
Non-monotonic logics are used to formalize plausible reasoning, such as the following
inference step:
Birds typically fly.
Tweety is a bird.
--------------------------
Tweety (presumably) flies.
Such reasoning is characteristic of commonsense reasoning, where default rules are
applied when case-specific information is not available. The conclusion of non-monotonic
argument may turn out to be wrong. For example, if Tweety is a penguin, it is
incorrect to conclude that Tweety flies.
Non-monotonic reasoning often requires jumping to a conclusion and subsequently
retracting that conclusion as further information becomes available.
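The Tweety inference above can be sketched as code; note how the set of conclusions shrinks when new information arrives.

```python
# Sketch of non-monotonic default reasoning: "birds typically fly"
# is applied unless case-specific information blocks it.
def conclusions(facts):
    known = set(facts)
    # default rule: a bird flies unless it is known to be a penguin
    if "bird(Tweety)" in known and "penguin(Tweety)" not in known:
        known.add("flies(Tweety)")
    return known

before = conclusions({"bird(Tweety)"})
after = conclusions({"bird(Tweety)", "penguin(Tweety)"})
print("flies(Tweety)" in before)   # True
print("flies(Tweety)" in after)    # False: the conclusion was retracted
```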
All systems of non-monotonic reasoning are concerned with the issue of consistency.
Inconsistency is resolved by removing the relevant conclusion(s) derived previously by
default rules.
Simply speaking, the truth value of propositions in a nonmonotonic logic can be classified
into the following types:
o facts that are definitely true, such as "Tweety is a bird"
o default rules that are normally true, such as "Birds fly"
o tentative conclusions that are presumably true, such as "Tweety flies"
When an inconsistency is recognized, only the truth value of the last type is changed.
Properties of FOPL
It is complete with respect to the domain of interest.
It is consistent.
The only way it can change is that new facts can be added as they become available.
◦ If these new facts are consistent with all the other facts that have already have been
asserted, then nothing will ever be retracted from the set of facts that are known
to be true.
◦ This is known as “monotonicity”.
If any of these properties is not satisfied, conventional logic based reasoning systems become
inadequate.
Non-monotonic reasoning systems are designed to be able to solve problems in which all of
these properties may be missing.
Issues to be addressed:
How can the knowledge base be extended to allow inferences to be made on the basis of
lack of knowledge as well as on the presence of it?
o We need to make clear the distinction between
It is known that P.
It is not known whether P.
o First-order predicate logic allows reasoning to be based on the first of these.
o In our new system, we call any inference that depends on the lack of some piece of
knowledge a non-monotonic inference.
o Traditional systems based on predicate logic are monotonic. Here the number of
statements known to be true increases with time.
o New statements are added and new theorems are proved, but the previously
known statements never become invalid.
How can the knowledge base be updated properly when a new fact is added to the
system(or when the old one is removed)?
o In Non-Monotonic systems, since addition of a fact can cause previously
discovered proofs to become invalid,
how can those proofs, and all the conclusions that depend on them be
found?
Solution: keep track of proofs, which are often called justifications.
o Such a recording mechanism also makes it possible to support
monotonic reasoning in the case where axioms must occasionally be
retracted to reflect changes in the world that is being modeled.
How can knowledge be used to help resolve conflicts when there are several inconsistent
non-monotonic inferences that could be drawn?
o It turns out that when inferences can be based
on the lack of knowledge as well as on its presence,
contradictions are much more likely to occur than they were in conventional
logical systems.
Default Reasoning
Non monotonic reasoning is based on default reasoning or “most probabilistic choice”.
◦ S is assumed to be true as long as there is no evidence to the contrary.
Default reasoning ( or most probabilistic choice) is defined as follows:
◦ Definition 1 : If X is not known, then conclude Y.
◦ Definition 2 : If X can not be proved, then conclude Y.
◦ Definition 3: If X can not be proved in some allocated amount of time then
conclude Y.
Default Reasoning
This is a very common form of non-monotonic reasoning.
Here we want to draw conclusions based on what is most likely to be true.
Two Approaches to do this
◦ Non-Monotonic Logic
◦ Default Logic
Non-monotonic reasoning is a generic description of a class of reasoning.
Non-Monotonic logic is a specific theory.
The same goes for Default reasoning and Default logic.
Non-monotonic Logic
One system that provides a basis for default reasoning is Non-monotonic Logic (NML).
This is basically an extension of first-order predicate logic to include a modal operator, M.
◦ The purpose of this is to allow for consistency.
∀x: plays_instrument(x) ∧ M improvises(x) → jazz_musician(x)
states that
• for all x, if x plays an instrument and if the fact that x can improvise is consistent with all
other knowledge,
• then we can conclude that x is a jazz musician.
∀x,y: related(x, y) ∧ M gets_along(x, y) → will_defend(x, y)
states that
• for all x and y, if x and y are related and if the fact that x gets along with y is consistent
with everything else that is believed,
• then we can conclude that x will defend y.
Now this states that Quakers tend to be pacifists and Republicans tend not to be. BUT Nixon
was both a Quaker and a Republican so we could assert:
Quaker(Nixon)
Republican(Nixon)
This now leads to our total knowledge becoming inconsistent.
Default Logic
An alternative logic for performing default based reasoning is Reiter’s Default Logic (DL).
Default logic introduces a new inference rule of the form:
Now this is similar to Non-monotonic logic but there are some distinctions:
New inference rules are used for computing the set of plausible extensions. So in the
Nixon example above, Default Logic can support both assertions, since it does not say
anything about how to choose between them -- it will depend on the inference being made.
In Default logic any nonmonotonic expressions are rules of inference rather than
expressions. They cannot be manipulated by the other rules of inference. This leads to
some unexpected results.
In Default Logic, A indicates prerequisite, B indicates justification, and C indicates Consequence.
If we can prove from our beliefs that x is an American and an adult, and believing that there is some
car owned by x does not lead to an inconsistency, then we can conclude that x owns a car.
Inheritance:
One very common use of nonmonotonic reasoning is as a basis for inheriting attribute values
from a prototype description of a class to the individual entities that belong to the class.
Consider the baseball example in Inheritable Knowledge, and try to write its inheritance
knowledge as rules in DL.
We can write a rule to account for the inheritance of a default value for the height of a baseball
player as:
If we also have an axiom which prohibits someone from having more than one height, then we would
not be able to apply the default rule. Thus an explicitly stated value will block the inheritance of a
default value, which is exactly what we want.
Let's encode the default rule for the height of adult males in general. If we pattern it after the one
for baseball players, we get
Unfortunately, this rule does not work as we would like. In particular, if we again assert
Pitcher(Three-Finger-Brown), then the resulting theory contains two extensions: one in which our
first rule fires and Brown's height is 6-1, and one in which this rule applies and Brown's height is
5-10. Neither of these extensions is preferred. In order to state that we prefer to get a value from
the more specific category, baseball player, we could rewrite the default rule for adult males in
general as:
This effectively blocks the application of the default knowledge about adult males in the case where
more specific information from the class of baseball players is available. Unfortunately, this
approach can become unwieldy as the set of exceptions to the general rule increases. We would
end up with a rule like:
A clearer approach is to say something like: adult males typically have a height of 5-10 unless
they are abnormal in some way. We can then associate with other classes the information that
they are abnormal in one way or another. So we could write, for example:
Abduction
Abductive reasoning is to abduce (or take away) a logical assumption, explanation, inference,
conclusion, hypothesis, or best guess from an observation or set of observations. Because the
conclusion is merely a best guess, the conclusion that is drawn may or may not be true. Daily
decision-making is also an example of abductive reasoning.
If we notice Spots, we might like to conclude measles, but it may be wrong. But may be a best
guess, we can make about what is going on. Deriving conclusions in this way is abductive
reasoning (a form of default reasoning).
Given two wffs (A → B) and B, for any expressions A and B: if it is consistent to assume A,
do so.
Minimalist Reasoning
We describe methods for saying a very specific and highly useful class of things that are
generally true. These methods are based on some variant of the idea of a minimal model. We
will define a model to be minimal if there are no other models in which fewer things are true. The
idea behind using minimal models as a basis for non-monotonic reasoning about the world is the
following –
There are many fewer true statements than false ones.
If something is true and relevant it makes sense to assume that it has been entered into
our knowledge base.
Therefore, assume that the only true statements are those that necessarily must be true
in order to maintain consistency.
The extended KB under the Closed World Assumption (CWA) adds to the original facts the
negation of every ground atomic sentence that cannot be proved from them.
The problem is that we have assigned a special status to the positive instances of predicates as
opposed to negative ones. CWA forces completion of KB by adding negative assertion whenever
it is consistent to do so.
Although CWA captures part of the idea that anything that need not necessarily be true should be
assumed to be false, it does not capture all of it.
It has two essential limitations:
It operates on individual predicates without considering interactions among predicates
that are defined in the KB.
It assumes that all predicates have all their instances listed. Although in many database
applications this is true, in many KB systems it is not.
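The CWA completion can be sketched for one binary predicate; the predicate and constants below are made up, and the second limitation, that interactions between predicates are ignored, is visible in the result.

```python
# Sketch of Closed World Assumption completion: add the negation of
# every ground atom not listed as true.
constants = ["A", "B", "C"]
known_true = {("Friend", "A", "B")}

def cwa_complete(predicate, constants, known_true):
    extended = set(known_true)
    for x in constants:
        for y in constants:
            if (predicate, x, y) not in known_true:
                extended.add(("NOT-" + predicate, x, y))
    return extended

kb = cwa_complete("Friend", constants, known_true)
# Even though friendship is plausibly symmetric, CWA adds NOT-Friend(B, A)
# because it looks only at listed instances, not predicate interactions:
print(("NOT-Friend", "B", "A") in kb)   # True
```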
Circumscription
Circumscription is a rule of conjecture (a conclusion formed on the basis of incomplete
information) that allows you
◦ to jump to the conclusion that the objects you can show possess a certain
property, p, are in fact all the objects that possess that property.
Circumscription can also cope with default reasoning. Several theories of circumscription
have been proposed to deal with the problems of CWA.
Circumscription together with first order logic allows a form of Non-monotonic Reasoning.
Suppose we know:
Bird(Tweety)
∀x: Bird(x) ∧ ¬Abnormal(x) → Flies(x)
This is where we apply circumscription: in this case, we assume that those things that can be shown
to be abnormal are the only things that are abnormal. Thus we can write our default rule as:
∀x: Bird(x) ∧ ¬Abnormal(x) → Flies(x)
and add the following
¬Abnormal(Tweety)
since there is nothing from which Tweety can be shown to be abnormal.
The idea of a truth maintenance system (TMS) arose as a way of providing the ability to do
dependency-directed backtracking and so to support non-monotonic reasoning.
Types of TMS:
justification-based TMS (JTMS)
assumption-based TMS (ATMS)
logic-based TMS (LTMS)
Basically, TMSs:
• all do some form of dependency-directed backtracking
• connect assertions via a network of dependencies
A Justification-based truth maintenance system (JTMS) is a simple TMS where one can examine
the consequences of the current set of assumptions. In JTMS labels are attached to arcs from
sentence nodes to justification nodes. This label is either "+" or "-". Then, for a justification node
we can talk of its IN-LIST, the list of its inputs with "+" label, and of its OUT-LIST, the list of its
inputs with "-" label.
The meaning of sentences is not known. We can have a node representing a sentence p and
one representing ~p and the two will be totally unrelated, unless relations are established
between them by justifications. For example, we can write:
          ~p^p   Contradiction Node
            o
            |
            x        'x' denotes a justification node
           / \       'o' denotes a sentence node
         +/   \+
         o     o
         p     ~p
which says that if both p and ~p are IN we have a contradiction.
The association of IN or OUT labels with the nodes in a dependency network defines an
in-out-labeling function. This function is consistent if:
• The label of a justification node is IN iff the labels of all the sentence nodes in its in-list
are IN and the labels of all the sentence nodes in its out-list are OUT.
• The label of a sentence node is IN iff it is a premise, or an enabled assumption node, or it
has an input from a justification node with label IN.
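The consistency condition for justification nodes can be sketched directly from the two rules above:

```python
# Sketch: a justification node is IN iff every node on its in-list is IN
# and every node on its out-list is OUT.
def justification_label(in_list, out_list, labels):
    ok_in = all(labels[n] == "IN" for n in in_list)
    ok_out = all(labels[n] == "OUT" for n in out_list)
    return "IN" if ok_in and ok_out else "OUT"

labels = {"p": "IN", "q": "OUT"}
# a justification supported by p ("+" arc) and the absence of q ("-" arc):
print(justification_label(["p"], ["q"], labels))   # IN
labels["q"] = "IN"                                 # q becomes believed
print(justification_label(["p"], ["q"], labels))   # OUT
```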
There is a set of important reasoning operations that a JTMS does not perform, including:
• Applying rules to derive conclusions
• Creating justifications for the results of applying rules
• Choosing among alternative ways of resolving a contradiction
• Detecting contradictions
All of these operations must be performed by the problem-solving program that is using the
JTMS.
The ATMS like the JTMS is designed to be used in conjunction with a separate problem solver.
The problem solver’s job is to:
• Create nodes that correspond to assertions (both those that are given as axioms and
those that are derived by the problem solver).
• Associate with each such node one or more justifications, each of which describes a
reasoning chain that led to the node.
• Inform the ATMS of inconsistent contexts.
This is identical to the role of the problem solver that uses a JTMS, except that no explicit
choices among paths to follow need to be made as reasoning proceeds. Some decision may be
necessary at the end, though, if more than one possible solution still has a consistent context.
The role of the ATMS system is then to:
• Propagate inconsistencies, thus ruling out contexts that include subcontexts (set of
assertions) that are known to be inconsistent.
• Label each problem solver node with the contexts in which it has a valid justification. This
is done by combining contexts that correspond to the components of a justification. In
particular, given a justification of the form
A1 ∧ A2 ∧ … ∧ An → C
assign as a context for the node corresponding to C the intersection of the contexts
corresponding to the nodes A1 through An.
Contexts get eliminated as a result of the problem-solver asserting inconsistencies and the
ATMS propagating them. Nodes get created by the problem-solver to represent possible
components of a problem solution. They may then get pruned from consideration if all their
context labels get pruned.
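The ATMS labeling step quoted above (the context of C is the intersection of the contexts of A1 through An) can be sketched as a set intersection; the context contents are placeholder names.

```python
# Sketch: combine antecedent contexts by intersection to label the
# conclusion node, per the description above.
def combine_contexts(antecedent_contexts):
    return set.intersection(*antecedent_contexts)

ctx_A1 = {"w1", "w2", "w3"}   # contexts in which A1 holds (placeholders)
ctx_A2 = {"w2", "w3"}         # contexts in which A2 holds
print(sorted(combine_contexts([ctx_A1, ctx_A2])))   # ['w2', 'w3']
```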
8. Statistical Reasoning
Introduction:
Statistical Reasoning: The reasoning in which the representation is extended to allow some kind
of numeric measure of certainty (rather than true or false) to be associated with each statement.
A fact is believed to be either true or false. For some kinds of problems, however, we need to
describe beliefs that are not certain but for which there is supporting evidence.
There are two classes of problems:
• The first class contains problems in which there is genuine randomness in the world.
o Example: card playing
• The second class contains problems that could in principle be modeled using the techniques
we described earlier (i.e., resolution from predicate logic).
o Example: medical diagnosis
Bayes' theorem states:
P(H|E) = P(E|H) · P(H) / P(E)
Read this expression as the probability of hypothesis H given that we have observed evidence
E. To compute it, we need to take into account the prior probability of H and the extent to
which E provides evidence of H.
Suppose, for example, that we are interested in examining the geological evidence at a
particular location to determine whether that would be a good place to dig to find a desired
mineral. If we know the prior probabilities of finding each of the various minerals and we know
the probabilities that if a mineral is present then certain physical characteristics will be observed,
then we can use Bayes' formula to compute, from the evidence we collect, how likely it is that the
various minerals are present.
The key to using Bayes' theorem as a basis for uncertain reasoning is to recognize exactly what
it says.
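With made-up numbers (all probabilities below are invented for illustration), the mineral computation looks like this:

```python
# Sketch of the Bayes'-rule computation for the mineral example.
p_mineral = 0.01                # prior: mineral present at this location
p_obs_given_mineral = 0.9       # characteristic observed if mineral present
p_obs_given_absent = 0.05       # characteristic observed anyway by chance

# total probability of observing the characteristic
p_obs = (p_obs_given_mineral * p_mineral
         + p_obs_given_absent * (1 - p_mineral))
p_mineral_given_obs = p_obs_given_mineral * p_mineral / p_obs
print(round(p_mineral_given_obs, 3))   # 0.154
```

Even strong evidence (0.9 likelihood) yields only a modest posterior here because the prior is so small.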
Suppose we are solving a medical diagnosis problem. Consider the following assertions:
S: the patient has spots.
M: the patient has measles.
F: the patient has a high fever.
• Without any additional evidence, the presence of spots serves as evidence in favor of
measles. It also serves as evidence of fever since measles would cause fever.
• Suppose we already know that the patient has measles. Then the additional evidence
that he has spots actually tells us nothing about fever.
• Either spots alone or fever alone would constitute evidence in favor of measles.
• If both are present, we need to take both into account in determining the total weight of
evidence.
Disadvantages of Bayes' Theorem
The size of the set of joint probabilities that we require in order to compute this function grows
as 2^n if there are n different propositions considered.
Bayes' theorem is hard to deal with for several reasons:
◦ too many probabilities have to be provided
◦ the space that would be required to store all the probabilities is too large
◦ the time required to compute the probabilities is too large
We describe one practical way of compromising on a pure Bayesian system. MYCIN system is
an example of an expert system, since it performs a task normally done by a human expert.
MYCIN system attempts to recommend appropriate therapies for patients with bacterial
infections. It interacts with the physician to acquire the clinical data it needs. We concentrate on
the use of probabilistic reasoning.
MYCIN represents most of its diagnostic knowledge as a set of rules. Each rule has associated
with it a certainty factor, which is a measure of the extent to which the evidence described by the
antecedent of the rule supports the conclusion given in the rule's consequent. MYCIN uses
backward reasoning from its goal of finding significant disease-causing organisms to the clinical
data available.
We first need to describe some properties that we would like combining functions to satisfy:
◦ the combining function should be commutative and associative
◦ until certainty is reached, additional confirming evidence should increase MB
◦ if uncertain inferences are chained together, then the result should be less certain than
either of the inferences alone
From MB and MD, CF can be computed. If several sources of corroborating evidence are
pooled, the absolute value of CF will increase. If conflicting evidence is introduced, the absolute
value of CF will decrease.
Our belief is a collection of several propositions taken together
We need to compute the certainty factor of a combination of hypothesis. This is necessary when
we need to know the certainty factor of a rule antecedent that contains several clauses. The
combination certainty factor can be computed from its MB and MD. The formula for the MB of
the conjunction {condition of being joined, proposition resulting from the combination of two or
more propositions using the ^ operator} and disjunction {proposition resulting from the
combination of two or more propositions using the v (OR) operator} of two hypotheses are:
MB[h1 AND h2, e] = min(MB[h1, e], MB[h2, e])
MB[h1 OR h2, e] = max(MB[h1, e], MB[h2, e])
MD can be computed analogously.
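These definitions can be collected into a small Python sketch. It uses the simple CF = MB − MD form (rather than MYCIN's later normalised variant) together with the standard MYCIN pooling function for independent evidence; the numeric values in the tests are illustrative only:

```python
def certainty_factor(mb, md):
    """CF of a hypothesis from its measures of belief and disbelief."""
    return mb - md

def mb_conjunction(mb1, mb2):
    return min(mb1, mb2)   # MB[h1 AND h2, e]

def mb_disjunction(mb1, mb2):
    return max(mb1, mb2)   # MB[h1 OR h2, e]

def cf_combine(cf1, cf2):
    """MYCIN-style pooling of two CFs for the same hypothesis,
    derived from independent pieces of evidence."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 - cf1 * cf2                  # corroborating: |CF| grows
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 + cf1 * cf2                  # both disconfirming
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))  # conflicting: |CF| shrinks
```

Note that `cf_combine` is commutative and associative, and that pooling two corroborating CFs always yields a larger absolute value, matching the properties listed above.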
The certainty factor of a hypothesis must take into account both the strength with which the
evidence suggests the hypothesis and the level of confidence in the evidence. Let MB'[h,s] be
the measure of belief in h given that we are absolutely sure of the validity of s, and let e be the
evidence that led us to believe in s (for example, the actual readings of the laboratory
instruments or the results of applying other rules). Then:

MB[h,s] = MB'[h,s] * max(0, CF[s,e])
It turns out that these definitions are incompatible with a Bayesian view of conditional
probability. Small changes to them, however, make them compatible; in particular, we can
redefine MB.
MYCIN uses CFs to rank hypotheses in order of importance. For example, if a patient has
symptoms that suggest several possible diseases, then the disease with the highest CF is
investigated first. For a rule "if E then H", CF(rule) is the level of belief in H given E.
In the first scenario (a), our example rule has three antecedents with a single CF rather than
three separate rules; this makes the combination rules unnecessary. The rule writer did this
because the three antecedents are not independent.
To see how much difference MYCIN's independence assumption can make, suppose for the
moment that we had instead had three separate rules and that the CF of each was 0.6. This
could happen and still be consistent with the combined CF of 0.7 if the three conditions overlap
substantially. If we apply the MYCIN combination formula to the three separate rules, we get
0.936. This is a substantially different result from the true value of 0.7 as expressed by the
expert.
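The arithmetic can be reconstructed by applying the combining function for two positive CFs (CF1 + CF2 − CF1·CF2) to three hypothetical rules with CF 0.6 each:

```python
def combine(cf1, cf2):
    # MYCIN combination for two positive certainty factors
    return cf1 + cf2 - cf1 * cf2

cf = combine(combine(0.6, 0.6), 0.6)   # three overlapping rules, CF 0.6 each
print(round(cf, 3))                    # prints 0.936
```

Because the overlap between the three conditions is counted three times, MYCIN would report roughly 0.94, well above the expert's 0.7.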
Now consider what happens when independence assumptions are violated, as in scenario (c):
BAYESIAN NETWORKS
CFs are a mechanism for reducing the complexity of a Bayesian reasoning system by making
some approximations to the formalism. In Bayesian networks, by contrast, we preserve the
formalism and rely instead on the modularity of the world we are trying to model. Bayesian
networks are also called belief networks.
The basic idea of a Bayesian network is that knowledge in the world is modular: most events
are conditionally independent of most other events. We adopt a model that uses a local
representation, allowing interactions only between events that actually affect each other. The
main idea is that to describe the real world it is not necessary to use a huge joint probability
table listing the probabilities of all conceivable combinations of events. Some influences may be
unidirectional, others bidirectional; events may be causal and thus get chained together in a
network.
Implementation:
A Bayesian network is a directed acyclic graph whose directed links indicate the dependencies
that exist between nodes. Nodes represent propositions about events or the events themselves,
and conditional probabilities quantify the strength of the dependencies.
Eg: Consider the following facts
S: Sprinklers was on the last night
W: Grass is wet
R: It rained last night
From the above diagram, Sprinkler suggests Wet and Wet suggests Rain; (a) shows the flow of
constraints.
There are two different ways that propositions can influence the likelihood of each other:
• The first is that causes influence the likelihood of their symptoms.
• The second is that a symptom affects the likelihood of all of its possible causes.
Rules:
(i) If the sprinkler was ON last night then the grass will be wet this morning
(ii) If grass is wet this morning then it rained last night
(iii) By chaining (if two rules are applied together) we believe that it rained because we
believe that sprinkler was ON.
The idea behind the Bayesian network structure is to make a clear distinction between these two
kinds of influence.
Each node in Bayesian Network has an associated Conditional Probability table (CPT). This
gives the probability values for the random variable at the node conditioned on values for its
parents.
Each row of a CPT must sum to one. Since the C node has no parents, its CPT simply specifies
the prior probability that it is cloudy (in this case, 0.5).
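A minimal sketch of such a network for the sprinkler example above (S, R, W as defined earlier); all CPT numbers here are hypothetical, chosen only to illustrate inference by enumeration:

```python
# Hypothetical numbers for the Sprinkler (S) / Rain (R) / Wet grass (W) network.
P_S = 0.1                      # prior: sprinkler was on last night
P_R = 0.2                      # prior: it rained last night
P_W = {                        # CPT: P(W = true | S, R); P(W = false) is the complement
    (True, True): 0.95, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.05,
}

def p_wet():
    """Marginal P(W) by enumerating the joint distribution."""
    return sum((P_S if s else 1 - P_S) * (P_R if r else 1 - P_R) * P_W[(s, r)]
               for s in (True, False) for r in (True, False))

def p_rain_given_wet():
    """Diagnostic direction: P(R | W) = P(W, R) / P(W)."""
    joint = sum((P_S if s else 1 - P_S) * P_R * P_W[(s, True)]
                for s in (True, False))
    return joint / p_wet()
```

`p_rain_given_wet` shows the second direction of influence discussed above: a symptom (wet grass) raising the likelihood of one of its causes (rain).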
Dempster-Shafer Theory
So far we have considered individual propositions, assigning to each a single point degree of
belief warranted by the given evidence. The Dempster-Shafer theory instead considers sets of
propositions and assigns to each of them an interval

[Belief, Plausibility]

in which the degree of belief must lie.
Belief (Bel) measures the strength of the evidence in favour of the set of propositions. It ranges
from 0 to 1, where 0 indicates no evidence and 1 denotes certainty.
Plausibility (Pl) is defined as

Pl(S) = 1 - Bel(not-S)

It also ranges from 0 to 1 and measures the extent to which evidence in favour of not-S leaves
room for belief in S.
Based on the belief Bel and plausibility Pl provided by some evidence E for a proposition P
(Bel summarises all the evidence that makes us believe in the correctness of P, and Pl
summarises the evidence compatible with P), we set up a confidence interval [Bel, Pl]: an
interval of probabilities within which the true probability of P lies with a certain confidence.
Suppose we are given two belief functions M1 and M2. Let X be the set of subsets of Θ to
which M1 assigns a non-zero value, and let Y be the corresponding set for M2. We define the
combination M3 of M1 and M2 as:

M3(Z) = Σ{X∩Y=Z} M1(X)·M2(Y) / (1 − Σ{X∩Y=∅} M1(X)·M2(Y))
E.g.:
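A small sketch of Dempster's rule over an illustrative frame of discernment Θ = {flu, cold}; all mass values are invented for the example:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination over mass functions keyed by frozensets."""
    raw, conflict = {}, 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y
        if z:
            raw[z] = raw.get(z, 0.0) + mx * my
        else:
            conflict += mx * my          # mass on the empty set is normalised away
    return {z: v / (1 - conflict) for z, v in raw.items()}

def bel(m, s):
    """Belief: total mass committed to subsets of s."""
    return sum(v for x, v in m.items() if x <= s)

def pl(m, s):
    """Plausibility: total mass not committed against s."""
    return sum(v for x, v in m.items() if x & s)
```

For instance, pooling m1 = {{flu}: 0.6, Θ: 0.4} with m2 = {{flu}: 0.5, Θ: 0.5} yields {{flu}: 0.8, Θ: 0.2}, giving the belief interval [0.8, 1.0] for flu.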
Fuzzy Logic
Fuzzy logic is an alternative for representing some kinds of uncertain knowledge. Fuzzy logic is
a form of many-valued logic; it deals with reasoning that is approximate rather than fixed and
exact. Compared to traditional binary sets (where variables may take on true or false values),
fuzzy logic variables may have a truth value that ranges in degree between 0 and 1. Fuzzy logic
has been extended to handle the concept of partial truth, where the truth value may range
between completely true and completely false. Fuzzy set theory defines set membership as a
possibility distribution.
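A sketch of a fuzzy membership function together with the usual min/max/complement connectives; the "tall" thresholds (150 cm and 190 cm) are invented for illustration:

```python
def tall(height_cm):
    """Hypothetical membership function for the fuzzy set 'tall':
    0 below 150 cm, 1 above 190 cm, linear in between."""
    if height_cm <= 150:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 150) / 40

# The standard fuzzy-set connectives.
def f_and(a, b): return min(a, b)
def f_or(a, b): return max(a, b)
def f_not(a): return 1 - a
```

So a person of 170 cm is "tall" to degree 0.5: a partial truth value between completely false and completely true, exactly as described above.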
Weak slot and filler structures turn out to be useful for reasons besides the support of
inheritance, including:
◦ They enable attribute values to be retrieved quickly: assertions are indexed by the
entities, and binary predicates are indexed by their first argument.
◦ Properties of relations are easy to describe.
◦ They allow ease of consideration as they embrace aspects of object-oriented
programming, including modularity and ease of viewing by people.
Weak slot and filler structures describe two views: semantic nets and frames. These talk about
the representations themselves and about techniques for reasoning with them. They do not say
much about the specific knowledge that the structures should contain; we therefore call them
"knowledge-poor" structures.
A slot is an attribute-value pair in its simplest form. A filler is a value that a slot can take: it
could be a numeric, string (or any other data type) value or a pointer to another slot. A weak
slot and filler structure does not consider the content of the representation.
Semantic Nets were originally designed as a way to represent the meaning of English words.
The main idea is that the meaning of a concept comes from the ways in which it is connected to
other concepts. The information is stored by interconnecting nodes with labeled arcs.
Semantic nets can be represented in different ways by using relationships. Semantic nets
have been used to represent a variety of knowledge in a variety of different programs.
One of the early ways that semantic nets were used was to represent nonbinary predicates: a
node is created to represent the entire predicate statement, and binary predicates are then
introduced to describe the relationship of each of the original arguments to this new node.
score(Cubs, Dodgers, 5-3) can be represented in a semantic net by creating a node to
represent the specific game and then relating each of the three pieces of information to it.
This technique is particularly useful for representing the contents of a typical declarative
sentence.
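The reification above can be sketched as a set of labelled arcs stored as (node, arc) pairs; the node name G5 and the arc labels are illustrative:

```python
# Reifying score(Cubs, Dodgers, 5-3): node G5 stands for the specific game,
# and binary arcs relate each original argument to it (names are illustrative).
net = {
    ('G5', 'isa'):           'Game',
    ('G5', 'home-team'):     'Cubs',
    ('G5', 'visiting-team'): 'Dodgers',
    ('G5', 'score'):         '5-3',
}

def filler(node, arc):
    """Retrieve the node at the end of a labelled arc, if any."""
    return net.get((node, arc))
```

Indexing by the (node, arc) pair is what makes attribute retrieval fast: looking up the score of G5 is a single dictionary access.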
In such networks, some distinctions that are important in reasoning are glossed over. For
example, there should be a difference between a link that defines a new entity and one that
relates two existing entities.
Both nodes represent objects that exist independently of their relationship to each other.
H1 and H2 are new concepts representing John's height and Bill's height; they are defined by
their relationships to existing entities.
To represent simple quantified expressions in semantic nets, one way is to partition the
semantic net into a hierarchical set of spaces, each of which corresponds to the scope of one or
more variables.
The statement:
The nodes Dogs, Bite, Mail-Carrier represent the classes of dogs, bitings and mail carriers
respectively, while the nodes d, b, m represent a particular dog, a particular biting and a
particular mail carrier. This fact can be represented easily by a single net without partitioning.
The node g stands for the assertion given above. Node g is an instance of the special class GS
of general statements about the world (i.e., those with universal quantifiers). Every element of
GS has a form attribute that states the assertion being made. Here the biting b is not viewed as
an existentially quantified variable whose value may depend on the value of d.
The statement:
As we expand the range of problem solving tasks that the representation must
support, the representation necessarily begins to become more complex.
It becomes useful to assign more structure to nodes as well as to links.
The more structure the system has, the more likely it is to be termed a frame system.
The set of major league baseball players is a subset of the set of adult males, and so forth.
The instance relation corresponds to the relation element-of: Pee-Wee-Reese is an element of
the set Fielder.
A class represents a set.
There are two kinds of attributes that can be associated with a class:
◦ attributes about the set itself, and
◦ attributes that are to be inherited by each element of the set (prefixed with *).
Slot and filler structures are a device to support property inheritance along isa and instance
links. Knowledge in these structures is organised as a set of entities and their attributes. This
structure turns out to be useful for the following reasons:
◦ It enables attribute values to be retrieved quickly: assertions are indexed by the entities
and binary predicates are indexed by their first argument, e.g. team(Mike-Hall, Cardiff).
◦ Properties of relations are easy to describe.
◦ It allows ease of consideration as it embraces aspects of object-oriented programming,
including modularity and ease of viewing by people.
Inheritance: the isa and instance representation provides a mechanism to implement this.
Inheritance also provides a means of dealing with default reasoning. E.g. we could represent:
Emus are birds.
• Typically birds fly and have wings.
• Emus run.
in the following Semantic net:
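The net figure is not reproduced here, but its default-reasoning behaviour can be sketched as inheritance along isa links, where a local value shadows an inherited default:

```python
# isa hierarchy for the emu example; the local 'locomotion' value on Emu
# overrides the default inherited from Bird.
frames = {
    'Emu':    {'isa': 'Bird', 'locomotion': 'run'},
    'Bird':   {'isa': 'Animal', 'locomotion': 'fly', 'has': 'wings'},
    'Animal': {},
}

def inherit(node, attr):
    """Climb isa links until a local value for attr is found."""
    while node is not None:
        frame = frames[node]
        if attr in frame:
            return frame[attr]
        node = frame.get('isa')
    return None
```

Asking how an emu moves finds the local value "run" before the inherited default "fly" is ever reached, while "has wings" is still inherited from Bird.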
Partitioned Networks
Partitioned Semantic Networks allow for:
• propositions to be made without commitment to truth.
• expressions to be quantified.
Basic idea: Break network into spaces which consist of groups of nodes and arcs and regard
each space as a node.
Consider the following: Andrew believes that the earth is flat. We can encode the proposition
"the earth is flat" in a space and within it have nodes and arcs that represent the fact. We can
then have nodes and arcs linking this space to the rest of the network to represent Andrew's
belief.
NEED OF FRAMES
A frame is a type of schema used in many AI applications, including vision and natural
language processing. Frames provide a convenient structure for representing objects that are
typical of stereotyped situations. The situations to represent may be visual scenes, the
structure of complex physical objects, etc. Frames are also useful for representing
commonsense knowledge. As frames allow nodes to have structure, they can be regarded as
three-dimensional representations of knowledge.
A frame is similar to a record structure: corresponding to its fields and values are slots and slot
fillers. Basically, a frame is a group of slots and fillers that defines a stereotyped object. A
single frame is not of much use; frame systems usually have a collection of frames connected
to each other, and the value of an attribute of one frame may be another frame. A frame for a
book is given below.
Slots      Fillers
publisher  Thomson
title      Expert Systems
author     Giarratano
edition    Third
year       1998
pages      600
The above example is a simple one, but most frames are complex. Moreover, with filled slots
and the inheritance provided by frames, powerful knowledge representation systems can be
built.
Frames can represent either generic or specific objects. Following is an example of a generic
frame.
Slot              Fillers
name              computer
specialization_of a_kind_of machine
types             (desktop, laptop, mainframe, super)
                  if-added: Procedure ADD_COMPUTER
speed             default: faster
                  if-needed: Procedure FIND_SPEED
location          (home, office, mobile)
under_warranty    (yes, no)
The fillers may be single values, such as computer in the name slot, or a range of values, as in
the types slot.
The procedures attached to the slots are called procedural attachments. There are mainly three
types of procedural attachments: if-needed, default and if-added. As the name implies if-needed
types of procedures will be executed when a filler value is needed. Default value is taken if no
other value exists. Defaults are used to represent commonsense knowledge. Commonsense is
generally used when no more situation specific knowledge is available.
The if-added type is required if any value is to be added to a slot. In the above example, if a new
type of computer is invented ADD_COMPUTER procedure should be executed to add that
information. An if-removed type is used to remove a value from the slot.
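A minimal sketch of a frame with if-needed and if-added procedural attachments; the lambda procedures in the usage stand in for the FIND_SPEED and ADD_COMPUTER procedures named above:

```python
class Frame:
    """Minimal frame: stored fillers plus if-needed / if-added procedures."""
    def __init__(self, name):
        self.name = name
        self.slots = {}
        self.if_needed = {}   # slot -> procedure computing a filler on demand
        self.if_added = {}    # slot -> procedure run when a filler is stored

    def put(self, slot, value):
        self.slots[slot] = value
        if slot in self.if_added:           # e.g. ADD_COMPUTER
            self.if_added[slot](self, value)

    def get(self, slot):
        if slot not in self.slots and slot in self.if_needed:
            self.slots[slot] = self.if_needed[slot](self)  # e.g. FIND_SPEED
        return self.slots.get(slot)
```

Reading a slot with no stored filler triggers its if-needed procedure; storing a filler triggers the slot's if-added procedure.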
Person
  isa: Mammal
  Cardinality:

Adult-Male
  isa: Person
  Cardinality:

Rugby-Player
  isa: Adult-Male
  Cardinality:
  Height:
  Weight:
  Position:
  Team:
  Team-Colours:

Back
  isa: Rugby-Player
  Cardinality:
  Tries:

Mike-Hall
  instance: Back
  Height: 6-0
  Position: Centre
  Team: Cardiff-RFC
  Team-Colours: Black/Blue

Rugby-Team
  isa: Team
  Cardinality:
  Team-size: 15
  Coach:
Note
• The isa relation is in fact the subset relation.
• The instance relation is in fact element of.
• The isa attribute possesses a transitivity property. This implies: Robert-Howley is a Back,
a Back is a Rugby-Player, who in turn is an Adult-Male and also a Person.
• Both isa and instance have inverses, called subclasses and all-instances respectively.
• There are attributes that are associated with the class or set such as cardinality and on
the other hand there are attributes that are possessed by each member of the class or
set.
Solution: MetaClasses
A metaclass is a special class whose elements are themselves classes.
Now consider our rugby teams as:
The basic metaclass is Class, and this allows us to
• define classes which are instances of other classes, and thus inherit properties from
those classes.
Inheritance of default values occurs when one element or class is an instance of a class.
Slots as Objects
How can we represent the following properties in frames?
• Attributes such as weight and age, attached where they make sense
• Constraints on values, such as age being less than a hundred
• Default values
• Rules for inheritance of values, such as children inheriting parents' names
• Rules for computing values
• Many values for a slot
A slot is a relation that maps from its domain of classes to its range of values.
A relation is a set of ordered pairs, so one relation can be a subset of another.
Since a slot is a set, the set of all slots can be represented by a metaclass called, say, Slot.
Conceptual Dependency (CD) is a theory of how to represent the kind of knowledge about
events that is usually contained in natural language sentences. The goal is to represent the
knowledge in a way that:
• facilitates drawing inferences from the sentences;
• is independent of the language in which the sentences are originally stated.
CD provides a structure into which nodes representing information can be placed, and a
specific set of primitives at a given level of granularity.
Representation of Conceptual Dependency:
A second set of CD building blocks is the set of allowable dependencies among the
conceptualizations described in a sentence. There are four primitive conceptual categories
from which dependency structures can be built.
describes the relationship between an actor and the event he or she causes. This is a two-way
dependency since neither actor nor event can be considered primary. The letter p above the
dependency link indicates past tense.
describes the relationship between a PP and a PA that is being asserted to describe
it. Many state descriptions, such as height, are represented in CD as numeric scales.
describes the relationship between two PPs, one of which belongs to the set
defined by the other.
describes the relationship between a PP and an attribute that has already been
predicated of it. The direction of the arrow is toward the PP being described.
describes the relationship between two PPs, one of which provides a particular
kind of information about the other. The three most common types of information to be
provided in this way are o possession (shown as POSS-BY), o location (shown as
LOC) and o physical containment (shown as CONT).
The direction of the arrow is again toward the concept being described.
describes the relationship between an ACT and the PP that is the object of that
ACT. The direction of the arrow is toward the ACT since the context of the specific ACT
determines the meaning of the object relation.
describes the relationship between an ACT and the source and the recipient of the
ACT.
describes the relationship between an ACT and the instrument with which it is
performed. The instrument must always be a full conceptualization (i.e., it must contain
an ACT), not just a single physical object.
A script is a structure that prescribes a set of circumstances which could be expected to follow
on from one another. It is similar to a thought sequence or a chain of situations which could be
anticipated. It could be considered to consist of a number of slots or frames but with more
specialized roles.
Scripts provide an ability for default reasoning when no information is available that directly
states that an action occurred. So we may assume, unless otherwise stated, that a diner at a
restaurant was served food, that the diner paid for the food, and that the diner was served by a
waiter/waitress.
The important components of the script are:
• Entry conditions: these must be satisfied before events in the script can occur.
• Results: conditions that will be true after events in the script occur.
• Props: slots representing objects involved in the events.
• Roles: persons involved in the events.
• Track: variations on the script. Different tracks may share components of the same
script.
• Scenes: the sequence of events that occur. Events are represented in conceptual
dependency form.
Scripts are useful in describing certain situations such as robbing a bank. This might involve:
• Getting a gun.
• Holding up the bank.
• Escaping with the money.
The script contains typical actions, though there are options, such as whether the customer
was pleased or not. There are multiple paths through the scenes to make for a robust script.
What would a "going to the movies" script look like? Would it have similar props, actors and
scenes? How about "going to class"?
CYC is a very large knowledge base project aimed at capturing human commonsense
knowledge. The goal of CYC is to encode the large body of knowledge that is so obvious that it
is easy to forget to state it explicitly. Such a knowledge base could then be combined with
specialized knowledge bases to produce systems that are less brittle than most of the ones
available today.
Building an immense knowledge base is a staggering task. There are two possibilities for
acquiring this knowledge automatically:
1. In order for a system to learn a great deal, it must already know a
great deal. In particular, systems with a lot of knowledge will be able to employ
Strong Slot and Filler Structures (Continued)
The CD representation of the information contained in the sentence is shown above. It says that
Bill informed John that he (Bill) will do something to break John’s nose. Bill did this so that John
will believe that if he (John) does something (different from what Bill will do to break his nose),
then Bill will break John’s nose. In this representation, the word believe has been used to
simplify the example. But the idea behind believe can be represented in CD as MTRANS of a
fact into John’s memory. The actions do1 and do2 are dummy placeholders that refer to some
as yet unspecified actions.
CYCL
CYC's knowledge is encoded in a representation language called CYCL.
CYCL is a frame-based system that incorporates most of the techniques described above. It
generalizes the notion of inheritance so that properties can be inherited along any link, not just
isa and instance. CYCL also contains a constraint language that allows the expression of
arbitrary first-order logical expressions.
11. Learning
What is Learning?
Learning is an important area in AI, perhaps more so than planning.
• The problems are hard: harder than those of planning.
• Recognised solutions are not as common as in planning.
• A goal of AI is to enable computers that can be taught rather than programmed.
Why is it hard?
• Intelligence implies that an organism or machine must be able to adapt to new situations.
• It must be able to learn to do new things.
• This requires knowledge acquisition, inference, updating/refinement of knowledge base,
acquisition of heuristics, applying faster searches, etc.
Rote Learning
Rote Learning is basically memorisation.
• Saving knowledge so it can be used again.
• Retrieval is the only problem.
• No repeated computation, inference or query is necessary.
Samuel's Checkers program employed rote learning (it also used parameter adjustment, which
will be discussed shortly).
• A minimax search was used to explore the game tree.
• Time constraints did not permit complete searches.
• The program recorded board positions and their scores at the ends of searches.
• If the same board position arises later in the game, the stored value can be recalled;
the net effect is that deeper searches have effectively occurred.
Rote learning is basically a simple process. However, it does illustrate some issues that are
relevant to more complex learning systems.
• Organisation: access to the stored value must be faster than it would be to recompute
it. Methods such as hashing, indexing and sorting can be employed to enable this.
o E.g. Samuel's program indexed board positions by noting the number of pieces.
• Generalisation: the number of potentially stored objects can be very large. We may
need to generalise some information to make the problem manageable.
o E.g. Samuel's program stored game positions only for white to move. Also,
rotations along diagonals were combined.
• Stability of the environment: rote learning is not very effective in a rapidly changing
environment. If the environment does change, then we must detect and record exactly
what has changed: the frame problem.
Store v Compute
Many programs rely on an evaluation procedure to summarise the state of the search;
game-playing programs provide many examples of this. However, many programs have a
static evaluation function.
In learning, a slight modification of the formulation of the problem is required: here the problem
has an evaluation function that is represented as a polynomial of the form

c1·t1 + c2·t2 + ... + cn·tn

where the ti are features of the position and the ci are weights that are adjusted with
experience.
For example, making dinner can be described as: lay the table, cook the dinner, serve the
dinner. We could treat laying the table as one action even though it involves a sequence of
actions.
Consider a blocks world example in which ON(C,B) and ON(A,TABLE) are true.
STRIPS can achieve ON(A,B) in four steps:
UNSTACK(C,B), PUTDOWN(C), PICKUP(A), STACK(A,B)
STRIPS now builds a macro-operator MACROP with preconditions ON(C,B) and ON(A,TABLE),
postconditions ON(A,B) and ON(C,TABLE), and the four steps as its body.
MACROP can now be used in future planning.
But it is not very general. The above can easily be generalised with variables used in place of
the blocks. However, generalisation is not always that easy (see Rich and Knight).
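The macro-operator construction and its generalisation can be sketched as follows; the variable mapping and the string-based substitution are illustrative simplifications of what STRIPS actually does:

```python
# The four solution steps bundled into a macro-operator, then generalised
# by replacing the block constants with variables.
steps = ['UNSTACK(C,B)', 'PUTDOWN(C)', 'PICKUP(A)', 'STACK(A,B)']

macrop = {
    'pre':  ['ON(C,B)', 'ON(A,TABLE)'],
    'post': ['ON(A,B)', 'ON(C,TABLE)'],
    'body': steps,
}

def generalise(op, mapping):
    """Replace block constants by variables, e.g. A->x, B->y, C->z."""
    def sub(s):
        for const, var in mapping.items():
            for pat, rep in ((f'({const},', f'({var},'),
                             (f',{const})', f',{var})'),
                             (f'({const})', f'({var})')):
                s = s.replace(pat, rep)
        return s
    return {key: [sub(s) for s in val] for key, val in op.items()}
```

Applying `generalise(macrop, {'A': 'x', 'B': 'y', 'C': 'z'})` yields an operator that stacks any block x onto any block y using any clear block z, not just A, B and C.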
Learning by Chunking
Chunking involves similar ideas to Macro Operators and originates from psychological ideas on
memory and problem solving.
The computational basis is in production systems (studied earlier).
SOAR is a system that uses production rules to represent its knowledge. It also employs
chunking to learn from experience.
Basic Outline of SOAR's Method
• As SOAR solves problems, it fires productions; these are stored in long-term memory.
• Some firings turn out to be more useful than others.
• When SOAR detects a useful sequence of firings, it creates a chunk.
• A chunk is essentially a large production that does the work of an entire sequence of
smaller ones.
• Chunks may be generalised before storing.
Inductive Learning
This involves the process of learning by example -- where a system tries to induce a general rule
from a set of observed instances.
This involves classification -- assigning, to a particular input, the name of a class to which it
belongs. Classification is important to many problem solving tasks.
A learning system has to be capable of evolving its own class descriptions:
• Initial class definitions may not be adequate.
• The world may not be well understood or rapidly changing.
The task of constructing class definitions is called induction or concept learning
Version Spaces
Structural concept learning systems are not without their problems.
The biggest problem is that the teacher must guide the system through carefully chosen
sequences of examples.
In Winston's program the order of the process is important, since new links are added as and
when new knowledge is gathered.
The aim of version spaces is to be insensitive to the order in which examples are presented.
To do this, instead of evolving a single concept description, a set of possible descriptions is
maintained. As new examples are presented, the set evolves through a process of
generalisation from new instances and specialisation from near misses.
We will assume that each slot in a version space description is made up of a set of predicates
that do not negate other predicates in the set -- positive literals.
Indeed, we can represent a description as a frame-based representation with several slots, or
use a more general representation. For the sake of simplifying the discussion we will keep to
simple representations.
Given the above definition, Mitchell's candidate elimination algorithm is the best-known
version-space algorithm.
Let us look at an example where we are presented with a number of playing cards and we need
to learn if the card is odd and black.
We already know things like red, black, spade, club, even card, odd card etc.
• Consider the first card in the sample set. We are told that it is odd and black.
• So the most specific concept is that card alone; the least specific concept is still all our
cards.
• Next card: we need to modify our most specific concept to a generalisation of the set,
something like "odd and black cards". The least specific concept remains unchanged.
• Next card: now we can modify the least specific set to exclude this card. As more
exclusions are added, we generalise this to all black cards and all odd cards.
• NOTE that negative instances cause least specific concepts to become more specific,
and positive instances similarly affect the most specific.
• If the two sets become the same set, then the result is guaranteed and the target
concept is met.
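A compact sketch of candidate elimination for the "odd and black" concept, using only two attributes (parity, colour) and '?' as a wildcard; keeping a single most-specific hypothesis simplifies the boundary-set bookkeeping of the full algorithm:

```python
def matches(h, x):
    """Hypothesis h covers example x if every attribute is '?' or equal."""
    return all(hv in ('?', xv) for hv, xv in zip(h, x))

def generalise(h, x):
    """Minimal generalisation of h to also cover the positive example x."""
    return tuple(hv if hv == xv else '?' for hv, xv in zip(h, x))

def candidate_elimination(examples, values):
    S = None                             # most specific boundary (one hypothesis)
    G = [tuple('?' for _ in values)]     # most general boundary
    for x, positive in examples:
        if positive:
            S = x if S is None else generalise(S, x)
            G = [g for g in G if matches(g, x)]
        else:
            newG = []
            for g in G:
                if not matches(g, x):
                    newG.append(g)       # already excludes the negative example
                    continue
                # minimally specialise g so it excludes x but still covers S
                for i, v in enumerate(g):
                    if v != '?':
                        continue
                    for alt in values[i]:
                        if alt == x[i]:
                            continue
                        cand = tuple(alt if j == i else gv
                                     for j, gv in enumerate(g))
                        if S is None or matches(cand, S):
                            newG.append(cand)
            G = list(dict.fromkeys(newG))   # drop duplicates
    return S, G
```

On a positive "odd black" card followed by negative "even red", "even black" and "odd red" cards, S and G converge on the single hypothesis (odd, black), at which point the target concept is identified.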
Decision Trees
Quinlan, in his ID3 system (1986), introduced the idea of decision trees.
ID3 is a program that can build trees automatically from given positive and negative instances.
Basically, each leaf of a decision tree asserts a positive or negative concept. To classify a
particular input, we start at the top and follow assertions down until we reach an answer (Fig
28).
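ID3 chooses the attribute to test at each node by information gain (the expected reduction in entropy from splitting on that attribute), which can be sketched as:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Expected entropy reduction from splitting the examples on attr."""
    n = len(labels)
    split = {}
    for row, label in zip(rows, labels):
        split.setdefault(row[attr], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return entropy(labels) - remainder
```

An attribute that separates the positive and negative instances perfectly has a gain equal to the full entropy of the sample; ID3 greedily picks the highest-gain attribute and recurses on each branch.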
In this case the operational criterion is: we must express the concept definition in pure
description-language syntax.
If we analyse the proof (say with an ATMS), we can learn a few general rules from it.
Since Brecon appears in the database, when we abstract things we must explicitly record the
use of the fact:

near(Cardiff,x) → holds(loc(x), result(drive(Cardiff,x), result(fly(Cardiff), s')))
This states that if x is near Cardiff, we can get to it by flying to Cardiff and then driving; we
have learnt this general rule. A second learnt rule states that we can get to Brecon by flying to
another nearby airport and driving from there.
We could add airport(Swansea) and get an alternative travel plan. Finally, we could abstract
out both Brecon and Cardiff to get a general plan:
near(x,y) ∧ airport(y) → holds(loc(y), result(drive(x,y), result(fly(x), s')))
Discovery
Discovery is a restricted form of learning in which one entity acquires knowledge without the help
of a teacher.
Analogy
Analogy involves a complicated mapping between what might appear to be two dissimilar
concepts.
Bill is built like a large outdoor brick lavatory.
He was like putty in her hands
Humans quickly recognise the abstractions involved and understand the meaning.
There are two methods of analogical problem solving studied in AI: transformational analogy
and derivational analogy.
Transformational Analogy
Look for a similar solution and copy it to the new situation making suitable substitutions where
appropriate.
E.g. Geometry.
If you know about lengths of line segments and have a proof that certain lines are equal (Fig.
29), then you can make similar assertions about angles.
Transformational analogy does not look at how the problem was solved -- it only looks at the final
solution.
Derivational Analogy
The history of the problem solution (the steps involved) is often relevant.
Carbonell (1986) showed that derivational analogy is a necessary component in the transfer of
skills in complex domains:
• In translating Pascal code to LISP, line-by-line translation is no use; you have to reuse
the major structural and control decisions.
• One way to do this is to replay a previous derivation and modify it when necessary.
• If initial steps and assumptions are still valid, copy them across.
• Otherwise, alternatives need to be found, in best-first search fashion.
• Reasoning by analogy becomes a search in T-space using means-ends analysis.
Expert Systems
Expert systems (ES) are one of the prominent research domains of AI. They were introduced
by researchers at the Stanford University Computer Science Department.
• Expert systems solve problems that are normally solved by human “experts”. To solve
expert-level problems, expert systems need access to a substantial domain knowledge
base, which must be built as efficiently as possible. They also need to exploit one or more
reasoning mechanisms to apply their knowledge to the problems they are given. Then
they need a mechanism for explaining what they have done to the users who rely on
them.
• The problems that expert systems deal with are highly diverse. There are some general
issues that arise across these varying domains. But it also turns out that there are
powerful techniques that can be defined for specific classes of problems.
• What are Expert Systems?
o Expert systems are computer applications developed to solve complex
problems in a particular domain, at the level of extraordinary human intelligence
and expertise.
Knowledge Base
• It contains domain-specific and high-quality knowledge. Knowledge is required to exhibit
intelligence. The success of any ES majorly depends upon the collection of highly
accurate and precise knowledge.
• What is Knowledge?
o Data is a collection of facts. Information is data organized as facts about the
task domain. Data, information, and past experience combined together are
termed knowledge.
• Components of Knowledge Base
o The knowledge base of an ES is a store of both factual and heuristic knowledge.
Factual Knowledge − It is the information widely accepted by the
Knowledge Engineers and scholars in the task domain.
Heuristic Knowledge − It is about practice, accurate judgment, one’s ability
of evaluation, and guessing.
• Knowledge Representation
o It is the method used to organize and formalize the knowledge in the
knowledge base. It is typically in the form of IF-THEN-ELSE rules.
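As a rough illustration of this kind of representation, rules can be stored as condition/conclusion pairs. This is a minimal sketch; the medical facts and rule contents are invented for illustration:

```python
# Hypothetical sketch: IF-THEN rules stored as (conditions, conclusion) pairs.
RULES = [
    ({"fever", "cough"}, "flu-suspected"),
    ({"flu-suspected", "stiff-neck"}, "refer-to-doctor"),
]

def matching_conclusions(facts):
    """Return the conclusion of every rule whose IF-part is satisfied by facts."""
    return [concl for conds, concl in RULES if conds <= facts]

print(matching_conclusions({"fever", "cough"}))  # ['flu-suspected']
```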
• Knowledge Acquisition
o The success of any expert system majorly depends on the
quality, completeness, and accuracy of the information stored in the knowledge base.
o The knowledge base is formed by readings from various experts, scholars, and
the Knowledge Engineers. The knowledge engineer is a person with the qualities
of empathy, quick learning, and case analyzing skills.
o He acquires information from the subject expert by recording, interviewing, and
observing him at work, etc.
o He then categorizes and organizes the information in a meaningful way, in the
form of IF-THEN-ELSE rules, to be used by the inference engine. The knowledge
engineer also monitors the development of the ES.
Inference Engine
Use of efficient procedures and rules by the Inference Engine is essential in deducing a
correct, flawless solution.
In case of knowledge-based ES, the Inference Engine acquires and manipulates the
knowledge from the knowledge base to arrive at a particular solution.
In case of rule-based ES, it −
o Applies rules repeatedly to the facts, which are obtained from earlier rule
applications.
o Adds new knowledge into the knowledge base if required.
o Resolves rule conflicts when multiple rules are applicable to a particular case.
To recommend a solution, the Inference Engine uses the following strategies −
o Forward Chaining
o Backward Chaining
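Forward chaining can be sketched briefly: rules fire on known facts, and each conclusion becomes a new fact that may enable further firings. The rules below are invented purely for illustration:

```python
# Hypothetical sketch of forward chaining: apply rules repeatedly to known
# facts, adding each conclusion as a new fact until nothing more fires.
RULES = [
    ({"fever", "cough"}, "flu-suspected"),
    ({"flu-suspected"}, "order-test"),
]

def forward_chain(facts):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conds, concl in RULES:
            if conds <= facts and concl not in facts:
                facts.add(concl)   # new knowledge derived from an earlier firing
                changed = True
    return facts

print(sorted(forward_chain({"fever", "cough"})))
# ['cough', 'fever', 'flu-suspected', 'order-test']
```

Backward chaining runs the same rules in the opposite direction, starting from a goal conclusion and working back to the facts that would support it.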
User Interface
User interface provides interaction between user of the ES and the ES itself.
It generally uses Natural Language Processing so that it can be used by a user who is
well-versed in the task domain.
The user of the ES need not be necessarily an expert in Artificial Intelligence.
It explains how the ES has arrived at a particular recommendation. The explanation may
appear in the following forms −
o Natural language displayed on screen.
o Verbal narrations in natural language.
o Listing of rule numbers displayed on the screen.
o The user interface makes it easy to trace the credibility of the deductions.
MYCIN is one example of an expert system that uses such rules. All the rules we show are
English versions of the actual rules that the systems use.
R1 (sometimes called XCON) is a program that configures DEC VAX systems. Its
rules look like this:
Notice that R1’s rules, unlike MYCIN’s, contain no numeric measures of certainty. In the
task domain with which R1 deals, it is possible to state exactly the correct thing to be
done in each particular set of circumstances. One reason for this is that there exists a
good deal of human expertise in this area. Another is that since R1 is doing a design task,
it is not necessary to consider all possible alternatives; one good one is enough. As a
result, probabilistic information is not necessary in R1.
PROSPECTOR is a program that provides advice on mineral exploration. Its rules look
like this:
In PROSPECTOR, each rule contains two confidence estimates. The first indicates the
extent to which the presence of the evidence described in the condition part of the rule
suggests the validity of the rule’s conclusion. In the PROSPECTOR rule shown above,
the number 2 indicates that the presence of the evidence is mildly encouraging. The
second confidence estimate measures the extent to which the evidence is necessary to
the validity of the conclusion or, stated another way, the extent to which the lack of the
evidence indicates that the conclusion is not valid.
DESIGN ADVISOR is a system that critiques chip designs. Its rules look like:
This gives advice to a chip designer, who can accept or reject the advice. If the advice is
rejected, the system can exploit a justification-based truth maintenance system to revise
its model of the circuit. The first rule shown here says that an element should be criticized
for poor resetability if the sequential level count is greater than two, unless its signal is
currently believed to be resettable.
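The logic of that resetability rule can be sketched directly. Only the rule's condition comes from the text; the field names and data structure are hypothetical:

```python
# Hedged sketch of the DESIGN ADVISOR critique described above: criticize an
# element for poor resetability if its sequential level count exceeds two,
# unless its signal is currently believed to be resettable.
def criticize_resetability(element):
    if element["sequential_level_count"] > 2 and not element["believed_resettable"]:
        return "poor resetability"
    return None  # no criticism applies

print(criticize_resetability(
    {"sequential_level_count": 3, "believed_resettable": False}))
# poor resetability
```

The "unless" clause is what the truth maintenance system can later retract or restore as the designer accepts or rejects advice.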
Early expert systems shells provided mechanisms for knowledge representation, reasoning and
explanation. But as experience with using these systems to solve real-world problems grew, it
became clear that expert system shells needed to do something else as well. They needed to
make it easy to integrate expert systems with other kinds of programs.
EXPLANATION
In order for an expert system to be an effective tool, people must be able to interact with it easily.
To facilitate this interaction, the expert system must have the following two capabilities in
addition to the ability to perform its underlying task:
Explain its reasoning:
o In many of the domains in which expert systems operate, people will not accept
results unless they have been convinced of the accuracy of the reasoning process
that produced those results. This is particularly true, for example, in medicine,
where a doctor must accept ultimate responsibility for a diagnosis, even if that
diagnosis was arrived at with considerable help from a program.
Acquire new knowledge and modifications of old knowledge:
o Since expert systems derive their power from the richness of the knowledge
bases they exploit, it is extremely important that those knowledge bases be as
complete and as accurate as possible. One way to get this knowledge into a
program is through interaction with the human expert. Another way is to have the
program learn expert behavior from raw data.
KNOWLEDGE ACQUISITION
How are expert systems built? Typically, a knowledge engineer interviews a domain expert to
elucidate expert knowledge, which is then translated into rules. After the initial system is built, it
must be iteratively refined until it approximates expert-level performance. This process is
expensive and time-consuming, so it is worthwhile to look for more automatic ways of
constructing expert knowledge bases.
While no totally automatic knowledge acquisition systems yet exist, there are many programs
that interact with domain experts to extract expert knowledge efficiently. These programs provide
support for the following activities:
Entering knowledge
Maintaining knowledge base consistency
Ensuring knowledge base completeness
The most useful knowledge acquisition programs are those that are restricted to a particular
problem-solving paradigm e.g. diagnosis or design. It is important to be able to enumerate the
roles that knowledge can play in the problem-solving process. For example, if the paradigm is
diagnosis, then the program can structure its knowledge base around symptoms, hypotheses
and causes. It can identify symptoms for which the expert has not yet provided causes.
Since one symptom may have multiple causes, the program can ask for knowledge about how to
decide when one hypothesis is better than another. If we move to another type of problem
solving, say design, the program must structure its knowledge around a different set of roles in
order to interact profitably with an expert.
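The diagnosis-paradigm idea above can be sketched as follows. The knowledge base is structured around symptoms and causes, and the acquisition tool flags symptoms the expert has not yet explained; the automotive examples are invented:

```python
# Hypothetical sketch: a diagnosis-oriented acquisition tool structures its
# knowledge base around symptoms and their causes, then flags symptoms for
# which the expert has not yet supplied any cause.
knowledge = {
    "engine-stalls": ["clogged-fuel-line", "bad-spark-plug"],
    "white-smoke": [],        # no cause entered by the expert yet
}

def symptoms_missing_causes(kb):
    return [symptom for symptom, causes in kb.items() if not causes]

print(symptoms_missing_causes(knowledge))  # ['white-smoke']
```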
MOLE (Knowledge Acquisition System)
It is a system for heuristic classification problems, such as diagnosing diseases. In particular, it is
used in conjunction with the cover-and-differentiate problem-solving method. An expert system
produced by MOLE accepts input data, comes up with a set of candidate explanations or
classifications that cover (or explain) the data, then uses differentiating knowledge to determine
which one is best. The process is iterative, since explanations must themselves be justified, until
ultimate causes are ascertained.
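The cover-and-differentiate method can be sketched in two steps: keep the hypotheses that cover the observed data, then use differentiating knowledge to pick the best. The candidate hypotheses and scores below are invented, not from MOLE:

```python
# Hypothetical sketch of cover-and-differentiate: first keep the hypotheses
# that cover (explain) all observed symptoms, then use differentiating
# knowledge (here a simple score) to rank the survivors.
CANDIDATES = {
    "clogged-filter": {"covers": {"stalling"}, "score": 1},
    "bad-fuel-pump":  {"covers": {"stalling", "no-start"}, "score": 2},
}

def diagnose(symptoms):
    covering = [h for h, info in CANDIDATES.items()
                if symptoms <= info["covers"]]                  # cover step
    return max(covering, key=lambda h: CANDIDATES[h]["score"])  # differentiate

print(diagnose({"stalling", "no-start"}))  # bad-fuel-pump
```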
MOLE interacts with a domain expert to produce a knowledge base that a system called MOLE-p
(for MOLE-performance) uses to solve problems. The acquisition proceeds through several
steps:
1. Initial Knowledge base construction.
MOLE asks the expert to list common symptoms or complaints that might require
diagnosis. For each symptom, MOLE prompts for a list of possible explanations.
MOLE then iteratively seeks out higher-level explanations until it comes up with a
set of ultimate causes. During this process, MOLE builds an influence network
similar to the belief networks.
The expert provides covering knowledge, that is, the knowledge that a
hypothesized event might be the cause of a certain symptom.
2. Refinement of the knowledge base.
MOLE now tries to identify the weaknesses of the knowledge base. One approach
is to find holes and prompt the expert to fill them. It is difficult, in general, to know
whether a knowledge base is complete, so instead MOLE lets the expert watch
MOLE-p solving sample problems. Whenever MOLE-p makes an incorrect
diagnosis, the expert adds new knowledge. There are several ways in which
MOLE-p can reach the wrong conclusion. It may incorrectly reject a hypothesis
because it does not feel that the hypothesis is needed to explain any symptom.
MOLE has been used to build systems that diagnose problems with car engines, problems in
steel-rolling mills, and inefficiencies in coal-burning power plants. For MOLE to be applicable,
however, it must be possible to pre-enumerate solutions or classifications. It must also be
practical to encode the knowledge in terms of covering and differentiating.
One problem-solving method useful for design tasks is called propose-and-revise.
Propose-and-revise systems build up solutions incrementally. First, the system proposes an extension to
the current design. Then it checks whether the extension violates any global or local constraints.
Constraint violations are then fixed, and the process repeats.
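The propose, check, and fix loop can be sketched as follows; the numeric design parts and the width constraint are hypothetical:

```python
# Hypothetical sketch of propose-and-revise: extend the design, check
# constraints, fix any violation, and repeat until all extensions are placed.
def propose_and_revise(extensions, violates, fix):
    design = []
    for ext in extensions:
        design.append(ext)                # propose an extension
        while violates(design):           # check global/local constraints
            design[-1] = fix(design[-1])  # revise until the violation is gone
    return design

# Toy constraint: no part may be wider than 10 units; the fix narrows it.
result = propose_and_revise(
    extensions=[4, 12, 7],
    violates=lambda d: d[-1] > 10,
    fix=lambda part: part - 5,
)
print(result)  # [4, 7, 7]
```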
SALT Program
The SALT program provides mechanisms for elucidating this knowledge from the expert. Like
MOLE, SALT builds a dependency network as it converses with an expert. Each node stands for
a value of a parameter that must be acquired or generated. There are three kinds of links:
Contributes-to: Associated with the first type of link are procedures that allow SALT to
generate a value for one parameter based on the value of another.
Constraints: Rules out certain parameter values.
Suggests-revision-of: Points out ways in which a constraint violation can be fixed.
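A dependency network with these three link kinds can be sketched minimally; the parameter names are hypothetical design values, not from SALT itself:

```python
# Hypothetical sketch of a SALT-style dependency network: nodes are design
# parameters, and each directed link is tagged with one of the three kinds.
network = {
    ("load", "beam-width"): "contributes-to",            # derive one value from another
    ("max-width", "beam-width"): "constrains",           # rules out certain values
    ("violation", "beam-width"): "suggests-revision-of", # how to fix a violation
}

def links_of_kind(net, kind):
    """Return the (source, target) pairs carrying the given link kind."""
    return [pair for pair, k in net.items() if k == kind]

print(links_of_kind(network, "constrains"))  # [('max-width', 'beam-width')]
```

Compiling such a network into production rules, as SALT does, amounts to turning each link's attached procedure into a rule that fires when its source parameter gets a value.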
Control Knowledge is also important. It is critical that the system propose extensions and
revisions that lead toward a design solution. SALT allows the expert to rate revisions in terms of
how much trouble they tend to produce.
SALT compiles its dependency network into a set of production rules. As with MOLE, an expert
can watch the production system solve problems and can override the system’s decision. At that
point, the knowledge base can be changed or the override can be logged for future inspection.