Unit 3 Notes
KNOWLEDGE REPRESENTATION:-
For the purpose of solving complex problems encountered in AI, we need both a large amount
of knowledge and some mechanism for manipulating that knowledge to create solutions to new
problems. A variety of ways of representing knowledge (facts) have been exploited in AI
programs. In all varieties of knowledge representation, we deal with two kinds of entities.
A. Facts: truths in some relevant world. These are the things we want to represent.
B. Representations of facts in some chosen formalism. These are the things we actually manipulate.
One way to think of structuring these entities is at two levels : (a) the knowledge level, at which
facts are described, and (b) the symbol level, at which representations of objects at the
knowledge level are defined in terms of symbols that can be manipulated by programs.
The facts and representations are linked with two-way mappings. This link is called
representation mappings. The forward representation mapping maps from facts to
representations. The backward representation mapping goes the other way, from representations
to facts.
Given only such facts, it is not possible to answer a simple question such as: "Who is
the heaviest player?"
But if a procedure for finding the heaviest player is provided, then these facts
will enable that procedure to compute an answer.
We can also ask things like who "bats left" and "throws right".
Inheritable Knowledge
Here the knowledge elements inherit attributes from their parents.
This is the knowledge embodied in the design hierarchies found in the functional, physical
and process domains.
Within the hierarchy, elements inherit attributes from their parents, but in many cases
not all attributes of the parent elements are prescribed to the child elements.
Inheritance is a powerful form of inference, but it is not adequate on its own.
The basic KR (Knowledge Representation) needs to be augmented with an inference
mechanism.
Property inheritance: the objects or elements of specific classes inherit attributes
and values from more general classes.
The classes are organized in a generalization hierarchy.
Boxed nodes — objects and values of attributes of objects.
Arrows — point from an object to its value.
This structure is known as a slot and filler structure, semantic network or a collection
of frames.
The steps to retrieve a value for an attribute of an instance object (a sketch of this lookup in code appears after the list):
1. Find the object in the knowledge base.
2. If there is a value there for the attribute, report it.
3. Otherwise, see if there is a value for the instance attribute; if not, fail.
4. Move to the node corresponding to that value and look for a value for the attribute there; if one is found, report it.
5. Otherwise, search up the isa hierarchy until a value is found for the attribute or the hierarchy is exhausted.
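The following is a minimal Python sketch of this lookup over instance/isa links. The knowledge-base layout (dicts with "instance", "isa" and attribute slots) and the particular baseball-style names and values are illustrative assumptions, not a fixed format from the notes.

# A minimal sketch of property inheritance over instance/isa links.
kb = {
    "Pee-Wee-Reese":     {"instance": "Fielder", "bats": "Right"},
    "Fielder":           {"isa": "Baseball-Player"},
    "Baseball-Player":   {"isa": "Adult-Male", "batting-average": 0.252},
    "Adult-Male":        {"isa": "Person", "height": 6.1},
    "Person":            {},
}

def get_value(obj, attribute):
    # Follow the steps above: the object itself, then its class, then up the isa chain.
    node = kb.get(obj)
    if node is None:
        return None                       # step 1 failed: object not in the KB
    if attribute in node:
        return node[attribute]            # step 2: value stored directly on the object
    node_name = node.get("instance")      # step 3: move to the object's class
    while node_name is not None:          # steps 4-5: walk up the isa hierarchy
        node = kb.get(node_name, {})
        if attribute in node:
            return node[attribute]
        node_name = node.get("isa")
    return None                           # no value found anywhere

print(get_value("Pee-Wee-Reese", "bats"))             # Right (stored on the instance)
print(get_value("Pee-Wee-Reese", "batting-average"))  # 0.252 (inherited from Baseball-Player)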
Inferential Knowledge
This knowledge generates new information from the given information.
The new information does not require further data gathering from the source, but does
require analysis of the given information to generate new knowledge.
Example: given a set of relations and values, one may infer other values or relations.
Predicate logic (mathematical deduction) is used to infer from a set of attributes.
Inference through predicate logic uses a set of logical operations to relate individual data.
Represent knowledge as formal logic: "All dogs have tails" becomes ∀x: dog(x) → hastail(x).
Advantages:
A set of strict rules.
Can be used to derive more facts.
Truths of new statements can be verified.
Guaranteed correctness.
Many inference procedures are available to implement the standard rules of logic; these are
popular in AI systems, e.g. automated theorem proving.
Procedural Knowledge
A representation in which the control information needed to use the knowledge is embedded in the
knowledge itself. For example, computer programs, directions, and recipes; these indicate
specific use or implementation.
Knowledge is encoded in procedures, small programs that know how to do
specific things, how to proceed.
Advantages:
Heuristic or domain-specific knowledge can be represented.
Extended logical inferences, such as default reasoning, are facilitated.
Side effects of actions may be modeled. Some rules may become false in time;
keeping track of this in large systems may be tricky.
Disadvantages:
Completeness — not all cases may be represented.
Consistency — not all deductions may be correct. e.g. If we know that Fred is a
bird we might deduce that Fred can fly. Later we might discover that Fred is
an emu.
Modularity is sacrificed. Changes in the knowledge base might have far-reaching
effects.
Cumbersome control information.
Consider the following example that shows the use of predicate logic as a way of
representing knowledge.
1. Marcus was a man.
2. Marcus was a Pompeian.
3. All Pompeians were Romans.
4. Caesar was a ruler.
5. All Pompeians were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. People only try to assassinate rulers they are not loyal to.
8. Marcus tried to assassinate Caesar.
The facts described by these sentences can be represented as a set of well-formed formulas (wffs)
as follows:
1. Marcus was a man.
man(Marcus)
2. Marcus was a Pompeian.
Pompeian(Marcus)
3. All Pompeians were Romans.
∀x: Pompeian(x) → Roman(x)
4. Caesar was a ruler.
ruler(Caesar)
5. All Pompeians were either loyal to Caesar or hated him.
∀x: Pompeian(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
6. Everyone is loyal to someone.
∀x: ∃y: loyalto(x, y)
7. People only try to assassinate rulers they are not loyal to.
∀x: ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x, y) → ¬loyalto(x, y)
8. Marcus tried to assassinate Caesar.
tryassassinate(Marcus, Caesar)
Now suppose if we want to use these statements to answer the question: Was Marcus loyal to
Caesar?
Now let's try to produce a formal proof, reasoning backward from the desired goal:
¬loyalto(Marcus, Caesar)
In order to prove the goal, we need to use the rules of inference to transform it into another goal
(or possibly a set of goals) that can, in turn, be transformed, and so on, until there are no
unsatisfied goals remaining.
Now we can satisfy the last goal and produce a proof that Marcus was not loyal
to Caesar.
From this simple example, we see that three important issues must be
addressed in the process of converting English sentences into logical statements and then
using those statements to deduce new ones:
1. Many English sentences are ambiguous (for example, 5, 6, and 7
above). Choosing the correct interpretation may be difficult.
2. There is often a choice of how to represent the knowledge. Simple
representations are desirable, but they may exclude certain kinds of reasoning.
3. Even in very simple situations, a set of sentences is unlikely to contain
all the information necessary to reason about the topic at hand. In order to be
able to use a set of statements effectively, it is usually necessary to
have access to another set of statements that represent facts that people consider
too obvious to mention.
Representing Instance and ISA Relationships
Specific attributes instance and isa play an important role particularly in a useful form of
reasoning called property inheritance.
The predicates instance and isa explicitly capture the relationships they are used to
express, namely class membership and class inclusion.
Figure 4.2 shows the first five sentences of the last section represented in logic in three different
ways.
The first part of the figure contains the representations we have already discussed.
In these representations, class membership is represented with unary predicates (such
as Roman), each of which corresponds to a class.
Asserting that P(x) is true is equivalent to asserting that x is an instance (or element) of P.
The second part of the figure contains representations that use the instance predicate
explicitly.
6) No mortal lives longer than 150 years.
∀x: ∀t1: ∀t2: mortal(x) ∧ born(x, t1) ∧ gt(t2 − t1, 150) → dead(x, t2)
7) It is now 1991.
now = 1991
The above example shows how these ideas of computable functions and predicates can be useful.
It also makes use of the notion of equality and allows equal objects to be substituted for each
other whenever it appears helpful to do so during a proof.
Now suppose we want to answer the question "Is Marcus alive?"
From the statements suggested here, there may be two ways of deducing an answer.
Either we can show that Marcus is dead because he was killed by the volcano or we
can show that he must be dead because he would otherwise be more than 150 years old,
which we know is not possible.
As soon as we attempt to follow either of those paths rigorously, however, we
discover, just as we did in the last example, that we need some additional knowledge. For
example, our statements talk about dying, but they say nothing that relates to being alive,
which is what the question is asking.
So we add the following facts:
8) Alive means not dead.
∀x: ∀t: [alive(x, t) → ¬dead(x, t)] ∧ [¬dead(x, t) → alive(x, t)]
9) If someone dies, then he is dead at all later times.
∀x: ∀t1: ∀t2: died(x, t1) ∧ gt(t2, t1) → dead(x, t2)
Now let's attempt to answer the question "Is Marcus alive?" by proving:
¬alive(Marcus, now)
Resolution
Propositional Resolution
1. Convert all the propositions of F to clause form.
2. Negate P and convert the result to clause form. Add it to the set of clauses obtained
in step 1.
3. Repeat until either a contradiction is found or no progress can be made:
1. Select two clauses. Call these the parent clauses.
2. Resolve them together. The resulting clause, called the resolvent, will be the
disjunction of all of the literals of both of the parent clauses with the following
exception: If there are any pairs of literals L and ¬ L such that one of the
parent clauses contains L and the other contains ¬L, then select one such pair
and eliminate both L and ¬ L from the resolvent.
3. If the resolvent is the empty clause, then a contradiction has been found. If it is
not, then add it to the set of clauses available to the procedure.
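The loop above can be sketched in a few lines of Python. The clause encoding (frozensets of string literals, with "~p" standing for the negation of "p") is an assumption made for illustration, not part of the notes.

# A minimal sketch of propositional resolution by refutation.
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    # Return all resolvents of the two parent clauses.
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return resolvents

def resolution_refutation(clauses):
    # Return True if the clause set is unsatisfiable (the empty clause is derived).
    clauses = set(clauses)
    while True:
        new = set()
        pairs = [(c1, c2) for c1 in clauses for c2 in clauses if c1 != c2]
        for c1, c2 in pairs:
            for resolvent in resolve(c1, c2):
                if not resolvent:          # the empty clause: contradiction found
                    return True
                new.add(resolvent)
        if new.issubset(clauses):          # no progress can be made
            return False
        clauses |= new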
The Unification Algorithm
In propositional logic, it is easy to determine that two literals cannot both be true at
the same time.
Simply look for L and ¬L in predicate logic, this matching process is more
complicated since the arguments of the predicates must be considered.
For example, man(John) and ¬man(John) is a contradiction, while the man(John) and
¬man(Spot) is not.
Thus, in order to determine contradictions, we need a matching procedure that
compares two literals and discovers whether there exists a set of substitutions that
makes them identical.
There is a straightforward recursive procedure, called the unification algorithm, that
does it.
Algorithm: Unify(L1, L2)
1. If L1 or L2 are both variables or constants, then:
1. If L1 and L2 are identical, then return NIL.
2. Else if L1 is a variable, then if L1 occurs in L2 then return {FAIL}, else
return (L2/L1).
3. Else if L2 is a variable, then if L2 occurs in L1 then return {FAIL}, else
return (L1/L2).
4. Else return {FAIL}.
2. If the initial predicate symbols in L1 and L2 are not identical, then return {FAIL}.
3. If L1 and L2 have a different number of arguments, then return {FAIL}.
4. Set SUBST to NIL. (At the end of this procedure, SUBST will contain all the
substitutions used to unify L1 and L2.)
5. For I ← 1 to the number of arguments in L1 :
1. Call Unify with the ith argument of L1 and the ith argument of L2, putting the
result in S.
2. If S contains FAIL then return {FAIL}.
3. If S is not equal to NIL then:
1. Apply S to the remainder of both L1 and L2.
2. SUBST := APPEND(S, SUBST).
6. Return SUBST.
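A small Python sketch of this procedure follows. Literals are encoded as tuples (predicate, arg1, arg2, ...), strings beginning with "?" are treated as variables, arguments are flat (no nested terms), and the occurs check is omitted; all of these simplifications are assumptions made for the illustration.

# A minimal sketch of Unify(L1, L2) for flat literals.
FAIL = "FAIL"

def is_variable(t):
    return isinstance(t, str) and t.startswith("?")

def substitute(term, subst):
    # Apply the bindings collected so far to a single term.
    for value, var in subst:
        if term == var:
            return value
    return term

def unify_terms(t1, t2, subst):
    # Unify two argument terms, returning the extended substitution or FAIL.
    if t1 == t2:
        return subst
    if is_variable(t1):
        return subst + [(t2, t1)]      # bind t1 to t2 (occurs check omitted)
    if is_variable(t2):
        return subst + [(t1, t2)]
    return FAIL

def unify(l1, l2):
    # Unify two literals; return the substitution list or FAIL.
    if l1[0] != l2[0] or len(l1) != len(l2):   # different predicate or arity
        return FAIL
    subst = []
    for a1, a2 in zip(l1[1:], l2[1:]):
        s = unify_terms(substitute(a1, subst), substitute(a2, subst), subst)
        if s == FAIL:
            return FAIL
        subst = s
    return subst

print(unify(("hate", "?x", "Caesar"), ("hate", "Marcus", "?y")))
# [('Marcus', '?x'), ('Caesar', '?y')]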
Resolution in Predicate Logic
We can now state the resolution algorithm for predicate logic as follows, assuming a set of given
statements F and a statement to be proved P:
Algorithm: Resolution
1. Convert all the statements of F to clause form.
2. Negate P and convert the result to clause form. Add it to the set of clauses obtained in 1.
3. Repeat until a contradiction is found, no progress can be made, or a predetermined amount of
effort has been expended:
1. Select two clauses. Call these the parent clauses.
2. Resolve them together. The resolvent will be the disjunction of all the literals of
both parent clauses with appropriate substitutions performed and with the
following exception: If there is one pair of literals T1 and ¬T2 such that one of
the parent clauses contains T1 and the other contains ¬T2, and if T1 and T2 are
unifiable, then neither T1 nor T2 should appear in the resolvent. We call T1 and
T2 complementary literals. Use the substitution produced by the unification to
create the resolvent. If there is more than one pair of complementary literals, only
one pair should be omitted from the resolvent.
3. If the resolvent is the empty clause, then a contradiction has been found. If it
is not, then add it to the set of clauses available to the procedure.
Resolution Procedure
Resolution is a procedure, which gains its efficiency from the fact that it operates
on statements that have been converted to a very convenient standard form.
Resolution produces proofs by refutation.
In other words, to prove a statement (i.e., to show that it is valid), resolution attempts to
show that the negation of the statement produces a contradiction with the known
statements (i.e., that it is unsatisfiable).
The resolution procedure is a simple iterative process: at each step, two clauses, called
the parent clauses, are compared (resolved), resulting in a new clause that has been inferred
from them. The new clause represents ways that the two parent clauses interact with
each other. Suppose that there are two clauses in the system:
winter V summer
¬ winter V cold
Now we observe that precisely one of winter and ¬ winter will be true at any point.
If winter is true, then cold must be true to guarantee the truth of the second clause. If ¬
winter is true, then summer must be true to guarantee the truth of the first clause.
Thus we see that from these two clauses we can deduce summer V cold
This is the deduction that the resolution procedure will make.
Resolution operates by taking two clauses that each contains the same literal, in this
example, winter.
The literal must occur in positive form in one clause and in negative
form in the other. The resolvent is obtained by combining all of the literals of the two
parent clauses except the ones that cancel.
If the clause that is produced is the empty clause, then a contradiction has been found.
For example, the two clauses
winter
¬ winter
will produce the empty clause.
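Using the resolve function from the propositional resolution sketch above, the winter/summer example can be reproduced directly; the frozenset-of-strings clause encoding is the same illustrative assumption as before.

# Resolving the two example clauses from the text:
#   winter V summer    and    ~winter V cold
c1 = frozenset({"winter", "summer"})
c2 = frozenset({"~winter", "cold"})
print(resolve(c1, c2))     # [frozenset({'summer', 'cold'})], i.e. summer V cold

# The pair of unit clauses winter and ~winter resolves to the empty clause:
print(resolve(frozenset({"winter"}), frozenset({"~winter"})))   # [frozenset()]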
If we know P is true, and we know that P implies Q, then we can conclude Q. As an
inference rule this is written:

P ⇒ Q     P
--------------- (modus ponens)
      Q
The propositions above the line are called premises; the proposition below the line is
the conclusion. Both the premises and the conclusion may contain metavariables (in this case, P
and Q) representing arbitrary propositions. When an inference rule is used as part of a proof,
the metavariables are replaced in a consistent way with the appropriate kind of object (in this
case, propositions).
Most rules come in one of two flavors: introduction or elimination rules. Introduction rules
introduce the use of a logical operator, and elimination rules eliminate it. Modus ponens is an
elimination rule for ⇒. On the right-hand side of a rule, we often write the name of the rule.
This is helpful when reading proofs. In this case, we have written (modus ponens). We could
also have written (⇒-elim) to indicate that this is the elimination rule for ⇒.
In a proof, we are always allowed to introduce a new assumption P, then reason under that
assumption. We must give the assumption a name; we have used the name x in the example
below. Each distinct assumption must have a different name.
--------- (assum)
 [x : P]
Because it has no premises, this rule can also start a proof. It can be used as if the proposition P
were proved. The name of the assumption is also indicated here.
However, you do not get to make assumptions for free! To get a complete proof, all assumptions
must eventually be discharged. This is done in the implication introduction rule: to introduce
P ⇒ Q, we assume P and reason under that assumption to try to derive Q. If we are successful,
then we can conclude that P ⇒ Q; that is, if Q can be proved under the assumption P, then the
implication P ⇒ Q holds without any assumptions. We write x in the rule name to show which
assumption is discharged. This rule and modus ponens are the introduction and elimination
rules for implications.
[x : P]
   ⋮
   Q
--------------- (⇒-intro/x)
 P ⇒ Q

P ⇒ Q     P
--------------- (⇒-elim, modus ponens)
      Q
A proof is valid only if every assumption is eventually discharged. This must happen in the proof
tree below the assumption. The same assumption can be used more than once.
Rules for Disjunction

   P                           Q
--------- (∨-intro-left)    --------- (∨-intro-right)
 P ∨ Q                       P ∨ Q

P ∨ Q     P ⇒ R     Q ⇒ R
----------------------------- (∨-elim)
             R
Rules for Negation

 P ⇒ ⊥                    ¬P
--------- (¬-intro)      --------- (¬-elim)
   ¬P                     P ⇒ ⊥

   ⊥
--------- (ex falso quodlibet, EFQ)
   P

[x : ¬P]
   ⋮
   ⊥
--------- (reductio ad absurdum, RAA/x)
   P
Another classical tautology that is not intuitionistically valid is the law of the excluded
middle, P ∨ ¬P. We will take it as an axiom in our system. The Latin name for this rule is
tertium non datur ("the third is not given"), but we will call it magic:

--------- (magic)
 P ∨ ¬P
Proofs
A proof of proposition P in natural deduction starts from axioms and assumptions and derives
P with all assumptions discharged. Every step in the proof is an instance of an inference rule
with metavariables substituted consistently with expressions of the appropriate syntactic class.
Example
For example, here is a proof of the proposition (A ⇒ B ⇒ C) ⇒ (A ∧ B ⇒ C).
A proposition that has a complete proof in a deductive system is called a theorem of that system.
Soundness and Completeness
A measure of a deductive system's power is whether it is powerful enough to prove all true
statements. A deductive system is said to be complete if all true statements are theorems (have
proofs in the system). For propositional logic and natural deduction, this means that all
tautologies must have natural deduction proofs. Conversely, a deductive system is
called sound if all theorems are true. The proof rules we have given above are in fact sound and
complete for propositional logic: every theorem is a tautology, and every tautology is a theorem.
Finding a proof for a given tautology can be difficult. But once the proof is found, checking that
it is indeed a proof is completely mechanical, requiring no intelligence or insight whatsoever. It
is therefore a very strong argument that the thing proved is in fact true.
We can also make writing proofs less tedious by adding more rules that provide reasoning
shortcuts. These rules are sound if there is a way to convert a proof using them into a proof using
the original rules. Such added rules are called admissible.
For example, the PROLOG clause P(x) :- Q(x, y) is equivalent to the logical expression
∀x: ∃y: Q(x, y) → P(x); variables that appear in the consequent are still universally quantified.
The difference between the logic and PROLOG representation is that the
PROLOG interpretation has a fixed control strategy. And so, the assertions in the
PROLOG program define a particular search path to answer any question.
But, the logical assertions define only the set of answers but not about how to choose
among those answers if there is more than one.
Consider the following example: a rule of the form
b ← c ∧ ∼ab_a
where ab_a is an atom that means abnormal with respect to some aspect a. Given c, the agent can
infer b unless it is told ab_a. Adding ab_a to the knowledge base can prevent the conclusion of b.
Rules that imply ab_a can be used to prevent the default under the conditions of the body of the
rule.
Example 5.27: Suppose the purchasing agent is investigating purchasing holidays. A resort may
be adjacent to a beach or away from a beach. This is not symmetric; if the resort was adjacent to
a beach, the knowledge provider would specify this. Thus, it is reasonable to have the clause
away_from_beach ← ∼on_beach.
This clause enables an agent to infer that a resort is away from the beach if the agent is not told it
is adjacent to a beach.
A cooperative system tries not to mislead. If we are told the resort is on the beach, we would
expect that resort users would have access to the beach. If they have access to a beach, we
would expect them to be able to swim at the beach. Thus, we would expect the defaults
beach_access ← on_beach ∧ ∼ab_beach_access.
swim_at_beach ← beach_access ∧ ∼ab_swim_at_beach.
We could also specify that, if there is an enclosed bay and a big city, then by default there is no
swimming:
ab_swim_at_beach ← enclosed_bay ∧ big_city ∧ ∼ab_no_swimming_near_city.
We could say that British Columbia is abnormal with respect to swimming near cities:
ab_no_swimming_near_city ← in_BC ∧ ∼ab_BC_beaches.
Given only the preceding rules, an agent infers away_from_beach. If it is then told on_beach, it
can no longer infer away_from_beach, but it can now infer beach_access and swim_at_beach. If
it is also told enclosed_bay and big_city, it can no longer infer swim_at_beach. However, if it is
then told in_BC, it can then infer swim_at_beach.
By having defaults of what is normal, a user can interact with the system by telling it what is
abnormal, which allows for economy in communication. The user does not have to state the
obvious.
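A small Python sketch of how the beach defaults above behave under negation as failure follows. Each rule is a (head, positive-body, negated-body) triple, and the rules are listed so that everything a rule depends on is settled by earlier rules or by the given facts, so a single pass in order is enough; this encoding is an illustrative assumption, not the book's notation.

# A minimal sketch of default reasoning with negation as failure.
RULES = [
    ("ab_no_swimming_near_city", ["in_BC"], ["ab_BC_beaches"]),
    ("ab_swim_at_beach", ["enclosed_bay", "big_city"], ["ab_no_swimming_near_city"]),
    ("away_from_beach", [], ["on_beach"]),
    ("beach_access", ["on_beach"], ["ab_beach_access"]),
    ("swim_at_beach", ["beach_access"], ["ab_swim_at_beach"]),
]

def consequences(facts):
    # Derive everything that follows from the given facts under the default rules.
    derived = set(facts)
    for head, pos, neg in RULES:
        if all(p in derived for p in pos) and not any(n in derived for n in neg):
            derived.add(head)
    return derived

print("away_from_beach" in consequences(set()))                                   # True
print("swim_at_beach" in consequences({"on_beach"}))                              # True
print("swim_at_beach" in consequences({"on_beach", "enclosed_bay", "big_city"}))  # False
print("swim_at_beach" in consequences({"on_beach", "enclosed_bay", "big_city", "in_BC"}))  # True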
One way to think about non-monotonic reasoning is in terms of arguments. The rules can be
used as components of arguments, in which the negated abnormality gives a way to undermine
arguments. Note that, in the language presented, only positive arguments exist that can be
undermined. In more general theories, there can be positive and negative arguments that attack
each other.
Implementation Issues
Evolution into Frames
As seen in the previous example, there are certain problems which are difficult to solve
with Semantic Nets.
Although there is no clear distinction between a semantic net and a frame system, the
more structured the system is, the more likely it is to be termed a frame system.
A frame is a collection of attributes (called slots) and associated values that describe
some entities in the world. Sometimes a frame describes an entity in some absolute
sense;
Sometimes it represents the entity from a particular point of view only.
A single frame taken alone is rarely useful; we build frame systems out of collections of
frames that are connected to each other by virtue of the fact that the value of an attribute of
one frame may be another frame.
Frames as Sets and Instances
The set theory is a good basis for understanding frame systems.
Each frame represents either a class (a set) or an instance (an element of class)
Both isa and instance relations have inverse attributes, which we call subclasses & all
instances.
As a class represents a set, there are 2 kinds of attributes that can be associated with it.
1. Its own attributes &
2. Attributes that are to be inherited by each element of the set.
Frames as Sets and Instances
Sometimes, the difference between a set and an individual instance may not be clear.
Example: Team India is an instance of the class of Cricket Teams, and it can also be thought of
as the set of its players.
Now the problem is that if we represent Team India as a subclass of Cricket Teams, then its
elements, the Indian players, automatically become cricket teams, which is not true.
So, we can instead make Team India a subclass of a class called Cricket Players.
To do this we need to differentiate between regular classes and meta-classes.
Regular Classes are those whose elements are individual entities whereas Meta-classes
are those special classes whose elements are themselves, classes.
The most basic meta-class is the class CLASS.
It represents the set of all classes.
All classes are instances of it, either directly or through one of its subclasses.
The class CLASS introduces the attribute cardinality, which is to be inherited by all
instances of CLASS. Cardinality stands for the number of elements in the set.
Other ways of Relating Classes to Each Other
We have discussed that a class1 can be a subset of class2.
If Class2 is a meta-class then Class1 can be an instance of Class2.
Another way is the mutually-disjoint-with relationship, which relates a class to one or
more other classes that are guaranteed to have no elements in common with it.
Another one is is-covered-by, which relates a class to a set of subclasses, the union
of which is equal to it.
If a class is-covered-by a set S of mutually disjoint classes, then S is called a partition of
the class.
Slots as Full-Fledged Objects (Frames)
Till now we have used attributes as slots, but now we will represent attributes explicitly and
describe their properties.
Some of the properties we would like to be able to represent and use in reasoning include,
The class to which the attribute can be attached.
Constraints on either the type or the value of the attribute.
A default value for the attribute. Rules for inheriting values for the attribute.
To be able to represent these attributes of attributes, we need to describe attributes (slots)
as frames.
These frames will be organized into an isa hierarchy, just as any other frames, and
that hierarchy can then be used to support inheritance of values for attributes of slots.
Now let us formalize what a slot is. A slot here is a relation.
It maps from elements of its domain (the classes for which it makes sense) to elements of
its range (its possible values).
A relation is a set of ordered pairs.
Thus it makes sense to say that relation R1 is a subset of another relation R2.
In that case, R1 is a specialization of R2. Since a slot is a set, the set of all slots, which
we will call SLOT, is a meta-class.
Its instances are slots, which may have sub-slots.
Frame Example
In this example, the frames Person, Adult-Male, ML-Baseball-Player (corresponding to major
league baseball players), Pitcher, and ML-Baseball-Team (for major league baseball team) are all
classes.
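The class hierarchy in this example can be sketched as frames in Python. Slot names beginning with "*" mark attributes to be inherited by each element of the class, while the others (such as cardinality) are the class's own attributes; the "*" convention and the sample values are illustrative assumptions in the spirit of the baseball example, not exact figures from the notes.

# A minimal sketch of the ML-Baseball-Player frames.
frames = {
    "Person":             {"isa": None, "cardinality": 6e9},
    "Adult-Male":         {"isa": "Person", "cardinality": 2e9, "*height": 5.9},
    "ML-Baseball-Player": {"isa": "Adult-Male", "cardinality": 624,
                           "*height": 6.1, "*batting-average": 0.252},
    "Pitcher":            {"isa": "ML-Baseball-Player", "*batting-average": 0.106},
    "ML-Baseball-Team":   {"isa": None, "cardinality": 26, "*team-size": 24},
}

def inherited_value(class_name, attr):
    # Walk up the isa chain looking for a starred slot that members of the class inherit.
    cls = class_name
    while cls is not None:
        frame = frames[cls]
        if "*" + attr in frame:
            return frame["*" + attr]
        cls = frame["isa"]
    return None

print(inherited_value("Pitcher", "batting-average"))  # 0.106, overridden on Pitcher
print(inherited_value("Pitcher", "height"))           # 6.1, inherited from ML-Baseball-Player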
Semantic Nets
The inheritance property can be represented using isa and instance.
Monotonic inheritance can be performed substantially more efficiently with such
structures than with pure logic, and non-monotonic inheritance is also easily
supported.
The reason that makes inheritance easy is that the knowledge in slot and filler systems
is structured as a set of entities and their attributes.
These structures turn out to be useful because:
They index assertions by the entities they describe. As a result, retrieving the value for
an attribute of an entity is fast.
They make it easy to describe properties of relations. To do this in a purely
logical system requires higher-order mechanisms.
They are a form of object-oriented programming and have the advantages that such
systems normally include: modularity and ease of viewing by people.
Here we would describe two views of this kind of structure – Semantic Nets & Frames.
Semantic Nets
Different approaches to knowledge representation include semantic nets, frames,
and scripts.
The semantic net describes both objects and events.
In a semantic net, information is represented as a set of nodes connected to each other by
a set of labeled arcs, which represent relationships among the nodes.
It is a directed graph consisting of vertices, which represent concepts, and edges,
which represent semantic relations between the concepts.
It is also known as an associative net due to the association of one node with others.
The main idea is that the meaning of a concept comes from the ways in which it is
connected to other concepts.
We can use inheritance to derive additional relations.
Figure: A Semantic Network
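Since the figure itself is not reproduced here, the following Python sketch shows how such a network of nodes and labeled arcs might be stored; the particular nodes and relations (Team-India, Blue, and so on) are illustrative assumptions loosely based on the India/Blue example mentioned below.

# A minimal sketch of a semantic net: labeled arcs stored as (from, label, to) triples.
ARCS = [
    ("Team-India", "isa", "Cricket-Team"),
    ("Team-India", "uniform-color", "Blue"),
    ("Cricket-Team", "isa", "Team"),
    ("Sachin", "instance", "Cricket-Player"),
    ("Sachin", "team", "Team-India"),
]

def related(node):
    # Return the labeled arcs leaving a node.
    return [(label, target) for source, label, target in ARCS if source == node]

print(related("Team-India"))   # [('isa', 'Cricket-Team'), ('uniform-color', 'Blue')]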
Intersection Search Semantic Nets
We try to find relationships among objects by spreading activation out from each of
two nodes and seeing where the activation meets.
Using this we can answer questions like: what is the relation between India and Blue?
It takes advantage of the entity-based organization of knowledge that slot and
filler representation provides.
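An intersection search over the triples from the previous sketch can be written as a breadth-first spread from two nodes; this is a minimal illustration assuming the ARCS list defined above and treating arcs as undirected for the purpose of spreading activation.

from collections import deque

def reachable(start):
    # All nodes reachable from start, spreading activation along arcs in either direction.
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for s, _, t in ARCS:
            if s == node and t not in seen:
                seen.add(t)
                queue.append(t)
            elif t == node and s not in seen:
                seen.add(s)
                queue.append(s)
    return seen

def intersection_search(a, b):
    # Nodes where activation spreading from a and from b meets.
    return reachable(a) & reachable(b)

print(intersection_search("Team-India", "Blue"))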
Representing Non-binary Predicates Semantic Nets
Simple binary predicates like isa(Person, Mammal) can be represented easily by semantic nets,
and other non-binary predicates can also be represented by using general-purpose predicates
such as isa and instance.
Three or even more place predicates can also be converted to a binary form by creating
one new object representing the entire predicate statement and then introducing
binary predicates to describe relationships to this new object.
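As a quick illustration of that conversion, a three-place statement such as score(India, Australia, 350) can be reified into a new node described entirely by binary arcs; the node and relation names below are assumptions made for the example.

# Reifying the 3-place predicate score(India, Australia, 350) as a new object G1.
ARCS_GAME = [
    ("G1", "instance", "Scoring-Event"),
    ("G1", "batting-team", "India"),
    ("G1", "fielding-team", "Australia"),
    ("G1", "runs", 350),
]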
Conceptual Dependency
Introduction to Strong Slot and Filler Structures
The main problem with semantic networks and frames is that they lack formality; there
is no specific guideline on how to use the representations.
In frame when things change, we need to modify all frames that are relevant – this can be
time-consuming.
Strong slot and filler structures typically represent links between objects according to
more rigid rules, specific notions of what types of object and relations between them are
provided and represent knowledge about common situations.
We have three types of strong slot and filler structures:
1. Conceptual Dependency (CD)
2. Scripts
3. Cyc
Conceptual Dependency (CD)
Conceptual Dependency was originally developed to represent knowledge acquired from natural
language input.
The goals of this theory are:
To help in the drawing of the inference from sentences.
To be independent of the words used in the original input.
That is to say: For any 2 (or more) sentences that are identical in meaning there should
be only one representation of that meaning.
It has been used by many programs that purport to understand English (MARGIE, SAM,
PAM).
Conceptual Dependency (CD) provides:
A structure into which nodes representing information can be placed.
A specific set of primitives.
A given level of granularity.
Sentences are represented as a series of diagrams depicting actions using both abstract and real
physical situations.
The agent and the objects are represented.
The actions are built up from a set of primitive acts, which can be modified by
tense.
CD is based on events and actions. Every event (if applicable) has:
an ACTOR
an ACTION performed by the actor
an OBJECT that the action is performed on
a DIRECTION in which that action is oriented
These are represented as slots and fillers. In English sentences, many of these attributes are left out.
A Simple Conceptual Dependency Representation
For the sentence "I gave the man a book", the CD representation is as follows:
Scripts
A script is a structure that prescribes a set of circumstances which could be expected to
follow on from one another.
It is similar to a thought sequence or a chain of situations which could be anticipated.
It could be considered to consist of a number of slots or frames but with more
specialized roles.
Scripts are beneficial because:
Events tend to occur in known runs or patterns.
Causal relationships between events exist.
Entry conditions exist which allow an event to take place
Prerequisites exist for events taking place. E.g. when a student progresses through
a degree scheme or when a purchaser buys a house.
Script Components
Each script contains the following main components.
Entry Conditions: Must be satisfied before events in the script can occur.
Results: Conditions that will be true after events in script occur.
Props: Slots representing objects involved in the events.
Roles: Persons involved in the events.
Track: The specific variation on the more general pattern in the script. Different tracks
may share many components of the same script but not all.
Scenes: The sequence of events that occur. Events are represented in conceptual dependency
form.
Advantages and Disadvantages of Script
Advantages
Capable of predicting implicit events
A single coherent interpretation may be built up from a collection of observations.
Disadvantages
More specific (inflexible) and less general than frames.
Not suitable to represent all kinds of knowledge.
To deal with inflexibility, smaller modules called memory organization packets (MOPs)
can be combined in a way that is appropriate for the situation.
Script Example
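The notes do not reproduce the example figure here, so the sketch below shows how a classic restaurant script might be laid out with the components listed above (track, props, roles, entry conditions, results, scenes); all of the specific fillers are illustrative assumptions.

# A minimal sketch of a script as a dictionary of its components.
restaurant_script = {
    "track": "coffee shop",
    "props": ["tables", "menu", "food", "bill", "money"],
    "roles": ["customer", "waiter", "cook", "cashier", "owner"],
    "entry_conditions": ["customer is hungry", "customer has money"],
    "results": ["customer has less money", "customer is not hungry", "owner has more money"],
    "scenes": [
        "entering: customer enters the restaurant and sits at a table",
        "ordering: customer reads the menu and orders food from the waiter",
        "eating: cook prepares the food, waiter serves it, customer eats",
        "exiting: customer asks for the bill, pays the cashier, and leaves",
    ],
}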
CYC
What is CYC?
An ambitious attempt to form a very large knowledge base aimed at
capturing commonsense reasoning.
Initial goals were to capture the knowledge in a hundred randomly selected articles in
the Encyclopedia Britannica.
Both implicit and explicit knowledge are encoded.
Emphasis on the study of underlying information (assumed by the authors but
not explicitly told to the readers).
Example: Suppose we read that Wellington learned of Napoleon's death. Then we (humans)
can conclude that Napoleon never knew that Wellington had died.
How do we do this?
We require special implicit knowledge or commonsense such as:
We only die once.
You stay dead.
You cannot learn anything when dead.
Time cannot go backward.
Why build large knowledge bases?
1. Brittleness
Specialised knowledge bases are brittle: it is hard to encode new situations, and there is
non-graceful degradation in performance. Commonsense-based knowledge bases
should have a firmer foundation.
2. Form and Content
Knowledge representation may not be suitable for AI. Commonsense
strategies could point out where difficulties in content may affect the form.
3. Shared Knowledge
Should allow greater communication among systems with common bases
and assumptions.
How is CYC coded?
By hand.
Special CYCL language:
LISP-like.
Frame-based
Multiple inheritance
Slots are fully fledged objects.
Generalized inheritance — any link, not just isa and instance.
Module 2
Game Playing
Charles Babbage, the nineteenth-century computer architect, thought about
programming his analytical engine to play chess and later of building a machine to play
tic-tac-toe.
There are two reasons that games appeared to be a good domain.
1. They provide a structured task in which it is very easy to measure success or
failure.
2. They are easily solvable by straightforward search from the starting state to
a winning position.
The first is true for all games, but the second is not true for all except the simplest games.
For example, consider chess.
The average branching factor is around 35. In an average game, each player might
make 50 moves.
So in order to examine the complete game tree, we would have to examine 35^100 positions.
Thus it is clear that a simple search is not able to select even its first move during
the lifetime of its opponent.
It is clear that to improve the effectiveness of a search-based problem-solving
program, two things can be done:
1. Improve the generate procedure so that only good moves are generated.
2. Improve the test procedure so that the best move will be recognized and explored first.
If we use a legal-move generator, then the test procedure will have to look at each of
the moves it produces; because the test procedure must look at so many possibilities, it must be fast.
Instead of the legal-move generator, we can use a plausible-move generator in which only
some small number of promising moves are generated.
As the number of legal moves available increases, it becomes increasingly
important to apply heuristics to select only those moves that seem most promising.
The performance of the overall system can be improved by adding heuristic knowledge
into both the generator and the tester.
In game playing, a goal state is one in which we win; but for a game like chess it is
not possible to search all the way to a goal state, even with a good plausible-move generator.
The depth of the resulting tree or graph and its branching factor are too great.
It is possible to search the tree only ten or twenty moves deep; then, in order to choose the
best move, the resulting board positions must be compared to discover which is most
advantageous.
This is done using a static evaluation function, which uses whatever information it has to
evaluate individual board positions by estimating how likely they are to lead eventually
to a win.
Its function is similar to that of the heuristic function h' in the A* algorithm: in the
absence of complete information, choose the most promising position.
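A minimal sketch of how a static evaluation function is typically combined with a depth-limited minimax search follows; the game-specific pieces (the successor generator and the evaluation itself) are assumed placeholder functions, not part of the notes.

# Depth-limited minimax using a static evaluation function at the cutoff depth.
def minimax(position, depth, maximizing, successors, static_eval):
    moves = successors(position)
    if depth == 0 or not moves:
        return static_eval(position)          # cut off: fall back on the static evaluation
    if maximizing:
        return max(minimax(m, depth - 1, False, successors, static_eval) for m in moves)
    else:
        return min(minimax(m, depth - 1, True, successors, static_eval) for m in moves)

# Example with a trivial "game" whose positions are integers: each position has two
# successors, and the static evaluation is simply the position value itself.
print(minimax(0, 3, True,
              successors=lambda p: [2 * p + 1, 2 * p + 2],
              static_eval=lambda p: p))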
An important thing to note about iterative deepening is that we visit the top-level nodes multiple
times. The last (or maximum-depth) level is visited once, the second-last level is visited twice,
and so on. It may seem expensive, but it turns out not to be so costly, since in a tree most of the
nodes are in the bottom level. So it does not matter much if the upper levels are visited multiple
times.
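The point about revisiting the upper levels can be made concrete with a small iterative deepening depth-first search sketch; the goal test and successor function are assumed placeholders.

# Iterative deepening: repeated depth-limited DFS with an increasing depth bound.
def depth_limited_search(node, goal_test, successors, limit):
    if goal_test(node):
        return True
    if limit == 0:
        return False
    return any(depth_limited_search(child, goal_test, successors, limit - 1)
               for child in successors(node))

def iterative_deepening(start, goal_test, successors, max_depth):
    # Upper levels are re-expanded on every iteration, but since most nodes of a
    # tree lie near the bottom, the total extra work is small.
    for depth in range(max_depth + 1):
        if depth_limited_search(start, goal_test, successors, depth):
            return depth          # depth at which a goal was first found
    return None

print(iterative_deepening(0, goal_test=lambda n: n == 6,
                          successors=lambda n: [2 * n + 1, 2 * n + 2],
                          max_depth=5))     # 2: node 6 sits two levels below the root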
Planning
Blocks World Problem
In order to compare the variety of methods of planning, we should find it useful to look at all of
them in a single domain that is complex enough that the need for each of the mechanisms is
apparent yet simple enough that easy-to-follow examples can be found.
There is a flat surface on which blocks can be placed.
There are a number of square blocks, all the same size.
They can be stacked one upon the other.
There is a robot arm that can manipulate the blocks.
Actions of the robot arm
1. UNSTACK(A, B): Pick up block A from its current position on block B.
2. STACK(A, B): Place block A on block B.
3. PICKUP(A): Pick up block A from the table and hold it.
4. PUTDOWN(A): Put block A down on the table.
Notice that the robot arm can hold only one block at a time.
Predicates
In order to specify both the conditions under which an operation may be performed
and the results of performing it, we need the following predicates:
1. ON(A, B): Block A is on Block B.
2. ONTABLE(A): Block A is on the table.
3. CLEAR(A): There is nothing on the top of Block A.
4. HOLDING(A): The arm is holding Block A.
5. ARMEMPTY: The arm is holding nothing.
Robot problem-solving systems (STRIPS)
The list of new predicates that the operator causes to become true is the ADD list.
The list of old predicates that the operator causes to become false is the DELETE list.
The PRECONDITIONS list contains those predicates that must be true for the operator to
be applied.
STRIPS style operators for BLOCKs World
STACK(x, y)
P: CLEAR(y) ^ HOLDING(x)
D: CLEAR(y) ^ HOLDING(x)
A: ARMEMPTY ^ ON(x, y)
UNSTACK(x, y)
P: ON(x, y) ^ CLEAR(x) ^ ARMEMPTY
D: ON(x, y) ^ ARMEMPTY
A: HOLDING(x) ^ CLEAR(y)
PICKUP(x)
P: CLEAR(x) ^ ONTABLE(x) ^ ARMEMPTY
D: ONTABLE(x) ^ ARMEMPTY
A: HOLDING(x)
PUTDOWN(x)
P: HOLDING(x)
D: HOLDING(x)
A: ONTABLE(x) ^ ARMEMPTY
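These operators can be sketched in Python as dictionaries of precondition, delete and add lists, together with a function that applies an operator to a state; the ground (variable-free) encoding for two specific blocks used here is a simplification for illustration.

# A minimal sketch of STRIPS-style operators for two specific blocks, A and B.
OPERATORS = {
    "UNSTACK(A,B)": {"P": ["ON(A,B)", "CLEAR(A)", "ARMEMPTY"],
                     "D": ["ON(A,B)", "ARMEMPTY"],
                     "A": ["HOLDING(A)", "CLEAR(B)"]},
    "PUTDOWN(A)":   {"P": ["HOLDING(A)"],
                     "D": ["HOLDING(A)"],
                     "A": ["ONTABLE(A)", "ARMEMPTY"]},
}

def apply_operator(state, name):
    # Apply an operator if its preconditions hold; return the resulting state.
    op = OPERATORS[name]
    if not all(p in state for p in op["P"]):
        raise ValueError("preconditions of " + name + " not satisfied")
    return (state - set(op["D"])) | set(op["A"])

state = {"ON(A,B)", "ONTABLE(B)", "CLEAR(A)", "ARMEMPTY"}
state = apply_operator(state, "UNSTACK(A,B)")
state = apply_operator(state, "PUTDOWN(A)")
print(sorted(state))   # ['ARMEMPTY', 'CLEAR(A)', 'CLEAR(B)', 'ONTABLE(A)', 'ONTABLE(B)']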
Goal Stack Planning
To start with goal stack is simply:
ON(C,A)^ON(B,D)^ONTABLE(A)^ONTABLE(D)
This problem is separated into four sub-problems, one for each component of the goal.
Two of the sub-problems ONTABLE(A) and ONTABLE(D) are already true in the initial state.
Complete plan
1. UNSTACK(C, A)
2. PUTDOWN(C )
3. PICKUP(A)
4. STACK(A, B)
5. UNSTACK(A, B)
6. PUTDOWN(A)
7. PICKUP(B)
8. STACK(B, C)
9. PICKUP(A)
10. STACK(A,B)
Planning Components
Methods which focus on ways of decomposing the original problem into appropriate
subparts and on ways of recording and handling interactions among the subparts as they
are detected during the problem-solving process are often called planning.
Planning refers to the process of computing several steps of a problem-solving procedure
before executing any of them.
Components of a planning system
Choose the best rule to apply next, based on the best available heuristic information.
The most widely used technique for selecting appropriate rules to apply is first to isolate
a set of differences between the desired goal state and the current state, and then to identify
those rules that are relevant to reducing those differences.
If there are several rules, a variety of other heuristic information can be exploited
to choose among them.
Apply the chosen rule to compute the new problem state that arises from its application.
In simple systems, applying rules is easy. Each rule simply specifies the problem state
that would result from its application.
In complex systems, we must be able to deal with rules that specify only a small part
of the complete problem state.
One way is to describe, for each action, each of the changes it makes to the state
description.
Detect when a solution has been found.
A planning system has succeeded in finding a solution to a problem when it has found
a sequence of operators that transform the initial problem state into the goal state.
How will it know when this has been done?
In simple problem-solving systems, this question is easily answered by a
straightforward match of the state descriptions.
One representative formalism used by planning systems is predicate logic. Suppose that,
as a part of our goal, we have the predicate P(x).
To see whether P(x) is satisfied in some state, we ask whether we can prove P(x) given
the assertions that describe that state and the axioms that define the world model.
Detect dead ends so that they can be abandoned and the system's effort directed in more fruitful
directions.
As a planning system is searching for a sequence of operators to solve a particular
problem, it must be able to detect when it is exploring a path that can never lead to
a solution.
The same reasoning mechanisms that can be used to detect a solution can often be used for
detecting a dead end.
If the search process is reasoning forward from the initial state, it can prune any path
that leads to a state from which the goal state cannot be reached.
If the search process is reasoning backward from the goal state, it can also terminate a path,
either because it is sure that the initial state cannot be reached or because little progress is
being made.
Detect when an almost correct solution has been found and employ special techniques to make it
totally correct.
The kinds of techniques discussed are often useful in solving nearly decomposable
problems.
One good way of solving such problems is to assume that they are completely
decomposable, proceed to solve the sub-problems separately, and then check that,
when the sub-solutions are combined, they do in fact give a solution to the original
problem.
Hierarchical Planning
In order to solve hard problems, a problem solver may have to generate long plans.
It is important to be able to eliminate some of the details of the problem until a
solution that addresses the main issues is found.
Then an attempt can make to fill in the appropriate details.
Early attempts to do this involved the use of macro operators, in which larger operators
were built from smaller ones.
In this approach, no details are eliminated from the actual descriptions of the operators.
ABSTRIPS
A better approach was developed in the ABSTRIPS system, which actually planned in a hierarchy of
abstraction spaces, in each of which preconditions at a lower level of abstraction are ignored.
ABSTRIPS approach is as follows:
First solve the problem completely, considering only preconditions whose criticality
value is the highest possible.
These values reflect the expected difficulty of satisfying the precondition.
To do this, do exactly what STRIPS did, but simply ignore the preconditions of
lower than peak criticality.
Once this is done, use the constructed plan as the outline of a complete plan and
consider preconditions at the next-lowest criticality level.
Augment the plan with operators that satisfy those preconditions.
Because this approach explores entire plans at one level of detail before it looks at the
lower-level details of any one of them, it has been called the length-first approach.
The assignment of appropriate criticality value is crucial to the success of this
hierarchical planning method.
Those preconditions that no operator can satisfy are clearly the most critical.
For example, when solving a problem of moving a robot, for applying an operator PUSH-THROUGH-
DOOR, the precondition that there exists a door big enough for the robot to get through is of
high criticality, since there is nothing we can do about it if it is not true.
Consider an example of an English sentence which is being used for communication with a
keyword-based data retrieval system. Suppose I want to know all about the temples in India. The
request would need to be translated into a representation the retrieval system can use. The above
sentence is a simple sentence for which the corresponding representation may be easy to
implement. But what about complex queries?
Such complex queries can be modeled with the conceptual dependency representation,
which is more complex than a simple keyword representation. Constructing these queries is very
difficult, since more information has to be extracted, and extracting more information requires
some more knowledge. Also, the mapping process is not easy for the problem solver.
Understanding is the process of mapping an input from its original form to a more useful one.
The simplest kind of mapping is one-to-one.
In a one-to-one mapping, each different input leads to only one interpretation. But there are
very few inputs which are one-to-one. Other mappings are quite difficult to implement. Many-to-
one mappings are frequent because free variation is often allowed, either because of the physical
limitations of the system that produces the inputs or because such variation simply makes the
task of generating the inputs easier.
Many-to-one mappings require that the understanding system know about all the ways that a
target representation can be expressed in the source language. One-to-many mappings require a
great deal of domain knowledge in order to make the correct choice among the available target
representations.
The mapping process is simplest if each component can be mapped without concern for the other
components of the statement. If the number of interactions increases, then the complexity of the
problem will increase. In many understanding situations the input to which meaning should be
assigned is not always the input that is presented to the understander.
Because of the complex environment in which understanding usually occurs, other things often
interfere with the basic input before it reaches the understander. Hence the understanding will be
more complex if there is some sort of noise on the inputs.
Semantic Analysis
The semantic analysis must do two important things:
1. It must map individual words into appropriate objects in the knowledge base or
database.
2. It must create the correct structures to correspond to the way the meanings of
the individual words combine with each other.
Discourse Integration
Specifically, we do not know whom the pronoun “I” or the proper noun “Bill” refers to.
To pin down these references requires an appeal to a model of the current discourse
context, from which we can learn that the current user is USER068 and that the only
person named “Bill” about whom we could be talking is USER073.
Once the correct referent for Bill is known, we can also determine exactly which
file is being referred to.
Pragmatic Analysis
The final step toward effective understanding is to decide what to do as a result.
One possible thing to do is to record what was said as a fact and be done with it.
For some sentences, whose intended effect is clearly declarative, that is
precisely the correct thing to do.
But for other sentences, including this one, the intended effect is different.
We can discover this intended effect by applying a set of rules that characterize
cooperative dialogues.
The final step in pragmatic processing is to translate from the knowledge-based
representation to a command to be executed by the system.
Syntactic Processing
Syntactic processing is the step in which a flat input sentence is converted into a
hierarchical structure that corresponds to the units of meaning in the sentence.
This process is called parsing.
It plays an important role in natural language understanding systems for two reasons:
1. Semantic processing must operate on sentence constituents. If there is no syntactic
parsing step, then the semantics system must decide on its own constituents. If
parsing is done, on the other hand, it constrains the number of constituents that
semantics can consider.
2. Syntactic parsing is computationally less expensive than is semantic
processing. Thus it can play a significant role in reducing overall system
complexity.
Although it is often possible to extract the meaning of a sentence without using
grammatical facts, it is not always possible to do so.
Almost all the systems that are actually used have two main components:
1. A declarative representation, called a grammar, of the syntactic facts about
the language.
2. A procedure, called parser that compares the grammar against input sentences to
produce parsed structures.
Grammars and Parsers
The most common way to represent grammars is a set of production rules.
The first rule can be read as "A sentence is composed of a noun phrase followed by a
verb phrase"; the vertical bar means OR; ε represents the empty string.
Symbols that are further expanded by rules are called non-terminal symbols.
Symbols that correspond directly to strings that must be found in an input sentence are
called terminal symbols.
Grammar formalism such as this one underlies many linguistic theories, which in turn
provide the basis for many natural language understanding systems.
Pure context-free grammars are not effective for describing natural languages.
As a result, NLP systems have less in common with computer language processing systems such
as compilers than might be expected.
The parsing process takes the rules of the grammar and compares them against the
input sentence.
The simplest structure to build is a Parse Tree, which simply records the rules and
how they matched.
Every node of the parse tree corresponds either to an input word or to a non-terminal in
our grammar.
Each level in the parse tree corresponds to the application of one grammar rule.
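A tiny sketch of such a grammar and a naive top-down recogniser follows, using a few illustrative rules (S → NP VP, and so on); the grammar and lexicon here are assumptions for the example sentence "Bill printed the file", not the full grammar from the notes.

# A minimal context-free grammar and a naive top-down recogniser.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Name"], ["Det", "Noun"]],
    "VP": [["Verb", "NP"]],
}
LEXICON = {
    "Name": {"Bill"},
    "Det":  {"the"},
    "Noun": {"file", "printer"},
    "Verb": {"printed"},
}

def parse(symbol, words):
    # Yield the suffixes of `words` left over after deriving `symbol` from a prefix.
    if symbol in LEXICON:
        if words and words[0] in LEXICON[symbol]:
            yield words[1:]
        return
    for production in GRAMMAR[symbol]:
        remainders = [words]
        for part in production:
            remainders = [rest for r in remainders for rest in parse(part, r)]
        yield from remainders

sentence = "Bill printed the file".split()
print(any(rest == [] for rest in parse("S", sentence)))   # True: the sentence parses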
Example for Syntactic Processing – Augmented
Transition Network
Example: A parse tree for the sentence: Bill printed the file
Semantic Analysis
The structures created by the syntactic analyzer are assigned meanings.
A mapping is made between the syntactic structures and objects in the task domain.
Structures for which no such mapping is possible may be rejected.
The semantic analysis must do two important things:
It must map individual words into appropriate objects in the knowledge base or
database.
It must create the correct structures to correspond to the way the meanings of the
individual words combine with each other.
Producing a syntactic parse of a sentence is only the first step toward understanding it.
We must produce a representation of the meaning of the sentence.
Because understanding is a mapping process, we must first define the language
into which we are trying to map.
There is no single definitive language in which all sentence meanings can be described.
The choice of a target language for any particular natural language
understanding program must depend on what is to be done with the meanings once
they are constructed.
Choice of the target language
There are two broad families of target languages that are used in NL systems,
depending on the role that the natural language system is playing in a larger system:
When natural language is considered as a phenomenon on its own, as for example
when one builds a program whose goal is to read text and then answer
questions about it, a target language can be designed specifically to support
language processing.
When natural language is used as an interface language to another program (such
as a db query system or an expert system), then the target language must be a legal
input to that other program. Thus the design of the target language is driven by the
backend program.
Module 3
LEARNING
Learning is the improvement of performance with experience over time.
Learning element is the portion of a learning AI system that decides how to modify the
performance element and implements those modifications.
We all learn new knowledge through different methods, depending on the type of material to be
learned, the amount of relevant knowledge we already possess, and the environment in which the
learning takes place. There are five methods of learning . They are,
1. Memorization (rote learning)
2. Direct instruction (by being told)
3. Analogy
4. Induction
5. Deduction
Learning by memorization is the simplest form of learning. It requires the least amount of
inference and is accomplished by simply copying the knowledge in the same form that it will be
used directly into the knowledge base.
Example:- Memorizing multiplication tables, formulae, etc.
Direct instruction is a more complex form of learning. This type of learning requires more inference
than rote learning, since the knowledge must be transformed into an operational form before
being integrated into the knowledge base. We use this type of learning when a teacher presents a
number of facts directly to us in a well-organized manner.
Analogical learning is the process of learning a new concept or solution through the use of
similar known concepts or solutions. We use this type of learning when solving problems on an
exam, where previously learned examples serve as a guide; we make frequent use of
analogical learning. This form of learning requires still more inferring than either of the previous
forms, since difficult transformations must be made between the known and unknown situations.
Learning by induction is also one that is used frequently by humans. It is a powerful form of
learning which, like analogical learning, also requires more inferring than the first two methods.
This learning requires the use of inductive inference, a form of invalid but useful inference. We
use inductive learning when we formulate a general concept after seeing a number of instances
or examples of the concept. For example, we learn the concepts of color or sweet taste after
experiencing the sensations associated with several examples of colored objects or sweet foods.
Deductive learning is accomplished through a sequence of deductive inference steps using
known facts. From the known facts, new facts or relationships are logically derived. Deductive
learning usually requires more inference than the other methods.
Review Questions:-
1. What is perception?
2. How do we overcome the perceptual problems?
3. Explain in detail the constraint satisfaction Waltz algorithm.
4. What is learning?
5. What is a learning element?
6. List and explain the methods of learning.
Types of learning:- A classification or taxonomy of learning types serves as a guide in studying or
comparing the differences among them. One can develop learning taxonomies based on the type
of knowledge representation used (predicate calculus, rules, frames), the type of knowledge
learned (concepts, game playing, problem solving), or by the area of application (medical
diagnosis, scheduling, prediction and so on).
The classification used here is intuitively more appealing and is one which has become popular among
machine learning researchers. It is independent of the knowledge domain and the representation
scheme used. It is based on the type of inference strategy employed or the methods used in the
learning process. The five different learning methods under this taxonomy are:
Memorization (rote learning)
Direct instruction(by being told)
Analogy
Induction
Deduction
Learning by memorization is the simplest form of learning. It requires the least amount of
inference and is accomplished by simply copying the knowledge in the same form that it will be
used directly into the knowledge base. We use this type of learning when we memorize
multiplication tables, for example.
A slightly more complex form of learning is learning by direct instruction. This type of learning
requires more understanding and inference than rote learning, since the knowledge must be
transformed into an operational form before being integrated into the knowledge base. We use
this type of learning when a teacher presents a number of facts directly to us in a
well-organized manner.
The third type listed, analogical learning, is the process of learning a new concept or solution
through the use of similar known concepts or solutions. We use this type of learning when
solving problems on an examination, where previously learned examples serve as a guide, or
when we learn to drive a truck using our knowledge of car driving. We make frequent use of
analogical learning. This form of learning requires still more inferring than either of the previous
forms, since difficult transformations must be made between the known and unknown situations.
This is a kind of application of knowledge to a new situation.
The fourth type of learning, induction, is also one that is used frequently by humans. It is a powerful
form of learning which, like analogical learning, also requires more inferring than the first two
methods. This form of learning requires the use of inductive inference, a form of invalid but
useful inference. We use inductive learning when we formulate a general concept after seeing a
number of instances or examples of the concept. For example, we learn the concepts of color or
sweet taste after experiencing the sensations associated with several examples of colored objects
or sweet foods.
The final type of acquisition is deductive learning. It is accomplished through a sequence of
deductive inference steps using known facts. From the known facts, new facts or relationships
are logically derived. Deductive learning usually requires more inference than the other
methods. The inference method used is, of course, a deductive type, which is a valid form of
inference.
In addition to the above classification, we will sometimes refer to learning methods as either weak
methods or knowledge-rich methods. Weak methods are general-purpose methods in which little
or no initial knowledge is available. These methods are more mechanical than the classical AI
knowledge-rich methods. They often rely on a form of heuristic search in the learning process.
Rote Learning
Rote learning is the basic learning activity. Rote learning is a memorization technique based on repetition: the idea is that one will be able to recall the material more quickly the more one repeats it. It is also called memorization because the knowledge, without any modification, is simply copied into the knowledge base. As computed values are stored, this technique can save a significant amount of time.
The rote learning technique can also be used in complex learning systems, provided sophisticated techniques are employed to access the stored values quickly and some generalization is used to keep the number of stored items down to a manageable level. A checkers-playing program, for example, uses this technique to learn the board positions it evaluates in its look-ahead search. Some of the alternatives to rote learning include meaningful learning, associative learning, and active learning.
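In programming terms, rote learning of computed values amounts to caching: once a value has been computed it is stored and simply looked up the next time. The sketch below is a toy illustration of this idea (the evaluation function is a made-up stand-in, not the checkers program itself):

# Hedged sketch of rote learning as caching of computed values.
stored_values = {}                        # the "knowledge base" of memorized results

def evaluate_position(position):
    # stand-in for a costly look-ahead evaluation
    return sum(ord(c) for c in position) % 100

def rote_evaluate(position):
    if position in stored_values:         # value already memorized: just copy it out
        return stored_values[position]
    value = evaluate_position(position)   # otherwise compute it once...
    stored_values[position] = value       # ...and store it, unmodified, for reuse
    return value

print(rote_evaluate("board-17"))          # computed and stored
print(rote_evaluate("board-17"))          # retrieved without recomputation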
Learning By Taking Advice.
This is a simple form of learning. Suppose a programmer writes a set of instructions to tell the computer what to do; the programmer is a teacher and the computer is a student. Once taught (i.e. programmed), the system will be in a position to do new things.
The advice may come from many sources: human experts and the internet, to name a few. This type of learning requires more inference than rote learning. The knowledge must be transformed into an operational form before being stored in the knowledge base. Moreover, the reliability of the source of knowledge should be considered.
The system should ensure that the new knowledge does not conflict with the existing knowledge.
FOO (First Operational Operationaliser), for example, is a learning system used to learn the game of Hearts. It converts advice, given in the form of principles, problems, and methods, into effective executable (LISP) procedures (or knowledge). This knowledge is then ready to use.
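The key step here is operationalization: the teacher's general principle must be turned into an executable procedure before it can be stored. The toy sketch below is only a loose illustration of that step (the Hearts-style advice, the function names, and the card representation are all invented; this is not FOO's actual LISP machinery):

# Hedged illustration of operationalizing advice into a procedure.
# The advice "avoid taking points" becomes executable card-selection code.
def operationalize_avoid_points():
    def choose_card(hand):
        return min(hand)                  # play the lowest card so the trick is less likely won
    return choose_card

knowledge_base = {"avoid taking points": operationalize_avoid_points()}

hand = [10, 3, 7]
print(knowledge_base["avoid taking points"](hand))   # -> 3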
A general learning model is depicted in Figure 4.1, where the environment has been included as a part of the overall learner system. The environment may be regarded either as a form of nature which produces random stimuli or as a more organized training source, such as a teacher, which provides carefully selected training examples for the learner component. The actual form of environment used will depend on the particular learning paradigm. In any case, some representation language must be assumed for communication between the environment and the learner. The language may be the same representation scheme as that used in the knowledge base (such as a form of predicate calculus). When they are chosen to be the same, we say the single representation trick is being used. This usually results in a simpler implementation, since it is not necessary to transform between two or more different representations.
For some systems the environment may be a user working at a keyboard. Other systems will use program modules to simulate a particular environment. In even more realistic cases the system will have real physical sensors which interface with some world environment.
Inputs to the learner component may be physical stimuli of some type or descriptive, symbolic training examples. The information conveyed to the learner component is used to create and modify knowledge structures in the knowledge base. This same knowledge is used by the performance component to carry out some task, such as solving a problem, playing a game, or classifying instances of some concept.
Given a task, the performance component produces a response describing its action in performing the task. The critic module then evaluates this response relative to an optimal response.
Feedback, indicating whether or not the performance was acceptable, is then sent by the critic
module to the learner component for its subsequent use in modifying the structures in the
knowledge base. If proper learning was accomplished, the system’s performance will have
improved with the changes made to the knowledge base.
The cycle described above may be repeated a number of times until the performance of the system has reached some acceptable level, until a known learning goal has been reached, or until changes cease to occur in the knowledge base after some chosen number of training examples have been observed.
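Read as pseudocode, this environment–learner–performance–critic cycle is simply a training loop. The sketch below is a minimal, hypothetical instance of it (all component functions, the threshold knowledge base, and the 0.7 boundary are invented placeholders for whatever a particular system would use):

# Hedged sketch of the general learning model as a loop.
import random

knowledge_base = {"threshold": 0.5}                  # structure the learner may modify

def environment():                                   # training source (could be a teacher)
    x = random.random()
    return x, x > 0.7                                # example and its correct label

def performer(kb, x):                                # performance component uses the KB
    return x > kb["threshold"]

def critic(response, correct):                       # compares response with the optimum
    return response == correct                       # simple yes/no feedback

def learner(kb, x, feedback):                        # modifies the KB using the feedback
    if not feedback:
        kb["threshold"] = (kb["threshold"] + x) / 2  # nudge the decision boundary

for _ in range(1000):                                # repeat until performance is acceptable
    example, label = environment()
    ok = critic(performer(knowledge_base, example), label)
    learner(knowledge_base, example, ok)

print(knowledge_base)                                # threshold drifts toward the true boundary 0.7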
There are several important factors which influence a system's ability to learn, in addition to the form of representation used. They include the type of training provided, the form and extent of any initial background knowledge, the type of feedback provided, and the learning algorithms used.
The type of training used in a system can have a strong effect on performance, much as it does for humans. Training may consist of randomly selected instances, or of examples that have been carefully selected and ordered for presentation. The instances may be positive examples of some concept or task being learned, they may be negative, or they may be a mixture of both positive and negative. The instances may be well focused, using only relevant information, or they may contain a variety of facts and details including irrelevant data.
Many forms of learning can be characterized as a search through a space of possible hypotheses or solutions. To make learning more efficient, it is necessary to constrain this search process or reduce the search space. One method of achieving this is through the use of background knowledge, which can be used to constrain the search space or exercise control operations which limit the search process.
Feedback is essential to the learner component, since otherwise it would never know whether the knowledge structures in the knowledge base were improving or whether they were adequate for the performance of the given tasks. The feedback may be a simple yes-or-no type of evaluation, or it may contain more useful information describing why a particular action was good or bad. Also, the feedback may be completely reliable, providing an accurate assessment of the performance, or it may contain noise; that is, the feedback may actually be incorrect some of the time. Intuitively, the feedback must be accurate more than 50% of the time; otherwise the system will learn incorrect knowledge more often than correct knowledge. When the feedback is reliable and carries useful information, the learner should be able to build up a useful corpus of knowledge quickly. On the other hand, if the feedback is noisy or unreliable, the learning process may be very slow and the resultant knowledge incorrect.
Expert systems:
Expert system = knowledge + problem-solving methods: a knowledge base that captures the domain-specific knowledge, and an inference engine that consists of algorithms for manipulating the knowledge represented in the knowledge base to solve a problem presented to the system.
Expert systems (ES) are one of the prominent research domains of AI. They were introduced by researchers at Stanford University's Computer Science Department.
Expert systems are computer applications developed to solve complex problems in a particular domain, at a level comparable to extraordinary human intelligence and expertise.
Knowledge Base
It contains domain-specific and high-quality knowledge. Knowledge is required to exhibit intelligence. The success of any ES depends largely upon the collection of highly accurate and precise knowledge.
What is Knowledge?
Data is a collection of facts. Information is data organized as facts about the task domain. Data, information, and past experience combined together are termed knowledge.
Components of Knowledge Base
The knowledge base of an ES is a store of both factual and heuristic knowledge.
Factual Knowledge − It is the information widely accepted by the knowledge engineers and scholars in the task domain.
Heuristic Knowledge − It is about practice, accurate judgement, one's ability to evaluate, and guessing.
Knowledge representation
It is the method used to organize and formalize the knowledge in the knowledge base. It is in the
form of IF-THEN-ELSE rules.
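As a small illustration (the rules themselves are invented, and only the IF-THEN part is shown), knowledge expressed this way might be encoded as data for an inference engine:

# Hedged sketch: two made-up IF-THEN rules represented as data.
rules = [
    {"if": ["has_fever", "has_rash"],   "then": "suspect_measles"},
    {"if": ["interest_rates_rise"],     "then": "share_prices_fall"},
]
print(rules[0])   # one rule: conditions and a conclusion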
Knowledge Acquisition
The success of any expert system depends largely on the quality, completeness, and accuracy of the information stored in the knowledge base.
The knowledge base is formed from readings from various experts, scholars, and the knowledge engineers. The knowledge engineer is a person with the qualities of empathy, quick learning, and case-analyzing skills.
He acquires information from the subject expert by recording, interviewing, and observing him at work. He then categorizes and organizes the information in a meaningful way, in the form of IF-THEN-ELSE rules, to be used by the inference engine. The knowledge engineer also monitors the development of the ES.
Inference Engine
Use of efficient procedures and rules by the Inference Engine is essential in deducing a correct, flawless solution.
In case of knowledge-based ES, the Inference Engine acquires and manipulates the knowledge
from the knowledge base to arrive at a particular solution.
In case of rule based ES, it −
Applies rules repeatedly to the facts, which are obtained from earlier rule application.
Adds new knowledge into the knowledge base if required.
Resolves rule conflicts when multiple rules are applicable to a particular case.
To recommend a solution, the Inference Engine uses the following strategies −
Forward Chaining
Backward Chaining
Forward Chaining
It is a strategy of an expert system to answer the question, “What can happen next?”
Here, the Inference Engine follows the chain of conditions and derivations and finally deduces
the outcome. It considers all the facts and rules, and sorts them before arriving at a solution.
This strategy is used when working toward a conclusion, result, or effect; for example, prediction of share-market status as an effect of changes in interest rates.
Backward Chaining
With this strategy, an expert system finds out the answer to the question, "Why did this happen?"
On the basis of what has already happened, the Inference Engine tries to find out which
conditions could have happened in the past for this result. This strategy is followed for finding
out cause or reason. For example, diagnosis of blood cancer in humans.
User Interface
The user interface provides interaction between the user of the ES and the ES itself. It generally uses natural language processing, so that it can be used by a user who is well-versed in the task domain. The user of the ES need not necessarily be an expert in Artificial Intelligence.
It explains how the ES has arrived at a particular recommendation. The explanation may appear
in the following forms −
Natural language displayed on screen.
Verbal narrations in natural language.
Listing of rule numbers displayed on the screen.
The user interface makes it easy to trace the credibility of the deductions.
Requirements of Efficient ES User Interface
It should help users to accomplish their goals in shortest possible way.
It should be designed to work for user’s existing or desired work practices.
Its technology should be adaptable to user’s requirements; not the other way round.
It should make efficient use of user input.
Expert Systems Limitations
No technology can offer an easy and complete solution. Large systems are costly and require significant development time and computer resources. ESs have their limitations, which include −
Limitations of the technology
Difficult knowledge acquisition
ES are difficult to maintain
High development costs
Applications of Expert System
Expert systems have been applied in domains such as medical diagnosis, scheduling, prediction, and game playing (for example, chess).
Expert System.
DEFINITION - An expert system is a computer program that simulates the judgement and
behavior of a human or an organization that has expert knowledge and experience in a particular
field. Typically, such a system contains a knowledge base containing accumulated experience
and a set of rules for applying the knowledge base to each particular situation that is described to
the program. Sophisticated expert systems can be enhanced with additions to the knowledge base
or to the set of rules.
Among the best-known expert systems have been those that play chess and that assist in medical
diagnosis.
Certainty factors
The MYCIN rule-based expert system introduced a quasi-probabilistic approach called certainty
factors, whose rationale is explained below.
A human, when reasoning, does not always make statements with 100% confidence: he might
venture, "If Fritz is green, then he is probably a frog" (after all, he might be a chameleon). This
type of reasoning can be imitated using numeric values called confidences. For example, if it is
known that Fritz is green, it might be concluded with 0.85 confidence that he is a frog; or, if it is
known that he is a frog, it might be concluded with 0.95 confidence that he hops. These certainty
factor (CF) numbers quantify uncertainty in the degree to which the available evidence supports
a hypothesis. They represent a degree of confirmation, and are not probabilities in a Bayesian
sense. The CF calculus, developed by Shortliffe & Buchanan, increases or decreases the CF
associated with a hypothesis as each new piece of evidence becomes available. It can be mapped
to a probability update, although degrees of confirmation are not expected to obey the laws of
probability. It is important to note, for example, that evidence for hypothesis H may have nothing to contribute to the degree to which not-H is confirmed or disconfirmed (e.g., although a fever lends some support to a diagnosis of infection, fever does not disconfirm alternative hypotheses), and that the sum of the CFs of many competing hypotheses may be greater than one (i.e., many hypotheses may be well confirmed by the available evidence).
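For concreteness, the usual MYCIN-style rule for combining two certainty factors that bear on the same hypothesis is CF = CF1 + CF2(1 − CF1) when both are positive, with analogous forms for negative and mixed signs. The sketch below assumes these standard Shortliffe–Buchanan formulas; the numeric example is invented:

# Hedged sketch of the standard MYCIN-style CF combination; values lie in [-1, 1].
def combine_cf(cf1, cf2):
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)              # both pieces of evidence confirming
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)              # both disconfirming
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))   # mixed evidence

# Two pieces of evidence each giving 0.6 support combine to 0.84,
# growing toward (but never exceeding) 1.0:
print(combine_cf(0.6, 0.6))    # 0.84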
The CF approach to a rule-based expert system design does not have a widespread following, in
part because of the difficulty of meaningfully assigning CFs a priori. (The above example of
green creatures being likely to be frogs is excessively naive.) Alternative approaches to quasi-
probabilistic reasoning in expert systems involve fuzzy logic, which has a firmer mathematical
foundation. Also, rule-engine shells such as Drools and Jess do not support probability
manipulation: they use an alternative mechanism called salience, which is used to prioritize the
order of evaluation of activated rules.
In certain areas, as in the tax-advice scenarios discussed below, probabilistic approaches are not
acceptable. For instance, a 95% probability of being correct means a 5% probability of being
wrong. The rules that are defined in such systems have no exceptions: they are only a means of
achieving software flexibility when external circumstances change frequently. Because rules
are stored as data, the core software does not need to be rebuilt each time changes to federal and
state tax codes are announced.
Chaining
Two methods of reasoning when using inference rules are forward chaining and backward
chaining.
Forward chaining starts with the data available and uses the inference rules to extract more data
until a desired goal is reached. An inference engine using forward chaining searches the
inference rules until it finds one in which the if clause is known to be true. It then concludes the
then clause and adds this information to its data. It continues to do this until a goal is reached.
Because the data available determines which inference rules are used, this method is also
classified as data driven.
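A minimal sketch of this data-driven procedure follows (the rules and facts are made up, and a real engine would also perform conflict resolution among applicable rules):

# Hedged sketch of forward chaining: repeatedly fire rules whose
# if-parts are satisfied by known facts, adding their then-parts as new data.
rules = [
    ({"interest_rates_rise"},               "bond_prices_fall"),
    ({"bond_prices_fall", "earnings_weak"}, "share_prices_fall"),
]

facts = {"interest_rates_rise", "earnings_weak"}

changed = True
while changed:                                 # continue until no rule adds new data
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)              # conclude the then-clause
            changed = True

print(facts)        # share_prices_fall is derived from the initial data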
Backward chaining starts with a list of goals and works backwards to see if there is data which
will allow it to conclude any of these goals. An inference engine using backward chaining would
search the inference rules until it finds one which has a then clause that matches a desired goal.
If the if clause of that inference rule is not known to be true, then it is added to the list of goals.
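A comparable goal-driven sketch (again with invented rules and facts): to establish a goal, find a rule whose then-clause matches it and recursively treat that rule's if-clauses as sub-goals:

# Hedged sketch of backward chaining over the same style of rules.
rules = [
    ({"has_fever", "has_rash"}, "suspect_measles"),
    ({"temperature_high"},      "has_fever"),
]

known_facts = {"temperature_high", "has_rash"}

def prove(goal):
    if goal in known_facts:                        # goal already known to be true
        return True
    for conditions, conclusion in rules:
        if conclusion == goal:                     # then-clause matches the goal
            if all(prove(c) for c in conditions):  # if-clauses become new sub-goals
                return True
    return False

print(prove("suspect_measles"))    # True: has_fever follows from temperature_high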
SW Architecture.
The following general points about expert systems and their architecture have been outlined:
1. The sequence of steps taken to reach a conclusion is dynamically synthesized with each new
case. The sequence is not explicitly programmed at the time that the system is built.
2. Expert systems can process multiple values for any problem parameter. This permits
more than one line of reasoning to be pursued and the results of incomplete (not fully
determined) reasoning to be presented.
End user
There are two styles of user-interface design followed by expert systems. In the original style of
user interaction, the software takes the end-user through an interactive dialog. In the following
example, a backward-chaining system seeks to determine a set of restaurants to recommend:
Q. Is there any kind of food you would particularly like?
A. No
(Further questions and yes/no answers follow until the system can recommend a set of restaurants.)
Participants
There are generally three individuals having an interaction in an expert system. Primary among
these is the end-user, the individual who uses the system for its problem solving assistance. In
the construction and maintenance of the system there are two other roles: the problem domain
expert who builds the system and supplies the knowledge base, and a knowledge engineer who
assists the experts in determining the representation of their knowledge, enters this knowledge
into an explanation module and who defines the inference technique required to solve the
problem. Usually the knowledge engineer will represent the problem solving activity in the form
of rules. When these rules are created from domain expertise, the knowledge base stores the
rules of the expert system.
Procedure node interface
The function of the procedure node interface is to receive information from the procedures
coordinator and create the appropriate procedure call. The ability to call a procedure and receive
information from that procedure can be viewed as simply a generalization of input from the
external world. In some earlier expert systems external information could only be obtained in a
predetermined manner, which only allowed certain information to be acquired. Through the
knowledge base, this expert system disclosed in the cross-referenced application can invoke any
procedure allowed on its host system. This makes the expert system useful in a much wider class
of knowledge domains than if it had no external access or only limited external access.
In the area of machine diagnostics using expert systems, particularly self-diagnostic applications,
it is not possible to conclude the current state of "health" of a machine without some information.
The best source of information is the machine itself, for it contains much detailed information
that could not reasonably be provided by the operator.
The knowledge that is represented in the system appears in the rulebase. In the rulebase
described in the cross-referenced applications, there are basically four different types of
objects, with the associated information:
2. Parameters: Place holders for character strings which may be variables that can be
inserted into a class question at the point in the question where the parameter is positioned.
3. Rule nodes: Inferences in the system are made by a tree structure which indicates the
rules or logic mimicking human reasoning. The nodes of these trees are called rule nodes.
There are several different types of rule nodes.
Expert Systems/Shells. The ES shell simplifies the process of creating a knowledge base. It is the shell that actually processes the information entered by a user, relates it to the concepts contained in the knowledge base, and provides an assessment or solution for a particular problem.
Knowledge Acquisition
Knowledge acquisition is the process used to define the rules and ontologies required for
a knowledge-based system. The phrase was first used in conjunction with expert systems to
describe the initial tasks associated with developing an expert system, namely finding and
interviewing domain experts and capturing their knowledge via rules, objects, and frame-
based ontologies.
Expert systems were one of the first successful applications of artificial intelligence technology
to real world business problems. Researchers at Stanford and other AI laboratories worked with
doctors and other highly skilled experts to develop systems that could automate complex tasks
such as medical diagnosis. Until this point computers had mostly been used to automate highly
data intensive tasks but not for complex reasoning. Technologies such as inference
engines allowed developers for the first time to tackle more complex problems.
As expert systems scaled up from demonstration prototypes to industrial-strength applications, it was soon realized that the acquisition of domain expert knowledge was one of, if not the, most critical tasks in the knowledge engineering process. This knowledge acquisition process became an intense area of research on its own. One of the earlier works on the topic used Batesonian theories of learning to guide the process.
One approach to knowledge acquisition investigated was to use natural language parsing and
generation to facilitate knowledge acquisition. Natural language parsing could be performed on
manuals and other expert documents and an initial first pass at the rules and objects could be
developed automatically. Text generation was also extremely useful in generating explanations
for system behavior. This greatly facilitated the development and maintenance of expert systems.
A more recent approach to knowledge acquisition is a re-use based approach. Knowledge can be
developed in ontologies that conform to standards such as the Web Ontology Language
(OWL). In this way knowledge can be standardized and shared across a broad community
of knowledge workers. One example domain where this approach has been successful
is bioinformatics.