
MODULE-2 KNOWLEDGE REPRESENTATION 18CS71

KNOWLEDGE REPRESENTATION
Knowledge plays an important role in AI systems. The kinds of knowledge that might need to be
represented in AI systems include:
• Objects: Facts about objects in our world domain. e.g. Guitars have strings, trumpets
are brass instruments.
• Events: Actions that occur in our world. e.g. Steve Vai played the guitar in Frank
Zappa's Band.
• Performance: A behavior like playing the guitar involves knowledge about how to do
things.
• Meta-knowledge: Knowledge about what we know. e.g. Bobrow's robot, which plans a
trip: it knows that it can read street signs along the way to find out where it is.
For the purpose of solving complex problems encountered in AI, we need both a large amount
of knowledge and some mechanism for manipulating that knowledge to create solutions to new
problems. A variety of ways of representing knowledge (facts) have been exploited in AI
programs. In all varieties of knowledge representation, we deal with two kinds of entities.
A. Facts: Truths in some relevant world. These are the things we want to represent.
B. Representations of facts in some chosen formalism. These are the things we will actually be
able to manipulate.
One way to think of structuring these entities is at two levels:
(a) the knowledge level, at which facts are described, and
(b) the symbol level, at which representations of objects at the knowledge level are defined in
terms of symbols that can be manipulated by programs.
The facts and representations are linked with two-way mappings. This link is called
representation mappings. The forward representation mapping maps from facts to
representations. The backward representation mapping goes the other way, from
representations to facts.
One common representation is natural language (particularly English) sentences. Regardless of
the representation for facts we use in a program, we may also need to be concerned with an
English representation of those facts in order to facilitate getting information into and out of the
system. We need mapping functions from English sentences to the representation we actually
use, and from it back to sentences.

Nagarathna C, DEPT. of ISE, SVIT, Bengaluru 1



REPRESENTATIONS AND MAPPINGS


• In order to solve complex problems encountered in artificial intelligence, one needs both
a large amount of knowledge and some mechanism for manipulating that knowledge to
create solutions.
• Knowledge and Representation are two distinct entities. They play central but
distinguishable roles in the intelligent system.
• Knowledge is a description of the world. It determines a system’s competence by what
it knows.
• Representation is the way knowledge is encoded. It defines a system’s
performance in doing something.
• Different types of knowledge require different kinds of representation.

Fig: Mapping between Facts and Representations


The model in the above figure focuses on facts, on representations, and on the two-way mappings
that must exist between them. These links are called Representation Mappings.
- The forward representation mapping maps from Facts to Representations.
- The backward representation mapping maps from Representations to Facts.
English or natural language is an obvious way of representing and handling facts. Regardless
of the representation for facts we use in a program, we also need to be concerned with an English
representation of those facts in order to facilitate getting information into or out of the system.
The Knowledge Representation models/mechanisms are often based on:
• Logic
• Rules
• Frames
• Semantic Net
Knowledge is categorized into two major types:
1. Tacit knowledge corresponds to “informal” or “implicit” knowledge:
• Exists within a human being;
• It is embodied.
• Difficult to articulate formally.
• Difficult to communicate or share.
• Hard to steal or copy.
• Drawn from experience, action, subjective insight.
2. Explicit knowledge corresponds to the “formal” type of knowledge:
• Exists outside a human being;
• It is embedded.
• Can be articulated formally.
• Can be shared, copied, processed and stored.
• Easy to steal or copy.
• Drawn from an artifact of some type, such as a principle, procedure, process, or concept.
A variety of ways of representing knowledge have been exploited in AI programs.
There are two different kinds of entities we are dealing with.
1. Facts: Truths in some relevant world. Things we want to represent.
2. Representations of facts in some chosen formalism. Things we will actually be able to
manipulate.
These entities are structured at two levels:
1. The knowledge level, at which facts are described.
2. The symbol level, at which representations of objects are defined in terms of symbols
that can be manipulated by programs.
FRAMEWORK OF KNOWLEDGE REPRESENTATION
• The computer requires a well-defined problem description to process, and provides a
well-defined, acceptable solution.
• To collect fragments of knowledge, we first need to formulate a description in our
spoken language and then represent it in a formal language so that the computer can
understand it.
• The computer can then use an algorithm to compute an answer.
This process is illustrated as follows:

Fig: Knowledge Representation Framework


The steps are:
• The informal formalization of the problem takes place first.
• It is then represented formally and the computer produces an output.
• This output can then be represented as an informally described solution that the user
understands or checks for consistency.
Problem solving requires:
• Formal knowledge representation, and
• Conversion of informal knowledge to formal knowledge, that is, the
conversion of implicit knowledge to explicit knowledge.
➢ Mapping functions from English sentences to representations: mathematical logic as a
representational formalism.
Example:
“Spot is a dog”
➢ The fact represented by that English sentence can also be represented in logic as:
dog(Spot)
➢ Suppose that we also have a logical representation of the fact that all dogs have tails:
∀x: dog(x) → hastail(x)
➢ Then, using the deductive mechanisms of logic, we may generate the new
representation object: hastail(Spot)
➢ Using an appropriate backward mapping function, the English sentence “Spot has a
tail” can be generated.
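The forward mapping, deduction step, and backward mapping described above can be sketched in a few lines of Python. The tuple encoding of facts and the English sentence templates are illustrative assumptions of this sketch, not part of the original text:

```python
# Facts are stored as ("predicate", "argument") tuples -- the internal representation.
facts = {("dog", "Spot")}

# Forward mapping: English sentence -> internal representation (illustrative pattern).
def forward(sentence):
    subject, _, noun = sentence.partition(" is a ")
    return (noun.strip(), subject.strip())

# Deductive step encoding the rule dog(x) -> hastail(x).
def deduce(facts):
    return facts | {("hastail", x) for (p, x) in facts if p == "dog"}

# Backward mapping: internal representation -> English sentence.
def backward(fact):
    templates = {"dog": "{0} is a dog", "hastail": "{0} has a tail"}
    predicate, arg = fact
    return templates[predicate].format(arg)

facts.add(forward("Spot is a dog"))
facts = deduce(facts)
print(backward(("hastail", "Spot")))   # -> Spot has a tail
```

The point of the sketch is that the program manipulates only the symbolic tuples; English appears only at the boundaries, via the two mapping functions.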


Mapping between Facts and Representation
• Knowledge is a collection of facts from some domain.
• We need a representation of “facts” that can be manipulated by a program.
• Normal English is insufficient: it is currently too hard for a computer program to
draw inferences in natural language.

• Thus some symbolic representation is necessary.


A good knowledge representation enables fast and accurate access to knowledge and
understanding of the content.
A knowledge representation system should have following properties.
1. Representational Adequacy
• The ability to represent all kinds of knowledge that are needed in that domain.
2. Inferential Adequacy
• The ability to manipulate the representational structures to derive new structures
corresponding to new knowledge inferred from old.
3. Inferential Efficiency
• The ability to incorporate additional information into the knowledge structure that can
be used to focus the attention of the inference mechanisms in the most promising
direction.
4. Acquisitional Efficiency
• The ability to acquire new knowledge using automatic methods wherever
possible, rather than relying on human intervention.
• Fact-representation mappings may not be one-to-one; rather, they are many-to-many, which
is a characteristic of English representations. A good representation can make a reasoning
program simple.
Example:
“ All dogs have tails”
“ Every dog has a tail”

• From either statement we can conclude that “Each dog has a tail.” From statement 1,
however, one might also read that each dog has more than one tail.
• When we try to convert an English sentence into some other representation, such as logical
propositions, we first decode what facts the sentence represents and then convert those
facts into the new representation. When an AI program manipulates the internal
representation of facts, the new representations should also be interpretable as new
representations of facts.
Mutilated Checkerboard Problem:
Problem: In a normal chessboard, two opposite corner squares have been eliminated. The given
task is to cover all the squares on the remaining board with dominoes so that each domino covers
two squares. No overlapping of dominoes is allowed. Can it be done? Consider three data
structures.

The first representation does not directly suggest the answer to the problem. The second may
suggest it. The third representation does, when combined with the single additional fact that
each domino must cover exactly one white square and one black square.

The puzzle is impossible to complete. A domino placed on the chessboard will always cover
one white square and one black square. Therefore, a collection of dominoes placed on the board
will cover an equal number of squares of each color. If the two white corners are removed from
the board, then 30 white squares and 32 black squares remain to be covered by dominoes, so
this is impossible. If the two black corners are removed instead, then 32 white squares and 30
black squares remain, so it is again impossible.
In short, a positive solution requires that the numbers of white and black squares be equal.
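The color-counting argument can be checked with a short Python sketch. Note that it encodes only the necessary parity condition, not a full tiling search:

```python
# Colour of square (row, col): 0 or 1, alternating like a chessboard.
def colour(row, col):
    return (row + col) % 2

# Count squares of each colour on an 8x8 board with the given squares removed.
def colour_counts(removed):
    counts = [0, 0]
    for row in range(8):
        for col in range(8):
            if (row, col) not in removed:
                counts[colour(row, col)] += 1
    return counts

# A domino always covers one square of each colour, so a tiling can
# exist only if both colours occur equally often.
def tiling_possible(removed):
    counts = colour_counts(removed)
    return counts[0] == counts[1]

# Opposite corners (0,0) and (7,7) have the same colour:
print(colour_counts({(0, 0), (7, 7)}))   # -> [30, 32]
print(tiling_possible({(0, 0), (7, 7)})) # -> False
```

This mirrors the third representation in the text: once colour is made explicit, the impossibility follows from a simple count.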

In the above figure, the dotted line across the top represents the abstract reasoning process that
a program is intended to model. The solid line across the bottom represents the concrete
reasoning process that a particular program performs. The program successfully models the
abstract process to the extent that, when the backward representation mapping is applied to the
program’s output, the appropriate final facts are actually generated.


If no good mapping can be defined for a problem, then no matter how good the program to
solve the problem is, it will not be able to produce answers that correspond to real answers to
the problem.

Knowledge Representation Schemes


Using Knowledge
Let us consider to what applications, and how, knowledge may be used.
• Learning: acquiring knowledge. This is more than simply adding new facts to a
knowledge base. New data may have to be classified prior to storage for easy retrieval,
and it must interact, through inference, with existing facts to avoid redundancy and
replication in the knowledge, and so that facts can be updated.
• Retrieval: The representation scheme used can have a critical effect on the efficiency of
retrieval. Humans are very good at it; many AI methods have tried to model human retrieval.
• Reasoning: Infer facts from existing data.
If a system only knows:
• Miles Davis is a Jazz Musician.
• All Jazz Musicians can play their instruments well.
then questions like “Is Miles Davis a Jazz Musician?” or “Can Jazz Musicians play their
instruments well?” can be answered directly from the data structures and procedures.
However, a question like “Can Miles Davis play his instrument well?” requires reasoning.
The above are all related. For example, it is fairly obvious that learning and reasoning involve
retrieval etc.
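The Miles Davis example above can be sketched as a minimal rule-based lookup in Python; the predicate and rule names are illustrative assumptions:

```python
# Known facts and one rule, following the text's example.
facts = {("jazz_musician", "Miles Davis")}
# Rule: all jazz musicians can play their instruments well.
rules = [("jazz_musician", "plays_well")]

def holds(predicate, subject):
    """Answer a query either directly from the facts or by one rule step."""
    if (predicate, subject) in facts:
        return True                      # direct lookup (retrieval)
    for premise, conclusion in rules:
        if conclusion == predicate and holds(premise, subject):
            return True                  # inferred, not stored (reasoning)
    return False

print(holds("jazz_musician", "Miles Davis"))  # direct: True
print(holds("plays_well", "Miles Davis"))     # requires reasoning: True
print(holds("plays_well", "John Coltrane"))   # not derivable here: False
```

The first query is pure retrieval; the second succeeds only because the rule chains the stored fact to the queried predicate.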
The following properties should be possessed by a knowledge representation system.
• Representational Adequacy: the ability to represent all kinds of knowledge that are
needed in that domain;
• Inferential Adequacy: the ability to manipulate the knowledge represented to produce
new knowledge corresponding to that inferred from the original;
• Inferential Efficiency: the ability to incorporate into the knowledge structure additional
information that can be used to focus the attention of the inference mechanisms in the most
promising directions.
• Acquisitional Efficiency: the ability to acquire new information easily. The simplest case
involves direct insertion, by a person, of new knowledge into the database.

Ideally, the program itself would be able to control knowledge acquisition. No single system
that optimizes all of the capabilities for all kinds of knowledge has yet been found.
As a result, multiple techniques for knowledge representation exist.
Relational Knowledge
• The simplest way to represent declarative facts is a set of relations of the same sort used
in the database system.
• Provides a framework to compare two objects based on equivalent attributes.
o Any instance in which two different objects are compared is a relational type of
knowledge.
• The table below shows a simple way to store facts. The facts about a set of objects
are put systematically in columns. This representation provides little opportunity for
inference.

• Given the facts, it is not possible to answer a simple question such as: “Who is the
heaviest player?”
• But if a procedure for finding the heaviest player is provided, then these facts will
enable that procedure to compute an answer.
• We can also ask things like who “bats – left” and “throws – right”.
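The idea can be sketched in Python. The rows below are illustrative stand-ins (the document's actual table is not reproduced here), and the field names and values are assumptions of this sketch:

```python
# A hypothetical relational table in the spirit of the text (rows are illustrative).
players = [
    {"name": "Hank Aaron",  "height": 6.0, "weight": 180, "bats": "right", "throws": "right"},
    {"name": "Willie Mays", "height": 5.9, "weight": 170, "bats": "right", "throws": "right"},
    {"name": "Stan Musial", "height": 6.0, "weight": 175, "bats": "left",  "throws": "right"},
]

# The table alone cannot answer "Who is the heaviest player?" --
# but a procedure over the table can:
def heaviest(table):
    return max(table, key=lambda row: row["weight"])["name"]

# Likewise we can ask who bats left and throws right:
def bats_left_throws_right(table):
    return [r["name"] for r in table if r["bats"] == "left" and r["throws"] == "right"]

print(heaviest(players))                 # -> Hank Aaron
print(bats_left_throws_right(players))   # -> ['Stan Musial']
```

This illustrates the text's point: the relational representation stores facts passively, and all inference lives in the procedures applied to it.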
Inheritable Knowledge
• Here the knowledge elements inherit attributes from their parents.
• The knowledge embodied in the design hierarchies found in the functional, physical and
process domains.
• Within the hierarchy, elements inherit attributes from their parents, but in many cases
not all attributes of the parent elements are prescribed to the child elements.
• Inheritance is a powerful form of inference, but it is not adequate on its own.
• The basic KR (Knowledge Representation) scheme needs to be augmented with an
inference mechanism.
• Property inheritance: the objects or elements of specific classes inherit attributes and
values from more general classes.
• The classes are organized in a generalization hierarchy.

• Boxed nodes — objects and values of attributes of objects.


• Arrows — point from an object to its value.
• This structure is known as a slot and filler structure, semantic network or a collection of
frames.
The steps to retrieve a value for an attribute of an instance object:
1. Find the object in the knowledge base.
2. If there is a value for the attribute, report it.
3. Otherwise, look for a value of the instance attribute; if there is none, fail.
4. Otherwise, go to the node corresponding to that value and look for a value for the
attribute; if one is found, report it.
5. Otherwise, search upward through isa links until a value is found for the attribute, or fail.
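The retrieval steps above can be sketched as a small Python lookup over a slot-and-filler structure. The node and attribute names (Person, Baseball-Player, Pitcher, Three-Finger-Brown, bats) are illustrative assumptions:

```python
# A slot-and-filler structure: each node has attribute/value pairs, and the
# special slots "instance" and "isa" link to more general nodes.
kb = {
    "Person":             {"isa": None, "legs": 2},
    "Baseball-Player":    {"isa": "Person", "bats": "right"},
    "Pitcher":            {"isa": "Baseball-Player"},
    "Three-Finger-Brown": {"instance": "Pitcher"},
}

def get_value(obj, attribute):
    """Retrieve an attribute, climbing instance/isa links if it is not local."""
    node = obj
    while node is not None:
        slots = kb[node]
        if attribute in slots and attribute not in ("instance", "isa"):
            return slots[attribute]          # found locally: report it
        node = slots.get("instance") or slots.get("isa")  # climb the hierarchy
    return None                              # fail: no value found anywhere

print(get_value("Three-Finger-Brown", "bats"))  # inherited from Baseball-Player
print(get_value("Three-Finger-Brown", "legs"))  # inherited from Person
```

Each loop iteration is one hop up the hierarchy, exactly the instance-then-isa search the numbered steps describe.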
Inferential Knowledge
• This knowledge generates new information from the given information.
• This new information does not require further data gathering from the source, but does
require analysis of the given information to generate new knowledge.
• Example: given a set of relations and values, one may infer other values or relations.
Predicate logic (mathematical deduction) is used to infer from a set of attributes.
• Inference through predicate logic uses a set of logical operations to relate
individual data.
• Represent knowledge as formal logic: All dogs have tails ∀x: dog(x) → hastail(x)
• Advantages:
• A set of strict rules.
• Can be used to derive more facts.

• Truths of new statements can be verified.
• Guaranteed correctness.
• Many inference procedures are available to implement standard rules of logic, popular
in AI systems, e.g. automated theorem proving.

Declarative Knowledge
• A statement in which knowledge is specified, but the use to which that knowledge is to be put
is not given.
• Example: laws, people’s names; these are facts which can stand alone, not dependent on other
knowledge.
Procedural Knowledge
• A representation in which the control information needed to use the knowledge is embedded
in the knowledge itself. For example, computer programs, directions, and recipes; these
indicate a specific use or implementation.
• Knowledge is encoded in procedures: small programs that know how to
do specific things, how to proceed.

Advantages:
• Heuristic or domain-specific knowledge can be represented.
• Extended logical inferences, such as default reasoning, are facilitated.
• Side effects of actions may be modeled. Some rules may become false over time;
keeping track of this in large systems may be tricky.

Disadvantages:
• Completeness — not all cases may be representable.
• Consistency — not all deductions may be correct. e.g. If we know that Fred is a bird we
might deduce that Fred can fly. Later we might discover that Fred is an emu.
• Modularity is sacrificed. Changes in the knowledge base might have far-reaching effects.
• Cumbersome control information.

Issues in Knowledge Representation


Below are listed issues that should be raised when using knowledge representation techniques:
1. Important Attributes
2. Relationships among Attributes
3. Choosing the Granularity of Representation
4. Representing Sets of Objects
5. Finding the Right Structures as Needed

1. Important Attributes:
There are two attributes that are of very general significance, and we have already seen
their use: instance and isa. They support property inheritance. They are called a variety
of things in AI systems.
2. Relationships among Attributes
There are four such properties that deserve mention here:
• Inverses
• Existence in an isa hierarchy
• Techniques for reasoning about values
• Single-valued attributes
a. Inverses:
• Entities in the world are related to each other in many different ways. But as
soon as we decide to describe those relationships as attributes, we commit to a
perspective in which we focus on one object and look for binary relationships
between it and others. Attributes are those relationships. There are two approaches.
• The first is to represent both relationships in a single representation that ignores
focus. Logical representations are usually interpreted as doing this. For example,
the assertion:

team(Pee-Wee-Reese, Brooklyn-Dodgers)
• The second approach is to use attributes that focus on a single entity but to use
them in pairs, one the inverse of the other. In this approach, we would represent
the team information with two attributes:
• one associated with Pee Wee Reese:
team = Brooklyn-Dodgers
• one associated with Brooklyn Dodgers:
team-members = Pee-Wee-Reese....
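Maintaining such an attribute pair can be sketched in Python, using the document's team / team-members example (the helper function is an illustration, not a fixed API):

```python
# Keeping an attribute and its inverse consistent: team / team-members.
team = {}            # player -> team name
team_members = {}    # team name -> list of players

def add_to_team(player, team_name):
    """Assert team(player) and maintain the inverse slot automatically."""
    team[player] = team_name
    team_members.setdefault(team_name, []).append(player)

add_to_team("Pee-Wee-Reese", "Brooklyn-Dodgers")

print(team["Pee-Wee-Reese"])            # -> Brooklyn-Dodgers
print(team_members["Brooklyn-Dodgers"]) # -> ['Pee-Wee-Reese']
```

Routing every assertion through one update function is what keeps the two focused attributes from drifting out of sync.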
b. Existence in an isa hierarchy:
• This is about generalization-specialization, like classes of objects and
specialized subsets of those classes. There are attributes and specialization of
attributes.
• Example: The attribute Height. It is actually a specialization of the more general
attribute physical-size which is, in turn, a specialization of physical-attribute.
These generalization-specialization relationships are important because they
support inheritance. This also provides information about constraints on the
values that the attribute can have and mechanisms for computing those values.
c. Techniques for Reasoning about Values
• Information about the type of the value. For example, the value of Height must
be a number measured in a unit of length.
• Constraints on the value, often stated in terms of related entities. For example,
the age of a person cannot be greater than the age of either of that person’s
parents.
• Rules for computing the value when it is needed. We showed an example of
such a rule in Fig. 4.5 for the bats attribute. These rules are called backward
rules; such rules have also been called if-needed rules.
• Rules that describe actions that should be taken if a value ever becomes
known. These rules are called forward rules, or sometimes if-added rules.

d. Single-Valued Attributes
Knowledge-representation systems have taken several different approaches to
providing support for single-valued attributes, including:

• Introduce an explicit notation for temporal interval. If two different values are
ever asserted for the same temporal interval, signal a contradiction
automatically.
• Assume that the only temporal interval that is of interest is now. So if a new
value is asserted, replace the old value.
• Provide no explicit support. Logic-based systems are in this category. But in
these systems, knowledge base builders can add axioms that state that if an
attribute has one value then it is known not to have all other values.
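Two of these approaches can be sketched in Python; the slot names, intervals, and values below are illustrative assumptions:

```python
# Approach: only "now" matters -- asserting a new value replaces the old one.
slots = {}
def assert_now(obj, attribute, value):
    slots[(obj, attribute)] = value   # old value is simply overwritten

# Approach: explicit temporal intervals -- two different values asserted for
# the same interval signal a contradiction automatically.
timed_slots = {}
def assert_during(obj, attribute, interval, value):
    key = (obj, attribute, interval)
    if key in timed_slots and timed_slots[key] != value:
        raise ValueError("contradiction: conflicting values for " + str(key))
    timed_slots[key] = value

assert_now("Pee-Wee-Reese", "team", "Brooklyn-Dodgers")
assert_now("Pee-Wee-Reese", "team", "Dodgers")   # replaces, no complaint
assert_during("Pee-Wee-Reese", "team", (1940, 1958), "Brooklyn-Dodgers")
try:
    assert_during("Pee-Wee-Reese", "team", (1940, 1958), "Giants")
except ValueError as e:
    print(e)   # contradiction detected
```

The third approach in the text (logic-based systems with explicit uniqueness axioms) provides no such built-in machinery; the knowledge-base builder supplies the axioms instead.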

3. Choosing the Granularity of Representation


• Primitives are fundamental concepts such as holding, seeing, and playing. As English
is a very rich language, with over half a million words, it is clear we will find it
difficult to decide which words to choose as our primitives in a given
situation.
• Separate levels of understanding require different levels of primitives, and these need
many rules to link together similar primitives.
• Suppose we are interested in following facts:
John spotted Sue.
• This could be represented as
spotted(agent(John), object(Sue))
• Such a representation would make it easy to answer questions such as:
Who spotted Sue?
• But suppose we want to know:
Did John see Sue?
• Given only the one fact, we cannot discover that answer.
• We can add other facts, such as
spotted(x, y) → saw(x, y)
An alternative solution to this problem is to represent explicitly, in the representation of the
fact, that spotting is really a special type of seeing. We might write something such as
saw(agent(John),
object(Sue),
timespan(briefly))

4. Representing Sets of Objects


There are several reasons to represent sets of objects:
a. Some properties are true of sets but not of the individual members of a set.
As examples, consider the assertions being made in the sentences “There are
more sheep than people in Australia” and “English speakers can be found all over the
world.” The only way to represent the facts described in these sentences is to attach
assertions to the sets representing people, sheep, and English speakers, since, for
example, no single English speaker can be found all over the world.
b. If a property is true of all (or even most) elements of a
set, then it is more efficient to associate it once with the set rather than to associate it
explicitly with every element of the set.
There are three obvious ways in which sets may be represented:
a. By a name: as in the example node Baseball-Player, and predicates such as Ball and Batter in
logical representation.
b. An extensional definition lists the members, and
c. An intensional definition provides a rule that returns true or false depending on
whether the object is in the set or not.
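The extensional/intensional distinction can be sketched in Python; the example sets are illustrative assumptions:

```python
# Extensional definition: list the members explicitly.
small_primes = {2, 3, 5, 7}

# Intensional definition: a rule that returns true or false depending on
# whether the object is in the set.
def is_small_prime(n):
    return n in (2, 3, 5, 7)

def is_even(n):
    # An intensional set with infinitely many members -- it could never
    # be written out extensionally.
    return n % 2 == 0

print(3 in small_primes)  # -> True
print(is_even(10**9))     # membership decidable even for a huge element: True
```

The intensional form is what makes sets like “English speakers” representable at all: we store the rule, not the (unenumerable) membership list.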

5. Finding the Right Structures as Needed


They include the process on how to:
• Select an initial appropriate structure.
• Fill the necessary details from the current situations.
• Determine a better structure if the initially selected structure is not appropriate to fulfill
other conditions.
• Find the solution if none of the available structures is appropriate.
• Create and remember a new structure for the given condition.
• There is no specific way to solve these problems, but some of the effective knowledge
representation techniques have the potential to solve them.

PREDICATE LOGIC
Introduction
Predicate logic is used to represent knowledge. Predicate logic will be met again in knowledge
representation schemes and reasoning methods. There are other ways, but this form is popular.
Propositional Logic
• It is simple to deal with, and a decision procedure for it exists. We can represent real-
world facts as logical propositions written as well-formed formulas. For example,
“Socrates is a man” and “Plato is a man” might be written as the propositions
SOCRATESISAMAN and PLATOISAMAN.
• Written this way, the two statements become totally separate assertions; we would not be
able to draw any conclusions about similarities between Socrates and Plato.
• It is better to represent these facts as man(Socrates) and man(Plato). These
representations reflect the structure of the knowledge itself: they use predicates
applied to arguments.
• Even so, propositional logic fails to capture the relationship between any individual being a
man and that individual being mortal.
We need variables and quantification unless we are willing to write separate statements, so we
explore the use of predicate logic as a way of representing knowledge.
Predicate:
• A predicate is a truth assignment given for a particular statement, which is either true or
false. To solve common-sense problems by a computer system, we use predicate logic.
Logic symbols used in predicate logic: ∀ (for all), ∃ (there exists), ∧ (and), ∨ (or),
¬ (not), → (implies).
Predicate Logic

• Terms represent specific objects in the world and can be constants, variables or functions.
• Predicate Symbols refer to a particular relation among objects.
• Sentences represent facts, and are made of terms, quantifiers and predicate symbols.
• Functions allow us to refer to objects indirectly (via some relationship).
• Quantifiers and variables allow us to refer to a collection of objects without explicitly naming
each object.
• Some Examples
o Predicates: Brother, Sister, Mother , Father
o Objects: Bill, Hillary, Chelsea, Roger
o Facts expressed as atomic sentences, a.k.a. literals:
o Father(Bill, Chelsea)
o Mother(Hillary, Chelsea)
o Brother(Bill, Roger)
Variables and Universal Quantification
Universal Quantification allows us to make a statement about a collection of objects:

Variables and Existential Quantification


Existential Quantification allows us to state that an object does exist (without naming it):

Nested Quantification

Functions
• Functions are terms - they refer to a specific object.
• We can use functions to symbolically refer to objects without naming them.
• Examples:
fatherof(x) age(x) times(x,y) succ(x)

• Using functions

If we use logical statements as a way of representing knowledge, then we have available a good
way of reasoning with that knowledge.
Representing facts with Predicate Logic
1) Marcus was a man. man(Marcus)
2) Marcus was a Pompeian. pompeian(Marcus)
3) All Pompeians were Romans. ∀x: pompeian(x) → roman(x)
4) Caesar was a ruler. ruler(Caesar)
5) All Romans were either loyal to Caesar or hated him.
∀x: roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
6) Everyone is loyal to someone. ∀x: ∃y: loyalto(x, y)
7) People only try to assassinate rulers they are not loyal to.
∀x: ∀y: person(x) ∧ ruler(y) ∧ tryassassinate(x, y) → ¬loyalto(x, y)
8) Marcus tried to assassinate Caesar. tryassassinate(Marcus, Caesar)

Q. Prove that Marcus is not loyal to Caesar by backward substitution.

Representing Instance and Isa Relationships


Two attributes isa and instance play an important role in many aspects of knowledge
representation.
The reason for this is that they support property inheritance.
isa - used to show class inclusion, e.g. isa (mega_star,rich).

instance - used to show class membership, e.g. instance(prince,mega_star).

In the figure above,


• The first five sentences are represented in pure predicate logic. In these
representations, class membership is represented with unary predicates (such as
Roman), each of which corresponds to a class. Asserting that P(x) is true is equivalent
to asserting that x is an instance of P.
• The second part of the figure contains representations that use the instance predicate
explicitly. The predicate instance is a binary one, whose first argument is an object and
whose second argument is a class to which the object belongs. But these representations
do not use an explicit isa predicate.
• The third part contains representations that use both the instance and isa predicates
explicitly. The use of the isa predicate simplifies the representation of sentence 3, but
it requires that one additional axiom be provided. This additional axiom describes how
an instance relation and an isa relation can be combined to derive a new instance
relation.
COMPUTABLE FUNCTIONS AND PREDICATES
This is fine if the number of facts is not very large or if the facts themselves are sufficiently
unstructured that there is little alternative. But suppose we want to express simple facts, such
as the following greater-than and less-than relationships:
gt(1,0) lt(0,1)
gt(2,1) lt(1,2)
gt(3,1)…. lt(2,3)….

Clearly we do not want to have to write out the representation of each of these facts
individually. For one thing, there are infinitely many of them. But even if we only consider the
finite number of them that can be represented, say, using a single machine word per number, it
would be extremely inefficient to store explicitly a large set of statements when we could,
instead, so easily compute each one as we need it. Thus it becomes useful to augment our
representation with these computable predicates.
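In a program, such computable predicates are simply functions evaluated on demand rather than stored facts, for example:

```python
# Rather than storing infinitely many facts gt(1,0), gt(2,1), ...,
# we compute the predicate when it is needed.
def gt(a, b):
    return a > b

def lt(a, b):
    return a < b

# Used exactly like a stored fact during a proof step:
print(gt(1991, 79))   # the "compute gt" step in the Marcus proof: True
print(lt(0, 1))       # -> True
```

The proof machinery treats `gt(1991, 79)` as just another true literal; the only difference is that its truth is computed, not looked up.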
Consider the following set of facts, again involving Marcus:
1. Marcus was a man.
man(Marcus)
Again we ignore the issue of tense.
2. Marcus was a Pompeian.
Pompeian(Marcus)
3. Marcus was bom in 40 A.D.
born(Marcus, 40)
4. All men are mortal.
Ɐ man(x) -> mortal(x)
5. All Pompeians died when the volcano erupted in 79 A.D.
erupted( volcano, 79) ꓥⱯx: : [Pompeian(x) -> died(x, 79)
6. No mortal lives longer than 150 years.
Ɐx : Ɐt1: Vt2: mortal(x) ꓥborn(x, t1,) ꓥ gt(t2, — t1,150) -> dead(x, t2)
7. It is now 1991.
now = 1991
8. Alive means not dead.
∀x: ∀t: [alive(x, t) → ¬dead(x, t)] ∧ [¬dead(x, t) → alive(x, t)]
9. If someone dies, then he is dead at all later times.
∀x: ∀t1: ∀t2: died(x, t1) ∧ gt(t2, t1) → dead(x, t2)

This representation says that one is dead in all years after the one in which one died. It ignores
the question of whether one is dead in the year in which one died.
1. Man (Marcus)
2. Pompeian(Marcus)
3. born(Marcus, 40)
4. ∀x: man(x) → mortal(x)

5. ∀x: Pompeian(x) → died(x, 79)
6. erupted(volcano, 79)
7. ∀x: ∀t1: ∀t2: mortal(x) ∧ born(x, t1) ∧ gt(t2 − t1, 150) → dead(x, t2)
8. now = 1991
9. ∀x: ∀t: [alive(x, t) → ¬dead(x, t)] ∧ [¬dead(x, t) → alive(x, t)]
10. ∀x: ∀t1: ∀t2: died(x, t1) ∧ gt(t2, t1) → dead(x, t2)


Fig: A Set of Facts about Marcus
One Way of Proving That Marcus Is Dead
¬alive(Marcus, now)
(9, substitution)
dead(Marcus, now)
(10, substitution)
died(Marcus, t1) ∧ gt(now, t1)
(5, substitution)
Pompeian(Marcus) ∧ gt(now, 79)
(2)
gt(now, 79)
(8, substitute equals)
gt(1991, 79)
(compute gt)
nil
From looking at the proofs we have just shown, two things should be clear:
• Even very simple conclusions can require many steps to prove.
• A variety of processes, such as matching, substitution, and application of modus ponens are
involved in the production of a proof. This is true even for the simple statements we are using.
It would be worse if we had implications with more than a single term on the right or with
complicated expressions involving ands and ors on the left.

RESOLUTION
Resolution is a procedure for proving a statement. It attempts to show that the negation of the
statement produces a contradiction with the known statements. It simplifies the proof procedure
by first converting the statements into a canonical form. It is a simple iterative process: at each
step, two clauses, called the parent clauses, are compared, yielding a new clause that has been
inferred from them.

Resolution refutation:
• Convert all sentences to CNF (conjunctive normal form).
• Negate the desired conclusion (converted to CNF).
• Apply the resolution rule until we either derive false (a contradiction) or can no longer
apply the rule.
Resolution refutation is sound and complete:
• If we derive a contradiction, then the conclusion follows from the axioms.
• If we can’t apply any more, then the conclusion cannot be proved from the axioms.

Sometimes, from the collection of statements we have, we want to know the answer to this
question: "Is it possible to prove some other statements from what we actually know?" In
order to prove this we need to make some inferences, and those other statements can be shown
true using the refutation proof method, i.e., proof by contradiction using resolution. So for the
asked goal we negate the goal and add it to the given statements to prove the contradiction.
Resolution refutation for propositional logic is a complete proof procedure. So if the thing
that you're trying to prove is, in fact, entailed by the things that you've assumed, then you can
prove it using resolution refutation.
Clauses:
▪ Resolution can be applied to certain class of wff called clauses.
▪ A clause is defined as a wff consisting of disjunction of literals.

Conjunctive Normal Form or Clause Normal Form:


Clause form is an approach to Boolean logic that expresses formulas as conjunctions of clauses.
Each clause must be either a literal or a disjunction of literals connected by the OR operator.
In clause form, a statement is a series of ORs connected by ANDs.
A statement is in conjunctive normal form if it is a conjunction (sequence of ANDs) consisting
of one or more conjuncts, each of which is a disjunction (OR) of one or more literals (i.e.,
statement letters and negations of statement letters).
All of the following formulas in the variables A, B, C, D, and E are in conjunctive normal form:

Conversion to Clause Form:

Clause Form:

Algorithm: Convert to Clause Form


1. Eliminate ->, using the fact that a -> b is equivalent to ¬a \/ b. Performing this
transformation on the wff given above yields

2. Reduce the scope of each ¬ to a single term, using the fact that ¬(¬p) = p and de Morgan's laws.

3. Standardize variables so that each quantifier binds a unique variable.

4. Move all quantifiers to the left of the formulas without changing their relative order.

5. Eliminate existential quantifiers. We can eliminate the quantifier by substituting for the
variable a reference to a function (a Skolem function) that produces the desired value.

6. Drop the prefix. At this point, all remaining variables are universally quantified

7. Convert the matrix into a conjunction of disjunctions.

8. Create a separate clause corresponding to each conjunct. In order for a well-formed formula
to be true, all the clauses that are generated from it must be true.
9. Standardize apart the variables in the set of clauses generated in step 8, i.e., rename the
variables so that no two clauses make reference to the same variable.
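For the propositional part of this algorithm (steps 1, 2, and 7), a minimal sketch is possible. The tuple encoding and function names below are assumptions for illustration, not the textbook's notation:

```python
# A sketch of clause-form conversion, propositional case only:
# eliminate "->", push negations inward, distribute OR over AND.
# Formulas are nested tuples, e.g. ("->", "p", ("or", "q", "r")).

def elim_imp(f):                       # step 1: a -> b  becomes  ~a \/ b
    if isinstance(f, str):
        return f
    op, *args = f
    args = [elim_imp(a) for a in args]
    if op == "->":
        return ("or", ("not", args[0]), args[1])
    return (op, *args)

def push_not(f):                       # step 2: De Morgan + double negation
    if isinstance(f, str):
        return f
    op, *args = f
    if op == "not":
        g = args[0]
        if isinstance(g, str):
            return ("not", g)
        gop, *gargs = g
        if gop == "not":
            return push_not(gargs[0])            # ~(~p) = p
        dual = "and" if gop == "or" else "or"    # De Morgan's laws
        return (dual, *[push_not(("not", a)) for a in gargs])
    return (op, *[push_not(a) for a in args])

def distribute(f):                     # step 7: OR over AND
    if isinstance(f, str) or f[0] == "not":
        return f
    op, a, b = f[0], distribute(f[1]), distribute(f[2])
    if op == "or":
        if not isinstance(a, str) and a[0] == "and":
            return ("and", distribute(("or", a[1], b)), distribute(("or", a[2], b)))
        if not isinstance(b, str) and b[0] == "and":
            return ("and", distribute(("or", a, b[1])), distribute(("or", a, b[2])))
    return (op, a, b)

def to_cnf(f):
    return distribute(push_not(elim_imp(f)))

# p -> (q /\ r)  becomes  (~p \/ q) /\ (~p \/ r)
print(to_cnf(("->", "p", ("and", "q", "r"))))
```

The quantifier-handling steps (standardizing variables, Skolemization, dropping the prefix) are omitted here; they require term manipulation beyond this sketch.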

The resultant clause form is

The Basis of Resolution:


The resolution process is applied to a pair of parent clauses to produce a derived clause. The
resolution procedure operates by taking two clauses that each contain the same literal. The
literal must occur in positive form in one clause and in negative form in the other. The resolvent
is obtained by combining all of the literals of the two parent clauses except the ones that cancel.
If the clause that is produced is the empty clause, then a contradiction has been found.
E.g., winter and ¬winter will produce the empty clause.
If a contradiction exists, then eventually it will be found. Of course, if no contradiction exists,
it is possible that the procedure will never terminate, although as we will see, there are often

ways of detecting that no contradiction exists.

Resolution in Propositional Logic:

Example: Consider the following axioms


P, (P ∧ Q) → R, (S ∨ T) → Q, T
Convert them into clause form and prove that R is true
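A resolution refutation for this example can be sketched in Python. The encoding below is an assumption (clauses as frozensets of literal strings, with "~" marking negation); the clause forms of the axioms are P; ~P ∨ ~Q ∨ R; ~S ∨ Q; ~T ∨ Q; T, and the negated goal is ~R:

```python
# A sketch of propositional resolution refutation for the axioms
# P, (P /\ Q) -> R, (S \/ T) -> Q, T, proving R.

def neg(l):
    return l[1:] if l.startswith("~") else "~" + l

def resolve(c1, c2):
    """All resolvents of two clauses (cancel one complementary pair)."""
    out = []
    for lit in c1:
        if neg(lit) in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {neg(lit)})))
    return out

def refute(clauses):
    """Return True if the empty clause is derivable (contradiction found)."""
    clauses = set(clauses)
    while True:
        new = set()
        pairs = [(a, b) for a in clauses for b in clauses if a != b]
        for a, b in pairs:
            for r in resolve(a, b):
                if not r:
                    return True            # empty clause: contradiction
                new.add(r)
        if new <= clauses:
            return False                   # nothing new: no proof
        clauses |= new

# Clause form of the axioms, plus the negated goal ~R:
axioms = [frozenset(c) for c in
          [{"P"}, {"~P", "~Q", "R"}, {"~S", "Q"}, {"~T", "Q"}, {"T"}]]
print(refute(axioms + [frozenset({"~R"})]))   # True: R follows
```

One derivation it finds: {~T, Q} with {T} gives {Q}; {~P, ~Q, R} with {P} gives {~Q, R}; with {Q} gives {R}; with {~R} gives the empty clause.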

UNIFICATION ALGORITHM
• In propositional logic it is easy to determine that two literals cannot both be true
at the same time.
• Simply look for L and ¬L. In predicate logic, this matching process is more
complicated, since bindings of variables must be considered.

• In order to determine contradictions we need a matching procedure that compares
two literals and discovers whether there exists a set of substitutions that makes them
identical.
• There is a recursive procedure that does this matching. It is called Unification
algorithm.
• The process of finding a substitution for predicate parameters is called unification.
• We need to know:
– whether 2 literals can be matched.
– the substitution that makes the literals identical.
• There is a simple algorithm called the unification algorithm that does this.
The Unification Algorithm
1. Initial predicate symbols must match.
2. For each pair of predicate arguments:
– Different constants cannot match.
– A variable may be replaced by a constant.
– A variable may be replaced by another variable.
– A variable may be replaced by a function as long as the function does not
contain an instance of the variable.
• When attempting to match 2 literals, all substitutions must be made to the entire
literal.
• There may be many substitutions that unify 2 literals; the most general unifier
is always desired.

Unification Example:

The object of the unification procedure is to discover at least one substitution that
causes two literals to match. Usually, if there is one such substitution, there are many.

In the unification algorithm, each literal is represented as a list, where the first element is the
name of a predicate and the remaining elements are arguments. An argument may be a single
element (atom) or may be another list.
The unification algorithm recursively matches pairs of elements, one pair at a time. The
matching rules are:
• Different constants, functions or predicates cannot match, whereas identical ones can.
• A variable can match another variable, any constant, or a function or predicate expression,
subject to the condition that the function or predicate expression must not contain any instance
of the variable being matched (otherwise it will lead to infinite recursion).
• The substitution must be consistent. Substituting y for x now and then z for x later is
inconsistent. (A substitution y for x is written as y/x.)
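The matching rules above, including the occurs check and consistent substitution, can be sketched as follows. The term encoding is an assumption: uppercase strings are variables, lowercase strings are constants, and tuples are compound terms:

```python
# A sketch of the unification algorithm with the occurs check.

def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def substitute(t, s):
    """Apply substitution s to term t, following variable chains."""
    if is_var(t):
        return substitute(s[t], s) if t in s else t
    if isinstance(t, tuple):
        return tuple(substitute(a, s) for a in t)
    return t

def occurs(v, t, s):
    """Occurs check: does variable v appear inside term t under s?"""
    t = substitute(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t)

def unify(x, y, s=None):
    """Return a most general unifier as a dict, or None on failure."""
    if s is None:
        s = {}
    x, y = substitute(x, s), substitute(y, s)
    if x == y:
        return s
    if is_var(x):
        return None if occurs(x, y, s) else {**s, x: y}
    if is_var(y):
        return unify(y, x, s)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None                     # different constants/predicates

# The worked example below: unify p(X, Y, Y) with p(a, Z, b).
mgu = unify(("p", "X", "Y", "Y"), ("p", "a", "Z", "b"))
print({v: substitute(t, mgu) for v, t in mgu.items()})
```

Resolving the bindings yields X/a, Y/b, Z/b, the MGU derived step by step in the example that follows.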

Example:
Suppose we want to unify p(X,Y,Y) with p(a,Z,b). Initially E is {p(X,Y,Y)=p(a,Z,b)}.
The first time through the while loop, E becomes {X=a, Y=Z, Y=b}.
Suppose X=a is selected next. Then S becomes {X/a} and E becomes {Y=Z, Y=b}.
Suppose Y=Z is selected. Then Y is replaced by Z in S and E. S becomes {X/a, Y/Z} and E
becomes {Z=b}.
Finally Z=b is selected, Z is replaced by b, S becomes {X/a, Y/b, Z/b}, and E becomes empty.
The substitution {X/a, Y/b, Z/b} is returned as an MGU.
Unification:

Resolution in Predicate Logic


• Two literals are contradictory if one can be unified with the negation of the other.
• For example, man(x) and ¬man(Himalayas) are contradictory, since man(x) and
man(Himalayas) can be unified.
• In predicate logic the unification algorithm is used to locate pairs of literals that cancel out.

• It is important that if two instances of the same variable occur, then they must be
given identical substitutions.

Prove that Marcus hates Caesar using resolution

Example: Consider the following statements:
John likes all kinds of food.
Apples are food.
Chicken is food.
Anything anyone eats and is not killed by is food.
Bill eats peanuts and is still alive.
Sue eats everything Bill eats.
(a) Convert all the above statements into predicate logic.
(b) Show that John likes peanuts using backward chaining.
(c) Convert the statements into clause form.
(d) Using resolution show that “John likes peanuts”.

Answer:
Predicate Logic

Backward Chaining Proof:

Resolution proof:

Answering Questions
We can also use the proof procedure to answer questions such as “who tried to assassinate
Caesar” by proving:
– ¬tryassassinate(y, Caesar)
Once the proof is complete we need to find out what substitution was made for y.
We show how resolution can be used to answer fill-in-the-blank questions, such as "When did
Marcus die?" or "Who tried to assassinate a ruler?” Answering these questions involves finding
a known statement that matches the terms given in the question and then responding with

another piece of the same statement that fills the slot demanded by the question.
From Clause Form to Horn Clauses
The operation is to convert clause form to Horn clauses. This operation is not always possible.
Horn clauses are clauses in normal form that have one or zero positive literals. The conversion
from a clause in normal form with one or zero positive literals to a Horn clause is done by using
the implication property.

Example:

REPRESENTING KNOWLEDGE USING RULES


Procedural versus Declaration Knowledge
Declarative Knowledge | Procedural Knowledge
Factual information stored in memory and known to be static in nature. | The knowledge of how to perform, or how to operate.
Knowledge of facts or concepts. | A skill or action that you are capable of performing.
Knowledge about whether something is true or false. | Knowledge about how to do something to reach a particular objective or goal.
The knowledge is specified, but how that knowledge is to be used is not given. | The control information necessary to use the knowledge is considered to be embedded in the knowledge itself.
E.g.: concepts, facts, propositions, assertions, semantic nets. | E.g.: procedures, rules, strategies, agendas, models.
It is explicit knowledge (describing). | It is tacit knowledge (doing).
The declarative representation is one in which the knowledge is specified, but the use to which
that knowledge is to be put is not given.
• Declarative knowledge answers the question 'What do you know?'
• It is your understanding of things, ideas, or concepts.
• In other words, declarative knowledge can be thought of as the who, what,
when, and where of information.
• Declarative knowledge is normally discussed using nouns, like the names of
people, places, or things, or dates that events occurred.

The procedural representation is one in which the control information necessary to
use the knowledge is considered to be embedded in the knowledge itself.
• Procedural knowledge answers the question 'What can you do?'
• While declarative knowledge is demonstrated using nouns,
• Procedural knowledge relies on action words, or verbs.
• It is a person's ability to carry out actions to complete a task.
The real difference between declarative and procedural views of knowledge lies in where
the control information resides.

Example

The statements 1, 2 and 3 are procedural knowledge and 4 is declarative knowledge.


Forward & Backward Reasoning
The object of a search procedure is to discover a path through a problem space from an initial
configuration to a goal state. There are actually two directions in which such a search could
proceed:
• Forward Reasoning,
▪ from the start states
▪ LHS rule must match with initial state
▪ Eg: A → B, B→C => A→C
• Backward Reasoning,
▪ from the goal states
▪ RHS rules must match with goal state
▪ Eg: 8-Puzzle Problem

In both cases, the control strategy must cause motion and be systematic. The production
system model of the search process provides an easy way of viewing forward and backward
reasoning as symmetric processes.
Consider the problem of solving a particular instance of the 8-puzzle problem. The rules
to be used for solving the puzzle can be written as:

Reasoning Forward from Initial State:


➢ Begin building a tree of move sequences that might be solutions, with the initial
configuration at the root of the tree.
➢ Generate the next level of the tree by finding all the rules whose left sides match
the root node, and use their right sides to create the new configurations.
➢ Generate the next level by taking each node generated at the previous level and
applying to it all of the rules whose left sides match it.
➢ Continue until a configuration that matches the goal state is generated.
Reasoning Backward from Goal State:
➢ Begin building a tree of move sequences that might be solutions, with the goal
configuration at the root of the tree. Generate the next level of the tree by finding all the
rules whose right sides match the root node. These are all the rules that, if only we
could apply them, would generate the state we want. Use the left sides of the rules
to generate the nodes at this second level of the tree.
➢ Generate the next level of the tree by taking each node at the previous level and
finding all the rules whose right sides match it. Then use the corresponding left sides
to generate the new nodes.
➢ Continue until a node that matches the initial state is generated.
➢ This method of reasoning backward from the desired final state is often called
goal-directed reasoning.

To reason forward, the left sides (preconditions) are matched against the current state and the
right sides (results) are used to generate new nodes until the goal is reached. To reason
backward, the right sides are matched against the current node and the left sides are used to
generate new nodes representing new goal states to be achieved.
The following 4 factors influence whether it is better to reason Forward or Backward:
1. Are there more possible start states or goal states? We would like to move from the
smaller set of states to the larger (and thus easier to find) set of states.
2. In which direction is the branching factor (i.e., the average number of nodes that
can be reached directly from a single node) greater? We would like to proceed in the
direction with the lower branching factor.
3. Will the program be used to justify its reasoning process to a user? If so, it is
important to proceed in the direction that corresponds more closely with the way the
user will think.

4. What kind of event is going to trigger a problem-solving episode? If it is the arrival of a
new fact, forward reasoning makes sense. If it is a query to which a response is
desired, backward reasoning is more natural.
Backward-Chaining Rule Systems
➢ Backward-chaining rule systems are good for goal-directed problem solving.
➢ For example, a query system would probably use backward chaining to reason
about and answer user questions.
➢ Unification tries to find a set of bindings for variables to equate a (sub) goal with
the head of some rule.
➢ Medical expert system, diagnostic problems
Forward-Chaining Rule Systems
➢ Instead of being directed by goals, we sometimes want to be directed by incoming
data.
➢ For example, suppose you sense searing heat near your hand. You are likely to jerk
your hand away.
➢ Rules that match dump their right-hand side assertions into the state and the process
repeats.
➢ Matching is typically more complex for forward-chaining systems than backward
ones.
➢ Synthesis systems – Design/Configuration
Example of Typical Forward Chaining Rules
1) If hot and smoky then ADD fire
2) If alarm_beeps then ADD smoky
3) If fire then ADD switch_on_sprinklers
Facts
(F1) alarm_beeps (given)
(F2) hot (given)
………
(F3) smoky (from F1 by R2)
(F4) fire (from F2, F3 by R1)
(F5) switch_on_sprinklers (from F4 by R3)
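The derivation above can be sketched as a small data-driven loop; the encoding (facts as strings, rules as condition-set/conclusion pairs) is an illustrative assumption:

```python
# A sketch of forward chaining on the alarm example: a rule fires
# whenever all its conditions are in working memory, adding its
# conclusion to the state, until nothing new can be added.

rules = [({"hot", "smoky"}, "fire"),                # R1
         ({"alarm_beeps"}, "smoky"),                # R2
         ({"fire"}, "switch_on_sprinklers")]        # R3

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conds, concl in rules:
            if conds <= facts and concl not in facts:
                facts.add(concl)        # rule fired: add RHS to the state
                changed = True
    return facts

print(forward_chain({"alarm_beeps", "hot"}, rules))
# derives smoky, then fire, then switch_on_sprinklers
```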

Example of Typical Backward Chaining


Goal: Should I switch on sprinklers?
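The goal-directed direction over the same rules can be sketched as follows; the encoding is the same illustrative assumption as above (to establish a goal, find a rule whose RHS matches it and recursively establish the rule's LHS conditions):

```python
# A sketch of backward chaining: "Should I switch on sprinklers?"

rules = [({"hot", "smoky"}, "fire"),                # R1
         ({"alarm_beeps"}, "smoky"),                # R2
         ({"fire"}, "switch_on_sprinklers")]        # R3

def backward_chain(goal, facts, rules):
    if goal in facts:
        return True                       # goal is a known fact
    for conds, concl in rules:
        # rule's RHS matches the goal: try to establish its LHS
        if concl == goal and all(backward_chain(c, facts, rules)
                                 for c in conds):
            return True
    return False

# Given only the observed facts, the goal is established via R3, R1, R2:
print(backward_chain("switch_on_sprinklers", {"alarm_beeps", "hot"}, rules))
# True
```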
Combining Forward and Backward Reasoning
Sometimes certain aspects of a problem are best handled via forward chaining and other aspects
by backward chaining. Consider a forward-chaining medical diagnosis program. It might
accept twenty or so facts about a patient’s condition, then forward chain on those facts to
try to deduce the nature and/or cause of the disease.
Now suppose that at some point, the left side of a rule was nearly satisfied – nine out of ten of
its preconditions were met. It might be efficient to apply backward reasoning to satisfy the
tenth precondition in a directed manner, rather than wait for forward chaining to supply the fact
by accident.
Whether it is possible to use the same rules for both forward and backward reasoning also
depends on the form of the rules themselves. If both left sides and right sides contain pure
assertions, then forward chaining can match assertions on the left side of a rule and add to the
state description the assertions on the right side. But if arbitrary procedures are allowed as the
right sides of rules then the rules will not be reversible.
Logic Programming
➢ Logic programming is a programming language paradigm in which logical
assertions are viewed as programs.
➢ There are several logic programming systems in use today, the most popular
of which is PROLOG.
➢ A PROLOG program is described as a series of logical assertions, each of which
is a Horn clause.
➢ A Horn clause is a clause that has at most one positive literal. Thus p, ¬p ∨ q, and
p → q are all Horn clauses.
Programs written in pure PROLOG are composed only of Horn clauses.
Syntactic Difference between the logic and the PROLOG representations, including:
➢ In logic, variables are explicitly quantified. In PROLOG, quantification is
provided implicitly by the way the variables are interpreted.
o The distinction between variables and constants is made in PROLOG by having
all variables begin with uppercase letters and all constants begin with
lowercase letters.
➢ In logic, there are explicit symbols for and (∧) and or (∨). In PROLOG, there is
an explicit symbol for and (,), but there is none for or.

➢ In logic, implications of the form “p implies q” are written as p → q. In PROLOG,
the same implication is written “backward” as q :- p.
Example:

The first two of these differences arise naturally from the fact that PROLOG programs are
actually sets of Horn Clauses that have been transformed as follows:
1. If the Horn clause contains no negative literals (i.e., it contains a single literal
which is positive), then leave it as it is.
2. Otherwise, rewrite the Horn clause as an implication, combining all of the negative
literals into the antecedent of the implication and leaving the single positive
literal (if there is one) as the consequent.
This procedure causes a clause, which originally consisted of a disjunction of literals (all but
one of which were negative), to be transformed into a single implication whose antecedent is a
conjunction of (what are now positive) literals.

Matching
We described the process of using search to solve problems as the application of appropriate
rules to individual problem states to generate new states to which the rules can then be applied
and so forth until a solution is found.
How do we extract from the entire collection of rules those that can be applied at a given point?
Doing so requires some kind of matching between the current state and the preconditions of the
rules. How should this be done? The answer to this question can be critical to the success of a
rule-based system.
A more complex matching is required when the preconditions of a rule specify required
properties that are not stated explicitly in the description of the current state. In this case, a
separate set of rules must be used to describe how some properties can be inferred from others.
An even more complex matching process is required if rules should be applied when their
preconditions approximately match the current situation. This is often the case in
situations involving physical descriptions of the world.

Indexing
One way to select applicable rules is to do a simple search through all the rules, comparing each
one’s preconditions to the current state and extracting all the ones that match. There are two
problems with this simple solution:
i. A large number of rules will be necessary, and scanning through all of them at
every step would be inefficient.
ii. It’s not always obvious whether a rule’s preconditions are satisfied by a particular
state.
Solution: Instead of searching through the rules, use the current state as an index into the rules
and select the matching ones immediately.

The matching process is then easy, but at the price of a complete lack of generality in the
statement of the rules. Despite some limitations of this approach, indexing in some form is very
important in the efficient operation of rule-based systems.
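The idea can be sketched as follows; the rule names and predicates below are hypothetical, chosen only to illustrate the indexing scheme:

```python
# A sketch of indexing: rather than scanning every rule, store rules in
# a dict keyed by the predicate symbol of their precondition, so the
# current state selects candidate rules directly.

rules = [("r1", "at(robot, Room)", "move"),
         ("r2", "holding(robot, Obj)", "drop"),
         ("r3", "at(robot, Room)", "scan")]

index = {}
for name, precond, action in rules:
    key = precond.split("(")[0]           # index on the predicate symbol
    index.setdefault(key, []).append(name)

# A state fact "at(robot, room1)" retrieves only the "at" rules:
print(index["at"])       # ['r1', 'r3']; r2 is never examined
```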
Matching with Variables
The problem of selecting applicable rules is made more difficult when preconditions are not
stated as exact descriptions of particular situations but rather describe properties that the
situations must have. It often turns out that discovering whether there is a match between a
particular situation and the preconditions of a given rule must itself involve a significant search
process.
Backward-chaining systems usually use depth-first backtracking to select individual rules, but
forward-chaining systems generally employ sophisticated conflict resolution strategies to
choose among the applicable rules.
While it is possible to apply unification repeatedly over the cross product of preconditions
and state description elements, it is more efficient to consider the many-many match problem,
in which many rules are matched against many elements in the state description
simultaneously. One efficient many-many match algorithm is RETE.
RETE Matching Algorithm

The matching consists of 3 parts


1. Rules & Productions
2. Working Memory
3. Inference Engine
The inference engine is a cycle of the production system: match, select, execute.
The above cycle is repeated until no rules are put in the conflict set or until a stopping condition
is reached. Verifying several conditions is a time-consuming process. To eliminate
the need to perform thousands of matches per cycle, an effective matching algorithm called
RETE is used.
The Algorithm consists of two Steps.
1. Working memory changes need to be examined.
2. Grouping rules which share the same condition & linking them to their common
terms.
The RETE algorithm is a many-many match algorithm (in which many rules are matched against
many elements). RETE is used in forward-chaining systems, which generally employ sophisticated
conflict resolution strategies to choose among applicable rules. RETE gains efficiency from 3
major sources:
1. RETE maintains a network of rule conditions and uses changes in the state description to
determine which new rules might apply. Full matching is only pursued for candidates that could
be affected by incoming/outgoing data.
2. Structural similarity in rules: RETE stores the rules so that they share structures in
memory; sets of conditions that appear in several rules are matched once per cycle.
3. Persistence of variable binding consistency: variable binding conflicts that prevent a rule
from firing, even when all of its individual preconditions are met, can be

minimized. RETE remembers its previous calculations and is able to merge new binding
information efficiently.

Approximate Matching:
Rules should be applied if their preconditions approximately match the current situation. E.g.,
a speech understanding program:
Rules: a description of a physical waveform to phones.
Physical signal: differences in the way individuals speak, and the result of background noise.
Conflict Resolution:
When several rules match at once, we must decide which one to apply; this is called conflict
resolution. There are 3 approaches to the problem of conflict resolution in a production system.
1. Preference based on rule match:
a. Physical order of rules in which they are presented to the system
b. Priority is given to rules in the order in which they appear
2. Preference based on the objects match:
a. Considers the importance of the objects that are matched.
b. Considers the position of the matchable objects in terms of Long Term
Memory (LTM) & Short Term Memory (STM).
LTM: stores a set of rules.
STM (working memory): serves as a storage area for the facts deduced by
rules in long term memory.
3. Preference based on the Action:
a. One way is to fire all the matched rules temporarily and examine the results of
each. Using a heuristic function that can evaluate each of the resulting states,
compare the merits of the results and then select the preferred one.
Search Control Knowledge:
➢ It is knowledge about which paths are most likely to lead quickly to a goal state.
➢ Search control knowledge requires meta-knowledge.
➢ It can take many forms: knowledge about
which states are more preferable to others,
which rule to apply in a given situation,
the order in which to pursue subgoals, and
useful sequences of rules to apply.

MODULE-2 CONCEPT LEARNING 18CS71

CONCEPT LEARNING
Much of learning involves acquiring general concepts from specific training examples. People,
for example, continually learn general concepts or categories such as "bird," "car," etc. Each
concept can be viewed as describing some subset of objects/events defined over a larger set.
We consider the problem of automatically inferring the general definition of some concept,
given examples labeled as members or nonmembers of the concept. This task is commonly
referred to as concept learning or approximating a boolean-valued function from examples.
Concept learning: Inferring a boolean-valued function from training examples of its input and
output.
“A task of acquiring potential hypothesis (solution) that best fits the given training examples.”
1. A CONCEPT LEARNING TASK
To ground our discussion of concept learning, consider the example task of learning the target
concept "Days on which my friend Sachin enjoys his favorite water sport” . Table given
below describes a set of example days, each represented by a set of attributes.

What hypothesis representation shall we provide to the learner in this case?


For each attribute, the hypothesis will either:
• indicate by a “?” that any value is acceptable for this attribute,
• specify a single required value (e.g., Warm) for the attribute, or
• indicate by a “Φ” that no value is acceptable.
• If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive
example (h(x) = 1).
• To illustrate, the hypothesis that Sachin enjoys his favorite sport only on cold days with
high humidity (independent of the values of the other attributes) is represented by the
expression (?, Cold, High, ?, ?, ?).
• The most general hypothesis, that every day is a positive example, is represented by (?, ?, ?, ?,
?, ?), and the most specific possible hypothesis, that no day is a positive example, is represented
by (Φ, Φ, Φ, Φ, Φ, Φ).
• To summarize, the EnjoySport concept learning task requires learning the set of days for which
EnjoySport=yes, describing this set by a conjunction of constraints over the instance attributes.
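This hypothesis representation can be sketched directly; the encoding below is an illustrative assumption, with "0" standing in for the Φ constraint:

```python
# A sketch of the hypothesis representation: a hypothesis is a tuple of
# constraints, one per attribute; "?" accepts any value, "0" (phi)
# accepts none, and anything else must match exactly.

def h_classifies(h, x):
    """h(x) = 1 iff every constraint accepts the corresponding attribute."""
    return 1 if all(c == "?" or c == v for c, v in zip(h, x)) else 0

day = ("Sunny", "Cold", "High", "Strong", "Warm", "Same")

print(h_classifies(("?", "Cold", "High", "?", "?", "?"), day))  # 1
print(h_classifies(("?", "?", "?", "?", "?", "?"), day))        # 1: most general
print(h_classifies(("0",) * 6, day))                            # 0: most specific
```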

In general, any concept learning task can be described by the set of instances over which the
target function is defined, the target function, the set of candidate hypotheses considered by the
learner, and the set of available training examples.
Notation
• The set of items over which the concept is defined is called the set of instances, which we
denote by X. In the current example, X is the set of all possible days, each represented by the
attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
• The concept or function to be learned is called the target concept, which we denote by c.
In general, c can be any Boolean valued function defined over the instances X;
that is, c: X → {0, 1}
In the current example, the target concept corresponds to the value of the attribute EnjoySport
(i.e., c(x) = 1 if EnjoySport = Yes, and c(x) = 0 if EnjoySport = No).
• When learning the target concept, the learner is presented with a set of training examples,
each consisting of an instance x from X, along with its target concept value c(x).
•Instances for which c(x) = 1 are called positive examples, or members of the target concept.
Instances for which c(x) = 0 are called negative examples. We will often write the ordered pair
(x, c(x)) to describe the training example consisting of the instance x and its target concept
value c(x).
•We use the symbol D to denote the set of available training examples.
•Given a set of training examples of the target concept c, the problem faced by the learner is to
hypothesize, or estimate, c. We use the symbol H to denote the set of all possible hypotheses
that the learner may consider regarding the identity of the target concept.
•In general, each hypothesis h in H represents a boolean-valued function defined over X; that
is,
h : X →{0, 1}. The goal of the learner is to find a hypothesis h such that h(x) = c(x) for all
x in X.
• Given:
o Instances X: Possible days, each described by the attributes
▪ Sky (with possible values Sunny, Cloudy, and Rainy),
▪ AirTemp (with values Warm and Cold),
▪ Humidity (with values Normal and High),
▪ Wind (with values Strong and Weak),

▪ Water (with values Warm and Cool),


▪ Forecast (with values Same and Change).
• Hypotheses H: Each hypothesis is described by a conjunction of constraints on the
attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast. The constraints may be
"?" (any value is acceptable), "Φ" (no value is acceptable), or a specific value.
• Target concept c: EnjoySport : X → {0, 1}
• Training examples D: Positive and negative examples of the target function
• Determine:
o A hypothesis h in H such that h(x) = c(x) for all x in X.
Table: Description of the EnjoySport concept learning task.
Inductive learning hypothesis
• Our assumption is that the best hypothesis regarding unseen instances is the hypothesis that
best fits the observed training data. This is the fundamental assumption of inductive learning.
• The inductive learning hypothesis. Any hypothesis found to approximate the target function
well over a sufficiently large set of training examples will also approximate the target function
well over other unobserved examples.
2. CONCEPT LEARNING AS SEARCH
Concept learning can be viewed as the task of searching through a large space of hypotheses
implicitly defined by the hypothesis representation. The goal of this search is to find the
hypothesis that best fits the training examples.
Consider, for example, the instances X and hypotheses H in the EnjoySport learning task. Given
that the attribute Sky has three possible values, and that AirTemp, Humidity, Wind, Water, and
Forecast each have two possible values, the instance space X contains exactly 3·2·2·2·2·2 = 96
distinct instances.
A similar calculation shows that there are 5·4·4·4·4·4 = 5120 syntactically distinct hypotheses
within H (including ? and Φ for each attribute). Most practical learning tasks involve much larger,
sometimes infinite, hypothesis spaces.
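These counts can be checked with a short computation (a sketch; the per-attribute value counts follow the task definition above):

```python
# Checking the EnjoySport space sizes. Sky has 3 values; AirTemp, Humidity,
# Wind, Water, and Forecast have 2 values each.
values_per_attr = [3, 2, 2, 2, 2, 2]

instances = 1
for n in values_per_attr:
    instances *= n          # 3*2*2*2*2*2

syntactic = 1
for n in values_per_attr:
    syntactic *= n + 2      # each attribute also allows "?" and Φ

print(instances)   # 96
print(syntactic)   # 5120
```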
General-to-Specific Ordering of Hypotheses
Many algorithms for concept learning organize the search through the hypothesis space by
relying on a very useful structure that exists for any concept learning problem: a general-to-
specific ordering of hypotheses. To illustrate the general-to-specific ordering, consider the two
hypotheses
h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)

Now consider the sets of instances that are classified positive by h1 and by h2. Because h2
imposes fewer constraints on the instance, it classifies more instances as positive. In fact, any
instance classified positive by h1 will also be classified positive by h2. Therefore, we say that
h2 is more general than h1.
This intuitive "more general than" relationship between hypotheses can be defined more
precisely as follows.
Definition: Let hj and hk be Boolean-valued functions defined over X. Then hj is more-general-
than-or-equal-to hk (written hj ≥g hk) if and only if
(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
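The more-general-than-or-equal-to relation can be sketched in Python, encoding hypotheses as tuples whose constraints are an attribute value, "?", or None standing in for Φ (the encoding and function names are our own):

```python
# A sketch of the more-general-than-or-equal-to relation for conjunctive
# hypotheses. Constraints are a value, "?" (anything), or None (Φ).
def constraint_subsumes(gi, si):
    """True if constraint gi allows every value that constraint si allows."""
    return gi == "?" or (gi == si and si is not None) or si is None

def more_general_or_equal(hj, hk):
    """hj >= hk: every instance satisfying hk also satisfies hj."""
    return all(constraint_subsumes(gj, gk) for gj, gk in zip(hj, hk))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 imposes fewer constraints
print(more_general_or_equal(h1, h2))  # False
```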

• In the figure, the box on the left represents the set X of all instances, the box on the right
the set H of all hypotheses.
• Each hypothesis corresponds to some subset of X-the subset of instances that it classifies
positive.
• The arrows connecting hypotheses represent the more - general -than relation, with the
arrow pointing toward the less general hypothesis.
• Note the subset of instances characterized by h2 subsumes the subset characterized by
h1; hence h2 is more-general-than h1.

FIND-S: FINDING A MAXIMALLY SPECIFIC HYPOTHESIS


How can we use the more-general-than partial ordering to organize the search for a hypothesis
consistent with the observed training examples?
One way is to begin with the most specific possible hypothesis in H, then generalize this
hypothesis each time it fails to cover an observed positive training example. The FIND-S
algorithm implements this strategy.

1. Initialize h to the most specific hypothesis in H


2. For each positive training instance x
For each attribute constraint ai in h
If the constraint ai is satisfied by x
then do nothing
Else
replace ai in h by the next more general constraint
that is satisfied by x
3. Output the hypothesis h

To illustrate this algorithm, assume the learner is given the following sequence of training examples.
• The first step of FIND-S is to initialize h to the most specific hypothesis in H
h0 = (Ø, Ø, Ø, Ø, Ø, Ø)
• Consider the first training example
x1 = <Sunny Warm Normal Strong Warm Same>, +
• Observing the first training example, it is clear that hypothesis h is too specific. None
of the "Ø" constraints in h are satisfied by this example, so each is replaced by the next
more general constraint that fits the example
h1 = <Sunny Warm Normal Strong Warm Same>
• Consider the second training example
x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
• The second training example forces the algorithm to further generalize h, this time
substituting a "?" in place of any attribute value in h that is not satisfied by the new example

h2 = <Sunny Warm ? Strong Warm Same>


• Consider the third training example
x3 = <Rainy, Cold, High, Strong, Warm, Change>, -
• Upon encountering the third training example, the algorithm makes no change to h. The
FIND-S algorithm simply ignores every negative example.
h3 = < Sunny Warm ? Strong Warm Same>
• Consider the fourth training example
x4 = <Sunny Warm High Strong Cool Change>, +
• The fourth example leads to a further generalization of h
h4 = < Sunny Warm ? Strong ? ? >
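The trace above can be reproduced with a minimal FIND-S sketch (the tuple encoding and function names are our own; None stands in for Φ):

```python
# A minimal FIND-S sketch for the EnjoySport trace. Hypotheses are tuples of
# constraints: a specific value, "?" (any value), or None (Φ).
def find_s(examples):
    h = [None] * 6                      # most specific hypothesis (Φ, ..., Φ)
    for x, label in examples:
        if label != "Yes":              # FIND-S ignores negative examples
            continue
        for i, (ai, xi) in enumerate(zip(h, x)):
            if ai is None:              # Φ: generalize to the observed value
                h[i] = xi
            elif ai != xi:              # mismatch: generalize to "?"
                h[i] = "?"
    return tuple(h)

training = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(training))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```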

The key property of the FIND-S algorithm


• FIND-S is guaranteed to output the most specific hypothesis within H that is consistent
with the positive training examples
• FIND-S algorithm’s final hypothesis will also be consistent with the negative examples
provided the correct target concept is contained in H, and provided the training
examples are correct.
Unanswered Questions by Find-S algorithm in Machine Learning
1. Has the learner converged to the correct target concept? Although FIND-S will find a
hypothesis consistent with the training data, it has no way to determine whether it has
found the only hypothesis in H consistent with the data (i.e., the correct target concept),
or whether there are many other consistent hypotheses as well.
2. Why prefer the most specific hypothesis? In case there are multiple hypotheses
consistent with the training examples, FIND-S will find the most specific. It is unclear
whether we should prefer this hypothesis over the most general or some other hypothesis
of intermediate generality.

3. Are the training examples consistent? In most practical learning problems there is some
chance that the training examples will contain at least some errors or noise. Such
inconsistent sets of training examples can severely mislead FIND-S, given the fact that
it ignores negative examples. We would prefer an algorithm that could at least detect
when the training data is inconsistent and, preferably, accommodate such errors.
4. What if there are several maximally specific consistent hypotheses? In the hypothesis
language H for the EnjoySport task, there is always a unique, most specific hypothesis
consistent with any set of positive examples. However, for other hypothesis spaces there
can be several maximally specific hypotheses consistent with the data.

VERSION SPACE AND CANDIDATE ELIMINATION ALGORITHM


The CANDIDATE-ELIMINATION algorithm (CEA) addresses the limitations of FIND-S. It finds all
describable hypotheses that are consistent with the observed training examples. In order to
define this algorithm precisely, we begin with a few basic definitions.
The key idea in the CANDIDATE-ELIMINATION algorithm is to output a description of the
set of all hypotheses consistent with the training examples.
Representation
Definition: consistent- A hypothesis h is consistent with a set of training examples D if and
only if h(x) = c(x) for each example (x, c(x)) in D.

Note difference between definitions of consistent and satisfies


• An example x is said to satisfy hypothesis h when h(x) = 1, regardless of whether
x is a positive or negative example of the target concept.
• An example x is said to be consistent with hypothesis h iff h(x) = c(x)
Definition: version space - The version space, denoted VSH,D, with respect to hypothesis
space H and training examples D, is the subset of hypotheses from H consistent with the
training examples in D:
VSH,D ≡ { h ∈ H | Consistent(h, D) }
THE LIST-THEN-ELIMINATE ALGORITHM
One obvious way to represent the version space is simply to list all of its members. This leads
to a simple learning algorithm, which we might call the List-Then-Eliminate algorithm.
The LIST-THEN-ELIMINATE algorithm first initializes the version space to contain all
hypotheses in H and then eliminates any hypothesis found inconsistent with any training example.
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example, (x, c(x))
remove from Version Space any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in Version Space
The version space of candidate hypotheses thus shrinks as more examples are observed, until
ideally just one hypothesis remains that is consistent with all the observed examples.
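A sketch of LIST-THEN-ELIMINATE for EnjoySport follows. It enumerates the hypothesis space explicitly, which is feasible only for tiny spaces like this one; the single all-Φ tuple stands in for every semantically equivalent always-negative hypothesis (encoding and names are our own):

```python
from itertools import product

# Attribute domains: Sky, AirTemp, Humidity, Wind, Water, Forecast.
domains = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]

def satisfies(h, x):
    """h(x) = 1 iff every constraint is "?" or matches the attribute value."""
    return all(ai == "?" or ai == xi for ai, xi in zip(h, x))

def list_then_eliminate(examples):
    # 4*3^5 = 972 hypotheses with value-or-"?" constraints, plus one all-Φ
    # tuple standing in for every hypothesis that classifies nothing positive.
    version_space = list(product(*[["?"] + d for d in domains]))
    version_space.append(("Φ",) * 6)
    for x, c in examples:
        version_space = [h for h in version_space
                         if satisfies(h, x) == (c == "Yes")]
    return version_space

training = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
vs = list_then_eliminate(training)
print(len(vs))  # 6 consistent hypotheses survive for the EnjoySport data
```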
It is intuitively plausible that we can represent the version space in terms of its most specific
and most general members.
A More Compact Representation for Version Spaces
The version space is represented by its most general and least general members. These
members form general and specific boundary sets that delimit the version space within the
partially ordered hypothesis space.
Definition: The general boundary G, with respect to hypothesis space H and training data D,
is the set of maximally general members of H consistent with D

Definition: The specific boundary S, with respect to hypothesis space H and training data D,
is the set of minimally general (i.e., maximally specific) members of H consistent with D.

Theorem: Version Space representation theorem


Theorem: Let X be an arbitrary set of instances and let H be a set of Boolean-valued
hypotheses defined over X. Let c : X → {0, 1} be an arbitrary target concept defined over X,
and let D be an arbitrary set of training examples {⟨x, c(x)⟩}. For all X, H, c, and D such that S
and G are well defined,

VSH,D = { h ∈ H | (∃s ∈ S) (∃g ∈ G) (g ≥g h ≥g s) }


To Prove:
1. Every h satisfying the right hand side of the above expression is in VS H,D
2. Every member of VSH,D satisfies the right-hand side of the expression

Sketch of proof:
• Let g, h, and s be arbitrary members of G, H, and S respectively, with g ≥g h ≥g s.


• By the definition of S, s must be satisfied by all positive examples in D. Because
h ≥g s, h must also be satisfied by all positive examples in D.
• By the definition of G, g cannot be satisfied by any negative example in D, and
because g ≥g h, h cannot be satisfied by any negative example in D. Because h is
satisfied by all positive examples in D and by no negative examples in D, h is
consistent with D, and therefore h is a member of VSH,D.
2. It can be proven by assuming some h in VSH,D that does not satisfy the right-hand
side of the expression, then showing that this leads to an inconsistency.

CANDIDATE-ELIMINATION Learning Algorithm


The CANDIDATE-ELIMINATION algorithm computes the version space containing all
hypotheses from H that are consistent with an observed sequence of training examples.
It begins by initializing the version space to the set of all hypotheses in H; that is, by initializing
the G boundary set to contain the most general hypothesis in H
Go { <?,?,?,?,?,?>}
and initializing the S boundary set to contain the most specific (least general) hypothesis
S0  {<Φ,Φ,Φ,Φ,Φ,Φ>}
These two boundary sets delimit the entire hypothesis space, because every other
hypothesis in H is more general than So and more specific than Go. As each training example
is considered, the S and G boundary sets are generalized and specialized, respectively, to
eliminate from the version space any hypotheses found inconsistent with the new training
example. After all examples have been processed, the computed version space contains all the
hypotheses consistent with these examples and only these hypotheses. This algorithm is
summarized in given below
For each training example d, do:
  If d is a positive example:
    Remove from G any hypothesis inconsistent with d
    For each hypothesis s in S that is not consistent with d:
      Remove s from S
      Add to S all minimal generalizations h of s such that h is
        consistent with d and some member of G is more general than h
      Remove from S any hypothesis that is more general than another
        hypothesis in S
  If d is a negative example:
    Remove from S any hypothesis inconsistent with d
    For each hypothesis g in G that is not consistent with d:
      Remove g from G
      Add to G all minimal specializations h of g such that h is
        consistent with d and some member of S is more specific than h
      Remove from G any hypothesis that is less general than another
        hypothesis in G

CANDIDATE-ELIMINATION algorithm using version spaces

An Illustrative Example
Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same Yes
2 Sunny Warm High Strong Warm Same Yes
3 Rainy Cold High Strong Warm Change No
4 Sunny Warm High Strong Cool Change Yes
CANDIDATE-ELIMINATION algorithm begins by initializing the version space to the set of
all hypotheses in H;
Initializing the G boundary set to contain the most general hypothesis in H
G0 ← <?, ?, ?, ?, ?, ?>
Initializing the S boundary set to contain the most specific (least general) hypothesis
S0 ← <Φ, Φ, Φ, Φ, Φ, Φ>

• When the first training example is presented, the CANDIDATE-ELIMINATION algorithm


checks the S boundary and finds that it is overly specific and it fails to cover the positive
example.
• The S boundary is therefore revised by moving it to the least general hypothesis that
covers this new example


• No update of the G boundary is needed in response to this training example because G0
correctly covers this example

• When the second training example is observed, it has a similar effect of


generalizing S further to S2, leaving G again unchanged (i.e., G2 = G1 = G0)

• Consider the third training example. This negative example reveals that the G
boundary of the version space is overly general, that is, the hypothesis in G
incorrectly predicts that this new example is a positive example.


• The hypothesis in the G boundary must therefore be specialized until it correctly


classifies this new negative example

Given that there are six attributes that could be specified to specialize G2, why are there only
three new hypotheses in G3?

• For example, the hypothesis h = (?, ?, Normal, ?, ?, ?) is a minimal specialization
of G2 that correctly labels the new example as a negative example, but it is not
included in G3. The reason this hypothesis is excluded is that it is inconsistent with
the previously encountered positive examples.

• Consider the fourth training example.

• This positive example further generalizes the S boundary of the version space. It also results
in removing one member of the G boundary, because this member fails to cover the new
positive example
• After processing these four examples, the boundary sets S4 and G4 delimit the version space
of all hypotheses consistent with the set of incrementally observed training examples.
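The four-example trace above can be reproduced with a sketch of the CANDIDATE-ELIMINATION algorithm for conjunctive hypothesis spaces (the tuple encoding and helper names are our own; None stands in for Φ):

```python
# Candidate Elimination for conjunctive hypotheses. Constraints are a value,
# "?" (any value), or None (Φ).
def satisfies(h, x):
    return all(ai == "?" or ai == xi for ai, xi in zip(h, x))

def more_general_or_equal(hj, hk):
    return all(a == "?" or (a == b and b is not None) or b is None
               for a, b in zip(hj, hk))

def min_generalization(s, x):
    # For conjunctions, the minimal generalization covering x is unique.
    return tuple(xi if si is None else (si if si == xi else "?")
                 for si, xi in zip(s, x))

def min_specializations(g, x, domains):
    # Minimal specializations of g excluding x: fix one "?" to a value != xi.
    return [g[:i] + (v,) + g[i + 1:]
            for i, (gi, xi) in enumerate(zip(g, x)) if gi == "?"
            for v in domains[i] if v != xi]

def candidate_elimination(examples, domains):
    S = [(None,) * len(domains)]        # S0 = {(Φ, ..., Φ)}
    G = [("?",) * len(domains)]         # G0 = {(?, ..., ?)}
    for x, c in examples:
        if c == "Yes":
            G = [g for g in G if satisfies(g, x)]
            S = [min_generalization(s, x) if not satisfies(s, x) else s
                 for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:
            S = [s for s in S if not satisfies(s, x)]
            new_G = []
            for g in G:
                if satisfies(g, x):     # g wrongly covers the negative example
                    new_G += [h for h in min_specializations(g, x, domains)
                              if any(more_general_or_equal(h, s) for s in S)]
                else:
                    new_G.append(g)
            # keep only the maximally general members of G
            G = [g for g in new_G
                 if not any(h != g and more_general_or_equal(h, g)
                            for h in new_G)]
    return S, G

domains = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]
training = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
S, G = candidate_elimination(training, domains)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```

The intermediate boundaries match the trace: after the negative third example, G3 contains the Sunny, Warm, and Same specializations, and the fourth example removes the Same member.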


INDUCTIVE BIAS
The fundamental questions for inductive inference
1. What if the target concept is not contained in the hypothesis space?
2. Can we avoid this difficulty by using a hypothesis space that includes every
possible hypothesis?
3. How does the size of this hypothesis space influence the ability of the
algorithm to generalize to unobserved instances?


4. How does the size of the hypothesis space influence the number of training
examples that must be observed?
These fundamental questions are examined in the context of the CANDIDATE-ELIMINATION
algorithm.
A Biased Hypothesis Space
• Suppose the target concept is not contained in the hypothesis space H, then obvious
solution is to enrich the hypothesis space to include every possible hypothesis.
• Consider the EnjoySport example in which the hypothesis space is restricted to
include only conjunctions of attribute values. Because of this restriction, the
hypothesis space is unable to represent even simple disjunctive target concepts such
as
"Sky = Sunny or Sky = Cloudy."
• Given the following three training examples of this disjunctive target concept, the
algorithm finds that there are zero hypotheses in the version space:

<Sunny Warm Normal Strong Cool Change> Y


<Cloudy Warm Normal Strong Cool Change> Y
<Rainy Warm Normal Strong Cool Change> N

• If the CANDIDATE-ELIMINATION algorithm is applied, it ends up with an empty version
space. After the first two training examples,
S2 = <? Warm Normal Strong Cool Change>

• This new hypothesis is overly general and it incorrectly covers the third negative
training example! So H does not include the appropriate c.
• In this case, a more expressive hypothesis space is required.
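This failure can be illustrated concretely: the minimal conjunctive generalization of the two positive examples already covers the negative example (a sketch; the helper names are our own, with None standing in for Φ):

```python
# The minimal conjunctive generalization of the two positives already covers
# the negative, so no conjunctive hypothesis is consistent with all three.
def satisfies(h, x):
    return all(ai == "?" or ai == xi for ai, xi in zip(h, x))

def min_generalization(h, x):
    # None stands in for Φ; a mismatched value generalizes to "?".
    return tuple(xi if hi is None else (hi if hi == xi else "?")
                 for hi, xi in zip(h, x))

pos1 = ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")
pos2 = ("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change")
neg = ("Rainy", "Warm", "Normal", "Strong", "Cool", "Change")

s = min_generalization(min_generalization((None,) * 6, pos1), pos2)
print(s)                  # ('?', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')
print(satisfies(s, neg))  # True: s wrongly covers the negative example
```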
An Unbiased Learner
• The solution to the problem of assuring that the target concept is in the hypothesis
space H is to provide a hypothesis space capable of representing every teachable
concept; that is, every possible subset of the instances X.
• The set of all subsets of a set X is called the power set of X
• In the EnjoySport learning task the size of the instance space X of days described
by the six attributes is 96 instances.


• Thus, there are 2^96 distinct target concepts that could be defined over this instance
space, any of which the learner might be called upon to learn.
• The conjunctive hypothesis space is able to represent only 973 of these - a
biased hypothesis space indeed
• Let us reformulate the EnjoySport learning task in an unbiased way by defining a
new hypothesis space H' that can represent every subset of instances
• The target concept "Sky = Sunny or Sky = Cloudy" could then be described as
(Sunny, ?, ?, ?, ?, ?) v (Cloudy, ?, ?, ?, ?, ?)
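The counts quoted above can be verified directly; the 973 figure is 1 (for the always-negative hypotheses containing Φ) plus 4·3^5 conjunctions of "?"-or-value constraints:

```python
# Instance count, number of possible target concepts, and the number of
# semantically distinct conjunctive hypotheses for EnjoySport.
instances = 3 * 2 * 2 * 2 * 2 * 2
target_concepts = 2 ** instances          # every subset of X is a concept

# "?"-or-value per attribute (4 * 3^5 conjunctions), plus one always-negative
# hypothesis standing in for every conjunction containing Φ.
conjunctive = 1 + 4 * 3 * 3 * 3 * 3 * 3

print(instances)        # 96
print(target_concepts)  # 2**96, about 7.9e28
print(conjunctive)      # 973
```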

The Futility of Bias-Free Learning


Inductive learning requires some form of prior assumptions, or inductive bias
The fundamental property of inductive inference: a learner that makes no a priori
assumptions regarding the identity of the target concept has no rational basis for classifying
any unseen instances.
In fact, the only reason that the CEA was able to generalize beyond the observed training
examples in our original formulation of the EnjoySport task is that it was biased by the implicit
assumption that the target concept could be represented by a conjunction of attribute values.
In cases where this assumption is correct (and the training examples are error-free), its
classification of new instances will also be correct. If this assumption is incorrect, however, it
is certain that the CEA will misclassify at least some instances from X.
Consider the general setting in which an arbitrary learning algorithm L is provided an arbitrary
set of training data Dc = {⟨x, c(x)⟩} of some arbitrary target concept c. After training, L is
asked to classify a new instance xi. Let L(xi, Dc) denote the classification (e.g., positive or
negative) that L assigns to xi after learning from the training data Dc. We can describe this
inductive inference step performed by L as follows.

Definition:
• Consider a concept learning algorithm L for the set of instances X.
• Let c be an arbitrary concept defined over X, and
• Let Dc = {⟨x, c(x)⟩} be an arbitrary set of training examples of c.


• Let L(xi, Dc) denote the classification assigned to the instance xi by L after training on
the data Dc.
• The inductive bias of L is any minimal set of assertions B such that for any target concept
c and corresponding training examples Dc:
(∀xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]
where y ⊢ z means that z follows deductively from y.
Inductive bias of CEA: The target concept c is contained in the given hypothesis space H. The
figure given below summarizes the situation schematically.

Consider the following three learning algorithms, listed from weakest to strongest inductive bias:
•Rote-Learner: Learning corresponds simply to storing each observed training example in
memory. Subsequent instances are classified by looking them up in memory. If the instance is
found in memory, the stored classification is returned. Otherwise, the system refuses to classify
the new instance.
The Rote-Learner has no inductive bias. The classifications it provides for new instances follow
deductively from the observed training examples, with no additional assumptions required.

•CEA: New instances are classified only in the case where all members of the current version
space agree on the classification. Otherwise, the system refuses to classify the new instance.
The CEA has a stronger inductive bias: that the target concept can be represented in its
hypothesis space. Because it has a stronger bias, it will classify some instances that the Rote-
Learner will not. Of course, the correctness of such classifications will depend completely on
the correctness of this inductive bias.

•FIND-S: This algorithm, described earlier, finds the most specific hypothesis consistent with
the training examples. It then uses this hypothesis to classify all subsequent instances.
The FIND-S algorithm has an even stronger inductive bias. In addition to the assumption that
the target concept can be described in its hypothesis space, it has an additional inductive bias
assumption: that all instances are negative instances unless the opposite is entailed by its other
knowledge.
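The Rote-Learner described above admits a very short sketch (the class and method names are our own):

```python
# A rote learner: pure memorization, no inductive bias. It classifies an
# instance only if that exact instance was seen during training.
class RoteLearner:
    def __init__(self):
        self.memory = {}

    def train(self, examples):
        for x, c in examples:
            self.memory[x] = c

    def classify(self, x):
        # Returns None ("refuses to classify") for unseen instances.
        return self.memory.get(x)

learner = RoteLearner()
learner.train([(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes")])
print(learner.classify(("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")))  # Yes
print(learner.classify(("Rainy", "Cold", "High", "Strong", "Warm", "Change")))  # None
```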
