AI Final
Reasoning is the mental process of deriving logical conclusions and making predictions from available knowledge, facts, and beliefs. In other words, "Reasoning is a way to infer facts from existing data." It is the general process of thinking rationally in order to reach valid conclusions.
In artificial intelligence, reasoning is essential so that a machine can think rationally, much like a human brain, and perform like a human.
Humans are good at understanding, reasoning, and interpreting knowledge. A human knows things (knowledge) and, based on that knowledge, performs various actions in the real world. How machines do all of these things falls under knowledge representation and reasoning. Hence we can describe knowledge representation as follows:
Knowledge representation and reasoning (KR, KRR) is the part of artificial intelligence that is concerned with how AI agents think and how that thinking contributes to their intelligent behavior.
It is responsible for representing information about the real world so that a computer can understand it and use this knowledge to solve complex real-world problems, such as diagnosing a medical condition or communicating with humans in natural language.
It also describes how we can represent knowledge in artificial intelligence.
Knowledge representation is not just storing data into some database, but it also enables an
intelligent machine to learn from that knowledge and experiences so that it can behave
intelligently like a human.
What to Represent:
Object: All the facts about objects in our world domain. E.g., guitars contain strings; trumpets are brass instruments.
There are mainly four ways of knowledge representation which are given as follows:
Logical Representation
Semantic Network Representation
Frame Representation
Production Rules
1. Logical Representation
Logical representation is a language with some concrete rules which deals with propositions and has no ambiguity in representation. Logical representation means drawing conclusions based on various conditions. This representation lays down some important communication rules. It consists of precisely defined syntax and semantics which support sound inference. Each sentence can be translated into logic using its syntax and semantics.
Syntax:
Syntax consists of the rules which decide how we can construct legal sentences in the logic.
It determines which symbols we can use in knowledge representation and how to write those symbols.
Semantics:
Semantics are the rules by which we can interpret the sentence in the logic.
Semantics also involves assigning a meaning to each sentence.
Logical representation can be categorized into mainly two logics:
1. Propositional logic
2. Predicate logic
Logical representations have some restrictions and are challenging to work with.
The logical representation technique may not be very natural, and inference may not be very efficient.
2. Semantic Network Representation
Semantic networks are an alternative to predicate logic for knowledge representation. In semantic networks, we represent knowledge in the form of graphical networks. Such a network consists of nodes representing objects and arcs describing the relationships between those objects. Semantic networks can categorize objects in different forms and can also link those objects. Semantic networks are easy to understand and can be easily extended.
Example: Following are some statements which we need to represent in the form of nodes and
arcs.
Statements:
Jerry is a cat.
Jerry is a mammal
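A minimal sketch of how such a network could be stored in code, using a simple adjacency-list style structure; the class and relation names are illustrative only:

# Minimal semantic network: nodes are objects/concepts, labeled arcs are relations.
from collections import defaultdict

class SemanticNetwork:
    def __init__(self):
        # maps a node to a list of (relation, target) pairs
        self.edges = defaultdict(list)

    def add(self, source, relation, target):
        self.edges[source].append((relation, target))

    def related(self, source, relation):
        """Return all targets linked to `source` by `relation`."""
        return [t for r, t in self.edges[source] if r == relation]

net = SemanticNetwork()
net.add("Jerry", "is_a", "Cat")     # Jerry is a cat
net.add("Cat", "is_a", "Mammal")    # cats are mammals
net.add("Jerry", "is_a", "Mammal")  # Jerry is a mammal

print(net.related("Jerry", "is_a"))  # ['Cat', 'Mammal']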
Drawbacks of semantic networks:
Semantic networks take more computational time at runtime, as we may need to traverse the complete network to answer a question. In the worst case, after traversing the entire network, we may find that the solution does not exist in the network at all.
Semantic networks try to model human-like memory (which has on the order of 10^15 neurons and links) to store information, but in practice it is not possible to build such a vast semantic network.
These representations are inadequate, as they do not have any equivalent of quantifiers, e.g., "for all," "for some," "none."
Semantic networks do not have any standard definition for the link names.
These networks are not intelligent by themselves and depend on the creator of the system.
3. Frame Representation
A frame is a record-like structure which consists of a collection of attributes and their values to describe an entity in the world. Frames are an AI data structure which divides knowledge into substructures by representing stereotyped situations. A frame consists of a collection of slots and slot values. These slots may be of any type and size. Slots have names and values, which are called facets.
Facets: The various aspects of a slot are known as facets. Facets are features of frames which enable us to put constraints on them.
Example: an IF-NEEDED facet is invoked when the data of a particular slot is needed. A frame may consist of any number of slots, a slot may include any number of facets, and a facet may have any number of values. A frame is also known as slot-filler knowledge representation in artificial intelligence.
Frames are derived from semantic networks and later evolved into our modern-day classes and objects. A single frame is not of much use on its own; a frame system consists of a collection of connected frames. In a frame, knowledge about an object or event can be stored together in the knowledge base. The frame is a technology which is widely used in various applications, including natural language processing and machine vision.
Example 1:
Slots        Fillers
Year         1996
Page         1152
Example 2:
Let's suppose we are taking an entity, Peter. Peter is an engineer by profession, his age is 25, he lives in the city of London, and his country is England. The following is the frame representation for this:
Slots        Fillers
Name         Peter
Profession   Engineer
Age          25
City         London
Country      England
Weight       78
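As an illustration, the Peter frame above could be sketched in code as a nested structure of slots and facets; this is only a rough sketch of the idea, and the facet names such as "value" and "if_needed" are illustrative, not a standard frame language:

# A frame as a dictionary of slots; each slot holds facets (values, constraints, demons).
def compute_weight():
    # IF-NEEDED facet: called only when the slot value is actually requested (hypothetical demon)
    return 78

peter_frame = {
    "Name":       {"value": "Peter"},
    "Profession": {"value": "Engineer"},
    "Age":        {"value": 25, "range": (0, 120)},     # a constraint facet
    "City":       {"value": "London"},
    "Country":    {"value": "England"},
    "Weight":     {"if_needed": compute_weight},        # deferred evaluation
}

def get_slot(frame, slot):
    facets = frame[slot]
    if "value" in facets:
        return facets["value"]
    return facets["if_needed"]()   # fall back to the IF-NEEDED procedure

print(get_slot(peter_frame, "Name"))    # Peter
print(get_slot(peter_frame, "Weight"))  # 78 (computed on demand)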
The frame knowledge representation makes programming easier by grouping related data.
The frame representation is comparably flexible and used by many applications in AI.
4. Production Rules
A production rules system consists of (condition, action) pairs, which mean "IF condition THEN action". It has mainly three parts:
The set of production rules
Working memory
The recognize-act cycle
The working memory contains the description of the current state of problem-solving, and rules can write knowledge to the working memory. This knowledge may then match and fire other rules.
If a new situation (state) is generated, multiple production rules may be triggered together; this set of triggered rules is called the conflict set. In this situation, the agent needs to select one rule from the set to apply, and this selection is called conflict resolution.
Example:
IF (at bus stop AND bus arrives) THEN action (get into the bus)
IF (on the bus AND paid AND empty seat) THEN action (sit down).
IF (bus arrives at destination) THEN action (get down from the bus).
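A minimal, illustrative sketch of such a production system in Python, using the bus-stop rules above; conflict resolution here is simply "the first matching rule wins," which is only one of many possible strategies:

# Working memory: the set of facts describing the current state.
working_memory = {"at bus stop", "bus arrives"}

# Production rules as (condition set, action fact) pairs: IF all conditions hold THEN add the action.
rules = [
    ({"at bus stop", "bus arrives"}, "get into the bus"),
    ({"on the bus", "paid", "empty seat"}, "sit down"),
    ({"bus arrives at destination"}, "get down from the bus"),
]

def recognize_act(memory, rules):
    """One recognize-act cycle: fire the first rule whose conditions are all in memory."""
    for conditions, action in rules:
        if conditions <= memory and action not in memory:
            memory.add(action)          # the fired rule writes to working memory
            return action
    return None                          # no rule applicable

print(recognize_act(working_memory, rules))  # get into the bus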
The production rules are highly modular, so we can easily remove, add or modify an individual
rule.
The production rule system does not exhibit any learning capabilities, as it does not store the results of problems for future use.
During the execution of a program, many rules may be active; hence rule-based production systems can be inefficient.
Propositional logic (PL) is the simplest form of logic, where all statements are made up of propositions. A proposition is a declarative statement which is either true or false. It is a technique for representing knowledge in logical and mathematical form.
Example:
a) It is Sunday.
b) 5 is a prime number.
The syntax of propositional logic defines the allowable sentences for the knowledge
representation. There are two types of Propositions:
Atomic Propositions
Compound propositions
Atomic Proposition: Atomic propositions are simple propositions. Each consists of a single proposition symbol. These are the sentences which must be either true or false.
Example: "2 + 2 = 4" is an atomic proposition, as it is a true fact, and "The Sun is cold" is also an atomic proposition, as it is a false fact.
Compound Proposition: Compound propositions are constructed by combining simpler or atomic propositions using logical connectives.
Example: "It is raining today, and the street is wet."
Logical Connectives:
Logical connectives are used to connect two simpler propositions or to represent a sentence logically. We can create compound propositions with the help of logical connectives. There are mainly five connectives, which are given as follows:
Negation: A sentence such as ¬P is called the negation of P. A literal can be either a positive literal or a negative literal.
Conjunction: A sentence which has the ∧ connective, such as P ∧ Q, is called a conjunction. Example: "Rohan is intelligent and hardworking" can be written as P = Rohan is intelligent, Q = Rohan is hardworking → P ∧ Q.
Disjunction: A sentence which has the ∨ connective, such as P ∨ Q, is called a disjunction, where P and Q are propositions.
Implication: A sentence such as P → Q is called an implication. Implications are also known as if-then rules. For example, "If it is raining, then the street is wet" can be represented as P → Q, with P = It is raining and Q = The street is wet.
Biconditional: A sentence such as P ⇔ Q is a biconditional sentence, e.g., "I am breathing if and only if I am alive": P = I am breathing, Q = I am alive, represented as P ⇔ Q.
In propositional logic, we need to know the truth values of propositions in all possible scenarios. We can combine all the possible combinations of truth values with logical connectives, and the representation of these combinations in a tabular format is called a truth table. Following are the truth tables for all the logical connectives:
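For reference, with T for true and F for false, the truth tables for the five connectives can be summarized in one table (P and Q are atomic propositions):

P  Q  |  ¬P  |  P ∧ Q  |  P ∨ Q  |  P → Q  |  P ⇔ Q
T  T  |   F  |    T    |    T    |    T    |    T
T  F  |   F  |    F    |    T    |    F    |    F
F  T  |   T  |    F    |    T    |    T    |    F
F  F  |   T  |    F    |    F    |    T    |    T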
The Wumpus World in Artificial intelligence
The Wumpus world is a simple world example to illustrate the worth of a knowledge-based
agent and to represent knowledge representation. It was inspired by a video game Hunt the
Wumpus by Gregory Yob in 1973.
The Wumpus world is a cave which has 4×4 rooms connected by passageways, so there are a total of 16 rooms which are connected to each other. We have a knowledge-based agent who will go forward in this world. The cave has a room with a beast called the Wumpus, who eats anyone who enters that room. The Wumpus can be shot by the agent, but the agent has only a single arrow. In the Wumpus world there are some pit rooms which are bottomless, and if the agent falls into a pit, he will be stuck there forever. The exciting thing about this cave is that in one room there is a possibility of finding a heap of gold. So the agent's goal is to find the gold and climb out of the cave without falling into a pit or being eaten by the Wumpus. The agent gets a reward if he comes out with the gold, and a penalty if he is eaten by the Wumpus or falls into a pit.
Following is a sample diagram representing the Wumpus world. It shows some rooms with pits, one room with the Wumpus, and the agent at the (1, 1) square of the world.
There are also some components which can help the agent to navigate the cave. These
components are given as follows:
The rooms adjacent to the Wumpus room are smelly, so they have a stench.
The rooms adjacent to a pit have a breeze, so if the agent reaches a room next to a pit, he will perceive the breeze.
There will be glitter in a room if and only if the room has gold.
The Wumpus can be killed by the agent if the agent is facing it, and the dying Wumpus emits a horrible scream which can be heard anywhere in the cave.
Performance measure:
+1000 reward points if the agent comes out of the cave with the gold.
-1000 points penalty for being eaten by the Wumpus or falling into the pit.
-1 for each action, and -10 for using an arrow.
The game ends when the agent either dies or comes out of the cave.
Environment:
A 4×4 grid of rooms.
The agent initially starts in room [1, 1], facing toward the right.
The locations of the Wumpus and the gold are chosen randomly, other than the first square.
Each square other than the first can be a pit, with probability 0.2.
Actuators:
Left turn,
Right turn
Move forward
Grab
Release
Shoot.
Sensors:
The agent will perceive the stench if he is in the room adjacent to the Wumpus. (Not
diagonally).
The agent will perceive breeze if he is in the room directly adjacent to the Pit.
The agent will perceive the glitter in the room where the gold is present.
The agent will perceive the bump if he walks into a wall.
When the Wumpus is shot, it emits a horrible scream which can be perceived anywhere
in the cave.
These percepts can be represented as a five-element list, in which we have a different indicator for each sensor.
For example, if the agent perceives a stench and a breeze, but no glitter, no bump, and no scream, it can be represented as: [Stench, Breeze, None, None, None].
Partially observable: The Wumpus world is partially observable because the agent can only
perceive the close environment such as an adjacent room.
Deterministic: It is deterministic, as the result and outcome of the world are already known.
One agent: The environment is a single agent as we have one agent only and Wumpus is not
considered as an agent.
Now we will explore the Wumpus world and will determine how the agent will find its goal by
applying logical reasoning.
Agent's first step:
Initially, the agent is in the first room, on square [1,1], and we already know that this room is safe for the agent. To represent on diagram (a) that the room is safe, we will add the symbol OK. The symbol A is used to represent the agent, B for breeze, G for glitter or gold, V for a visited room, P for a pit, and W for the Wumpus.
In room [1,1] the agent does not perceive any breeze or any stench, which means the adjacent squares are also OK.
"Agent's second Step:
Now agent needs to move forward, so it will either move to [1, 2], or [2,1]. Let's suppose agent
moves to the room [2, 1], at this room agent perceives some breeze which means Pit is around
this room. The pit can be in [3, 1], or [2,2], so we will add symbol P? to say that, is this Pit room?
Now agent will stop and think and will not make any harmful move. The agent will go back to
the [1, 1] room. The room [1,1], and [2,1] are visited by the agent, so we will use symbol V to
represent the visited squares.
At the third step, now agent will move to the room [1,2] which is OK. In the room [1,2] agent
perceives a stench which means there must be a Wumpus nearby. But Wumpus cannot be in
the room [1,1] as by rules of the game, and also not in [2,2] (Agent had not detected any stench
when he was at [2,1]). Therefore agent infers that Wumpus is in the room [1,3], and in current
state, there is no breeze which means in [2,2] there is no Pit and no Wumpus. So it is safe, and
we will mark it OK, and the agent moves further in [2,2].
At room [2,2], here no stench and no breezes present so let's suppose agent decides to move to
[2,3]. At room [2,3] agent perceives glitter, so it should grab the gold and climb out of the
cave."
First-Order Logic in Artificial intelligence
In the topic of propositional logic, we have seen how to represent statements using propositional logic. But unfortunately, in propositional logic we can only represent facts, which are either true or false. PL is not sufficient to represent complex sentences or natural language statements; it has very limited expressive power. Consider sentences such as "Some boys are intelligent" or "All birds fly," which we cannot represent adequately using PL.
To represent such statements, PL is not sufficient, so we require a more powerful logic, such as first-order logic.
First-Order logic:
FOL is sufficiently expressive to represent natural language statements in a concise way.
First-order logic is also known as predicate logic or first-order predicate logic. First-order logic is a powerful language that expresses information about objects in an easier way and can also express the relationships between those objects.
First-order logic (like natural language) does not only assume that the world contains facts, as propositional logic does, but also assumes the following things in the world:
Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ......
Relations: It can be a unary relation such as: red, round, is adjacent; or an n-ary relation such as: the sister of, brother of, has color, comes between
Function: father of, best friend, third inning of, end of, ......
Syntax
Semantics
The syntax of FOL determines which collections of symbols constitute logical expressions in first-order logic. The basic syntactic elements of first-order logic are symbols. We write statements in shorthand notation in FOL.
Atomic sentences are the most basic sentences of first-order logic. These sentences are formed from a predicate symbol followed by a parenthesized sequence of terms.
We can represent atomic sentences as Predicate(term1, term2, ......, term n), e.g., Cat(Jerry).
Complex Sentences: Complex sentences are made by combining atomic sentences using connectives.
Consider the statement "x is an integer." It consists of two parts: the first part, x, is the subject of the statement, and the second part, "is an integer," is known as the predicate.
Quantifiers in First-Order Logic:
Quantifiers are the symbols that permit us to determine or identify the range and scope of a variable in a logical expression.
Universal Quantifier:
A universal quantifier is a symbol of logical representation which specifies that the statement within its range is true for everything or every instance of a particular thing. It is denoted by the logical operator ∀, which resembles an inverted A. If x is a variable, then ∀x is read as:
For all x
For each x
For every x.
Example: "All men drink coffee."
Let x be a variable referring to a man. The statement can be written as ∀x man(x) → drink(x, coffee), and it is read as: for all x, if x is a man, then x drinks coffee.
Existential Quantifier:
Existential quantifiers are the type of quantifiers which express that the statement within their scope is true for at least one instance of something.
It is denoted by the logical operator ∃, which resembles an inverted E. When it is used with a predicate variable, it is called an existential quantifier.
If x is a variable, then the existential quantifier will be ∃x or ∃(x), and it will be read as:
There exists an x
For some x
For at least one x.
Example: "Some boys are intelligent." This can be written as ∃x boys(x) ∧ intelligent(x), and it is read as: there are some x where x is a boy who is intelligent.
Points to remember:
The main connective for the universal quantifier ∀ is the implication →.
The main connective for the existential quantifier ∃ is the conjunction ∧.
Properties of Quantifiers:
In universal quantifiers, ∀x∀y is similar to ∀y∀x.
In existential quantifiers, ∃x∃y is similar to ∃y∃x.
∃x∀y is not similar to ∀y∃x.
Some examples of FOL using quantifiers:
1. All birds fly. In this question, the predicate is "fly(bird)," and since all birds fly, it will be represented as follows:
∀x bird(x) → fly(x).
2. Every man respects his parent. In this question, the predicate is "respect(x, y)," where x = man and y = parent. Since the statement covers every man, we will use ∀, and it will be represented as follows:
∀x man(x) → respects(x, parent).
3. Some boys play a game. In this question, the predicate is "play(x, y)," where x = boys and y = game. Since the statement covers only some boys, we will use ∃, and it will be represented as:
∃x boys(x) ∧ play(x, game).
4. Not all students like the subject. In this question, the predicate is "like(x, y)," where x = student and y = subject. Since not all students are covered, we will use ∀ with negation, giving the following representation:
¬∀(x) [ student(x) → like(x, subject) ].
5. Only one student failed in Mathematics. In this question, the predicate is "failed(x, y)," where x = student and y = subject. Since there is only one student who failed in Mathematics, we will use the following representation:
∃(x) [ student(x) ∧ failed(x, Mathematics) ∧ ∀(y) [ ¬(x == y) ∧ student(y) → ¬failed(y, Mathematics) ] ].
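To make the behaviour of these quantifiers concrete, here is a small sketch that checks quantified statements over a finite universe of discourse; the people and facts are made up for the example:

# A tiny finite universe of discourse, used to check quantified statements.
people = ["Ravi", "Ajay", "Tina"]
is_boy = {"Ravi": True, "Ajay": True, "Tina": False}
is_intelligent = {"Ravi": False, "Ajay": True, "Tina": True}

# Universal quantifier with implication: for all x, boy(x) -> intelligent(x)
all_boys_intelligent = all((not is_boy[x]) or is_intelligent[x] for x in people)

# Existential quantifier with conjunction: there exists x, boy(x) AND intelligent(x)
some_boy_intelligent = any(is_boy[x] and is_intelligent[x] for x in people)

print(all_boys_intelligent)   # False (Ravi is a boy but not intelligent in this toy data)
print(some_boy_intelligent)   # True  (Ajay)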
Quantifiers interact with the variables that appear in a formula. There are two types of variables in first-order logic, which are given below:
Free Variable: A variable is said to be a free variable in a formula if it occurs outside the scope
of the quantifier.
Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the
scope of the quantifier.
Forward Chaining and Backward Chaining
Forward chaining starts from known facts and applies inference rules to extract more data until it reaches the goal. It is a bottom-up approach. Forward chaining is known as a data-driven inference technique, as we reach the goal using the available data. Forward chaining reasoning applies a breadth-first search strategy and tests all of the available rules. Forward chaining is suitable for planning, monitoring, control, and interpretation applications. It can generate an infinite number of possible conclusions. It operates in the forward direction and is aimed at any conclusion that follows.
Backward chaining starts from the goal and works backward through inference rules to find the facts that support the goal. It is a top-down approach. Backward chaining is known as a goal-driven technique, as we start from the goal and divide it into sub-goals to extract the facts. Backward chaining reasoning applies a depth-first search strategy and only tests the few rules that are required. Backward chaining is suitable for diagnostic, prescription, and debugging applications. It generates a finite number of possible conclusions. It operates in the backward direction and is aimed only at the required data.
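A minimal sketch of data-driven forward chaining in Python; the rules and facts are invented for illustration, and each rule simply adds its conclusion once all of its premises are known facts:

# Forward chaining: repeatedly apply rules whose premises are all known, until nothing new is added.
rules = [
    ({"croaks", "eats flies"}, "frog"),
    ({"frog"}, "green"),
]
facts = {"croaks", "eats flies"}

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)   # a new fact derived from known facts
            changed = True

print(facts)  # {'croaks', 'eats flies', 'frog', 'green'}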
Semantic network
The structural idea is that knowledge can be stored in the form of graphs, with nodes
representing objects in the world, and arcs representing relationships between those objects.
In these network diagrams, nodes appear in the form of circles, ellipses, or even rectangles and represent objects such as physical objects, concepts, or situations.
Links appear as arrows to express the relationships between objects, and link labels specify the relations.
Relationships provide the basic structure needed for organizing knowledge, so the objects and relations involved need not be concrete.
Semantic nets are also referred to as associative nets, as the nodes are associated with other nodes.
Semantic networks can be used for:
Representing data
Revealing structure (relations, proximity, relative importance)
Supporting conceptual editing
Supporting navigation
The three components of a semantic network are:
Lexical component: nodes denote physical objects or concepts, and links denote the relationships between objects; labels denote the specific objects and relationships.
Semantic component: Here the definitions are related only to the links and label of nodes,
whereas facts depend on the approval areas.
Procedural part: constructors permit the creation of new links and nodes, while destructors permit their removal.
Search tree
In computer science, a search tree is a tree data structure used for locating specific keys from
within a set. In order for a tree to function as a search tree, the key for each node must be
greater than any keys in subtrees on the left, and less than any keys in subtrees on the right.
The advantage of search trees is their efficient search time given the tree is reasonably
balanced, which is to say the leaves at either end are of comparable depths. Various search-
tree data structures exist, several of which also allow efficient insertion and deletion of
elements, which operations then have to maintain tree balance.
Search trees are often used to implement an associative array. The search tree algorithm uses the key from the key–value pair to find a location, and then the application stores the entire key–value pair at that particular location.
Types of Trees
A Binary Search Tree is a node-based data structure where each node contains a key and two
subtrees, the left and right. For all nodes, the left subtree's key must be less than the node's
key, and the right subtree's key must be greater than the node's key. These subtrees must all
qualify as binary search trees.
The worst-case time complexity for searching a binary search tree is the height of the tree,
which can be as small as O(log n) for a tree with n elements.
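A short sketch of a binary search tree with insert and search operations, to make the ordering rule concrete (a plain unbalanced BST, so the O(log n) search time only holds when the tree stays reasonably balanced):

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None   # keys smaller than self.key
        self.right = None  # keys greater than self.key

def insert(root, key):
    """Insert key into the BST rooted at root; return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored

def search(root, key):
    """Return True if key is present in the BST."""
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

root = None
for k in [8, 3, 10, 1, 6, 14]:
    root = insert(root, k)
print(search(root, 6), search(root, 7))  # True False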
B-tree
B-trees are generalizations of binary search trees in that they can have a variable number of subtrees at each node. While child nodes have a pre-defined range, they will not necessarily be filled with data, meaning B-trees can potentially waste some space. The advantage is that B-trees do not need to be re-balanced as frequently as other self-balancing trees.
Due to the variable range of their node length, B-trees are optimized for systems that read large blocks of data; they are also commonly used in databases.
An (a,b)-tree is a search tree where all of its leaves are the same depth. Each node has at least a
children and at most b children, while the root has at least 2 children and at most b children.
A ternary search tree is a type of tree where each node can have three children: a low child, an equal child, and a high child. Each node stores a single character, and the tree itself is ordered the same way a binary search tree is, with the exception of the possible third child.
Searching a ternary search tree involves passing in a string to test whether any path contains it.
The time complexity for searching a balanced ternary search tree is O(log n).
Frame
Frames are an artificial intelligence data structure used to divide knowledge into substructures
by representing "stereotyped situations". They were proposed by Marvin Minsky in his 1974
article "A Framework for Representing Knowledge". Frames are the primary data structure used
in artificial intelligence frame language; they are stored as ontologies of sets.
Frames are also an extensive part of knowledge representation and reasoning schemes. They
were originally derived from semantic networks and are therefore part of structure based
knowledge representations. According to Russell and Norvig's "Artificial Intelligence: A Modern
Approach", structural representations assemble " facts about particular object and event types
and arrange the types into a large taxonomic hierarchy analogous to a biological taxonomy".
The frame contains information on how to use the frame, what to expect next, and what to do
when these expectations are not met. Some information in the frame is generally unchanged
while other information, stored in "terminals", usually change. Terminals can be considered as
variables. Top level frames carry information that is always true about the problem in hand,
however, terminals do not have to be true. Their value might change with the new information
encountered. Different frames may share the same terminals.
Each piece of information about a particular frame is held in a slot. The information can
contain:
Frame structure:
Facts or data values (called facets)
Procedures (also called procedural attachments):
IF-NEEDED: deferred evaluation
IF-ADDED: updates linked information
Default values (for data, for procedures)
Other frames or subframes
A frame's terminals are already filled with default values, which is based on how the human
mind works. For example, when a person is told "a boy kicks a ball", most people will visualize a
particular ball (such as a familiar soccer ball) rather than imagining some abstract ball with no
attributes.
One particular strength of frame based knowledge representations is that, unlike semantic
networks, they allow for exceptions in particular instances. This gives frames an amount of
flexibility that allow representations of real world phenomena to be reflected more accurately.
Like semantic networks, frames can be queried using spreading activation. Following the rules of inheritance, any value given to a slot that is inherited by subframes will be updated (IF-ADDED) in the corresponding slots of the subframes, and any new instance of a particular frame will feature that new value as the default.
Worth noticing here is the easy analogical reasoning (comparison) that can be done between a
boy and a monkey just by having similarly named slots.
A script is a structure that prescribes a set of circumstances which could be expected to follow
on from one another. It is similar to a thought sequence or a chain of situations which could be
anticipated. It could be considered to consist of a number of slots or frames but with more
specialised roles.
Prerequisites exist for events to take place, e.g., when a student progresses through a degree scheme or when a purchaser buys a house.
Entry Conditions
-- Conditions that must be satisfied before events in the script can occur.
Results
-- Conditions that will be true after events in the script have occurred.
Props
-- Slots representing objects involved in the events.
Roles
-- Persons involved in the events.
Track
-- Variations on the script. Different tracks may share components of the same script.
Scenes
-- The sequence of events that occur. Events are represented in conceptual dependency form.
Example script: Hold up a bank.
Props:
Gun, G.
Loot, L.
Bag, B.
Roles:
Robber, S.
Cashier, M.
Bank Manager, O.
Policeman, P.
Entry Conditions:
S is poor.
S is destitute.
Results:
O is angry.
M is in a state of shock.
P is shot.
Conceptual Dependency:
Roger C. Schank developed the Conceptual Dependency structure in 1977. Conceptual Dependency is used to represent knowledge in artificial intelligence. It should be powerful enough to represent the concepts in a sentence of natural language, and it states that different sentences which have the same meaning should have a single, unique representation.
A Conceptual Dependency representation is built from the following components:
1. Entities
2. Actions
3. Conceptual cases
4. Conceptual dependencies
5. Conceptual tense
For any two or more sentences that are identical in meaning, there should be only one representation of that meaning.
Rule-1: It describes the relationship between an actor and the event he or she causes.
Rule-2: It describes the relationship between PP and PA that are asserted to describe it.
Rule-3: It describes the relationship between two PPs, one of which belongs to the set defined
by the other.
Rule-4: It describes the relationship between a PP and an attribute that has already been
predicated on it.
Rule-5: It describes the relationship between two PPs one of which provides a particular kind of
information about the other.
Rule-6: It describes the relationship between an ACT and the PP that is the object of that ACT.
Rule-7: It describes the relationship between an ACT and the source and the recipient of the
ACT.
Rule-8: It describes the relationship between an ACT and the instrument with which it is
performed. This instrument must always be a full conceptualization, not just a single physical
object.
Rule-9: It describes the relationship between an ACT and its physical source and destination.
Rule-10: It represents the relationship between a PP and a state in which it started and another
in which it ended.
Rule-11: It represents the relationship between one conceptualization and another that causes
it.
Rule-12: It represents the relationship between conceptualization and the time at which the
event occurred is described.
Rule-13: It describes the relationship between one conceptualization and another, that is the
time of the first.
Rule-14: It describes the relationship between conceptualization and the place at which it
occurred.
Introduction to Production System in AI
A production system in AI is a computer program that provides some form of artificial intelligence. It comprises a set of rules that define the system's characteristic behaviour, together with a mechanism to follow those rules and respond accordingly. That set of rules is referred to as productions, and it is a fundamental representation used in action selection, expert systems, and automated planning. The upcoming sections of the article explain the features, rules, advantages, and limitations of the production system in artificial intelligence.
The architecture of every sentence in a production system is uniform and simple: the entire system executes IF-THEN rules in every set of executions. It is a source of knowledge representation and increases the readability of the production rules. Hence it is user-friendly and can be managed without much complexity or difficulty.
The code of the production rules and their related knowledge are available in distinct units, so the information can be accessed without any dependencies; it is an array of independent facts that can be edited easily without affecting the rest of the production system. This modularity makes the production system easily flexible to any modification.
The rules are easy to edit or alter, which makes it possible to develop production rules first in a skeletal form and then refine them for the specific application so that they execute accurately and without delay.
The knowledge base of the production system is intensive and does not contain corrupted data or false information. The data is stored in a pure format and does not include any control strategy or programming information. The production rules are stated as simple English sentences, which helps resolve semantic problems within the representation itself.
Rules of Production System in AI
The rules in the production system fall into two broad categories: abductive inference rules and deductive inference rules. The representation of rules in the production system is an important part, as the functioning of the entire system depends on the rules. The rules are fed into the database and control system and can be written in the following form:
IF (condition) THEN (action)
The representation of rules in the production system is natural and expressed in a simple format. It has a rapid recognize-act cycle, which can recognize a situation and react to it, thanks to the separation of control and knowledge. Data-driven or goal-driven operation maps naturally onto state-space search.
The modularity and adaptability of the production rules are efficient and user-friendly. The
flexibility to any modification in the rules is high without affecting the production system.
The production system executes pattern-directed control, which is more flexible than algorithmic control. It enables exploratory control of the search in a hierarchical way if any complexities occur.
The troubleshooting methods in the production system are reliable, and it takes minimum time
to find the affected parts and provides simple tracing of the systems. It provides a generic
control and informative rules to manage the challenging tasks.
It is a reliable model because of the state-driven attitude of the intelligent machines and
behaves as a reasonable design to the decision making and problem-solving act of humans. It is
robust and provides a rapid response in real-time applications.
Apart from this, the significant limitations of the production system include inefficiency, opaqueness, lack of learning ability, and the need for conflict resolution.
The occurrence of opaqueness is due to poor prioritization of rules. It arises when two or more production rules are merged or combined. If the priority of the rules is predetermined, the probability of opaqueness is lower.
Most production systems are prone to inefficiency in the applied environment, but well-designed control methodologies minimize this kind of problem. In particular, when a program is executed, multiple rules become active and are executed. This happens because there are many predefined rules in the production system, and a complex search is carried out hierarchically through every set of rules on every iteration of the control program.
A production system that depends only on its rules does not store the outcomes of the problems it has solved, which would help in solving future issues. Instead, it searches for a new solution to the same problem each time and does not exhibit any learning capability. Hence, the lack of learning capability in production systems in artificial intelligence needs to be improved for better efficacy and operation.
The rules in the production system should not get involved in any conflicting operations. If the database is updated with new rules, the system should check that there are no conflicts between the existing rules and the newly added rules.
An expert system is a computer program that is designed to solve complex problems and to
provide decision-making ability like a human expert. It performs this by extracting knowledge
from its knowledge base using the reasoning and inference rules according to the user queries.
The expert system is a part of AI, and the first ES was developed in the year 1970, which was
the first successful approach of artificial intelligence. It solves the most complex issue as an
expert by extracting the knowledge stored in its knowledge base. The system helps in decision
making for complex problems using both facts and heuristics like a human expert. It is called so
because it contains the expert knowledge of a specific domain and can solve any complex
problem of that particular domain. These systems are designed for a specific domain, such as
medicine, science, etc.
The performance of an expert system is based on the expert's knowledge stored in its
knowledge base. The more knowledge stored in the KB, the more that system improves its
performance. One of the common examples of an ES is a suggestion of spelling errors while
typing in the Google search box.
DENDRAL: It was an artificial intelligence project that was made as a chemical analysis expert
system. It was used in organic chemistry to detect unknown organic molecules with the help of
their mass spectra and knowledge base of chemistry.
MYCIN: It was one of the earliest backward chaining expert systems that was designed to find
the bacteria causing infections like bacteraemia and meningitis. It was also used for the
recommendation of antibiotics and the diagnosis of blood clotting diseases.
PXDES: It is an expert system that is used to determine the type and level of lung cancer. To
determine the disease, it takes a picture from the upper body, which looks like the shadow. This
shadow identifies the type and degree of harm.
CaDeT: The CaDet expert system is a diagnostic support system that can detect cancer at early
stages.
High Performance: The expert system provides high performance for solving any type of
complex problem of a specific domain with high efficiency and accuracy.
Understandable: It responds in a way that can be easily understandable by the user. It can take
input in human language and provides the output in the same way.
Highly responsive: ES provides the result for any complex query within a very short period of
time.
User Interface
Inference Engine
Knowledge Base
1. User Interface
With the help of a user interface, the expert system interacts with the user, takes queries as an
input in a readable format, and passes it to the inference engine. After getting the response
from the inference engine, it displays the output to the user. In other words, it is an interface
that helps a non-expert user to communicate with the expert system to find a solution.
2. Inference Engine
The inference engine is known as the brain of the expert system as it is the main processing unit
of the system. It applies inference rules to the knowledge base to derive a conclusion or deduce
new information. It helps in deriving an error-free solution of queries asked by the user.
With the help of an inference engine, the system extracts the knowledge from the knowledge
base.
Deterministic Inference engine: The conclusions drawn from this type of inference
engine are assumed to be true. It is based on facts and rules.
Probabilistic Inference engine: This type of inference engine contains uncertainty in its conclusions and is based on probability.
Forward Chaining: It starts from the known facts and rules, and applies the inference
rules to add their conclusion to the known facts.
Backward Chaining: It is a backward reasoning method that starts from the goal and
works backward to prove the known facts.
3. Knowledge Base
The knowledge base is a type of storage that stores knowledge acquired from the different experts of the particular domain. It is considered big storage of knowledge. The larger the knowledge base, the more precise the expert system will be.
It is similar to a database that contains information and rules of a particular domain or subject.
One can also view the knowledge base as collections of objects and their attributes. Such as a
Lion is an object and its attributes are it is a mammal, it is not a domestic animal, etc.
Factual Knowledge: The knowledge which is based on facts and accepted by knowledge
engineers comes under factual knowledge.
Heuristic Knowledge: This knowledge is based on practice, the ability to guess, evaluation, and
experiences.
Knowledge Representation: It is used to formalize the knowledge stored in the knowledge base
using the If-else rules.
Knowledge Acquisitions: It is the process of extracting, organizing, and structuring the domain
knowledge, specifying the rules to acquire the knowledge.
Development of Expert System
Here, we will explain the working of an expert system by taking an example of MYCIN ES. Below
are some steps to build an MYCIN:
Firstly, ES should be fed with expert knowledge. In the case of MYCIN, human experts
specialized in the medical field of bacterial infection, provide information about the causes,
symptoms, and other knowledge in that domain.
The KB of the MYCIN is updated successfully. In order to test it, the doctor provides a new
problem to it. The problem is to identify the presence of the bacteria by inputting the details of
a patient, including the symptoms, current condition, and medical history.
The ES will need a questionnaire to be filled by the patient to know the general information
about the patient, such as gender, age, etc.
Now the system has collected all the information, so it will find the solution for the problem by
applying if-then rules using the inference engine and using the facts stored within the KB.
In the end, it will provide a response to the patient by using the user interface.
Expert: The success of an ES much depends on the knowledge provided by human experts.
These experts are those persons who are specialized in that specific domain.
Knowledge Engineer: Knowledge engineer is the person who gathers the knowledge from the
domain experts and then codifies that knowledge to the system according to the formalism.
End-User: This is a particular person or a group of people who may not be experts; they use the expert system when they need a solution or advice for their complex queries.
Before using any technology, we must have an idea about why to use that technology, and the same holds for the ES. Although we have human experts in every field, why do we need to develop a computer-based system?
The following points describe the need for the ES:
No memory Limitations: It can store as much data as required and can memorize it at the time
of its application. But for human experts, there are some limitations to memorize all things at
every time.
High Efficiency: If the knowledge base is updated with the correct knowledge, then it provides a
highly efficient output, which may not be possible for a human.
Expertise in a domain: There are lots of human experts in each domain, and they all have different skills and different experiences, so it is not easy to get a final output for a query. But if we put the knowledge gained from human experts into the expert system, it provides an efficient output by combining all the facts and knowledge.
Not affected by emotions: These systems are not affected by human emotions such as fatigue,
anger, depression, anxiety, etc.. Hence the performance remains constant.
High security: These systems provide high security to resolve any query.
Considers all the facts: To respond to any query, it checks and considers all the available facts
and provides the result accordingly. But it is possible that a human expert may not consider
some facts due to any reason.
Regular updates improve the performance: If there is an issue in the result provided by the expert system, we can improve the performance of the system by updating the knowledge base.
Advising: It is capable of advising the human being for the query of any domain from the
particular ES.
Demonstrate a device: It is capable of demonstrating any new products such as its features,
specifications, how to use that product, etc.
Diagnosis: An ES designed for the medical field is capable of diagnosing a disease without using
multiple components as it already contains various inbuilt medical tools.
They can be used for risky places where the human presence is not safe.
The performance of these systems remains steady as it is not affected by emotions, tension, or
fatigue.
The response of the expert system may get wrong if the knowledge base contains the wrong
information.
Like a human being, it cannot produce a creative output for different scenarios.
For each domain, we require a specific ES, which is one of the big limitations.
It can be broadly used for designing and manufacturing physical devices such as camera lenses
and automobiles.
These systems are primarily used for publishing relevant knowledge to the users. The two popular ES used in this domain are an advisor and a tax advisor.
In the finance domain
In the finance industries, it is used to detect any type of possible fraud, suspicious activity, and
advise bankers that if they should provide loans for business or not.
The ES system is used in medical diagnosis, and this was the first area where these systems were applied.
The expert systems can also be used for planning and scheduling some particular tasks for
achieving the goal of that task.
The Artificial Neural Network tutorial provides basic and advanced concepts of ANNs. Our artificial neural network tutorial is developed for beginners as well as professionals.
The term "Artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modeled after the brain. An Artificial neural network is usually a computational
network based on biological neural networks that construct the structure of the human brain.
Similar to a human brain has neurons interconnected to each other, artificial neural networks
also have neurons that are linked to each other in various layers of the networks. These
neurons are known as nodes.
Artificial neural network tutorial covers all the aspects related to the artificial neural network.
In this tutorial, we will discuss ANNs, Adaptive resonance theory, Kohonen self-organizing map,
Building blocks, unsupervised learning, Genetic algorithm, etc.
What is Artificial Neural Network?
The term "Artificial Neural Network" is derived from Biological neural networks that develop
the structure of a human brain. Similar to the human brain that has neurons interconnected to
one another, artificial neural networks also have neurons that are interconnected to one
another in various layers of the networks. These neurons are known as nodes.
The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks, cell
nucleus represents Nodes, synapse represents Weights, and Axon represents Output.
Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output
An artificial neural network is an attempt, in the field of artificial intelligence, to mimic the network of neurons that makes up the human brain, so that computers have the option to understand things and make decisions in a human-like manner. The artificial neural network is designed by programming computers to behave simply like interconnected brain cells.
There are around 1000 billion neurons in the human brain. Each neuron has an association
point somewhere in the range of 1,000 and 100,000. In the human brain, data is stored in such
a manner as to be distributed, and we can extract more than one piece of this data when
necessary from our memory parallelly. We can say that the human brain is made up of
incredibly amazing parallel processors.
We can understand the artificial neural network with an example: consider a digital logic gate that takes an input and gives an output, such as an "OR" gate, which takes two inputs. If one or both inputs are "On," the output is "On." If both inputs are "Off," the output is "Off." Here the output depends only on the input. Our brain does not perform the same task; the output-to-input relationship keeps changing because the neurons in our brain are "learning."
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer presents in-between input and output layers. It performs all the calculations
to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally results
in output that is conveyed using this layer.
The artificial neural network takes the inputs and computes the weighted sum of the inputs together with a bias. This computation is represented in the form of a transfer function: ∑ wi·xi + b.
The weighted total is then passed as input to an activation function to produce the output. Activation functions decide whether a node should fire or not; only the nodes that fire make it to the output layer. There are distinctive activation functions available that can be applied depending on the type of task we are performing.
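A small sketch of this computation for one layer using NumPy; the layer sizes, weight values, and the choice of a sigmoid activation are assumptions made purely for illustration:

import numpy as np

def sigmoid(z):
    # squashes the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.2, 0.1])          # input layer (3 features)
W = np.array([[0.4, -0.6, 0.3],
              [0.7,  0.1, -0.2]])      # weights for a hidden layer with 2 nodes
b = np.array([0.05, -0.1])             # bias for each hidden node

z = W @ x + b                          # weighted sum of inputs plus bias (the transfer function)
a = sigmoid(z)                         # activation decides how strongly each node "fires"
print(a)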
Parallel processing capability: Artificial neural networks can perform more than one task simultaneously.
Storing data on the entire network: Unlike in traditional programming, the data is stored on the whole network, not in a database. The disappearance of a couple of pieces of data in one place does not prevent the network from working.
Capability to work with incomplete knowledge: After training, the ANN may produce output even with incomplete data. The loss of performance here depends on the importance of the missing data.
For an ANN to be able to adapt, it is important to determine suitable examples and to train the network by showing it these examples along with the desired output. The success of the network is directly proportional to the chosen instances, and if an event cannot be shown to the network in all its aspects, the network can produce false output.
Having fault tolerance: Corruption of one or more cells of the ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.
There is no particular guideline for determining the structure of artificial neural networks. The
appropriate network structure is accomplished through experience, trial, and error.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, in accordance with their structure; therefore, realizing them depends on suitable hardware.
ANNs can work only with numerical data, so problems must be converted into numerical values before being introduced to the ANN. The representation mechanism chosen here will directly impact the performance of the network; it depends on the user's ability.
The network is reduced to a certain value of the error, and this value does not guarantee optimum results.
Perceptron
A perceptron is a neural network unit (an artificial neuron) that does certain computations to detect features or business intelligence in the input data. This perceptron tutorial will give you an in-depth knowledge of the perceptron and its activation functions.
Single layer - Single-layer perceptrons can learn only linearly separable patterns.
Multilayer - Multilayer perceptrons, or feedforward neural networks with two or more layers, have greater processing power.
The Perceptron algorithm learns the weights for the input signals in order to draw a linear
decision boundary.
This enables you to distinguish between the two linearly separable classes +1 and -1.
Note: Supervised Learning is a type of Machine Learning used to learn models from labeled
training data. It enables output prediction for future or unseen data. Let us focus on the
Perceptron Learning Rule in the next section.
Perceptron Learning Rule states that the algorithm would automatically learn the optimal
weight coefficients. The input features are then multiplied with these weights to determine if a
neuron fires or not.
The Perceptron receives multiple input signals, and if the sum of the input signals exceeds a
certain threshold, it either outputs a signal or does not return an output. In the context of
supervised learning and classification, this can then be used to predict the class of a sample.
Perceptron Function
A perceptron is a function that maps its input "x," which is multiplied by the learned weight coefficient, to an output value "f(x)":
f(x) = 1 if w · x + b > 0, and 0 otherwise
where "w" is the vector of real-valued weights, "w · x" is the dot product ∑ wi xi over the n inputs, and "b" is the bias (an element that adjusts the boundary away from the origin without any dependence on the input value).
The output can be represented as "1" or "0." It can also be represented as "1" or "-1" depending on which activation function is used.
Inputs of a Perceptron
A Perceptron accepts inputs, moderates them with certain weight values, then applies the
transformation function to output the final result. The image below shows a Perceptron with a
Boolean output
A Boolean output is based on inputs such as salaried, married, age, past credit profile, etc. It has only two values: Yes and No, or True and False. The summation function "∑" multiplies all inputs "x" by their weights "w" and then adds them up as follows:
∑ wi xi = w1x1 + w2x2 + ... + wnxn
The activation function applies a step rule (convert the numerical output into +1 or -1) to check
if the output of the weighting function is greater than zero or not.
For example:
The step function gets triggered above a certain value of the neuron output; otherwise it outputs zero.
The sign function outputs +1 or -1 depending on whether the neuron output is greater than zero or not.
The sigmoid is the S-curve and outputs a value between 0 and 1.
Output of Perceptron
Output: o(x1, ..., xn) = +1 if ∑ wi·xi > 0, else -1.
The neuron gets triggered only when the weighted input reaches a certain threshold value. An output of +1 specifies that the neuron is triggered; an output of -1 specifies that it did not get triggered.
Error in Perceptron
In the Perceptron Learning Rule, the predicted output is compared with the known output. If it
does not match, the error is propagated backward to allow weight adjustment to happen.
Bias Unit
For simplicity, the threshold θ can be moved to the left-hand side and represented as w0x0, where w0 = -θ and x0 = 1.
The decision function squashes w^T x to either +1 or -1, and this is how it can be used to discriminate between two linearly separable classes.
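A compact sketch of the perceptron learning rule on a toy, linearly separable dataset; the learning rate, number of epochs, and AND-style data are arbitrary illustrative choices:

import numpy as np

# Toy linearly separable data with labels in {+1, -1} (an AND-like problem)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias (absorbs the threshold: b = -theta)
lr = 0.1          # learning rate

def predict(x):
    # sign activation: +1 if the weighted sum exceeds the threshold, else -1
    return 1 if np.dot(w, x) + b > 0 else -1

for epoch in range(10):
    for xi, target in zip(X, y):
        error = target - predict(xi)      # 0 when the prediction is already correct
        w += lr * error * xi              # adjust weights in proportion to the error
        b += lr * error

print([predict(xi) for xi in X])  # expected: [-1, -1, -1, 1]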
Natural Language Processing, or NLP for short, is broadly defined as the automatic
manipulation of natural language, like speech and text, by software.
The study of natural language processing has been around for more than 50 years and grew out
of the field of linguistics with the rise of computers.
Natural language refers to the way we, humans, communicate with each other.
Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial
intelligence concerned with the interactions between computers and human language, in
particular how to program computers to process and analyze large amounts of natural language
data. The goal is a computer capable of "understanding" the contents of documents, including
the contextual nuances of the language within them. The technology can then accurately
extract information and insights contained in the documents as well as categorize and organize
the documents themselves.
History
Natural language processing has its roots in the 1950s. Already in 1950, Alan Turing published
an article titled "Computing Machinery and Intelligence" which proposed what is now called the
Turing test as a criterion of intelligence, though at the time that was not articulated as a
problem separate from artificial intelligence. The proposed test includes a task that involves the
automated interpretation and generation of natural language.
The premise of symbolic NLP is well-summarized by John Searle's Chinese room experiment:
Given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers),
the computer emulates natural language understanding (or other NLP tasks) by applying those
rules to the data it confronts.
1950s: The Georgetown experiment in 1954 involved fully automatic translation of more than
sixty Russian sentences into English. The authors claimed that within three or five years,
machine translation would be a solved problem. However, real progress was much slower, and
after the ALPAC report in 1966, which found that ten-year-long research had failed to fulfill the
expectations, funding for machine translation was dramatically reduced. Little further research
in machine translation was conducted until the late 1980s when the first statistical machine
translation systems were developed.
1960s: Some notably successful natural language processing systems developed in the 1960s
were SHRDLU, a natural language system working in restricted "blocks worlds" with restricted
vocabularies, and ELIZA, a simulation of a Rogerian psychotherapist, written by Joseph
Weizenbaum between 1964 and 1966. Using almost no information about human thought or
emotion, ELIZA sometimes provided a startlingly human-like interaction. When the "patient"
exceeded the very small knowledge base, ELIZA might provide a generic response, for example,
responding to "My head hurts" with "Why do you say your head hurts?".
1970s: During the 1970s, many programmers began to write "conceptual ontologies", which
structured real-world information into computer-understandable data. Examples are MARGIE
(Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976),
QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert 1981). During this
time, the first many chatterbots were written (e.g., PARRY).
1980s: The 1980s and early 1990s mark the heyday of symbolic methods in NLP. Focus areas of the time included research on rule-based parsing (e.g., the development of HPSG as a computational operationalization of generative grammar), morphology (e.g., two-level morphology), semantics (e.g., the Lesk algorithm), reference (e.g., within Centering Theory), and other areas of natural language understanding (e.g., Rhetorical Structure Theory). Other lines of research were continued, e.g., the development of chatterbots with Racter and Jabberwacky. An important development (that eventually led to the statistical turn in the 1990s) was the rising importance of quantitative evaluation in this period.[5]
Statistical NLP (1990s–2010s)
Up to the 1980s, most natural language processing systems were based on complex sets of
hand-written rules. Starting in the late 1980s, however, there was a revolution in natural
language processing with the introduction of machine learning algorithms for language
processing. This was due to both the steady increase in computational power (see Moore's law)
and the gradual lessening of the dominance of Chomskyan theories of linguistics (e.g.
transformational grammar), whose theoretical underpinnings discouraged the sort of corpus
linguistics that underlies the machine-learning approach to language processing.[6]
1990s: Many of the notable early successes on statistical methods in NLP occurred in the field
of machine translation, due especially to work at IBM Research. These systems were able to
take advantage of existing multilingual textual corpora that had been produced by the
Parliament of Canada and the European Union as a result of laws calling for the translation of
all governmental proceedings into all official languages of the corresponding systems of
government. However, most other systems depended on corpora specifically developed for the
tasks implemented by these systems, which was (and often continues to be) a major limitation
in the success of these systems. As a result, a great deal of research has gone into methods of more effectively learning from limited amounts of data.
2000s: With the growth of the web,
increasing amounts of raw (unannotated) language data has become available since the mid-
1990s. Research has thus increasingly focused on unsupervised and semi-supervised learning
algorithms. Such algorithms can learn from data that has not been hand-annotated with the
desired answers or using a combination of annotated and non-annotated data. Generally, this
task is much more difficult than supervised learning, and typically produces less accurate results
for a given amount of input data. However, there is an enormous amount of non-annotated
data available (including, among other things, the entire content of the World Wide Web),
which can often make up for the inferior results if the algorithm used has a low enough time
complexity to be practical.
In the 2010s, representation learning and deep neural network-style machine learning methods
became widespread in natural language processing. That popularity was due partly to a flurry of
results showing that such techniques can achieve state-of-the-art results in many natural
language tasks, e.g., in language modeling and parsing. This is increasingly important in
medicine and healthcare, where NLP helps analyze notes and text in electronic health records
that would otherwise be inaccessible for study when seeking to improve care.
Formal grammar
Formal grammar is a set of rules for rewriting strings, along with a "start symbol" from which
rewriting starts. Therefore, a grammar is usually thought of as a language generator. However,
it can also sometimes be used as the basis for a "recognizer"—a function in computing that
determines whether a given string belongs to the language or is grammatically incorrect. To
describe such recognizers, formal language theory uses separate formalisms, known as
automata theory. One of the interesting results of automata theory is that it is not possible to
design a recognizer for certain formal languages.[1]
Parsing is the process of recognizing an
utterance (a string in natural languages) by breaking it down to a set of symbols and analyzing
each one against the grammar of the language. Most languages have the meanings of their
utterances structured according to their syntax—a practice known as compositional semantics.
As a result, the first step to describing the meaning of an utterance in language is to break it
down part by part and look at its analyzed form (known as its parse tree in computer science,
and as its deep structure in generative grammar).
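As a small illustration of a grammar being used as a recognizer and parser, here is a sketch using NLTK's chart parser (assuming NLTK is installed; the toy grammar and sentence are made up for the example):

import nltk

# A toy context-free grammar: rewriting rules plus the start symbol S.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'cat'
V -> 'chases' | 'sees'
""")

parser = nltk.ChartParser(grammar)
sentence = "the dog chases a cat".split()

# Acting as a recognizer: the sentence belongs to the language iff at least one parse tree exists.
trees = list(parser.parse(sentence))
print(bool(trees))       # True
for tree in trees:
    print(tree)          # the parse tree shows the compositional structure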
WordNet
WordNet is a lexical database of semantic relations between words in more than 200
languages. WordNet links words into semantic relations including synonyms, hyponyms, and
meronyms. The synonyms are grouped into synsets with short definitions and usage examples.
WordNet can thus be seen as a combination and extension of a dictionary and thesaurus. While
it is accessible to human users via a web browser, its primary use is in automatic text analysis
and artificial intelligence applications. WordNet was first created in the English language,[4] and the English WordNet database and software tools have been released under a BSD-style license and are freely available for download from the WordNet website.
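A brief sketch of querying WordNet programmatically through NLTK's interface (this assumes NLTK is installed and the WordNet corpus has been downloaded; "dog" is just an example query):

import nltk
# nltk.download("wordnet")   # one-time download of the WordNet corpus, if not already present
from nltk.corpus import wordnet as wn

# Each synset groups synonymous words and carries a short definition.
for synset in wn.synsets("dog")[:3]:
    print(synset.name(), "-", synset.definition())

# Semantic relations: hypernyms (more general concepts) and hyponyms (more specific ones).
dog = wn.synset("dog.n.01")
print([h.name() for h in dog.hypernyms()])
print([h.name() for h in dog.hyponyms()[:5]])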