AI Notes
UNIT – 1 (NOTES)
Philosophy of AI
While exploiting the power of computer systems, human curiosity led people to
wonder, “Can a machine think and behave like humans do?” Thus, the development of AI
started with the intention of creating in machines the kind of intelligence that we find,
and regard highly, in humans.
International Institute of Technology and Management, Murthal
CSE 6th – Artificial Intelligence (AI) – CSE 308-B
Goals of AI
To Create Expert Systems − Systems that exhibit intelligent behavior, learn,
demonstrate, explain, and advise their users.
To Implement Human Intelligence in Machines − Creating systems that understand,
think, learn, and behave like humans.
Programming Without AI | Programming With AI
A computer program without AI can answer the specific questions it is meant to solve. | A computer program with AI can answer the generic questions it is meant to solve.
Modification in the program leads to change in its structure. | AI programs can absorb new modifications by putting highly independent pieces of information together. Hence you can modify even a minute piece of information of the program without affecting its structure.
Modification is not quick and easy. It may lead to affecting the program adversely. | Quick and easy program modification.
What is AI Technique?
In the real world, knowledge has some unwelcome properties −
Its volume is huge, next to unimaginable.
It is not well-organized or well-formatted.
It keeps changing constantly.
AI Technique is a manner to organize and use the knowledge efficiently in such a way that −
It should be perceivable by the people who provide it.
It should be easily modifiable to correct errors.
It should be useful in many situations though it is incomplete or inaccurate.
AI techniques also speed up the execution of the complex programs they are built into.
Applications of AI
AI has been influential in various fields such as −
Gaming − AI plays a crucial role in strategic games such as chess, poker, tic-tac-toe,
etc., where the machine can consider a large number of possible positions based on
heuristic knowledge.
Natural Language Processing − It is possible to interact with a computer that
understands natural language spoken by humans.
Expert Systems − Some applications integrate machines, software,
and special information to provide reasoning and advice. They provide explanations
and advice to their users.
Vision Systems − These systems understand, interpret, and comprehend visual input
on the computer. For example,
o A spy plane takes photographs, which are used to work out spatial
information or a map of an area.
o Doctors use a clinical expert system to diagnose patients.
o Police use computer software that can match a criminal's face against
a stored portrait made by a forensic artist.
Speech Recognition − Some intelligent systems are capable of hearing and
comprehending language in terms of sentences and their meanings while a
human talks to them. They can handle different accents, slang words, background
noise, changes in a speaker's voice due to a cold, etc.
Handwriting Recognition − Handwriting recognition software reads text
written on paper with a pen or on a screen with a stylus. It can recognize the shapes of
the letters and convert them into editable text.
Intelligent Robots − Robots are able to perform the tasks given by a human. They
have sensors to detect physical data from the real world such as light, heat,
temperature, movement, sound, bump, and pressure. They have efficient
processors, multiple sensors and huge memory, to exhibit intelligence. In addition,
they are capable of learning from their mistakes and they can adapt to the new
environment.
History of AI
Here is the history of AI during the 20th century −
1945 Isaac Asimov, a Columbia University alumnus, coined the term Robotics.
1985 Harold Cohen created and demonstrated the drawing program, Aaron.
1997 The Deep Blue Chess Program beats the then world chess champion, Garry Kasparov.
LISP
It serves as a common language, which can be easily extended for specific implementation.
Programs written in Common LISP do not depend on machine-specific characteristics, such
as word length etc.
It provides a wide range of data types, such as objects, structures, lists, vectors,
adjustable arrays, hash-tables, and symbols.
It is expression-based.
It provides an object-oriented condition system.
It provides a complete I/O library.
It provides extensive control structures.
What is Prolog?
Prolog stands for Programming in Logic. It is used in artificial intelligence
programming.
Prolog is a declarative programming language.
For example: when implementing the solution to a given problem, instead of specifying
how to achieve a certain goal in a specific situation, the user specifies the
situation (rules and facts) and the goal (query). The Prolog interpreter then
derives the solution.
Prolog is useful in AI, NLP, and databases, but poorly suited to other areas such as
graphics or numerical algorithms.
Prolog facts
In Prolog, facts are used to form statements. A fact consists of a specific item or
a relation between two or more items.
It is very simple to convert English sentences into Prolog facts. Some examples are explained
in the following table.
In the above table, the statement 'Dog is barking' is a fact, while the
statement 'Jaya likes food if it is delicious' is called a rule. In this statement, 'Food'
begins with a capital letter because it is a variable, and its value is bound by matching
against other facts. The symbol ':-' is read as “if”: the part before it holds if the part
after it holds, so the rule expresses that “Jaya likes delicious food”.
Advantages:
1. Easy to build a database. It does not need a lot of programming effort.
2. Pattern matching is easy. Search is recursion-based.
3. It has built-in list handling, which makes it easier to work with any algorithm involving lists.
Disadvantages:
1. LISP (another AI programming language) dominates over Prolog with respect to I/O
features.
2. Sometimes input and output is not easy.
Applications:
Prolog is widely used in artificial intelligence (AI). Prolog is also used for pattern matching
over natural language parse trees.
Uninformed/Blind Search:
Uninformed search uses no domain knowledge, such as the closeness or location of the
goal. It operates in a brute-force way, as it only has information about
how to traverse the tree and how to identify leaf and goal nodes. The search tree is
explored without any information about the search space beyond the initial state, the
operators, and the goal test, so it is also called blind search. It examines each
node of the tree until it reaches the goal node.
Informed Search
Heuristic search.
A heuristic is a method that is not guaranteed to find the best solution but usually
finds a good solution in reasonable time. Informed search can solve much more complex
problems than could be solved otherwise. A typical problem tackled with informed search
is the traveling salesman problem.
1. Greedy Search
2. A* Search
1. Breadth-first Search:
o Breadth-first search is the most common search strategy for traversing a tree or
graph. This algorithm searches breadthwise in a tree or graph, so it is called breadth-
first search.
o The BFS algorithm starts searching from the root node of the tree and expands all
successor nodes at the current level before moving to nodes of the next level.
o The breadth-first search algorithm is an example of a general-graph search
algorithm.
o Breadth-first search is implemented using a FIFO queue data structure.
Advantages:
o BFS will provide a solution if any solution exists.
o If there is more than one solution for a given problem, BFS will provide the
minimal solution, which requires the least number of steps.
Disadvantages:
o It requires lots of memory, since each level of the tree must be saved in memory to
expand the next level.
o BFS needs lots of time if the solution is far away from the root node.
Example:
In the tree structure below, we show the traversal of the tree using the BFS algorithm
from the root node S to the goal node K. BFS traverses in layers, so it follows
the path shown by the dotted arrow, and the traversed path will be:
S ---> A ---> B ---> C ---> D ---> G ---> H ---> E ---> F ---> I ---> K
Time Complexity: The time complexity of BFS is given by the number of nodes traversed
until the shallowest goal node, where d is the depth of the shallowest solution and b is
the branching factor (the number of successors of each node).
T(b) = 1 + b + b^2 + b^3 + ... + b^d = O(b^d)
Space Complexity: The space complexity of BFS is given by the memory size of the
frontier, which is O(b^d).
Completeness: BFS is complete, which means if the shallowest goal node is at some finite
depth, then BFS will find a solution.
Optimality: BFS is optimal if path cost is a non-decreasing function of the depth of the node.
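The FIFO-queue behaviour described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name bfs and the shape of the example tree are assumptions (the original figure is not available), chosen so that the expansion order matches the traversal S, A, B, C, D, G, H, E, F, I, K given in the example.

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search; returns (path, expansion_order) or (None, order)."""
    frontier = deque([start])   # FIFO queue of nodes waiting to be expanded
    parent = {start: None}      # remembers how each node was reached (also the visited set)
    order = []                  # order in which nodes are taken off the queue
    while frontier:
        node = frontier.popleft()
        order.append(node)
        if node == goal:
            path = []
            while node is not None:      # walk back to the root
                path.append(node)
                node = parent[node]
            return path[::-1], order
        for child in graph.get(node, []):
            if child not in parent:
                parent[child] = node
                frontier.append(child)
    return None, order

# Hypothetical tree, assumed only for illustration.
example_tree = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["G", "H"],
    "C": ["E", "F"],
    "G": ["I"],
    "I": ["K"],
}
bfs_path, bfs_order = bfs(example_tree, "S", "K")
print(bfs_order)
print(bfs_path)
```

Because every node of one level is dequeued before any node of the next level, the first path found to the goal uses the fewest steps, which is the optimality property stated above.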
2. Depth-first Search
o Depth-first search is a recursive algorithm for traversing a tree or graph data
structure.
o It is called the depth-first search because it starts from the root node and follows
each path to its greatest depth node before moving to the next path.
o DFS uses a stack data structure for its implementation.
Advantages:
o DFS requires very little memory, as it only needs to store a stack of the nodes on the
path from the root node to the current node.
o It takes less time to reach the goal node than BFS (if it traverses the
right path).
Disadvantages:
o Many states may keep re-occurring, and there is no guarantee
of finding the solution.
o DFS searches deep down a path and may sometimes enter an infinite
loop.
Example:
In the search tree below, we show the flow of depth-first search, which follows the
order:
Root node ---> left node ---> right node.
It starts searching from root node S and traverses A, then B, then D and E. After traversing
E it backtracks, as E has no other successor and the goal node has still not been found. After
backtracking it traverses node C and then G, where it terminates because it has found the goal
node.
Completeness: DFS search algorithm is complete within finite state space as it will expand
every node within a limited search tree.
Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed
by the algorithm. It is given by:
T(n) = 1 + n + n^2 + n^3 + ... + n^m = O(n^m)
where n is the branching factor and m is the maximum depth of any node, which can be much
larger than d (the depth of the shallowest solution).
Space Complexity: DFS needs to store only a single path from the root node, hence the
space complexity of DFS is equivalent to the size of the fringe set, which is O(bm), the
product of the branching factor b and the maximum depth m.
Optimal: DFS is non-optimal, as it may take a large number of steps or incur a
high cost to reach the goal node.
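The stack-based behaviour described above can be sketched as follows. The function name dfs and the example tree are assumptions made for illustration (the original figure is missing); the tree is chosen so that the visiting order reproduces the walk S, A, B, D, E, C, G from the example.

```python
def dfs(graph, start, goal):
    """Depth-first search with an explicit stack; returns (path, visit_order)."""
    stack = [(start, [start])]   # LIFO stack of (node, path from the root)
    visited = set()
    order = []
    while stack:
        node, path = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        if node == goal:
            return path, order
        # Push children in reverse so the leftmost child is explored first.
        for child in reversed(graph.get(node, [])):
            if child not in visited:
                stack.append((child, path + [child]))
    return None, order

# Hypothetical tree, assumed only for illustration.
dfs_tree = {
    "S": ["A", "H"],
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["G"],
}
dfs_path, dfs_order = dfs(dfs_tree, "S", "G")
print(dfs_order)
print(dfs_path)
```

The visited set is what keeps this sketch from looping forever on graphs with cycles; without it, the infinite-loop risk listed under the disadvantages applies.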
Informed Search Algorithms
So far we have talked about uninformed search algorithms, which look through the search
space for all possible solutions without any additional knowledge
about the search space. An informed search algorithm, by contrast, uses knowledge such as
how far we are from the goal, the path cost, and how to reach the goal node. This knowledge
helps agents explore less of the search space and find the goal node more efficiently.
Informed search is more useful for large search spaces. It uses the idea of a
heuristic, so it is also called heuristic search.
Heuristic function: A heuristic is a function used in informed search to find the
most promising path. It takes the current state of the agent as its input and produces an
estimate of how close the agent is to the goal. The heuristic method might not
always give the best solution, but it usually finds a good solution in reasonable time.
The heuristic function is represented by h(n), and it estimates the cost of an optimal
path from state n to the goal. The value of the heuristic function is never negative.
On each iteration, the node n with the lowest heuristic value is expanded, all its
successors are generated, and n is placed in the CLOSED list. The algorithm continues until
a goal state is found.
In the informed search we will discuss two main algorithms which are given below:
Greedy best-first search always selects the path which appears best at that
moment. It combines depth-first search and breadth-first search, guided by a heuristic
function, which lets us take advantage of both algorithms. With best-first search, at each
step we can choose the most promising node. We expand the node which appears closest to
the goal node, where closeness is estimated by the heuristic function, i.e.
f(n) = h(n),
where h(n) = estimated cost from node n to the goal.
Greedy best-first search is implemented using a priority queue.
Best first search algorithm:
o Step 5: Check each successor of node n and find whether any of them is a goal node.
If any successor is a goal node, return success and terminate the
search; otherwise proceed to Step 6.
o Step 6: For each successor node, compute the evaluation function f(n) and
check whether the node is already in the OPEN or CLOSED list. If it is in
neither list, add it to the OPEN list.
o Step 7: Return to Step 2.
Advantages:
o Best-first search can switch between BFS-like and DFS-like behavior, gaining the
advantages of both algorithms.
o With a good heuristic, this algorithm is often more efficient than BFS and DFS.
Disadvantages:
Consider the below search problem, and we will traverse it using greedy best-first search. At
each iteration, each node is expanded using evaluation function f(n)=h(n) , which is given in
the below table.
In this search example, we are using two lists which are OPEN and CLOSED Lists. Following
are the iteration for traversing the above example.
Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).
Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m),
where b is the branching factor and m is the maximum depth of the search space.
Complete: Greedy best-first search is incomplete, even if the given state space is finite,
unless repeated states are tracked (for example with a CLOSED list).
Optimal: Greedy best-first search is not optimal.
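A priority-queue version of greedy best-first search with f(n) = h(n) might look like the sketch below. The graph and heuristic values are hypothetical (the original figure and table are not available), and the goal test is done when a node is removed from OPEN rather than when successors are generated, which is a common variant of the steps listed above.

```python
import heapq

def greedy_best_first(graph, h, start, goal):
    """Always expand the OPEN node with the smallest heuristic value h(n)."""
    open_list = [(h[start], start, [start])]   # priority queue ordered by h(n)
    closed = set()                             # nodes that were already expanded
    while open_list:
        _, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for child in graph.get(node, []):
            if child not in closed:
                heapq.heappush(open_list, (h[child], child, path + [child]))
    return None

# Hypothetical graph and heuristic table, assumed only for illustration.
gbfs_graph = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E", "F"], "F": ["I", "G"]}
gbfs_h = {"S": 13, "A": 12, "B": 4, "C": 7, "D": 3, "E": 8, "F": 2, "I": 9, "G": 0}
gbfs_path = greedy_best_first(gbfs_graph, gbfs_h, "S", "G")
print(gbfs_path)
```

Note that path cost never enters the priority, which is why the result is not guaranteed to be optimal.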
A* search is the most commonly known form of best-first search. It uses the heuristic function
h(n) together with g(n), the cost to reach node n from the start state. It combines features of
uniform-cost search (UCS) and greedy best-first search, which lets it solve problems efficiently.
A* finds the shortest path through the search space using the heuristic function. It
expands a smaller search tree and provides an optimal result faster. A* is similar
to UCS except that it uses g(n)+h(n) instead of g(n).
In A* search, we use the search heuristic as well as the cost to reach the node. Hence
we can combine both costs as f(n) = g(n) + h(n), and this sum is called the fitness number.
At each point in the search space, only the node with the lowest value
of f(n) is expanded, and the algorithm terminates when the goal node is found.
Algorithm of A* search:
Advantages:
Disadvantages:
o It does not always produce the shortest path when the heuristic is only an
approximation (that is, when it is not admissible).
o A* search has some complexity issues.
o The main drawback of A* is its memory requirement: it keeps all generated nodes in
memory, so it is not practical for various large-scale problems.
Example:
In this example, we will traverse the given graph using the A* algorithm. The heuristic value
of all states is given in the below table so we will calculate the f(n) of each state using the
formula f(n)= g(n) + h(n), where g(n) is the cost to reach any node from start state.
Here we will use OPEN and CLOSED list.
Solution:
Points to remember:
o The A* algorithm returns the first path it finds to the goal and does not search for all
remaining paths.
o The efficiency of the A* algorithm depends on the quality of the heuristic.
o The A* algorithm expands all nodes which satisfy the condition f(n) < C*, where C* is the cost of the optimal solution.
Complete: The A* algorithm is complete as long as:
o The branching factor is finite.
o The cost of every action is fixed.
Optimal: A* search algorithm is optimal if it follows below two conditions:
o Admissibility: the first condition required for optimality is that h(n) be an
admissible heuristic for A* tree search. An admissible heuristic is optimistic in nature:
it never overestimates the cost to reach the goal.
o Consistency: the second condition, consistency, is required only for A* graph search.
If the heuristic function is admissible, then A* tree search will always find the least-cost
path.
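The evaluation f(n) = g(n) + h(n) can be illustrated with a short Python sketch. The weighted graph and heuristic values below are hypothetical, picked so that the heuristic is admissible and consistent; the function name a_star is likewise an assumption.

```python
import heapq

def a_star(graph, h, start, goal):
    """Expand the node with the lowest f(n) = g(n) + h(n); returns (path, cost)."""
    open_list = [(h[start], 0, start, [start])]   # entries are (f, g, node, path)
    best_g = {start: 0}                           # cheapest known cost to reach each node
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue                              # stale entry: a cheaper route was found later
        for child, step_cost in graph.get(node, []):
            new_g = g + step_cost
            if new_g < best_g.get(child, float("inf")):
                best_g[child] = new_g
                heapq.heappush(open_list, (new_g + h[child], new_g, child, path + [child]))
    return None, float("inf")

# Hypothetical weighted graph: node -> list of (successor, step cost).
astar_graph = {
    "S": [("A", 1), ("G", 10)],
    "A": [("B", 2), ("C", 1)],
    "B": [("D", 5)],
    "C": [("D", 3), ("G", 4)],
    "D": [("G", 2)],
}
# Hypothetical heuristic that never overestimates the true remaining cost.
astar_h = {"S": 4, "A": 3, "B": 4, "C": 2, "D": 2, "G": 0}
astar_path, astar_cost = a_star(astar_graph, astar_h, "S", "G")
print(astar_path, astar_cost)
```

The direct edge S to G costs 10, but the search prefers S, A, C, G with total cost 6 because its f value stays lower throughout.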
o Hill climbing is a local search algorithm which continuously moves in the
direction of increasing elevation/value to find the peak of the mountain, that is, the best
solution to the problem. It terminates when it reaches a peak value where no
neighbor has a higher value.
o Hill climbing is a technique used for optimizing mathematical
problems. One of the widely discussed examples is the
traveling salesman problem, in which we need to minimize the distance traveled by
the salesman.
o It is also called greedy local search, as it only looks at its immediate neighbor
states and not beyond them.
o A node in hill climbing has two components: state and value.
o Hill climbing is mostly used when a good heuristic is available.
o In this algorithm, we do not need to maintain a search tree or graph, as it
only keeps a single current state.
o No backtracking: It does not backtrack the search space, as it does not remember
the previous states.
Simple hill climbing is the simplest way to implement a hill climbing algorithm. It
evaluates one neighbor state at a time and selects the first one that improves on the
current cost, setting it as the current state. It checks only one successor state at a time:
if that state is better than the current state it moves there, otherwise it stays in the same
state. This algorithm has the following features:
o Less time consuming
o Less optimal solution, and the solution is not guaranteed
Algorithm for Simple Hill Climbing:
o Step 1: Evaluate the initial state. If it is the goal state, return success and stop.
o Step 2: Loop until a solution is found or there is no new operator left to apply.
o Step 3: Select and apply an operator to the current state.
o Step 4: Check the new state:
1. If it is the goal state, return success and quit.
2. Else if it is better than the current state, assign the new state as the current
state.
3. Else if it is not better than the current state, return to Step 2.
o Step 5: Exit.
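The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the objective function, the neighbor function, and all names are hypothetical, and the explicit goal test of Step 4 is replaced by stopping when no neighbor improves on the current state.

```python
def simple_hill_climbing(start, neighbors, value):
    """Move to the first neighbor that is better than the current state."""
    current = start
    while True:
        for candidate in neighbors(current):
            if value(candidate) > value(current):
                current = candidate      # accept the first improvement found
                break
        else:
            return current               # no neighbor is better: a (local) peak

# Hypothetical one-dimensional problem: maximise -(x - 3)^2 over the integers.
def shc_value(x):
    return -(x - 3) ** 2

def shc_neighbors(x):
    return [x - 1, x + 1]

shc_result = simple_hill_climbing(0, shc_neighbors, shc_value)
print(shc_result)
```

Because only a single current state is kept, the sketch needs no search tree and cannot backtrack.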
The steepest-ascent algorithm is a variation of simple hill climbing. It
examines all the neighboring nodes of the current state and selects the neighbor node
which is closest to the goal state. It consumes more time, as it searches
multiple neighbors.
o Step 1: Evaluate the initial state. If it is the goal state, return success and stop;
otherwise make the initial state the current state.
o Step 2: Loop until a solution is found or the current state does not change.
1. Let SUCC be a state such that any successor of the current state will be better
than it.
2. For each operator that applies to the current state:
I. Apply the new operator and generate a new state.
II. Evaluate the new state.
III. If it is the goal state, return it and quit; otherwise compare it to SUCC.
IV. If it is better than SUCC, set the new state as SUCC.
V. If SUCC is better than the current state, set the current state to
SUCC.
o Step 3: Exit.
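A compact sketch of steepest-ascent hill climbing is given below. The landscape list, the neighbor function, and the names are hypothetical; the example is chosen to show that the result depends on the starting state, since the search can stop on a local maximum.

```python
def steepest_ascent(start, neighbors, value):
    """Examine every neighbor and move to the best one while it improves."""
    current = start
    while True:
        # SUCC in the steps above: the best successor of the current state.
        succ = max(neighbors(current), key=value, default=current)
        if value(succ) <= value(current):
            return current               # no successor is better: stop here
        current = succ

# Hypothetical landscape: the state is an index, its value is the list entry.
landscape = [1, 3, 2, 5, 9, 4]

def sa_value(i):
    return landscape[i]

def sa_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]

peak_from_0 = steepest_ascent(0, sa_neighbors, sa_value)
peak_from_2 = steepest_ascent(2, sa_neighbors, sa_value)
print(peak_from_0, peak_from_2)
```

Starting at index 0 the search stops at index 1 (value 3), a local maximum, while starting at index 2 it reaches index 4 (value 9), the global maximum.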
3. Stochastic hill climbing:
Stochastic hill climbing does not examine all its neighbors before moving. Rather, it
selects one neighbor node at random and decides whether to take it as
the current state or examine another state.
2. Plateau: A plateau is a flat area of the search space in which all the neighbor states of
the current state have the same value; because of this, the algorithm cannot find a best
direction in which to move. A hill-climbing search might get lost in the plateau area.
Solution: The solution for a plateau is to take bigger or smaller steps while searching.
Randomly selecting a state far away from the current state makes it
possible for the algorithm to reach a non-plateau region.
3. Ridges: A ridge is a special form of local maximum. It is an area which is higher than
its surrounding areas but which itself has a slope, and it cannot be reached in a single move.
Solution: By using bidirectional search, or by moving in several directions at once, we can
mitigate this problem.
Simulated Annealing:
A hill-climbing algorithm which never makes a move towards a lower value is guaranteed to
be incomplete, because it can get stuck on a local maximum. If instead the algorithm applies a
random walk, moving to a randomly chosen successor, it may be complete but is not efficient. Simulated
Annealing is an algorithm which yields both efficiency and completeness. In mechanical
A* Algorithm
A* Algorithm is one of the best and most popular techniques used for path finding and
graph traversal.
A lot of games and web-based maps use this algorithm to find the shortest path
efficiently.
It is essentially a best-first search algorithm that evaluates each node with f(n) = g(n) + h(n).
Here,
‘n’ is the last node on the path
g(n) is the cost of the path from start node to node ‘n’
h(n) is a heuristic function that estimates cost of the cheapest path from node ‘n’ to
the goal node.
Algorithm Steps:
The implementation of A* Algorithm involves maintaining two lists- OPEN and
CLOSED.
OPEN contains those nodes that have been evaluated by the heuristic function but
have not been expanded into successors yet.
CLOSED contains those nodes that have already been visited.
Step-02:
Step-03:
Remove node n with the smallest value of f(n) from OPEN and move it to list
CLOSED.
If node n is a goal state, return success and exit.
Step-04:
Expand node n.
Step-05:
If any successor to n is the goal node, return success and the solution by tracing the
path from goal node to S.
Otherwise, go to Step-06.
Step-06:
Step-07:
Go back to Step-02.
As in the A* algorithm, here we will use two arrays and one heuristic function.
OPEN:
It contains the nodes that have been traversed but have not yet been marked solvable or
unsolvable.
CLOSE:
It contains the nodes that have already been processed.
Algorithm:
Step 1: Place the starting node into OPEN.
Step 2: Compute the most promising solution tree say T0.
Step 3: Select a node n that is both on OPEN and a member of T0. Remove it from OPEN
and place it in CLOSE.
Step 4: If n is a terminal goal node, then label n as solved and label all the ancestors of n
as solved. If the starting node is marked as solved, then return success and exit.
Step 5: If n is not a solvable node, then mark n as unsolvable. If starting node is marked as
unsolvable, then return failure and exit.
Step 6: Expand n. Find all its successors and find their h (n) value, push them into OPEN.
Step 7: Return to Step 2.
Step 8: Exit.
Minimax(node, 3, true)
Working of Min-Max Algorithm:
o The working of the minimax algorithm can be easily described using an example.
Below we take an example game tree representing a two-player
game.
o In this example, there are two players: one is called the Maximizer and the other the
Minimizer.
o The Maximizer tries to get the maximum possible score, and the Minimizer tries to get
the minimum possible score.
o This algorithm applies DFS, so in this game tree we have to go all the way down to
the leaves to reach the terminal nodes.
o At the terminal nodes the terminal values are given, so we compare those values
and backtrack up the tree until we reach the initial state. The main steps
involved in solving the two-player game tree are:
Step 1: In the first step, the algorithm generates the entire game tree and applies the utility
function to get the utility values for the terminal states. In the tree diagram below, let
A be the initial state of the tree. Suppose the maximizer takes the first turn, with worst-case
initial value -∞, and the minimizer takes the next turn, with worst-case initial value
+∞.
Step 2: Now we first find the utility values for the Maximizer. Its initial value is -∞, so we
compare each value in the terminal states with the initial value of the Maximizer and determine
the higher node values. It finds the maximum among them all.
o For node D: max(-1, -∞) => max(-1, 4) = 4
o For node E: max(2, -∞) => max(2, 6) = 6
o For node F: max(-3, -∞) => max(-3, -5) = -3
o For node G: max(0, -∞) => max(0, 7) = 7
Step 3: In the next step it is the minimizer's turn, so it compares all node values with +∞
and finds the 3rd-layer node values.
o For node B: min(4, 6) = 4
o For node C: min(-3, 7) = -3
Step 4: Now it is the Maximizer's turn again, and it chooses the maximum of all node
values to find the value of the root node. In this game tree there are only 4
layers, so we reach the root node immediately, but in real games there will be more
than 4 layers.
o For node A: max(4, -3) = 4
That was the complete workflow of the minimax algorithm for a two-player game.
Properties of Mini-Max algorithm:
o Complete- The Min-Max algorithm is complete. It will definitely find a solution (if one
exists) in a finite search tree.
o Optimal- Min-Max algorithm is optimal if both opponents are playing optimally.
o Time complexity- As it performs DFS on the game tree, the time complexity of the
Min-Max algorithm is O(b^m), where b is the branching factor of the game tree and m is
the maximum depth of the tree.
o Space Complexity- The space complexity of the Mini-max algorithm is similar to that of DFS,
which is O(bm), that is, b times m.
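The worked example can be reproduced with a short recursive sketch. The nested-list encoding of the game tree and the function signature are assumptions made for illustration; the leaf values are the ones used in Steps 2 to 4 (D: -1 and 4, E: 2 and 6, F: -3 and -5, G: 0 and 7).

```python
def minimax(node, maximizing):
    """Return the minimax value of a game tree given as nested lists of leaf values."""
    if isinstance(node, (int, float)):
        return node                              # terminal node: its utility value
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

# A = [B, C], B = [D, E], C = [F, G]; the leaves are the terminal utilities.
game_tree = [
    [[-1, 4], [2, 6]],     # B: D and E
    [[-3, -5], [0, 7]],    # C: F and G
]
root_value = minimax(game_tree, True)            # the Maximizer moves at the root
print(root_value)
```

The recursion visits every leaf depth first, which is why the time complexity grows as O(b^m) while only the current path needs to be kept in memory.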
Alpha-Beta Pruning
o Applying alpha-beta pruning to a standard minimax algorithm returns the same move as
the standard algorithm does, but it removes all the nodes which do not
affect the final decision and only slow the algorithm down. Hence, pruning these
nodes makes the algorithm faster.
Note: To better understand this topic, kindly study the minimax algorithm.
Step 2: At node D, the value of α is calculated, as it is Max's turn. The value of α is
compared first with 2 and then with 3, and max(2, 3) = 3 becomes the value of α at node D;
the node value is also 3.
Step 3: Now the algorithm backtracks to node B, where the value of β changes, as this is
Min's turn. β = +∞ is compared with the available successor node value, i.e. min(∞, 3)
= 3; hence at node B now α = -∞ and β = 3.
In the next step, the algorithm traverses the next successor of node B, which is node E, and the
values α = -∞ and β = 3 are passed down as well.
Step 4: At node E, Max takes its turn, and the value of alpha changes. The current
value of alpha is compared with 5, so max(-∞, 5) = 5; hence at node E α = 5 and β = 3,
where α >= β, so the right successor of E is pruned and the algorithm does not traverse it.
The value at node E is 5.
Step 5: Next, the algorithm again backtracks up the tree, from node B to node A. At node A
the value of alpha changes to the maximum available value, 3, as max(-∞, 3) = 3, and
β = +∞. These two values are now passed to the right successor of A, which is node C.
At node C, α = 3 and β = +∞, and the same values are passed on to node F.
Step 6: At node F, the value of α is again compared, first with the left child, which is 0, giving
max(3, 0) = 3, and then with the right child, which is 1, giving max(3, 1) = 3. α remains
3, but the node value of F becomes 1.
Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of
beta changes, as it is compared with 1, so min(∞, 1) = 1. Now at C, α = 3 and β = 1, which
again satisfies the condition α >= β, so the next child of C, which is G, is pruned, and the
algorithm does not compute the entire subtree G.
Step 8: C now returns the value 1 to A, where the best value for A is max(3, 1) = 3.
The final game tree shows which nodes were computed and which nodes were never
computed. Hence the optimal value for the maximizer is 3 in this example.
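The walkthrough in Steps 2 to 8 can be reproduced with a short Python sketch. The tree is written as nested lists (A at the top, leaves as numbers). The leaves that the walkthrough never reads (the right child of E and both children of G) are not given in the steps above, so the values 9, 7 and 5 used here are hypothetical placeholders; the function name alphabeta and the visited list are likewise only illustrative.

```python
import math

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Return (value, visited_leaves) for a game tree given as nested lists."""
    if visited is None:
        visited = []
    if not isinstance(node, list):      # a leaf: record it and return its value
        visited.append(node)
        return node, visited
    best = -math.inf if is_max else math.inf
    for child in node:
        value, _ = alphabeta(child, not is_max, alpha, beta, visited)
        if is_max:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:               # pruning condition from Steps 4 and 7
            break
    return best, visited

# A = [B, C], B = [D, E], C = [F, G]; D = [2, 3], E = [5, ?], F = [0, 1], G = [?, ?]
# The leaves 9, 7 and 5 are hypothetical: they are pruned and never inspected.
tree = [[[2, 3], [5, 9]], [[0, 1], [7, 5]]]
value, visited = alphabeta(tree, is_max=True)
print(value)    # value backed up to the root A
print(visited)  # leaves that were actually evaluated
```

Only the leaves 2, 3, 5, 0 and 1 are visited, which matches the nodes marked as computed in the walkthrough.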
o Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any of
the leaves of the tree and works exactly like the minimax algorithm. In this case, it even
consumes more time because of the overhead of maintaining alpha and beta; such an ordering
is called worst ordering. In this case, the best move occurs on the right side of the tree. The
time complexity for such an order is O(b^m), where b is the branching factor and m is the
maximum depth of the tree.
o Ideal ordering: The ideal ordering for alpha-beta pruning occurs when a lot of
pruning happens in the tree and the best moves occur on the left side of the tree. Because we
apply DFS, the algorithm searches the left of the tree first and can go twice as deep as the
minimax algorithm in the same amount of time. The complexity in ideal ordering is O(b^(m/2)).
21 What is frame? What are the reasoning actions that can be performed using frames?
22 Explain constraint propagation.
UNIT – 2 (NOTES)
What to Represent:
The following kinds of knowledge need to be represented in AI systems:
o Object: All the facts about objects in our world domain. E.g., guitars contain strings,
trumpets are brass instruments.
o Events: Events are the actions which occur in our world.
o Performance: It describes behavior which involves knowledge about how to do
things.
o Meta-knowledge: It is knowledge about what we know.
o Facts: Facts are the truths about the real world and what we represent.
o Knowledge-Base: The central component of a knowledge-based agent is the
knowledge base, represented as KB. The knowledge base is a group of
sentences (here, "sentence" is used as a technical term and is not identical to a sentence
in the English language).
Knowledge: Knowledge is awareness or familiarity gained by experiences of facts, data, and
situations. Following are the types of knowledge in artificial intelligence:
Types of knowledge
Following are the various types of knowledge:
1. Declarative Knowledge:
o Declarative knowledge is to know about something.
o It includes concepts, facts, and objects.
o It is also called descriptive knowledge and expressed in declarative sentences.
The main objective of knowledge representation is to draw the conclusions from the
knowledge, but there are many issues associated with the use of knowledge representation
techniques.
1. Important attributes
There are two attributes shown in the diagram, instance and isa. Since these attributes
support property of inheritance, they are of prime importance.
Such a representation can make it easy to answer questions such as: Who spotted Alex?
Hence, the user can add other facts, such as "Spotted (x, y) → saw (x, y)"
Predicate
A predicate is an expression of one or more variables defined on some specific domain. A
predicate with variables can be made a proposition by either assigning a value to the
variable or by quantifying the variable. Consider the following statement.
Ram is a student.
Statement Function
If we denote "Ram" as x and "is a student" as the predicate P, then the statement becomes
P(x). Here P(x) is a statement function: if we replace x with a subject, say Sunil, we
get the statement "Sunil is a student." Thus a statement function is an expression
consisting of a predicate symbol and one or more variables. A statement function yields a
statement when we replace the variables with objects. Such a replacement is called a
substitution instance of the statement function.
Quantifiers
The variable of predicates is quantified by quantifiers. There are two types of quantifier in
predicate logic − Universal Quantifier and Existential Quantifier.
Universal Quantifier
Universal quantifier states that the statements within its scope are true for every value of
the specific variable. It is denoted by the symbol ∀.
∀ x P(x) is read as for every value of x, P(x) is true.
Example − "Man is mortal" can be transformed into the propositional form ∀ x P(x) where
P(x) is the predicate which denotes x is mortal and ∀ x represents all men.
Existential Quantifier
Existential quantifier states that the statements within its scope are true for some values of
the specific variable. It is denoted by the symbol ∃.
∃ x P(x) is read as for some values of x, P(x) is true.
Example − "Some people are dishonest" can be transformed into the propositional form ∃ x
P(x) where P(x) is the predicate which denotes x is dishonest and ∃ x represents some
people.
Predicate Formulas
Consider a predicate P with n variables, P(x1, x2, x3, ..., xn). Here P is an n-place predicate and
x1, x2, x3, ..., xn are n individual variables. This n-place predicate is known as an atomic formula
of predicate calculus. For example: P(), Q(x, y), R(x, y, z)
Well Formed Formula
A Well Formed Formula (wff) is a predicate expression satisfying any of the following −
All propositional constants and propositional variables are wffs
If x is a variable and Y is a wff, then ∀ x Y and ∃ x Y are also wffs
The truth values true and false are wffs
Each atomic formula is a wff
All expressions formed by joining wffs with connectives are wffs
Universe of Discourse
We can limit the class of individuals/objects used in a statement. Here limiting means
confining the input variable to a set of particular individuals/objects. Such a restricted class
is termed the universe of discourse, domain of individuals, or simply the universe. See the
example below:
Some cats are black.
C(x) : x is a cat.
B(x) : x is black.
(∃ x)(C(x) ∧ B(x))
If the universe of discourse is E = { Katy, Mille }, where Katy and Mille are white cats, then
the statement (∃ x)(C(x) ∧ B(x)) is false whether we replace x with Katy or with Mille, whereas
if the universe of discourse is F = { Jene, Jackie }, where Jene and Jackie are black cats, then
the statement is true for the universe of discourse F.
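Over a finite universe of discourse, the two quantifiers can be checked by simple enumeration. The following Python sketch evaluates the statement (∃ x)(C(x) ∧ B(x)) for the white-cat universe (called E here) and the black-cat universe (called F here). The helper names exists and for_all and the two lookup tables are assumptions made for illustration only.

```python
def exists(universe, predicate):
    """Existential quantifier: true if the predicate holds for some x."""
    return any(predicate(x) for x in universe)

def for_all(universe, predicate):
    """Universal quantifier: true if the predicate holds for every x."""
    return all(predicate(x) for x in universe)

# Illustrative fact tables for C(x) "x is a cat" and B(x) "x is black".
is_cat = {"Katy": True, "Mille": True, "Jene": True, "Jackie": True}
is_black = {"Katy": False, "Mille": False, "Jene": True, "Jackie": True}

def some_cats_are_black(universe):
    # (∃ x)(C(x) ∧ B(x)) restricted to the given universe of discourse
    return exists(universe, lambda x: is_cat[x] and is_black[x])

E = ["Katy", "Mille"]      # white cats
F = ["Jene", "Jackie"]     # black cats

result_E = some_cats_are_black(E)
result_F = some_cats_are_black(F)
all_black_F = for_all(F, lambda x: is_black[x])
print(result_E, result_F, all_black_F)
```

The same formula is false over E and true over F, which is why the universe of discourse has to be fixed before a quantified statement can be given a truth value.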
Techniques of knowledge representation
There are mainly four ways of knowledge representation which are given as follows:
1. Logical Representation
2. Semantic Network Representation
3. Frame Representation
4. Production Rules
1. Logical Representation
Logical representation is a language with some concrete rules which deals with propositions
and has no ambiguity in representation. Logical representation means drawing a conclusion
based on various conditions. This representation lays down some important communication
rules. It consists of precisely defined syntax and semantics which supports the sound
inference. Each sentence can be translated into logics using syntax and semantics.
Syntax:
o Syntaxes are the rules which decide how we can construct legal sentences in the
logic.
o It determines which symbol we can use in knowledge representation.
o How to write those symbols.
Semantics:
o Semantics are the rules by which we can interpret the sentence in the logic.
o Semantics also involves assigning a meaning to each sentence.
Logical representation can be categorised into mainly two logics:
a. Propositional Logics
b. Predicate logics
Note: We will discuss Propositional Logics and Predicate logics in later chapters.
1. Logical representations have some restrictions and are challenging to work with.
2. Logical representation technique may not be very natural, and inference may not be
so efficient.
Note: Do not be confused with logical representation and logical reasoning as logical
representation is a representation language and reasoning is a process of thinking logically.
a. Jerry is a cat.
b. Jerry is a mammal
c. Jerry is owned by Priya.
d. Jerry is brown colored.
e. All Mammals are animal.
In the above diagram, we have represented the different type of knowledge in the form of
nodes and arcs. Each object is connected with another object by some relation.
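A semantic network of this kind can be sketched in Python as a list of (node, arc, node) triples. The triples below encode statements a to e; the arc labels is-a, owned-by and has-color are labels chosen for this sketch and may differ from those in the diagram. Following is-a arcs transitively gives the inheritance that the network is meant to support.

```python
# Each fact is a (subject, relation, object) triple taken from statements a-e.
triples = [
    ("Jerry", "is-a", "Cat"),
    ("Jerry", "is-a", "Mammal"),
    ("Jerry", "owned-by", "Priya"),
    ("Jerry", "has-color", "Brown"),
    ("Mammal", "is-a", "Animal"),
]

def related(subject, relation):
    """All objects reachable from subject over one arc with the given label."""
    return {o for s, r, o in triples if s == subject and r == relation}

def is_a(subject, category):
    """Follow is-a arcs transitively to test membership in a category."""
    frontier, seen = [subject], set()
    while frontier:
        node = frontier.pop()
        for parent in related(node, "is-a"):
            if parent == category:
                return True
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return False

print(is_a("Jerry", "Animal"))        # inferred via Jerry -> Mammal -> Animal
print(related("Jerry", "owned-by"))
```

The fact "Jerry is an animal" is never stored; it is derived by walking the arcs, which is the main reasoning step a semantic network offers.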
3. Frame Representation
A frame is a record-like structure which consists of a collection of attributes and their values to
describe an entity in the world. Frames are an AI data structure which divides knowledge
into substructures by representing stereotyped situations. A frame consists of a collection of slots
and slot values. These slots may be of any type and size. Slots have names and values, and the
various aspects of a slot are called facets.
Facets: Facets are features of frames which enable us to put constraints on the frames.
Example: IF-NEEDED facets are invoked when the data
of a particular slot is needed. A frame may consist of any number of slots, a slot may
include any number of facets, and a facet may have any number of values. A frame is also
known as slot-filler knowledge representation in artificial intelligence.
Frames are derived from semantic networks and later evolved into our modern-day classes
and objects. A single frame is not of much use on its own. A frame system consists of a collection of
frames which are connected. In a frame, knowledge about an object or event can be
stored together in the knowledge base. Frames are widely
used in various applications, including natural language processing and machine vision.
Example 1:
Slots         Fillers
Year          1996
Page          1152
Example 2:
Let's suppose we are taking an entity, Peter. Peter is an engineer by profession, his age
is 25, he lives in the city of London, and the country is England. The following is the frame
representation for this:
Slots         Fillers
Name          Peter
Profession    Engineer
Age           25
Weight        78
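A frame with slots, an IF-NEEDED facet and a link to a more general frame can be sketched in Python as follows. This is only an illustration: the class name Frame, the generic Person frame with its default slot, and the is_adult procedure are assumptions, and the Peter frame is filled from the description in Example 2.

```python
class Frame:
    """A minimal frame: named slots, optional IF-NEEDED procedures, one parent."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)
        self.if_needed = {}          # slot name -> procedure computing the value

    def get(self, slot):
        if slot in self.slots:                   # value stored in this frame
            return self.slots[slot]
        if slot in self.if_needed:               # IF-NEEDED facet: compute on demand
            return self.if_needed[slot](self)
        if self.parent is not None:              # otherwise inherit from the parent
            return self.parent.get(slot)
        return None                              # missing value

# Hypothetical generic frame holding default data.
person = Frame("Person", Legs=2)

peter = Frame("Peter", parent=person,
              Name="Peter", Profession="Engineer", Age=25, Weight=78)
peter.if_needed["is_adult"] = lambda frame: frame.get("Age") >= 18

print(peter.get("Profession"))  # stored slot
print(peter.get("Legs"))        # default inherited from Person
print(peter.get("is_adult"))    # computed by the IF-NEEDED procedure
```

Adding a new slot is a single assignment to peter.slots, which is the flexibility the frame representation is usually credited with.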
1. The frame knowledge representation makes the programming easier by grouping the
related data.
2. The frame representation is comparatively flexible and is used by many applications in AI.
3. It is very easy to add slots for new attributes and relations.
4. It is easy to include default data and to search for missing values.
5. Frame representation is easy to understand and visualize.
4. Production Rules
A production rule system consists of (condition, action) pairs which mean, "If condition then
action". It has mainly three parts:
o The set of production rules
o Working Memory
o The recognize-act cycle
In a production rule system, the agent checks for the condition, and if the condition holds then
the production rule fires and the corresponding action is carried out. The condition part of the rule
determines which rule may be applied to a problem, and the action part carries out the
associated problem-solving steps. This complete process is called a recognize-act cycle. The
working memory contains the description of the current state of problem solving, and a rule
can write knowledge to the working memory. This knowledge may match and fire other
rules. When a new situation (state) is generated, several production rules may be applicable
at the same time; this set of rules is called the conflict set. In this situation, the agent needs
to select one rule from the set, which is called conflict resolution.
Example:
o IF (at bus stop AND bus arrives) THEN action (get into the bus)
o IF (on the bus AND paid AND empty seat) THEN action (sit down).
o IF (on bus AND unpaid) THEN action (pay charges).
o IF (bus arrives at destination) THEN action (get down from the bus).
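The recognize-act cycle over these four rules can be sketched in Python. The conditions and action names come from the example; the facts each action adds to or removes from working memory are not specified in the example and are assumptions made so that the sketch can run. Conflict resolution is assumed to be "the first matching rule in the list wins", and the fact "bus arrives at destination" is placed in working memory from the start only to keep the sketch short.

```python
# Each rule: (action, conditions, facts added, facts removed).
RULES = [
    ("get into the bus", {"at bus stop", "bus arrives"},
     {"on the bus", "unpaid"}, {"at bus stop", "bus arrives"}),
    ("sit down", {"on the bus", "paid", "empty seat"},
     {"seated"}, {"empty seat"}),
    ("pay charges", {"on the bus", "unpaid"},
     {"paid"}, {"unpaid"}),
    ("get down from the bus", {"bus arrives at destination"},
     {"off the bus"}, {"on the bus", "seated", "bus arrives at destination"}),
]

def run(working_memory, rules=RULES, max_cycles=20):
    """Repeat recognize (match), resolve (pick one rule), act (update memory)."""
    memory = set(working_memory)
    fired = []
    for _ in range(max_cycles):
        conflict_set = [rule for rule in rules if rule[1].issubset(memory)]
        if not conflict_set:
            break                                   # no rule matches: stop
        action, _, to_add, to_remove = conflict_set[0]  # conflict resolution
        memory = (memory - to_remove) | to_add
        fired.append(action)
    return fired, memory

fired, final_memory = run(
    {"at bus stop", "bus arrives", "empty seat", "bus arrives at destination"}
)
print(fired)
print(final_memory)
```

In the first cycle both "get into the bus" and "get down from the bus" match, so the conflict set has two rules and the resolution strategy decides which one fires.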
1. A production rule system does not exhibit any learning capabilities, as it does not store
the result of the problem for future use.
2. During the execution of the program, many rules may be active; hence rule-based
production systems are inefficient.
CONSTRAINT PROPAGATION
Definition
A constraint satisfaction problem (CSP) is a problem that requires its solution within some
limitations/conditions also known as constraints. It consists of the following:
A finite set of variables which stores the solution. (V = {V1, V2, V3,....., Vn} )
A set of discrete values known as domain from which the solution is picked. (D = {D1,
D2, D3,.....,Dn} )
A finite set of constraints. (C = {C1, C2, C3,......, Cn} )
Please note that the elements in the domain can be either continuous or discrete, but in AI
we generally deal only with discrete values. Also note that all these sets should be finite
except for the domain set. Each variable in the variable set can have a different domain. For
example, consider the Sudoku problem. Suppose that a row, column and block already
have 3, 5 and 7 filled in. Then the domain for all the variables in that row, column and block
will be {1,2,4,6,8,9}.
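The Sudoku domain above, and one step of constraint propagation, can be sketched in Python. The function names cell_domain and propagate, the variable names x and y, and the assumption that 3, 5 and 7 are spread over the row, column and block of the cell are all illustrative.

```python
DIGITS = set(range(1, 10))

def cell_domain(row_values, column_values, block_values):
    """Digits still allowed for an empty cell, given its row, column and block."""
    used = set(row_values) | set(column_values) | set(block_values)
    return DIGITS - used

def propagate(domains, neighbors, variable, value):
    """Assign value to variable and remove it from the domains of its neighbors."""
    new_domains = {name: set(domain) for name, domain in domains.items()}
    new_domains[variable] = {value}
    for other in neighbors[variable]:
        new_domains[other].discard(value)
    return new_domains

domain = cell_domain({3}, {5}, {7})
print(sorted(domain))

# Two hypothetical empty cells x and y that constrain each other.
domains = {"x": set(domain), "y": set(domain)}
neighbors = {"x": ["y"], "y": ["x"]}
after = propagate(domains, neighbors, "x", 1)
print(sorted(after["y"]))
```

Choosing 1 for x shrinks the domain of y without any search; repeating this step after every assignment is the basic idea of constraint propagation.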
The classic methods of representing knowledge use either rules or logic. Table displays
the knowledge for the zoo animals problem in two formats–using rules on the left as
implemented within the Knowledge Representation NetLogo model, and using first order
logic on the right. Rules are often used in rule-based expert systems, and are either
specified explicitly by a knowledge engineer (usually through a process called
‘knowledge acquisition’ from a human expert), or they are derived from data using a
machine learning or data mining algorithm. Rules use a logic-based form for reasoning.
Logic is the use of symbolic and mathematical techniques for deductive reasoning, and
dates back as a discipline to Aristotle.
Knowledge Representation using rules in Artificial Intelligence
The rule-based method of knowledge representation uses IF-THEN rules (sometimes called
condition-action rules) to specify the knowledge. All the rules for a particular problem form
the rules-base, and the knowledge-base comprises three components: the list of rules in the
rules-base; the list of known facts in the facts-base; and an inferencing system, which
processes the rules to derive new facts via some form of reasoning. A rule consists of an IF
part which is a set of conditions (called the antecedents) that must be met before the rule is
said to ‘fire’ so that the set of actions in the THEN part (called the consequents) are
executed. For example, for Rule R1 in Table, if the condition ‘animal has hair’ is met–that is,
there is a known fact in the knowledge base that the animal being classified has hair, then
the rule is fired, and the action is to add a further fact ‘species is mammal’ to the
knowledge-base. There may be multiple conditions in the IF part of the rule. For example,
Rule R4 has three conditions that must be met before it can be fired. These conditions are
separated by the AND keyword, and therefore these conditions are called ‘conjunctions’. If
they were separated by the OR keyword, they would be called ‘disjunctions’. For the
Knowledge Representation NetLogo model, only rules with conjunctions have been
implemented. The set of rules and how they are defined for the zoo animals and New
Zealand birds problem is shown in NetLogo Code. The third set of rules for the Sailing boats
problem can be found by loading the model in NetLogo using the URL link below.
NetLogo Code How the rules are defined for the zoo animals and the New Zealand birds
problem in the Knowledge Representation model.
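The NetLogo listing is not reproduced here. As a rough illustration of the same idea in Python (not the NetLogo model's code), the sketch below keeps a rules-base and a facts-base and repeatedly fires rules whose conditions are all present, adding the consequent as a new fact. Rule R1 is taken from the text; the second rule, with two conjoined conditions, is hypothetical and only shows how a conjunction is handled.

```python
ZOO_RULES = [
    # R1 from the text: IF animal has hair THEN species is mammal.
    ({"animal has hair"}, "species is mammal"),
    # Hypothetical rule with two conjoined conditions, for illustration only.
    ({"species is mammal", "animal eats meat"}, "animal is carnivore"),
]

def forward_chain(facts, rules=ZOO_RULES):
    """Fire rules until no rule can add a new fact to the facts-base."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, consequent in rules:
            # A rule fires only when every condition (conjunction) is a known fact.
            if conditions.issubset(facts) and consequent not in facts:
                facts.add(consequent)
                changed = True
    return facts

derived = forward_chain({"animal has hair", "animal eats meat"})
print(sorted(derived))
```

The second rule can fire only after R1 has added "species is mammal", which shows how one fired rule can enable another.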
UNIT – 3 (NOTES)
Causes of uncertainty:
Following are some leading causes of uncertainty to occur in the real world.
1. Information obtained from unreliable sources.
2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the concept of
probability to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine
probability theory with logic to handle the uncertainty. We use probability in probabilistic
reasoning because it provides a way to handle the uncertainty that is the result of
someone's laziness and ignorance. In the real world, there are many scenarios where the
certainty of something is not confirmed, such as "It will rain today," "the behavior of someone
in some situation," or "a match between two teams or two players." These are probable
sentences: we can assume that they will happen but cannot be sure, so here we use
probabilistic reasoning.
Need of probabilistic reasoning in AI:
o When there are unpredictable outcomes.
We can find the probability of an uncertain event by using the below formula.
Conditional probability:
It can be explained by using the below Venn diagram: since B is the event that has occurred, the
sample space is reduced to set B, and we can calculate the probability of event A given that event B
has already occurred by dividing P(A⋀B) by P(B), i.e. P(A|B) = P(A⋀B) / P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both
English and mathematics. What percentage of the students who like English
also like mathematics?
Solution:
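The working can be reconstructed from the definition of conditional probability. Taking A as "the student likes mathematics" and B as "the student likes English" (this assignment of symbols is an assumption), P(A⋀B) = 0.4 and P(B) = 0.7, so:

```latex
P(A \mid B) = \frac{P(A \wedge B)}{P(B)} = \frac{0.4}{0.7} \approx 0.57 = 57\%
```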
Hence, 57% of the students who like English also like mathematics.
The above equation (a), P(A|B) = P(B|A) P(A) / P(B), is called Bayes' rule or Bayes' theorem.
This equation is the basis of most modern AI systems for probabilistic inference. It shows the simple
relationship between joint and conditional probabilities. Here, P(A|B) is known as the posterior,
which we need to calculate; it is read as the probability of hypothesis A given that evidence B
has occurred.
P(B|A) is called the likelihood: assuming that the hypothesis is true, we
calculate the probability of the evidence. P(A) is called the prior probability, the probability of
the hypothesis before considering the evidence. P(B) is called the marginal probability, the
probability of the evidence alone.
In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai), hence Bayes' rule can
be written as:
P(Ai|B) = P(Ai) P(B|Ai) / Σk P(Ak) P(B|Ak)
where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
Example-1:
Question: What is the probability that a patient has meningitis, given that the patient has a stiff neck?
Given Data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck
80% of the time. He is also aware of some more facts, which are given as follows:
o The known probability that a patient has meningitis is 1/30,000.
o The known probability that a patient has a stiff neck is 2%.
Let a be the proposition that the patient has a stiff neck and b be the proposition that the
patient has meningitis. Then we have:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule, P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 = 1/750 ≈ 0.0013.
Hence, we can expect that about 1 patient out of 750 patients with a stiff neck has
meningitis.
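The arithmetic of Example-1 can be checked with a few lines of Python. The function name bayes and the variable names are illustrative; exact fractions are used so that the result is not affected by floating-point rounding.

```python
from fractions import Fraction

def bayes(likelihood, prior, evidence):
    """Posterior P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

p_stiff_given_meningitis = Fraction(8, 10)   # P(a|b) = 0.8
p_meningitis = Fraction(1, 30000)            # P(b)
p_stiff = Fraction(2, 100)                   # P(a) = 0.02

posterior = bayes(p_stiff_given_meningitis, p_meningitis, p_stiff)
print(posterior)          # P(b|a) as an exact fraction
print(float(posterior))   # the same value as a decimal
```

The posterior is small even though the likelihood is 0.8, because the prior probability of meningitis is very low compared with the probability of a stiff neck.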
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that
the card is a king is 4/52. Calculate the posterior probability P(King|Face), i.e. the probability
that a drawn face card is a king.
Solution:
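A worked version using Bayes' rule is sketched below. It relies on two facts that are not stated in the question: every king is a face card, so P(Face|King) = 1, and a standard deck has 12 face cards (jack, queen and king in each of four suits), so P(Face) = 12/52.

```latex
P(\text{King} \mid \text{Face})
  = \frac{P(\text{Face} \mid \text{King}) \, P(\text{King})}{P(\text{Face})}
  = \frac{1 \times \frac{4}{52}}{\frac{12}{52}}
  = \frac{1}{3}
```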
For example:
If P = {a, b, c}, then the power set is given as
{∅, {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}}, which has 2^3 = 8 elements.
Mass function m(K): It is an interpretation of m({K or B}), i.e. it means there is evidence for
{K or B} which cannot be divided among more specific beliefs for K and B.
Belief in K: The belief in an element K of the power set is the sum of the masses of the elements
which are subsets of K. This can be explained through an example.
Let's say K = {a, b, c}
Bel(K) = m(a) + m(b) + m(c) + m(a, b) + m(a, c) + m(b, c) + m(a, b, c)
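The power set and the belief function can be sketched in Python. The mass values below are hypothetical numbers chosen only so that they sum to 1; the function names power_set and belief are likewise illustrative.

```python
from fractions import Fraction
from itertools import combinations

def power_set(elements):
    """All subsets of a finite set, from the empty set up to the set itself."""
    items = sorted(elements)
    return [frozenset(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]

def belief(mass, k):
    """Bel(K): sum of the masses of all non-empty subsets of K."""
    k = frozenset(k)
    return sum((m for subset, m in mass.items()
                if subset and subset.issubset(k)), Fraction(0))

subsets = power_set({"a", "b", "c"})
print(len(subsets))

# Hypothetical mass assignment; subsets not listed have mass 0.
mass = {
    frozenset({"a"}): Fraction(2, 10),
    frozenset({"b"}): Fraction(1, 10),
    frozenset({"a", "b"}): Fraction(3, 10),
    frozenset({"a", "b", "c"}): Fraction(4, 10),
}
bel_ab = belief(mass, {"a", "b"})
bel_abc = belief(mass, {"a", "b", "c"})
print(bel_ab, bel_abc)
```

Bel({a, b}) collects m(a), m(b) and m(a, b), while the mass on {a, b, c} counts only toward the belief in the whole set.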
Symbolic Reasoning
The basis for intelligent mathematical software is the integration of the "power of symbolic
mathematical tools" with the suitable "proof technology".
Mathematical reasoning enjoys a property called monotonicity.
then it also follows from any larger set of premises, as long as the original premises are
included."
Human reasoning is not monotonic.
1. Non-Monotonic Reasoning
Non-monotonic reasoning is a generic name for a class of theories of reasoning or for a specific
theory of reasoning. Non-monotonic reasoning attempts to formalize reasoning with incomplete
information by means of classical logic systems.
■ Default reasoning
■ Circumscription
■ Truth Maintenance Systems
Default Reasoning
This is a very common form of non-monotonic reasoning. The conclusions are drawn based
on what is most likely to be true. There are two approaches, both are logic type, to Default
reasoning:
One is Non-monotonic logic and the other is Default logic.
Non-monotonic logic
It has already been defined. It says, "the truth of a proposition may change when new
information (axioms) is added, and a logic may be built to allow the statement to be
retracted."
Non-monotonic logic is predicate logic with one extension, a modal operator M, which
means “consistent with everything we know”. The purpose of M is to allow consistency checks.
A way to define consistency in PROLOG notation is: to show that fact P is consistent, we
attempt to prove ¬P.
If we fail, we may say that P is consistent, since ¬P could not be proved.
Example:
∀ x : plays_instrument(x) ∧ M manage(x) → jazz_musician(x)
This states that for all x, if x plays an instrument and the fact that x can manage
is consistent with all other knowledge, then we can conclude that x is a jazz musician.
■ Default Logic
‡ Default Theory
Example:
A default rule says "Typically an American adult owns a car".
The rule is only consulted if we wish to know whether or not John owns a car and an answer
cannot be deduced from our current beliefs. This default rule is applicable if we can prove
from our beliefs that John is an American and an adult, and believing that there is some car
that is owned by John does not lead to an inconsistency. If these two sets of premises are
satisfied, then the rule states that we can conclude that John owns a car.
Circumscription
Observe that the rule ∀ x (Bird(x) ∧ ¬Abnormal(x) → Flies(x)) does not allow us to infer
that "Tweety flies", since we do not know whether he is abnormal with respect to flying ability.
But if we add axioms which circumscribe the abnormality predicate to the objects which are
currently known to be abnormal, then from "Bird Tweety" the inference can be drawn. This
inference is non-monotonic.
The RS provides the RMS with information about each inference it performs, and in return
the RMS provides the RS with information about the whole set of inferences. Several
implementations of RMS have been proposed for non-monotonic reasoning. The important
ones are the:
■ Event:
One or more outcomes of a probability experiment.
■ Probability Experiment:
A process which leads to well-defined results called outcomes.
■ Sample Space:
Set of all possible outcomes of a probability experiment.
■ Independent Events:
Two events, E1 and E2, are independent if the fact that E1 occurs does not affect the
probability of E2 occurring.
■ Classical Probability:
Also called the a priori theory of probability. The probability of event A = the number of
favorable outcomes f divided by the total number of possible outcomes n; i.e., P(A) = f / n.
■ Conditional Probability:
The probability of some event A, given the occurrence of some other event B. Conditional
probability is written P(A|B), and read as "the probability of A, given B ".
■ Joint probability:
The probability of two events in conjunction. It is the probability of both events together.
The joint probability of A and B is written P(A ∩ B); also written as P(A, B).
■ Marginal Probability:
The probability of one event, regardless of the other event. The marginal probability of A is
written P(A), and the marginal probability of B is written P(B).
Examples
Sample Space - Rolling two dice
Classical Probability
Table below illustrates frequency and distribution for the above sums.
The classical probability is the relative frequency of each event. Classical probability P(E) =
n(E) / n(S); P(6) = 5 / 36, P(8) = 5 / 36
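The sample space for rolling two dice and the classical probabilities of the sums can be enumerated in Python. This is a small sketch; the names sample_space, frequency and classical_probability are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space S: all ordered pairs (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

# Frequency of each possible sum over the sample space.
frequency = Counter(first + second for first, second in sample_space)

def classical_probability(total):
    """P(E) = n(E) / n(S) for the event 'the two dice sum to total'."""
    return Fraction(frequency[total], len(sample_space))

print(len(sample_space))
print(classical_probability(6), classical_probability(8))
```

Summing classical_probability over all totals from 2 to 12 gives 1, as expected for a complete set of mutually exclusive events.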
Empirical Probability
The empirical probability of an event is the relative frequency of a frequency distribution
based upon observation P(E) = f / n
CERTAINLY YES
POSSIBLY YES
CANNOT SAY
POSSIBLY NO
CERTAINLY NO
The fuzzy logic works on the levels of possibilities of input to achieve the definite output.
Implementation
It can be implemented in systems with various sizes and capabilities ranging from
small micro-controllers to large, networked, workstation-based control systems.
It can be implemented in hardware, software, or a combination of both.
LP x is Large Positive
MP x is Medium Positive
S x is Small
MN x is Medium Negative
LN x is Large Negative
Knowledge Base − It stores IF-THEN rules provided by experts.
Inference Engine − It simulates the human reasoning process by making fuzzy
inference on the inputs and IF-THEN rules.
Defuzzification Module − It transforms the fuzzy set obtained by the inference
engine into a crisp value.
Membership Function
Membership functions allow you to quantify a linguistic term and represent a fuzzy set
graphically. A membership function for a fuzzy set A on the universe of discourse X is
defined as μA:X → [0,1]. Here, each element of X is mapped to a value between 0 and 1. It is
called membership value or degree of membership. It quantifies the degree of membership
of the element in X to the fuzzy set A.
x axis represents the universe of discourse.
y axis represents the degrees of membership in the [0, 1] interval.
There can be multiple membership functions applicable to fuzzify a numerical value. Simple
membership functions are used because the use of complex functions does not add more precision
to the output. All membership functions for LP, MP, S, MN, and LN are shown as below −
The triangular membership function shape is the most common among the various
membership function shapes, which also include trapezoidal, singleton, and Gaussian. Here, the
input to the 5-level fuzzifier varies from -10 volts to +10 volts. Hence the corresponding output
also changes.
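A 5-level fuzzifier with triangular membership functions can be sketched in Python. The break points below (peaks at -10, -5, 0, 5 and 10 volts, each triangle 10 volts wide at the base) are assumptions chosen to cover the -10 to +10 volt range evenly; the actual shapes in the figure may differ.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 at left and right, rising linearly to 1 at peak."""
    rising = (x - left) / (peak - left)
    falling = (right - x) / (right - peak)
    return max(0.0, min(rising, falling))

# (left, peak, right) for each linguistic term; the values are assumed.
LEVELS = {
    "LN": (-15, -10, -5),
    "MN": (-10, -5, 0),
    "S": (-5, 0, 5),
    "MP": (0, 5, 10),
    "LP": (5, 10, 15),
}

def fuzzify(voltage):
    """Map a crisp voltage to its degree of membership in every fuzzy set."""
    return {label: triangular(voltage, *points) for label, points in LEVELS.items()}

memberships = fuzzify(2.5)
print(memberships)
```

An input of 2.5 volts belongs partly to S and partly to MP, which is how a single crisp value is turned into several degrees of membership in [0, 1].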
Non-monotonic Reasoning
can be invalidated by adding more knowledge into our knowledge base. Non-monotonic
reasoning deals with incomplete and uncertain models. "Human perception of various
things in daily life" is a general example of non-monotonic reasoning. Example: Suppose
the knowledge base contains the following knowledge:
o Birds can fly
o Penguins cannot fly
o Pitty is a bird
So from the above sentences, we can conclude that Pitty can fly. However, if we add
another sentence to the knowledge base, "Pitty is a penguin", we conclude that "Pitty cannot
fly", which invalidates the above conclusion.
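The Pitty example can be sketched in Python as a default rule ("birds can fly") that is overridden by more specific knowledge ("penguins cannot fly"). The representation of facts as (category, name) pairs and the function name can_fly are assumptions made for this illustration.

```python
def can_fly(name, knowledge_base):
    """Answer from the current knowledge base; the answer may change as facts are added."""
    if ("penguin", name) in knowledge_base:
        return False      # specific knowledge overrides the default
    if ("bird", name) in knowledge_base:
        return True       # default conclusion: birds can fly
    return None           # nothing is known about this individual

kb = {("bird", "Pitty")}
before = can_fly("Pitty", kb)

kb.add(("penguin", "Pitty"))     # new knowledge arrives
after = can_fly("Pitty", kb)

print(before, after)
```

Adding a fact withdrew a conclusion that was previously drawn, which cannot happen in a monotonic logic.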
UNIT – 4 (NOTES)
NATURAL LANGUAGE PROCESSING
Natural Language Processing (NLP) refers to the AI method of communicating with
intelligent systems using a natural language such as English. Processing of natural language
is required when you want an intelligent system such as a robot to perform as per your
instructions, when you want to hear a decision from a dialogue-based clinical expert system,
etc. The field of NLP involves making computers perform useful tasks with the natural
languages humans use. The input and output of an NLP system can be:
Speech
Written Text
Components of NLP
There are two components of NLP. Natural Language Understanding (NLU)
involves the following tasks:
Mapping the given input in natural language into useful representations.
Analyzing different aspects of the language.
NLU is harder than NLG. The difficulty in NLU is that natural language has an extremely rich
form and structure. It is very ambiguous. There can be different levels of ambiguity −
Lexical ambiguity − It is at a very primitive level, such as the word level.
For example, should the word “board” be treated as a noun or a verb?
Syntax-level ambiguity − A sentence can be parsed in different ways.
For example, “He lifted the beetle with red cap.” − Did he use the cap to lift the beetle,
or did he lift a beetle that had a red cap?
Referential ambiguity − Referring to something using pronouns. For example, Rima
went to Gauri. She said, “I am tired.” − Exactly who is tired?
One input can have different meanings.
Many inputs can mean the same thing.
NLP Terminology
Phonology − It is the study of organizing sound systematically.
Morphology − It is the study of the construction of words from primitive meaningful units.
Morpheme − It is the primitive unit of meaning in a language.
Syntax − It refers to arranging words to make a sentence. It also involves
determining the structural role of words in the sentence and in phrases.
Semantics − It is concerned with the meaning of words and how to combine words
into meaningful phrases and sentences.
Pragmatics − It deals with using and understanding sentences in different situations
and how the interpretation of the sentence is affected.
Discourse − It deals with how the immediately preceding sentence can affect the
interpretation of the next sentence.
World Knowledge − It includes the general knowledge about the world.
Steps in NLP
There are five general steps −
Semantic Analysis − It draws the exact meaning or the dictionary meaning from the
text. The text is checked for meaningfulness. This is done by mapping syntactic
structures to objects in the task domain. The semantic analyzer rejects
phrases such as “hot ice-cream”.
Discourse Integration − The meaning of any sentence depends upon the meaning of
the sentence just before it. In addition, it also influences the meaning of the
immediately succeeding sentence.
Pragmatic Analysis − During this step, what was said is re-interpreted in terms of what
was actually meant. It involves deriving those aspects of language which require real-world
knowledge.
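As a rough illustration of the semantic analysis step, the sketch below rejects “hot ice-cream” by checking a modifier against what is known about the noun. The tables `PROPERTIES` and `MODIFIERS` and the function `is_meaningful` are hypothetical toy structures invented for this example; a real semantic analyzer would rely on a much richer model of the task domain.

```python
# Toy domain knowledge (hypothetical, for illustration only).
PROPERTIES = {
    "ice-cream": {"temperature": "cold"},
    "coffee": {"temperature": "hot"},
}
# Each adjective asserts a value for one attribute of the noun it modifies.
MODIFIERS = {
    "hot": ("temperature", "hot"),
    "cold": ("temperature", "cold"),
}


def is_meaningful(adjective, noun):
    """Accept the phrase unless the adjective contradicts a known property."""
    attribute, value = MODIFIERS[adjective]
    known = PROPERTIES[noun].get(attribute)
    return known is None or known == value


print(is_meaningful("hot", "ice-cream"))   # contradicts the known temperature
print(is_meaningful("cold", "ice-cream"))  # consistent with the domain knowledge
```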
Context-Free Grammar
Top-Down Parser
Context-Free Grammar
It is a grammar whose rewrite rules have a single symbol on the left-hand side.
Let us create a grammar to parse the sentence −
“The bird pecks the grains”
The parse tree breaks down the sentence into structured parts so that the computer can
easily understand and process it. In order for the parsing algorithm to construct this parse
tree, a set of rewrite rules, which describe what tree structures are legal, needs to be
constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence of other
symbols. According to the first-order logic rule, if there are two strings, Noun Phrase (NP) and
Verb Phrase (VP), then the string formed by NP followed by VP is a sentence. The rewrite
rules for the sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | perching
Now, consider the above rewrite rules. Since V can be replaced by either "peck" or "pecks",
sentences such as "The bird peck the grains" are wrongly permitted, i.e. the subject-verb
agreement error is accepted as correct.
Merit − It is the simplest style of grammar and therefore a widely used one.
Demerits −
They are not highly precise. For example, “The grains peck the bird” is
syntactically correct according to the parser; even though it makes no sense, the parser
takes it as a correct sentence.
To achieve high precision, multiple sets of grammar need to be prepared. It may
require completely different sets of rules for parsing singular and plural variations,
passive sentences, etc., which can lead to the creation of a huge set of rules that is
unmanageable.
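The over-permissiveness described above can be checked with a small sketch. The rewrite rules and the DET and ADJ entries are taken from the notes; the N entries ("bird", "grains") and the exact V entries ("peck", "pecks") are assumptions filled in for illustration, as are the helper names `expand` and `accepts`. Because this grammar has no recursion, the sketch simply enumerates every sequence of word classes that S can derive and compares each with the input.

```python
from itertools import product

# Rewrite rules from the notes.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {
    "DET": {"a", "the"},
    "ADJ": {"beautiful", "perching"},
    "N": {"bird", "grains"},    # assumed entries, for illustration
    "V": {"peck", "pecks"},     # assumed from the surrounding discussion
}


def expand(symbol):
    """Return every sequence of word classes that `symbol` can derive."""
    if symbol in LEXICON:
        return [[symbol]]
    sequences = []
    for rhs in RULES[symbol]:
        # Combine the possible expansions of each right-hand-side symbol.
        for parts in product(*(expand(s) for s in rhs)):
            sequences.append([cls for part in parts for cls in part])
    return sequences


def accepts(sentence):
    """True if the sentence matches some word-class pattern derived from S."""
    words = sentence.lower().split()
    for pattern in expand("S"):
        if len(pattern) == len(words) and all(
            word in LEXICON[cls] for word, cls in zip(words, pattern)
        ):
            return True
    return False


for text in ("The bird pecks the grains",
             "The bird peck the grains",
             "The grains peck the bird"):
    print(text, "->", accepts(text))
```

All three sentences are accepted, including the one with the agreement error and the one that makes no sense, because the grammar only checks word classes and their order.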
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence
of terminal symbols that matches the classes of the words in the input sentence, until it
consists entirely of terminal symbols. These are then checked against the input sentence to
see if they match. If not, the process is started over again with a different set of rules. This is
repeated until a specific rule is found which describes the structure of the sentence.
Merit − It is simple to implement.
Demerits −
It is inefficient, as the search process has to be repeated if an error occurs.
It is slow.
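A top-down parser of this kind can be sketched as a recursive-descent recognizer with backtracking. The sketch below is one possible implementation, not the only one: the N and V lexicon entries are assumed for illustration, and the function names `parse`, `match_sequence`, and `parse_sentence` are hypothetical. Generators are used so that, when one rewrite rule fails, the next alternative is tried automatically.

```python
# Grammar from the notes; the N and V entries are assumed for illustration.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V", "NP"]],
    "DET": [["a"], ["the"]],
    "ADJ": [["beautiful"], ["perching"]],
    "N": [["bird"], ["grains"]],
    "V": [["peck"], ["pecks"]],
}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can match words[pos:]."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must equal the next input word.
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Try each rewrite rule in turn; failure falls through to the next one.
        yield from match_sequence(symbol, rhs, words, pos, [])


def match_sequence(symbol, rhs, words, pos, children):
    """Match the symbols of one right-hand side from left to right."""
    if not rhs:
        yield (symbol, children), pos
        return
    for subtree, next_pos in parse(rhs[0], words, pos):
        yield from match_sequence(symbol, rhs[1:], words, next_pos,
                                  children + [subtree])


def parse_sentence(sentence):
    """Return the first parse tree covering the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None


print(parse_sentence("The bird pecks the grains"))
print(parse_sentence("pecks the bird"))
```

The repeated retrying of alternatives is the source of the inefficiency listed among the demerits: the same sub-phrase may be re-parsed many times.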
Knowledge Base
It contains domain-specific and high-quality knowledge. Knowledge is required to exhibit
intelligence. The success of any expert system (ES) largely depends upon the collection of
highly accurate and precise knowledge.
What is Knowledge?
Data is a collection of facts. Information is data and facts organized with respect to the task
domain. Data, information, and past experience combined together are termed
knowledge.
Components of Knowledge Base
The knowledge base of an ES is a store of both factual and heuristic knowledge.
Factual Knowledge − It is the information widely accepted by the Knowledge
Engineers and scholars in the task domain.
Heuristic Knowledge − It is about practice, accurate judgement, one’s ability of
evaluation, and guessing.
Knowledge representation
It is the method used to organize and formalize the knowledge in the knowledge base. It is
in the form of IF-THEN-ELSE rules.
Knowledge Acquisition
The success of any expert system largely depends on the quality, completeness, and
accuracy of the information stored in the knowledge base. The knowledge base is formed
from the inputs of various experts, scholars, and knowledge engineers. The knowledge
engineer is a person with the qualities of empathy, quick learning, and case-analyzing skills.
The knowledge engineer acquires information from the subject expert by recording,
interviewing, and observing the expert at work, etc., and then categorizes and organizes
the information in a meaningful way, in the form of IF-THEN-ELSE rules, to be used by the
inference engine. The knowledge engineer also monitors the development of the ES.
Inference Engine
Use of efficient procedures and rules by the Inference Engine is essential in deducing a
correct, flawless solution. In the case of a knowledge-based ES, the Inference Engine acquires
and manipulates the knowledge from the knowledge base to arrive at a particular solution.
In the case of a rule-based ES, it −
Applies rules repeatedly to the facts, which are obtained from earlier rule
application.
Adds new knowledge into the knowledge base if required.
Resolves rule conflicts when multiple rules are applicable to a particular case.
To recommend a solution, the Inference Engine uses the following strategies −
Forward Chaining
Backward Chaining
Forward Chaining
The Inference Engine follows the chain of conditions and derivations and finally deduces
the outcome. It considers all the facts and rules, and sorts them before arriving at a
solution. This strategy is followed when working towards a conclusion, result, or effect. For
example, prediction of share market status as an effect of changes in interest rates.
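Forward chaining can be sketched as a loop that keeps firing rules until no new fact appears. The rules below about interest rates and share prices are invented purely to mirror the example in the notes and are not real financial knowledge; the function name `forward_chain` and the (conditions, conclusion) rule format are also assumptions.

```python
def forward_chain(facts, rules):
    """Data-driven inference: fire rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all its conditions are already known.
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules: (list of conditions, conclusion).
RULES = [
    (["interest rates fall"], "borrowing becomes cheaper"),
    (["borrowing becomes cheaper"], "companies invest more"),
    (["companies invest more"], "share prices rise"),
]

derived = forward_chain(["interest rates fall"], RULES)
print(sorted(derived))
```

The engine starts from the known cause and works forward through the chain to the effect.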
Backward Chaining
On the basis of what has already happened, the Inference Engine tries to find out which
conditions could have held in the past to produce this result. This strategy is followed for
finding out a cause or reason. For example, diagnosis of blood cancer in humans.
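Backward chaining starts from the goal and works back to the facts that would support it. The sketch below is a minimal goal-driven prover; the rule texts are invented placeholders loosely inspired by the diagnosis example and must not be read as medical knowledge. The name `backward_chain` and the rule format are assumptions.

```python
def backward_chain(goal, facts, rules, seen=None):
    """Goal-driven inference: is `goal` supported by the facts and rules?"""
    seen = set() if seen is None else seen
    if goal in facts:
        return True
    if goal in seen:                 # guard against circular rule chains
        return False
    seen = seen | {goal}
    for conditions, conclusion in rules:
        # Pick rules that conclude the goal, then prove each condition in turn.
        if conclusion == goal and all(
            backward_chain(c, facts, rules, seen) for c in conditions
        ):
            return True
    return False


# Hypothetical rules: (list of conditions, conclusion).
RULES = [
    (["high white cell count"], "abnormal blood count"),
    (["abnormal blood count", "abnormal marrow test"], "suspected blood disorder"),
]
FACTS = {"high white cell count", "abnormal marrow test"}

print(backward_chain("suspected blood disorder", FACTS, RULES))
print(backward_chain("suspected blood disorder", {"high white cell count"}, RULES))
```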
User Interface
The user interface provides interaction between the user of the ES and the ES itself. It generally
uses Natural Language Processing so that it can be used by a user who is well-versed in the task
domain. The user of the ES need not necessarily be an expert in Artificial Intelligence.
It explains how the ES has arrived at a particular recommendation. The explanation may
appear in the following forms −
Natural language displayed on screen.
Verbal narrations in natural language.
Listing of rule numbers displayed on the screen.
The user interface makes it easy to trace the credibility of the deductions.
Its technology should be adaptable to user’s requirements; not the other way
round.
It should make efficient use of user input.
Application Description
There are several levels of ES technologies available. Expert systems technologies include −
Expert System Development Environment − The ES development environment
includes hardware and tools. They are −
o Workstations, minicomputers, mainframes.
o High-level symbolic programming languages such as LISt Processing (LISP)
and PROgrammation en LOGique (PROLOG).
o Large databases.
Tools − They reduce the effort and cost involved in developing an expert system to a
large extent.
o Powerful editors and debugging tools with multi-windows.
o They provide rapid prototyping.
o They have inbuilt definitions of model, knowledge representation, and inference
design.
Shells − A shell is nothing but an expert system without a knowledge base. A shell
provides the developers with knowledge acquisition, inference engine, user
interface, and explanation facility. For example, a few shells are given below −
o Java Expert System Shell (JESS), which provides a fully developed Java API for
creating an expert system.
o Vidwan, a shell developed at the National Centre for Software Technology,
Mumbai in 1993. It enables knowledge encoding in the form of IF-THEN
rules.
Realize how the concepts can represent the domain knowledge best.
Develop the Prototype
From Knowledge Base: The knowledge engineer works to −
Acquire domain knowledge from the expert.
Represent it in the form of IF-THEN-ELSE rules.
Test and Refine the Prototype
The knowledge engineer uses sample cases to test the prototype for any
deficiencies in performance.
End users test the prototypes of the ES.
Develop and Complete the ES
Test and ensure the interaction of the ES with all elements of its environment,
including end users, databases, and other information systems.
Document the ES project well.
Train the user to use ES.
Objective
Robots are aimed at manipulating objects by perceiving, picking, moving, or modifying the
physical properties of an object, destroying it, or otherwise having an effect on it, thereby
freeing manpower from repetitive functions, which robots perform without getting bored,
distracted, or exhausted.
What is Robotics?
Robotics is a branch of AI which draws on Electrical Engineering, Mechanical
Engineering, and Computer Science for the design, construction, and application of robots.
Aspects of Robotics
The robots have mechanical construction, form, or shape designed to accomplish a
particular task.
They have electrical components which power and control the machinery.
They contain some level of computer program that determines what, when and
how a robot does something.
Difference in Robot System and Other AI Program
Here is the difference between the two −
AI Programs − They need general purpose computers to operate on.
Robots − They need special hardware with sensors and effectors.
Robot Locomotion
Locomotion is the mechanism that makes a robot capable of moving in its environment.
There are various types of locomotion −
Legged
Wheeled
Combination of Legged and Wheeled Locomotion
Tracked slip/skid
Legged Locomotion
This type of locomotion consumes more power while demonstrating walk, jump,
trot, hop, climb up or down, etc.
It requires a larger number of motors to accomplish a movement. It is suited for rough
as well as smooth terrain, where an irregular or too smooth surface makes wheeled
locomotion consume more power. It is a little difficult to implement because of
stability issues.
It comes in varieties with one, two, four, and six legs. If a robot has multiple legs,
then leg coordination is necessary for locomotion.
The total number of possible gaits (a periodic sequence of lift and release events for each
of the legs) a robot can use depends upon the number of its legs.
If a robot has k legs, then the number of possible events N = (2k-1)!.
In case of a two-legged robot (k=2), the number of possible events is N = (2k-1)! = (2*2-1)!
= 3! = 6.
Hence there are six possible different events −
Lifting the Left leg
Releasing the Left leg
Lifting the Right leg
Releasing the Right leg
Lifting both the legs together
Releasing both the legs together
In case of k=6 legs, there are 39916800 possible events. Hence the complexity of a robot
grows very rapidly (factorially) with the number of legs.
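The formula N = (2k-1)! is easy to evaluate for the leg counts mentioned above. The short sketch below only restates the formula from the notes; the function name `possible_events` is an assumption.

```python
from math import factorial


def possible_events(k):
    """Number of possible events N = (2k - 1)! for a robot with k legs."""
    return factorial(2 * k - 1)


for legs in (1, 2, 4, 6):
    print(legs, "legs ->", possible_events(legs))
```

For k = 2 this gives 3! = 6 and for k = 6 it gives 11! = 39916800, matching the figures in the text and showing how quickly the count grows.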
Wheeled Locomotion
It requires fewer motors to accomplish a movement. It is relatively easy to
implement, as there are fewer stability issues when there are more wheels. It is power
efficient as compared to legged locomotion.
Standard wheel − Rotates around the wheel axle and around the contact point.
Castor wheel − Rotates around the wheel axle and the offset steering joint.
Swedish 45° and Swedish 90° wheels − Omni-wheel, rotates around the contact
point, around the wheel axle, and around the rollers.
Ball or spherical wheel − Omnidirectional wheel, technically difficult to implement.
Slip/Skid Locomotion
In this type, the vehicles use tracks as in a tank. The robot is steered by moving the tracks
at different speeds in the same or opposite directions. It offers stability because of the large
contact area between track and ground.
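Steering by track speed can be illustrated with the idealised differential-drive model, used here as an assumption: it ignores the slipping that real tracked vehicles rely on, so it is only a first approximation. The function name `skid_steer_velocity` and the parameter names are hypothetical; speeds are taken to be in m/s and the track separation in metres.

```python
def skid_steer_velocity(v_left, v_right, track_width):
    """Return (forward speed, turn rate) from the two track speeds.

    Idealised no-slip model: forward speed is the mean of the track speeds,
    and the turn rate (rad/s) is their difference divided by the track width.
    """
    forward = (v_left + v_right) / 2.0
    turn_rate = (v_right - v_left) / track_width
    return forward, turn_rate


print(skid_steer_velocity(1.0, 1.0, 0.5))    # equal speeds: straight line
print(skid_steer_velocity(0.5, 1.0, 0.5))    # same direction, different speeds: gentle turn
print(skid_steer_velocity(-1.0, 1.0, 0.5))   # opposite directions: turn on the spot
```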
Components of a Robot
Robots are constructed with the following −
Power Supply − The robots are powered by batteries, solar power, hydraulic, or
pneumatic power sources.
Actuators − They convert energy into movement.
Electric motors (AC/DC) − They are required for rotational movement.
Pneumatic Air Muscles − They contract almost 40% when air is sucked into them.
Muscle Wires − They contract by 5% when electric current is passed through them.
Piezo Motors and Ultrasonic Motors − Best for industrial robots.
Sensors − They provide real-time information on the task
environment. Robots are equipped with vision sensors to be able to compute the depth
in the environment. A tactile sensor imitates the mechanical properties of the touch
receptors of human fingertips.
Computer Vision
This is a technology of AI with which robots can see. Computer vision plays a vital
role in the domains of safety, security, health, access, and entertainment. Computer vision
automatically extracts, analyzes, and comprehends useful information from a single image
or an array of images. This process involves the development of algorithms to accomplish
automatic visual comprehension.
Face Detection − Many state-of-the-art cameras come with this feature, which
enables them to read the face and take a picture of that perfect expression. It is also used to
let a user access software on a correct match.
Object Recognition − Object recognition systems are installed in supermarkets, cameras,
and high-end cars such as BMW, GM, and Volvo.
Estimating Position − It is estimating the position of an object with respect to the camera, as
in the position of a tumor in the human body.
Application Domains of Computer Vision
Agriculture
Autonomous vehicles
Biometrics
Character recognition
Forensics, security, and surveillance
Industrial quality inspection
Face recognition
Gesture analysis
Geoscience
Medical imagery
Pollution monitoring
Process control
Remote sensing
Robotics
Transport
Applications of Robotics
Robotics has been instrumental in various domains such as −
Industries − Robots are used for handling material, cutting, welding, color coating,
drilling, polishing, etc.
Military − Autonomous robots can reach inaccessible and hazardous zones during
war. A robot named Daksh, developed by the Defence Research and Development
Organisation (DRDO), is in use to destroy life-threatening objects safely.
Medicine − Robots are capable of carrying out hundreds of clinical tests
simultaneously, rehabilitating permanently disabled people, and performing
complex surgeries such as brain tumor removal.
Exploration − Robot rock climbers used for space exploration and underwater
drones used for ocean exploration are a few examples.
Entertainment − Disney’s engineers have created hundreds of robots for movie
making.
Each rule represents a small chunk of knowledge relating to the given domain of expertise. When
the known facts support the conditions on the rule’s left side, the conclusion or action part of
the rule is accepted as known. The rule-based architecture of an expert system consists
of the domain expert, knowledge engineer, inference engine, working memory, knowledge
base, external interfaces, user interface, explanation module, and databases, spreadsheets,
and executable programs, as mentioned in the figure.
1. User Interface: It is the mechanism by which the user and the expert system
communicate with each other, i.e. the user interacts with the system through a user
interface. It acts as a bridge between the user and the expert system. This module accepts the
user's queries and submits them to the expert system. The user normally consults the expert
system for the following reasons.
a) To get answers to his/her queries.
b) To get an explanation of the solution for psychological satisfaction.
The user interface module is designed in such a way that at the user level it accepts the
query in a language understandable by the expert system. To make the expert system user
friendly, the user interface interacts with the user in natural language. The user interface
provides as many facilities as possible, such as menus, graphical interfaces, etc., to make
the dialog user friendly and more attractive.
2. Explanation Module: The explanation module explains the reasoning of the system to
a user. It provides the user with an explanation of the reasoning process when
requested. The credibility of an expert system will be established only when it is able to
explain “how and why” a particular conclusion is drawn. This explanation increases the
user's belief in the expert system.
a) Explanation (How): To respond to a how query, the explanation module traces the
chain of rules fired during a consultation with the user. This explanation mode can be
activated once the process is over. It explains how a fact was deduced by the system
and similarly how a rule was/wasn’t used. The simplest way to specify this is to
explain the rule which allows the deduction. For example,
if the system (S) gives information about the parent-child relationship to the user
(U), then the following dialogue is possible.
S: My diagnosis is “A is the father of B”
U: How?
S: The result was obtained by combining the following facts and rules.
Fact no 11: A is the parent of B.
Fact no 15: A is a male.
Fact no 110: X is father of Y:
X is parent of Y, X is male.
So, A is the father of B.
U: Why?
S: I need the fact:
A is the father of B to establish the following fact “B is the son of A”.
By using the rule no. 4:
A is the father of B:
B is the son of A.
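The "how" explanation in the dialogue can be sketched by recording which facts and which rule produced each conclusion. The fact numbers 11, 15, and 110 follow the dialogue; the child in fact 11 is taken to be B so that the facts agree with the stated conclusion, and the data layout and the names `derive_fathers` and `explain_how` are assumptions made for this sketch.

```python
# Numbered facts, as in the dialogue (child assumed to be B).
FACTS = {
    11: ("parent", "A", "B"),
    15: ("male", "A"),
}
# The rule numbered 110 in the dialogue, kept as text for the explanation.
RULE_110 = "X is father of Y if X is parent of Y and X is male"


def derive_fathers(facts):
    """Apply rule 110 and remember which facts supported each conclusion."""
    derivations = []
    for parent_no, fact in facts.items():
        if fact[0] != "parent":
            continue
        _, x, y = fact
        for male_no, other in facts.items():
            if other == ("male", x):
                derivations.append({
                    "conclusion": ("father", x, y),
                    "facts": [parent_no, male_no],
                    "rule": 110,
                })
    return derivations


def explain_how(derivation, facts):
    """Answer a 'How?' query by listing the supporting facts and the rule."""
    _, x, y = derivation["conclusion"]
    lines = [f"{x} is the father of {y} because:"]
    for number in derivation["facts"]:
        lines.append(f"  fact no {number}: {facts[number]}")
    lines.append(f"  rule no {derivation['rule']}: {RULE_110}")
    return lines


derivations = derive_fathers(FACTS)
for line in explain_how(derivations[0], FACTS):
    print(line)
```

Keeping the trace alongside each conclusion is what lets the module answer the query after the consultation is over.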
4. Knowledge Base: In the rule-based architecture of an expert system, the knowledge base is
the set of production rules. The expertise concerning the problem area is represented by
productions. In rule-based architecture, the condition-action pairs are represented as
rules, with the premises of the rules (IF part) corresponding to the condition and the
conclusion (THEN part) corresponding to the action. Case-specific data are kept in the
working memory. The core part of an expert system is the knowledge base, and for this
reason an expert system is also called a knowledge-based system. Expert system
knowledge is usually structured in the form of a tree that consists of a root frame and a
number of sub-frames. A simple knowledge base can have only one frame, i.e. the root
frame, whereas a large and complex knowledge base may be structured on the basis of
multiple frames.
5. Inference Engine: The inference engine accepts user input queries and responses to
questions through the I/O interface. It uses this dynamic information together with the static
knowledge stored in the knowledge base. The knowledge in the knowledge base is used to
derive conclusions about the current case as presented by the user’s input. The inference engine
is the module which finds an answer from the knowledge base. It applies the knowledge to
find the solution to the problem. In general, the inference engine makes inferences by deciding
which rules are satisfied by the facts, deciding the priorities of the satisfied rules, and executing
the rule with the highest priority. Generally, the inferring process is carried out recursively in
three stages: match, select, and execute. During the match stage, the contents of working
memory are compared to facts and rules contained in the knowledge base. When proper
and consistent matches are found, the corresponding rules are placed in a conflict set.
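The match, select, and execute stages can be sketched as a simple loop. In this sketch the rule format (a dictionary with "if", "then", and "priority" keys), the car-fault rules, and the function name `run_inference` are all hypothetical; selecting by the highest priority follows the description above, although real systems use a variety of conflict-resolution strategies.

```python
def run_inference(working_memory, rules, max_cycles=100):
    """Repeat the match-select-execute cycle until no rule is applicable."""
    memory = set(working_memory)
    fired = []
    for _ in range(max_cycles):
        # Match: collect rules whose conditions hold and that add something new.
        conflict_set = [
            rule for rule in rules
            if rule["if"] <= memory and rule["then"] not in memory
        ]
        if not conflict_set:
            break
        # Select: resolve the conflict by taking the highest-priority rule.
        chosen = max(conflict_set, key=lambda rule: rule["priority"])
        # Execute: add the conclusion of the chosen rule to working memory.
        memory.add(chosen["then"])
        fired.append(chosen["name"])
    return memory, fired


# Hypothetical rules for illustration.
RULES = [
    {"name": "R1", "if": {"engine cranks", "no spark"},
     "then": "ignition fault", "priority": 2},
    {"name": "R2", "if": {"engine cranks"},
     "then": "battery ok", "priority": 1},
    {"name": "R3", "if": {"ignition fault"},
     "then": "check spark plugs", "priority": 3},
]

memory, fired = run_inference({"engine cranks", "no spark"}, RULES)
print(fired)
print(sorted(memory))
```

In the first cycle both R1 and R2 are in the conflict set, and the priority decides which one is executed first.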
KNOWLEDGE ACQUISITION
Knowledge acquisition is the gathering or collecting of knowledge from various sources. It is
the process of adding new knowledge to a knowledge base and refining or improving
knowledge that was previously acquired. Acquisition is the process of expanding the
capabilities of a system or improving its performance at some specified task. So it is the
goal-oriented creation and refinement of knowledge. Acquired knowledge may consist of facts,
rules, concepts, procedures, heuristics, formulas, relationships, statistics, or any other useful
information. Sources of this knowledge may be experts in the domain of interest, textbooks,
technical papers, database reports, journals, and the environment. Knowledge
acquisition is a continuous process and is spread over the entire lifetime of the system. An
example of knowledge acquisition is machine learning, which may be a process of autonomous
knowledge creation or refinement through the use of computer programs. The newly acquired
knowledge should be integrated with existing knowledge in some meaningful way. The
knowledge should be accurate, non-redundant, consistent, and fairly complete. Knowledge
acquisition supports activities like entering the knowledge and maintaining the knowledge base.
The knowledge acquisition process also sets up dynamic data structures for existing knowledge
so that the knowledge can be refined.
The role of the knowledge engineer is also very important in developing and
refining knowledge. Knowledge engineers are the professionals who elicit
knowledge from experts. They integrate knowledge from various sources, create and
edit code, operate the various interactive tools, build the knowledge base, etc.
Many techniques have been developed to elicit knowledge from an expert. They are
termed knowledge acquisition techniques. They are:
f) Sorting Techniques
Diagram-based techniques involve the generation and use of concept maps, event diagrams, and
process maps. This technique captures features like “why, when, who, how and where”.
The matrix-based techniques involve the construction of grids indicating such things as
problems encountered against possible solutions. Hierarchical techniques are used to build
hierarchical structures like trees. The protocol analysis technique is used to identify the type of
knowledge, such as goals, decisions, relationships, etc. The protocol generation techniques
include various types of interviews: structured, semi-structured, and unstructured.
The most common knowledge acquisition technique is the face-to-face interview.
The interview is a very important technique which must be planned carefully. The results of an
interview must be verified and validated. Some common variations of an unstructured
interview are talk through, teach through, and read through. The knowledge engineer slowly
learns about the problem and can then build a representation of the knowledge.
Unstructured interviews seldom provide complete or well-organized descriptions of
cognitive processes, because the domains are generally complex. The experts usually find it
very difficult to express some of the more important knowledge. The data acquired are often
unrelated, exist at varying levels of complexity, and are difficult for the knowledge engineer
to review, interpret, and integrate. Structured interviews, on the other hand, are a
systematic, goal-oriented process. They force an organized communication between the
knowledge engineer and the expert. In a structured interview, interpersonal communication
and analytical skills are important.
Unless you are living under a rock, you will have come across a plethora of articles
convincing you that the AI revolution has come and is here to stay. While we try to
understand some of the theory behind the claims made, there are many more articles
that try to create panic among non-expert audiences by spinning doomsday theories. When
there is a lack of understanding of what AI cannot do, there will be fear of what AI can do.
I believe it is important that we understand the current state of the technology in the field of
When an agent is deployed in the real world, it may be keen to explore its environment, but it
needs to follow certain constraints in order to obey the limitations of that environment. A
team from Berkeley AI Research (BAIR) has presented their work titled Constrained Policy
Optimization (CPO), which introduces safety-motivated constraints into policy search. It has
many applications where safety must be ensured during exploration. BAIR has also
published an article explaining their work on CPO. If an agent/robot is purchased by a
non-technical owner, she should be able to train the agent by providing feedback. MacGlashan
et al. proposed Convergent Actor-Critic by Humans (COACH), an algorithm to learn from
policy-dependent feedback, for training agents/robots with the feedback provided by
non-technical users. They demonstrate that COACH can also learn multiple behaviours on a
physical robot, even with noisy images.
In order to perform any human activity like cooking, household chores, etc., an RL
agent needs to execute a long sequence of instructions and generalise to new, unseen subtasks.
Sometimes, there are unexpected events, like a low battery, which require
a deviation before the rest of the subtasks can be finished. To achieve these goals, Oh et al.
proposed a generalised approach which takes a sequence of tasks in natural language and
executes the subtasks, mostly sequentially. They tackled the problem in two steps: 1)
learning the skills to perform subtasks, with an analogy-based generalisation framework; 2) a
meta-controller to determine the order of execution of subtasks. Unlike the existing work,
their architecture generalises well and also handles unexpected subtasks. For completing
multiple sets of tasks, we need a policy that can understand the subtasks and still finish the
tasks optimising for the overall reward. Often, the agent doesn’t receive an immediate
reward for completing a subtask. Andreas et al. proposed a framework for learning deep
sub-policies in a multi-task setting. The algorithm is guided only by abstract sketches of
high-level behaviour.
In order to regularise deep neural networks, several methods like batch normalisation and
whitening neural networks (WNN) are used. When applying whitening, the computational overhead
of building the covariance matrix and solving the SVD becomes a bottleneck. The work proposed
by Ping Luo attempts to overcome the limitations of WNN with a new method termed Generalised
Whitening Neural Networks (GWNN), which reduces the computational overhead with compact
representations. The hardware limitations on implementing higher-dimensional tensor
kernels for ConvNets are studied by Budden et al. They proposed a Winograd-style faster
computation for higher dimensions, optimised for CPUs. They benchmarked their
algorithm against popular frameworks like Caffe and TensorFlow that support the AVX and Intel
MKL optimised libraries, and arrived at the interesting insight that the current CPU limitations
are largely due to software rather than hardware.
Extending the class of faster computations such as FFT and Winograd, Cho and
Brand suggested Memory-efficient Convolution (MEC), which lowers the memory requirement
and improves the convolution process. MEC takes rolling subsets of columns and expands
them into rows to form a smaller matrix. This process is repeated along with kernel matrix
multiplication to produce efficient computation. With the increase in the number of feature
maps, the redundancy increases, leading to inefficient memory usage. Wang et al. proposed a
method called RedCNN which attempts to reduce the dimensionality of feature maps while
preserving the intrinsic information and reducing the correlation between feature maps.
They used a circulant matrix for projection, which gives high training speed and mapping speed.
The correlation between gradients decays exponentially with depth in the network, resulting in
gradients that appear as white noise. These shattering gradients are predominantly observed in
feed-forward networks, whereas skip-connection networks are resistant. The authors
proposed Looks Linear (LL) initialisation, which resolves shattering gradients in feed-forward
networks without adding any skip connections.
Identifying sleep patterns helps diagnose sleep disorders, so that better healthcare
can be provided. However, the existing approaches for identifying sleep patterns involve
attaching a lot of sensors to the patient’s body, and the study is usually conducted in a hospital
or a lab. The experimental setting itself can make the patient experience sleep difficulty,
rendering the measurements unreliable. A team from MIT has conducted research on using
wireless radio frequency (RF) signals to identify sleep patterns without any sensors
on the patient’s body. They used a CNN-RNN combination to identify patterns for sleep stage
prediction. However, RF signals suffer from noise reflected from nearby sources in
the environment. Hence, they added adversarial training that discards any
extraneous information specific to an individual but retains the useful information required
to predict the sleep stage. They achieved significantly better results (~80%) than the
existing state of the art (~64%), which used hand-crafted signal features. The team from
Baidu has presented their work on Deep Voice, an end-to-end neural speech synthesis system.
They detailed the five major building blocks, ranging from phoneme conversion to audio
synthesis using a variant of WaveNet. As their entire architecture is powered by neural
networks, their system is more flexible than existing text-to-speech systems.
4) Meta Learning
Model-Agnostic Meta-Learning (MAML), proposed by Finn et al., creates a meta-learned
model whose parameters are learned by randomly sampling from a distribution of tasks. This
model can be quickly adapted to new tasks using a few training samples and iterations, which
is commonly referred to as few-shot learning. The authors also demonstrated the application of
MAML to classification, regression and reinforcement learning tasks. An interesting paper
on learning both the network structure and its weights was presented by Cortes et al. The proposed
method, called AdaNet, learns the network architecture by incrementally adding depth
to the network. The new network’s k-th layer is connected to the existing network’s k-th and
(k-1)-th layers. The network architecture is selected by comparing performance on an
empirical loss function with a regularisation parameter. Wichrowska et al. introduced a learned
gradient descent optimiser that can generalise to new tasks with reduced memory and
computational requirements. They used a hierarchical RNN architecture to define the
optimiser, and it outperformed RMSprop/Adam on the MNIST dataset.
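The adapt-then-evaluate idea behind MAML, described at the start of this section, can be illustrated with a toy problem. The sketch below is not the authors' implementation. It assumes each task is a line y = slope · x, the model is y = w · x with a single parameter w, one inner gradient step is taken per task, and the same inputs are used for the inner and the outer loss. Instead of sampling tasks randomly, it loops over all of them. All function names are hypothetical.

```python
def mean_square(xs):
    """Mean of x^2 over the inputs; it scales the quadratic loss."""
    return sum(x * x for x in xs) / len(xs)


def loss_grad(w, slope, xs):
    """Gradient w.r.t. w of the mean squared error between w*x and slope*x."""
    return 2.0 * mean_square(xs) * (w - slope)


def adapt(w, slope, xs, inner_lr):
    """Inner loop: one gradient step on a single task."""
    return w - inner_lr * loss_grad(w, slope, xs)


def maml_train(task_slopes, xs, inner_lr=0.1, meta_lr=0.05, steps=500):
    """Outer loop: move w so that the loss AFTER adaptation is small on average."""
    w = 0.0
    curvature = 2.0 * mean_square(xs)  # second derivative of the task loss
    for _ in range(steps):
        meta_grad = 0.0
        for slope in task_slopes:
            w_adapted = adapt(w, slope, xs, inner_lr)
            # Chain rule through the inner step:
            # d(w_adapted)/dw = 1 - inner_lr * curvature.
            meta_grad += loss_grad(w_adapted, slope, xs) * (1.0 - inner_lr * curvature)
        w -= meta_lr * meta_grad / len(task_slopes)
    return w
```

For this toy family the meta-learned w settles at the mean of the task slopes, a starting point from which a single inner step moves closer to any individual task. For example, `adapt(maml_train([1.0, 2.0, 3.0], [1.0, 2.0]), 3.0, [1.0, 2.0], 0.1)` lands nearer to 3.0 than the meta-learned starting point does.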
5) Sequential Modeling
Segmental structure is a natural pattern in many sequences, such as phrases in human language
or groups of letters in identifying phonotactic rules. Wang et al. proposed a sequence
modelling approach via segmentation. They learned the segmental structure using an LSTM,
and searched the space of possible segments by keeping a limit on the search space and
further exploring the structure of the segments. The popular implementation from Facebook AI
Research (FAIR) of convolutions for sequence-to-sequence learning gathered much
attention at ICML’17. They created hierarchical structures using multi-layer convolutions,
thereby replicating the long-range dependencies captured by traditional LSTM-based
architectures. They also used gated linear units, residual connections and attention in every
decoder layer.
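A gated linear unit can be sketched as follows. This is a minimal illustration, assuming the layer output is a flat list whose first half carries the content and whose second half carries the gate; the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_linear_unit(values):
    """Return content * sigmoid(gate), where values = content followed by gate."""
    if len(values) % 2 != 0:
        raise ValueError("input length must be even")
    half = len(values) // 2
    content, gate = values[:half], values[half:]
    # Each gate value decides how much of the matching content value passes through.
    return [a * sigmoid(b) for a, b in zip(content, gate)]
```

A gate value of 0 lets half of the content through, a large positive gate passes it almost unchanged, and a large negative gate suppresses it.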
The temporal evolution of word embeddings was studied by Bamler et al. in their
paper titled “Dynamic Word Embeddings”. In their approach, they extended the skip-gram model to
a probabilistic dynamic skip-gram model for sequential text data with latent time series. The
key contribution of their approach is the use of a Kalman filter as a prior for the latent
embeddings. This allowed them to share information across all times while the embeddings
were allowed to drift.
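The way a Kalman filter shares information across time while still allowing drift can be seen in a scalar example. The sketch below is not the paper's model. It assumes a single latent value that follows a random walk with variance `drift_var` and is observed with noise of variance `noise_var`; all names are hypothetical.

```python
def kalman_filter_random_walk(observations, drift_var, noise_var, mean0=0.0, var0=1.0):
    """Filter a scalar random walk observed under Gaussian noise."""
    mean, var = mean0, var0
    estimates = []
    for y in observations:
        # Predict: the random-walk prior keeps the mean but widens the uncertainty,
        # which is what lets the latent value drift over time.
        var = var + drift_var
        # Update: blend the prediction and the new observation using the Kalman gain.
        gain = var / (var + noise_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        estimates.append(mean)
    return estimates
```

A small `drift_var` makes each estimate lean heavily on earlier time steps, while a large one lets the estimate follow each new observation more closely.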
A team from Microsoft Research India has come up with powerful tree-based models that can
run machine learning on resource-constrained devices, such as IoT devices with as little as 2 KB of RAM.
For classification problems, Gradient Boosted Decision Trees (GBDT) often perform relatively
well. However, when the output space of multilabel classification becomes high-dimensional
and sparse, GBDT algorithms suffer from memory issues and long running times. To achieve
better prediction time and a smaller model size, Si et al. proposed the GBDT-Sparse
algorithm to handle high-dimensional sparse data.
A team from Google Brain presented a paper on audio synthesis using WaveNet
autoencoders. Their main contribution is the WaveNet autoencoder architecture, which
includes a temporal encoder built on dilated convolutions that encodes a sequence of
hidden codes with separate dimensions for time and channel. They also introduced the NSynth
dataset, which contains approximately 300k annotated musical notes from approximately 1k instruments.
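A dilated convolution, the building block mentioned above, skips a fixed number of samples between kernel taps so that stacked layers see a long stretch of the signal. The sketch below is a generic one-dimensional version without padding and is not the encoder from the paper; the function names are hypothetical.

```python
def dilated_conv1d(signal, weights, dilation):
    """y[t] = sum_k weights[k] * signal[t + k * dilation], valid positions only."""
    span = (len(weights) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[t + k * dilation] for k, w in enumerate(weights))
        for t in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output after stacking the given layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With a kernel of size 2 and dilations 1, 2, 4 and 8, four layers already cover 16 input samples, whereas four undilated layers of the same size would cover only 5.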
Sequential data such as user browsing history has a large action space, with many actions
sharing a similar intent or topic. Recurrent neural networks like LSTMs need many
parameters to model such data, which makes the model hard to interpret. In contrast,
models like LDA can model such sequential data and are interpretable, but their
performance is no better than that of LSTMs. To overcome these limitations, Zaheer et al. proposed
Latent LSTM Allocation (LLA) for user modeling, which combines hierarchical Bayesian models
with LSTMs.
The image compression algorithm proposed by Rippel and Bourdev used GANs
instead of autoencoders. The proposed solution included a pyramidal decomposition encoder
that extracts image features at different scales. The extracted features are quantised into
equal-sized bins and then compressed using bitplane decomposition, arithmetic coding and
code-length regularisation. This is followed by realistic reconstruction via adversarial training.
To overcome the limitations of discriminative models for natural language text
generation, Wen et al. proposed a Latent Intention Dialogue Model that learns the intention
using a latent variable and then composes appropriate machine responses. The key idea
behind this paper is to represent the latent intention distribution as an intrinsic policy
that reflects human decision-making and is learned using policy gradient-based
reinforcement learning. An alternative approach to natural language generation using
latent semantic structure was proposed by Hu et al. They used VAEs to generate text
samples conditioned on latent attribute codes. The attribute codes are learned using
an individual discriminator for each code, which measures the match between generated samples
and the desired attributes using a softmax approximation.
For online multi-class bandit algorithms, the earlier Banditron algorithm, while
computationally efficient, achieves only O(T^(2/3)) expected regret. This is suboptimal, as the
Exp4 algorithm achieves O(T^(1/2)) regret for the 0–1 loss. Beygelzimer et al. proposed an
efficient online bandit multi-class learning algorithm with O(T^(1/2)) regret. The evaluation of
contextual multi-armed bandits is a tough problem: online evaluation is too costly for comparing
different policies, whereas off-policy evaluation methods suffer from variance in their
estimates. While methods like Inverse Propensity Score (IPS) give good
estimates in terms of MSE, they don’t consider the context information while choosing actions.
Wang et al. proposed the SWITCH algorithm, which effectively combines a reward
model with IPS, resulting in reduced variance compared to prior work.
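The IPS estimator, and a switch-style combination with a reward model, can be sketched as below. This is an illustrative sketch rather than the authors' code. It assumes the logs are (context, action, reward) triples, that `target_prob` and `logging_prob` return the probability of an action under the evaluated and the logging policy, and that the logging probability is never zero. The names `reward_model` and `threshold` are hypothetical.

```python
def importance_weight(target_prob, logging_prob, context, action):
    """Ratio of target-policy to logging-policy probability for one action."""
    return target_prob(context, action) / logging_prob(context, action)


def ips_estimate(logs, target_prob, logging_prob):
    """Inverse Propensity Score: reweight each logged reward by its importance weight."""
    total = 0.0
    for context, action, reward in logs:
        total += importance_weight(target_prob, logging_prob, context, action) * reward
    return total / len(logs)


def switch_estimate(logs, target_prob, logging_prob, reward_model, actions, threshold):
    """Use IPS where the weight is small, and the reward model where it is large."""
    total = 0.0
    for context, action, reward in logs:
        weight = importance_weight(target_prob, logging_prob, context, action)
        if weight <= threshold:
            total += weight * reward  # low-variance region: trust the logged reward
        for a in actions:
            if importance_weight(target_prob, logging_prob, context, a) > threshold:
                # High-weight actions would inflate the variance of IPS,
                # so their contribution comes from the model's predicted reward.
                total += target_prob(context, a) * reward_model(context, a)
    return total / len(logs)
```

With a very large threshold the switch-style estimate reduces to plain IPS, and with a threshold of zero it relies on the reward model for every action that the target policy can choose.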
Many existing methods for generating knowledge graphs from data consider the graph to be
a static snapshot. In their work, Trivedi et al. demonstrated that
knowledge graphs evolve over time and developed a multidimensional point
process to model the evolving knowledge graph. Identifying the transition probabilities from
only the node visit counts can help in understanding the navigation behaviour of
users. Maystre et al. proposed ChoiceRank, an iterative algorithm that can learn edge
transition probabilities from observing only node-level traffic.