AI Important Questions

Uploaded by

Suman Chavan
Copyright
© All Rights Reserved

ARTIFICIAL INTELLIGENCE

IMPORTANT QUESTIONS AND ANSWERS

1. Explain with an example AO* Search.


AO* is a best-first search algorithm. It breaks a difficult problem into a set of smaller
subproblems and solves them using an AND-OR graph. AND-OR graphs are specialized graphs
for problems that can be decomposed in this way. An AND arc groups subproblems that must
all be solved to achieve the parent goal, while OR branches represent alternative ways of
achieving the same goal.

AND-OR Graph

The figure above is an example of a simple AND-OR graph, in which the main goal of getting
a car is broken down into smaller tasks. The goal can be achieved either by stealing a car
or by using your own money to purchase one. The AND symbol marks the AND part of the
graph: all subproblems joined by an AND arc must be solved before the parent node can be
considered solved.
AO* is an informed (knowledge-based) search strategy: the start state and the goal state
are known in advance, and heuristics guide the search toward the best path. Using heuristic
information considerably reduces the search effort compared with uninformed search. For
AND-OR trees, AO* is far more effective than A*, which does not handle AND arcs.

Working of AO* algorithm:


The evaluation function in AO* looks like this:
f(n) = g(n) + h(n)
f(n) = Actual cost + Estimated cost
where
f(n) = the estimated total cost of a solution through node n,
g(n) = the actual cost from the initial node to the current node,
h(n) = the estimated cost from the current node to the goal state.

Difference between the A* Algorithm and AO* algorithm


• Both A* and AO* are best-first search algorithms.
• Both are informed searches that rely on given heuristic values.
• A* returns an optimal solution when its heuristic is admissible, whereas AO* does
not guarantee an optimal solution.
• Once AO* finds a solution it stops and does not explore the remaining paths, whereas
A* keeps comparing alternative paths until the cheapest one reaches the goal.
• Compared with A*, AO* uses less memory.
• Unlike A*, AO* cannot get stuck in an endless loop.
Example:

AO* Algorithm – Question tree

In the example above, the number written below each node is its heuristic value, i.e. h(n).
Every edge has a cost of 1.
Step 1

AO* Algorithm (Step-1)

Using the evaluation function f(n) = g(n) + h(n), start from node A:

f(A⇢B) = g(B) + h(B)
= 1 + 5 ……here g(n) = 1 is the default edge cost
= 6

f(A⇢C+D) = g(C) + h(C) + g(D) + h(D)
= 1 + 2 + 1 + 4 ……C and D are added because they are joined by an AND arc
= 8
So the path A⇢B is chosen because it has the minimum cost, f(A⇢B) = 6.
Step 2
AO* Algorithm (Step-2)

According to the answer of step 1, explore node B


Here the values for E and F are calculated as follows:

f(B⇢E) = g(E) + h(E)
= 1 + 7
= 8

f(B⇢F) = g(F) + h(F)
= 1 + 9
= 10
So the path B⇢E is chosen because it has the minimum cost, f(B⇢E) = 8. Because this
value differs from B's current heuristic value of 5, B's heuristic is updated to the
minimum cost, 8. The change in B's heuristic means the cost of A⇢B must be recalculated
as well.

f(A⇢B) = g(B) + updated h(B)
= 1 + 8
= 9
All values in the above tree have now been updated.

Step 3

AO* Algorithm (Step-3)

Comparing f(A⇢B) and f(A⇢C+D), f(A⇢C+D) is now smaller, since 8 < 9.
So explore the path A⇢C+D, starting with node C:

f(C⇢G) = g(G) + h(G)
= 1 + 3
= 4

f(C⇢H+I) = g(H) + h(H) + g(I) + h(I)
= 1 + 0 + 1 + 0 ……H and I are added because they are joined by an AND arc
= 2

f(C⇢H+I) is selected as the lowest-cost path, and C's heuristic is left unchanged because
it already matches this cost. H and I are solved because their heuristic values are 0.
Node D still has to be evaluated, because C and D are joined by an AND arc under A.

f(D⇢J) = g(J) + h(J)
= 1 + 0
= 1
So the heuristic of node D is updated from 4 to 1.

f(A⇢C+D) = g(C) + h(C) + g(D) + h(D)
= 1 + 2 + 1 + 1
= 5

The path A⇢C+D is now solved, so the tree has become a solved tree. In simple words, the
algorithm first evaluates the heuristic values at level 1, then at level 2, and then
propagates the updated values upward toward the root node.
In the above tree diagram, we have updated all the values.
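
The cost revisions in the walkthrough can be checked with a short Python sketch. It is not a full AO* implementation: it evaluates the whole example tree bottom-up instead of expanding nodes lazily, and the names H, GRAPH and revised_cost are illustrative choices, not part of the original notes.

```python
# Heuristic values h(n) read from the example tree; every edge costs 1.
H = {"B": 5, "C": 2, "D": 4, "E": 7, "F": 9, "G": 3, "H": 0, "I": 0, "J": 0}
EDGE_COST = 1

# Each node maps to its alternatives (OR). An alternative that lists
# several children is an AND arc: all of them must be solved together.
GRAPH = {
    "A": [["B"], ["C", "D"]],
    "B": [["E"], ["F"]],
    "C": [["G"], ["H", "I"]],
    "D": [["J"]],
}


def revised_cost(node):
    """Return (cost, best_alternative) once the node's subtree is evaluated."""
    if node not in GRAPH:  # leaf node: keep its heuristic value
        return H[node], None
    best_cost, best_option = None, None
    for option in GRAPH[node]:
        # AND arc: add (edge cost + revised cost) over every child in the arc.
        cost = sum(EDGE_COST + revised_cost(child)[0] for child in option)
        if best_cost is None or cost < best_cost:
            best_cost, best_option = cost, option
    return best_cost, best_option


for name in ["B", "C", "D", "A"]:
    print(name, revised_cost(name))
# B (8, ['E'])
# C (2, ['H', 'I'])
# D (1, ['J'])
# A (5, ['C', 'D'])
```

The result for A reproduces the final values of the example: f(A⇢B) = 1 + 8 = 9 and f(A⇢C+D) = 1 + 2 + 1 + 1 = 5.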

2. Explain various levels in knowledge base agents.


Humans achieve intelligence not by purely reflex mechanisms but by processes of reasoning
that operate on internal representations of knowledge. In AI, this approach to intelligence
is embodied in knowledge-based agents.
Knowledge-Based System
• A knowledge-based system is a system that uses artificial intelligence techniques
to store and reason with knowledge. The knowledge is typically represented in the
form of rules or facts, which can be used to draw conclusions or make decisions.
• One of the key benefits of a knowledge-based system is that it can help to
automate decision-making processes. For example, a knowledge-based system
could be used to diagnose a medical condition, by reasoning over a set of rules
that describe the symptoms and possible causes of the condition.
• Another benefit of knowledge-based systems is that they can be used to explain
their decisions to humans. This can be useful, for example, in a customer service
setting, where a knowledge-based system can help a human agent understand why
a particular decision was made.
• Knowledge-based systems are a type of artificial intelligence and have been used
in a variety of applications including medical diagnosis, expert systems, and
decision support systems.
Knowledge-Based System in Artificial Intelligence
• An intelligent agent needs knowledge about the real world, and the ability to
reason over it, to make decisions and act efficiently.
• Knowledge-based agents are agents that can maintain an internal state of
knowledge, reason over that knowledge, update it after observations, and take
action. These agents represent the world in some formal representation and act
intelligently.
Why use a knowledge base?
• A knowledge base with an inference mechanism lets an agent update its knowledge,
learn from experience, and act according to what it knows.
• Inference means deriving new sentences from old ones. A sentence is a
proposition about the world. The inference system applies logical rules to the KB
to deduce new information, which can then be added to the knowledge base as new
sentences.
• The inference system generates new facts so that the agent can update the KB. An
inference system works mainly in two modes:
• Forward chaining
• Backward chaining
Various levels of knowledge-based agents
A knowledge-based agent can be viewed at different levels which are given below:
1. Knowledge level
The knowledge level is the first level of a knowledge-based agent. At this level we
specify what the agent knows and what its goals are; these specifications determine its
behaviour. For example, suppose an automated taxi agent needs to go from station A to
station B and knows the way from A to B; this belongs to the knowledge level.
2. Logical level
At this level, we specify how the knowledge is represented and stored: sentences are
encoded in a particular logic. At the logical level, we can expect the automated taxi
agent to derive from its logical sentences that it can reach destination B.
3. Implementation level
This is the physical representation of logic and knowledge. At the implementation level,
the agent performs actions according to the logical and knowledge levels. Here the
automated taxi agent actually applies its knowledge and logic so that it can reach the
destination.
Knowledge-based agents have an explicit representation of knowledge that can be reasoned
over. They maintain an internal state of knowledge, reason over it, update it, and perform
actions accordingly. These agents act intelligently according to requirements.
Knowledge-based agents describe the current situation in the form of sentences. They hold
what they know about the current state of their mini-world and its surroundings, and they
manipulate this knowledge to infer new things at the “knowledge level”.
A knowledge-based system has the following components:
Knowledge base (KB): It is the key component of a knowledge-based agent. It deals with
real facts about the world. It is a collection of sentences expressed in a knowledge
representation language.
Inference Engine (IE): It is the engine of the knowledge-based system, used to infer new
knowledge in the system.
Actions performed by an agent
The inference system is used when we want to add information (sentences) to the
knowledge-based system or query the information already present. This is done with the
TELL and ASK operations. Both may involve inference, i.e. producing new sentences from
old. Inference must obey the requirement that when one ASKs a question of the KB, the
answer should follow from what has been TOLD to the KB. The agent has a KB, which
initially contains some background knowledge. Whenever the agent program is called, it
performs the following actions.
Actions done by KB Agent:
1. It TELLS the knowledge base what it perceives from the environment.
2. It ASKS the knowledge base what action it should perform and gets an answer.
3. It TELLS the knowledge base which action was selected, and the agent then executes
that action.

Algorithm:
function KB_AGENT(percept) returns an action
    KB : knowledge base
    t : time (counter, initially 0)
    TELL(KB, MAKE_PERCEPT_SENTENCE(percept, t))
    action = ASK(KB, MAKE_ACTION_QUERY(t))
    TELL(KB, MAKE_ACTION_SENTENCE(action, t))
    t = t + 1
    return action
Given a percept, the agent adds it to the KB, asks the KB for the best action, and then
tells the KB that it has in fact taken that action.
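
The TELL/ASK loop above can be sketched in Python as follows. The KnowledgeBase class, its tuple-shaped sentences, and the single hard-coded "retreat on breeze" rule are hypothetical stand-ins for a real inference engine; only the order of the TELL, ASK, TELL calls comes from the algorithm.

```python
class KnowledgeBase:
    """Toy knowledge base: sentences are stored as plain tuples."""

    def __init__(self):
        self.sentences = []

    def tell(self, sentence):
        # TELL: add a new sentence to the knowledge base.
        self.sentences.append(sentence)

    def ask(self, query):
        # ASK: answer a query from what has been told so far.
        # Hypothetical rule: retreat if a breeze was perceived at time t.
        _, t = query
        if ("percept", "breeze", t) in self.sentences:
            return "retreat"
        return "forward"


class KBAgent:
    """Follows the KB_AGENT steps: TELL percept, ASK action, TELL action."""

    def __init__(self):
        self.kb = KnowledgeBase()
        self.t = 0  # time counter, initially 0

    def __call__(self, percept):
        self.kb.tell(("percept", percept, self.t))    # MAKE_PERCEPT_SENTENCE
        action = self.kb.ask(("action?", self.t))     # MAKE_ACTION_QUERY
        self.kb.tell(("action", action, self.t))      # MAKE_ACTION_SENTENCE
        self.t += 1
        return action


agent = KBAgent()
print(agent("nothing"))  # forward
print(agent("breeze"))   # retreat
```

Each call stores two sentences in the KB, one for the percept and one for the chosen action, so the KB keeps a record of the agent's history.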

Knowledge Based Agents

The behaviour of a knowledge-based system can be designed using the following approaches:

Declarative Approach: Starting from an empty knowledge base, the agent designer can
TELL sentences one after another until the agent knows how to operate in its
environment. This is known as the declarative approach. The required information is
stored as sentences in the initially empty knowledge base.
Procedural Approach: This approach encodes the required behaviours directly as program
code instead of storing them as sentences in the knowledge base. It contrasts with the
declarative approach: the behaviour of the system is designed by coding it.

3. Explain WUMPUS world problem step by step with example.

The Wumpus World in AI is a classic problem demonstrating various ideas such as search
algorithms, planning, and decision-making. It is a simple environment in which an agent
(a computer program or a robot) must traverse a grid world filled with obstacles, hazards,
and a dangerous wumpus. The wumpus is a fictional creature that kills the player in the
game. The agent must explore the grid to find a safe route to the treasure without falling
into pits or being killed by the wumpus.

Introduction

The Wumpus World in AI is a classic problem based on reasoning with knowledge. The
scenario is a world made up of a grid of chambers, some of which contain pits, obstacles,
or the wumpus. The agent's mission is to locate the gold and escape the world without
being killed by the wumpus or falling into a pit. The wumpus is a fierce creature that
kills the agent if they are in the same room. The agent can perform only a few actions,
such as moving forward, turning, shooting an arrow, and grabbing the gold.

The Wumpus World in AI is an important research problem because it offers a simple yet
challenging setting for testing and developing intelligent agents. The problem involves
uncertainty, partial observability, and multiple objectives, making it a good test for
different AI techniques like search algorithms, reinforcement learning, and planning.
Real-world applications of the Wumpus World problem include designing intelligent agents
for autonomous vehicles, robotics, and game creation.

In the following parts, we will examine the game rules and the various AI methods used to
solve the Wumpus World problem. We will also discuss how the problem is pertinent in real-
world applications and the difficulties in designing intelligent agents to deal with the
Wumpus World.

What is Wumpus World in AI?

The Wumpus World in AI is a basic yet difficult AI environment that demonstrates search
algorithms, planning, and decision-making concepts. It is a simulated world comprising a
grid of rooms where an agent must negotiate obstacles, hazards, and a dangerous creature
known as the wumpus. The agent's main goal is to find a safe way to the treasure and escape
the world without falling into pits or being killed by the wumpus.

Properties of the Wumpus World

• Partially observable: The Wumpus world in AI is partially observable because the
agent can only sense its immediate surroundings, such as an adjacent room.
• Deterministic: It is deterministic because the outcome of every action is fully
determined by the current state and the action taken.
• Sequential: It is sequential because the order of actions matters.
• Static: It is static because the wumpus and the pits do not move.
• Discrete: The environment is discrete because it has a finite number of rooms,
percepts, and actions.
• One agent: The environment is single-agent because there is only one agent, and the
wumpus is not regarded as an agent.

PEAS Description of Wumpus World

To build an intelligent agent for the Wumpus World, we must first define the problem's
Performance, Environment, Actuators, and Sensors (PEAS).

1. Performance:
o +1000 bonus points if the agent returns from the cave with the gold.
o Being eaten by the wumpus or falling into a pit results in a -1000 point
penalty.
o Each move costs -1, and using the arrow costs -10.
o The game is over when the agent dies or exits the cave.
2. Environment:
o A four-by-four grid of chambers.
o The agent begins in square [1, 1], facing right.
o Wumpus and gold locations are selected randomly, excluding the first
square [1,1].
o Except for the first square, each square in the cave has a 0.2 chance of being
a pit.
3. Actuators: They are the actions that the agent can take to interact with the world. The
agent in the Wumpus World can carry out the following actions:
o Left turn
o Right turn
o Move forward
o Grab
o Release
o Shoot
4. Sensors: They are how the agent senses its surroundings. The agent's sensors in
the Wumpus World provide the following information:
o If the agent is in a chamber directly adjacent (not diagonally) to the wumpus, it
perceives a stench.
o If the agent is in a room directly adjacent to a pit, it perceives a breeze.
o The agent perceives glitter in the chamber that contains the gold.
o The agent perceives a bump if it walks into a wall.
o When the wumpus is shot, it lets out a horrifying scream that can be heard
throughout the cave.
o These percepts can be represented as a five-element list with a distinct
indicator for each sensor.
o For example, if an agent detects stench and breeze but not glitter, bump, or
scream, the percept is [Stench, Breeze, None, None, None].
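
As a rough illustration of the five-element percept list, the Python sketch below builds it for a given square. The wumpus, pit and gold positions are assumptions chosen for this example (in the PEAS description they are random), and the percept function is a hypothetical helper.

```python
# Assumed layout for illustration only; squares are written (x, y).
WUMPUS = (1, 3)
PITS = {(3, 1), (3, 3), (4, 4)}
GOLD = (2, 3)


def adjacent(a, b):
    """True when two squares share a side (diagonals do not count)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def percept(square, bumped=False, scream=False):
    """Return [Stench, Breeze, Glitter, Bump, Scream] for a square."""
    return [
        "Stench" if adjacent(square, WUMPUS) else None,
        "Breeze" if any(adjacent(square, pit) for pit in PITS) else None,
        "Glitter" if square == GOLD else None,
        "Bump" if bumped else None,
        "Scream" if scream else None,
    ]


print(percept((1, 1)))  # [None, None, None, None, None]
print(percept((2, 1)))  # [None, 'Breeze', None, None, None]
print(percept((1, 2)))  # ['Stench', None, None, None, None]
```

Bump and scream depend on the agent's last action rather than on its position, so the sketch takes them as flags.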

Wumpus World Cave Problem

The Wumpus world in AI is a cave laid out as a 4×4 grid, so there are 16 chambers in
total, linked to one another by passageways. A knowledge-based agent explores this world.
One chamber of the cave contains a beast named the Wumpus, who eats anyone who enters it.
The agent can shoot the wumpus, but it has only one arrow. Some rooms contain bottomless
pits, and if the agent falls into one of them, it will be stuck there forever. The
attraction of this cave is a heap of gold in one of its rooms. So the agent's objective is
to locate the gold and climb out of the cave without being eaten by the wumpus or falling
into a pit. The agent is rewarded if it returns with the gold, but it is penalized if it
is eaten by the wumpus or falls into a pit.

Some percepts can assist the agent in navigating the cave. They are listed below:

• The rooms adjacent to the Wumpus chamber are stinky, so the agent perceives a stench there.
• The rooms adjacent to a pit have a breeze, so if the agent gets close to a pit, it will
notice the breeze.
• Glitter will be present in the chamber if the room contains gold.
• If the agent is facing the wumpus, it can kill the wumpus with its arrow, and the wumpus
will scream horribly, which can be heard throughout the cave.
Exploring the Wumpus World

We will now explore the Wumpus world in AI and use logical reasoning to determine how
the agent will reach its objective.

Agent's first step: Initially, the agent is in the first room, square [1,1], and we
already know that this room is safe for the agent, so we add the symbol OK to diagram (a)
below to indicate that the room is safe. In the diagrams, the agent is represented by the
symbol A, the breeze by symbol B, the glitter or gold by symbol G, the visited chamber by
symbol V, the pits by symbol P, and the wumpus by symbol W.

The agent does not detect any breeze or stench in room [1,1], implying that the neighboring
squares [1,2] and [2,1] are also safe.

Agent's second step: The agent now has to move forward, to either [1,2] or [2,1]. Assume
the agent moves to room [2,1]. The agent detects a breeze in this chamber, indicating that
a pit is nearby. The pit can be in [3,1] or [2,2], so we mark both with the symbol P? to
indicate a possible pit.

The agent now pauses to reason rather than risk a bad move, and returns to chamber [1,1].
The agent has visited rooms [1,1] and [2,1], so we use V to mark the visited squares.

Agent's third step: Next, the agent proceeds to room [1,2], which is known to be safe.
The agent detects a stench in [1,2], indicating that the wumpus is nearby. The wumpus
cannot be in [1,1], since the agent started there safely, nor in [2,2], since the agent
detected no stench when it was in [2,1]. The agent therefore deduces that the wumpus is in
room [1,3]. There is also no breeze in [1,2], which implies that there is no pit in [2,2],
so the pit detected from [2,1] must be in [3,1]. Room [2,2] therefore contains neither a
pit nor the wumpus; we label it OK, and the agent moves on to [2,2].
Agent's fourth step: Because there is no stench or breeze in room [2,2], the neighboring
rooms are safe; let us assume the agent chooses to move to [2,3]. The agent detects glitter
in room [2,3], so it should grab the gold and climb out of the cave.
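
The deductions made in the third step can be reproduced with a small Python sketch. It is a simplified set-based elimination, not a general logical inference engine; it assumes exactly one wumpus, and the function names and the percepts dictionary are illustrative.

```python
SIZE = 4  # the cave is a 4x4 grid; squares are written (x, y)


def neighbors(square):
    """Squares sharing a side with the given square, inside the grid."""
    x, y = square
    around = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return {(a, b) for a, b in around if 1 <= a <= SIZE and 1 <= b <= SIZE}


def wumpus_candidates(percepts):
    """Squares that could still hold the single wumpus."""
    squares = {(x, y) for x in range(1, SIZE + 1) for y in range(1, SIZE + 1)}
    candidates = squares - set(percepts)  # visited squares are wumpus-free
    for square, seen in percepts.items():
        if "Stench" in seen:
            candidates &= neighbors(square)  # wumpus must be next to a stench
        else:
            candidates -= neighbors(square)  # no stench: no wumpus next door
    return candidates


def pit_free(percepts):
    """Squares known to contain no pit."""
    free = set(percepts)  # the agent survived every visited square
    for square, seen in percepts.items():
        if "Breeze" not in seen:
            free |= neighbors(square)  # no breeze: no pit next door
    return free


# What the agent has perceived after visiting [1,1], [2,1] and [1,2].
percepts = {(1, 1): set(), (2, 1): {"Breeze"}, (1, 2): {"Stench"}}

print(wumpus_candidates(percepts))                # {(1, 3)}
print(neighbors((2, 1)) - pit_free(percepts))     # {(3, 1)}
print((2, 2) in pit_free(percepts))               # True
```

The three printed results match the reasoning above: the wumpus is in [1,3], the pit sensed from [2,1] is in [3,1], and [2,2] is safe.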

Applications of Wumpus World in AI

The Wumpus World in AI is a classic problem with multiple uses, including:

• Developing intelligent agents: The Wumpus World in AI is an excellent platform for
creating intelligent agents capable of navigating complicated environments, reasoning
under uncertainty, and planning actions.
• Testing AI algorithms: Wumpus World is a benchmark problem for testing and
comparing various AI algorithms, such as search, planning, and reinforcement
learning.
• Education and training: Because it is simple to use and offers hands-on experience,
the Wumpus World in AI is a popular tool for teaching AI concepts and algorithms to
students.
• Game Development: Wumpus World can motivate developers to create challenging
and engaging games requiring strategic thinking and problem-solving.
• Robotics: The Wumpus World can be used as a testing and development setting for
robotics algorithms such as pathfinding and mapping.

4. Discuss Forward Chaining with an example.

In forward chaining, the inference engine goes through all the known facts, conditions,
and derivations before deducing the outcome. In other words, when a decision is reached by
reasoning from the available data, the process is called forward chaining. It works from an
initial state and proceeds to the goal (the final decision).
Example:
A
A -> B
B
—————————–
He is running.
If he is running, he sweats.
He is sweating.
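
The same inference can be sketched in Python. This is a minimal propositional forward chainer under the assumption that each rule is a (premises, conclusion) pair; the function name forward_chain and the fact strings are illustrative.

```python
def forward_chain(facts, rules):
    """Apply rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire the rule when every premise is already a known fact.
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


facts = {"he is running"}                            # A
rules = [(("he is running",), "he is sweating")]     # A -> B
print(forward_chain(facts, rules))
```

Starting from the fact A and the rule A -> B, the loop derives B ("he is sweating") and stops once no rule adds anything new.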

Note: Refer Class Notes for Example Problem

5. Differentiate Supervised and Unsupervised Learning.


6. Explain Decision Trees in Machine Learning.

o Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules and each
leaf node represents the outcome.
o In a decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make a decision and have multiple branches,
whereas Leaf nodes are the output of those decisions and do not contain any further
branches.
o The decisions or the test are performed on the basis of features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
o To build a tree, we can use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
o A decision tree simply asks a question, and based on the answer (Yes/No), it further
splits the tree into subtrees.
o Below diagram explains the general structure of a decision tree:
Note: A decision tree can contain categorical data (YES/NO) as well as numeric data.

Why use Decision Trees?

There are various algorithms in Machine learning, so choosing the best algorithm for the given
dataset and problem is the main point to remember while creating a machine learning model.
Below are the two reasons for using the Decision tree:

o Decision Trees usually mimic human thinking ability while making a decision, so it is
easy to understand.
o The logic behind the decision tree can be easily understood because it shows a tree-like
structure.

Decision Tree Terminologies

Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further
after getting a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: A node that is split into sub-nodes is called the parent of those
sub-nodes, and the sub-nodes are called its child nodes. The root node is the topmost parent.

How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given record, the algorithm starts from the
root node of the tree. It compares the value of the root attribute with the corresponding
attribute of the record and, based on the comparison, follows the branch and jumps to the next node.

At the next node, the algorithm again compares the attribute value with those of the sub-nodes
and moves further. It continues this process until it reaches a leaf node of the tree. The complete
process can be better understood using the algorithm below:
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure
(ASM).
o Step-3: Divide S into subsets, one for each possible value of the best attribute.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where the nodes cannot be classified
further; these final nodes are called leaf nodes.
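
The five steps can be sketched in Python as a small recursive tree builder. This is an ID3-style illustration that uses information gain as the attribute selection measure, not the CART algorithm; the six-row job-offer table and all function names are made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Impurity of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step-2: pick the attribute with the highest information gain."""
    def gain(attr):
        remainder = 0.0
        for value in {row[attr] for row in rows}:
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    """Steps 1-5: returns a class label (leaf) or an (attribute, branches) node."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf node
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in {row[attr] for row in rows}:         # Step-3: split into subsets
        keep = [i for i, row in enumerate(rows) if row[attr] == value]
        branches[value] = build_tree([rows[i] for i in keep],
                                     [labels[i] for i in keep], remaining)
    return (attr, branches)                           # Step-4: decision node


def predict(tree, row):
    """Follow branches from the root until a leaf is reached."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


# Hypothetical training data for a job-offer decision.
rows = [
    {"salary": "low", "distance": "near", "cab": "yes"},
    {"salary": "low", "distance": "far", "cab": "no"},
    {"salary": "high", "distance": "far", "cab": "no"},
    {"salary": "high", "distance": "far", "cab": "yes"},
    {"salary": "high", "distance": "near", "cab": "no"},
    {"salary": "high", "distance": "near", "cab": "yes"},
]
labels = ["decline", "decline", "decline", "accept", "accept", "accept"]

tree = build_tree(rows, labels, ["salary", "distance", "cab"])
print(tree[0])  # salary
print(predict(tree, {"salary": "high", "distance": "far", "cab": "yes"}))  # accept
```

On this table the salary attribute has the highest information gain, so it becomes the root. The sketch does no pruning and raises a KeyError for attribute values it has not seen during training.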

Example: Suppose there is a candidate who has a job offer and wants to decide whether he
should accept the offer or Not. So, to solve this problem, the decision tree starts with the root
node (Salary attribute by ASM). The root node splits further into the next decision node
(distance from the office) and one leaf node based on the corresponding labels. The next
decision node further gets split into one decision node (Cab facility) and one leaf node. Finally,
the decision node splits into two leaf nodes (Accepted offers and Declined offer). Consider the
below diagram:

Attribute Selection Measures

While implementing a decision tree, the main issue is how to select the best attribute
for the root node and for the sub-nodes. To solve this problem there is a technique called
the Attribute Selection Measure, or ASM. Using this measure, we can easily select the
best attribute for the nodes of the tree. There are two popular techniques for ASM, which are:

o Information Gain
o Gini Index

1. Information Gain:

o Information gain is the measurement of changes in entropy after the segmentation of a
dataset based on an attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the decision
tree.
o A decision tree algorithm always tries to maximize the value of information gain, and
a node/attribute having the highest information gain is split first. It can be calculated
using the below formula:

Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]

Entropy: Entropy is a metric that measures the impurity of a set of samples with respect
to the class label. It specifies the randomness in the data. Entropy can be calculated as:

Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)

Where,

o S = the total set of samples
o P(yes) = probability of yes
o P(no) = probability of no
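
The two formulas can be written out in Python as a short sketch. The function names and the sample labels are illustrative; the entropy function sums over whatever classes occur, so it reduces to the yes/no formula above when there are two classes.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(S) = -sum over classes of P(class) * log2 P(class)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(labels, feature_values):
    """Entropy(S) minus the weighted average entropy after splitting on a feature."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


labels = ["yes", "yes", "no", "no"]
print(entropy(labels))                                  # 1.0
print(information_gain(labels, ["a", "a", "b", "b"]))   # 1.0
print(information_gain(labels, ["a", "b", "a", "b"]))   # 0.0
```

A feature that separates the classes perfectly has a gain equal to the full entropy of the set, while a feature that leaves every subset as mixed as before has a gain of 0.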

2. Gini Index:

o Gini index is a measure of impurity or purity used while creating a decision tree in the
CART (Classification and Regression Tree) algorithm.
o An attribute with the low Gini index should be preferred as compared to the high Gini
index.
o It only creates binary splits, and the CART algorithm uses the Gini index to create
binary splits.
o Gini index can be calculated using the below formula:

Gini Index = 1 - Σ_j (P_j)^2, where P_j is the proportion of samples belonging to class j.
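
As a quick check of the formula, here is a minimal Python sketch; the function name gini_index and the sample labels are illustrative.

```python
from collections import Counter


def gini_index(labels):
    """Gini index = 1 - sum over classes of (proportion of the class) squared."""
    total = len(labels)
    return 1 - sum((n / total) ** 2 for n in Counter(labels).values())


print(gini_index(["yes", "yes", "yes", "yes"]))  # 0.0
print(gini_index(["yes", "yes", "no", "no"]))    # 0.5
```

A pure node scores 0, and an evenly mixed two-class node scores 0.5, which is why splits with a lower Gini index are preferred.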

Pruning: Getting an Optimal Decision tree

Pruning is a process of deleting the unnecessary nodes from a tree in order to get the optimal
decision tree.

A too-large tree increases the risk of overfitting, and a too-small tree may not capture all
the important features of the dataset. Pruning is therefore a technique that decreases the
size of the learned tree without reducing accuracy. There are mainly two types of tree
pruning techniques:

o Cost Complexity Pruning
o Reduced Error Pruning.
Advantages of the Decision Tree

o It is simple to understand as it follows the same process which a human follow while
making any decision in real-life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o There is less requirement of data cleaning compared to other algorithms.

Disadvantages of the Decision Tree

o A decision tree can contain many layers, which makes it complex.
o It may overfit, which can be mitigated using the Random Forest algorithm.
o With more class labels, the computational complexity of the decision tree may increase.

Python Implementation of Decision Tree

Now we will implement the Decision tree using Python. For this, we will use the dataset
"user_data.csv," which we have used in previous classification models. By using the same
dataset, we can compare the Decision tree classifier with other classification models such
as KNN, SVM, Logistic Regression, etc.

Steps will also remain the same, which are given below:

o Data Pre-processing step


o Fitting a Decision-Tree algorithm to the Training set
o Predicting the test result
o Test accuracy of the result (Creation of Confusion matrix)
o Visualizing the test set result.

7. Explain KNN in Machine Learning. (K-Nearest Neighbours Algorithm)

The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning method
employed to tackle classification and regression problems. Evelyn Fix and Joseph Hodges
developed this algorithm in 1951, which was subsequently expanded by Thomas Cover.
The article explores the fundamentals, workings, and implementation of the KNN algorithm.
KNN is one of the most basic yet essential classification algorithms in machine learning. It
belongs to the supervised learning domain and finds intense application in pattern
recognition, data mining, and intrusion detection.
It is widely applicable in real-life scenarios since it is non-parametric, meaning it does not
make any underlying assumptions about the distribution of data (as opposed to other
algorithms such as GMM, which assume a Gaussian distribution of the given data). We are
given some prior data (also called training data), which classifies coordinates into groups
identified by an attribute.
As an example, consider the following table of data points containing two features:
KNN Algorithm working visualization

Now, given another set of data points (also called testing data), allocate these points to a
group by analysing the training set. Note that the unclassified points are marked as ‘White.’

Intuition Behind KNN Algorithm


If we plot these points on a graph, we may be able to locate some clusters or groups. Now,
given an unclassified point, we can assign it to a group by observing what group its nearest
neighbors belong to. This means a point close to a cluster of points classified as ‘Red’ has a
higher probability of getting classified as ‘Red’.
Intuitively, we can see that the first point (2.5, 7) should be classified as ‘Green’, and the
second point (5.5, 4.5) should be classified as ‘Red’.

Why do we need a KNN algorithm?


The K-NN algorithm is a versatile and widely used machine learning algorithm, valued
primarily for its simplicity and ease of implementation. It does not require any assumptions
about the underlying data distribution. It can also handle both numerical and categorical
data, making it a flexible choice for various types of datasets in classification and
regression tasks. It is a non-parametric method that makes predictions based on the
similarity of data points in a given dataset. With a sufficiently large K, K-NN is also less
sensitive to individual outliers.
The K-NN algorithm works by finding the K nearest neighbours to a given data point based
on a distance metric, such as Euclidean distance. The class or value of the data point is then
determined by the majority vote or average of the K neighbours. This approach allows the
algorithm to adapt to different patterns and make predictions based on the local structure of
the data.

How to choose the value of k for KNN Algorithm?


The value of k is crucial in the KNN algorithm because it defines the number of neighbours
considered. The value of k in the k-nearest neighbours (k-NN) algorithm should be chosen
based on the input data. If the input data has more outliers or noise, a higher value of k
would be better. It is recommended to choose an odd value for k to avoid ties in binary
classification. Cross-validation methods can help in selecting the best k value for the given
dataset.
Workings of KNN algorithm
The K-Nearest Neighbours (KNN) algorithm operates on the principle of similarity, where it
predicts the label or value of a new data point by considering the labels or values of its K
nearest neighbours in the training dataset.
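
The principle can be sketched in plain Python as follows. The training points are hypothetical values invented for this illustration (the original data table is not reproduced here), and knn_predict is an illustrative name; the sketch uses Euclidean distance and a majority vote.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical training data: two features per point, two groups.
train_points = [(1, 6), (2, 8), (3, 7), (2, 6.5), (5, 4), (6, 5), (7, 3), (6, 4)]
train_labels = ["Green", "Green", "Green", "Green", "Red", "Red", "Red", "Red"]

print(knn_predict(train_points, train_labels, (2.5, 7)))    # Green
print(knn_predict(train_points, train_labels, (5.5, 4.5)))  # Red
```

For regression, the majority vote would be replaced by the average of the neighbours' values. Sorting all distances is fine for small datasets, though larger ones usually call for a spatial index.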
