dwm exp4 a49
Outcome:
After successful completion of this experiment students will be able to
1. Demonstrate an understanding of the importance of data mining
2. Organize and prepare the data needed for data mining using preprocessing techniques
3. Perform exploratory analysis of the data to be used for mining.
4. Implement the appropriate data mining methods like classification
Theory:
Naïve Bayes:
Naïve Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the
“naïve” assumption of conditional independence between every pair of features given the value of the class
variable. Bayes' theorem states the following relationship, given class variable y and dependent feature vector x_1 through x_n:

P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}

Under the naive conditional independence assumption that P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y), this simplifies to

P(y \mid x_1, \dots, x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}

Since P(x_1, \dots, x_n) is constant given the input, we can use the following classification rule:

P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
\quad\Rightarrow\quad
\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i \mid y),
and we can use Maximum A Posteriori (MAP) estimation to estimate P(y) and P(xi∣y); the former is then the
relative frequency of class y in the training set.
The different naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of
P(xi∣y).
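As a small illustration of the classification rule above (a plain-Java sketch, not Weka code; every prior and likelihood value below is a hypothetical number chosen only for the example), the following picks the class that maximizes P(y) ∏ P(x_i | y), working in log space to avoid numerical underflow.

```java
// Minimal sketch of the naive Bayes decision rule:
// y_hat = argmax_y  P(y) * prod_i P(x_i | y),
// computed with log-probabilities to avoid underflow.
// All probability values here are hypothetical, for illustration only.
public class NaiveBayesRuleDemo {
    public static void main(String[] args) {
        String[] classes = {"spam", "not spam"};
        double[] prior = {0.4, 0.6};            // assumed P(y) values
        // Assumed P(x_i | y) for two features of one test instance, per class
        double[][] likelihood = {
            {0.8, 0.3},                         // P(x_1|spam),     P(x_2|spam)
            {0.1, 0.7}                          // P(x_1|not spam), P(x_2|not spam)
        };

        int best = -1;
        double bestLogPost = Double.NEGATIVE_INFINITY;
        for (int y = 0; y < classes.length; y++) {
            double logPost = Math.log(prior[y]);        // log P(y)
            for (double p : likelihood[y]) {
                logPost += Math.log(p);                 // + sum_i log P(x_i | y)
            }
            System.out.printf("score for %s = %.3f%n", classes[y], logPost);
            if (logPost > bestLogPost) {
                bestLogPost = logPost;
                best = y;
            }
        }
        System.out.println("Predicted class: " + classes[best]);
    }
}
```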
In spite of their apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in
many real-world situations, famously document classification and spam filtering. They require a small amount
of training data to estimate the necessary parameters.
Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods. The
decoupling of the class conditional feature distributions means that each distribution can be independently
estimated as a one-dimensional distribution. This in turn helps to alleviate problems stemming from the curse
of dimensionality.
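To make this decoupling concrete, the toy sketch below (plain Java, not Weka; the training values are hypothetical) estimates each class-conditional feature distribution independently as a one-dimensional mean and variance.

```java
// Toy sketch: because features are treated as conditionally independent given
// the class, each P(x_i | y) can be estimated on its own as a one-dimensional
// distribution, here a mean and variance per (feature, class) pair.
// The training values below are made-up numbers for illustration.
public class PerFeatureEstimation {
    public static void main(String[] args) {
        // Training values of a single feature x_1, grouped by class
        double[][] x1ByClass = {
            {5.1, 4.9, 5.0, 5.2},   // class 0
            {6.3, 6.7, 6.1, 6.5}    // class 1
        };
        for (int y = 0; y < x1ByClass.length; y++) {
            double mean = 0.0;
            for (double v : x1ByClass[y]) mean += v;
            mean /= x1ByClass[y].length;

            double var = 0.0;
            for (double v : x1ByClass[y]) var += (v - mean) * (v - mean);
            var /= x1ByClass[y].length;

            System.out.printf("class %d: mean=%.3f variance=%.3f%n", y, mean, var);
        }
    }
}
```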
Gaussian Naive Bayes:
When working with continuous data, an assumption often made is that the continuous values associated with
each class are distributed according to a normal (or Gaussian) distribution. The likelihood of the features is
then assumed to be

P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right),

where μ_y and σ_y are estimated from the training data. Depending on the modelling assumptions, the per-feature, per-class variance σ_ik may be taken to be:
● independent of Y (i.e., σ_i),
● independent of X_i (i.e., σ_k),
● or both (i.e., σ).
Gaussian Naive Bayes supports continuous-valued features and models each as conforming to a Gaussian
(normal) distribution.
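A minimal sketch of running Weka's NaiveBayes (which models numeric attributes with a Gaussian estimator by default) from the Java API is given below. It assumes weka.jar is on the classpath and that a dataset file named iris.arff, with the class as the last attribute, is available; both the file name and the setup are assumptions for illustration.

```java
// Sketch: train and cross-validate Weka's NaiveBayes on a numeric dataset.
// Assumes weka.jar on the classpath and a hypothetical "iris.arff" input file.
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class GaussianNBWeka {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff");         // load ARFF file
        data.setClassIndex(data.numAttributes() - 1);           // last attribute = class

        NaiveBayes nb = new NaiveBayes();                        // Gaussian estimate for numeric attributes
        nb.buildClassifier(data);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(nb, data, 10, new Random(1));    // 10-fold cross-validation
        System.out.println(eval.toSummaryString());
    }
}
```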
PART B (TO BE COMPLETED BY STUDENTS)
Conclusion:
In this experiment, we implemented a Naïve Bayesian Classifier using Weka, successfully training it to
classify data with good accuracy. The results demonstrated the model's efficiency and scalability for
classification tasks, with insights into areas for potential improvement.
Question of Curiosity:
Q.1] What type of datasets are suitable for the Naive Bayesian Classifier in Weka and why?
Ans: The Naive Bayesian Classifier assumes that all features are conditionally independent given the class
label (the so-called "naive" assumption). Datasets whose attributes are at least approximately independent
given the class are therefore the best fit. In practice this includes datasets with nominal (categorical)
attributes or discretized numeric attributes, high-dimensional data such as text (document classification,
spam filtering), and datasets with relatively few training instances, since only one-dimensional per-class
distributions need to be estimated. Despite the independence assumption, Naive Bayes often yields good
results, especially when the assumption is approximately true or when the model's simplicity outweighs the
impact of any correlations between features.
Q.2] How do you preprocess a dataset in Weka before applying the Naive Bayesian Classifier?
Ans: Before applying the Naive Bayesian Classifier, the following preprocessing steps are recommended in
Weka:
1. Data Cleaning: Remove or impute missing values using Weka's "Filter" option under the "Preprocess"
tab.
2. Discretization: If the dataset has continuous numerical attributes, consider discretizing them into
categorical intervals using filters like unsupervised.attribute.Discretize.
3. Attribute Selection: Use Weka's attribute selection filters to remove irrelevant or redundant features,
which can improve the model’s performance.
4. Normalization: Though Naive Bayes generally handles raw data well, normalization can be applied to
ensure that attributes are on a similar scale, especially when dealing with continuous data.
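The same preprocessing can also be scripted through the Weka Java API rather than the Preprocess tab. The sketch below assumes weka.jar on the classpath and a hypothetical dataset.arff whose nominal class is the last attribute; it applies missing-value replacement and unsupervised discretization before the data is handed to the classifier.

```java
// Sketch: preprocessing with Weka filters before Naive Bayes.
// Assumes weka.jar on the classpath and a hypothetical "dataset.arff"
// with a nominal class as the last attribute.
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

public class PreprocessForNB {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("dataset.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // 1. Data cleaning: replace missing values with means/modes
        ReplaceMissingValues clean = new ReplaceMissingValues();
        clean.setInputFormat(data);
        data = Filter.useFilter(data, clean);

        // 2. Discretization: bin numeric attributes into categorical intervals
        Discretize disc = new Discretize();
        disc.setInputFormat(data);
        data = Filter.useFilter(data, disc);

        System.out.println("Preprocessed instances: " + data.numInstances());
    }
}
```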
Q.3] How can you interpret the confusion matrix generated by the Naive Bayesian Classifier in Weka?
Ans: The confusion matrix generated by the Naive Bayesian Classifier in Weka is a key tool for evaluating
the performance of your model. The following are the ways to interpret the confusion matrix:
1. True Positives (TP): This value represents the number of instances that were correctly predicted as
belonging to the positive class. For example, if you are classifying emails as "spam" or "not spam," TP
would be the number of emails correctly identified as "spam."
2. True Negatives (TN): This value indicates the number of instances that were correctly predicted as
belonging to the negative class. Continuing with the spam example, TN would be the number of emails
correctly identified as "not spam."
3. False Positives (FP): These are the instances where the classifier incorrectly predicted the positive class
when it should have predicted the negative class. In the spam example, FP represents the number of
"not spam" emails that were incorrectly classified as "spam." This is also known as a Type I error.
4. False Negatives (FN): These are the instances where the classifier incorrectly predicted the negative
class when it should have predicted the positive class. In our example, FN would be the number of
"spam" emails that were incorrectly classified as "not spam." This is also known as a Type II error.
5. Interpretation:
• A high number of True Positives (TP) and True Negatives (TN) indicates that the classifier is
performing well.
• A high number of False Positives (FP) or False Negatives (FN) suggests areas where the model
is misclassifying data, potentially requiring further tuning or alternative modeling approaches.
• By analyzing the balance between Precision and Recall (via the F1-score), you can assess
whether the model is biased towards one class, which is particularly important in imbalanced
datasets.
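The confusion matrix, along with the derived precision, recall, and F1-score, can also be obtained programmatically from Weka's Evaluation class. The sketch below assumes weka.jar on the classpath, a hypothetical dataset.arff with the class as the last attribute, and that class index 0 is treated as the "positive" class.

```java
// Sketch: print the confusion matrix and per-class metrics via the Weka API.
// Assumes weka.jar on the classpath and a hypothetical "dataset.arff" file;
// class index 0 is taken as the "positive" class for the metrics below.
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ConfusionMatrixDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("dataset.arff");
        data.setClassIndex(data.numAttributes() - 1);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));

        System.out.println(eval.toMatrixString("=== Confusion Matrix ==="));
        int positive = 0;                                   // index of the "positive" class
        System.out.printf("Precision: %.3f%n", eval.precision(positive));
        System.out.printf("Recall:    %.3f%n", eval.recall(positive));
        System.out.printf("F1-score:  %.3f%n", eval.fMeasure(positive));
    }
}
```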