BDA Assignment (Savi Bilandi)
MANAGEMENT
MBA/2018/3731
3. RULES:
In principle the algorithm is quite simple: it builds up attribute-value (item) sets that maximize the number of instances that can be explained, i.e. its coverage of the dataset. The search through item space is very similar to the problem faced in attribute selection and subset search. A sketch of one such greedy covering step follows.
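As a rough illustration only, the R sketch below performs one greedy step of such a covering search: it finds the attribute-value pair that explains the most instances of a target class. The data frame `train` and its `class` column are hypothetical names, not part of the original assignment.

```r
## Hypothetical sketch: one greedy covering step. Scan every
## attribute-value pair and keep the one covering the most
## training instances of the target class.
best_pair <- function(train, target_class) {
  best <- list(attribute = NA, value = NA, coverage = 0)
  for (attr in setdiff(names(train), "class")) {
    for (val in unique(train[[attr]])) {
      covered <- sum(train[[attr]] == val & train$class == target_class)
      if (covered > best$coverage) {
        best <- list(attribute = attr, value = val, coverage = covered)
      }
    }
  }
  best
}
```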
4. ANALYSE RESULTS:
Q.2 Provide a decision tree for the diabetes dataset (or take any inbuilt dataset of your choice) using R or Weka.
Solution:
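As a sketch, a classification tree can be grown on one of R's inbuilt datasets with the rpart package; iris is used below as a stand-in, and the Pima Indians diabetes data (PimaIndiansDiabetes in the mlbench package) could be substituted the same way.

```r
## Grow and display a classification tree on the built-in iris data.
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)               # tree in text form, with split conditions
plot(fit, margin = 0.1)  # draw the tree
text(fit, use.n = TRUE)  # label the splits and leaf counts
```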
Q.3 Trace the results of using the Apriori algorithm on the grocery store example with support threshold s = 33.34% and confidence threshold c = 60%. Show the candidate and frequent itemsets for each database scan. Enumerate all the final frequent itemsets. Also indicate the association rules that are generated, highlight the strong ones, and sort them by confidence.
Transaction ID | Items
T1             | HotDogs, Buns, Ketchup
T2             | HotDogs, Buns
T3             | HotDogs, Coke, Chips
T4             | Chips, Coke
T5             | Chips, Ketchup
T6             | HotDogs, Coke, Chips
Solution:
Support threshold = 33.34% of 6 transactions, so an itemset must appear in at least 2 transactions. Applying Apriori:
k = 1:
  Candidates (support): HotDogs (4), Buns (2), Ketchup (2), Coke (3), Chips (4)
  Frequent: HotDogs, Buns, Ketchup, Coke, Chips
k = 2:
  Candidates (support): {HotDogs, Buns} (2), {HotDogs, Ketchup} (1), {HotDogs, Coke} (2), {HotDogs, Chips} (2), {Buns, Ketchup} (1), {Buns, Coke} (0), {Buns, Chips} (0), {Ketchup, Coke} (0), {Ketchup, Chips} (1), {Coke, Chips} (3)
  Frequent: {HotDogs, Buns}, {HotDogs, Coke}, {HotDogs, Chips}, {Coke, Chips}
k = 3:
  Candidates (support): {HotDogs, Coke, Chips} (2)
  Frequent: {HotDogs, Coke, Chips}
k = 4:
  Candidates: {} (none generated)
The final frequent itemsets are therefore: {HotDogs}, {Buns}, {Ketchup}, {Coke}, {Chips}, {HotDogs, Buns}, {HotDogs, Coke}, {HotDogs, Chips}, {Coke, Chips}, and {HotDogs, Coke, Chips}.
Note that {HotDogs, Buns, Coke} and {HotDogs, Buns, Chips} are not candidates at k = 3 because their subsets {Buns, Coke} and {Buns, Chips} are not frequent. Note also that there is normally no need to go to k = 4, since the longest transaction has only 3 items.
With the confidence threshold set to 60%, the strong association rules, sorted by confidence and listed as (support, confidence), are:
1. Coke → Chips (0.5, 1)
2. Buns → HotDogs (0.33, 1)
3. HotDogs ∧ Coke → Chips (0.33, 1)
4. HotDogs ∧ Chips → Coke (0.33, 1)
5. Chips → Coke (0.5, 0.75)
6. Coke → HotDogs (0.33, 0.66)
7. Coke → Chips ∧ HotDogs (0.33, 0.66)
8. Coke ∧ Chips → HotDogs (0.33, 0.66)
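As a cross-check, the trace can be reproduced mechanically; a sketch using the arules package in R, with the transactions taken from the table above:

```r
## Re-run the grocery example with arules and list the strong rules.
library(arules)

trans <- as(list(
  T1 = c("HotDogs", "Buns", "Ketchup"),
  T2 = c("HotDogs", "Buns"),
  T3 = c("HotDogs", "Coke", "Chips"),
  T4 = c("Chips", "Coke"),
  T5 = c("Chips", "Ketchup"),
  T6 = c("HotDogs", "Coke", "Chips")
), "transactions")

## supp = 2/6 matches the 33.34% threshold, conf = 0.6 the 60% one;
## minlen = 2 suppresses rules with an empty left-hand side.
rules <- apriori(trans, parameter = list(supp = 2/6, conf = 0.6, minlen = 2))
inspect(sort(rules, by = "confidence"))
```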
Q.4 For the following data set, classify a Red Domestic SUV using the Naive Bayes classifier algorithm. Note that there is no example of a Red Domestic SUV in the data set, so determine what class label the classifier assigns to (Red, Domestic, SUV).
No.  Color   Type    Origin    Stolen?
1    Red     Sports  Domestic  Yes
2    Red     Sports  Domestic  No
3    Red     Sports  Domestic  Yes
4    Yellow  Sports  Domestic  No
5    Yellow  Sports  Imported  Yes
6    Yellow  SUV     Imported  No
7    Yellow  SUV     Imported  Yes
8    Yellow  SUV     Domestic  No
9    Red     SUV     Imported  No
10   Red     Sports  Imported  Yes
Solution:
1. The Classifier
The Naive Bayes classifier selects the most likely classification v_NB given the attribute values a_1, a_2, ..., a_n. This results in:

    v_NB = argmax_{v_j ∈ V} P(v_j) ∏_i P(a_i | v_j)        (1)
To classify the example (Red, SUV, Domestic) we compute P(Red | v_j), P(SUV | v_j) and P(Domestic | v_j) for each class and multiply them by P(Yes) and P(No) respectively:

    v_NB = argmax_{v_j ∈ {Yes, No}} P(v_j) P(Red | v_j) P(SUV | v_j) P(Domestic | v_j)        (2)

We can estimate these values using the m-estimate of equation (3):

    P(a_i | v_j) = (n_c + m·p) / (n + m)        (3)

where n is the number of training examples with class v_j, n_c the number of those examples in which the attribute takes value a_i, p a prior estimate of the probability, and m the equivalent sample size.
            Yes                No
Red:        n = 5, n_c = 3     n = 5, n_c = 2
SUV:        n = 5, n_c = 1     n = 5, n_c = 3
Domestic:   n = 5, n_c = 2     n = 5, n_c = 3

(p = 0.5 and m = 3 for every attribute)
Looking at P(Red | Yes), we have 5 cases where v_j = Yes, and in 3 of those cases a_i = Red. So for P(Red | Yes), n = 5 and n_c = 3. Note that all attributes are binary (two possible values). We are assuming no other information, so p = 1/(number of attribute values) = 0.5 for all of our attributes. Our m value is arbitrary (we will use m = 3) but consistent for all attributes. Now we simply apply equation (3) using the precomputed values of n, n_c, p, and m.
    P(Red | Yes)      = (3 + 3 × 0.5) / (5 + 3) = 0.56
    P(Red | No)       = (2 + 3 × 0.5) / (5 + 3) = 0.43
    P(SUV | Yes)      = (1 + 3 × 0.5) / (5 + 3) = 0.31
    P(SUV | No)       = (3 + 3 × 0.5) / (5 + 3) = 0.56
    P(Domestic | Yes) = (2 + 3 × 0.5) / (5 + 3) = 0.43
    P(Domestic | No)  = (3 + 3 × 0.5) / (5 + 3) = 0.56
We have P(Yes) = 0.5 and P(No) = 0.5, so we can apply equation (2). For v = Yes, we have

    P(Yes) × P(Red | Yes) × P(SUV | Yes) × P(Domestic | Yes) = 0.5 × 0.56 × 0.31 × 0.43 ≈ 0.037

and for v = No we have

    P(No) × P(Red | No) × P(SUV | No) × P(Domestic | No) = 0.5 × 0.43 × 0.56 × 0.56 ≈ 0.067

Since 0.067 > 0.037, the Red Domestic SUV is classified as No: the example is not stolen.
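As a cross-check, a minimal R sketch that reproduces the m-estimate arithmetic, assuming the 10-row table above as the training data (the data frame and function names are illustrative, not from the original):

```r
## m-estimate of P(a_i | v_j): (n_c + m*p) / (n + m), per equation (3).
m_est <- function(nc, n, p = 0.5, m = 3) (nc + m * p) / (n + m)

## Training data, transcribed from the table in the question.
cars <- data.frame(
  Color  = c("Red","Red","Red","Yellow","Yellow","Yellow","Yellow","Yellow","Red","Red"),
  Type   = c("Sports","Sports","Sports","Sports","Sports","SUV","SUV","SUV","SUV","Sports"),
  Origin = c("Domestic","Domestic","Domestic","Domestic","Imported",
             "Imported","Imported","Domestic","Imported","Imported"),
  Stolen = c("Yes","No","Yes","No","Yes","No","Yes","No","No","Yes")
)

## Smoothed conditional probability of attribute == value given the class.
cond <- function(attr, value, class)
  m_est(sum(cars[[attr]] == value & cars$Stolen == class),
        sum(cars$Stolen == class))

p_yes <- 0.5 * cond("Color", "Red", "Yes") * cond("Type", "SUV", "Yes") *
         cond("Origin", "Domestic", "Yes")  # ~0.038 on the exact fractions
p_no  <- 0.5 * cond("Color", "Red", "No")  * cond("Type", "SUV", "No") *
         cond("Origin", "Domestic", "No")   # ~0.069 on the exact fractions
ifelse(p_yes > p_no, "Yes", "No")           # "No": the example is not stolen
```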
Solution:
Linear regression is an important tool in analytics. The technique uses statistical calculations to fit a trend line to a set of data points. ... Linear regression shows the relationship between an independent variable and a dependent variable being studied. There are a number of ways to calculate a linear regression.
Analysis: if R-squared is greater than 0.8, as it is in this case, there is a good fit to the data. Some statistics references recommend using the adjusted R-squared value instead.
Interpretation: an R-squared of 0.961 means that 96.1% of the variation in quantity sold can be explained by price and advertising; the adjusted R-squared of 0.942 puts this at 94.2% after adjusting for the number of predictors.
Since the p-value is less than 0.05, we reject the null hypothesis that the variables are unrelated; in other words, there is a statistically significant relationship between them.
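As a sketch of one way to calculate such a regression, the same statistics can be read off an lm() fit in R; since the original price/advertising data set is not shown here, the example below uses the inbuilt mtcars data with two predictors as a stand-in:

```r
## Fit a two-predictor linear regression and extract the fit
## statistics discussed above.
fit <- lm(mpg ~ wt + hp, data = mtcars)

summary(fit)$r.squared            # R-squared: share of variance explained
summary(fit)$adj.r.squared        # adjusted for the number of predictors
coef(summary(fit))[, "Pr(>|t|)"]  # p-values for each coefficient
```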