Data Mining Assignment 1

This document outlines the instructions for Assignment 1 of the CS 536/CS 432 Data Mining course. It includes 3 parts: 1) data understanding, visualization, and preprocessing on the Dishes dataset using RapidMiner, 2) data preprocessing tasks on the Communities and Crime dataset from UCI, and 3) dependency analysis on the cryotherapy dataset including generating a correlation matrix and chi square statistic. Students are instructed to complete the tasks individually, submit a report and code, and discuss all results in detail.

Uploaded by

Zain Aamir

CS 536/CS 432 – Data Mining

Assignment 1
Due: February 13 (Tuesday) at 12 midnight

Instructions: (1) You may discuss the assignment with others. However, you MUST
do and submit your OWN work. (2) Submit a soft-copy report to the submission
folder on LMS. Include the report and the code needed to reproduce your results.

1. Data Understanding, Visualization and Preprocessing (55 points)

Explore and understand the Dishes dataset provided on LMS. Report the
results/outcomes of every part in your report. All tasks must be performed in
RapidMiner.

You need to perform the following tasks:

a. Fill in the missing values in the highest and lowest price columns using the
appropriate filter. Report statistics of both columns (max, mean, average, std
dev, variance).
b. If a value is not valid in the first and last appeared columns, replace it with
the year above or below it. Report the number of affected rows.
c. Sort w.r.t. dish name and do the following on this column:
• Find unique entries and report the number
• Remove records with false entries (e.g. a name can't be a number)
• Remove null entries
• Remove punctuation
• Remove leading and trailing spaces
• Change to lower case
• Remove duplicate entries
• What changes will appear in other columns when you remove duplicate
entries? Perform and report them.
• Find at least 10 different near-duplicate entries and record them as one
entry, e.g. egg muffin and egg muffins are near duplicates. Similarly,
zuppa del Giorno and zuppa del girono are the same thing with a spelling
mistake.
• Find unique entries again and report the number
d. Report the dataset size and the number of affected rows after every operation
in (a), (b), (c) and discuss why it changed. Comment on whether further redundant
information remains and on ways to remove it.
e. Report the 10 most popular dishes.
f. Report the 10 most expensive and the 10 cheapest dishes. Plot a bar graph of
the number of times they appear on the menu.
g. Report at least 5 dishes that appeared for the longest or shortest period of time
on the menu.
h. What is the best representation strategy for the different attributes, providing
maximum information for this dataset? Justify your choice in the report.
i. Comment on the results. Present and discuss any other observation/result that
you find interesting.
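
The cleaning steps in task (c) must be carried out in RapidMiner, but their logic can be sketched in a few lines of Python as a cross-check. The sketch below is only an illustration under stated assumptions: the function names, the sample dish names, and the 0.85 similarity threshold are all hypothetical, and the similarity measure (the standard library's `difflib.SequenceMatcher`) is one possible way to flag near duplicates, not a method prescribed by the assignment.

```python
import string
from difflib import SequenceMatcher


def normalize_name(name):
    """Strip punctuation and extra spaces, lower-case; return None for invalid names."""
    if name is None:
        return None
    cleaned = name.translate(str.maketrans("", "", string.punctuation))
    cleaned = " ".join(cleaned.lower().split())
    # Treat empty strings and purely numeric entries as false entries.
    if not cleaned or cleaned.replace(" ", "").isdigit():
        return None
    return cleaned


def near_duplicates(names, threshold=0.85):
    """Return pairs of distinct names whose similarity ratio meets the threshold.

    The threshold is a hypothetical starting point and needs tuning on real data.
    """
    unique = sorted(set(names))
    pairs = []
    for i, first in enumerate(unique):
        for second in unique[i + 1:]:
            if SequenceMatcher(None, first, second).ratio() >= threshold:
                pairs.append((first, second))
    return pairs


# Made-up sample values, not taken from the Dishes dataset.
raw = [" Egg Muffin ", "egg muffins", "Zuppa del Giorno!",
       "zuppa del girono", "1234", None, "egg muffin"]
cleaned = [n for n in (normalize_name(r) for r in raw) if n is not None]
unique_names = sorted(set(cleaned))
print(len(unique_names), unique_names)
print(near_duplicates(unique_names))
```

Pairs flagged this way are only candidates. Each one still needs a manual check before it is merged into a single entry, since a high similarity ratio does not guarantee that two names refer to the same dish.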

CS 536 (Sp 17-18) – Dr. Mian Muhammad Awais Page 1 of 2

2. Data Preprocessing (15 points)
Use RapidMiner for this part. Download the Communities and Crime dataset from the
UCI repository. Study the dataset and perform the following tasks:
a. Report the number of missing values in the dataset (per attribute and per
object). Also report the 5 attributes with the most missing values.
b. Fill in the missing values in the data using an appropriate filter.
c. Standardize the dataset to zero mean and unit variance (z-score normalization).
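
These tasks are to be done in RapidMiner. The following Python sketch only illustrates what each step computes, so that the RapidMiner output can be sanity-checked. It rests on assumptions: the missing-value marker `?` should be verified against the dataset documentation, mean imputation is just one candidate filter, the helper names are hypothetical, and the toy rows are made up.

```python
from statistics import mean, pstdev

MISSING = "?"  # assumed missing-value marker; verify against the dataset documentation


def count_missing(rows):
    """Count missing entries per attribute (column) and per object (row)."""
    n_cols = len(rows[0])
    per_attribute = [sum(1 for row in rows if row[c] == MISSING) for c in range(n_cols)]
    per_object = [sum(1 for value in row if value == MISSING) for row in rows]
    return per_attribute, per_object


def impute_mean(column):
    """Replace missing entries in one column with the mean of the observed values."""
    observed = [float(v) for v in column if v != MISSING]
    fill = mean(observed)
    return [fill if v == MISSING else float(v) for v in column]


def z_score(column):
    """Standardize a numeric column: subtract the mean, divide by the std deviation."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in column]


# Made-up toy rows, not taken from the Communities and Crime dataset.
rows = [["1.0", "?", "3.0"],
        ["2.0", "4.0", "?"],
        ["3.0", "6.0", "5.0"],
        ["?", "8.0", "7.0"]]
per_attribute, per_object = count_missing(rows)
columns = [impute_mean([row[c] for row in rows]) for c in range(len(rows[0]))]
standardized = [z_score(col) for col in columns]
print(per_attribute, per_object)
```

The sketch uses the population standard deviation. A tool that divides by the sample standard deviation instead will produce slightly different values, so small discrepancies on that account are expected.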

3. Dependency Analysis (30 points)

Perform dependency analysis on the cryotherapy dataset (download the dataset from
here: goo.gl/rzspPa). The result of treatment is your class label, identifying whether
the treatment was successful or not (this is a binary classification task).

You are required to do the following tasks:

a. Generate the correlation matrix for the attributes in this dataset. In particular,
observe the correlation between each attribute and the class label, as well as any
significant correlations between attributes, and report your observations.
b. Compute the chi-square statistic between the attributes ‘age’ and ‘number of warts’
in this dataset.
c. Comment on the results. (This whole part deals with analysis. Make sure to present
a comprehensive observation after performing these operations.)

You are required to do this task in Python. There is no restriction on using library
functions.
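
As a rough guide to what tasks (a) and (b) compute, here is a minimal standard-library sketch. Everything in it is an assumption made for illustration: the column names, the toy values (which are made up, not taken from the cryotherapy dataset), and in particular the cut-offs used to bin ‘age’ and ‘number of warts’ into categories before computing the chi-square statistic, which works on category counts.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)  # undefined (division by zero) for constant columns


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson correlations, keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def chi_square(x, y):
    """Pearson chi-square statistic and degrees of freedom for two categorical sequences."""
    n = len(x)
    observed = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Made-up toy values; column names are hypothetical.
data = {
    "age": [15, 20, 25, 30, 35, 40, 45, 50],
    "warts": [2, 3, 5, 4, 8, 7, 10, 9],
    "result": [1, 1, 1, 1, 0, 0, 0, 0],
}
matrix = correlation_matrix(data)

# Hypothetical cut-offs used to turn the numeric attributes into categories.
age_group = ["young" if a < 30 else "old" for a in data["age"]]
warts_group = ["few" if w <= 5 else "many" for w in data["warts"]]
stat, dof = chi_square(age_group, warts_group)
print(matrix["age"]["result"], stat, dof)
```

Since library functions are allowed, `pandas.DataFrame.corr` and `scipy.stats.chi2_contingency` are the usual shortcuts. Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic can differ from the uncorrected value computed above.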

Note: You should discuss the results/outcomes of each part in detail in your report
and provide all RapidMiner files in your submission. Zip the folder and name it as
rollnumber_Name_SubjectCode, e.g. 16030000_JohnSnow_CS536. Marks will be
deducted if the submission instructions aren’t properly followed. In case of
plagiarism (in any part), the whole assignment will be graded zero.

