Assignment No1 - Modified

The essence of the dataset is to predict the probability that an online transaction is fraudulent or not, as denoted by the binary target isFraud. We have used RapidMiner Studio for data analysis and visualization

Uploaded by

Mukhtar Ahmed

Data Mining

Assignment No. 1

Data Visualization

..::: Submitted To :::..

Brig. Dr. Usman Akram

..::: Submitted By :::..

(CMS ID: 281112) (CMS ID: NNNNN)


NUST College of E&ME, Islamabad
Assignment No. 1: Data Visualization

Table of Contents

Introduction
Dataset Specifications
Data Visualization
1. Histogram
2. Scatter Plot
   2.1 Card1 and Card2
   2.2 Card1 and Card6
   2.3 Card4 and Card6
   2.4 Card2 and Card4
3. Parallel Projects
4. Box Plot
   4.1 Card1 (All Training Set – Both Classes)
   4.2 Card2 (All Training Set – Both Classes)
   4.3 Card1 (Class 1 i.e. isFraud=0)
   4.4 Card1 (Class 2 i.e. isFraud=1)
   4.5 Card2 (Class 1 i.e. isFraud=0)
   4.6 Card2 (Class 2 i.e. isFraud=1)
5. Common user train and test
6. Unique user train and test
7. No. of transactions vs Time
8. First and last transaction (span)
9. Attributes Correlation: Highly correlated (Scatter plots)
10. Dissimilarity Index (within same class): Highlighted outliers in scatter plots. (Yes or No)
11. Data Analysis
   11.1 PCA
   11.2 LDA

Submitted By: Muhammad Waqas Ahmad Page 2 of 12


Introduction
This report is submitted as the solution to Assignment No. 1 (Data Visualization) of the
“Data Mining” course. The purpose of the submission is to exercise various data visualization
techniques. The “IEEE-CIS Fraud Detection - Can you detect fraud from customer transactions?”
dataset is used for this purpose. The task posed by the dataset is to predict the probability
that an online transaction is fraudulent, as denoted by the binary target isFraud. We
used RapidMiner Studio for data analysis and visualization.

Dataset Specifications
The dataset is relatively large. The training and test sets are summarized below; class labels are not available for the test data.

                            Training Data    Test Data
No. of features
No. of records              590,540          506,691
No. of Positive Examples    20,663           ?
No. of Negative Examples    569,877          ?

A snapshot of data describing both positive and negative examples is attached below.

Data Visualization
1. Histogram
The histogram for the class label, i.e. isFraud, is attached below:
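The charts in this report were produced in RapidMiner Studio. As a rough illustration of what a class-label histogram computes, the sketch below counts each isFraud value and renders the counts as text bars. The function names and the toy labels are assumptions made for this example and are not taken from the dataset.

```python
from collections import Counter


def label_histogram(labels):
    """Count how many times each class label occurs."""
    return Counter(labels)


def render_bars(counts, width=40):
    """Render counts as horizontal text bars scaled to the largest count."""
    largest = max(counts.values())
    lines = []
    for label in sorted(counts):
        # Every non-empty class gets at least one character so it stays visible.
        bar = "#" * max(1, round(width * counts[label] / largest))
        lines.append(f"isFraud={label} {bar} {counts[label]}")
    return lines


# Toy labels only (illustrative, not the real training data).
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
hist = label_histogram(toy_labels)
for line in render_bars(hist):
    print(line)
```

With a heavily imbalanced label such as isFraud, the minority bar is very short, which is why the bars are forced to be at least one character wide.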

2. Scatter Plot
Scatter plots for various combinations of attributes are shown below:

2.1 Card1 and Card2

2.2 Card1 and Card6

2.3 Card4 and Card6

2.4 Card2 and Card4

3. Parallel Projects

4. Box Plot
4.1 Card1 (All Training Set – Both Classes)

4.2 Card2 (All Training Set – Both Classes)

4.3 Card1 (Class 1 i.e. isFraud=0)

4.4 Card1 (Class 2 i.e. isFraud=1)

4.5 Card2 (Class 1 i.e. isFraud=0)

4.6 Card2 (Class 2 i.e. isFraud=1)
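For readers without RapidMiner, the quantities a box plot displays can be sketched in plain Python. The example below assumes the common Tukey convention (whiskers at the most extreme values within 1.5 × IQR of the quartiles); RapidMiner's chart may use a different quartile or whisker rule. The helper names and the row layout (dictionaries with an isFraud key) are assumptions for illustration.

```python
import statistics


def box_plot_summary(values):
    """Return quartiles, whisker ends and outliers under the 1.5 * IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


def summary_by_class(rows, column):
    """Group rows by isFraud and summarize one numeric column per class."""
    groups = {}
    for row in rows:
        groups.setdefault(row["isFraud"], []).append(row[column])
    return {label: box_plot_summary(vals) for label, vals in groups.items()}


# Toy values only (illustrative, not real card1 values).
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

Calling summary_by_class(rows, "card1") would give one summary per class, which corresponds to the per-class plots in 4.3 to 4.6; each class needs at least two values for the quartiles to be defined.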

5. Common user train and test


No. of records common to both training and test data: 52,762

6. Unique user train and test


No. of unique records in Training Data: 348,394

No. of unique records in Test Data: 280,310
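The report does not state how a user is identified or how the counts in sections 5 and 6 were obtained. The sketch below shows one way such counts could be computed with set operations, assuming that "common" means keys present in both sets and "unique" means keys present in only one set. The choice of key columns (card1, card2, card4, card6) and the function names are hypothetical.

```python
def user_key(row, key_columns=("card1", "card2", "card4", "card6")):
    """Build a hashable identifier from the assumed key columns."""
    return tuple(row.get(col) for col in key_columns)


def compare_users(train_rows, test_rows):
    """Split user keys into common, train-only and test-only sets."""
    train_users = {user_key(r) for r in train_rows}
    test_users = {user_key(r) for r in test_rows}
    return {
        "common": train_users & test_users,
        "train_only": train_users - test_users,
        "test_only": test_users - train_users,
    }


# Toy rows only (illustrative values, not taken from the dataset).
train = [
    {"card1": 1001, "card2": 111, "card4": "visa", "card6": "debit"},
    {"card1": 1002, "card2": 222, "card4": "mastercard", "card6": "credit"},
    {"card1": 1001, "card2": 111, "card4": "visa", "card6": "debit"},
]
test = [
    {"card1": 1001, "card2": 111, "card4": "visa", "card6": "debit"},
    {"card1": 1003, "card2": 333, "card4": "visa", "card6": "credit"},
]
result = compare_users(train, test)
print({name: len(keys) for name, keys in result.items()})
```

Because sets discard duplicates, repeated transactions by the same key are counted once; a different key definition would give different counts.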

7. No. of transactions vs Time

8. First and last transaction (span)
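Sections 7 and 8 both rely on transaction time. A minimal sketch of the two computations is given below, assuming each transaction carries a time offset in seconds from some reference point (in the IEEE-CIS data this would presumably be the TransactionDT column, which the report does not name). The function names and toy offsets are assumptions for illustration.

```python
from collections import Counter

SECONDS_PER_DAY = 24 * 60 * 60


def transactions_per_day(offsets_seconds):
    """Count transactions per day index (offset // seconds in a day)."""
    return Counter(int(t) // SECONDS_PER_DAY for t in offsets_seconds)


def transaction_span(offsets_seconds):
    """Return (first, last, span in days) for a collection of time offsets."""
    first = min(offsets_seconds)
    last = max(offsets_seconds)
    return first, last, (last - first) / SECONDS_PER_DAY


# Toy offsets only (illustrative, not taken from the dataset).
toy_offsets = [86_400, 90_000, 172_800, 262_800]
per_day = transactions_per_day(toy_offsets)
first, last, span_days = transaction_span(toy_offsets)
print(sorted(per_day.items()))
print(first, last, round(span_days, 2))
```

Applying transaction_span to the training and test offsets separately would show whether the two sets cover disjoint time ranges.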

9. Attributes Correlation: Highly correlated (Scatter plots).


Here we show the mutual correlation between different attributes. The correlation is not strong, but a
pattern is visible to some extent.

Figure: Card2 and Transaction Amount correlation

Figure: Card2 and Card1 correlation
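The strength of a linear relationship such as those plotted above can be quantified with the Pearson correlation coefficient. The sketch below is a plain-Python version of that standard formula; it is not the procedure used in RapidMiner for this report, and the function name and toy values are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values only (illustrative, not taken from the dataset).
r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r, 3))
```

Values near +1 or -1 indicate a strong linear relationship, while values near 0 indicate little or none; a visible pattern with a modest coefficient is consistent with the observation that the correlation here is not strong.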

10. Dissimilarity Index (within same class): Highlighted outliers in scatter plots. (Yes or No)

Here we show some uncorrelated attributes, which have almost no relation to the fraud or
no-fraud class.

Figure: Card1 and card4=Visa correlation

Figure: Card2 and card6=Debit correlation

Figure: card4=Master Card and Transaction Amount correlation

11. Data Analysis


11.1 PCA

11.2 LDA

Linear Discriminant Model Result:

A priori probabilities:

isFraud (Class)    Probability
NO                 0.9652
YES                0.0348
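In linear discriminant analysis the a priori probability of a class is usually estimated as its share of the training examples. The sketch below applies that estimate to the training counts listed under Dataset Specifications; the function name is an assumption for illustration.

```python
def prior_probabilities(class_counts):
    """Estimate a priori class probabilities as relative frequencies."""
    total = sum(class_counts.values())
    return {label: count / total for label, count in class_counts.items()}


# Training counts as reported under Dataset Specifications.
training_counts = {"NO": 569_877, "YES": 20_663}
priors = prior_probabilities(training_counts)
for label, probability in priors.items():
    print(f"{label}: {probability:.4f}")
```

With these counts the estimate comes to roughly 0.965 for NO and 0.035 for YES, close to but not identical with the values reported above. The small difference presumably reflects sampling or preprocessing applied before the model was fitted, which the report does not describe.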
