Assignment No1 - Modified
Assignment No. 1
Data Visualization
Table of Contents
Introduction
Dataset Specifications
Data Visualization
1. Histogram
2. Scatter Plot
2.1 Card1 and Card2
2.2 Card1 and Card6
2.3 Card4 and Card6
2.4 Card2 and Card4
3. Parallel Projects
4. Box Plot
4.1 Card1 (All Training Set – Both Classes)
4.2 Card2 (All Training Set – Both Classes)
4.3 Card1 (Class 1 i.e. isFraud=0)
4.4 Card1 (Class 2 i.e. isFraud=1)
4.5 Card2 (Class 1 i.e. isFraud=0)
4.6 Card2 (Class 2 i.e. isFraud=1)
5. Common user train and test
6. Unique user train and test
7. No. of transactions vs Time
8. First and last transaction (span)
9. Attributes Correlation: Highly correlated (Scatter plots)
10. Dissimilarity Index (within same class): Highlighted outliers in scatter plots (Yes or No)
11. Data Analysis
11.1 PCA
11.2 LDA
Introduction
This report is submitted as the solution to Assignment No. 1 (Data Visualization) of the “Data
Mining” subject. The purpose of the submission is to exercise various data visualization
techniques. The “IEEE-CIS Fraud Detection - Can you detect fraud from customer transactions?”
dataset is used for this purpose. The task posed by the dataset is to predict the probability that
an online transaction is fraudulent, as denoted by the binary target isFraud. We
used RapidMiner Studio for data analysis and visualization.
Dataset Specifications
The dataset is relatively large.
No. of features
A snapshot of data describing both positive and negative examples is attached below.
Data Visualization
1. Histogram
The histogram for the class label, i.e. isFraud, is shown below:
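The histogram in this report was produced in RapidMiner Studio. For readers who want to reproduce the underlying counts outside that tool, the following is a minimal Python sketch of what a class-label histogram computes. The function name class_histogram and the toy label list are assumptions made for illustration only; the values are invented and are not taken from the dataset.

```python
from collections import Counter


def class_histogram(labels):
    """Count how many rows fall into each class label.

    Returns a list of (label, count, share) tuples sorted by label.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in sorted(counts.items())]


# Toy labels, made up for illustration; NOT the real isFraud column.
toy_is_fraud = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]

# Render one text bar per class, with the count and its share of all rows.
for label, n, share in class_histogram(toy_is_fraud):
    print(f"isFraud={label}: {'#' * n} ({n} rows, {share:.0%})")
```

On a binary target such as isFraud, the two bars make any class imbalance visible at a glance, which is the main reason to plot the label before any other attribute.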
2. Scatter Plot
Scatter plots for various combinations of attributes are shown below:
3. Parallel Projects
4. Box Plot
4.1 Card1 (All Training Set – Both Classes)
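As a rough guide to reading the box plots in this section, the sketch below computes the quantities a conventional box plot draws: the quartiles, the median, and the 1.5 × IQR fences beyond which points are shown as outliers. It is an assumption that the RapidMiner charts use this same convention. The function name box_plot_summary and the toy values are hypothetical and are not taken from the Card1 attribute.

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, fences, and outliers a standard box plot shows."""
    # Quartiles by linear interpolation over the observed data range.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Points outside the fences are conventionally drawn as outliers.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Toy values, made up for illustration; NOT real Card1 data.
toy_values = [1, 2, 3, 4, 5, 100]
print(box_plot_summary(toy_values))
```

To compare the two classes as in the per-class subsections, the same function could be applied separately to the rows with isFraud=0 and the rows with isFraud=1.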
11.2 LDA
YES 0.0348