ML 3

Uploaded by

Sai Abhishek

Machine Intelligence Applications Lab

Week – 5

Aim : Write a program to perform Exploratory Data Analysis on real time datasets.
a) Univariate Analysis
b) Multivariate Analysis
c) Visualization using correlation matrix.

Description :
Exploratory Data Analysis :
Exploratory Data Analysis (EDA) is the preliminary analysis of a data set, with
the help of statistics and visualization tools, to discover the trends, patterns,
and relationships among the measures and entities it contains.
EDA methods are classified along two axes: each method is either graphical or
non-graphical, and each method is either univariate, bivariate, or multivariate.

Univariate Analysis :
• Uni means one and variate means variable, so univariate analysis examines
only one variable at a time; no relationship between variables is involved.
• The objective of univariate analysis is to describe the data, summarize it,
and analyze the pattern present in it.
• It applies to two kinds of variables: categorical and numerical.
• Measures that are easily obtained with univariate analysis are central
tendency (mean, median, and mode), dispersion (range, variance, and standard
deviation), and quartiles (interquartile range).
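
The measures listed above can be computed directly with pandas. The following is a minimal sketch on made-up values; the numbers are hypothetical and are not taken from the lab's dataset, and the column name is borrowed from the outputs later in this record.

```python
import pandas as pd

# Hypothetical sample values; not taken from the lab's dataset.
purchases = pd.Series([2, 4, 4, 5, 7, 9, 11], name="PURCHASES_TRX")

summary = {
    # Central tendency
    "mean": purchases.mean(),
    "median": purchases.median(),
    "mode": purchases.mode().tolist(),
    # Dispersion
    "range": purchases.max() - purchases.min(),
    "variance": purchases.var(),  # sample variance (ddof=1)
    "std": purchases.std(),       # sample standard deviation (ddof=1)
    # Quartiles
    "iqr": purchases.quantile(0.75) - purchases.quantile(0.25),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For a quick overview of a numeric column, `purchases.describe()` reports several of these values in one call.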

Bivariate Analysis :
• Bi means two and variate means variable, so here there are two variables.
Often one variable is treated as dependent and the other as independent.
• The analysis examines the relationship between the two variables; a strong
relationship may hint at cause and effect but does not by itself establish it.
Types :
1) Numerical and Numerical
2) Categorical and Categorical
3) Numerical and Categorical
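
Each of the three pairings above is usually summarized with a different tool. The sketch below uses a small made-up table; the `SEGMENT` and `ACTIVE` columns are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical toy data; SEGMENT and ACTIVE are illustrative columns.
df = pd.DataFrame({
    "TENURE": [6, 6, 12, 12, 12, 12],
    "SEGMENT": ["a", "b", "a", "b", "a", "b"],
    "ACTIVE": ["yes", "no", "yes", "yes", "no", "yes"],
    "PURCHASES_TRX": [1, 3, 5, 7, 9, 11],
})

# 1) Numerical and numerical: Pearson correlation coefficient.
num_num = df["TENURE"].corr(df["PURCHASES_TRX"])

# 2) Categorical and categorical: contingency table of counts.
cat_cat = pd.crosstab(df["SEGMENT"], df["ACTIVE"])

# 3) Numerical and categorical: mean of the numeric column per category.
num_cat = df.groupby("SEGMENT")["PURCHASES_TRX"].mean()

print(num_num)
print(cat_cat)
print(num_cat)
```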

Multivariate Analysis :
• Multivariate analysis is required when more than two variables have to be
analyzed simultaneously.
• It is very hard for a person to visualize a relationship among four or more
variables in a single graph, so multivariate analysis is used to study more
complex sets of data.
• Types of multivariate analysis include cluster analysis, factor analysis,
multiple regression analysis, principal component analysis, etc.
• More than 20 different techniques for multivariate analysis exist; which one
to choose depends on the type of data and the end goal.

Implementation :-
Import Modules :-

Load the Dataset :-


Program :

Output :
Program :

Output :

Program :

Output :
Find the Duplicates :
Program :
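
The original program for this step is not reproduced in this record. As a hedged sketch of one common approach, duplicate rows can be flagged with `DataFrame.duplicated` and dropped with `DataFrame.drop_duplicates`; the data below is made up.

```python
import pandas as pd

# Hypothetical data containing one repeated row.
df = pd.DataFrame({
    "TENURE": [12, 12, 6, 12],
    "PURCHASES_TRX": [5, 5, 2, 8],
})

dup_mask = df.duplicated()          # True for each repeat of an earlier row
n_duplicates = int(dup_mask.sum())  # number of duplicate rows
deduped = df.drop_duplicates().reset_index(drop=True)

print(n_duplicates)
print(deduped)
```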

Output :

Unique Values :-
Program :
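
The original program is not reproduced here. A minimal sketch, assuming a single column of interest and made-up values, could list the distinct values and how often each occurs:

```python
import pandas as pd

# Hypothetical values for a single column.
df = pd.DataFrame({"TENURE": [12, 12, 6, 10, 6, 12]})

unique_values = df["TENURE"].unique()   # distinct values, in order of first appearance
n_unique = df["TENURE"].nunique()       # number of distinct values
counts = df["TENURE"].value_counts()    # occurrences per value, most frequent first

print(unique_values)
print(n_unique)
print(counts)
```

The `counts` Series is also a convenient input for a bar chart of unique values, for example via `counts.plot(kind="bar")` when matplotlib is installed.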

Output :

Plotting Unique Values :-


Program :
21131A4426

Output :

Finding Null Values :-


Program :
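
The original program is not reproduced here. One way to locate missing values, sketched on made-up data, is to count nulls per column and express them as a percentage of rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing entries.
df = pd.DataFrame({
    "TENURE": [12, 6, np.nan, 12],
    "PURCHASES_TRX": [5, np.nan, np.nan, 8],
})

null_counts = df.isnull().sum()           # missing values per column
null_percent = df.isnull().mean() * 100   # share of rows missing, per column

print(null_counts)
print(null_percent)
```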

Output :

Program :

Output :

Replace Null Values :-


Program :
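
The original program is not reproduced here, so the replacement strategy it used is unknown. The sketch below assumes median imputation, which is only one of several reasonable choices (mean, mode, or a constant are others), and runs on made-up data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing entries.
df = pd.DataFrame({
    "TENURE": [12, 6, np.nan, 12],
    "PURCHASES_TRX": [5, np.nan, np.nan, 8],
})

# Assumption: fill each numeric column with its own median.
medians = df.median(numeric_only=True)
filled = df.fillna(medians)

print(filled)
```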

Output :

Data Types :-
Program :

Output :

Filter the Data :-


Program :
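
The original program and its filter condition are not reproduced here. As an illustration only, rows can be selected with a boolean mask or with `DataFrame.query`; the thresholds below are hypothetical.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "TENURE": [6, 12, 12, 10],
    "PURCHASES_TRX": [0, 5, 20, 8],
})

# Boolean mask: combine conditions with & and wrap each one in parentheses.
filtered = df[(df["TENURE"] == 12) & (df["PURCHASES_TRX"] > 4)]

# Equivalent filter written as a query string.
same = df.query("TENURE == 12 and PURCHASES_TRX > 4")

print(filtered)
```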

Output :

Boxplot :-

Output :

Correlation :-
Program :
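
The original program is not reproduced here. A minimal sketch of a correlation matrix, using made-up values that are deliberately perfectly linear so the result is easy to check by eye:

```python
import pandas as pd

# Hypothetical data: PURCHASES_TRX rises with TENURE, CASH_ADVANCE_TRX falls.
df = pd.DataFrame({
    "TENURE": [6, 8, 10, 12],
    "PURCHASES_TRX": [1, 2, 3, 4],
    "CASH_ADVANCE_TRX": [8, 6, 4, 2],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr)
```

A matrix like `corr` is commonly drawn as a heatmap, for instance with `seaborn.heatmap(corr, annot=True)` if seaborn is available.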

Output :

UNIVARIATE ANALYSIS :-
Importing Modules :

Histogram :-
Program :
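
The original program is not reproduced here; the output below only shows that the plot was titled 'Histogram'. The following sketch draws a histogram with matplotlib on randomly generated stand-in values. The data, the number of bins, and the axis labels are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for a numeric column such as PURCHASES_TRX.
rng = np.random.default_rng(0)
values = rng.integers(0, 50, size=200)

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(values, bins=10)  # 10 equal-width bins
title = ax.set_title("Histogram")
ax.set_xlabel("PURCHASES_TRX")
ax.set_ylabel("Frequency")
plt.close(fig)
```

In a notebook, the `matplotlib.use("Agg")` line and `plt.close(fig)` can be dropped so the figure is displayed inline.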

Output :
Text(0.5, 1.0, 'Histogram')

Pie Chart :-
Program :

Output :

Density Plot :-
Program :

Output :

BIVARIATE ANALYSIS :-
Scatter Plot :-
Program :

Output :
<matplotlib.collections.PathCollection at 0x7832bf19ab30>

Violin Plot :-
Program :

Output :
<Axes: xlabel='TENURE', ylabel='CASH_ADVANCE_TRX'>

<Axes: xlabel='TENURE', ylabel='PURCHASES_TRX'>

MULTIVARIATE ANALYSIS :-
Program :

Output :
Machine Intelligence Applications Lab
Week – 6

Aim : Write a program to perform Dimensionality Reduction using Principal
Component Analysis techniques on real time datasets.

Description :-
Principal Component Analysis :
As the number of features or dimensions in a dataset increases, the
amount of data required to obtain a statistically significant result increases
exponentially. This can lead to issues such as overfitting, increased
computation time, and reduced accuracy of machine learning models. These
problems, which arise while working with high-dimensional data, are known as
the curse of dimensionality.
To address the curse of dimensionality, feature engineering techniques
are used, which include feature selection and feature extraction.
Dimensionality reduction is a type of feature extraction technique that aims
to reduce the number of input features while retaining as much of the original
information as possible.
One of the most popular dimensionality reduction techniques is Principal
Component Analysis (PCA).

• PCA comes under the unsupervised machine learning category.
• The main goal of PCA is to reduce the number of variables in a data
collection while retaining as much information as feasible. PCA is mainly
used for dimensionality reduction and can also help identify important
features.
• PCA transforms correlated features into uncorrelated components, called
principal components.
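
To make the idea of turning correlated features into uncorrelated components concrete, here is a from-scratch sketch using only NumPy. It is an illustration of the underlying linear algebra on randomly generated data, not the program used in this lab: standardize the features, compute their covariance matrix, take its eigenvectors, and project onto the leading ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 samples, 3 features; the second feature is
# strongly correlated with the first, the third is independent noise.
x1 = rng.normal(size=100)
X = np.column_stack([
    x1,
    2 * x1 + rng.normal(scale=0.1, size=100),
    rng.normal(size=100),
])

# 1) Standardize each feature to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2) Covariance matrix of the standardized features.
cov = np.cov(Z, rowvar=False)

# 3) Eigen-decomposition. eigh handles symmetric matrices and returns
#    eigenvalues in ascending order, so reorder them to descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4) Project the data onto the two leading principal components.
scores = Z @ eigvecs[:, :2]

# Share of total variance carried by each component.
explained_ratio = eigvals / eigvals.sum()
print(explained_ratio)
```

Because the first two features are nearly redundant, most of the variance ends up in the first component, and the projected columns in `scores` are uncorrelated with each other.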
Implementation :-
Import Modules :

Load the Dataset :


Program :

Output :

Select relevant features for Analysis :


Program :

Output :
Standardize the Data :
Program :

Apply PCA :
Program :

Create DataFrame :
Program :
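
The programs for the standardize, apply-PCA, and create-DataFrame steps are not reproduced in this record. The sketch below shows how these three steps are commonly chained with scikit-learn. The use of `StandardScaler` and `PCA`, the choice of two components, the column labels `PC1` and `PC2`, and the randomly generated data are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the selected features.
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "PURCHASES_TRX": rng.poisson(10, size=50),
    "CASH_ADVANCE_TRX": rng.poisson(3, size=50),
    "TENURE": rng.integers(6, 13, size=50),
})

# Standardize: zero mean and unit variance per feature.
scaled = StandardScaler().fit_transform(features)

# Apply PCA, keeping two components (an assumed choice).
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

# Wrap the component scores in a DataFrame for inspection and plotting.
pca_df = pd.DataFrame(components, columns=["PC1", "PC2"])
explained = pca.explained_variance_ratio_

print(pca_df.head())
print(explained)
```

The `explained` array is the usual input for a bar plot of variance explained per component.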

Output :
Bar Plot :
Program :

Output :

Program :

Output :
Correlations :
Correlation is a statistical measure that indicates the extent to which two or
more variables fluctuate in relation to each other. A positive correlation
indicates the extent to which those variables increase or decrease in parallel; a
negative correlation indicates the extent to which one variable increases as the
other decreases.
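
As a small worked illustration of positive and negative correlation, the sketch below computes the Pearson coefficient by hand and checks it against `numpy.corrcoef`. The helper name `pearson_r` and the sample values are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical sample values.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]     # tends to increase with x
falling = [10, 8, 6, 4, 2]   # decreases exactly linearly with x

r_pos = pearson_r(x, rising)    # positive, less than 1
r_neg = pearson_r(x, falling)   # exactly linear decrease gives -1

print(r_pos, r_neg)
```

The coefficient always lies between -1 and 1; values near 0 indicate little linear relationship.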
Program :

Output :
<Axes: >
