ML 3
Week – 5
Description :
Exploratory Data Analysis :
Exploratory Data Analysis (EDA) is the preliminary analysis of data, with the help of
statistics and visualization tools, to discover relationships between measures in the
data and to gain insight into the trends and patterns among the various entities
present in the data set.
EDA methods are cross-classified in two ways. First, each method is either graphical
or non-graphical. Second, each method is either univariate, bivariate or
multivariate.
Univariate Analysis :
Uni means one and variate means variable, so univariate analysis examines only
one variable at a time.
The objective of univariate analysis is to describe the data, summarize it, and
analyze the patterns present in it.
It can be applied to both kinds of variables: Categorical and Numerical.
Some measures that can be easily obtained with univariate analysis are Central
Tendency (mean, median and mode), Dispersion (range, variance, standard
deviation), and Quartiles (interquartile range).
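The measures listed above can be sketched with pandas as follows. This is a minimal illustration on made-up values, not the lab dataset; the Series name is only a label.

```python
import pandas as pd

# Hypothetical sample values; not taken from the lab dataset.
values = pd.Series([2, 4, 4, 5, 7, 9, 11], name="PURCHASES_TRX")

# Central tendency
mean = values.mean()
median = values.median()
mode = values.mode().tolist()        # mode() can return more than one value

# Dispersion
value_range = values.max() - values.min()
variance = values.var()              # sample variance (ddof=1)
std_dev = values.std()               # sample standard deviation (ddof=1)

# Quartiles and interquartile range
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

print(mean, median, mode, value_range, variance, std_dev, iqr)
```

For a quick overview of most of these numbers at once, `values.describe()` can be used instead.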
Bivariate Analysis :
Bi means two and variate means variable, so here there are two variables. Often
one variable is treated as dependent while the other is treated as independent.
The analysis examines the relationship between the two variables, including
possible cause and effect.
Types :
1) Numerical and Numerical
2) Categorical and Categorical
3) Numerical and Categorical
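Each of the three pairings is usually summarized with a different tool. The sketch below assumes a small hypothetical DataFrame (all column names and values are invented for illustration): a correlation coefficient for two numerical variables, a contingency table for two categorical variables, and a grouped mean for a numerical variable split by a categorical one.

```python
import pandas as pd

# Hypothetical data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "PURCHASES": [100.0, 250.0, 80.0, 400.0, 150.0, 300.0],
    "PAYMENTS": [120.0, 240.0, 90.0, 380.0, 160.0, 310.0],
    "SEGMENT": ["A", "B", "A", "B", "A", "B"],
    "DEFAULTED": ["no", "no", "yes", "no", "yes", "no"],
})

# 1) Numerical and Numerical: Pearson correlation coefficient
corr = df["PURCHASES"].corr(df["PAYMENTS"])

# 2) Categorical and Categorical: contingency table of counts
table = pd.crosstab(df["SEGMENT"], df["DEFAULTED"])

# 3) Numerical and Categorical: mean of the numerical variable per category
group_means = df.groupby("SEGMENT")["PURCHASES"].mean()

print(corr)
print(table)
print(group_means)
```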
Multivariate Analysis :
Multivariate analysis is required when more than two variables have to be
analyzed simultaneously.
It is very hard to visualize a relationship among four or more variables in a
single graph, so multivariate analysis is used to study more complex sets of
data.
Types of Multivariate Analysis include Cluster Analysis, Factor Analysis,
Multiple Regression Analysis, Principal Component Analysis, etc.
More than 20 different ways to perform multivariate analysis exist, and which
one to choose depends on the type of data and the end goal to achieve.
Implementation :-
Import Modules :-
Output :
Program :
Output :
Program :
Output :
Find the Duplicates :
Program :
Output :
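A minimal sketch of a duplicate check, assuming the data is held in a pandas DataFrame. The frame below is hypothetical and stands in for the lab dataset.

```python
import pandas as pd

# Hypothetical frame with one repeated row; not the lab dataset.
df = pd.DataFrame({
    "CUST_ID": ["C1", "C2", "C2", "C3"],
    "TENURE": [12, 10, 10, 12],
})

dup_mask = df.duplicated()            # True for rows that repeat an earlier row
n_duplicates = int(dup_mask.sum())    # number of duplicate rows
deduplicated = df.drop_duplicates()   # copy of the frame without the repeats

print(n_duplicates)
print(deduplicated)
```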
Unique Values :-
Program :
Output :
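Unique values of a column can be inspected in a few ways. The sketch below assumes a pandas DataFrame with a `TENURE` column; the values are made up.

```python
import pandas as pd

# Hypothetical values for illustration only.
df = pd.DataFrame({"TENURE": [12, 10, 12, 6, 12, 10]})

unique_values = df["TENURE"].unique()    # distinct values, in order of appearance
n_unique = df["TENURE"].nunique()        # how many distinct values there are
counts = df["TENURE"].value_counts()     # frequency of each distinct value

print(unique_values)
print(n_unique)
print(counts)
```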
Output :
Output :
Program :
21131A4426
Output :
Output :
Data Types :-
Program :
Output :
Output :
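A short sketch of how column data types can be inspected, assuming a pandas DataFrame. The columns `CUST_ID` and `BALANCE` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical frame mixing text, float and integer columns.
df = pd.DataFrame({
    "CUST_ID": ["C1", "C2"],
    "BALANCE": [40.9, 3202.4],
    "TENURE": [12, 12],
})

dtypes = df.dtypes                                               # dtype of every column
numeric_cols = df.select_dtypes(include="number").columns.tolist()

print(dtypes)
print(numeric_cols)
```

Selecting the numeric columns first is often convenient before computing statistics or correlations.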
Boxplot :-
Output :
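A box plot summarizes a numerical column by its quartiles and marks points beyond the whiskers as outliers. The sketch below uses matplotlib on made-up values (the name `BALANCE` is an assumption) and also applies the usual 1.5 × IQR rule by hand.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values with one obvious outlier.
balance = pd.Series([10, 12, 13, 14, 15, 16, 18, 60], name="BALANCE")

fig, ax = plt.subplots()
ax.boxplot(balance.to_numpy())
ax.set_ylabel("BALANCE")
ax.set_title("Boxplot")

# The same outliers found by hand with the 1.5 * IQR rule
q1, q3 = balance.quantile(0.25), balance.quantile(0.75)
iqr = q3 - q1
outliers = balance[(balance < q1 - 1.5 * iqr) | (balance > q3 + 1.5 * iqr)]
print(outliers.tolist())
```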
Correlation :-
Program :
Output :
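A minimal sketch of a pairwise correlation matrix, assuming a pandas DataFrame. Non-numeric columns are dropped first, and all names and values are hypothetical.

```python
import pandas as pd

# Hypothetical frame; CUST_ID is text and must be excluded from the correlation.
df = pd.DataFrame({
    "CUST_ID": ["C1", "C2", "C3", "C4", "C5"],
    "PURCHASES": [1.0, 2.0, 3.0, 4.0, 5.0],
    "PAYMENTS": [2.0, 4.0, 6.0, 8.0, 10.0],
    "CASH_ADVANCE": [5.0, 4.0, 3.0, 2.0, 1.0],
})

# Pearson correlation between every pair of numeric columns
corr_matrix = df.select_dtypes(include="number").corr()
print(corr_matrix)
```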
UNIVARIATE ANALYSIS :-
Importing Modules :
Histogram :-
Program :
Output :
Text(0.5, 1.0, 'Histogram')
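A histogram of a single numerical column can be drawn as sketched below. The data is randomly generated for illustration and the bin count of 20 is an arbitrary choice; only the title 'Histogram' is taken from the output shown above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up values standing in for one numerical column.
rng = np.random.default_rng(0)
balance = rng.normal(loc=1500, scale=400, size=500)

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(balance, bins=20)   # bin count is an assumption
ax.set_xlabel("value")
ax.set_ylabel("frequency")
ax.set_title("Histogram")
```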
Pie Chart :-
Program :
Output :
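A pie chart suits a categorical or low-cardinality column. The sketch below assumes the share of each `TENURE` value is of interest; the values are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values for illustration only.
tenure = pd.Series([12, 12, 12, 10, 10, 6], name="TENURE")
shares = tenure.value_counts()          # count per distinct value, largest first

fig, ax = plt.subplots()
ax.pie(
    shares.to_numpy(),
    labels=[str(v) for v in shares.index],
    autopct="%1.1f%%",                  # print the percentage inside each wedge
)
ax.set_title("Pie Chart")
```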
Density Plot :-
Program :
Output :
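A density plot is a smoothed version of a histogram. The sketch below builds a Gaussian kernel density estimate directly with NumPy so that every step is visible; this is a swapped-in manual implementation rather than a library plotting call, and the bandwidth follows Scott's rule of thumb, which is an assumption of this sketch. The data is randomly generated.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=1500, scale=400, size=300)   # made-up values

# Kernel bandwidth from Scott's rule of thumb (assumption of this sketch)
bandwidth = sample.std(ddof=1) * len(sample) ** (-1 / 5)

# Evaluate the estimate on a grid that extends slightly beyond the data
grid = np.linspace(sample.min() - 3 * bandwidth, sample.max() + 3 * bandwidth, 400)

# Average of Gaussian bumps centred on each observation
z = (grid[:, None] - sample[None, :]) / bandwidth
density = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(sample) * bandwidth * np.sqrt(2 * np.pi))

fig, ax = plt.subplots()
ax.plot(grid, density)
ax.set_ylabel("density")
ax.set_title("Density Plot")
```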
BIVARIATE ANALYSIS :-
Scatter Plot :-
Program :
Output :
<matplotlib.collections.PathCollection at 0x7832bf19ab30>
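A scatter plot shows the relationship between two numerical variables. In the sketch below the column names `PURCHASES` and `PAYMENTS` and the generated values are assumptions; the call returns a `PathCollection`, the same kind of object shown in the output above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up, positively related values.
rng = np.random.default_rng(0)
purchases = rng.uniform(0, 5000, size=100)
payments = 0.8 * purchases + rng.normal(0, 300, size=100)

fig, ax = plt.subplots()
points = ax.scatter(purchases, payments, alpha=0.6)
ax.set_xlabel("PURCHASES")
ax.set_ylabel("PAYMENTS")
ax.set_title("Scatter Plot")
```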
Violin Plot :-
Program :
Output :
<Axes: xlabel='TENURE', ylabel='CASH_ADVANCE_TRX'>
<Axes: xlabel='TENURE', ylabel='PURCHASES_TRX'>
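A violin plot compares the distribution of a numerical variable across the levels of another variable. The axis labels in the output above suggest `CASH_ADVANCE_TRX` plotted against `TENURE`. The sketch below uses plain matplotlib as a stand-in for whichever plotting library produced that output, with randomly generated values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up transaction counts for three hypothetical tenure groups.
rng = np.random.default_rng(0)
tenures = [6, 10, 12]
groups = [rng.poisson(lam=lam, size=40) for lam in (2, 4, 6)]

fig, ax = plt.subplots()
positions = list(range(1, len(tenures) + 1))
parts = ax.violinplot(groups, positions=positions, showmedians=True)

# Label each violin with the tenure value it represents
ax.set_xticks(positions)
ax.set_xticklabels([str(t) for t in tenures])
ax.set_xlabel("TENURE")
ax.set_ylabel("CASH_ADVANCE_TRX")
```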
MULTIVARIATE ANALYSIS :-
Program :
Output :
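One common way to look at several variables at once is a scatter matrix, which draws every pairwise scatter plot in a grid. Whether the lab used this particular plot is not known; the sketch below is only one option, with hypothetical column names and generated data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up data with three numerical columns.
rng = np.random.default_rng(0)
purchases = rng.uniform(0, 5000, size=150)
df = pd.DataFrame({
    "PURCHASES": purchases,
    "PAYMENTS": 0.8 * purchases + rng.normal(0, 300, size=150),
    "BALANCE": rng.normal(1500, 400, size=150),
})

# Grid of pairwise scatter plots with a histogram of each column on the diagonal
axes = scatter_matrix(df, diagonal="hist", figsize=(6, 6))
```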
Machine Intelligence Applications Lab
Week – 6
Description :-
Principal Component Analysis :
As the number of features or dimensions in a dataset increases, the
amount of data required to obtain a statistically significant result increases
exponentially. This can lead to issues such as overfitting, increased
computation time, and reduced accuracy of machine learning models. These
problems, which arise while working with high-dimensional data, are known as
the curse of dimensionality.
To address the curse of dimensionality, Feature Engineering techniques
are used which include feature selection and feature extraction. Dimensionality
Reduction is a type of feature extraction technique that aims to reduce the
number of input features while retaining as much of the original information as
possible.
One of the most popular dimensionality reduction techniques is Principal
Component Analysis (PCA).
Output :
Output :
Standardize the Data :
Program :
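Standardization rescales every feature to zero mean and unit variance so that no feature dominates PCA purely because of its units. The sketch below assumes z-score standardization done with pandas on a hypothetical frame; it uses the population standard deviation (`ddof=0`), the same convention as scikit-learn's `StandardScaler`.

```python
import pandas as pd

# Hypothetical features on very different scales.
df = pd.DataFrame({
    "BALANCE": [100.0, 200.0, 300.0, 400.0],
    "TENURE": [6.0, 10.0, 12.0, 12.0],
})

# z-score: subtract the column mean, divide by the column standard deviation
standardized = (df - df.mean()) / df.std(ddof=0)
print(standardized)
```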
Apply PCA :
Program :
Create DataFrame :
Program :
Output :
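The standardize, apply PCA, and create DataFrame steps can be sketched together as follows, assuming scikit-learn is available. The data, the column names and the choice of two components are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Made-up data: two strongly related columns and one independent column.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
df = pd.DataFrame({
    "BALANCE": base + rng.normal(scale=0.1, size=200),
    "PURCHASES": 2 * base + rng.normal(scale=0.1, size=200),
    "PAYMENTS": rng.normal(size=200),
})

# Standardize, then project onto the first two principal components
scaled = StandardScaler().fit_transform(df)
pca = PCA(n_components=2)          # number of components is an assumption
components = pca.fit_transform(scaled)

# Wrap the projected data in a DataFrame with readable column names
pca_df = pd.DataFrame(components, columns=["PC1", "PC2"], index=df.index)
print(pca_df.head())
print(pca.explained_variance_ratio_)
```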
Bar Plot :
Program :
Output :
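A bar plot is often used after PCA to show how much variance each principal component explains. Whether the lab's bar plot showed this quantity is an assumption; the sketch below uses generated data and scikit-learn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Made-up data: the first two features are strongly related.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([
    base + rng.normal(scale=0.1, size=200),
    2 * base + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

pca = PCA().fit(StandardScaler().fit_transform(X))   # keep all components
ratios = pca.explained_variance_ratio_
labels = [f"PC{i + 1}" for i in range(len(ratios))]

fig, ax = plt.subplots()
ax.bar(labels, ratios)
ax.set_ylabel("explained variance ratio")
ax.set_title("Bar Plot")
```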
Program :
Output :
Correlations :
Correlation is a statistical measure that indicates the extent to which two or
more variables fluctuate in relation to each other. A positive correlation
indicates that the variables tend to increase or decrease together; a
negative correlation indicates that one variable tends to increase as the
other decreases.
Program :
Output :
<Axes: >
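A correlation matrix is commonly drawn as a heatmap so that positive and negative relationships stand out by colour. The sketch below uses matplotlib's `imshow` as a stand-in for whichever heatmap function produced the output above; the column names and values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: PAYMENTS rises with PURCHASES, CASH_ADVANCE falls.
df = pd.DataFrame({
    "PURCHASES": [100, 200, 300, 400, 500],
    "PAYMENTS": [110, 190, 320, 390, 510],
    "CASH_ADVANCE": [50, 40, 35, 20, 10],
})
corr = df.corr()

fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ticks = list(range(len(corr.columns)))
ax.set_xticks(ticks)
ax.set_xticklabels(list(corr.columns), rotation=45, ha="right")
ax.set_yticks(ticks)
ax.set_yticklabels(list(corr.columns))
fig.colorbar(im, ax=ax)   # colour scale from -1 (negative) to +1 (positive)
```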