Exploratory Data Analysis: Salarydata - CSV
Descriptive Statistics:
Check for null values:
Therefore, we can conclude that there are no null values.
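A null check of this kind can be sketched with pandas as below. The small data frame is a made-up stand-in for the salary CSV, and the column names Education, Occupation and Salary are assumed from the questions that follow.

```python
import pandas as pd

# Hypothetical stand-in for the salary CSV (values are made up;
# column names are assumptions based on the questions in this report).
df = pd.DataFrame({
    "Education": ["Doctorate", "Bachelors", "HS-grad", "Bachelors"],
    "Occupation": ["Prof-specialty", "Sales", "Adm-clerical", "Exec-managerial"],
    "Salary": [150000, 90000, 60000, 95000],
})

# Count missing values per column; all zeros means there are no nulls.
null_counts = df.isnull().sum()
print(null_counts)

has_nulls = bool(null_counts.sum() > 0)
print("Any nulls:", has_nulls)
```

With the real file, the data frame would instead come from pd.read_csv on the CSV path.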
Descriptive Statistics for the dataset:
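As a minimal sketch, descriptive statistics for a data set with one numeric and two categorical columns could be produced as follows. The data and column names are illustrative assumptions, not the original file.

```python
import pandas as pd

# Made-up rows; column names are assumed, not taken from the original file.
df = pd.DataFrame({
    "Education": ["Doctorate", "Bachelors", "HS-grad",
                  "Bachelors", "Doctorate", "HS-grad"],
    "Occupation": ["Prof-specialty", "Sales", "Adm-clerical",
                   "Exec-managerial", "Sales", "Adm-clerical"],
    "Salary": [150000, 90000, 60000, 95000, 160000, 55000],
})

# Numeric summary: count, mean, std, min, quartiles, max.
numeric = df["Salary"].describe()
print(numeric)

# Categorical summary: count, number of unique levels, most frequent level.
categorical = df[["Education", "Occupation"]].describe()
print(categorical)
```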
Question 1.1 State the null and the alternate hypothesis for
conducting one-way ANOVA for both Education and Occupation
individually.
Null and alternate hypotheses for conducting one-way ANOVA
for Education
μA = Mean salary of Doctorate
μB = Mean salary of Bachelors
μC = Mean salary of HS-grad
H0: μA = μB = μC
Ha: At least one of μA, μB, μC differs from the others
Level of significance (Alpha) = 0.05
N = 40
Null and alternate hypotheses for conducting one-way ANOVA
for Occupation
μA = Mean salary of Prof-specialty
μB = Mean salary of Sales
μC = Mean salary of Adm-clerical
μD = Mean salary of Exec-managerial
H0: μA = μB = μC = μD
Ha: At least one of μA, μB, μC, μD differs from the others
Level of significance (Alpha) = 0.05
N = 40
F value: 30.95628
PR value: 1.257709e-08
The PR value (p-value) of 1.257709e-08 is less than alpha (0.05),
so we reject the null hypothesis in favour of the alternate
hypothesis that at least one of μA, μB, μC differs from the others.
So, at the 5% level of significance, there is sufficient evidence that
the mean salary is not the same across the Doctorate, Bachelors and
HS-grad education levels. Note that the ANOVA alone does not show
which of the means differ from each other.
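The decision rule above can be sketched in Python. The code behind the reported F and PR values is not shown here, so this example uses scipy.stats.f_oneway as one possible way to run a one-way ANOVA, on made-up salaries with assumed column names.

```python
import pandas as pd
from scipy import stats

# Made-up salaries (in thousands) for three education levels;
# these are NOT the values from the salary data set.
df = pd.DataFrame({
    "Education": ["Doctorate"] * 4 + ["Bachelors"] * 4 + ["HS-grad"] * 4,
    "Salary": [200, 210, 190, 205, 150, 160, 155, 145, 80, 90, 85, 95],
})

alpha = 0.05  # level of significance

# One array of salaries per education level.
groups = [g["Salary"].to_numpy() for _, g in df.groupby("Education")]

# One-way ANOVA: H0 says all group means are equal.
f_stat, p_value = stats.f_oneway(*groups)
print("F value:", f_stat)
print("p-value:", p_value)

# Reject H0 when the p-value falls below alpha.
reject_null = p_value < alpha
print("Reject H0:", reject_null)
```

Rejecting H0 here only says that at least one group mean differs; a post-hoc comparison would be needed to say which ones.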
Problem 1B:
Question 1.5 What is the interaction between two treatments?
Analyze the effects of one variable on the other (Education and
Occupation) with the help of an interaction plot
To find the interaction between the two variables Education and
Occupation, I plotted a point plot of Salary against both Education
and Occupation.
The plot suggests little or almost no interaction between the two
categorical variables, while salary differs considerably across
Education levels and Occupations.
However, individuals with a Bachelors education working in Sales and
in Exec-managerial roles earn almost the same salary.
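An interaction plot draws one line per Occupation through the mean Salary at each Education level, so the same information can be read from a table of cell means. The sketch below builds that table on made-up data; the column names and values are assumptions.

```python
import pandas as pd

# Made-up salaries (in thousands), two observations per cell.
df = pd.DataFrame({
    "Education": ["Bachelors"] * 4 + ["Doctorate"] * 4,
    "Occupation": ["Sales", "Sales", "Exec-managerial", "Exec-managerial"] * 2,
    "Salary": [100, 110, 104, 108, 120, 130, 180, 190],
})

# Mean salary for every Education x Occupation combination.
# Each column here corresponds to one line in an interaction plot.
cell_means = (
    df.groupby(["Education", "Occupation"])["Salary"]
    .mean()
    .unstack("Occupation")
)
print(cell_means)

# Gap between the two occupations at each education level.
# Similar gaps mean roughly parallel lines (little interaction);
# very different gaps mean non-parallel lines (interaction).
gap = cell_means["Exec-managerial"] - cell_means["Sales"]
print(gap)
```

In this toy example the gap is small for Bachelors and large for Doctorate, which is the pattern of non-parallel lines that indicates interaction.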
To check the interaction between Education and Occupation, I also ran
an interaction effect test: a two-way ANOVA on Salary with respect to
Education, Occupation and their interaction Education*Occupation.
As per my output from python:
F value: 8.519815
PR value: 2.232500e-05
The PR value for the Education and Occupation interaction is
2.232500e-05, which is less than 0.05, so the interaction is
statistically significant.
Since the PR value is less than alpha, we reject the null hypothesis
of no interaction in favour of the alternate hypothesis that there is
interaction between Education and Occupation.
After scaling:
After scaling, the mean of each variable is approximately 0 and its
standard deviation is approximately 1.
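The effect of z-score scaling can be checked with a short sketch. The columns x1 and x2 are hypothetical, and the population standard deviation (ddof=0) is used here as an assumption, since the convention used for the scaling above is not stated.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns on very different scales.
df = pd.DataFrame({
    "x1": [1200.0, 3400.0, 560.0, 9800.0, 2100.0],
    "x2": [0.12, 0.45, 0.33, 0.08, 0.27],
})

# z-score: subtract the column mean and divide by the column
# standard deviation (population version, ddof=0).
scaled = (df - df.mean()) / df.std(ddof=0)

print(scaled.mean())        # each value is ~0 up to rounding error
print(scaled.std(ddof=0))   # each value is ~1
```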
Question 2.4 Check the dataset for outliers before and after
scaling. What insight do you derive here?
Before scaling:
After scaling:
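One way to compare outliers before and after scaling is the 1.5 x IQR rule that box plots use. Z-score scaling shifts and rescales every value in a column in the same way, so the points flagged by this rule stay the same and only the axis units change. The sketch below illustrates this on made-up numbers.

```python
import numpy as np

def iqr_outlier_mask(x):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

# Made-up column with one obvious outlier.
raw = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 95.0])

# z-score scaling of the same column.
scaled = (raw - raw.mean()) / raw.std()

mask_raw = iqr_outlier_mask(raw)
mask_scaled = iqr_outlier_mask(scaled)

print("Outliers before scaling:", int(mask_raw.sum()))
print("Outliers after scaling: ", int(mask_scaled.sum()))
print("Same points flagged:", bool(np.array_equal(mask_raw, mask_scaled)))
```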
Question 2.5 Perform PCA and export the data of the Principal
Component scores into a data frame.
As per my output from python (only the last column of the printed
array survives):
-2.73993248e-02
-1.27528369e-01
-1.80558174e-02
4.57358763e-02
-1.58456329e-01
7.88826178e-02
-3.58599650e-02
-5.57302446e-01
1.05909326e-01
-4.85902891e-02
-9.33480418e-03
1.86806043e-01
-2.64231329e-01
2.33516509e-01
5.84510444e-02
6.64009736e-01
1.41880695e-01
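Performing PCA and exporting the scores into a data frame could look like the sketch below. It uses scikit-learn on random stand-in data with hypothetical column names v1 to v4, so it illustrates the steps without reproducing the analysis above.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in data; column names v1..v4 are hypothetical.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 4)), columns=["v1", "v2", "v3", "v4"])

# Standardise first so every variable contributes on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# Keep as many components as there are variables.
pca = PCA(n_components=X.shape[1])
scores = pca.fit_transform(X_scaled)

# Export the principal component scores into a data frame.
pc_scores = pd.DataFrame(
    scores,
    columns=[f"PC{i + 1}" for i in range(scores.shape[1])],
    index=X.index,
)
print(pc_scores.head())
```

The resulting data frame has one row per observation and one column per component, and its columns are uncorrelated with each other.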
Question 2.7 Write down the explicit form of the first PC (in
terms of the eigenvectors. Use values with two places of decimals
only).
To find the first PC, sort the eigenpairs in descending order of
eigenvalue and select the one with the largest eigenvalue. This is the
first principal component, which captures the maximum variance
(information) of the original data.
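The sorting step and the explicit form of the first PC can be sketched with NumPy as follows. The data is random, and the variable names x1, x2, x3 are placeholders, so the printed coefficients are not those of the data set analysed here.

```python
import numpy as np

# Random stand-in data with some correlation between the first two columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[:, 1] += 0.8 * X[:, 0]
names = ["x1", "x2", "x3"]  # placeholder variable names

# Standardise, then take the covariance matrix of the scaled data.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
cov = np.cov(Z, rowvar=False)

# eigh is meant for symmetric matrices; it returns eigenvalues in
# ascending order with matching eigenvectors as columns.
eig_vals, eig_vecs = np.linalg.eigh(cov)

# Sort the eigenpairs in descending order of eigenvalue.
order = np.argsort(eig_vals)[::-1]
eig_vals = eig_vals[order]
eig_vecs = eig_vecs[:, order]

# The first PC is the linear combination given by the top eigenvector.
first_vec = eig_vecs[:, 0]
terms = [f"{coef:+.2f}*{name}" for coef, name in zip(first_vec, names)]
pc1_formula = "PC1 = " + " ".join(terms)
print(pc1_formula)
```

The sign of an eigenvector is arbitrary, so all coefficients of the first PC may come out with flipped signs without changing its meaning.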
As per my output from python:
Descending order of eigenvalues:
5.643078412365666
4.829736719876616
1.1003064401877618
0.9966849033178099
0.8977433005683855
0.7654920479750724
0.5870956473731336
0.5545035789961089
0.443192911646843
0.3822264051083322
0.24563728544474883
0.1468449578810244
0.1360384406428
0.12376405766765952
0.07466870608065557
0.0559799215601786
0.03891347980205039
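The share of total variance carried by each component follows directly from the eigenvalues listed above: each eigenvalue divided by their sum. The sketch below does this arithmetic on the listed values. On these numbers the first component accounts for roughly a third of the total variance.

```python
import numpy as np

# Eigenvalues copied from the list above, already in descending order.
eigenvalues = np.array([
    5.643078412365666, 4.829736719876616, 1.1003064401877618,
    0.9966849033178099, 0.8977433005683855, 0.7654920479750724,
    0.5870956473731336, 0.5545035789961089, 0.443192911646843,
    0.3822264051083322, 0.24563728544474883, 0.1468449578810244,
    0.1360384406428, 0.12376405766765952, 0.07466870608065557,
    0.0559799215601786, 0.03891347980205039,
])

# Proportion of variance explained by each component, and running total.
explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained)

for i, (e, c) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: {e:.3f} (cumulative {c:.3f})")
```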