Puting Assignment
Student’s Name
Institution
Question 1
a)
b) The mean difference is an absolute measure of the difference between two groups. The means of
the “no” and “yes” groups are 5.46809 and 5.33962 respectively, so the absolute difference between
the two group means is 0.12846 (|x̄1 - x̄2| = 0.12846). This quantity is useful in hypothesis
testing, where the null hypothesis that the two means are equal is tested against the alternative
that they are not equal.
Question 2
a)
b) From the table above, the proportion of the chatty group who are afraid of math is 34.09% of a
total of 44. The same reasoning applies to the direct group. The difference between sample
proportions can be used to quantify the evidence for a given hypothesis. In this case the null
hypothesis is that there is no difference between the preferred styles with regard to being
“scared of math”. The observed difference (p̂1 - p̂2) is -0.06980519. A difference of this size is
small relative to its standard error, so it does not give grounds for rejecting the null
hypothesis (the test in Question 7 gives a two-sided p-value of about 0.475).
Question 3
a)
[Figure: plot of Y against X; X axis ticks from 200 to 1800, Y axis ticks from 0 to 120]
b) A correlation of about -0.738692377 shows that the predictor variable and the outcome variable
are strongly and negatively related. That is, an increase in the “engagement score?” variable is
associated with a decrease in the “duration?” variable.
y = -0.0171x + 96.058
= -0.0171*(600) + 96.058
= 85.798
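The prediction at x = 600 can be checked with a short Python sketch. The slope and intercept are the fitted values quoted above; the helper name `predict_y` is hypothetical and only used for illustration.

```python
# Fitted line quoted in the answer: y = -0.0171x + 96.058
SLOPE = -0.0171
INTERCEPT = 96.058


def predict_y(x: float) -> float:
    """Predicted outcome for predictor value x (hypothetical helper)."""
    return SLOPE * x + INTERCEPT


prediction = predict_y(600)
print(round(prediction, 3))
```

A negative slope means each additional unit of x lowers the predicted y by about 0.0171.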
Question 4
Question 5
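\[ (\hat{p}_n - \hat{p}_y) \pm z \sqrt{\frac{\hat{p}_n(1-\hat{p}_n)}{n_n} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}} \]
\[ = (0.65 - 0.34) \pm 1.645 \sqrt{\frac{0.65(1-0.65)}{29} + \frac{0.34(1-0.34)}{15}} \]
= (0.06158, 0.5584)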
a)
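\[ (\hat{p}_n - \hat{p}_y) \pm z \sqrt{\frac{\hat{p}_n(1-\hat{p}_n)}{n_n} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}} \]
\[ = (0.58 - 0.41) \pm 1.645 \sqrt{\frac{0.58(1-0.58)}{33} + \frac{0.41(1-0.41)}{23}} \]
= (-0.05008, 0.39008)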
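The two intervals reported in this question can be reproduced with a minimal Python sketch of the unpooled normal-approximation interval for a difference of proportions. The function name is hypothetical. The sample sizes 29 and 33 are not legible in the working above; they are assumptions inferred from the reported interval endpoints. The multiplier 1.645 corresponds to a 90% confidence level.

```python
import math


def diff_proportion_ci(p1, n1, p2, n2, z=1.645):
    """Unpooled normal-approximation interval for p1 - p2 (hypothetical helper)."""
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error


# Sample sizes 29 and 33 are assumptions inferred from the reported intervals.
first = diff_proportion_ci(0.65, 29, 0.34, 15)
second = diff_proportion_ci(0.58, 33, 0.41, 23)

print([round(v, 5) for v in first])
print([round(v, 5) for v in second])
```

An interval that contains 0, like the second one, is consistent with no difference between the two proportions at this confidence level.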
Question 6
c) Go back to the dataset summarizer and scroll down. Paste in the output for question 6c given
below the inferential statistics and fill in the blank: replace the blank with a number that would
make the p-value lower than the p-value in question 6a.
Inferential statistics
Estimate of the difference between population means
xbar1-xbar2
0.12846
standard error of estimate xbar1-xbar2
0.59719
t test stat    df    two sided pvalue
0.21511        95    0.83014
To calculate the p-value H0:μ1=μ2 is assumed to be true
since the test is two sided H1 is H1:μ1≠μ2
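The test statistic and two-sided p-value in this output can be reproduced with a standard-library Python sketch. It assumes the statistic is the estimate divided by its standard error and integrates the Student's t density numerically with Simpson's rule; the helper names are hypothetical, and a statistics package would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central_half


# Values taken from the output above.
estimate, std_error, df = 0.12846, 0.59719, 95
t_stat = estimate / std_error
p_value = two_sided_p(t_stat, df)

print(round(t_stat, 5), round(p_value, 4))
```

A p-value this far above the usual 0.05 threshold means the data give no grounds for rejecting H0.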
b) From part (a) above, a p-value of 0.83014 is far larger than the usual 0.05 significance level,
so we fail to reject the null hypothesis: there is no evidence that the two population means are
significantly different.
c)
Question 7
a)
Inferential statistics
n1    n2    phat 1     phat 2
44    56    0.34091    0.410714286
Estimate of the difference between population proportions
phat1-phat2
-0.06980519
standard error of estimate    test stat       two sided pvalue
0.097783886                   -0.713872171    0.475306227
To calculate the p-value H0:p1=p2 is assumed to be true
since the test is two sided H1 is H1:p1≠p2
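The standard error, test statistic and p-value in this output are consistent with a pooled two-proportion z-test. The sketch below assumes that is how the summarizer computes them; the variable names are illustrative, and the inputs are the rounded values printed above, so results agree only to about four decimal places.

```python
import math
from statistics import NormalDist

# Values taken from the output above.
n1, n2 = 44, 56
phat1, phat2 = 0.34091, 0.410714286

# Pooled proportion under H0: p1 = p2
pooled = (n1 * phat1 + n2 * phat2) / (n1 + n2)
std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z_stat = (phat1 - phat2) / std_error
p_value = 2 * NormalDist().cdf(-abs(z_stat))

print(round(std_error, 5), round(z_stat, 4), round(p_value, 4))
```

With a two-sided p-value of about 0.475, the difference in proportions is not statistically significant at conventional levels.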
c)
                no    yes    total
chatty count    12    32     44
direct count    33    23     56
Question 8
a)
Inferential statistics
correlation r -0.7386924
R square 0.5456664
standard error of slope 0.0015771
test stat of slope -10.848988
two sided p-value for slope 1.756E-18
To calculate the p-value H0:population slope =0 is assumed to be true
since the test is two sided H1 is H1:population slope ≠0
b) From the output above, a correlation of -0.7386924 signifies a strong negative relationship
between the two variables. The R square of about 0.5457 means that about 54.57% of the variation
in the outcome variable is explained by the predictor variable.
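Two quick consistency checks on this output can be sketched in Python. In simple linear regression R square is the square of the correlation, and the slope test statistic is the slope divided by its standard error. The variable names are illustrative; the inputs are the values printed in the output.

```python
# Values taken from the output above.
r = -0.7386924
slope_se = 0.0015771
t_slope = -10.848988

# R square is the squared correlation in simple linear regression.
r_squared = r ** 2

# Slope implied by the reported test statistic and standard error.
implied_slope = t_slope * slope_se

print(round(r_squared, 4), round(implied_slope, 4))
```

The implied slope agrees with the coefficient of x in the fitted line reported in Question 3.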
c) lower pvalue
Question 9
The sample report is an account of a trial of sales commissions at the XYZ furniture store. A study
was conducted to establish the effect of individual and shared commissions on average sales. The
furniture store franchise XYZ gives individual commissions to sales staff in store 1 and shared
commissions to sales staff in store 2.
The dataset from both stores has details on the customer ID, the time when the sale was made
(before or after the commission), the amount purchased (the value of the sale), and national
average spending. From the dataset, one can group the customers' purchases and national average
spending on the basis of when the purchase was made in each store. The data is analysed
by taking the average and standard deviation of the amount purchased before and after the
commissions, as well as of the national spending. The findings of the trial show that both
commissions lead to an increase in sales. Individual commissions are better than shared
commissions since they cause a higher increase in average sales.
The report highlights the importance of hypothesis testing, which helps in measuring the evidence
for claims or assumptions before making inferences. A sample is used and, if the results of
the hypothesis test are significant, inference can be made to the whole population. To establish
the significance of the observed differences, a t-test is used. From the t-test, there is evidence
that both individual and shared commissions have a significant effect on average sales. It is also
noted that there is a significant difference between the increase due to individual commissions
and the increase due to shared commissions. Thus, the XYZ furniture store should give individual
commissions to sales staff if it needs to increase its average sales significantly.
Question 10
P-values can be used to establish whether there is strong evidence of a relationship between
variables or none. In this summary the variables of interest are “rent” and “location Sydney or
not Sydney”. Several samples are obtained and a p-value is computed for each sample. The p-values
then form a distribution. Under the null hypothesis, the p-value is uniformly distributed between
0 and 1: it has a 5% probability of being less than 0.05 and a 10% probability of being less than
0.1.
When there is a big difference in the variables of interest, the relationship between them is
strong. This is because when the difference between the population means is big, most of the
samples obtained from the population will also show big differences, hence strong evidence of a
difference between population means and consequently low p-values. In the first case, 75% of the
2600 samples had a p-value less than 0.0004, 50% had a p-value less than 0.0001 and 25% had a
p-value less than 5E-08. If one variable is reduced, the difference in means reduces as well,
which increases the p-values. The distribution of the p-values is anti-conservative: most values,
and the peak, are close to zero.
When the difference between means is almost negligible, the p-values tend to be high, so there is
no evidence of a relationship between the variables in the population. In such a case the
distribution of the p-values will be uniform. That is, almost 75% of the p-values will be less
than 0.75, almost 50% will be less than 0.5 and almost 25% will be less than 0.25. The percentage
of p-values below a threshold is proportional to the threshold. The distribution of p-values in
the second case is a uniform distribution.
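The two cases described in this answer can be illustrated with a small simulation. This is a sketch on made-up data, not the rent dataset. It assumes normally distributed groups with a known standard deviation and uses a two-sided z-test so that only the standard library is needed. The group size, effect size and seed are hypothetical choices; only the number of samples (2600) comes from the text.

```python
import math
import random
from statistics import NormalDist, quantiles


def simulate_p_values(true_diff, n_samples=2600, n=30, sigma=1.0, seed=0):
    """Two-sided z-test p-values for repeated two-group samples (known sigma)."""
    rng = random.Random(seed)
    std_error = sigma * math.sqrt(2 / n)
    norm = NormalDist()
    p_values = []
    for _ in range(n_samples):
        mean_a = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        mean_b = sum(rng.gauss(true_diff, sigma) for _ in range(n)) / n
        z = (mean_b - mean_a) / std_error
        p_values.append(2 * norm.cdf(-abs(z)))
    return p_values


null_p = simulate_p_values(true_diff=0.0)  # no difference between means
alt_p = simulate_p_values(true_diff=1.0)   # large difference between means

# Quartiles are roughly 0.25, 0.5, 0.75 when there is no true difference.
print([round(q, 2) for q in quantiles(null_p, n=4)])
# Quartiles pile up near zero when the true difference is large.
print([round(q, 5) for q in quantiles(alt_p, n=4)])
```

Shrinking `true_diff` toward zero moves the second set of quartiles back toward the uniform pattern.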