ADVANCED DATA ANALYTICS TUTORIAL PRACTICE QUESTIONS Session 2

The document contains 5 tasks related to advanced data analytics concepts. Task 1 explains the 4 V's of big data and types of t-tests. Task 2 involves hypothesis testing on mean claims and visiting times. Task 3 examines the relationship between TV sales and advertisements through scatter plots, correlation, and linear regression. Task 4 defines parametric and non-parametric tests and gives examples. Task 5 involves a one-way ANOVA test to compare earnings across company groups.

Uploaded by

bida22-016

ADVANCED DATA ANALYTICS TUTORIAL QUESTIONS

Task 1
With the aid of examples, identify and briefly explain the 4 V's of big data.
List three types of application of the t-test.
The following data represents hemoglobin values in gm/dl for 10 patients.
Perform a t-test to determine whether the mean value for the patients differs significantly from the
mean value of the general population (12 gm/dl). Evaluate the role of chance.

Table: Hemoglobin values for the 10 patients

Hemoglobin (gm/dl)   10   9   6.5   8   11   7   7.5   8.5   9.5   12

Compute the sample mean for the patients.
Compute the sample variance for the patients.
Compute the t value.
Determine the critical value of the test statistic at the α = 0.05 level.
Is there a statistically significant difference between the patients' mean hemoglobin value and the general population mean (12 gm/dl) at α = 0.05?
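
The one-sample t-test steps listed above (mean, sample variance, t value, comparison with a critical value) can be sketched in Python as follows. This is only an illustrative sketch using the standard library. The variable names are assumptions, and the critical value 2.262 is the usual two-tailed t-table entry for 9 degrees of freedom at α = 0.05, hard-coded here rather than computed.

```python
import math
import statistics

# Hemoglobin values (gm/dl) for the 10 patients, taken from the table in Task 1
hemoglobin = [10, 9, 6.5, 8, 11, 7, 7.5, 8.5, 9.5, 12]
population_mean = 12.0  # mean of the general population (gm/dl)

n = len(hemoglobin)
sample_mean = statistics.mean(hemoglobin)
sample_variance = statistics.variance(hemoglobin)  # sample variance, divides by n - 1

# t = (sample mean - hypothesised mean) / standard error of the mean
standard_error = math.sqrt(sample_variance / n)
t_value = (sample_mean - population_mean) / standard_error

# Assumption: two-tailed t-table value for df = n - 1 = 9 at alpha = 0.05
t_critical = 2.262
reject_null = abs(t_value) > t_critical

print(f"mean = {sample_mean:.2f}, variance = {sample_variance:.2f}")
print(f"t = {t_value:.3f}, critical = ±{t_critical}, reject H0: {reject_null}")
```

If the absolute t value exceeds the critical value, the difference from 12 gm/dl is unlikely to be due to chance alone at the 5% level.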
Task 2a
An insurance company is reviewing its current policy rates. When originally setting the rates, it
believed that the average claim amount would be at most P180000. The company is concerned that the
true mean is actually higher than this, because it could potentially lose a lot of money. It
randomly selects 40 claims and calculates a sample mean of P195000. Assume that the standard
deviation of claims is P50000 and set α = 0.05. Test whether the insurance company should be
concerned by answering the following sub-questions.
State the null and alternative hypotheses.
Compute the z value.
Determine the critical value of the test statistic at α = 0.05.
Draw a conclusion on whether the insurance company should be concerned or not.
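
The steps for Task 2a can be sketched as a one-sided (upper-tail) z-test. This is a minimal sketch, assuming H0: μ ≤ 180000 against H1: μ > 180000 and treating P50000 as the known population standard deviation. The variable names are illustrative, not part of the task.

```python
import math
from statistics import NormalDist

hypothesised_mean = 180_000  # H0: mu <= 180000 (assumed formulation)
sample_mean = 195_000
sigma = 50_000               # standard deviation of claims, treated as known
n = 40
alpha = 0.05

# z = (sample mean - hypothesised mean) / (sigma / sqrt(n))
z_value = (sample_mean - hypothesised_mean) / (sigma / math.sqrt(n))

# Upper-tail critical value of the standard normal distribution
z_critical = NormalDist().inv_cdf(1 - alpha)
reject_null = z_value > z_critical

print(f"z = {z_value:.4f}, critical = {z_critical:.4f}, reject H0: {reject_null}")
```

A z value above the critical value would suggest that the true mean claim is higher than P180000, which is the situation the company is worried about.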
Task 2b
The manager of a large shopping mall in Knysna believes that visitors to the mall spend, on average,
85 minutes in the mall on any one occasion. To test this belief, the manager commissioned a
study which found that, in a random sample of 132 visitors to the mall, the average visiting
time was 80.5 minutes. Assume a population standard deviation of 25 minutes and that visiting time
is approximately normally distributed.
Formulate the null and alternative hypotheses for this test situation.
Briefly discuss when it is most suitable to use a z-test and when it is most
suitable to use a t-test.
Which test statistic (z or t) is appropriate for the tests described above, and why?
Conduct a hypothesis test for a single mean at the 5% significance level to support or refute the
manager's belief.
What management conclusion would be drawn from the findings above?
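
The single-mean test in Task 2b can be sketched as a two-tailed z-test, since the population standard deviation is given. This is a hedged sketch, assuming H0: μ = 85 against H1: μ ≠ 85. The p-value is included as an optional cross-check and is not asked for in the task.

```python
import math
from statistics import NormalDist

hypothesised_mean = 85.0  # manager's belief (minutes)
sample_mean = 80.5
sigma = 25.0              # population standard deviation (minutes)
n = 132
alpha = 0.05

z_value = (sample_mean - hypothesised_mean) / (sigma / math.sqrt(n))

# Two-tailed test: split alpha across both tails
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
reject_null = abs(z_value) > z_critical

# Two-tailed p-value as a cross-check on the critical-value decision
p_value = 2 * NormalDist().cdf(-abs(z_value))

print(f"z = {z_value:.4f}, critical = ±{z_critical:.4f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject_null}")
```

The critical-value rule and the p-value rule always agree: H0 is rejected exactly when the p-value is below α.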

Task 3
Century Office Supplies, an electronics retail company in Francistown, has recorded the
number of flat-screen TVs sold each week and the number of advertisements placed weekly
for a period of 12 weeks.

Table: Weekly flat-screen TV sales and newspaper advertisements placed

Advertisements    4   4   3   2   5   2   4   3   5   5   3   4

Sales            26  28  24  18  35  24  36  25  31  37  30  32

Use the table above to answer the following questions:


Construct a scatter plot of TV sales against advertisements

Calculate the correlation coefficient r.

Comment on the strength and direction of the association between TV sales and
advertisement.

Find the linear regression model that Century Office Supplies can use to estimate sales for
each week, based on the number of advertisements placed.

Plot the line representing your model above.

Estimate the mean sales of flat-screen TVs when three advertisements are placed.

Compute the residuals when 3 advertisements are placed.


Briefly discuss any two assumptions underlying the use of a linear regression model.
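
The numerical parts of Task 3 (correlation coefficient, least-squares line, estimate at three advertisements, and residuals) can be sketched as below. This is an illustrative sketch using the textbook sums-of-squares formulas. The names `s_xx`, `s_yy`, `s_xy` and `predict` are assumptions introduced here, and the scatter plot and line plot are left out to keep the example dependency-free.

```python
import math

# Data from the Task 3 table (12 weeks)
ads = [4, 4, 3, 2, 5, 2, 4, 3, 5, 5, 3, 4]
sales = [26, 28, 24, 18, 35, 24, 36, 25, 31, 37, 30, 32]

n = len(ads)
mean_x = sum(ads) / n
mean_y = sum(sales) / n

# Corrected sums of squares and cross-products
s_xx = sum((x - mean_x) ** 2 for x in ads)
s_yy = sum((y - mean_y) ** 2 for y in sales)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ads, sales))

# Pearson correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares line: sales = intercept + slope * ads
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def predict(num_ads):
    """Estimated weekly sales for a given number of advertisements."""
    return intercept + slope * num_ads


predicted_at_3 = predict(3)

# Residual = observed sales - fitted sales, for each week with 3 advertisements
residuals_at_3 = [y - predicted_at_3 for x, y in zip(ads, sales) if x == 3]

print(f"r = {r:.4f}")
print(f"sales = {intercept:.4f} + {slope:.4f} * ads")
print(f"estimate at 3 ads = {predicted_at_3:.4f}")
print("residuals at 3 ads:", [round(e, 3) for e in residuals_at_3])
```

Three weeks in the table have exactly three advertisements, so the sketch reports three residuals, one per observed week.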

Task 4
What is a parametric test and what is the difference between a parametric test and a
non-parametric test?
Give two examples of non-parametric tests and two examples of parametric tests.
Identify and discuss three different use cases of the chi-square test.

The table below shows the number of accidents that occurred throughout the week on the A1 road.
Table: Number of accidents by day of the week

Day of week           Sun  Mon  Tue  Wed  Thu  Fri  Sat

Number of accidents    11   13   14   13   15   14   18

You are required to use the data in the table above to test whether accidents occur uniformly
over the week. Answer the following questions in order to achieve your goal:
State the null and alternative hypotheses.
Determine the tabulated chi-square value at the 5% significance level.
Compute the chi-square value.
State your conclusion.
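
The goodness-of-fit calculation above can be sketched as follows. This is a minimal sketch, assuming H0 states that accidents are spread uniformly over the seven days, so every day has the same expected count. The tabulated value 12.592 is the standard chi-square table entry for 6 degrees of freedom at the 5% significance level, hard-coded as an assumption rather than computed.

```python
# Observed accidents per day, from the table in Task 4
accidents = {
    "Sun": 11, "Mon": 13, "Tue": 14, "Wed": 13,
    "Thu": 15, "Fri": 14, "Sat": 18,
}

observed = list(accidents.values())
total = sum(observed)

# Under H0 (uniform over the week) each day has the same expected count
expected = total / len(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all days
chi_square = sum((o - expected) ** 2 / expected for o in observed)

degrees_of_freedom = len(observed) - 1

# Assumption: chi-square table value for df = 6 at the 5% significance level
chi_critical = 12.592
reject_null = chi_square > chi_critical

print(f"expected per day = {expected:.1f}")
print(f"chi-square = {chi_square:.3f}, df = {degrees_of_freedom}")
print(f"critical = {chi_critical}, reject H0: {reject_null}")
```

A computed value below the tabulated value means the data give no evidence against a uniform spread of accidents.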

Task 5

The table below shows data from a random sample of average earnings per share (EPS) for
commercial banks, retailing companies, transport and logistics companies, and utility
companies (variable Companies) for the same time period in 2019. You are required to
carry out a one-way analysis of variance (ANOVA) test to determine whether there are any
differences in the mean earnings per share among the 4 groups of companies.
Answer the sub-questions below to perform the ANOVA test.

Table: Earnings per share by company group (variable Companies)

Commercial banks   Retailing companies   Transport & logistics   Utility companies

42                 48                    68                      80
50                 66                    52                      94
62                 68                    76                      78
34                 78                    64                      82
52                 70                    70                      66

State the null and alternative hypotheses.


Compute the total sum of squares.
Compute the between-sample sum of squares.
Compute the within-sample sum of squares.
Compute the associated mean square terms.
Compute the F-ratio.
Construct and complete an ANOVA table.
Determine the tabulated F value at α = 0.05.
Determine whether there are any differences in the mean earnings per share among the 4 groups
of companies.
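
The one-way ANOVA steps above can be sketched as below. This is an illustrative sketch, assuming the table columns map to the four groups in the order listed (commercial banks, retailing, transport and logistics, utility). The tabulated value 3.24 is the standard F-table entry for (3, 16) degrees of freedom at α = 0.05, hard-coded as an assumption rather than computed.

```python
# EPS data by group; column-to-group mapping is assumed from the header order
groups = {
    "Commercial banks": [42, 50, 62, 34, 52],
    "Retailing": [48, 66, 68, 78, 70],
    "Transport & logistics": [68, 52, 76, 64, 70],
    "Utility": [80, 94, 78, 82, 66],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# Total sum of squares: every observation against the grand mean
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# Between-sample sum of squares: group means against the grand mean
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within-sample sum of squares: observations against their own group mean
ss_within = sum(
    (v - sum(values) / len(values)) ** 2
    for values in groups.values()
    for v in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

# Assumption: F-table value for (3, 16) degrees of freedom at alpha = 0.05
f_critical = 3.24
reject_null = f_ratio > f_critical

# Print the ANOVA table
print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between':<10}{ss_between:>10.1f}{df_between:>5}{ms_between:>10.1f}{f_ratio:>8.3f}")
print(f"{'Within':<10}{ss_within:>10.1f}{df_within:>5}{ms_within:>10.1f}")
print(f"{'Total':<10}{ss_total:>10.1f}{df_between + df_within:>5}")
print(f"critical F = {f_critical}, reject H0: {reject_null}")
```

The between and within sums of squares are computed independently here, so checking that they add up to the total sum of squares is a useful arithmetic check.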
