
Adstat Final Exam Reviewer

MODULE 4A - Testing the difference between two means using the z-test

Tests of difference between population means
• If σ1 and σ2 are known: use the z-test (use the normal distribution).
• If σ1 and σ2 are unknown: use the pooled-variance t-test or the separate-variance t-test (use Student's t-distribution).

Formula for the test statistic Z

Note: d0 is the hypothesized difference. It is equal to zero if the two population means are conjectured to be equal.

ASSUMPTIONS FOR Z-TEST
1. The samples are random samples.
2. The sample data are independent of one another.
3. The standard deviations of both populations must be known.
4. If the sample sizes are less than 30, the populations must be normally or approximately normally distributed.

Example 1
A survey found that the average hotel room rate in Makati City is P4,420 while the average room rate in Manila is P4,030. Assume that the data were obtained from two samples of 50 hotels each and that the standard deviations of the populations are P281 and P241, respectively. At α = 0.05, can it be concluded that there is a significant difference in the rates?

Traditional Method
1. State the null and alternative
hypotheses. Identify the claim. It is
necessary to identify group 1 and
group 2.
2. Test for normality if sample sizes are
less than 30. If satisfied, compute the
test statistic, 𝑧.
3. Find the critical value.
4. Make the decision.
5. Summarize the results.
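The formula slide itself is not reproduced in this reviewer, so the sketch below assumes the usual two-sample z statistic, z = ((x̄1 − x̄2) − d0) / √(σ1²/n1 + σ2²/n2), applied to Example 1:

```python
import math

# Two-sample z-test for Example 1 (hotel room rates), assuming the usual
# statistic z = ((x1bar - x2bar) - d0) / sqrt(sd1^2/n1 + sd2^2/n2).
x1, x2 = 4420, 4030      # sample mean rates (Makati, Manila)
sd1, sd2 = 281, 241      # known population standard deviations
n1 = n2 = 50
d0 = 0                   # hypothesized difference under Ho

z = ((x1 - x2) - d0) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)
print(round(z, 2))       # 7.45
```

Since 7.45 exceeds the critical value 1.96 at α = 0.05 (two-tailed), Ho is rejected: the rates differ significantly.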

*Example 2 was done in Excel, so it is not included here; if you want to review it, look at Module 4a slide 15.

Module 4B - Test of difference between means of two independent groups using t-test

Two sample tests for the Mean: Independent populations
(Two population means, independent samples.)
1. State the null and alternative hypotheses. Identify the claim.
2. Determine if the population variances are equal or unequal. Compute the test statistic, t.
3. Find the critical value.
4. Make the decision.
5. Summarize the results.

Equal or Unequal Variances
• One type of t-test assumes that the variances of the two populations are equal. This is called the pooled-variance t-test.
• Another type of t-test assumes that the variances of the two populations are not equal. This is called the separate-variance t-test or Welch's t-test.
• The F test for the equality of population variances may be used to determine which test (pooled or separate variance) should be used.

Assumptions for the two independent sample t-Test with unknown population variances
• Samples are randomly and independently drawn.
• If the sample size of either or both groups is less than 30, the populations are normally distributed.
• Large sample sizes (at least 30 for each group).
Example #1 (Traditional Method)
A researcher wishes to determine who talks more – men or women. Random samples of 56 men and 56 women from a large university were equipped with a small device that secretly records sounds for a random 30 seconds during each 12.5-minute period over two days. Then, the number of words spoken by each subject were counted during each recording period and were estimated. The female estimates had a mean of 16,177 words per day with a standard deviation of 7520 words per day. For the male estimates, the mean was 16,569 and the standard deviation was 9108 words per day. Do these data provide convincing evidence of a difference in the average number of words spoken in a day by male and female students at the university? Assume unequal population variances. Use 0.05 level of significance.

Example #2 (Traditional Method)
A study was made to determine if a statistics course is better understood when a lab constitutes part of the course. Students were allowed to choose between section A (4-semester-hour course with lab) and section B (3-semester-hour course without lab). The 11 students who enrolled in section A got an average grade of 85 with a standard deviation of 4.7, while the 17 students who enrolled in section B got an average of 79 with a standard deviation of 6.1. Would you say that the laboratory course increases the average grade by as much as 8 points? Use 0.01 level of significance and assume the populations to be approximately normal with equal variances.
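The test values for the two examples above can be sketched as follows, using the standard separate-variance (Welch) and pooled-variance t formulas; the formula slides themselves are not reproduced in this reviewer:

```python
import math

# Test values for Examples #1 and #2 above: Welch's (separate-variance) t
# for Example #1 and the pooled-variance t for Example #2.
def welch_t(x1, s1, n1, x2, s2, n2, d0=0):
    return (x1 - x2 - d0) / math.sqrt(s1**2 / n1 + s2**2 / n2)

def pooled_t(x1, s1, n1, x2, s2, n2, d0=0):
    # Pooled variance: sp^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (x1 - x2 - d0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Example #1: female vs male words per day, unequal variances.
t1 = welch_t(16177, 7520, 56, 16569, 9108, 56)
print(round(t1, 3))   # -0.248

# Example #2: section A vs B grades, equal variances, claimed gain d0 = 8.
t2 = pooled_t(85, 4.7, 11, 79, 6.1, 17, d0=8)
print(round(t2, 2))   # -0.92
```

In both cases the small test value leads to non-rejection of the respective null hypothesis at the stated significance level.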

Note: μd is the mean of the population differences of the matched pairs. On the other hand, d0 is the hypothesized value of μd.

Assumptions for the two dependent sample t-Test with unknown population variances
• The samples are random.
• The samples are dependent or matched.
• When the sample size or sample sizes are less than 30, the population or populations must be normally or approximately normally distributed.
Note: In the succeeding problems, the above-mentioned assumptions are met. However, in other situations, the assumptions must be checked before proceeding.

*Example #3 was done in Excel, so it is not included here; to review it, check Module 4b slide 19.

Module 4C - Test of difference between two means: Dependent groups

The hypotheses when samples are dependent are as follows:

Example #1 using the Traditional method
As an aid for improving students' study habits, nine students were randomly selected to attend a seminar on the importance of education in life. The table shows the number of hours each student studied per week before and after the seminar. At α = 0.10, did attending the seminar increase the number of hours the students studied per week?

*Example #2 was done in Excel/JASP, so it is not included here; to review it, check Module 4c slide 13.
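Since the reviewer's hours table is not reproduced, here is a sketch of the dependent-samples test value t = (d̄ − d0) / (s_d / √n) using hypothetical before/after study hours for nine students:

```python
import math

# Dependent-samples t sketch, t = (dbar - d0) / (s_d / sqrt(n)).
# The reviewer's actual hours table is not reproduced, so these
# before/after study hours are hypothetical.
before = [9, 12, 6, 15, 3, 18, 10, 13, 7]
after  = [11, 13, 9, 15, 5, 19, 14, 15, 10]

d = [a - b for a, b in zip(after, before)]   # matched-pair differences
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))

t = (d_bar - 0) / (s_d / math.sqrt(n))       # d0 = 0 under Ho: mu_d = 0
print(round(t, 2))
```

The computed t is then compared with the critical t value at df = n − 1 for the chosen α.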

Module 5a – Chi square distribution

The Chi-square Distribution


• The Chi-square distribution is a
continuous probability distribution. It
is the distribution of a sum of the
squares of k independent standard
normal random variables.
• In the Chi-square distribution, as the degrees of freedom increase, the Chi-square distribution approaches a normal distribution.

Characteristics of a Chi-square
Distribution:
1. The chi-square distribution is a family
of curves based on the degrees of
freedom.
2. The chi-square distributions are
positively skewed.
3. All chi-square values are greater than
or equal to zero.
4. The total area under each chi-square
distribution is equal to one.
General Assumptions of a Chi-square
Distribution:
1. The sample was chosen using a
random sampling method.
2. The variables being analyzed are
categorical (nominal or ordinal).
3. All chi-square values are greater than
or equal to zero.
4. The total area under each chi-square
distribution is equal to one.
The Chi-square Distribution can be used to
• find a confidence interval for a variance or standard deviation
• test a hypothesis about a single variance or standard deviation
• test concerning frequency distributions
• test the goodness-of-fit
• test for independence of two categorical variables
• test the homogeneity of proportions
• test the normality of the variable

Chi-square Test // i.e., whether there is a difference
A chi-square test (or chi-squared test), denoted by χ2, is a statistical hypothesis test
• used to investigate whether distributions of categorical variables (at the nominal or ordinal levels of measurement) significantly differ from one another.
• commonly used to compare observed data (actual value) with data we would expect to obtain (expected value) according to a specific hypothesis.
• used to test information about the proportion or percentage of people or things who fit into a category.

Chi-square Goodness-of-Fit Test // i.e., whether the data follow a pattern
The chi-square goodness-of-fit test is used if we would like to see whether the distribution of data follows a specific pattern.
For example:
• You would like to see whether the values obtained from an actual observation on the monthly dividend in stocks differ considerably from the expected value.
• You may want to investigate whether the fluctuation in interest rates during Sundays is higher than on the rest of the days in a week.

Chi-square Test of Independence // i.e., whether the variables are related to each other
• can be used to test the independence of two variables
• is used when we would like to see:
  • whether or not two random variables take their values independently.
  • whether the value of one relates with another.
  • whether one variable is associated with another.
• this test of hypothesis uses the chi-square distribution and the contingency table.
For example:
• Based on the distribution of data, you want to see whether the success of an individual in his chosen career is independent of or related to his academic performance in college. Here, the two variables involved are the success of an individual in his chosen career and his academic performance in college.
• You may want to see whether the life in years of laptops is independent of brand. Here, the two variables involved are the life in years of laptops and the brand of laptops.
• A study which involves determining if job satisfaction can be associated with income. The two variables are job satisfaction and income.

Chi-square Test for Homogeneity of Proportions
• can also be used to test the homogeneity of proportions.
• this is used to determine whether the proportions for a variable are equal when several samples are selected from different populations.
• this also uses the chi-square distribution and the contingency table.
For example:
• You would like to see if the proportions of each group of students who play online gaming are equal based on their program of affiliation, say the proportions of accountancy students, engineering students, and architecture students who play online gaming.
• You may want to see if the proportions of employees who are into the stock market are equal based on the nature of their profession (IT, Medicine, Accounting, Engineering).

The two main types of Chi-square Tests to be discussed here are:
• Goodness-of-fit tests, which focus on one categorical variable.
• Tests of independence, which focus on the relationship between two categorical variables. Thus, the contingency table (or cross-tabulation table) will be used to present the data values.

To illustrate the use of the chi-square test:
If, according to Mendel's laws, you expect 10 of 20 offspring to be male and the actual observed number was 8 males, then you might want to know about the "goodness-of-fit" between the observed and expected data. Were the deviations (differences between observed and expected values) the result of chance, or were they due to other factors? How much deviation can occur before we conclude that something other than chance is at work, causing the observed to differ from the expected value?
The chi-square test is always testing what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed results.

Test for Goodness-of-Fit
The chi-square goodness-of-fit test is used to test the claim that an observed frequency distribution fits some given expected frequency distribution.
Assumptions of Chi-square Goodness-of-Fit Test:
1. The data are obtained from a random sample.
2. The expected frequency for each category must be 5 or more.
• If the observed frequencies are close to the corresponding expected frequencies, the χ2-value will be small, indicating a good fit.
• If the observed frequencies differ considerably from the expected frequencies, the χ2-value will be large and the fit is poor.
• A good fit leads to the non-rejection of H0, whereas a poor fit leads to its rejection.

To calculate the expected frequencies, there are two rules to follow:
1. If all the expected frequencies are equal, the expected frequency E can be calculated by using E = n/k, where n is the total number of observations and k is the number of categories.
2. If all the expected frequencies are not equal, then the expected frequency E can be calculated by E = n·p, where n is the total number of observations and p is the probability for that category (or p is the hypothesized proportion from the null hypothesis).

Consider the following as an example:
A quality control officer of a laptop manufacturing company would like to see if there was a difference in the life span of laptop batteries among three categories. A sample of 45 student laptop owners is selected. The table below shows the distribution of the life span of laptop batteries in years. If there were no difference, you would expect an expected frequency of 45/3 = 15 batteries for each life-span category.

Category: 4 years and below | More than 4 years and below 10 years | Above 10 years
Observed frequency: 12 | 19 | 14

The observed frequencies will almost always differ from the expected frequencies due to sampling error; that is, the values differ from sample to sample. But the question is: are these differences significant (which would mean there is a difference in the life span of the batteries across the categories), or will they be due to chance only? Thus, the two opposing statements, the null and alternative hypotheses, are necessary before computing the test value. Here, the null hypothesis indicates that there is no difference or change among the categories.
Ho: There is no difference in the life span of laptop batteries among the three categories.
H1: There is a difference in the life span of laptop batteries among the three categories.

Summary Procedures in conducting a Chi-Square Goodness-of-Fit Test:
1. State the hypothesis and identify the claim.
2. Find the critical value from the chi-square table. The test is always right-tailed.
3. Compute the test value using the formula.
4. Make the decision.
• Reject the null hypothesis if the test value is greater than the critical value.
• Do not reject the null hypothesis if the test value is less than the critical value.
5. Summarize the results.

Example #1
A quality control officer of a laptop manufacturing company would like to see if the life spans of laptop batteries are equally distributed among three categories. A sample of 45 student laptop owners is selected. The table below shows the distribution of the life span of laptop batteries in years. At α = 0.05, can it be considered that the life spans of laptop batteries are equally distributed among the three categories?

Category: 4 years and below | More than 4 years and below 10 years | Above 10 years
Observed frequency: 12 | 19 | 14

*Note that this problem involves only one categorical variable, the life span of laptop batteries classified into three (4 years and below, more than 4 years and below 10 years, above 10 years), so we use the goodness-of-fit test.

Example #2
A financial analyst wants to determine whether investors have any preference on the type of investment. A sample of 93 investors were interviewed and provided the information shown on the table below. At 0.10 level of significance, is there a difference in investment preferences among the investors?

Types of Investment: Frequency
Stocks: 35
Mutual Funds: 18
Bonds: 30
Index Funds: 10

*Note that this problem involves only one categorical variable, the types of investment classified into four (stocks, mutual funds, bonds, index funds), so we use the goodness-of-fit test.

Example #3
An article shows statistics of orders made online on a particular product with different online stores within the city. The data, based on the last six months of the previous year, are as follows: July 17%, August 11%, September 8%, October 14%, November 27%, and December 23%. The CECT online store manager wants to compare the orders made with his store against the data revealed by the article. The manager listed the number of orders in his store on the same product stated in the article. The table below shows the data collected by the manager for the last six months of the previous year. At 0.01 level of significance, can we support the claim that the proportions of orders with the CECT online store are the same as for the rest of the online stores within the city?

Months: Number of Orders made with CECT store
July: 27
August: 17
September: 22
October: 45
November: 30
December: 59

*Note that this problem involves only one categorical variable, the months covered in a year, so we use the goodness-of-fit test.

Module 5b – Test for Independence

Test for Independence (CATEGORICAL DATA)
The chi-squared test procedure can also be used to test the hypothesis of independence of two variables of classification.
A contingency table with r rows and c columns is referred to as an r × c table.
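The two expected-frequency rules can be applied directly to Examples #1 and #3 above. A sketch, using the usual test value χ2 = Σ(O − E)²/E (the formula slide itself is not reproduced here):

```python
# Chi-square goodness-of-fit test values for Examples #1 and #3 above,
# using chi2 = sum((O - E)^2 / E).
def chi2_gof(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Example #1: equal expected frequencies, E = n/k = 45/3 = 15.
obs1 = [12, 19, 14]
exp1 = [sum(obs1) / len(obs1)] * len(obs1)
print(round(chi2_gof(obs1, exp1), 2))   # 1.73

# Example #3: unequal expected frequencies, E = n*p from the article.
obs3 = [27, 17, 22, 45, 30, 59]                 # July..December, n = 200
props = [0.17, 0.11, 0.08, 0.14, 0.27, 0.23]
exp3 = [sum(obs3) * p for p in props]
print(round(chi2_gof(obs3, exp3), 2))   # 29.49
```

For Example #1, 1.73 is below the critical value 5.991 (df = 2, α = 0.05), so Ho is not rejected; for Example #3, 29.49 exceeds 15.086 (df = 5, α = 0.01), so Ho is rejected.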

*The expected frequency E is computed by multiplying the subtotals of the intersecting categories, then dividing the product by the grand total.
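That rule can be sketched as follows; the 2 × 2 observed counts below are hypothetical, not from the reviewer's examples:

```python
# Expected frequencies for a contingency table, following the rule above:
# E = (row subtotal * column subtotal) / grand total.
# The 2x2 observed counts below are hypothetical.
table = [[20, 15],
         [30, 23]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

expected = [[r * c / grand for c in col_totals] for r in row_totals]

# Test value: chi2 = sum over all cells of (O - E)^2 / E.
chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(len(table)) for j in range(len(table[0])))
```

Note that the expected frequencies always add back up to the grand total, which is a quick sanity check when computing them by hand.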

Summary Procedures in conducting an Independence Test:
1. State the hypothesis and identify the
claim.
2. Find the critical value for the right tail
using the chi-square table. Determine
the degrees of freedom using the
formula df = (r – 1) (c – 1).
3. Compute the test value. To compute
the test value, first find the expected
values. For each cell of the
contingency table, use the formula
to get the expected value. To find the
test value, use the formula.

4. Make the decision.


•Reject the null hypothesis if the test
value is greater than the critical value.
•Do not reject the null hypothesis if
the test value is less than the critical
value.
5. Summarize the results.

When there is only one degree of freedom (this means that a 2 × 2 contingency table is given), Yates' correction for continuity is applied by reducing the absolute value of each difference by 0.5 before squaring. Hence, the formula to use is
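The corrected formula (χ2 = Σ(|O − E| − 0.5)²/E) can be sketched as follows; the observed and expected values passed in are hypothetical:

```python
# Yates' continuity correction sketch for a 2x2 table (df = 1):
# chi2 = sum((|O - E| - 0.5)^2 / E), per the rule stated above.
# The observed/expected cell values passed in below are hypothetical.
def chi2_yates(observed, expected):
    return sum((abs(o - e) - 0.5) ** 2 / e
               for o, e in zip(observed, expected))

# Flattened 2x2 cells: observed counts, then their expected frequencies.
print(chi2_yates([10, 20, 20, 10], [15, 15, 15, 15]))
```
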

Example #1
An education analyst wishes to see whether the academic achievement a person has completed is related to his or her socio-economic status. A sample of 88 people is randomly selected. At α = 0.05, can it be concluded that a person's academic achievement is dependent on the person's socio-economic status?

*Note that this problem involves two categorical variables, the academic achievement a person has completed and his or her socio-economic status, so we use the independence test.

Example #2
A study was conducted to see if there was a relationship between memory recall and the length of gadget usage per day of children. A sample of 338 grade level pupils is randomly selected and the results are shown on the table below. At α = 0.01 level of significance, can it be assumed that memory recall and the length of gadget usage per day of children are dependent?

*Note that this problem involves two categorical variables, memory recall and the length of gadget usage per day of children, so we use the independence test.

Module 5C - NONPARAMETRIC TESTS

Classifications of Test of Hypothesis
PARAMETRIC:
• Parametric hypothesis tests require the estimation of one or more unknown parameters (e.g., population mean or variance).
• Large sample sizes are often required to invoke the Central Limit Theorem.
NONPARAMETRIC:
• Usually focus on the sign or rank of the data rather than the exact numerical value.
• Do not specify the shape of the parent population.
• Can often be used in smaller samples.
• Can be used for ordinal data.
• Nonparametric methods are procedures that work their magic without reference to specific parameters (or measures of the population).

Parametric Test: If the information about the population is completely known by means of its parameters (i.e., µ and σ), then we use the parametric tests (e.g., t-test, z-test, F-test, ANOVA, Pearson correlation coefficient).
Nonparametric Test: If there is no knowledge about the population parameters (i.e., µ or σ), but it is still required to test the hypothesis of the population, then we use the nonparametric tests (e.g., single-sign test, Wilcoxon signed rank test, Wilcoxon rank sum test, chi-square test, Kruskal-Wallis test, Spearman rank correlation).

Difference between Parametric and Nonparametric
Parametric | Nonparametric
The test statistic is based on the distribution. | The test statistic is arbitrary.
Information about the population is completely known (given σ or µ). | No information about the population is available.
Specific assumptions are made regarding the population. | No assumptions are made regarding the population.
The null hypothesis is made on parameters of the population distribution. | The null hypothesis is free from parameters.
Focus is on the actual numerical value. | Focus is on the sign or rank of the data.
Parametric tests require the estimation of one or more unknown parameters (population parameters). | Nonparametric tests assume no knowledge of the distributions of the population, except that they are continuous.
Parametric tests are applicable only for variables. | Nonparametric tests are applicable to both variables and attributes.
No parametric test exists for nominal scale data. | Nonparametric tests do exist for nominal and ordinal scale data.
The parametric test is more powerful when the data assume normality. | The nonparametric test is more efficient when the data seriously depart from normality.

Advantages and Disadvantages of Nonparametric Tests
• If parametric and nonparametric tests are both applicable to the same data set, we should carry out the more efficient parametric technique over its nonparametric counterpart.
• Since we do not always have quantitative measurements and the assumption of normality is not at all times justified, the nonparametric tests or distribution-free methods will complement their customary parametric counterparts.
• With nonparametric tests, the data analyst has more ammunition to accommodate a wider variety of experimental situations.

Advantages of Nonparametric Tests
1. They can be used to test population parameters when the variable is not normally distributed.
2. They can be used when the data are nominal or ordinal.
3. They can be used to test hypotheses that do not involve population parameters.
4. They can be used in small samples; thus, assumptions of normality are not required.
5. In most cases, the computations are easier than those for the parametric methods.
6. Their interpretation is often more direct than the interpretation of parametric tests.
7. Nonparametric tests are simple and easy to understand.
8. They do not involve complicated sampling theory.
9. No assumption is made regarding the population.

Disadvantages of Nonparametric Tests


1. They are less sensitive than their
parametric counterparts when the
assumptions of the parametric
methods are met.
2. Larger differences are needed before
the null hypothesis can be rejected.
3. They do not utilize all the information
provided by the sample.
4. Require special tables for small
samples.
5. They are less efficient than their
parametric counterparts when the
assumptions of the parametric
methods are met (normality).
6. Larger sample sizes are needed to
overcome the loss of information.
7. It can be applied only for nominal or
ordinal scale.

Ranking of Data
There are many applications in business where data are reported not as values on a continuum but rather on an ordinal scale; thus, assigning ranks to the values is necessary to draw an analysis of the data. The distribution-free methods therefore allow the data analyst to make an analysis of ranks rather than the actual data values, which makes nonparametric tests very appealing and intuitive.

For example, assuming that the nonparametric test is applicable, an HR personnel would like to determine the degree of relationship between the performance ranks obtained by ten trainees during the first and second evaluation periods. A nonparametric test could then be used to determine if there is an agreement between the two rank evaluations.

Thus, since nonparametric tests can be applied to the ordinal scale of data measurement, it is important for the analyst to be efficient in ranking data sets.

One-Sample Runs Test // checks for nonrandomness
The one-sample runs test is also called the Wald-Wolfowitz test after its inventors, Abraham Wald (1902-1950), and his student Jacob Wolfowitz.
• The one-sample runs test's purpose is to detect nonrandomness.
• A nonrandom pattern suggests that the observations are not independent.
• Here, we investigate whether each observation in a sequence is independent of its predecessor (or the appearance of one is not dependent on the appearance of another).

Runs Test // checks for randomness
This test is to determine whether a sequence of binary events (two outcomes involved) follows a random pattern. A nonrandom sequence suggests nonindependent observations.
The hypotheses are:
Ho: Events follow a random pattern.
H1: Events do not follow a random pattern.
To test the hypothesis of randomness, we first count the number of outcomes of each type:
n1: number of outcomes of the first type
n2: number of outcomes of the second type
n = total sample size = n1 + n2

Wald-Wolfowitz Runs Test


When n1 ≥10 and n2 ≥10 (large sample
situation), the number of runs R may be
assumed to be normally distributed with
mean µR and standard deviation σR .

Summary of Procedures in Conducting a One-Sample Runs Test
1. State the hypotheses and identify the
claim.
2. Count the runs by grouping sequences
of similar outcomes.
3. Compute the test value/statistic zcalc .
4. Find the critical value using the table
for the areas under the normal curve
(z - table). For a given level of
significance α, find the critical value
zα/2 for a two-tailed test. Because
either too many runs or too few runs
would be nonrandom, we choose a
two-tailed test.
5. Make the decision. Reject the
hypothesis of a random pattern if zcalc
< −zα/2 or if zcalc > +zα/2 .
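The steps above can be sketched as follows. The µR and σR formula slide is not reproduced in this reviewer, so the sketch uses the standard Wald-Wolfowitz approximations µR = 2n1n2/n + 1 and σR = √(2n1n2(2n1n2 − n) / (n²(n − 1))); the M/F sequence is hypothetical:

```python
import math

# Large-sample one-sample runs test sketch (n1 >= 10, n2 >= 10), using the
# standard normal approximation for the number of runs R:
#   mu_R = 2*n1*n2/n + 1
#   sigma_R = sqrt(2*n1*n2*(2*n1*n2 - n) / (n^2 * (n - 1)))
def runs_z(seq):
    a, b = sorted(set(seq))                  # the two outcome labels
    n1, n2 = seq.count(a), seq.count(b)
    n = n1 + n2
    # A run is a maximal block of identical consecutive outcomes.
    runs = 1 + sum(1 for i in range(1, n) if seq[i] != seq[i - 1])
    mu = 2 * n1 * n2 / n + 1
    sigma = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1)))
    return (runs - mu) / sigma

z = runs_z("MMFFFMMFMFMMFFMFMMFF")   # hypothetical M/F sequence (12 runs)
# Reject the hypothesis of a random pattern at alpha = 0.05 if |z| > 1.96.
```
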

Example 1: Wilcoxon signed-rank test of the Sample Median vs. a Benchmark
The machines in a company used for the production of paraffin wax candles operate with a median of 5.6 hours before they reach their downtime (due to limited specifications). The operations manager would like to prove that the same set of machines used for the mass production of gel wax candles operates with the same median operating time as for the paraffin wax candles produced by the machines. Twenty randomly selected machines are inspected to see whether the manager's claim is valid. Use 0.05 level of significance to test the claim.
Example 2: Wilcoxon Signed-rank test of Paired Samples (Dependent Samples)
The table on the right shows the number of baggage items shipped by a shipping company over a 22-week period before and after a change of management. Data set A shows the number of baggage items shipped under the previous management, while data set B is the number shipped under the new management. Determine whether the old and new management, on average, shipped the same number of baggage items, against the alternative that the old management ships more baggage than the new management. Use a 0.10 level of significance.

Wilcoxon Rank Sum Test (Mann-Whitney Test)
The Wilcoxon rank sum test is a
nonparametric test to compare two
populations, utilizing only the ranks of the
data from two independent samples. If
the populations differ only in location
(center), it is a test for equality of
medians, corresponding to the parametric
two sample t-test.

• The Wilcoxon rank sum test is named after statisticians Frank Wilcoxon (1892-1965), Henry B. Mann (1905-2000), and D. Ransom Whitney (1915-2007).
• It compares two populations whose distributions are assumed to be the same except for a shift in location.
• It is a test of differences between the medians of two different populations that are obviously nonnormal, where the samples are independent (no pairing of observations).
• It is analogous to the t-test for two independent sample means; thus, it requires independent samples from populations with equal variances.
• It does not assume normality.

Wilcoxon Rank Sum Test for Two Independent Samples
The test of the hypothesis can be either a one-tailed or a two-tailed test.
• When testing for the difference of two population medians, the test is two-tailed.
• If one median is greater or less than the other, the test is one-tailed.
Where: M1 is the median of population 1, and M2 is the median of population 2.
Assuming that the only difference in the populations is in location, the hypotheses for a two-tailed test of the population medians can also be expressed as

Note:
• This module will illustrate only a large-sample version (n1 ≥ 10, n2 ≥ 10) of this test; thus we can use the z-test.
• The test statistic is the difference in mean ranks, divided by its standard error.

Summary of Procedures in Conducting a Wilcoxon Rank Sum Test
1. State the hypotheses and identify the claim.
2. Find the critical value. For large samples (n1 ≥ 10, n2 ≥ 10), use a z-test.
3. Compute the statistic zcalc by following the procedures below.
• Sort the combined samples from lowest to highest.
• Assign a rank to each value; use the average of the ranks when there are ties.
• Separate the data into two groups according to classification or grouping (i.e., sample 1, sample 2).
• Sum the ranks for each group (e.g., T1, T2).
• The sum of the ranks T1 + T2 must be equal to n(n + 1)/2, where n = n1 + n2.
• Calculate the mean rank sums, i.e., the mean of T1 and the mean of T2.
• The test statistic is computed using Method A or Method B indicated on the previous slide.
4. Make the decision. For a given α, reject the null hypothesis if zcalc < −zα/2 or if zcalc > +zα/2.
5. Summarize the results.

Example 1: Wilcoxon Rank Sum Test for Two Independent Samples
The production of certain products, A and B, is intermittent, depending on the availability of resources. The following table shows 28 randomly selected production runs of the products on a day. Test the hypothesis that the average production of product A and product B are equal using 0.05 level of significance.
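The "Method A / Method B" formulas are only on the slides, so this sketch uses the standard large-sample rank-sum form instead: z = (T1 − µT)/σT with µT = n1(n + 1)/2 and σT = √(n1·n2·(n + 1)/12), with average ranks for ties:

```python
import math

# Large-sample Wilcoxon rank sum z sketch, using the standard rank-sum
# normal approximation (not the slides' Method A/B, which are not
# reproduced in this reviewer).
def rank_sum_z(sample1, sample2):
    combined = sorted((v, i) for i, v in enumerate(sample1 + sample2))
    # Assign ranks, averaging the positions of tied values.
    ranks = {}
    j = 0
    while j < len(combined):
        k = j
        while k + 1 < len(combined) and combined[k + 1][0] == combined[j][0]:
            k += 1
        avg = (j + 1 + k + 1) / 2            # average of positions j+1..k+1
        for m in range(j, k + 1):
            ranks[combined[m][1]] = avg
        j = k + 1
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2
    T1 = sum(ranks[i] for i in range(n1))    # rank sum of sample 1
    mu = n1 * (n + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n + 1) / 12)
    return (T1 - mu) / sigma
```

For a given α, the resulting z is compared with ±zα/2, as in step 4 of the summary procedure.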
more populations.
If there is only one independent variable or
factor that distinguishes between the
different populations under study, the
procedure is called “one-way ANOVA” or
“one-factor ANOVA”.

One-Way ANOVA
• Also known as Completely
Randomized design or OneFactor
ANOVA.
• Experimental units (or subjects are
assigned randomly to treatments or
groups
• Subjects are assumed homogeneous
• Only one factor or independent
variable
• With three or more treatment levels
• Analyzed by one-factor analysis of
variance
• This technique is used to test claims
involving three or more means.
Examples:
- Accident rates in an assembly line for 1st,
2nd and 3rd shifts
-Expected mileage for 5 brands of tires
-Time it takes for 3 groups of students to
solve a problem in FARR
*Note that for each example, there is only
one independent variable.
*If two variables are considered, the
technique is referred to as two-way ANOVA.

One-Way ANOVA – parametric test


• The F-test can only show whether a
difference exists among the three or
more means. It cannot reveal where
the difference lies – that is between
𝑋1 and 𝑋2, 𝑋2 and 𝑋3, and 𝑋1 and 𝑋3.
MODULE 6A – One way ANOVA • If the F-test indicates that there is a
difference among the means, other
statistical tests are used to find where
The analysis of variance is used to test the the difference exists. The most
equality of three or more means using commonly used are the Scheffee test
sample variances. This method was and the Tukey test. These are not
developed by R. A. Fisher in the early 1920s. covered in this presentation.

The technique is called analysis of variance


(ANOVA) when an Ftest is used to test a
hypothesis concerning the means of three or
One-Way ANOVA
With the F-test, two different estimates of
the population variance are made.
1. Between-group variance – finding the
variance of the means.
2. Within-group variance – computing
the variance using all the data and is
not affected by differences in the
means.
If there is no difference in the means, these
variance estimates will be equal, and the F-
test value will be approximately 1.
*Note that since variances are compared,
this procedure (F-test) is called analysis of
variance or ANOVA
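A quick way to reproduce this F-test outside Excel or JASP is SciPy's `f_oneway`; the three groups below are hypothetical illustrative data, not an example from the module.

```python
from scipy import stats

# Hypothetical samples from three treatment groups (illustrative data only)
group1 = [18, 20, 19, 22, 21]
group2 = [25, 27, 24, 26, 28]
group3 = [19, 21, 20, 23, 22]

# f_oneway returns the F-test value (between-group MS / within-group MS)
# and its p-value
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

alpha = 0.05
print("Reject H0: at least one mean differs" if p_value < alpha
      else "Fail to reject H0: no evidence the means differ")
```

Note that when the group means are equal, the two variance estimates agree and F is near 1; here the second group's larger mean inflates the between-group variance, so F is far above 1.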
Appropriate hypotheses

*Excel steps are not included in this reviewer; if you want to
review them, check Module 6a slide 11.

Hypothesis testing using traditional
method: one-way ANOVA
1. State the hypotheses and identify the claim.
2. Find the F-test value, procedure is as shown.
3. Find the critical value using the F-distribution table.
4. Make the decision.
5. Summarize the results.

One-way ANOVA – non-parametric
The analysis of variance uses the F test to
compare the means of three or more
populations. The assumptions for the ANOVA
test are that the populations are normally
distributed and that the population variances
are equal. When these assumptions cannot
be met, the nonparametric Kruskal-Wallis
test, sometimes called the H test, can be used to
compare three or more means.
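A sketch of the Kruskal-Wallis (H) test with SciPy, using hypothetical data rather than the module's production example:

```python
from scipy import stats

# Hypothetical samples from three groups (illustrative data only)
a = [12, 15, 14, 10, 13]
b = [18, 20, 17, 19, 21]
c = [11, 14, 12, 13, 15]

# kruskal ranks all observations together and computes the H statistic,
# with a p-value from the chi-square approximation (df = k - 1 = 2)
h_stat, p_value = stats.kruskal(a, b, c)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

Because the test works on ranks, it needs no normality or equal-variance assumption, which is exactly when it replaces one-way ANOVA.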
Kruskal-Wallis test

Example #1 Kruskal-Wallis Test

Module 6b – Simple Regression
and Binary Logistic Regression Analysis

Part 1: Simple Linear Regression Analysis
Correlation Analysis
• The analysis of bivariate data typically
begins with a scatter plot that displays
each observed pair of data (x, y) as a
dot on the x-y plane.
• Correlation analysis is used to
measure the strength of the linear
relationship between two variables.
• Correlation is only concerned with the
strength of the relationship.
• No causal effect is implied with
correlation.
• The sample correlation coefficient (like
Pearson’s r and Spearman’s rho)
measures the degree of linearity in the
relationship between two random
variables X and Y, with values in the
interval [-1, 1].

Test for significant correlation using
Student’s t:
• To test the hypothesis 𝐻𝑜: 𝜌 = 0, the
test statistic is 𝑡 = 𝑟√(𝑛 − 2) ⁄ √(1 − 𝑟²),
with 𝑛 − 2 degrees of freedom.
• After calculating this value, we can
find its p-value by using Excel’s
function =T.DIST.2T(t, deg_freedom)
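The same test can be sketched in Python; `pearsonr` reports the two-tailed p-value that =T.DIST.2T would return for this t. The paired data are hypothetical.

```python
import math
from scipy import stats

# Hypothetical paired observations (illustrative data only)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]

# pearsonr returns r and the two-tailed p-value for Ho: rho = 0
r, p_value = stats.pearsonr(x, y)

# The same t statistic by hand: t = r * sqrt(n - 2) / sqrt(1 - r^2)
n = len(x)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"r = {r:.4f}, t = {t:.2f}, p = {p_value:.6f}")
```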
Correlation Analysis
Scatter plots showing various correlation
coefficient values

In Excel, use these functions to get the value
of the correlation coefficient.
1. =CORREL(array1, array2)
2. =PEARSON(array1, array2)

Regression Analysis
• The hypothesized relationship may be
linear, quadratic or some other form.
• The next slide presents some of the
possible patterns.
• The module will focus on the simple
linear model commonly referred to as
a simple regression equation.
Regression Analysis: Types of relationships
Correlation Analysis
Test for significant correlation using
Student’s t:
• The sample correlation coefficient r is
an estimate of the population
correlation coefficient 𝜌 (Greek letter rho).
• There is no flat rule for a “high”
correlation because sample size must
be taken into consideration.
• To test the hypothesis 𝐻𝑜: 𝜌 = 0, the
test statistic is 𝑡 = 𝑟√(𝑛 − 2) ⁄ √(1 − 𝑟²),
with 𝑛 − 2 degrees of freedom.
Interpreting an Estimated Regression
Equation
The slope tells us how much, and in what
direction, the dependent or response
variable will change for each one unit
increase in the predictor variable. On the
other hand, the intercept is meaningful only
if the predictor variable would reasonably
have a value equal to zero.
Equation: 𝑆𝑎𝑙𝑒𝑠 = 268 + 7.37 𝐴𝑑𝑠

Interpretation:
Each extra P1 million of advertising will
generate P7.37 million of sales on average.
The firm would average P268 million of sales
with zero advertising. However, the intercept
may not be meaningful because Ads = 0
may be outside the range of observed data.

Simple Linear Regression Model
• Only one independent variable, X
• The relationship between X and Y is
described by a linear function.
• The changes in Y are related to
changes in X.
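As an illustration of reading slope and intercept, the data below are hypothetical and constructed so the fitted line comes out close to the slide's Sales = 268 + 7.37 Ads:

```python
from scipy import stats

# Hypothetical (ads, sales) pairs in P millions (illustrative data only)
ads   = [1, 2, 3, 4, 5, 6]
sales = [276, 283, 289, 298, 305, 312]

res = stats.linregress(ads, sales)
# slope: average change in sales per extra P1 million of advertising
# intercept: predicted sales at Ads = 0 (meaningful only if 0 lies
# within the range of observed Ads values)
print(f"Sales = {res.intercept:.1f} + {res.slope:.2f} * Ads")
```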
Simple Linear Regression Equation

Prediction Using Regression
One of the main uses of regression is to
make predictions. Once we have a fitted
regression equation that shows the estimated
relationship between X and Y, we can plug in
any value of X (within the range of our
sample x values) to obtain the prediction for Y.
Standard Error of Estimate
The standard deviation of the variation
of observations around the regression
line is estimated by:
𝑆𝑌𝑋 = √(𝑆𝑆𝐸 ⁄ (𝑛 − 2)) = √(Σ(𝑌𝑖 − Ŷ𝑖)² ⁄ (𝑛 − 2))
Assumptions of Regression (L.I.N.E.)
Linearity – the relationship between X and Y
is linear
Independence of errors – the error values
(difference between observed and estimated
values) are statistically independent.
Normality of error – the error values are
normally distributed for any given value of X
Equal variance or homoskedasticity – the
probability distribution of the errors has
constant variance.

Example #1 Standard error of estimate

*For Example #1 in Excel, see Module 6b slides 18-23.
Assessing Fit: Coefficient of determination, R2
• The coefficient of determination is the
portion of the total variation in the
dependent variable that is explained
by the variation in the independent
variable.
• It is also called r-squared and is
obtained by: 𝑟² = 𝑆𝑆𝑅 ⁄ 𝑆𝑆𝑇

Comparing Standard Errors
𝑆𝑌𝑋 is a measure of the variation of observed
Y values from the regression line. The
magnitude of 𝑆𝑌𝑋 should always be judged
relative to the size of the Y values in the
sample data.

Example 1 Coefficient of determination
Inferences about the slope using the t-
test
• The t-test for a population slope is
used to determine if there is a linear
relationship between X and Y.
• Null and alternative hypotheses
H0 : β1 = 0 (no linear relationship)
H1 : β1 ≠ 0 (linear relationship does
exist)
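A minimal sketch of this slope test on hypothetical data; SciPy's `linregress` reports the slope's two-tailed p-value directly:

```python
from scipy import stats

# Hypothetical data (illustrative only)
x = [1, 2, 3, 4, 5, 6, 7]
y = [3, 5, 4, 6, 8, 7, 9]

res = stats.linregress(x, y)
# res.pvalue is the two-tailed p-value for H0: beta1 = 0
# (no linear relationship between X and Y)
print(f"slope = {res.slope:.3f}, p = {res.pvalue:.4f}")
print("Reject H0: a linear relationship exists" if res.pvalue < 0.05
      else "Fail to reject H0")
```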
Example 1 Inferences about the slope
Checking the assumptions by examining
the residuals
Residual Analysis for Normality:
1. Examine the Stem-and-Leaf Display of
the Residuals.
2. Examine the Box-and-Whisker Plot of
the Residuals.
3. Examine the Histogram of the
Residuals.
4. Construct a normal probability plot.
5. Construct a Q-Q plot.
Strategies when performing regression
analysis
• Start with a scatter plot of X on Y to
observe possible relationship.
• Perform residual analysis to check the
assumptions.
• Plot the residuals vs X to check for
violations of assumptions such as
equal variance.
• Use a histogram, stem and leaf
display, box and whisker plot or
normal probability plot of the residuals
to uncover possible non-normality.
• If there is any violation of any
assumption, use alternative methods
or models.
• If there is no evidence of assumption
violation, then test for the significance
of the regression coefficients.
• Avoid making predictions or forecasts
outside the relevant range.

Measuring Autocorrelation
• The Durbin-Watson Statistic detects
the presence of autocorrelation.
• It is used when data are collected over
time to detect the presence of
autocorrelation.
• Autocorrelation exists if residuals in
one time period are related to
residuals in another period.
• The presence of autocorrelation of
errors (or residuals) violates the
regression assumption that residuals
are statistically independent.

The Durbin-Watson, DW, Statistic
▪ The DW statistic is used to test for
autocorrelation. It is computed as
DW = Σ(𝑒𝑡 − 𝑒𝑡−1)² ⁄ Σ𝑒𝑡², where values
near 2 suggest no autocorrelation.

Module 6c – Multiple Linear
Regression Analysis

Multiple Regression
• Multiple regression extends simple
regression to include several
independent variables (called
predictors or explanatory variables).
• It is required when a single-predictor
model is inadequate to describe the
relationship between the response
variable (Y) and its potential
predictors (X1, X2, X3, …).
• The interpretation is similar to simple
regression since simple regression is a
special case of multiple regression.
• Calculations are done by computer.
• Using multiple predictors is more than
a matter of “improving its fit”. Rather,
it is a question of specifying a correct model.
• A low R2 in a simple regression model
does not necessarily mean that X and
Y are unrelated, but may simply
indicate that the model is incorrectly
specified.
• Omission of relevant predictors (model
misspecification) can cause biased
estimates and misleading results.

The estimated regression equation
Limitations of Simple Regression
• Multiple relationships usually exist.
• The estimates are biased if relevant
predictors are omitted.
• The lack of fit (low R-squared) does
not show that X is unrelated to Y if the
true model is multivariate.
• Simple regression is only used when
there is a compelling need for a simple
model, or when other predictors have
only modest effects and a simple
logical predictor “stands out” as doing
a very good job all by itself.

Fitted regression: comparison between
a 1-predictor model versus a 2-predictor model

Example #1
A distributor of frozen dessert pies wants to
evaluate factors thought to influence demand.
- the dependent variable is pie sales (units
per week)
- the independent variables are price (in
USD) and advertising cost (in hundred USD)
The data are collected for 15 weeks.
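A least-squares sketch with the same structure as the pie-sales example. The six weekly observations are hypothetical and constructed to satisfy sales = 600 − 50·price + 20·ads exactly, so the fit recovers those coefficients:

```python
import numpy as np

# Hypothetical weekly data (illustrative only, not the module's 15 weeks);
# sales is generated from 600 - 50*price + 20*ads with no noise
price = np.array([5.5, 6.0, 6.5, 7.0, 7.5, 8.0])   # USD
ads   = np.array([3.0, 3.5, 2.5, 4.0, 3.0, 2.0])   # hundred USD
sales = 600 - 50 * price + 20 * ads                 # pies per week

# Least squares fit of sales = b0 + b1*price + b2*ads
# (column of ones provides the intercept)
X = np.column_stack([np.ones_like(price), price, ads])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
b0, b1, b2 = coef
print(f"sales = {b0:.1f} {b1:+.2f}*price {b2:+.2f}*ads")
```

Each coefficient is read ceteris paribus: b1 is the change in weekly sales per extra dollar of price, holding advertising constant.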
The population regression model
• the random error ε represents
everything that is not part of the
model;
• the unknown regression coefficients,
denoted by Greek letters, are
parameters;
• each coefficient shows the change in
the expected value of y for a unit
change in Xi while holding everything
constant (ceteris paribus).
• the errors are assumed to be
unobservable, independent random
disturbances that are normally
distributed with zero mean and
constant variance. Under these
assumptions, the ordinary least
squares (OLS) estimation method
yields unbiased, consistent, efficient
estimates of the unknown parameters.
For a regression with k predictors, the
hypotheses to be tested are:
Ho : All the true coefficients are zero (𝛽1 =
𝛽2 = ⋯ = 𝛽𝑘 = 0)
H1 : At least one of the coefficients is
nonzero.
COEFFICIENT OF MULTIPLE DETERMINATION
• The coefficient of multiple
determination reports the proportion
of total variation in Y that is explained by
the variation of all predictor variables
taken together.
• It is also called R-squared and is
obtained by: 𝑅² = 𝑆𝑆𝑅 ⁄ 𝑆𝑆𝑇

ASSESSING OVERALL FIT

ASSESSING OVERALL FIT: F-test for significance
Before determining which, if any, of the
individual predictors are significant, we
perform a global test for overall fit using the
F-test.

ADJUSTED R2
• R-squared never decreases when a new
predictor variable X is added to the model.
• This can be a disadvantage when
comparing models.
• What is the net effect of adding a new
variable?
• We lose a degree of freedom when a
new variable is added.
• Did the new X variable add enough
independent power to offset the loss
of one degree of freedom?
The adjusted R2 shows the proportion of
variation in Y explained by all X variables
adjusted for the number of X variables used.
𝑅²𝑎𝑑𝑗 = 1 − (1 − 𝑅²)(𝑛 − 1) ⁄ (𝑛 − 𝑘 − 1)
• It penalizes excessive use of
unimportant predictor variables.
• It is smaller than R2.
• It is useful when comparing models.
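The trade-off can be seen numerically; the R² values, sample size, and predictor counts below are hypothetical:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)
    for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a third predictor nudges R^2 up (0.80 -> 0.81) but costs a
# degree of freedom, so adjusted R^2 falls: the simpler model wins.
print(round(adjusted_r2(0.80, 15, 2), 4))   # 0.7667
print(round(adjusted_r2(0.81, 15, 3), 4))   # 0.7582
```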
How many predictors?
• One way to prevent overfitting the
model is to limit the number of
predictors based on the sample size.
• These rules are merely suggestions.

SIGNIFICANCE OF PREDICTORS
• We are usually interested in testing
each estimated coefficient to see
whether it is significantly different
from zero, that is, if a predictor
variable helps explain the variation in Y.
• Use t-tests of individual variable slopes.
• Shows if there is a linear relationship
between the variables Y and Xi.
• Hypotheses: H0 : βi = 0; H1 : βi ≠ 0

Detecting MULTICOLLINEARITY
• When the predictor variables are
related to each other instead of being
independent, we have a condition
known as multicollinearity.
• Multicollinearity induces variance
inflation and makes the t statistics less
reliable.
• Least squares estimation fails when
this condition is present.
Ways of detecting multicollinearity:
• To check whether 2 predictors are
correlated, compute the correlation
coefficients. Suspect multicollinearity
if two predictors are highly correlated
(r ≥ 0.80) or if the correlation
coefficient exceeds the multiple R.
• Multicollinearity is present if the
variance inflationary factor (VIF) is at
least 10. The VIF is provided in the
regression output in JASP.
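A sketch of the VIF computation from its definition (regress each predictor on the others, then VIF = 1/(1 − R²)); the predictors are simulated so that x1 and x2 are nearly collinear:

```python
import numpy as np

def vif(X, j):
    """Variance inflationary factor for column j: regress X[:, j]
    on the other columns and return 1 / (1 - R^2)."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

# Hypothetical predictors: x2 is nearly a copy of x1, x3 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.1, size=50)   # highly collinear with x1
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 1) for j in range(3)])  # VIFs for x1, x2 exceed 10
```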
REGRESSION DIAGNOSTICS
Independence of errors – the error values
(difference between observed and estimated
values) are statistically independent OR
non-autocorrelated. (for time-series data and
panel data)
Normality of error – the error values are
normally distributed for any given value of X
Equal variance or homoskedasticity – the
probability distribution of the errors has
constant variance

Checking the assumptions by examining
the residuals
Residual Analysis for Normality:
1. Examine the Stem-and-Leaf Display of
the Residuals
2. Examine the Box-and-Whisker Plot of
the Residuals
3. Examine the Histogram of the Residuals
4. Construct a normal probability plot.
5. Construct a Q-Q plot

Measuring Autocorrelation
• Another way of checking for
independence of errors is by testing
the significance of the Durbin-Watson
Statistic.
• The Durbin-Watson Statistic detects
the presence of autocorrelation.
• It is used when data are collected over
time to detect the presence of
autocorrelation.
• Autocorrelation exists if residuals in
one time period are related to
residuals in another period.
• The presence of autocorrelation of
errors (or residuals) violates the
regression assumption that residuals
are statistically independent.

The Durbin-Watson, DW, Statistic
The DW statistic is used to test for
autocorrelation.
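The DW statistic can be computed directly from the time-ordered residuals; the residual series below is hypothetical:

```python
import numpy as np

def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared
    residuals. Values near 2 suggest no autocorrelation; near 0,
    positive autocorrelation; near 4, negative autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical time-ordered residuals (illustrative only)
resid = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.1]
print(round(durbin_watson(resid), 3))  # 2.609
```

The alternating signs in this series push DW slightly above 2, consistent with mild negative autocorrelation.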