Statsprob -Reviewer q2
natural phenomenon. It is used mostly in the field of science, where the scientific method is
used to test it.
Examples of a Hypothesis:
1. By the end of the year, there will be a big increase in the number of recoveries of COVID-19
patients.
2. The change in climate temperature makes everyone in the community more careful in
their daily activities.
The process of hypothesis testing involves making a decision between two opposing
hypotheses (the null and its alternative). If one is true, the other must be false. This
means that if one hypothesis can be shown to be improbable, then the other hypothesis is
likely to hold.
Types of Errors:
Type I Error: Rejecting the null hypothesis when it is actually true.
Ex: Punishing a person who is truly innocent and wrongly putting them in jail.
Type II Error: Failing to reject (accepting) the null hypothesis when it is actually false.
Ex: Bryan thinks that he is a six-footer. His actual height is 156 cm.
Rejection Region refers to the set of values of the test statistic for which we will reject
the null hypothesis. This region is also known as the critical region.
A. Non-Directional (Two-Tailed Test) – The rejection region is split between both tails of the
distribution.
B. Directional (One-Tailed, Left Tail) – The rejection region is at the left tail of the
distribution.
C. Directional (One-Tailed, Right Tail) – The rejection region is at the right tail of the
distribution.
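The three kinds of rejection region can be illustrated with a short sketch. It assumes the test statistic follows the standard normal distribution and uses α = 0.05 only as an illustration; the function name `critical_region` is made up for this example.

```python
from statistics import NormalDist

def critical_region(alpha: float, tail: str) -> tuple:
    """Return the critical z value(s) that bound the rejection region."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)   # alpha is split between both tails
        return (-c, c)
    if tail == "left":
        return (z.inv_cdf(alpha),)      # all of alpha in the left tail
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)  # all of alpha in the right tail
    raise ValueError("tail must be 'two', 'left', or 'right'")

two = critical_region(0.05, "two")
left = critical_region(0.05, "left")
right = critical_region(0.05, "right")
print("two-tailed:", [round(c, 3) for c in two])
print("left tail: ", round(left[0], 3))
print("right tail:", round(right[0], 3))
```

For α = 0.05 the boundaries are about ±1.96 for a two-tailed test and about ∓1.645 for the left- and right-tailed tests.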
A sample is a group of subjects carefully selected from a population of interest (e.g., as of May
15, 8:15 pm, 10% of the votes have been counted and Nancy Binay is in the 5th spot).
A parameter is a numerical value that describes a characteristic of a population (e.g., total
votes).
A statistic is a numerical value that describes a particular sample (e.g., 10% of votes).
Hypothesis Testing is the process of using statistics to evaluate the utility and validity of a
research theory. This activity always begins with formulating a statement or expectation about a
certain phenomenon.
● Check the type of measurement used in the given problem. Use “µ” for mean/average and “p”
for proportion.
● Assess whether the statement denotes a direction.
Hint: a directional hypothesis contains terms such as greater than, less than, at least, at most,
and other similar terms.
Test Statistic – a value computed from the sample data and used to decide whether to reject the
null hypothesis. It compares your data with what is expected under the null hypothesis.
● if the sample size n is large enough, population mean 𝝁 and the population variance 𝝈²
are known.
● if the sample size n is large enough, population mean 𝝁 is known, and the population
variance 𝝈² is unknown. (By the Central Limit Theorem, the sample variance 𝒔² may
be used as an estimate of the population variance 𝝈².)
● if the sample size n is less than 30 ( 𝒏 < 𝟑𝟎 ) but the population variance 𝝈² is known.
● if the sample size n is less than 30 ( 𝒏 < 𝟑𝟎 ) but the population variance 𝝈² is
unknown. (If we assume that the sample comes from a normally distributed
population, then the sample variance 𝒔² can be used to estimate population variance
𝝈².)
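The four cases above do not say which test goes with which case. A common textbook convention, assumed here and not stated in these notes, is that the first three cases use the z-test and the last case (n < 30 with unknown variance) uses the t-test. A minimal sketch of that assumed rule, with a hypothetical helper name:

```python
def choose_test(n: int, sigma_known: bool) -> str:
    """Pick a test statistic under the assumed textbook convention.

    Assumption: 'large enough' means n >= 30.
    """
    if n >= 30:
        return "z"   # large sample: sigma known, or s used as its estimate
    if sigma_known:
        return "z"   # small sample, but population variance is known
    return "t"       # small sample, variance estimated from the sample

for n, known in [(40, True), (40, False), (20, True), (20, False)]:
    print(n, known, choose_test(n, known))
```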
Example of how to solve problems involving:
a. State the null hypothesis (𝐻₀) and the alternative hypothesis (𝐻ₐ).
b. Determine the test statistic to be used for the hypothesis test, and calculate its value.
c. Identify the critical value for the test and define the critical region.
d. Make a decision and draw a conclusion by comparing the calculated value of the test
statistic with the critical value.
1. A school claims that their Grade 11 HUMSS students study an average of 3 hours per
day. A teacher suspects this is an overestimate and decides to test the claim. She
takes a random sample of 30 students and finds that they study an average of 2.5
hours per day with a standard deviation of 0.6 hours.
Solution:
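One possible way to work through steps a to d for this problem is sketched below. It assumes α = 0.05 (the problem does not state a significance level), treats n = 30 as a large sample so that the z statistic is used with s in place of σ, and takes the hypotheses as Ho: µ = 3 and Ha: µ < 3 because the teacher suspects an overestimate.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 3.0     # claimed mean study hours (Ho: mu = 3)
x_bar = 2.5   # sample mean
s = 0.6       # sample standard deviation
n = 30        # sample size
alpha = 0.05  # assumed significance level, not given in the problem

# Test statistic for a mean, using s as the estimate of sigma
z = (x_bar - mu0) / (s / sqrt(n))

# Left-tailed test (Ha: mu < 3): reject Ho when z falls at or below the critical value
z_crit = NormalDist().inv_cdf(alpha)
reject_h0 = z <= z_crit

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, reject Ho: {reject_h0}")
```

Under these assumptions z is about -4.56, which lies in the left-tail rejection region (below about -1.645), so the claim of a 3-hour average would be rejected.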
2. A school wants to know if a new learning technique improves students' test scores in
Math. Last year, the average score of students was 75. This year, a group of 40
students using the new technique has an average score of 78 with a standard
deviation of 5.
Solution:
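A sketch of one possible solution follows. It assumes α = 0.05 (not stated in the problem), uses the z statistic because n = 40 is a large sample, and takes Ho: µ = 75 and Ha: µ > 75 because the question is whether scores improved. The helper `one_sample_z` is a name made up for this example.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(x_bar: float, mu0: float, s: float, n: int) -> float:
    """z statistic for a sample mean against a hypothesized mean mu0."""
    return (x_bar - mu0) / (s / sqrt(n))

alpha = 0.05  # assumed significance level, not given in the problem
z = one_sample_z(x_bar=78, mu0=75, s=5, n=40)

# Right-tailed test (Ha: mu > 75): reject Ho when z is at or above the critical value
z_crit = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z >= z_crit

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, reject Ho: {reject_h0}")
```

Here z is about 3.79, which exceeds the critical value of about 1.645, so under these assumptions the data support an improvement in scores.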
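P-value - the probability, computed under the assumption that the null hypothesis is true, of
obtaining a test statistic at least as extreme as the one observed. A small p-value means the
sample data deviate strongly from what is expected under the null hypothesis.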
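For a z statistic, the p-value can be computed from the standard normal distribution. The sketch below is illustrative only; the function name and the `tail` argument are assumptions for this example.

```python
from statistics import NormalDist

def p_value(z: float, tail: str) -> float:
    """p-value of a z statistic for a left-, right-, or two-tailed test."""
    cdf = NormalDist().cdf  # standard normal cumulative probability
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(round(p_value(1.96, "two"), 3))     # close to 0.05
print(round(p_value(-1.645, "left"), 3))  # close to 0.05
```

Ho is rejected when the p-value is less than or equal to α, which is equivalent to the test statistic falling in the rejection region.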
The central limit theorem states that if you have a population with mean μ and standard
deviation σ and take sufficiently large random samples from the population with replacement,
then the distribution of the sample means will be approximately normally distributed.
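The theorem can be checked with a small simulation. The population below is hypothetical (an exponential distribution, chosen because it is clearly not normal), and the sample size and number of samples are arbitrary choices for this sketch.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical skewed population (not normally distributed)
population = [random.expovariate(1.0) for _ in range(20_000)]
mu, sigma = mean(population), pstdev(population)

n = 40               # size of each random sample
num_samples = 2_000  # how many samples are drawn

# random.choices samples WITH replacement, as in the statement of the theorem
sample_means = [mean(random.choices(population, k=n)) for _ in range(num_samples)]

print("population mean:          ", round(mu, 3))
print("mean of sample means:     ", round(mean(sample_means), 3))
print("sigma / sqrt(n):          ", round(sigma / n ** 0.5, 3))
print("std. dev. of sample means:", round(stdev(sample_means), 3))
```

The mean of the sample means should land close to µ, and their standard deviation close to σ/√n, even though the population itself is skewed.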
We have mentioned in the previous lesson that the population proportion can be estimated
only for a large sample size (n ≥ 30). The same is true in testing a claim or hypothesis about the
population proportion (p).
To test a claim about a population proportion, we use the z-test for population proportion.
As in the use of the z-test for means, the decision rule below is used:
● If |Zcomputed| ≥ |Zcritical|, reject Ho.
● If |Zcomputed| < |Zcritical|, do not reject Ho.
Example:
A researcher studying the rapid growth of a rat population wants to determine the
proportion of female rats in a certain region. He doesn’t need to catch every rat he sees
and record its gender; he only needs a sufficient sample from which to make an inference
about the proportion of female rats.
The researcher may initially believe that 50% of the rat population is female. Out of 50 rats
he collected, 23 are female.
In this example, the researcher wants to test his belief that 50% (0.5) of the
population of rats is female. In the collected sample, 23 out of 50 rats are female. Does this
support the claim? Use α = 0.05.
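A sketch of the z-test for this example is given below. It assumes the usual form of the test statistic, z = (p̂ − p₀) / √(p₀(1 − p₀)/n), and a two-tailed test (Ho: p = 0.5, Ha: p ≠ 0.5), since the claim does not name a direction.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.5      # hypothesized proportion of female rats
n = 50        # rats collected
x = 23        # female rats in the sample
alpha = 0.05

p_hat = x / n                                # sample proportion, 0.46
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # z-test for a proportion

# Two-tailed test: compare |z| with the critical value for alpha / 2 in each tail
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) >= z_crit

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, critical z = ±{z_crit:.2f}, reject Ho: {reject_h0}")
```

Here |z| is about 0.57, which is less than 1.96, so Ho is not rejected: the sample does not contradict the belief that 50% of the rats are female.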
Another example:
Newborn babies are more likely to be boys than girls. A random sample found that 13,173 boys
were born among 25,468 newborn children, so the sample proportion of boys was 0.5172. Is
this sample evidence that the birth of boys is more common than the birth of girls in the entire
population? (Use a 5% level of significance.)
Solution:
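One possible solution is sketched below. It assumes Ho: p = 0.5 and Ha: p > 0.5 (a right-tailed test), since the question asks whether boys are more common than girls.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.5       # proportion of boys if boys and girls are equally likely
n = 25_468     # newborn children in the sample
x = 13_173     # boys in the sample
alpha = 0.05

p_hat = x / n                                # about 0.5172
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # z-test for a proportion

# Right-tailed test: reject Ho when z is at or above the critical value
z_crit = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z >= z_crit

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, critical z = {z_crit:.3f}, reject Ho: {reject_h0}")
```

With z at about 5.50, well above the critical value of about 1.645, Ho is rejected: the sample is evidence that boys are born more often than girls.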
Some research studies involve two variables. One of these two variables is the independent
variable and the other one is the dependent variable.
● Independent Variable: the variable that may cause the dependent variable to
change.
● Dependent Variable: the variable that is influenced or affected by the independent
variable.
Data collected in this type of study, which involves two variables, are called bivariate data.
Bivariate data deal with two variables that are compared in order to find or establish their
relationship.
Examples:
1) Number of hours spent studying and corresponding test scores
2) Ice cream sales versus the temperature on that day
3) IQ scores and the amount of sleeping time
4) Mileage and age of the car
5) Height and weight of children below 18 years old
The relationship of variables in bivariate data can be displayed using a graph called scatter
plot.
● A scatter plot is the most common display of quantitative data. It shows patterns,
trends, relationships, and possible extraordinary values (outliers) between the variables.
● Using the scatter plot, we can describe the form, direction and strength of association
between two variables.
We can also describe the relationship of the variables by looking at the direction of the points
on the scatter plot. The pattern has a positive direction if it runs from the lower left to the
upper right. If it runs from the upper left to the lower right, then it has a negative direction. It
tells us whether the values on the two variables go up or down together or not.
The strength of the pattern can also be described in the scatter plot. It is related to how
closely clustered the points are around the form. It tells us the degree to which values of one
variable are related to the values of the second variable. We normally use the words weak,
moderate, or strong to describe the strength of an association or relationship.
Steps in Constructing a Scatter Plot
1. Draw a graph and label the x- and y- axes.
2. Assign each quantitative variable to an axis.
3. Choose a range for each axis that includes the maximum and the minimum values in
the data set.
4. Plot each point on the graph.
Example:
1st Semester Grade vs. 2nd Semester Grade of Ten Grade 11 Students
Thus, the scatter plot describes a positive relationship between the 1st Semester Grade
and 2nd Semester Grade of Ten Grade 11 Students.
Direction of Correlation
● Positive Correlation exists when high values of one variable correspond to high values
in the other variable or low values in one variable correspond to low values in the other
variable.
● Negative Correlation exists when high values of one variable correspond to low values
in the other variable or low values in one variable correspond to high values in the
other variable.
● Zero Correlation exists when high values in one variable correspond to either high or
low values in the other variable, so that no consistent pattern appears.
Strength of Correlation
● Perfect
● Very high
● Moderately high
● Moderately low
● Very low
● Zero
The trend line is the line closest to the points. The direction of the line tells the direction of
the correlation that exists between the variables. If the trend line rises toward the right, its
slope is positive, and there is a positive correlation between the two variables. If it falls
toward the right, its slope is negative, and there is a negative correlation between the two
variables.
The Pearson Product-Moment Correlation Coefficient, also called the sample correlation
coefficient r, is a widely used statistical measure of strength of a linear relationship between
two variables. It is given by
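In its usual computational form, for n paired observations (x, y), the coefficient can be written as follows. This is the standard textbook formula, supplied here as an assumption about the form intended:

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```

The value of r always lies between -1 and 1. The sign gives the direction of the correlation and the magnitude gives its strength.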
We will use the given table to determine the strength of the computed r:
Example
Determine the value of Pearson r for the following data and interpret the results.
● c) Use the Pearson Product Moment Correlation Formula to solve for r and interpret.
Solving for r
● r = 0.32; moderately low but positive
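The computation of r can be sketched in code using the standard summation form of the Pearson formula. The data below are hypothetical and are not the data of the example above; the function name `pearson_r` is also made up for this sketch.

```python
from math import sqrt

def pearson_r(xs: list, ys: list) -> float:
    """Pearson r computed from the summations used in the hand computation."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 2))
```

For these made-up values, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55 and Σy² = 86, so r = 30/√1500, which is about 0.77.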
1. determine the strength of the predictors, that is, identify the strength of the effect
that the independent variable(s) have on a dependent variable;
2. forecast effects or impact of changes, that is, understand how much the dependent
variable changes with a change in one or more independent variables; and
3. predict trends and future values, that is, get point estimates.
The basic and commonly used regression analysis is the linear regression. Linear
regression estimates are used to explain the relationship between one dependent variable
and one or more independent variables.
The simplest form of linear regression is called simple linear regression. It is a linear
regression model with two-dimensional sample points, one dependent variable and one
independent variable.
A teacher wants to know the effect of attendance on the academic performance of the
students.
Solution:
❖ The independent variable is the attendance of the student. The teacher can
manipulate the length of time and the students that will participate in the experiment.
To interpret the slope and y-intercept of the regression line, remember that in regression:
Example:
Five randomly selected students were surveyed about their Statistics 1st quarter test score
and their 1st quarter grade in Statistics. Assuming that there is a significant relationship
between the two variables, determine the slope and y-intercept of the regression line. Then,
interpret the result.
Solution:
Step 3. Calculate the value of a and b in the formula, substitute the summations found
in step 2 and the sample size n given in the problem, which is 5 students, thus, n=5.
Step 4. Interpret the result.
● The slope of the regression line is 0.82, which indicates that for every 1-point increase in
the test score, the predicted grade in Statistics increases by 0.82.
● The y-intercept of the regression line is 53.47, which indicates that for a test score of
0, the predicted average grade in Statistics is 53.47.
In the example, the y-intercept does not have a practical meaning because we don’t expect a
test score to be near 0.
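The slope b and y-intercept a of a regression line Y’ = bX + a can be computed from the summations with the standard least-squares formulas, b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = (Σy − bΣx) / n. The sketch below assumes those formulas and uses hypothetical data, not the five students' scores from the example.

```python
def regression_line(xs: list, ys: list) -> tuple:
    """Return (b, a) for the least-squares line Y' = bX + a."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # y-intercept
    return b, a

# Hypothetical data for illustration only (n = 5)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, a = regression_line(x, y)
print(f"Y' = {b:.2f}X + {a:.2f}")
```

For these made-up values the line is Y’ = 0.60X + 2.20: each 1-unit increase in X raises the predicted Y by 0.60, and the predicted Y at X = 0 is 2.20.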
The field of Statistics that deals with prediction is called regression analysis.
In a graph, the horizontal axis represents the independent variable and the vertical axis
represents the dependent variable. In the function below, Y is the dependent variable, the
event expected to change, and X is the independent variable, which is manipulated. To solve
for Y, substitute the given value of X.
Y= f (X)
Linear regression quantifies the relationship between one or more predictor variables and
one outcome variable.
● For example it can be used to quantify the relative impacts of age, gender, and diet
(the predictor variables) on height (the outcome variable). Y is the outcome or
dependent variable whereas X is the predictor or independent variable.
● When the trend line is drawn, we observe that some of the points are on the line
while others are below or above it. In other words, we say that the points in the
scatterplot regress with reference to the line. If the sum of the squared vertical (Y)
distances of the points from this line is the smallest possible, then we call this line the
regression line or the line of “best fit” in the scatterplot. The regression line is the same
as the trend line.
● If two variables are correlated, we can predict the value of one variable in terms of the
other variable. The relationship or correlation must be significant. This means that the
actual relationship exists in the population, not just in the sample.
● The regression analysis is then used to predict the value of one variable in terms of
the other variable. Thus, we do correlation analysis first before performing
regression analysis.
The regression line Y’ = bX + a is also called the line prediction equation because we use it
to predict Y if X is known. Since in the analysis, only the Y distance was considered, the line
cannot be used to predict X from Y.
To determine the regression line or do the regression analysis, we go through the following
steps:
DECISION RULE
➢ If the computed |t| ≥ the critical value of t, then reject Ho and accept Ha.
Interpretation: There is a significant relationship between the two variables.
Example 1
➔ If the computed t= 7.35 and the critical t= 2.105 , what would be the interpretation if
the null hypothesis is rejected?
ANSWER
The null hypothesis is rejected; there is a significant relationship between the two variables.
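The decision rule can be written as a small function. This is only a sketch; the function name and the wording of the returned messages are made up for illustration.

```python
def decide(t_computed: float, t_critical: float) -> str:
    """Apply the decision rule by comparing |computed t| with the critical t."""
    if abs(t_computed) >= abs(t_critical):
        return "Reject Ho: there is a significant relationship between the two variables."
    return "Do not reject Ho: there is no significant relationship between the two variables."

# Values from Example 1
print(decide(7.35, 2.105))
```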
Example 2
➔ The following data pertain to the heights of fathers and their eldest sons in inches. Is
there a significant relationship between the two variables? Predict the height of the son
if the height of his father is 78 inches.
Solution:
1. Identify the dependent and independent variable.
➔ Solution: Here, the dependent variable is the height of the son while the independent
variable is the height of the father.
➔ b. Solving for t
5. Make a decision
➔ Solution: Since the computed t=8.61 is greater than the critical t=2.306, we reject the
null hypothesis. So, there is a significant relationship between the two variables.
7. Compute the values of a and b in the regression equation Y’=bX+a. Using the
following formulas:
Y’=0.78X+16.55
9. Predict the height of the son if the height of the father is 78 inches
➔ Solution: Find the value of Y’ when X=78 in the regression equation:
Y’ = 0.78(78) + 16.55 = 77.39
So, the predicted height of a son whose father is 78 inches tall is about 77 inches. Remember
that this is just a predicted value based on the given data.
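The test and the prediction in this example can be sketched in code. The t statistic is assumed to use the standard formula for testing a correlation, t = r√((n − 2)/(1 − r²)). The values r = 0.95 and n = 10 are illustrative assumptions, not taken from the data table; n = 10 is only suggested by the critical value 2.306, which corresponds to 8 degrees of freedom for a two-tailed test at α = 0.05.

```python
from math import sqrt

def t_from_r(r: float, n: int) -> float:
    """t statistic for testing whether a correlation r differs from zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))

def predict(x: float, b: float = 0.78, a: float = 16.55) -> float:
    """Predicted Y' from the regression equation Y' = bX + a of the example."""
    return b * x + a

t = t_from_r(r=0.95, n=10)  # r and n are assumed values for illustration
son_height = predict(78)

print(f"t = {t:.2f}")
print(f"predicted height of the son = {son_height:.2f} inches")
```

With the assumed r, the computed t is close to the 8.61 reported in the solution, and the regression equation gives a predicted height of 77.39 inches, or about 77 inches.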