Statsprob -Reviewer q2

Uploaded by

janadamz72

A hypothesis is an educated guess or proposition that attempts to explain a set of facts or a natural phenomenon. It is used mostly in the field of science, where the scientific method is used to test it.

Examples of a Hypothesis:

1. By the end of the year, there will be a big increase in the number of recoveries of COVID-19
patients.

2. The change in climate temperature sets everyone in the community to be more careful in
their daily activities.

Hypothesis testing is another area of Inferential Statistics. It is a decision-making process for evaluating claims about a population based on the characteristics of a sample purportedly coming from the population. The decision is whether the characteristic is acceptable or not.

The process of hypothesis testing involves making a decision between two opposing
hypotheses (null and its alternative). If one is true, the other hypothesis must be false. It
means that if the improbability of occurrence can be established in one hypothesis, then the
other hypothesis is likely to occur.

Two Types of Hypotheses:

NULL HYPOTHESIS, denoted by H0, is a statement that there is NO difference between a parameter and a specific value, or that there is NO difference between two parameters.

- H0: μ1 = μ2, H0: p1 = p2
- Symbols used: =, ≤, ≥
- Directional (One-Tailed, Left Tail) - The probability is found at the left tail of the distribution. This applies when H0 uses ≥ and H1 uses <.
- Directional (One-Tailed, Right Tail) - The probability is found at the right tail of the distribution. This applies when H0 uses ≤ and H1 uses >.

ALTERNATIVE HYPOTHESIS, denoted by H1, is a statement that there is a difference between a parameter and a specific value, or that there is a difference between two parameters.

- H1: μ1 ≠ μ2, H1: p1 ≠ p2
- Symbols used: <, >, ≠
- Non-directional alternative hypothesis (two-tailed test) - The probability is found on both tails of the distribution.

Level of Significance, denoted by alpha or α, is a measure of the strength of the evidence that must be present in your sample before you will reject the null hypothesis and conclude that the effect is statistically significant. The researcher determines the significance level before conducting the experiment. To obtain the level of significance, use the formula α = 1 − confidence level.

Types of Errors:

Type I Error: If the null hypothesis is true but is rejected, the decision is incorrect.
Ex: Punishing a person who is truly innocent and wrongly sending them to jail.

Type II Error: If the null hypothesis is false but is accepted (not rejected), the decision is incorrect.
Ex: Bryan thinks that he is a six-footer. His actual height is 156 cm.

Rejection Region refers to the region where the value of the test statistic lies for which we
will reject the null hypothesis. This region is also known as critical region.

A. Non- Directional (Two-Tailed Test) – The probability is found on both tails of the
distribution.
B. Directional (One-Tailed, Left Tail) – The probability is found at the left tail of the
distribution.

C. Directional (One-Tailed, Right Tail) – The probability is found at the right tail of the
distribution.
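
As a rough illustration of how the level of significance fixes the rejection region, the sketch below computes critical z-values for the three kinds of test using Python's standard library. The function name critical_z and the 95% confidence level are assumptions made for illustration; they are not part of this reviewer.

```python
from statistics import NormalDist

def critical_z(alpha, tail):
    """Return the critical z-value(s) for a significance level alpha.

    tail is "two" (non-directional), "left", or "right" (directional).
    """
    std_normal = NormalDist()  # standard normal curve: mean 0, sd 1
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)  # reject H0 if z <= -z_crit or z >= z_crit
    if tail == "left":
        return std_normal.inv_cdf(alpha)  # reject H0 if z <= this value
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)  # reject H0 if z >= this value
    raise ValueError("tail must be 'two', 'left', or 'right'")

confidence_level = 0.95           # assumed for illustration
alpha = 1 - confidence_level      # alpha = 1 - confidence level

two_tailed = critical_z(alpha, "two")
left_tailed = critical_z(alpha, "left")
right_tailed = critical_z(alpha, "right")
print(two_tailed, left_tailed, right_tailed)
```

With α = 0.05 the two-tailed critical values come out near ±1.96 and the one-tailed values near ±1.645, which are the usual table entries.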

Other Elements of Hypothesis Testing:

Population refers to the totality of objects, individuals, characteristics, or reactions of interest (e.g. based on the total count of votes at the national level, Grace Poe was proclaimed as the number 1 senator.)

Sample is a group of subjects carefully selected from a population of interest (e.g. As of May 15, 8:15pm, 10% of the votes have been counted and Nancy Binay is in the 5th spot.)

Parameter is the numerical value that describes characteristics of a population (e.g. total votes)

Statistic is the numerical value that describes a particular sample (e.g. 10% of votes)

Hypothesis Testing is the process of using statistics to evaluate the utility and validity of a research theory, and this activity always begins with formulating a statement or expectation about a certain phenomenon.

These are the things to consider in formulating hypotheses:

● It should be reasonable and stated in definite terms;
● It should follow the findings of previous studies;
● It should be testable and stated in well-defined operational form (mean or proportion).

These are the steps in formulating the null and alternative hypotheses:

● Check the type of measurement used in the given problem. Use “µ” for mean/average and “p” for proportion.
● Assess whether the statement denotes a direction.
Hint: a directional hypothesis contains greater than, less than, at least, at most, and other similar terms.

● Use the following symbol for null hypothesis:

For non-directional statement:


H0: µ = ___ H0: p = ___
For directional statement:
H0: µ ≥ ___ H0: µ ≤ ___
H0: p ≥ ___ H0: p ≤ ___

● Use the following symbol for alternative hypothesis:

For non-directional statement:


H1: µ ≠ ___ H1: p ≠ ___
For directional statement:
H1: µ < ___ H1: µ > ___
H1: p < ___ H1: p > ___

Test Statistic – a value computed from the sample data that is used to decide whether to reject the null hypothesis. It compares your data with what is expected under the null hypothesis.

Z – Test – a statistical way of testing a hypothesis given the following conditions:

● if the sample size n is large enough (n ≥ 30), and the population mean μ and the population variance σ² are known.

● if the sample size n is large enough (n ≥ 30), the population mean μ is known, and the population variance σ² is unknown. (By applying the Central Limit Theorem, the sample variance s² may be used as an estimate of the population variance σ².)

T – Test – a statistical way of testing a hypothesis when both of the following conditions hold:

● the sample size n is less than 30 (n < 30);

● the population variance σ² is unknown. (If we assume that the sample comes from a normally distributed population, then the sample variance s² can be used to estimate the population variance σ².)

Note: if the population variance σ² is known and the population is normally distributed, the z-test is used even when n < 30.

Example of how to solve problems involving hypothesis testing:

a. State the null hypothesis (𝐻₀) and the alternative hypothesis (𝐻ₐ).
b. Determine the test statistic to be used for the hypothesis test, and calculate its value.
c. Identify the critical value for the test and define the critical region.
d. Make a decision and draw a conclusion by comparing the calculated value of the test
statistic with the critical value.

1. A school claims that their Grade 11 HUMSS students study an average of 3 hours per
day. A teacher suspects this is an overestimate and decides to test the claim. She
takes a random sample of 30 students and finds that they study an average of 2.5
hours per day with a standard deviation of 0.6 hours.

Solution:
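
The worked solution is not preserved here, so the following is only a sketch of how steps a to d might be carried out for this problem. It assumes a left-tailed z-test (n = 30, with the sample standard deviation s standing in for σ) and a significance level of α = 0.05, which the problem does not state. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# a. Hypotheses: H0: mu >= 3 (the school's claim), Ha: mu < 3 (an overestimate)
mu_0 = 3.0     # claimed mean study hours per day
x_bar = 2.5    # sample mean
s = 0.6        # sample standard deviation
n = 30         # sample size
alpha = 0.05   # ASSUMED; the problem does not give a significance level

# b. Test statistic: z = (x_bar - mu_0) / (s / sqrt(n))
z = (x_bar - mu_0) / (s / sqrt(n))

# c. Critical value for a left-tailed test at the assumed alpha
z_critical = NormalDist().inv_cdf(alpha)

# d. Decision: reject H0 when z falls in the left-tail critical region
reject_h0 = z <= z_critical
print(f"z = {z:.2f}, critical z = {z_critical:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the computed z is about −4.56, which lies well inside the left-tail critical region, so the sample would support the teacher's suspicion that the claim is an overestimate.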
2. A school wants to know if a new learning technique improves students' test scores in
Math. Last year, the average score of students was 75. This year, a group of 40
students using the new technique has an average score of 78 with a standard
deviation of 5.

Solution:
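
The worked solution is not preserved here either. The sketch below shows one way the computation might go, assuming a right-tailed z-test (n = 40 is large, so s is used in place of σ) and α = 0.05, which is an assumption since the problem gives no significance level. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# a. Hypotheses: H0: mu <= 75 (no improvement), Ha: mu > 75 (scores improved)
mu_0 = 75.0    # last year's average score
x_bar = 78.0   # average score of the group using the new technique
s = 5.0        # sample standard deviation
n = 40         # sample size
alpha = 0.05   # ASSUMED; the problem does not give a significance level

# b. Test statistic: z = (x_bar - mu_0) / (s / sqrt(n))
z = (x_bar - mu_0) / (s / sqrt(n))

# c. Critical value for a right-tailed test at the assumed alpha
z_critical = NormalDist().inv_cdf(1 - alpha)

# d. Decision: reject H0 when z falls in the right-tail critical region
reject_h0 = z >= z_critical
print(f"z = {z:.2f}, critical z = {z_critical:.3f}, reject H0: {reject_h0}")
```

Under these assumptions z is about 3.79, which exceeds the critical value of about 1.645, so the data would suggest that the new technique improves scores.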

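P-value - the probability, computed under the assumption that the null hypothesis is true, of obtaining a sample result at least as extreme as the one observed. It measures how much the sample data deviate from what is expected under a true null hypothesis: the smaller the p-value, the stronger the evidence against Ho.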

Steps in Hypothesis Testing using P-Value Method

1. Describe the population parameter of interest.
2. Formulate the null and alternative hypotheses.
3. Check the assumptions.
4. Choose a significance level α.
Make α small when the consequences of rejecting a true Ho are severe.
● - Is the test two-tailed or one-tailed?
● - Get the critical values from the test statistic table.
● - Establish the critical regions.
5. Select the appropriate test statistic.
● - Compute the test statistic using the appropriate formula.
6. State the decision rule for rejecting or not rejecting the null hypothesis.
For a two-tailed test:
● - Reject Ho if the computed probability value (multiplied by two) is ≤ α.
● - Do not reject Ho if the computed probability value (multiplied by two) is > α.
For a one-tailed test (Right):
● - Reject Ho if the computed probability value is ≤ α.
● - Do not reject Ho if the computed probability value is > α.
For a one-tailed test (Left):
● - Reject Ho if the computed probability value is ≤ α.
● - Do not reject Ho if the computed probability value is > α.
● *decision is dependent on the confidence level or alpha (α)
7. Compare the computed probability value and alpha (α).
● - Decide
● - Interpret
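
Steps 5 to 7 can be sketched in a few lines of Python for a z statistic, using the standard rule that Ho is rejected when the p-value is at most α. The function names p_value and decide, and the sample value z = −0.57, are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def p_value(z, tail):
    """P-value of a computed z statistic for a 'two', 'left', or 'right' tailed test."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z)             # area to the left of z
    if tail == "right":
        return 1 - std_normal.cdf(z)         # area to the right of z
    if tail == "two":
        # one-tail area beyond |z|, multiplied by two
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'two', 'left', or 'right'")

def decide(p, alpha):
    """Apply the decision rule: reject Ho when the p-value is at most alpha."""
    return "Reject Ho" if p <= alpha else "Do not reject Ho"

# Hypothetical computed z for a two-tailed test at alpha = 0.05
example_p = p_value(-0.57, "two")
example_decision = decide(example_p, 0.05)
print(round(example_p, 4), example_decision)
```

A large p-value such as this one (about 0.57) means the sample is quite consistent with Ho, so Ho is not rejected.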

The central limit theorem states that if you have a population with mean μ and standard
deviation σ and take sufficiently large random samples from the population with replacement,
then the distribution of the sample means will be approximately normally distributed.
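
A small simulation can make the theorem concrete. The sketch below is only an illustration under assumed settings: a flat (uniform) population of 10,000 values, samples of size 30 drawn with replacement, 2,000 repetitions, and a fixed random seed. It checks that the sample means centre on the population mean and that their spread is close to σ/√n.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# A clearly non-normal (flat) population, assumed for illustration
population = [rng.uniform(0, 1) for _ in range(10_000)]

sample_size = 30
sample_means = [
    mean(rng.choices(population, k=sample_size))  # sampling with replacement
    for _ in range(2_000)
]

pop_mean = mean(population)
pop_sd = stdev(population)
mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
expected_sd = pop_sd / sample_size ** 0.5  # sigma / sqrt(n)

print(f"population mean = {pop_mean:.3f}, mean of sample means = {mean_of_means:.3f}")
print(f"sigma/sqrt(n) = {expected_sd:.3f}, sd of sample means = {sd_of_means:.3f}")
```

Plotting a histogram of sample_means would show the familiar bell shape even though the population itself is flat.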

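We have mentioned in the previous lesson that the population proportion can be estimated only for a large sample size (n ≥ 30). The same is true in testing a claim or hypothesis about the population proportion (p).

To test a claim about a population proportion, we use the z-test for population proportion, z = (p̂ − p0) / √(p0q0 / n), where p̂ is the sample proportion, p0 is the hypothesized population proportion, q0 = 1 − p0, and n is the sample size.

As in the use of the z-test for means, the decision rule below is used (for a two-tailed test, compare the absolute value of the computed z):
● If |Zcomputed| ≥ Zcritical, REJECT Ho
● If |Zcomputed| < Zcritical, Do not Reject Ho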

Example:

Suppose a researcher who is studying the rapid growth of the rat population wants to determine the proportion of female rats in a certain region. He does not need to catch every rat he sees and record its gender. He only needs a sufficient sample from which he will make inferences about the proportion of female rats.

The researcher may initially believe that 50% of the rat population is female. Out of 50 rats he collected, 23 are female.

In this example, the researcher wants to test his belief that 50% (0.5) of the population of rats is female. Would the sample of 23 females out of 50 rats support the claim? Use α = 0.05.

Using the five-step hypothesis testing procedure:

1. Null Hypothesis (Ho) and Alternative Hypothesis (Ha)
● H0: p = 0.5
● Ha: p ≠ 0.5

2. Statistical test: z-test for proportions (two-tailed)
● α = 0.05
● Zcritical = ±1.96 (see the table)

3. Computation: with p̂ = 23/50 = 0.46,
● z = (0.46 − 0.5) / √((0.5)(0.5)/50) ≈ −0.57

4. Decision: Reject or do not reject Ho.
● Since the absolute value of the computed z (0.57) is less than the critical value of z (1.96), Ho is NOT REJECTED.

5. Conclusion: There is not sufficient evidence to reject the researcher’s claim.
● Thus, the sample is consistent with 50% of the rat population being female.
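
The computation behind the decision in this example can be checked with a short script. This is a minimal sketch using the usual z statistic for a proportion, z = (p̂ − p0) / √(p0(1 − p0)/n). The variable names are illustrative.

```python
from math import sqrt

p_0 = 0.5       # hypothesized proportion of female rats
n = 50          # rats collected
females = 23    # female rats in the sample

p_hat = females / n                           # sample proportion
z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

z_critical = 1.96                             # two-tailed, alpha = 0.05
reject_h0 = abs(z) >= z_critical              # two-tailed decision rule

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, reject Ho: {reject_h0}")
```

The computed z is about −0.57, whose absolute value is below 1.96, which agrees with the decision not to reject Ho.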

Another example:

Newborn babies are more likely to be boys than girls. A random sample found 13,173 boys
were born among 25,468 newborn children. The sample proportion of boys was 0.5172. Is
this sample evidence that the birth of boys is more common than the birth of girls in the entire
population? (Use 5% level of significance)

Solution:

State the null and alternative hypothesis


● Ho : p = 0.5
● Ha : p > 0.5

Conclusion: There is sufficient evidence to conclude that the birth of boys is more common than the birth of girls in the entire population.
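
The intermediate computation for this example is not preserved above, so the sketch below reconstructs it under the stated hypotheses, using a right-tailed z-test for a proportion at the 5% level. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_0 = 0.5          # hypothesized proportion of boys under Ho
n = 25_468         # newborn children in the sample
boys = 13_173      # boys in the sample
alpha = 0.05       # 5% level of significance

p_hat = boys / n                               # sample proportion, about 0.5172
z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)  # z statistic for a proportion

z_critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
reject_h0 = z >= z_critical

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, critical z = {z_critical:.3f}")
```

The computed z is about 5.50, far beyond the critical value of about 1.645, which is why Ho is rejected in the conclusion.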

Some research studies involve two variables. One of these two variables is the independent
variable and the other one is the dependent variable.

● Independent Variable: is the variable that may affect the dependent variable to
change.
● Dependent Variable: is the variable that is influenced or affected by the independent
variable.

The data collected in this type of study that involves two variables are called bivariate data.

Bivariate data deal with two variables that are compared in order to find or establish their relationship.

Examples:
1) Number of hours spent in studying and corresponding test scores
2) Ice cream sales versus the temperature on that day
3) IQ scores and the amount of sleeping time
4) Mileage and age of the car
5) Height and weight of children below 18 years old

The relationship of variables in bivariate data can be displayed using a graph called scatter
plot.

● A scatter plot is the most common display of quantitative bivariate data. It shows patterns, trends, relationships and possible extraordinary values (outliers) between the variables.
● Using the scatter plot, we can describe the form, direction and strength of association between two variables.
● In terms of the form or shape, we can describe if there is a linear relationship between two variables, that is, the points closely follow a straight line, or if they form a curve while increasing or decreasing steadily. It is also possible that there is no underlying form.

We can also describe the relationship of the variables by looking at the direction of the points
on the scatter plot. The pattern has a positive direction if it runs from the lower left to the
upper right. If it runs from the upper left to the lower right, then it has a negative direction. It
tells us whether the values on the two variables go up or down together or not.

The strength of the pattern can also be described in the scatter plot. It is related to how closely clustered the points are around the form. It tells us the degree to which values of one variable are related to the values of the second variable. We normally use the words weak, moderate, or strong to describe the strength of the association or relationship.

Steps in Constructing a Scatter Plot
1. Draw a graph and label the x- and y-axes.
2. Assign each quantitative variable to an axis.
3. Choose a range for each axis that includes the maximum and the minimum values in the data set.
4. Plot each point on the graph.

Example:
1st Semester Grade vs. 2nd Semester Grade of Ten Grade 11 Students

Thus, the scatter plot describes a positive relationship between the 1st Semester Grade
and 2nd Semester Grade of Ten Grade 11 Students.

Correlation analysis is a statistical method used to determine whether a relationship between two variables exists.

Direction of Correlation

● Positive Correlation exists when high values of one variable correspond to high values
in the other variable or low values in one variable correspond to low values in the other
variable.
● Negative Correlation exists when high values of one variable correspond to low values
in the other variable or low values in one variable correspond to high values in the
other variable.
● Zero Correlation exists when high values in one variable correspond to either high or low values in the other variable, so that no consistent pattern appears.

Strength of Correlation
● Perfect
● Very high
● Moderately high
● Moderately low
● Very low
● Zero

The trend line is the line closest to the points. The direction of the line tells the direction of the correlation that exists between the variables. If the trend line rises from left to right, its slope is positive; thus, there is a positive correlation between the two variables. If it falls from left to right, its slope is negative, and there is a negative correlation between the two variables.

The Pearson Product-Moment Correlation Coefficient, also called the sample correlation coefficient r, is a widely used statistical measure of the strength of a linear relationship between two variables. It is given by

r = [nΣXY − (ΣX)(ΣY)] / √([nΣX² − (ΣX)²][nΣY² − (ΣY)²])

where n is the number of paired values of X and Y.

We will use the given table to determine the strength of the computed r:

Example

Determine the value of Pearson r for the following data and interpret the results.

● a) Construct the table shown below

● b) Complete the table above by:

● c) Use the Pearson Product Moment Correlation Formula to solve for r and interpret.

Solving for r
● r = 0.32; moderately low but positive
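
The table-based computation in steps a) to c) can be sketched as a small function that builds the same column sums (ΣX, ΣY, ΣXY, ΣX², ΣY²) and substitutes them into the computational formula for r. The data of this example are not preserved here, so the hours and scores lists below are HYPOTHETICAL values used only to show the mechanics.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the column sums of the X, Y, XY, X^2, Y^2 table."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# HYPOTHETICAL paired data, not the data of the example above
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(f"r = {r:.4f}")
```

For these made-up values the sums are ΣX = 15, ΣY = 20, ΣXY = 66, ΣX² = 55 and ΣY² = 86, giving r of about 0.77, a positive correlation.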

Regression analysis is a statistical treatment of data which involves identifying the relationship between a dependent variable and one or more independent variables. Regression analysis is used to:

1. determine the strength of the predictors, that is, identify the strength of the effect that the independent variable(s) have on a dependent variable;
2. forecast effects or impact of changes, that is, understand how much the dependent variable changes with a change in one or more independent variables; and
3. predict trends and future values, that is, get point estimates.

The basic and commonly used regression analysis is the linear regression. Linear
regression estimates are used to explain the relationship between one dependent variable
and one or more independent variables.

The simplest form of linear regression is called simple linear regression. It is a linear
regression model with two-dimensional sample points, one dependent variable and one
independent variable.

● An independent variable is a variable that is hypothesized to have an impact on the dependent variable, can be manipulated or changed, and is usually denoted by X.
● The dependent variable is a variable that is being tested; its value relies or depends on the value of the independent variable, and it is usually denoted by Y.

Example:

A teacher wants to know the effect of attendance on the academic performance of the
students.

Solution:

i. Place each variable in the blank found in the statement.


● Statement : “Academic performance depends upon the attendance of the student”

ii. Evaluate whether the statement is logical.


● Since the statement in i is logical, that is, the attendance is relatively responsible for the academic performance and the academic performance is relatively dependent on the attendance, then:

❖ The independent variable is the attendance of the student. The teacher can manipulate the length of time and the students that will participate in the experiment.

❖ The dependent variable is the academic performance. The students’ academic performance can be affected by their attendance.

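To interpret the slope and y-intercept of the regression line Y’ = bX + a, remember that in regression:

● the slope b is the change in the predicted value of Y for every one-unit increase in X;
● the y-intercept a is the predicted value of Y when X = 0.

Example: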

Five randomly selected students were surveyed about their Statistics 1st quarter test score
and their 1st quarter grade in Statistics. Assuming that there is a significant relationship
between the two variables, determine the slope and y-intercept of the regression line. Then,
interpret the result.

Solution:

Step 1. Identify the dependent and independent variable.


● The dependent variable is the 1st quarter grade in Statistics and the independent
variable is the 1st quarter test score in Statistics.

Step 2. Accomplish the table below.

Step 3. Calculate the values of a and b in the formula by substituting the summations found in Step 2 and the sample size n given in the problem, which is 5 students; thus, n = 5.

Step 4. Interpret the result.

● The slope of the regression line is 0.82, which indicates that for every 1-point increase in the 1st quarter test score, the 1st quarter grade in Statistics increases by 0.82 on average.
● The y-intercept of the regression line is 53.47, which indicates that for a test score of 0, the predicted grade in Statistics is 53.47.

In this example, the y-intercept does not have a practical meaning because we do not expect a test score to be near 0.
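
The table and formulas of Steps 2 and 3 are not preserved here. As a sketch only, the function below uses the usual least-squares formulas b = (nΣXY − ΣXΣY) / (nΣX² − (ΣX)²) and a = (ΣY − bΣX) / n. The test_scores and grades lists are HYPOTHETICAL and are not the survey data of this example, so the result differs from 0.82 and 53.47.

```python
def regression_coefficients(xs, ys):
    """Return (b, a) for the least-squares line Y' = bX + a."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # y-intercept
    return b, a

# HYPOTHETICAL data for five students (X = test score, Y = quarter grade)
test_scores = [30, 35, 40, 45, 50]
grades = [78, 82, 86, 90, 94]

b, a = regression_coefficients(test_scores, grades)
print(f"Y' = {b:.2f}X + {a:.2f}")
```

For these made-up values the line is Y’ = 0.80X + 54.00: each additional test-score point goes with 0.80 more grade points, and 54 is the predicted grade at a score of 0.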

The field of Statistics that deals with prediction is called regression analysis.

In the graph, the horizontal axis represents the independent variable and the vertical axis represents the dependent variable. In this function, Y is the dependent variable, which is the event expected to change, and X is the independent variable, which is manipulated. To solve for Y, substitute the given value of X.

Y= f (X)

Linear regression quantifies the relationship between one or more predictor variables and
one outcome variable.

● For example it can be used to quantify the relative impacts of age, gender, and diet
(the predictor variables) on height (the outcome variable). Y is the outcome or
dependent variable whereas X is the predictor or independent variable.

● When the trend line is drawn, we observe that some of the points are on the line while others are below or above the line. In other words, we say that the points in the scatterplot regress with reference to the line. If the sum of the squared Y distances (vertical distances) of the points from this line is the least possible, then we call this line the regression line or the line that “best fits” the scatterplot. The regression line is the same as the trend line.
● If two variables are correlated, we can predict the value of one variable in terms of the
other variable. The relationship or correlation must be significant. This means that the
actual relationship exists in the population, not just in the sample.
● The regression analysis is then used to predict the value of one variable in terms of
the other variable. Thus, we do correlation analysis first before performing
regression analysis.

To solve for the correlation coefficient (r), use the Pearson product-moment formula:

r = [nΣXY − (ΣX)(ΣY)] / √([nΣX² − (ΣX)²][nΣY² − (ΣY)²])

The regression line Y’ = bX + a is also called the line prediction equation because we use it
to predict Y if X is known. Since in the analysis, only the Y distance was considered, the line
cannot be used to predict X from Y.

To determine the regression line or do the regression analysis, we go through the following
steps:

1. Find the value of the correlation coefficient (r).

2. Test the significance of r. If r is significant, proceed to regression analysis (proceed to Step 3). If r is not significant, regression analysis cannot be done (stop).

STEPS IN TESTING THE SIGNIFICANCE OF r


★ a. State the null and alternative hypotheses
★ b. Compute the value of t using t = r√(n − 2) / √(1 − r²)
★ c. Compare the computed value of t with the critical value of t, as found in the table. Based on the null hypothesis, the test calls for a two-tailed test. The degrees of freedom are n-2.
★ d. Make the decision

DECISION RULE
➢ If the absolute value of the computed t ≥ critical value of t, then reject Ho and accept Ha.
Interpretation: There is a significant relationship between the two variables.

➢ If the absolute value of the computed t < critical value of t, then do not reject Ho.
Interpretation: There is no significant relationship between the two variables.

3. Find the values of a and b.

4. Substitute the values of a and b in the regression line Y’ = bX + a.
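
Step 2 (testing the significance of r) can be sketched as follows, using the usual statistic t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The standard library has no t-distribution, so the critical value is passed in from the t-table. The values r = 0.95 and n = 10 are ASSUMED for illustration, and the function names are hypothetical.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing Ho: no linear relationship, with df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def is_significant(t, t_critical):
    """Two-tailed decision rule: reject Ho when |t| >= the critical t."""
    return abs(t) >= t_critical

# ASSUMED illustration: r = 0.95 from n = 10 pairs, so df = 8.
# The two-tailed critical t at alpha = 0.05 and df = 8 is 2.306 (t-table).
t = t_for_r(0.95, 10)
significant = is_significant(t, 2.306)
print(f"t = {t:.2f}, significant: {significant}")
```

Only when the result is significant does it make sense to continue to Steps 3 and 4 and build the regression line.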

Example 1

➔ If the computed t= 7.35 and the critical t= 2.105 , what would be the interpretation if
the null hypothesis is rejected?

ANSWER
The null hypothesis is rejected, there is a significant relationship between the two variables.

Example 2

➔ The following data pertain to the heights of fathers and their eldest sons in inches. Is there a significant relationship between the two variables? If so, predict the height of the son if the height of his father is 78 inches.

Solution:
1. Identify the dependent and independent variable.
➔ Solution: Here, the dependent variable is the height of the son while the independent
variable is the height of the father.

2. Compute the correlation coefficient (r) using the formula


➔ Solution:

3. Test the significance of r using the formula


➔ a. Ho: There is no significant relationship between the height of the father and the height of the son. Ha: There is a significant relationship between the two variables.

➔ b. Solving for t

4. Compare the computed t-value to the critical t-value


➔ Solution: Using df=n-2=10-2=8, α=0.05, two-tailed test, we find from the table that the critical value of t is 2.306.

5. Make a decision
➔ Solution: Since the computed t=8.61 is greater than the critical t=2.306, we reject the
null hypothesis. So, there is a significant relationship between the two variables.

6. Summarize the results


➔ Solution: There is sufficient evidence to conclude that there is a significant relationship between the height of the father and the height of the son. Thus, we will proceed to regression analysis.

7. Compute the values of a and b in the regression equation Y’=bX+a. Using the
following formulas:

➔ Solution: Using the values obtained in Step 2, we have the following:

8. Form the regression equation.


➔ Solution: Substitute the values of a and b in the equation Y’=bX+a

Y’=0.78X+16.55

9. Predict the height of the son if the height of the father is 78 inches.
➔ Solution: Find the value of Y’ when X=78 in the regression equation.

Y’=0.78(78)+16.55=77.39

So, the predicted height of the son whose father is 78 inches tall is about 77 inches. Remember that this is just a predicted value based on the given data.
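
The prediction step amounts to substituting X into the regression equation Y’=0.78X+16.55 obtained in Step 8. A minimal sketch, with a hypothetical function name:

```python
def predict_son_height(father_height, b=0.78, a=16.55):
    """Predicted height of the son (inches) from the line Y' = bX + a."""
    return b * father_height + a

predicted = predict_son_height(78)
print(f"Predicted height: {predicted:.2f} inches")
```

Substituting X = 78 gives about 77.39, which rounds to 77 inches. The same function can be reused for other father heights, although predictions far outside the range of the original data should be treated with caution.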

@ eleven humss descartes ‘24-’25


Thank you and godbless!
★ dhru_js
