
PE 367: Statistics for Petroleum Engineers

Dr. Stephen Adjei


Department of Petroleum Engineering, KNUST.
Email: [email protected]
Office: PB, 444

Inferential Statistics: Hypothesis Testing


Chi-squared (χ²) Tests
§ The chi-squared test measures the difference between
observed (actual) and expected frequencies to evaluate
whether the differences are due to chance or a
significant factor.
§ It evaluates how well observed data matches an
expectation.

§ It is a right-tailed test.


Application of Chi-squared hypothesis tests

§ The goodness of fit test checks if a sample's observed
distribution/pattern matches an expected distribution/pattern.
§ It tests if the differences between observed frequencies and
expected frequencies are due to random variation or a
significant factor.
Hypotheses for goodness of fit:

ü Null Hypothesis (Ho): The observed distribution fits/follows
the expected distribution.

ü Alternative Hypothesis (Ha): The observed distribution does
not fit the expected distribution.
Example
§ An engineering team at a vehicle manufacturing
company is evaluating three different coolant
formulations (Coolant A, Coolant B, and
Coolant C) to determine their effectiveness in
maintaining optimal engine temperatures.

§ They hypothesize that each coolant formulation will perform
equally well, leading to optimal performance for approximately
one-third of the vehicles tested.
§ The team defines optimal performance as
maintaining the engine temperature within a
specified range of 90-100°C under standard
operating conditions.

§ If a vehicle’s engine temperature consistently stays within
this range during testing, it is counted as achieving optimal
performance.
§ To conduct the experiment, the team tests a random
sample of 75 vehicles, assigning 25 vehicles to each
coolant type. Based on their hypothesis, they expect an
equal proportion (distribution) of optimal performance,
meaning they anticipate that 25 vehicles for each
coolant will meet the performance criteria.

§ Thus, the expected optimal distribution (proportion) is:

ü Coolant A: 25
ü Coolant B: 25
ü Coolant C: 25
§ After running the tests, the team records the
observed counts of vehicles achieving optimal
performance for each coolant:

üCoolant A: 15 vehicles met the criteria,


üCoolant B: 10 vehicles met the criteria,
üCoolant C: 20 vehicles met the criteria.

§ Observed Counts:
üCoolant A: 15
üCoolant B: 10
üCoolant C: 20
§ Use a chi-square goodness of fit test to assess whether the
observed proportion of optimal performance significantly
deviates from their expected distribution (proportion) of 25
vehicles for each coolant.
§ The chi-square test will compare the observed counts with the
expected counts to determine if the differences are statistically
significant.
§ This analysis will help the team conclude whether any of the
coolant formulations perform better or worse than anticipated,
thereby aiding in the selection of the most effective coolant for
high-performance engines.
§ The expected distribution serves as a baseline, allowing the team
to understand if their initial assumptions about coolant
performance hold true.
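A minimal sketch (not part of the original slides; it assumes NumPy and SciPy are available) that applies the chi-squared goodness-of-fit formula to these counts:

```python
# Sketch: chi-squared goodness-of-fit for the coolant example above.
import numpy as np
from scipy.stats import chi2

observed = np.array([15, 10, 20])   # Coolant A, B, C vehicles meeting the criteria
expected = np.array([25, 25, 25])   # equal performance assumed by the team

# Test statistic: sum of (O - E)^2 / E
chi2_stat = np.sum((observed - expected) ** 2 / expected)

# Right-tail critical value at alpha = 0.05 with df = k - 1 = 2
critical = chi2.ppf(0.95, df=len(observed) - 1)

print(f"chi-squared statistic = {chi2_stat:.2f}")   # 14.00
print(f"critical value        = {critical:.3f}")    # 5.991
print("Reject Ho" if chi2_stat > critical else "Fail to reject Ho")
```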
Example
A petroleum company wants to determine if the types of
oil drilled from three different fields occur in the
expected proportions: 40% light crude, 35% medium
crude, and 25% heavy crude.
In a sample of 200 barrels from recent operations, they
found 80 light crude, 70 medium crude, and 50 heavy
crude. At a significance level of 0.05, is there a
significant difference from the expected distribution?
Solution
§ Null Hypothesis (Ho): The observed frequencies
match the expected distribution.

§ Alternative Hypothesis (Ha): The observed frequencies do not
match the expected distribution.
Critical value determination

ü The expected counts are 200 × 0.40 = 80 light, 200 × 0.35 = 70
medium, and 200 × 0.25 = 50 heavy crude, which equal the observed
counts exactly, so χ² = 0.
ü With df = k − 1 = 2 and α = 0.05, the critical value is 5.991.
ü Since χ² = 0 < 5.991, we fail to reject the null hypothesis.
ü There is no significant difference from the expected distribution.
The sample data supports the claim that the distribution of oil
types matches the expected proportions.
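For reference, a minimal sketch (assuming SciPy) that reproduces this result with scipy.stats.chisquare:

```python
# Sketch: chi-squared goodness-of-fit test for the oil-type example.
from scipy.stats import chisquare, chi2

observed = [80, 70, 50]                          # light, medium, heavy crude
expected = [0.40 * 200, 0.35 * 200, 0.25 * 200]  # 80, 70, 50

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
critical = chi2.ppf(0.95, df=len(observed) - 1)

print(f"chi2 = {stat:.3f}, p-value = {p_value:.3f}, critical = {critical:.3f}")
# chi2 = 0.000, p-value = 1.000, critical = 5.991 -> fail to reject Ho
```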
Hypotheses for test of independence:
The test of independence examines if two
categorical variables are related or independent.

ü Null Hypothesis (Ho): There is no association between the two
categorical variables (they are independent).

ü Alternative Hypothesis (Ha): There is an association between
the two categorical variables (they are not independent).
Example:
ü Determine if there’s an association between equipment failure
rates and operational conditions (like temperature or pressure)
in a plant.
A petroleum engineering team is analyzing whether there is a
relationship between drilling depth categories (Shallow, Medium,
Deep) and well productivity levels (Low, Medium, High) in a
particular oil field. They collect data on 100 wells and categorize
them by depth and productivity as follows:

Low Medium High Row Total


Shallow 10 15 5 30
Medium 5 20 15 40
Deep 10 5 15 30
Column Total 25 40 35 100
§ Using a chi-squared test of independence, determine if
depth and productivity are associated at a significance
level of 0.05.
§ Null Hypothesis (Ho): Drilling depth and well
productivity are independent.

§ Alternative Hypothesis (Ha): Drilling depth and well
productivity are not independent.
Low Medium High Row Total
Shallow 10 15 5 30
Medium 5 20 15 40
Deep 10 5 15 30
Column Total 25 40 35 100
§ Each expected count is (row total × column total) / 100;
comparing these with the observed counts gives χ² ≈ 14.88.
§ From the chi-squared table, at a 0.05 significance level and
df = (3 − 1)(3 − 1) = 4 degrees of freedom, the critical value is 9.488.
§ Since χ² = 14.88 > 9.488, we reject the null hypothesis. There is
significant evidence to conclude that drilling depth and
productivity levels are associated in this oil field.
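A minimal sketch (assuming SciPy) that runs the same test of independence directly on the contingency table:

```python
# Sketch: chi-squared test of independence for depth vs. productivity.
import numpy as np
from scipy.stats import chi2, chi2_contingency

#                  Low  Medium  High
table = np.array([[10, 15,  5],    # Shallow
                  [ 5, 20, 15],    # Medium
                  [10,  5, 15]])   # Deep

stat, p_value, dof, expected = chi2_contingency(table)
critical = chi2.ppf(0.95, df=dof)

print(f"chi2 = {stat:.2f}, df = {dof}, p-value = {p_value:.4f}")
print(f"critical value = {critical:.3f}")
print("Reject Ho" if stat > critical else "Fail to reject Ho")
```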
Assignment- Use a Python Code
• A candy manufacturer claims that the candies produced come in four
colors in equal proportions (i.e., each color should appear 25% of the
time). A quality control manager randomly selects 200 candies and
records the following observed counts for each color:
ü Red: 45
ü Blue: 55
ü Green: 48
ü Yellow: 52
• Using the chi-squared goodness-of-fit test, determine at the α=0.05
significance level whether the observed color distribution significantly
differs from the expected equal distribution.
• Remember to state the hypotheses.
F-Table (F-Distribution Table)
F-test
The F-test is a statistical test used to determine whether the
variances (spread) of two groups differ significantly or if the
observed difference is due to random chance.

The F-distribution is used in two main applications:

§ Comparing Variances (F-test for Variance Equality): Determines
if two populations have equal variances.

§ ANOVA (Analysis of Variance): Compares multiple group means by
analyzing the ratio of variance between groups to variance
within groups.
§ When using the F-distribution, we check whether the
higher variability (spread) in one group is statistically
significant or just due to random chance.

§ If the F-test suggests the difference is likely due to chance,
we cannot conclude that a real cause exists. The variability
might just be natural fluctuation, meaning no corrective action
is needed.

§ If the F-test suggests the difference is real, we can
investigate the cause of the increased variability and find a
solution if needed.
F-Test for Variance Equality
Example
§ You’re comparing the variability in production rates from two
oil reservoirs.
§ You want to know if one reservoir's production is more
unpredictable than the other.

ü Reservoir A has production rates of: 500, 520, 530 barrels/day.
ü This shows small changes. The numbers are close to each other.
ü Reservoir B has production rates of: 400, 600, 800 barrels/day.
ü This shows large changes. The numbers are spread out more widely.
ü After running the F-test, you find that the difference in
variability between the two is not statistically significant
(because it could just be random variation).
§ In an F-test, the samples are assumed to come from normal
populations, and the test statistic (the ratio of sample
variances) follows an F-distribution.
Hypothesis
§ Null Hypothesis (Ho): The variances of the two
populations are equal.

§ Alternative Hypothesis (Ha): The variances of the two
populations are different.
§ The F distribution has two degrees of freedom:

ü df1 for the numerator: rows

ü df2 for the denominator: columns

§ There is a different F distribution for each combination of
the degrees of freedom of the numerator and denominator.
§ Since there are so many F distributions, the F
tables are organized somewhat differently than
the tables for the other distributions.

§ In order to use the F table, first select the significance
level to be used, and then determine the appropriate combination
of degrees of freedom.
§ Each table is for a particular combination of numerator and
denominator degrees of freedom.

§ Not every value exists in the tables.

§ Software can assist with computing the critical value for any
combination of degrees of freedom.
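As an illustration (a minimal sketch, assuming SciPy is installed), the critical value used in the worked example later in this section can be obtained in one line:

```python
# Sketch: look up an F critical value with software instead of the table.
from scipy.stats import f

# Upper-tail critical value at alpha = 0.05 with dfn = 15, dfd = 9
print(round(f.ppf(0.95, dfn=15, dfd=9), 2))   # prints 3.01
```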
Test Statistic for the F Distribution
§ For comparing two variances, the F-statistic is calculated as:

F = s1² / s2²

where s1² is the larger sample variance and s2² is the smaller
sample variance.
Degrees of Freedom

§ Numerator degrees of freedom: df1 = n1 − 1, where n1 is the
size of the first sample (read on the row).

§ Denominator degrees of freedom: df2 = n2 − 1, where n2 is the
size of the second sample (read on the column).
Example
§ A mechanical engineer is testing two different coolant
fluids, Coolant X and Coolant Y, to evaluate their heat
absorption capabilities in a heat exchanger system.
§ The goal is to determine if the variability in heat
absorption rates is significantly different between the
two coolants.
§ The engineer collects a sample of 16 measurements
for Coolant X and 10 measurements for Coolant Y
under controlled conditions, recording their
respective heat absorption rates in kW.
§ The measurements for each coolant sample are
assumed to follow a normal distribution.
§ The sample variances observed are:

§ Using a significance level of α=0.05, perform an F-test to
determine if there is a significant difference in the variances
of heat absorption rates between Coolant X and Coolant Y.
Solution
§ Null Hypothesis (Ho): The variances of heat
absorption rates for Coolant X and Coolant Y are
equal.

§ Alternative Hypothesis (Ha): The variances of heat absorption
rates for Coolant X and Coolant Y are different.
§ The larger variance should be the numerator.

Degrees of freedom for Coolant X sample, df1=nX−1=16−1=15

Degrees of freedom for Coolant Y sample, df2=nY−1=10−1=9


ü Using an F-distribution table with α=0.05 and df values of 15
and 9, the critical F-value is approximately 3.01.

MAKE A CONCLUSION
• We fail to reject the null hypothesis.

• Note that hypothesis testing never proves anything.

• It only provides evidence for or against a claim.

• Failing to reject Ho doesn’t mean the variances are exactly
equal; it just means we don’t have strong enough evidence to say
they are different. The observed variance difference could be
due to random chance.

• Rejecting Ho doesn’t mean the variances are absolutely
different; it just means the evidence suggests they are unlikely
to be equal given the chosen significance level (α).
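As an illustration only (the slide's actual sample variances are not reproduced in this text), the sketch below runs the same F-test mechanics with hypothetical placeholder variances; the critical value for df = (15, 9) is computed with SciPy rather than read from the table.

```python
# Sketch: F-test for equality of two variances (Coolant X vs Coolant Y).
# The variances below are hypothetical placeholders, NOT the slide data.
from scipy.stats import f

n_x, n_y = 16, 10            # sample sizes from the example
s2_x, s2_y = 2.4, 1.1        # placeholder sample variances (assumed)

f_stat = s2_x / s2_y         # larger variance (Coolant X here) in the numerator
df1, df2 = n_x - 1, n_y - 1  # 15 and 9

critical = f.ppf(0.95, dfn=df1, dfd=df2)   # ~3.01 from software

print(f"F = {f_stat:.2f}, critical = {critical:.2f}")
print("Reject Ho" if f_stat > critical else "Fail to reject Ho")
```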
Analysis of Variance (ANOVA)
§ ANOVA is a statistical method used to compare the means of
three or more independent groups to see whether there are any
statistically significant differences between them.

§ ANOVA applies the F-test to check for differences in group
means by analyzing within-group and between-group variances.

§ Note that Z- and t-tests can be used to compare the means of
two groups.
Types of ANOVA

§ One-Way ANOVA is used to compare the means of three or more
groups to see if at least one of them is different from the others.

§ Two-way ANOVA tests means across two factors, often including
an interaction effect.
One-Way ANOVA
§ Suppose an engineer wants to assess the effect of three
different types of drilling fluids on the rate of drill bit
wear.
§ Groups are created for each type of drilling fluid, such as
synthetic-based fluid, water-based fluid, and oil-based
fluid.
§ A one-way ANOVA could be applied to test if the mean
wear rates are significantly different across these
groups, indicating whether a particular fluid type affects
drill bit durability.
Two-Way ANOVA
§ Assume a study is conducted to determine if
well productivity is influenced by both type of
reservoir rock (e.g., sandstone, limestone,
shale) and drilling method (e.g., vertical,
directional, or horizontal drilling).

§ A two-way ANOVA can be used to examine if there is a
significant interaction between the reservoir rock type and the
drilling method on well productivity, or if one factor
independently impacts productivity more than the other.
Hypothesis
§ Null Hypothesis: The means for all groups are the same.
Ho: μ1 = μ2 = μ3 = … = μk

§ Alternative hypothesis: The means differ for at least one pair
of groups.
Ha: at least one μi differs from the others

§ If any group mean is significantly different from the overall
mean, we reject the null hypothesis.
Test Statistic

F = variance between groups / variance within groups = MSSB / MSSW
Degree of Freedom
§ Between groups degrees of freedom: This tells us how much
freedom we have to vary between the groups being compared.

dfB = k − 1, where k is the number of groups

§ Within groups degrees of freedom: This represents the freedom
for variability within each individual group.

dfW = n − k, where n is the total number of observations across
all groups


Procedure
§ The stress levels in offshore workers at different shifts are given below.
Determine if the difference in the means of each group is significant at the
0.05 significance level.
Solution
§ Hypothesis
Ho: μ1 = μ2 = μ3
Ha: at least one group mean differs

• Step 1

Calculate MSSW (mean sum of squares within groups):

MSSW = Σ (x − x̄i)² / (n − k)

where x̄i is the mean of the group that observation x belongs to.

• Step 2

Calculate MSSB (mean sum of squares between groups):

MSSB = Σ ni (x̄i − x̄)² / (k − 1)

where ni and x̄i are the size and mean of group i, and x̄ is the
overall mean.
• Step 3

Calculate the F-value (test statistic): F = MSSB / MSSW

• Step 4

Determine if this F-value represents a significant difference
between the three groups by comparing it with the critical
F-value.

• Degrees of freedom between groups (row), dfB = k − 1 = 3 − 1 = 2

• Degrees of freedom within groups (columns), dfW = n − k = 36 − 3 = 33

• The critical F-value can be computed using software.

• We can also use our F-distribution table.

• There is no entry for 33 in our table, so for the sake of this
example let’s use 30. This gives a critical F-value of 3.32.

Since Fstat > Fcritical, we reject the null hypothesis.
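Because the raw stress-level table is not reproduced in this text, the sketch below uses hypothetical measurements for three shifts of 12 workers each (so n = 36 and k = 3) purely to illustrate the procedure; scipy.stats.f_oneway cross-checks the manual MSSW/MSSB calculation, and f.ppf gives the exact critical value for df = (2, 33) without falling back to the df = 30 row of the table.

```python
# Sketch (hypothetical data): one-way ANOVA for three shift groups.
import numpy as np
from scipy.stats import f, f_oneway

rng = np.random.default_rng(0)
# Placeholder stress scores for 3 shifts x 12 workers (not the slide data)
shift_1 = rng.normal(55, 5, 12)
shift_2 = rng.normal(60, 5, 12)
shift_3 = rng.normal(52, 5, 12)
groups = [shift_1, shift_2, shift_3]

# Manual calculation following the steps above
n = sum(len(g) for g in groups)          # 36 observations in total
k = len(groups)                          # 3 groups
grand_mean = np.mean(np.concatenate(groups))

ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
mss_w = ss_within / (n - k)              # Step 1
mss_b = ss_between / (k - 1)             # Step 2
f_stat = mss_b / mss_w                   # Step 3

# Step 4: compare with the exact critical value for df = (2, 33)
critical = f.ppf(0.95, dfn=k - 1, dfd=n - k)

# Cross-check with SciPy's built-in one-way ANOVA
f_check, p_value = f_oneway(*groups)

print(f"F = {f_stat:.3f} (scipy: {f_check:.3f}), critical = {critical:.3f}")
print("Reject Ho" if f_stat > critical else "Fail to reject Ho")
```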
