Lecture 03

This document covers the foundations of data science, focusing on the chi-square (𝜒²), t, and F distributions. It explains the properties, probability density functions (pdf), cumulative distribution functions (CDF), and examples of how to use these distributions in statistical analysis. Additionally, it provides tables for reference and example problems for practice.

Uploaded by

kht07144
Copyright
© All Rights Reserved

SEHH2311

FOUNDATIONS OF DATA SCIENCE


LECTURE 3
F, t and $\chi^2$ Distributions
Topics
1. $\chi^2$ distribution
2. t distribution
3. F distribution

SEHH2311 Foundations of Data Science Page 2


$\chi^2$ Distribution
• Suppose $Z_1, Z_2, \ldots, Z_k$ are iid $N(0,1)$ random variables.
• If $X = Z_1^2 + Z_2^2 + \cdots + Z_k^2$, then $X$ has a chi-square distribution with $k$ degrees of freedom, denoted by $\chi^2(k)$.
• pdf of a $\chi^2$ random variable:
$$f(x) = \frac{1}{\Gamma\left(\frac{k}{2}\right) 2^{k/2}}\, x^{\frac{k}{2}-1} e^{-\frac{x}{2}}, \quad 0 \le x < \infty$$
where $\Gamma(z) = \int_0^\infty u^{z-1} e^{-u}\, du$ (Gamma function)
• The pdf of $\chi^2$ is skewed to the right.
• CDF of a $\chi^2$ random variable:
$$F(x) = \int_0^x f(u)\, du$$ (This is somewhat complicated)
• $E(X) = k$
• $\operatorname{Var}(X) = 2k$
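
The definition and the two moments above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names, the seed, the choice $k = 5$ and the sample size are illustrative assumptions, not part of the lecture.

```python
import math
import random

def chi2_pdf(x, k):
    """pdf of chi-square(k) at x > 0, written directly from the slide formula."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (math.gamma(k / 2) * 2 ** (k / 2))

def simulate_chi2(k, n, rng):
    """Draw n values of Z_1^2 + ... + Z_k^2 with Z_i iid N(0, 1)."""
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(n)]

rng = random.Random(0)   # fixed seed (an arbitrary choice) so the run is repeatable
k = 5                    # degrees of freedom used for the illustration
draws = simulate_chi2(k, 100_000, rng)

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)

print(f"sample mean     {mean:.3f}  (theory: k  = {k})")
print(f"sample variance {var:.3f}  (theory: 2k = {2 * k})")
print(f"pdf at x = 3    {chi2_pdf(3.0, k):.4f}")
```

The sample mean and variance should land close to $k$ and $2k$, with the gap shrinking as the number of draws grows.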

pdf of $\chi^2$ Distribution
[Figure: pdf of $\chi^2$ with different degrees of freedom. Source: https://en.wikipedia.org/wiki/Chi-squared_distribution]

CDF of $\chi^2$ Distribution
[Figure: CDF of $\chi^2$ with different degrees of freedom. Source: https://en.wikipedia.org/wiki/Chi-squared_distribution]


Using $\chi^2$ Distribution Table
The $\chi^2$ distribution table is typically used when constructing confidence intervals and testing hypotheses.
Example:
If $X$ is $\chi^2(5)$, then $F(1.61) = 0.1$,
or
$\chi^2_{0.9;(5)} = 1.61$
In this notation the first subscript is the upper-tail probability $1 - F(x)$ and the second is the d.f.
[Figure: $\chi^2$ distribution with 5 d.f.]

F(x)     0.005  0.010  0.025  0.050  0.100  0.900  0.950  0.975  0.990  0.995
1-F(x)   0.995  0.990  0.975  0.950  0.900  0.100  0.050  0.025  0.010  0.005
d.f.
1         0.00   0.00   0.00   0.00   0.02   2.71   3.84   5.02   6.63   7.88
2         0.01   0.02   0.05   0.10   0.21   4.61   5.99   7.38   9.21  10.60
3         0.07   0.11   0.22   0.35   0.58   6.25   7.81   9.35  11.34  12.84
4         0.21   0.30   0.48   0.71   1.06   7.78   9.49  11.14  13.28  14.86
5         0.41   0.55   0.83   1.15   1.61   9.24  11.07  12.83  15.09  16.75
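
A table entry can be cross-checked by evaluating the CDF directly. The sketch below is one possible way to do that with the standard library: it uses the fact that the $\chi^2(k)$ CDF equals the regularized lower incomplete gamma function $P(k/2, x/2)$ and sums its power series. The function name `chi2_cdf` and the number of series terms are assumptions made for this illustration.

```python
import math

def chi2_cdf(x, k, terms=500):
    """CDF of chi-square(k) at x, via the power series of the regularized
    lower incomplete gamma function P(a, t) with a = k/2 and t = x/2:
        P(a, t) = t^a e^(-t) / Gamma(a) * sum_{n>=0} t^n / (a (a+1) ... (a+n))
    """
    if x <= 0:
        return 0.0
    a, t = k / 2, x / 2
    term = 1.0 / a            # n = 0 term of the series
    total = term
    for n in range(1, terms):
        term *= t / (a + n)   # ratio between consecutive series terms
        total += term
    return total * math.exp(a * math.log(t) - t - math.lgamma(a))

# Re-read the row for 5 d.f.: each x should give back (roughly) its column header F(x).
row_5df = [0.41, 0.55, 0.83, 1.15, 1.61, 9.24, 11.07, 12.83, 15.09, 16.75]
for x in row_5df:
    print(f"x = {x:5.2f}   F(x) = {chi2_cdf(x, 5):.3f}")
```

Because the table values are rounded to two decimals, the computed probabilities only match the column headers approximately.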

Example 1
(a) If $X \sim \chi^2(10)$, find $w$ such that $P(X \le w) = 0.975$.
(b) If $X \sim \chi^2(6)$, find $w$ such that $P(X > w) = 0.95$.
(c) Find $\chi^2_{0.01;(12)}$.
(d) Find $\chi^2_{0.975;(12)}$.
Solution:
(a) $w = 20.48$
(b) $w = 1.64$
(c) $\chi^2_{0.01;(12)} = 26.22$
(d) $\chi^2_{0.975;(12)} = 4.40$
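
Table lookups like these amount to inverting the CDF. As a rough sketch, and not the method used in the lecture, the upper-tail value $\chi^2_{\alpha;(k)}$ can be found by bisection on a numerically evaluated CDF. The helper names and the tolerance below are assumptions; the CDF is the incomplete gamma series, repeated here so the block runs on its own.

```python
import math

def chi2_cdf(x, k, terms=500):
    """CDF of chi-square(k): series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, t = k / 2, x / 2
    term = total = 1.0 / a
    for n in range(1, terms):
        term *= t / (a + n)
        total += term
    return total * math.exp(a * math.log(t) - t - math.lgamma(a))

def chi2_upper_quantile(alpha, k, tol=1e-10):
    """Return w with P(X > w) = alpha for X ~ chi-square(k)."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, k) > alpha:   # widen the bracket until the tail is <= alpha
        hi *= 2.0
    while hi - lo > tol:                   # bisection: the upper tail decreases in w
        mid = (lo + hi) / 2
        if 1.0 - chi2_cdf(mid, k) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example 1, restated in terms of the upper-tail probability alpha = P(X > w)
print(f"(a) {chi2_upper_quantile(1 - 0.975, 10):.2f}")  # P(X <= w) = 0.975
print(f"(b) {chi2_upper_quantile(0.95, 6):.2f}")
print(f"(c) {chi2_upper_quantile(0.01, 12):.2f}")
print(f"(d) {chi2_upper_quantile(0.975, 12):.2f}")
```

Part (a) is stated through $P(X \le w)$, so it is converted to the upper tail $1 - 0.975 = 0.025$ before the lookup, which mirrors how the two header rows of the table relate.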

Example 2
(a) If $X \sim \chi^2(9)$, find $w$ such that $P(X \le w) = 0.05$.
(b) If $X \sim \chi^2(13)$, find $w$ such that $P(X > w) = 0.9$.
(c) Find $\chi^2_{0.025;(3)}$.
(d) Find $\chi^2_{0.995;(10)}$.
Solution:
(a) $w = 3.33$
(b) $w = 7.04$
(c) $\chi^2_{0.025;(3)} = 9.35$
(d) $\chi^2_{0.995;(10)} = 2.16$

F Distribution
• Suppose $Y_1$ and $Y_2$ are independent random variables such that $Y_1 \sim \chi^2(d_1)$ and $Y_2 \sim \chi^2(d_2)$.
• If $X = \frac{Y_1/d_1}{Y_2/d_2}$, then $X$ is said to have an F-distribution with $d_1$ and $d_2$ degrees of freedom, i.e. $X \sim F(d_1, d_2)$.
• $d_1$ is the numerator d.f. and $d_2$ is the denominator d.f.
• pdf of F-distribution:
$$f(x) = \frac{1}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{\frac{d_1}{2}} x^{\frac{d_1}{2}-1} \left(1 + \frac{d_1}{d_2}\, x\right)^{-\frac{d_1+d_2}{2}}, \quad 0 \le x < \infty$$
where $B(u_1, u_2)$ is the Beta function.
• The pdf of the F-distribution is skewed to the right.
• CDF of F-distribution:
$$F(x) = \int_0^x f(u)\, du$$ (This is somewhat complicated)
• $E(X) = \frac{d_2}{d_2 - 2}$, for $d_2 > 2$.
– When $d_2 \le 2$, $E(X)$ doesn't exist.
• $\operatorname{Var}(X) = \frac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$, for $d_2 > 4$.
– When $d_2 \le 4$, $\operatorname{Var}(X)$ doesn't exist.
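
The ratio definition and the moment formulas can be compared by simulation. This is a sketch under stated assumptions: the degrees of freedom $(5, 20)$, the seed and the sample size are arbitrary choices, and each $\chi^2(k)$ variable is sampled as a Gamma variable with shape $k/2$ and scale 2 (a standard identity) through `random.gammavariate`.

```python
import random

def f_mean(d1, d2):
    """E(X) for X ~ F(d1, d2); only valid for d2 > 2."""
    return d2 / (d2 - 2)

def f_var(d1, d2):
    """Var(X) for X ~ F(d1, d2); only valid for d2 > 4."""
    return 2 * d2**2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))

def draw_f(d1, d2, rng):
    """One draw of (Y1/d1) / (Y2/d2) with independent Y1 ~ chi2(d1), Y2 ~ chi2(d2).
    chi2(k) is sampled as Gamma(shape=k/2, scale=2)."""
    y1 = rng.gammavariate(d1 / 2, 2.0)
    y2 = rng.gammavariate(d2 / 2, 2.0)
    return (y1 / d1) / (y2 / d2)

rng = random.Random(1)   # arbitrary fixed seed
d1, d2 = 5, 20           # illustrative degrees of freedom
draws = [draw_f(d1, d2, rng) for _ in range(200_000)]

mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)

print(f"sample mean     {mean:.4f}  (formula: {f_mean(d1, d2):.4f})")
print(f"sample variance {var:.4f}  (formula: {f_var(d1, d2):.4f})")
```

A fairly large $d_2$ is used on purpose: for $d_2$ close to 4 the variance exists but the sample variance converges slowly because of the heavy right tail.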

pdf of F-Distribution
[Figure: pdf $f(x)$ of F-distributions with different degrees of freedom]

CDF of F-Distribution
[Figure: CDF $F(x)$ of F-distributions with different degrees of freedom]


Using F-Distribution Table
F-distribution tables are often used when conducting hypothesis testing.
[Figure: F-distribution with $d_1 = 3$ and $d_2 = 5$]

Example:
If $X \sim F(3,5)$, then $P(X > 12.06) = 0.01$
or
$F_{0.01;(3,5)} = 12.06$

F-Distribution Table ($\alpha = 0.01$)
Columns: numerator d.f. $d_1$; rows: denominator d.f. $d_2$.

d2\d1        1        2        3        4        5        6        7        8        9       10        ∞
1      4052.18  4999.50  5403.35  5624.58  5763.65  5858.99  5928.36  5981.07  6022.47  6055.85  6365.86
2        98.50    99.00    99.17    99.25    99.30    99.33    99.36    99.37    99.39    99.40    99.50
3        34.12    30.82    29.46    28.71    28.24    27.91    27.67    27.49    27.35    27.23    26.13
4        21.20    18.00    16.69    15.98    15.52    15.21    14.98    14.80    14.66    14.55    13.46
5        16.26    13.27    12.06    11.39    10.97    10.67    10.46    10.29    10.16    10.05     9.02
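
The table entry for $F(3,5)$ can be checked by integrating the pdf numerically. The sketch below uses composite Simpson integration from the standard library only; the function names and the number of subintervals are assumptions, and the Beta function is computed from the identity $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$.

```python
import math

def f_pdf(x, d1, d2):
    """pdf of F(d1, d2) at x >= 0, following the slide formula.
    Assumes d1 >= 2 so that the value at x = 0 is finite."""
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    r = d1 / d2
    return (r ** (d1 / 2) * x ** (d1 / 2 - 1)
            * (1 + r * x) ** (-(d1 + d2) / 2) / math.exp(log_beta))

def f_cdf(x, d1, d2, n=20_000):
    """Approximate F(x) = integral of the pdf over [0, x] with composite Simpson.
    n must be even; a large n is used because the pdf is not smooth at 0."""
    h = x / n
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

upper_tail = 1.0 - f_cdf(12.06, 3, 5)
print(f"P(X > 12.06) for X ~ F(3, 5): {upper_tail:.4f}")
```

The same function can be pointed at any other cell of the table by changing the value and the two degrees of freedom, as long as $d_1 \ge 2$.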

Example 3
(a) If $X \sim F(3,10)$, find $w$ such that $P(X \le w) = 0.99$.
(b) If $X \sim F(5,15)$, find $w$ such that $P(X > w) = 0.025$.
(c) Find $F_{0.05;(7,60)}$.
(d) Find $F_{0.01;(30,10)}$.

Solution:
(a) $w = 6.55$
(b) $w = 3.58$
(c) $F_{0.05;(7,60)} = 2.17$
(d) $F_{0.01;(30,10)} = 4.25$

Example 4
(a) If $X \sim F(6,12)$, find $w$ such that $P(X \le w) = 0.975$.
(b) If $X \sim F(2,20)$, find $w$ such that $P(X > w) = 0.01$.
(c) Find $F_{0.025;(10,30)}$.
(d) Find $F_{0.05;(8,120)}$.

Solution:
(a) $w = 3.73$
(b) $w = 5.85$
(c) $F_{0.025;(10,30)} = 2.51$
(d) $F_{0.05;(8,120)} = 2.02$

t-Distribution
The t-distribution is also called Student's t-distribution, but why do we need the t-distribution?

If $X$ is $N(\mu, \sigma^2)$, we know that $Z = \frac{X - \mu}{\sigma}$ is $N(0,1)$, i.e. the standard score of a normal random variable is distributed as standard normal.

In many practical situations, $\sigma$ is unknown. What will happen if we replace it by the sample standard deviation $s$, i.e. use $\frac{X - \mu}{s}$?

Will $\frac{X - \mu}{s}$ be standard normal or something else?

t-Distribution
• Suppose $Y$ and $Z$ are independent random variables such that $Y \sim \chi^2(\nu)$ and $Z \sim N(0,1)$.
• If we let $X = \frac{Z}{\sqrt{Y/\nu}}$, then $X$ is said to have a Student's t distribution with $\nu$ degrees of freedom, i.e. $X \sim t(\nu)$.
• pdf of t-distribution:
$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad -\infty < x < \infty$$
where $\Gamma(u)$ is the gamma function.
• The pdf of the t-distribution is symmetric about 0 and very similar to the normal pdf. However, it has heavier tails.
• CDF of t-distribution:
$$F(x) = \int_{-\infty}^{x} f(u)\, du, \quad -\infty < x < \infty$$

t-Distribution
• $E(X) = 0$ for $\nu > 1$
– When $\nu = 1$, $E(X)$ doesn't exist.
• $\operatorname{Var}(X) = \frac{\nu}{\nu - 2}$ for $\nu > 2$
– When $\nu \le 2$, $\operatorname{Var}(X)$ doesn't exist.
• When $\nu$ goes to infinity, the distribution of $X$ approaches the standard normal distribution $N(0,1)$.
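
A short simulation can illustrate both the construction $X = Z / \sqrt{Y/\nu}$ and the variance formula. This is a sketch only: the choice $\nu = 10$, the seed and the sample size are assumptions, and $Y \sim \chi^2(\nu)$ is sampled as a Gamma variable with shape $\nu/2$ and scale 2.

```python
import math
import random

def draw_t(nu, rng):
    """One draw of Z / sqrt(Y / nu) with independent Z ~ N(0, 1), Y ~ chi2(nu)."""
    z = rng.gauss(0.0, 1.0)
    y = rng.gammavariate(nu / 2, 2.0)   # chi2(nu) as Gamma(shape=nu/2, scale=2)
    return z / math.sqrt(y / nu)

rng = random.Random(2)   # arbitrary fixed seed
nu = 10                  # illustrative degrees of freedom
draws = [draw_t(nu, rng) for _ in range(200_000)]

mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)

print(f"sample mean     {mean:.4f}  (theory: 0)")
print(f"sample variance {var:.4f}  (theory: nu/(nu-2) = {nu / (nu - 2):.4f})")
```

The theoretical variance $\nu/(\nu-2)$ is larger than 1 for every finite $\nu$, which is one way to see the heavier tails relative to $N(0,1)$.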

pdf of t-distribution
[Figure: pdf of t-distribution with different degrees of freedom]

CDF of t-distribution
[Figure: CDF of t-distribution with different degrees of freedom]


Using t-Distribution Table
The t-distribution table is often used when constructing confidence intervals and carrying out hypothesis tests.
[Figure: t-distribution with 4 degrees of freedom]
Example:
If $X \sim t(4)$, then

$P(X \le 2.776) = 0.975$
or
$t_{0.025;(4)} = 2.776$

F(x)           0.600   0.650   0.700   0.750   0.800   0.850   0.900   0.950   0.975   0.990   0.995
α = 1-F(x)     0.400   0.350   0.300   0.250   0.200   0.150   0.100   0.050   0.025   0.010   0.005
d.f.
1              0.325   0.510   0.727   1.000   1.376   1.963   3.078   6.314  12.706  31.821  63.657
2              0.289   0.445   0.617   0.816   1.061   1.386   1.886   2.920   4.303   6.965   9.925
3              0.277   0.424   0.584   0.765   0.978   1.250   1.638   2.353   3.182   4.541   5.841
4              0.271   0.414   0.569   0.741   0.941   1.190   1.533   2.132   2.776   3.747   4.604
5              0.267   0.408   0.559   0.727   0.920   1.156   1.476   2.015   2.571   3.365   4.032
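
The example entry can be verified by integrating the t pdf. The sketch below relies on the symmetry of the pdf about 0, so that $F(x) = \tfrac{1}{2} + \int_0^x f(u)\,du$, and evaluates the integral with composite Simpson. Function names and the number of subintervals are assumptions for illustration.

```python
import math

def t_pdf(x, nu):
    """pdf of the t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, n=2_000):
    """F(x) = 1/2 + integral of the pdf from 0 to x (composite Simpson, n even).
    A negative x gives a negative step h, so the signed integral is subtracted."""
    h = x / n
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

print(f"P(X <=  2.776), nu = 4: {t_cdf(2.776, 4):.4f}")
print(f"P(X <= -2.776), nu = 4: {t_cdf(-2.776, 4):.4f}")
```

Evaluating at $-2.776$ shows the symmetry used when a required probability is below 0.5 and therefore not listed in the table: the lower-tail value is the negative of the tabulated upper-tail value.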

Example 5
(a) If $X \sim t(9)$, find $w$ such that $P(X \le w) = 0.75$.
(b) If $X \sim t(9)$, find $w$ such that $P(X \le w) = 0.25$.
(c) If $X \sim t(25)$, find $w$ such that $P(X > w) = 0.1$.
(d) Find $t_{0.005;(15)}$.
(e) Find $t_{0.99;(6)}$.
Solution:
(a) $w = 0.703$
(b) $w = -0.703$
(c) $w = 1.316$
(d) $t_{0.005;(15)} = 2.947$
(e) $t_{0.99;(6)} = -t_{0.01;(6)} = -3.143$

Example 6
(a) If $X \sim t(10)$, find $w$ such that $P(X \le w) = 0.9$.
(b) If $X \sim t(12)$, find $w$ such that $P(X > w) = 0.99$.
(c) If $X \sim t(8)$, find $w$ such that $P(X > w) = 0.2$.
(d) Find $t_{0.8;(11)}$.
(e) Find $t_{0.1;(13)}$.

Solution:
(a) $w = 1.372$
(b) $w = -2.681$
(c) $w = 0.889$
(d) $t_{0.8;(11)} = -0.876$
(e) $t_{0.1;(13)} = 1.350$

Motivations of $t$, $\chi^2$ and $F$ Distributions
• The distributions in this lecture are motivated by random variables encountered when estimating unknown population parameters.
• The sample variance $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$ is related to the $\chi^2$ distribution.
• The t-distribution is related to $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$, which is a standard score with the population variance replaced by the sample variance.
• The F-distribution is used in the comparison of sample variances of two populations.
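
These links can be made concrete with a small simulation. The sketch below relies on two standard results that the slides do not state explicitly: for an iid $N(\mu, \sigma^2)$ sample of size $n$, $(n-1)s^2/\sigma^2 \sim \chi^2(n-1)$ and $\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t(n-1)$. The values of $n$, $\mu$, $\sigma$, the seed and the number of repetitions are arbitrary assumptions; the cut-off 2.571 is $t_{0.025;(5)}$ from the t table.

```python
import math
import random

def sample_stats(n, mu, sigma, rng):
    """Draw one N(mu, sigma^2) sample of size n; return ((n-1) s^2 / sigma^2, t)."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance
    scaled_var = (n - 1) * s2 / sigma**2               # chi2(n-1) under normality
    t_stat = (xbar - mu) / math.sqrt(s2 / n)           # t(n-1) under normality
    return scaled_var, t_stat

rng = random.Random(3)                  # arbitrary fixed seed
n, mu, sigma, reps = 6, 10.0, 2.0, 50_000
results = [sample_stats(n, mu, sigma, rng) for _ in range(reps)]

mean_scaled = sum(v for v, _ in results) / reps
share_inside = sum(abs(t) <= 2.571 for _, t in results) / reps

print(f"mean of (n-1)s^2/sigma^2: {mean_scaled:.3f}  (chi2(5) mean: {n - 1})")
print(f"share with |t| <= 2.571:  {share_inside:.3f}  (t(5) table: 0.95)")
```

With $n = 6$ both statistics have $n - 1 = 5$ degrees of freedom, so the first line can be compared with $E(X) = k$ for $\chi^2(5)$ and the second with the 0.975 column of the t table.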
