Inferential Statistics and Linear Regression
▪ What is inferential statistics?
▪ Making inferences about populations based on samples
Descriptive Vs Inferential Statistics
▪ Descriptive Statistics refers to
▪ Summary/description of the sample only (not the population)
• Inferential Statistics refers to
– Description of population using sample
• Hypothesis Testing: Testing of statements about population based on sample characteristics
– F Test
– T Test
– Chi-square Test
• Predictions
– Regression
– Classification
Some examples of inferential statistics
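It is given that μ = 4 minutes. To do any calculations, you must know λ, the decay parameter: λ = 1/μ.
Therefore, λ = 1/4 = 0.25, and the density is f(x) = 0.25·e^(−0.25x).
For example, f(5) = 0.25·e^(−0.25·5) ≈ 0.072 is the value of the density at x = 5, that is, when the postal clerk spends five minutes with a customer.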
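The density value in this example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `exponential_pdf` is an illustrative choice, not something taken from the slides.

```python
import math

def exponential_pdf(x: float, lam: float) -> float:
    """Density of the exponential distribution with rate lam, for x >= 0."""
    if x < 0:
        return 0.0
    return lam * math.exp(-lam * x)

mu = 4            # mean, in minutes (given in the example)
lam = 1 / mu      # decay parameter: 1/4 = 0.25
f5 = exponential_pdf(5, lam)
print(round(f5, 3))  # 0.072
```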
Exponential Distribution
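import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# NumPy parameterizes the exponential by scale = 1/λ, so this samples with rate λ = 3.3
scale = 1 / 3.3
draws = np.random.exponential(scale, size=1_000_000)
# fill=True replaces the deprecated shade=True argument of kdeplot
sns.kdeplot(draws, fill=True, color='xkcd:lightish blue')
plt.show()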
Normal(Gaussian) Distribution
• For what data does the Normal distribution fit?
– When the probability of values occurring far from the mean is low
• Example data where the Normal distribution fits
– Body temperature
– People's height
– Car mileage
– IQ scores
– Error distribution of observed values of sensors
• Why fit a distribution?
– To infer the occurrence of events
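To illustrate how rarely a normally distributed value lands far from its mean, the sketch below uses `statistics.NormalDist` from the Python standard library. The IQ parameters (mean 100, standard deviation 15) are an assumed convention chosen for illustration, not values from the slides, and `prob_beyond` is a hypothetical helper name.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # assumed parameters, for illustration only

def prob_beyond(dist: NormalDist, k: float) -> float:
    """Probability of falling more than k standard deviations from the mean."""
    upper = dist.mean + k * dist.stdev
    # The normal density is symmetric, so both tails have the same area.
    return 2 * (1 - dist.cdf(upper))

for k in (1, 2, 3):
    print(k, round(prob_beyond(iq, k), 4))
```

The probabilities shrink quickly as k grows, which is the "extreme values are unlikely" property listed above.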
Sample Distribution
• What is Conjecture?
– Any statement which is either true or false
• What is Statistical Hypothesis?
– Conjecture that can be tested by experiments / observations
• Eg:
– Given drug X, and disease d, X is effective in treating d
– Avg monthly salary of an Indian is 10k
– Avg monthly salary of an Indian and a Chinese is the same
– The performance of algorithms A and B is statistically the same
Statistical Hypothesis Testing
Hypothesis Testing using Z-Test to test population mean
Suppose further testing later shows that the machine was working properly.
What type of error did the employee make (Type 1 or Type 2)?
P(Z < 2.31) = 0.9896
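The table lookup can be reproduced in code. A minimal sketch, assuming the probability refers to the standard normal distribution (mean 0, standard deviation 1), using `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

std_normal = NormalDist()      # standard normal: mean 0, standard deviation 1

# Cumulative probability to the left of 2.31, as read from a Z-table
p = std_normal.cdf(2.31)
print(round(p, 4))             # 0.9896

# Reverse lookup: the z value whose left-tail area is 0.9896
z = std_normal.inv_cdf(0.9896)
print(round(z, 2))
```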
Z-Table
Steps of a left-tailed Z-test for the population mean
Step 1: Formulate H0 and H1
H0: PM = 50 (PM denotes population mean)
H1: PM < 50
Note: The objective is to reject the null hypothesis when the population mean is significantly less than 50.
Step 2: Select Significance Level
alpha = 5%
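The remaining steps of a left-tailed Z-test (compute the test statistic, then compare its left-tail probability with alpha) might look like the sketch below. The function name `left_tailed_z_test` and the sample figures (sample mean 48, known standard deviation 5, n = 25) are hypothetical and not taken from the slides; only H0: PM = 50 and alpha = 5% come from the steps above.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, reject_h0) for H0: mean = mu0 vs H1: mean < mu0.

    Assumes the population standard deviation sigma is known.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = NormalDist().cdf(z)        # area in the left tail
    return z, p_value, p_value < alpha

# Hypothetical sample: these numbers are NOT from the slides.
z, p_value, reject = left_tailed_z_test(sample_mean=48, mu0=50, sigma=5, n=25)
print(round(z, 2), round(p_value, 4), reject)
```

Comparing the p-value with alpha is equivalent to comparing z with the critical value `NormalDist().inv_cdf(alpha)`, which is about -1.645 for alpha = 5%.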
$\hat{y} = b_0 + b_1 x$
Standard Error of the Estimate $= \sqrt{\dfrac{\sum (\hat{y} - y)^2}{n - 2}}$
$b_1 = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
Linear Regression
x y
1 2
2 4
3 5
4 4
5 5
∑x = 15 ∑y = 20
x̄ = 3 ȳ = 4
Scatter plot
$\bar{y} = b_0 + b_1 \bar{x}$
As per the example, to calculate the y-intercept:
$4 = b_0 + 0.6 \times 3$, so $b_0 = 4 - 1.8 = 2.2$
x−x̄   y−ȳ   (x−x̄)²   (x−x̄)(y−ȳ)   ŷ   ŷ−y   (ŷ−y)²
0      1     0        0            4   -1    1
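Standard Error of the Estimate $= \sqrt{\dfrac{\sum (\hat{y} - y)^2}{n - 2}} = \sqrt{\dfrac{2.4}{5 - 2}} = 0.89$
$b_1\ (\text{Slope}) = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{6}{10} = 0.6$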
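The worked example can be reproduced from the five (x, y) pairs in the table. This is a minimal sketch in plain Python; the variable names (`s_xy`, `s_xx`, `sse`, `std_err`) are illustrative choices, not notation from the slides.

```python
from math import sqrt

# Data from the example table
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n   # 3.0
y_bar = sum(y) / n   # 4.0

# Slope: sum of cross-deviations over sum of squared x-deviations
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx

# Intercept: the fitted line passes through (x_bar, y_bar)
b0 = y_bar - b1 * x_bar

# Standard error of the estimate from the squared residuals
y_hat = [b0 + b1 * xi for xi in x]
sse = sum((yh - yi) ** 2 for yh, yi in zip(y_hat, y))
std_err = sqrt(sse / (n - 2))

print(round(b1, 2), round(b0, 2), round(sse, 2), round(std_err, 2))
# 0.6 2.2 2.4 0.89
```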