
GATE

Data Science & AI


2025
Probability &
Statistics
For GATE DA Course & Test Series
Visit:
www.piyushwairale.com
GATE Data Science and AI Study Materials
Probability & Statistics
by Piyush Wairale

Instructions:
• Kindly go through the lectures/videos on our website www.piyushwairale.com
• Read this study material carefully and make your own handwritten short notes. (Short notes must not be
more than 5-6 pages)
• Attempt the mock tests available on the portal.
• Revise this material at least 5 times and once you have prepared your short notes, then revise your short
notes twice a week

• If you are not able to understand any topic, need a more detailed explanation, or find any typos or mistakes in the study material, mail me at [email protected]
Contents
1 Fundamental Principles of Counting
  1.1 Addition Principle
  1.2 Multiplication Principle
2 Permutations
3 Combinations
4 Introduction to Probability
  4.1 Definition of Probability
  4.2 Theorems of Probability
5 Conditional Probability and Bayes Theorem
  5.1 Bayes’ Theorem
6 Discrete and Continuous Random Variables
  6.1 Probability Mass Function (PMF)
  6.2 Probability Density Function (PDF)
7 Expectation (Mean), Variance, and Standard Deviation
  7.1 Expectation (Mean)
  7.2 Variance
  7.3 Standard Deviation
  7.4 Covariance and Correlation
8 Discrete Distributions
  8.1 Discrete Uniform Distribution
  8.2 Binomial Distribution
  8.3 Poisson Distribution
9 Continuous Distributions
  9.1 Uniform/Rectangular Distribution
  9.2 Normal Distribution
  9.3 Standard Normal Distribution
  9.4 Exponential Distribution
10 Joint Distribution of Random Variables
  10.1 Joint Probability Density Function
  10.2 Conditional Probability Functions of Random Variables
  10.3 Independent Random Variables
11 Mean, Median, Mode and Standard Deviation
  11.1 Mean
  11.2 Median
  11.3 Mode
  11.4 Standard Deviation
12 t-distribution
  12.1 Key Characteristics
    12.1.1 Comparing the t-distribution and Normal Distribution
  12.2 Formula for the t-distribution
  12.3 Mean, Median, Mode and SD
13 Chi-Square Distribution
  13.1 Key Characteristics of the Chi-Square Distribution
  13.2 Key Formulas for the Chi-Square Tests
14 Hypothesis Testing
  14.1 Key Terms
  14.2 Steps in Hypothesis Testing
  14.3 Significance Level and Confidence Level in Terms of t-Value
15 t-test
16 z-test
17 chi-square test
18 Central Limit Theorem
  18.1 Key Points of the Central Limit Theorem
LinkedIn

Youtube Channel

Instagram

Telegram Group

Facebook

Download Android App


1 Fundamental Principles of Counting
1.1 Addition Principle
If there are n1 ways to perform task 1 and n2 ways to perform task 2, and the tasks cannot be done simultaneously,
then there are n1 + n2 ways to perform either task.
Example: If there are 3 routes from city A to B and 2 routes from city A to C, and you want to go to either
B or C, there are 3 + 2 = 5 ways.

1.2 Multiplication Principle


If there are n1 ways to perform task 1 and n2 ways to perform task 2, then there are n1 × n2 ways to perform both
tasks in succession.
Example: If you have 3 shirts and 2 pants, there are 3 × 2 = 6 ways to choose an outfit.

2 Permutations
A permutation is an arrangement of objects in a specific order. The order of arrangement is important.

Note: In a permutation, the order of arrangement of the objects is important. Thus abc is a different permu-
tation from bca.
In a combination, the order in which objects are selected does not matter.
Thus abc and bca are the same combination.

Formula for Permutations


• Number of permutations of n distinct objects taken r at a time:
P(n, r) = nPr = n!/(n − r)!

where n! (n factorial) is the product of all positive integers up to n.

Permutations of All Objects


• Number of permutations of n distinct objects:

n! = n × (n − 1) × · · · × 2 × 1

Permutations with Repetition


• Number of permutations of n objects where n1 are of type 1, n2 are of type 2, . . . , nk are of
type k:
n!/(n1! × n2! × · · · × nk!)

Circular Permutations
• Number of circular permutations of n distinct objects:

– Without distinguishing rotations: (n − 1)!
– If reflections are considered identical: (n − 1)!/2
Permutations with Restrictions
• Derangements: Number of permutations where no element appears in its original position.
– Number of derangements of n objects (denoted by !n):
!n = n! ∑_{k=0}^{n} (−1)^k/k!

– Approximate formula:
!n ≈ n!/e
where e is the base of the natural logarithm (e ≈ 2.71828).
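
The counting formulas above can be verified numerically. The short Python sketch below (the values n = 5, r = 3 are arbitrary, chosen only for illustration) evaluates the permutation, factorial, and derangement formulas with the standard math module.

```python
import math

n, r = 5, 3  # illustrative values only

# Permutations of n distinct objects taken r at a time: n!/(n - r)!
print(math.perm(n, r))                               # 60
print(math.factorial(n) // math.factorial(n - r))    # same result

# Permutations of all n objects: n!
print(math.factorial(n))                             # 120

# Derangements: !n = n! * sum_{k=0}^{n} (-1)^k / k!
derangements = round(math.factorial(n) *
                     sum((-1) ** k / math.factorial(k) for k in range(n + 1)))
print(derangements)                                  # 44

# Approximation: !n ≈ n!/e
print(round(math.factorial(n) / math.e))             # 44
```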

3 Combinations
A combination is a selection of objects where the order does not matter.

Formula for Combinations


• Number of combinations of n distinct objects taken r at a time:
C(n, r) = nCr = n!/(r! × (n − r)!)

Properties of Combinations
• Symmetry Property:
nCr = nC(n − r)

• Addition Formula:
nCr + nC(r − 1) = (n + 1)Cr

Combinations with Repetition


• Number of ways to choose r objects from n types with unlimited repetitions:
C(n + r − 1, r) = (n + r − 1)!/(r! × (n − 1)!)

Pascal’s Triangle and Binomial Theorem


Pascal’s Triangle
• A triangular array where each number is the sum of the two numbers directly above it.
• The n-th row corresponds to the coefficients in the expansion of (a + b)n .

Binomial Theorem
• Expansion of (a + b)n :
(a + b)^n = ∑_{r=0}^{n} nCr a^(n−r) b^r

Applications:

• Finding coefficients in binomial expansions.


• Solving combinatorial identities.
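
As a quick numerical check of these identities (the values n = 7, r = 3 and a = 2, b = 3 are arbitrary), the sketch below verifies the symmetry and addition properties and the binomial expansion with math.comb.

```python
import math

n, r = 7, 3  # illustrative values only

# Symmetry: nCr = nC(n-r)
assert math.comb(n, r) == math.comb(n, n - r)

# Addition (Pascal's) formula: nCr + nC(r-1) = (n+1)Cr
assert math.comb(n, r) + math.comb(n, r - 1) == math.comb(n + 1, r)

# Combinations with repetition: choose r from n types = C(n + r - 1, r)
print(math.comb(n + r - 1, r))  # 84

# Binomial theorem: (a + b)^n = sum_r nCr * a^(n-r) * b^r
a, b = 2, 3
expansion = sum(math.comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))
assert expansion == (a + b) ** n
print(expansion)                # 5^7 = 78125
```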
4 Introduction to Probability
Random Experiment
Consider an action which is repeated under essentially identical conditions. If it results in any one of the several
possible outcomes, but it is not possible to predict which outcome will appear, such an action is called a Random
Experiment. One performance of such an experiment is called a Trial.

Sample Space
The set of all possible outcomes of a random experiment is called the sample space. All the elements of the sample
space together are called as exhaustive cases. The number of elements of the sample space i.e. the number of
exhaustive cases is denoted by n(S) or N or n.

Event
Any subset of the sample space is called as an Event and is denoted by some capital letter like A, B, C, or A1 , A2 ,
A3 , ..., or B1 , B2 , ... etc.

Favourable Cases
The cases which ensure the happening of an event A, are called as the cases favourable to the event A. The number
of cases favourable to event A is denoted by n(A) or NA or nA .

Mutually Exclusive Events or Disjoint Events


Two events A and B are said to be mutually exclusive or disjoint if A ∩ B = ∅, i.e. if there is no element common
to A and B.

Equally Likely Cases


Cases are said to be equally likely if they all have the same chance of occurrence i.e. no case is preferred to any
other case.

4.1 Definition of Probability


Consider a random experiment which results in a sample space containing n(S) cases which are exhaustive, mutually
exclusive, and equally likely. Suppose, out of n(S) cases, n(A) cases are favourable to an event A. Then, the
probability of event A is denoted by P (A) and is defined as follows:

P(A) = n(A)/n(S)

Complement of an Event
The complement of an event A is denoted by A′ and it contains all the elements of the sample space which do not
belong to A.
For example: Random experiment: an unbiased die is rolled.

S = {1, 2, 3, 4, 5, 6}

A = {1, 4}, A′ = {2, 3, 5, 6}


P(A) + P(A′) = 1, i.e. P(A) = 1 − P(A′)
Independent Events
Two events A and B are said to be independent if

P (A ∩ B) = P (A) · P (B)
Note: If A and B are independent, then

• A′ and B are independent

• A and B′ are independent

• A′ and B′ are independent

4.2 Theorems of Probability


Addition Theorem
If A and B are any two events, then

P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
Note:

1. A ∪ B: either A or B or both, i.e. at least one of A, B

2. A′ ∩ B′: neither A nor B, i.e. none of A, B

3. A ∪ B and A′ ∩ B′ are complements of each other, so

P(A′ ∩ B′) = 1 − P(A ∪ B)

4. If A and B are mutually exclusive, P (A ∩ B) = 0

P (A ∪ B) = P (A) + P (B)

5. For three events, we have:

P (A1 ∪ A2 ∪ A3 ) = P (A1 ) + P (A2 ) + P (A3 ) − P (A1 ∩ A2 ) − P (A1 ∩ A3 ) − P (A2 ∩ A3 ) + P (A1 ∩ A2 ∩ A3 )


5 Conditional Probability and Bayes Theorem
The conditional probability of an event A given that another event B has occurred is denoted by P (A|B) and
is defined as:
P(A|B) = P(A ∩ B)/P(B), provided P(B) > 0.
Conditional probability measures the probability of event A occurring when it is known that event B has already
occurred.

Properties
1. Multiplication Rule:
P (A ∩ B) = P (B) × P (A|B) = P (A) × P (B|A).
2. If A and B are independent:
P (A|B) = P (A).
3. Total Probability:
∑_i P(Ai|B) = 1,
where {Ai} are mutually exclusive and exhaustive events.

Example
Suppose we have a standard deck of 52 playing cards. What is the probability that a card drawn at random is a
King given that it is a face card?
Solution:
• Total face cards: Jack, Queen, King in each suit ⇒ 12 cards.
• Number of Kings: 4.
• Conditional probability:
P(King|Face card) = P(King and Face card)/P(Face card) = (4/52)/(12/52) = 4/12 = 1/3.

Law of Total Probability


If {B1 , B2 , . . . , Bn } is a partition of the sample space S (i.e., mutually exclusive and exhaustive events), then for
any event A:
P(A) = ∑_{i=1}^{n} P(Bi) P(A|Bi).

The law of total probability allows us to compute the probability of an event A by considering all the different
ways A can occur through the events Bi .

Example
A factory has three machines M1 , M2 , and M3 producing 30%, 45%, and 25% of the total products, respectively.
The defect rates are 1%, 2%, and 3%, respectively. What is the probability that a randomly selected product is
defective?
Solution:
• Let D be the event that a product is defective.
• Using the law of total probability:
P (D) = P (M1 )P (D|M1 ) + P (M2 )P (D|M2 ) + P (M3 )P (D|M3 )
P(D) = 0.30 × 0.01 + 0.45 × 0.02 + 0.25 × 0.03 = 0.003 + 0.009 + 0.0075 = 0.0195.
5.1 Bayes’ Theorem
Bayes’ theorem relates the conditional and marginal probabilities of random events. For events A and B with
P (B) > 0:
P(A|B) = P(B|A) P(A)/P(B).

Extended Form (Multiple Events)


For a partition {B1 , B2 , . . . , Bn } of the sample space S:

P(Bk|A) = P(A|Bk) P(Bk) / ∑_{i=1}^{n} P(A|Bi) P(Bi).

Bayes’ theorem allows us to update the probability of an event based on new evidence or information.

Example
Using the previous factory example, if a randomly selected product is found to be defective, what is the probability
it was produced by machine M2 ?
Solution:

• We need to find P (M2 |D).

• Using Bayes’ theorem:


P(M2|D) = P(D|M2) P(M2) / P(D)

P(M2|D) = (0.02 × 0.45)/0.0195 = 0.009/0.0195 = 6/13 ≈ 0.4615.
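
Both computations from the factory example can be reproduced in a few lines of Python; the sketch below uses the machine shares and defect rates stated above.

```python
# Machine shares and defect rates from the factory example
priors = {"M1": 0.30, "M2": 0.45, "M3": 0.25}        # P(Mi)
defect_rates = {"M1": 0.01, "M2": 0.02, "M3": 0.03}  # P(D | Mi)

# Law of total probability: P(D) = sum_i P(Mi) * P(D | Mi)
p_defective = sum(priors[m] * defect_rates[m] for m in priors)
print(round(p_defective, 4))     # 0.0195

# Bayes' theorem: P(M2 | D) = P(D | M2) P(M2) / P(D)
p_m2_given_d = defect_rates["M2"] * priors["M2"] / p_defective
print(round(p_m2_given_d, 4))    # ~0.4615
```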

Independent Events
Events A and B are independent if:
P (A ∩ B) = P (A) × P (B).

Conditional Independence
Events A and B are conditionally independent given event C if:

P (A ∩ B|C) = P (A|C) × P (B|C).

Relation to Conditional Probability


If A and B are independent, then:

P (A|B) = P (A), P (B|A) = P (B).

Total Probability and Bayes’ Theorem in Continuous Cases


Continuous Random Variables
For continuous random variables, the concepts extend using probability density functions (pdfs).
Law of Total Probability
If X is a continuous random variable with pdf fX (x), and Y is another variable, then:
P(Y) = ∫_{−∞}^{∞} P(Y|X = x) fX(x) dx.

Bayes’ Theorem
Bayes’ theorem for continuous variables:

fX|Y(x|y) = fY|X(y|x) fX(x)/fY(y).
6 Discrete and Continuous Random Variables
A discrete random variable takes values from a finite or countable set. For example, in the experiment of tossing 3 coins, the number of heads can be treated as a discrete random variable X, which takes the possible values 0, 1, 2, 3.
A continuous random variable takes values in the form of intervals. Also, in the case of a continuous random
variable P (X = c) = 0, where c is a specified point. Heights and weights of people, area of land held by individuals,
etc., are examples of continuous random variables.

6.1 Probability Mass Function (PMF)


If X is a discrete random variable which can take the values x1, x2, . . . and p(xr) denotes the probability that X takes the value xr, then p(x) is called the Probability Mass Function (PMF) of X.

p(xr ) = P (X = xr )
The values that X can take and the corresponding probabilities determine the probability distribution of X.
We also have the following conditions:
1. p(x) ≥ 0
2. ∑ p(x) = 1

6.2 Probability Density Function (PDF)


If X is a continuous random variable, then a function f(x), x ∈ I (interval), is called a Probability Density Function. Probability statements are made as P(X ∈ I) = ∫_I f(x) dx.
We also have:

1. f (x) ≥ 0
2. ∫_{−∞}^{∞} f(x) dx = 1
The probability P(X ≤ x) is called the cumulative distribution function (CDF) of X and is denoted by F(x). It is a point function, defined for both discrete and continuous random variables. The following are the properties of the distribution function F(x):

1. F (x) ≥ 0

2. F (x) is non-decreasing, i.e., for x > y, F (x) ≥ F (y)


3. F (x) is right continuous
4. F (−∞) = 0 and F (+∞) = 1

5. P (a < X ≤ b) = F (b) − F (a)

For a continuous random variable:

• P r{x ≤ X ≤ x + dx} = F (x + dx) − F (x) = f (x) dx where dx is very small


• f(x) = d/dx [F(x)], where:
1. f(x) ≥ 0 for all x ∈ R
2. ∫_{−∞}^{∞} f(x) dx = 1
7 Expectation (Mean), Variance, and Standard Deviation
7.1 Expectation (Mean)
Mathematical Expectation is the weighted mean of values of a variable.
If X is a random variable which can assume any one of the values x1 , x2 , . . . , xn , with the respective probabilities
p1 , p2 , . . . , pn , then the mathematical expectation of X is given by

E(X) = p1 x1 + p2 x2 + · · · + pn xn
Expected value for a discrete random variable is given by:
E(X) = ∑_{i=1}^{n} xi P(xi)

For a continuous random variable,


E(X) = ∫_{−∞}^{∞} x f(x) dx, where f(x) is the PDF of X.

If the PDF is non-zero only on an interval [a, b], this reduces to

E(X) = ∫_a^b x f(x) dx

Properties of Expectation
Linearity of Expectation
1. Addition:
E[X + Y ] = E[X] + E[Y ].

2. Scalar Multiplication:
E[aX] = a E[X],
where a is a constant.

3. General Linearity:

E[∑_{i=1}^{n} ai Xi] = ∑_{i=1}^{n} ai E[Xi],

where ai are constants and Xi are random variables.

Note: The linearity property holds regardless of whether the random variables are independent or not.

Expectation of a Constant
E[c] = c,
where c is a constant.

Expectation of a Function of a Random Variable


For a function g(X):

• Discrete: E[g(X)] = ∑_x g(x) P(X = x).

• Continuous: E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx.
Product of Independent Random Variables
If X and Y are independent random variables:

E[XY ] = E[X] × E[Y ].

Non-negative Random Variables


If X ≥ 0, then:

E[X] ≥ 0.

Jensen’s Inequality
For a convex function g:

E[g(X)] ≥ g(E[X]).
For a concave function, the inequality is reversed.
Example 1:

Let X be a random variable with E[X] = 5. Find E[3X + 2].


Solution:

E[3X + 2] = 3E[X] + 2 = 3 × 5 + 2 = 17.
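
Linearity of expectation can also be illustrated by simulation. In the sketch below the choice of a normal distribution with mean 5 is arbitrary; only E[X] = 5 matters for the identity E[3X + 2] = 3E[X] + 2.

```python
import numpy as np

rng = np.random.default_rng(0)
# Any distribution with mean 5 works; a normal with mean 5 is an arbitrary choice.
x = rng.normal(loc=5.0, scale=2.0, size=1_000_000)

print(np.mean(3 * x + 2))   # close to 17
print(3 * np.mean(x) + 2)   # same value, by linearity of expectation
```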

7.2 Variance
Variance of a random variable is given by:

Var(X) = E((X − µ)²) = E(X²) − (E(X))² = E(X²) − µ²

Properties of Variance
Variance of a Constant
Var(c) = 0,
where c is a constant.

Scaling Property
For a constant a:

Var(aX) = a2 Var(X).

Variance of a Linear Combination


For random variable X and constants a and b:

Var(aX + b) = a2 Var(X).
Note: Adding a constant b does not affect the variance.

Variance of the Sum of Random Variables


For any two random variables X and Y :

Var(X + Y ) = Var(X) + Var(Y ) + 2 Cov(X, Y ),


where Cov(X, Y ) is the covariance between X and Y .
Variance of Independent Random Variables
If X and Y are independent:

Cov(X, Y ) = 0,
so:

Var(X + Y ) = Var(X) + Var(Y ).

Generalization to n Independent Random Variables


If X1 , X2 , . . . , Xn are independent random variables:
n
! n
X X
Var Xi = Var(Xi ).
i=1 i=1

Variance of the Difference of Random Variables


Var(X − Y ) = Var(X) + Var(Y ) − 2 Cov(X, Y ).
For independent X and Y :

Var(X − Y ) = Var(X) + Var(Y ).

7.3 Standard Deviation


Standard deviation is the square root of variance:
σ = √Var(X)

Note
1. Expected value µ = E(X) is a measure of central tendency.
2. Standard deviation σ is a measure of spread.

7.4 Covariance and Correlation


Covariance
Cov(X, Y ) = E[(X − E[X])(Y − E[Y ])] = E[XY ] − E[X]E[Y ].

Properties of Covariance
1. Symmetry:
Cov(X, Y ) = Cov(Y, X).

2. Covariance with Itself:


Cov(X, X) = Var(X).

3. Linearity:
Cov(aX + b, Y ) = a Cov(X, Y ).
Cov(X, aY + b) = a Cov(X, Y ).

4. Addition:
Cov(X1 + X2 , Y ) = Cov(X1 , Y ) + Cov(X2 , Y ).
Correlation Coefficient
ρXY = Cov(X, Y)/(σX σY),
where σX and σY are the standard deviations of X and Y .
Properties:

• −1 ≤ ρXY ≤ 1.

• ρXY = 1 if Y = aX + b with a > 0.


• ρXY = −1 if Y = aX + b with a < 0.
• ρXY = 0 indicates no linear relationship (but not necessarily independence).

Example 2:

Let X and Y be independent random variables with Var(X) = 4 and Var(Y ) = 9. Find Var(2X − 3Y + 5).
Solution:

1. Calculate Variance:

Var(2X − 3Y + 5) = 2² Var(X) + (−3)² Var(Y) = 4 × 4 + 9 × 9 = 16 + 81 = 97.
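
The same result can be checked by simulation; in the sketch below the normal distributions are an arbitrary choice used only to obtain independent variables with Var(X) = 4 and Var(Y) = 9.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Independent X and Y with Var(X) = 4, Var(Y) = 9
x = rng.normal(loc=0.0, scale=2.0, size=n)   # sd 2 -> variance 4
y = rng.normal(loc=0.0, scale=3.0, size=n)   # sd 3 -> variance 9

z = 2 * x - 3 * y + 5
print(np.var(z))                  # close to 97
print(2**2 * 4 + (-3)**2 * 9)     # exact value: 97
```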

Chebyshev’s Inequality
For any random variable X with finite mean µ and variance σ 2 , and for any k > 0:
P(|X − µ| ≥ kσ) ≤ 1/k².
Interpretation: The probability that X deviates from its mean by k standard deviations or more is at most 1/k².
8 Discrete Distributions
8.1 Discrete Uniform Distribution
A discrete random variable defined for values of x from 1 to n is said to have a uniform distribution if its probability
mass function is given by
f(x) = 1/n, for x = 1, 2, 3, . . . , n
f(x) = 0, otherwise
The cumulative distribution function F (x) of the discrete uniform random variable x is given by:

F(x) = 0, for x < 1
F(x) = x/n, for 1 ≤ x ≤ n
F(x) = 1, for x > n

The mean of X is
µ = (n + 1)/2
The variance of X is
σ² = (n² − 1)/12

8.2 Binomial Distribution


Consider an experiment made up of n independent trials, each resulting in either ‘success’ with probability p or ‘failure’ with probability q = 1 − p. The probability distribution of the random variable X that represents the number of successes is called a binomial distribution. The probability mass function is

p(x) = b(x; n, p) = nCx p^x q^(n−x), x = 0, 1, 2, . . . , n

Properties of Binomial Distribution


1. E(X) = np (mean)
2. V (X) = E(X 2 ) − (E(X))2 = npq (variance) (mean > variance)

3. SD(X) = √(npq)

4. Mode of a binomial distribution lies between (n + 1)p − 1 ≤ x ≤ (n + 1)p

5. If X1 ∼ b(n1, p) and X2 ∼ b(n2, p) and X1, X2 are independent, then X1 + X2 ∼ b(n1 + n2, p), where b(n, p) denotes the binomial distribution with parameters n and p.

8.3 Poisson Distribution


A random variable X is said to follow a Poisson distribution with parameter λ > 0, if it assumes only non-negative
values and its probability mass function is given by

p(x) = p(x; λ) = e^(−λ) λ^x / x!,  x = 0, 1, 2, . . . ,  λ > 0
p(x) = 0, otherwise
In a binomial distribution, if n is large and p is small such that np remains a fixed constant λ, the binomial distribution approaches the Poisson distribution (the limiting case of the binomial distribution).
Properties of Poisson Distribution
1. E(X) = ∑_{x=0}^{∞} x · p(x) = λ

2. V(X) = E(X²) − (E(X))² = λ

3. SD(X) = √λ (Mean = Variance)
4. Mode of a Poisson distribution lies between λ − 1 and λ

5. If X1 ∼ P (λ1 ) and X2 ∼ P (λ2 ), and X1 , X2 are independent, then X1 + X2 ∼ P (λ1 + λ2 )
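
The limiting-case relationship between the binomial and Poisson distributions can be seen numerically. The sketch below (the parameter values n = 1000, p = 0.003 are illustrative) compares binomial probabilities with Poisson probabilities at λ = np using scipy.stats.

```python
from scipy.stats import binom, poisson

n, p = 1000, 0.003       # large n, small p (illustrative values)
lam = n * p              # λ = np = 3

for k in range(6):
    b = binom.pmf(k, n, p)
    po = poisson.pmf(k, lam)
    print(f"k={k}: binomial={b:.5f}, poisson={po:.5f}")

# Mean and variance of the Poisson distribution are both λ
print(poisson.mean(lam), poisson.var(lam))   # 3.0 3.0
```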


9 Continuous Distributions
9.1 Uniform/Rectangular Distribution
A continuous random variable x defined on [a, b] is said to have a uniform distribution if its probability density
function is given by
f(x) = 1/(b − a), for x ∈ [a, b]
f(x) = 0, otherwise
The cumulative distribution function of the continuous uniform random variable X is given by:

0,
 for x < a
F (x) = x−ab−a , for a≤x≤b

1, for x > b

• Mean of X = µ = a+b
2

(b−a)2
• Variance of X = σ 2 = 12

9.2 Normal Distribution


A continuous random variable X is said to follow a normal distribution with mean µ and variance σ 2 if its
probability density function (pdf) is given by:

fX(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)),  −∞ < x < ∞.
This is denoted as:

X ∼ N (µ, σ 2 ).
Graphically, the normal density is a symmetric, bell-shaped curve centred at the mean µ.

Note
• Symmetry: The normal distribution is symmetric about the mean µ.

• Mean, Median, Mode: All are equal and located at x = µ.


• Inflection Points: The curve changes concavity at x = µ ± σ.
• Standard Deviation: Measures the spread of the distribution; larger σ means wider spread.
• Total Area: The total area under the curve is 1.
The 68-95-99.7 Rule
Approximately:

• 68% of the data falls within one standard deviation (µ ± σ).


• 95% within two standard deviations (µ ± 2σ).
• 99.7% within three standard deviations (µ ± 3σ).

9.3 Standard Normal Distribution


The standard normal distribution is a special case of the normal distribution with mean µ = 0 and standard
deviation σ = 1. Its pdf is:
fZ(z) = (1/√(2π)) exp(−z²/2),  −∞ < z < ∞.
We denote:

Z ∼ N (0, 1).

Standardization
Any normal random variable X ∼ N (µ, σ 2 ) can be transformed into a standard normal variable Z using:
Z = (X − µ)/σ.
This process is called standardization and allows us to use standard normal distribution tables to compute
probabilities.

Example 1
Suppose X ∼ N (50, 16). Find P (X > 58).
Solution:

1. Standardize X:
Z = (X − µ)/σ = (58 − 50)/4 = 2.
2. Find P (Z > 2):
P (Z > 2) = 1 − P (Z ≤ 2) = 1 − 0.9772 = 0.0228.
(Using standard normal distribution table)

Example 2
Let X1 , X2 , . . . , X36 be independent and identically distributed random variables with mean µ = 10 and variance
σ 2 = 9. Find the approximate probability that the sample mean X̄ exceeds 11.
Solution:

1. Compute the standard deviation of X̄:


σX̄ = σ/√n = 3/6 = 0.5.

2. Standardize X̄:
Z = (X̄ − µ)/σX̄ = (11 − 10)/0.5 = 2.

3. Find P (X̄ > 11):


P (Z > 2) = 0.0228.
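
Both examples can be reproduced with scipy.stats.norm, which handles the standardization internally through the loc and scale arguments.

```python
from scipy.stats import norm

# Example 1: X ~ N(50, 16), i.e. sigma = 4. Find P(X > 58).
print(norm.sf(58, loc=50, scale=4))    # ≈ 0.0228  (sf = 1 - cdf)

# Example 2: sample mean of 36 iid variables, mu = 10, sigma = 3,
# so Xbar ~ N(10, 0.5^2). Find P(Xbar > 11).
print(norm.sf(11, loc=10, scale=0.5))  # ≈ 0.0228

# 68-95-99.7 rule for the standard normal
for k in (1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))
```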
Properties of Normal Distribution
1. The function is symmetrical about the value µ.

2. It has a maximum at x = µ.
3. The area under the curve within the interval (µ ± σ) is 68%.

That is, P (µ − σ ≤ X ≤ µ + σ) = 0.68

4. A fairly large number of samples taken from a ’Normal’ population will have average, median, and mode
nearly the same, and within the limits of average ±2 × SD, there will be 95% of the values.
5. E(X) = ∫_{−∞}^{∞} x · f(x) dx = µ.

6. V (X) = σ 2 , SD(X) = σ.
7. For a normal distribution,
Mean = Median = Mode

8. All odd order moments about mean vanish for a normal distribution.

µ2n+1 = 0 for n = 0, 1, 2, . . .

9. If X1 ∼ N (µ1 , σ12 ) and X2 ∼ N (µ2 , σ22 ), X1 , X2 independent, then,

X1 + X2 ∼ N (µ1 + µ2 , σ12 + σ22 )

Also,
X1 − X2 ∼ N (µ1 − µ2 , σ12 + σ22 )

10. If µ = 0 and σ 2 = 1, we call it a standard normal distribution. The standardization can be obtained by the
transformation,
z = (x − µ)/σ
Also,
(X − µ)/σ ∼ N(0, 1)

9.4 Exponential Distribution


A continuous random variable X is said to have an exponential distribution if its probability density function f (x)
is given by:
f(x) = λe^(−λx), for x > 0
f(x) = 0, otherwise
Here λ is the parameter of the exponential distribution and λ > 0.
The cumulative distribution function F (x) of an exponential distribution with λ as parameter is given by:
F(x) = 1 − e^(−λx), if x > 0
F(x) = 0, otherwise
The mean and variance for an exponential distribution are given by:
Mean = µ = 1/λ
Variance = σ² = 1/λ²
10 Joint Distribution of Random Variables
Let X and Y be two discrete random variables on the same sample space S with the range space of X as Rx =
{x1 , x2 , . . . , xm } and the range space of Y as Ry = {y1 , y2 , . . . , yn } and PX (x) and PY (y) as the probability mass
functions of x and y. Then the joint probability mass function Pxy (x, y) of the two dimensional random variable
(x, y) on the range space Rx × Ry is defined as:
PXY(xi, yj) = P(X = xi, Y = yj), for (xi, yj) ∈ Rx × Ry
PXY(xi, yj) = 0, otherwise
This joint probability mass function can be represented in the form of a table as follows:
X \ Y | y1 | y2 | · · · | yn | ∑_{j=1}^{n} PXY(xi, yj)
x1 | PXY(x1, y1) | PXY(x1, y2) | · · · | PXY(x1, yn) | PX(x1)
x2 | PXY(x2, y1) | PXY(x2, y2) | · · · | PXY(x2, yn) | PX(x2)
x3 | PXY(x3, y1) | PXY(x3, y2) | · · · | PXY(x3, yn) | PX(x3)
... | ... | ... | ... | ... | ...
xm | PXY(xm, y1) | PXY(xm, y2) | · · · | PXY(xm, yn) | PX(xm)
∑_{i=1}^{m} PXY(xi, yj) | PY(y1) | PY(y2) | · · · | PY(yn) |
From the above table, it can be easily observed that the marginal probability mass functions of X and Y ,
namely PX (x) and PY (y) respectively, can be obtained from the joint probability mass function Pxy (x, y) as:
PX(xi) = ∑_{j=1}^{n} PXY(xi, yj), for i = 1, 2, . . . , m

And

PY(yj) = ∑_{i=1}^{m} PXY(xi, yj), for j = 1, 2, . . . , n

• PXY(xi, yj) ≥ 0 ∀ i, j

• ∑_{i=1}^{m} ∑_{j=1}^{n} PXY(xi, yj) = 1

• The cumulative joint distribution function of the two dimensional random variable (X, Y ) is given by Fxy (x, y) =
P (X ≤ x, Y ≤ y).
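
A small numerical sketch of these definitions (the joint PMF table below is made up purely for illustration): the marginal PMFs are the row and column sums of the joint table, and independence can be checked cell by cell.

```python
import numpy as np

# Illustrative joint PMF table P(X = xi, Y = yj); rows index x, columns index y
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])

assert np.isclose(joint.sum(), 1.0)   # all probabilities sum to 1

p_x = joint.sum(axis=1)               # marginal PMF of X (row sums)
p_y = joint.sum(axis=0)               # marginal PMF of Y (column sums)
print(p_x, p_y)

# X and Y are independent iff P(xi, yj) = P(xi) P(yj) for every cell
independent = np.allclose(joint, np.outer(p_x, p_y))
print(independent)                    # False for this table
```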

10.1 Joint Probability Density Function


Let X and Y be two continuous random variables on the same sample space S with fX (x) and fY (y) as the
probability density functions respectively. Then a function fXY (x, y) is called the joint probability density function
of the two dimensional random variable (X, Y ) if the probability that the point (x, y) will lie in the infinitesimal
rectangular region of area dxdy is fXY (x, y) dx dy.
That is,
P(x − dx/2 ≤ X ≤ x + dx/2, y − dy/2 ≤ Y ≤ y + dy/2) = fXY(x, y) dx dy

• ∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY(x, y) dx dy = 1
• The marginal probability density functions fX (x) and fY (y) of the two continuous random variables X and
Y are given by:
fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy  and  fY(y) = ∫_{−∞}^{∞} fXY(x, y) dx

• The cumulative joint distribution function FXY (x, y) of the two-dimensional random variable (X, Y ) (where
X and Y are any two continuous random variables defined on the same sample space) is given by:
FXY(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fXY(x, y) dx dy
10.2 Conditional Probability Functions of Random Variables
Let X and Y be two discrete (continuous) random variables defined on the same sample space with joint probability
mass (density) function fXY (x, y), then:

1. The conditional probability mass (density) function fX|Y (x|y) of X, given Y = y, is defined as:

fX|Y(x|y) = fXY(x, y)/fY(y), where fY(y) ≠ 0

2. The conditional probability mass (density) function fY |X (y|x) of Y , given X = x, is defined as:

fY|X(y|x) = fXY(x, y)/fX(x), where fX(x) ≠ 0

10.3 Independent Random Variables


Two discrete (continuous) random variables X and Y defined on the same sample space with joint probability mass
(density) function PXY (x, y) are said to be independent if and only if:

PXY (x, y) = PX (x)PY (y)


Where PX (x) and PY (y) are the marginal probability mass (density) functions of the random variables X and
Y respectively.

Note
If the random variables X and Y are independent, then

P (a ≤ X ≤ b, c ≤ Y ≤ d) = P (a ≤ X ≤ b)P (c ≤ Y ≤ d)
11 Mean, Median, Mode and Standard Deviation
11.1 Mean
Mean of a data generally refers to the arithmetic mean of the data. However, there are two more types of mean
which are geometric mean and harmonic mean. The different types of mean for a set of values and grouped data
are given in the following table.

Type of Mean | Set of Values | Grouped Data
Arithmetic Mean | (∑ xi)/n | (∑ fi xi)/(∑ fi)
Geometric Mean | (x1 x2 x3 . . . xn)^(1/n) | Antilog[(∑ fi log xi)/(∑ fi)]
Harmonic Mean | n/(1/x1 + 1/x2 + · · · + 1/xn) | (∑ fi)/(∑ fi/xi)

11.2 Median
For an ordered set of values, the median is the middle value. If the number of values is even, median is taken as
the mean of the middle two values. For grouped data, median is given by:

Median = L + ((N/2 − C)/f) × h

where,
L = Lower boundary of the median class, N = ∑ fi, C = Cumulative frequency up to the class before the median class

h = Width of the median class, f = Frequency of the median class

Note
• Median does not take into consideration all the items.
• The sum of absolute deviations taken about median is least.
• Median is the abscissa of the point of intersection of the cumulative frequency curves.

• Median is the best suited measure for open end classes

11.3 Mode
For a set of values, mode is the most frequently occurring value. For grouped data, mode is given by:

Mode = L + ((f1 − fi−1)/(2f1 − fi−1 − fi+1)) × h

where,
L = Lower boundary of the modal class, f1 = Frequency of the modal class (highest frequency),
fi−1 = Frequency of the class before the modal class, fi+1 = Frequency of the class after the modal class, h = Width of the modal class
Note: Relation between Mean, Median and Mode:

Mean − Mode = 3(Mean − Median)


11.4 Standard Deviation
Standard deviation is given by:

σ² = ∑(xi − x̄)²/n          For a Set of Values
σ² = ∑ fi(xi − x̄)²/∑ fi     For Grouped Data
The standard deviation σ can alternatively be calculated using the formula:
σ = √( ∑xi²/n − (∑xi/n)² )
This is a useful formula for computational purposes.

Note
1. The square of standard deviation is termed as variance.
2. Standard deviation (SD) is the least mean square deviation.

3. If each item is increased by a fixed constant, the SD does not alter or SD is independent of change of
origin.
4. Standard deviation depends on each and every data item.
5. For a discrete series in the form a, a + d, a + 2d, . . . (Arithmetic Progression, AP), the standard deviation is
given by:
SD = d √((n² − 1)/12)
where n is the number of terms in the series.
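
These measures can be computed directly with Python's statistics module; the data values below are arbitrary. Note that pstdev divides by n, matching the variance formula used in this section.

```python
import statistics

data = [4, 8, 6, 5, 3, 8, 9, 8, 7]   # arbitrary illustrative values

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
sd = statistics.pstdev(data)          # population SD: divides by n

print(mean, median, mode, round(sd, 4))

# The relation Mean - Mode = 3(Mean - Median) is only approximate
# for moderately skewed data, so the two sides need not match exactly.
print(mean - mode, 3 * (mean - median))
```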
t-distribution and chi-square distribution
12 t-distribution
The t-distribution, also known as Student’s t-distribution, is a probability distribution that is symmetric and bell-
shaped like the normal distribution. However, it has heavier tails, meaning it is more prone to producing values
that fall far from the mean. This distribution is used primarily when the sample size is small, or the population
standard deviation is unknown.

12.1 Key Characteristics:


1. Shape:

• Bell-shaped and symmetric around the mean, similar to the normal distribution.
• Heavier tails, which means there is more variability in extreme values compared to the normal distribu-
tion.
2. Degrees of Freedom (df):

• The shape of the t-distribution depends on the degrees of freedom (df), which are related to the sample
size.
• Lower df results in heavier tails (more spread out), while higher df makes the t-distribution closer to the
normal distribution.
• Degrees of freedom are typically equal to the sample size minus one (df = n - 1) for a one-sample t-test.

3. Used in:
• Hypothesis testing (like t-tests) when:
- The sample size is small (n < 30).
- The population standard deviation is unknown.

• Estimating the mean of a normally distributed population when the sample size is small.
• Constructing confidence intervals for small samples.

12.1.1 Comparing the t-distribution and Normal Distribution:


• Both are continuous probability distributions.

• The t-distribution is more spread out (has fatter tails) when the sample size is small, which accounts for the
extra uncertainty in estimating the population standard deviation.
• As the sample size increases, the t-distribution gets closer to the normal distribution. When the sample size
is large (df > 30), you can often use the normal distribution instead.

12.2 Formula for the t-distribution:


The probability density function of the t-distribution for a given degree of freedom df is given by:
f(x) = [ Γ((df + 1)/2) / (√(df·π) Γ(df/2)) ] · (1 + x²/df)^(−(df + 1)/2)

Where:
- Γ is the gamma function (a generalization of the factorial).
- x is the value of the random variable.
12.3 Mean, Median, Mode and SD
1. Mean of t-distribution (df = 3): For a t-distribution:
- The mean exists and is equal to 0 when the degrees of freedom (df) are greater than 1.
- Since df = 3 > 1, the mean of the t-distribution is:

Mean = 0

2. Median of t-distribution (df = 3): - The t-distribution is symmetric about 0. Therefore, the median is the
same as the mean.
- The median of the t-distribution is:

Median = 0

3. Mode of t-distribution (df = 3): - The mode of a t-distribution is also located at 0, as the distribution is
symmetric and centered at 0.
- The mode of the t-distribution is:

Mode = 0

4. Variance of t-distribution (df = 3): - The variance of a t-distribution depends on the degrees of freedom (df).
The variance is defined as:

Variance = df/(df − 2)

- For df = 3:

Variance = 3/(3 − 2) = 3/1 = 3
13 Chi-Square Distribution
The chi-square distribution is a statistical distribution commonly used in hypothesis testing, especially for categor-
ical data and variance tests. It arises when you sum the squares of independent standard normal random variables.
The chi-square distribution is primarily used in chi-square tests, which are applied to:
- Test goodness of fit (how well observed data fits a theoretical distribution).
- Test independence in contingency tables (e.g., association between two categorical variables).
- Test variance (whether the variance of a population equals a specified value).

13.1 Key Characteristics of the Chi-Square Distribution:


1. Non-negative Values: Since chi-square values are based on squared terms, the distribution is always non-
negative.
2. Asymmetry: The chi-square distribution is right-skewed, especially for small degrees of freedom (df). As the
degrees of freedom increase, the distribution becomes more symmetric and approaches a normal distribution.
3. Degrees of Freedom (df): The shape of the chi-square distribution depends on its degrees of freedom. The
degrees of freedom typically represent the number of independent pieces of information available to estimate
a parameter.

Chi-Square Distribution Formula:


If X1 , X2 , . . . , Xn are n independent normal variates with means µ1 , µ2 , . . . , µn and variances σ12 , σ22 , . . . , σn2 , then
the standard normal variates are:

Zi = (Xi − E(Xi))/√Var(Xi) = (Xi − µi)/σi,  i = 1, 2, . . . , n
The sum of squares of standard normal variates is known as the chi-square variate with n degrees of freedom, i.e.,
χ² = ∑_{i=1}^{n} Zi² = ∑_{i=1}^{n} ((Xi − µi)/σi)²

If Z1 , Z2 , ..., Zk are independent, standard normal random variables (i.e., Zi ∼ N (0, 1)), then the sum of their
squares follows a chi-square distribution:

χ² = Z1² + Z2² + · · · + Zk²


The degrees of freedom (df) in this case are equal to k, the number of squared normal variables.

The probability density function is given by:


f(χ²) = [1/(2^(k/2) Γ(k/2))] (χ²)^(k/2 − 1) e^(−χ²/2),  0 ≤ χ² < ∞

Applications of the Chi-Square Distribution:


1. Chi-Square Goodness-of-Fit Test:
- Used to determine whether observed data fits a particular distribution.
- Example: Testing whether a six-sided die is fair by comparing observed and expected frequencies of outcomes.

2. Chi-Square Test for Independence:


- Used to determine whether two categorical variables are independent.
- Example: Testing whether gender and voting preference are independent in a sample.
3. Chi-Square Test for Variance:
- Used to test whether the variance of a population is equal to a specified value.
- Example: Testing if the variance in test scores of students matches the expected variance.

13.2 Key Formulas for the Chi-Square Tests:


1. Chi-Square Goodness-of-Fit Test:
χ² = ∑ (Oi − Ei)²/Ei
Where: - Oi = observed frequency, - Ei = expected frequency.
2. Chi-Square Test for Independence (Contingency Tables):
χ² = ∑ (Oij − Eij)²/Eij
Where: - Oij = observed frequency in the i, j-th cell, - Eij = expected frequency in the i, j-th cell.
3. Chi-Square Test for Variance:

χ² = (n − 1)s²/σ²
Where: - s2 is the sample variance, - σ 2 is the population variance, - n is the sample size.

Probability Density Function (PDF) of Chi-Square Distribution


The chi-square distribution is a special case of the gamma distribution and its probability density function (PDF)
is given by the following formula:
f(x; k) = [1/(2^(k/2) Γ(k/2))] x^(k/2 − 1) e^(−x/2),  x > 0
Where:
- k is the degrees of freedom (df) of the distribution, - Γ is the gamma function, which generalizes the factorial
function, - x is the random variable, which must be greater than 0.

Key Properties of Chi-Square Distribution


• Mean: µ = k
• Variance: σ 2 = 2k
• Skewness: The chi-square distribution is positively skewed (right-skewed), especially for smaller degrees of
freedom. The skewness decreases as k increases.

Graph of Chi-Square Distribution (PDF)

The probability density functions of the chi-square distribution for different degrees of freedom (df = 2, 5, 10) behave as follows:
- For small degrees of freedom (df = 2), the distribution is highly skewed to the right.
- As the degrees of freedom increase (df = 5 and df = 10), the distribution becomes more symmetric and shifts to the right.
14 Hypothesis Testing
In probability theory, we set up mathematical models of processes and systems that are affected by ‘chance’. In
statistics, we check these models against the reality, to determine whether they are faithful and accurate enough for
practical purposes. The process of checking models is called statistical inference. Methods of statistical inference are
based on drawing samples (or sampling). One of the most important methods of statistical inference is ‘Hypothesis
Testing’.

Testing of Hypothesis
We have some information about a characteristic of the population which may or may not be true. This information
is called statistical hypothesis or briefly hypothesis. We wish to know, whether this information can be accepted
or to be rejected. We choose a random sample and obtain information about this characteristic. Based on this
information, a process that decides whether the hypothesis to be accepted or rejected is called testing of hypothesis.
i.e., In brief, the test of hypothesis or the test of significance is a procedure to determine whether observed samples
differ significantly from expected results.

14.1 Key Terms:


1. Null Hypothesis (H0):
• This is the default or initial assumption about the population.
• Example: H0 : µ = 50, meaning the population mean is equal to 50.

2. Alternative Hypothesis (Ha or H1):


• Opposite of Null Hypothesis
• Example: Ha : µ ̸= 50, meaning the population mean is not equal to 50.

3. Test Statistic:
• A value calculated from the sample data that is used to decide whether to reject the null hypothesis. It
measures how far the sample data diverge from the null hypothesis.
• Common test statistics include t-values (for t-tests), z-values (for z-tests), or chi-square values (for chi-square
tests).
4. Significance Level (α):
• This is the threshold we set to determine how extreme the data must be to reject the null hypothesis. It is
usually set to 0.05 (5%), meaning there’s a 5% chance of rejecting the null hypothesis when it’s actually true
(Type I error).
• Example: If α = 0.05, there’s a 5% risk of incorrectly rejecting the null hypothesis.

5. p-value:
• The p-value is the probability of obtaining a test statistic at least as extreme as the one calculated from the
data, assuming that the null hypothesis is true.
• If the p-value is less than or equal to the significance level (p ≤ α), you reject the null hypothesis.

• If the p-value is greater than the significance level (p > α), you fail to reject the null hypothesis.

14.2 Steps in Hypothesis Testing:


1. State the hypotheses:

• Null Hypothesis (H0 ): This is the default claim you are testing.
• Alternative Hypothesis (Ha ): This is the claim you are trying to prove.
• Example:
- H0 : The average test score is 50.
- Ha : The average test score is not 50.

2. Choose the significance level (α):


- Typically, α = 0.05 is used. It represents the probability of making a Type I error (rejecting H0 when it is
true).
3. Calculate the test statistic:

• The test statistic depends on the type of data and the test you’re conducting (t-test, z-test, etc.).
• For a t-test, the formula is:
t = (x̄ − µ)/(s/√n)

Where: - x̄ is the sample mean. - µ is the population mean under the null hypothesis. - s is the sample
standard deviation. - n is the sample size.
4. Determine the p-value:
• The p-value tells us how extreme the test statistic is under the assumption that H0 is true.
• The p-value is compared to α (significance level).
5. Make a decision:
• If the p-value ≤ α: Reject the null hypothesis. There is evidence to support the alternative hypothesis.
• If the p-value > α: Fail to reject the null hypothesis. There is not enough evidence to support the
alternative hypothesis.
6. Draw a conclusion: - Based on the results, you can conclude whether or not there is evidence to support the
alternative hypothesis.
Types of Hypothesis Tests:
1. One-Tailed Test: - Used when the alternative hypothesis is testing for a direction (greater than or less than).
- Example: Ha : µ > 50 (right-tailed) or Ha : µ < 50 (left-tailed).
2. Two-Tailed Test: - Used when the alternative hypothesis is testing for any difference, not a specific direction.
- Example: Ha : µ ̸= 50.

Errors in Hypothesis Testing:


• Type I Error (False Positive): - Occurs when you reject the null hypothesis when it is actually true.
- The probability of making a Type I error is the significance level (α).

• Type II Error (False Negative): - Occurs when you fail to reject the null hypothesis when the alternative
hypothesis is true.

14.3 Significance Level and Confidence Level in Terms of t-Value


In hypothesis testing and confidence interval estimation, significance level and confidence level are key concepts
used to interpret results.

1. Significance Level (α)


The significance level (α) represents the probability of rejecting the null hypothesis when it is actually true. It
is the threshold we set for determining how unlikely a result must be, under the null hypothesis, to reject that
hypothesis. In other words, it’s the probability of making a Type I error (false positive).
- A common significance level is α = 0.05, meaning there is a 5% chance of incorrectly rejecting the null hy-
pothesis when it is actually true.
- In a t-test, the significance level helps determine the critical value of the t-distribution, which is the cutoff point
for deciding whether to reject the null hypothesis.

In Terms of t-Value:
- The t-critical value is the t-value that corresponds to the significance level in a t-distribution.
- For example, in a two-tailed test with α = 0.05, the critical t-values are chosen so that 5% of the total area under
the t-distribution curve is in the tails (2.5% in each tail). If the calculated t-value is more extreme than the critical
value, we reject the null hypothesis.

Example:
For a two-tailed test with α = 0.05 and df = 9:
- The critical values from the t-table are approximately ±2.262.
- If your calculated t-value is outside this range (less than -2.262 or greater than 2.262), you reject the null hypothesis.
2. Confidence Level
The confidence level is the percentage of confidence we have that the population parameter lies within the estimated
range. It is complementary to the significance level:

Confidence Level = 1 − α
For example: - If α = 0.05, the confidence level is 1 − 0.05 = 0.95

- This means we are 95% confident that the population parameter (such as the population mean) lies within
the interval calculated from the sample data.
In Terms of t-Value:
- The confidence interval (CI) is calculated using the t-critical value based on the confidence level. For a 95% confidence interval, 95% of the area under the t-distribution lies between the two t-critical values, leaving 5% in the tails (2.5% in each tail).

- To calculate the confidence interval for the population mean, we use:


CI = x̄ ± tα/2 × s/√n
Where: - x̄ is the sample mean, - tα/2 is the t-critical value corresponding to the confidence level, - s is the
sample standard deviation, - n is the sample size.
Example:
For a sample of size 10, with df = 9, the t-critical value for a 95% confidence interval is approximately ±2.262. If
the sample mean is 50 and the sample standard deviation is 4, the 95% confidence interval for the population mean
is:
CI = 50 ± 2.262 × 4/√10 = 50 ± 2.86
So the confidence interval is approximately 47.14 ≤ µ ≤ 52.86. This means we are 95% confident that the true
population mean lies within this range.
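
The confidence-interval example above can be reproduced with scipy.stats.t, using the same values (n = 10, x̄ = 50, s = 4, 95% confidence).

```python
from scipy.stats import t
import math

n, xbar, s = 10, 50.0, 4.0         # values from the example above
alpha = 0.05
df = n - 1

t_crit = t.ppf(1 - alpha / 2, df)  # ≈ 2.262
margin = t_crit * s / math.sqrt(n)

print(round(t_crit, 3), (round(xbar - margin, 2), round(xbar + margin, 2)))
# ≈ 2.262, (47.14, 52.86)
```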
t-test, z-test and chi-square test
15 t-test
A t-value (or t-statistic) is a ratio that compares the difference between a sample statistic (like a sample mean) and
a hypothesized population parameter (like a population mean) relative to the variability in the data. It is used in
t-tests, a type of hypothesis test, when the population standard deviation is unknown and the sample size is small
(usually n < 30).
The formula for calculating the t-value is:
t = (x̄ − µ)/(s/√n)

Where: - x̄ is the sample mean, - µ is the hypothesized population mean, - s is the sample standard deviation,
- n is the sample size.
The t-value tells you how many standard deviations the sample mean is from the hypothesized population mean.

Types of t-Tests:
1. One-Sample t-Test: Compares the sample mean to a known population mean.
t-Statistic Calculation:

t = (x̄ − µ)/(s/√n)

2. Two-Sample t-Test: Compares the means of two independent samples to see if they are significantly different.
t-Statistic Calculation (assuming equal variances):

t = (x̄1 − x̄2) / √( sp² (1/n1 + 1/n2) )

Where sp² is the pooled variance:

sp² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2)

3. Paired t-Test: Compares two related groups (like before and after measurements on the same group).

Properties of the t-Distribution:


• Symmetry: The t-distribution is symmetric and bell-shaped, just like the normal distribution.
• Heavier Tails: The t-distribution has heavier tails than the normal distribution, which means that it is
more prone to producing values that fall far from the mean.
• Degrees of Freedom (df ): The shape of the t-distribution depends on the degrees of freedom. As the
degrees of freedom increase, the t-distribution approaches the normal distribution.
- df for one-sample t-test: n − 1
- df for two-sample t-test: n1 + n2 − 2

• Mean: The mean of the t-distribution is always 0.

• Variance: The variance is greater than 1 for small sample sizes but approaches 1 as the degrees of freedom
increase.
• Use for Small Samples: The t-distribution is especially useful when the sample size is small and the
population standard deviation is unknown.
How is the t-Value Used in Hypothesis Testing?
1. Formulate Hypotheses: - Null Hypothesis (H0 ): There is no effect or no difference. - Alternative
Hypothesis (Ha ): There is an effect or a difference.
2. Calculate the t-Value: Use the formula for the t-statistic to calculate the t-value from the sample data.
3. Determine the Critical Value: Based on the significance level (usually α = 0.05) and degrees of freedom,
look up the critical value from the t-distribution table.

4. Compare the t-Value and Critical Value: - If the absolute value of the t-value is greater than the critical
value, reject the null hypothesis.
- If the absolute value of the t-value is less than the critical value, fail to reject the null hypothesis.

5. Decision: Based on the comparison, make a conclusion about whether there is enough evidence to support
the alternative hypothesis.

Properties of the t-Value in t-Tests:


1. Sensitivity to Sample Size: The t-value is sensitive to the sample size. Larger samples result in a t-distribution
that closely resembles the normal distribution.
2. Smaller Samples, More Variability: With smaller sample sizes, the t-distribution has more variability (heavier
tails), and thus, a higher t-value is needed to reject the null hypothesis.
3. Depends on Sample Standard Deviation: The t-value incorporates the sample standard deviation, making it
more appropriate for small samples where population variability is unknown.

4. Degrees of Freedom Influence: The degrees of freedom impact the shape of the t-distribution, with lower
degrees of freedom resulting in heavier tails (larger critical values).
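
A minimal one-sample t-test sketch with scipy.stats; the sample data and the hypothesized mean µ0 = 50 are made up for illustration.

```python
from scipy.stats import ttest_1samp
import numpy as np

# Illustrative sample data and hypothesized population mean
sample = np.array([51.2, 49.8, 52.5, 50.9, 48.7, 53.1, 50.4, 51.8])
mu0 = 50.0

t_stat, p_value = ttest_1samp(sample, popmean=mu0)
print(round(t_stat, 3), round(p_value, 3))

alpha = 0.05
if p_value <= alpha:
    print("Reject H0: the sample mean differs significantly from 50.")
else:
    print("Fail to reject H0.")
```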
16 z-test
A Z-test is a statistical test used to determine whether there is a significant difference between sample and population
means, or between two sample means, when the population variance is known or when the sample size is large
(typically n > 30). The Z-test is based on the standard normal distribution (also called the Z-distribution), which
has a mean of 0 and a standard deviation of 1.

Types of Z-Tests:
1. One-sample Z-test: Used to test if the sample mean is different from a known population mean.
Formula:
Z = (x̄ − µ)/(σ/√n)

Where: - x̄ = sample mean, - µ = population mean, - σ = population standard deviation, - n = sample size.
2. Two-sample Z-test Used to compare the means of two independent samples.
Formula:
Z = (x̄1 − x̄2) / √( σ1²/n1 + σ2²/n2 )

Where: - x̄1 , x̄2 = sample means, - σ12 , σ22 = population variances, - n1 , n2 = sample sizes.
3. Z-test for proportions: Used to compare proportions between two samples. Formula:
Z = (p1 − p2) / √( p̂(1 − p̂)(1/n1 + 1/n2) )

Where: - p1, p2 = sample proportions, - p̂ = pooled proportion:

p̂ = (x1 + x2)/(n1 + n2)
- n1 , n2 = sample sizes.
Z-Test Critical Values Table

• One-Tailed Test: Use this when you’re testing for a directional hypothesis (e.g., whether a sample mean is
greater than or less than a population mean). The critical value here corresponds to the tail on one side of
the normal distribution.

Example: If you are conducting a one-tailed test at α = 0.05, the Z-critical value is 1.645.
• Two-Tailed Test: Use this when you’re testing for a non-directional hypothesis (e.g., whether the sample
mean is different from the population mean, either higher or lower). This splits the significance level equally
between both tails of the normal distribution.

Example: If you’re conducting a two-tailed test at α = 0.05, the Z-critical values are ±1.96.
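
scipy has no dedicated one-sample Z-test function, so the sketch below (the sample statistics are illustrative, and σ is assumed known) computes the Z statistic directly and takes the p-value and critical value from the standard normal distribution.

```python
from scipy.stats import norm
import math

# Illustrative values: sample mean, hypothesized mean, known sigma, sample size
xbar, mu, sigma, n = 52.3, 50.0, 6.0, 40

z = (xbar - mu) / (sigma / math.sqrt(n))
p_two_tailed = 2 * norm.sf(abs(z))
print(round(z, 3), round(p_two_tailed, 4))

alpha = 0.05
crit = norm.ppf(1 - alpha / 2)      # ±1.96 for a two-tailed test at alpha = 0.05
print("Reject H0" if abs(z) > crit else "Fail to reject H0")
```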
17 chi-square test
The Chi-Square Test (or χ2 test) is a statistical test used to determine if there is a significant association between
observed and expected frequencies or whether two categorical variables are independent. It is widely used in
hypothesis testing for categorical data, especially when working with contingency tables and goodness-of-fit tests.
Types of Chi-Square Tests

1. Chi-Square Goodness-of-Fit Test: - Used when you want to determine if an observed frequency distribution
fits a particular theoretical distribution. - Example: Testing whether a die is fair by comparing observed
frequencies of each outcome with the expected frequencies.
2. Chi-Square Test for Independence: - Used when you want to determine if two categorical variables are
independent of each other. - Example: Testing whether gender is independent of voting preference in a
population using a contingency table.
Chi-Square Test Formula
The chi-square test statistic (χ2 ) is calculated as follows:
χ² = ∑ (Oi − Ei)²/Ei
Where: - Oi = Observed frequency, - Ei = Expected frequency.
Key Assumptions:
1. Data must be in the form of frequencies (counts).
2. The observations must be independent of each other.
3. The sample size should be large enough (expected frequencies should ideally be 5 or more).

1. Chi-Square Goodness-of-Fit Test


This test compares the observed distribution of data to a theoretically expected distribution. The test checks how
well the observed data ”fits” the expected data.

Steps for Chi-Square Goodness-of-Fit Test:

1. Hypotheses:
- Null Hypothesis (H0 ): The observed data fits the expected distribution.
- Alternative Hypothesis (Ha ): The observed data does not fit the expected distribution.
2. Degrees of Freedom (df):
- df = k − 1, where k is the number of categories.

3. Test Statistic:
- Calculate the chi-square statistic using the formula:
χ² = ∑ (Oi − Ei)²/Ei

4. Compare to Critical Value:


- Determine the critical value from the chi-square distribution table using the degrees of freedom and the
significance level α.

5. Decision:
- If the calculated chi-square value is greater than the critical value, reject H0 . - Otherwise, fail to reject H0 .
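
A sketch of the goodness-of-fit procedure using scipy.stats.chisquare; the die counts below are made up for illustration, with expected counts from a fair die.

```python
from scipy.stats import chisquare, chi2

# Illustrative observed counts for 120 rolls of a six-sided die
observed = [25, 17, 15, 23, 24, 16]
expected = [20] * 6                      # fair die: 120/6 per face

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(round(stat, 3), round(p_value, 4))

alpha = 0.05
df = len(observed) - 1                   # k - 1 = 5
critical = chi2.ppf(1 - alpha, df)       # ≈ 11.07
print("Reject H0" if stat > critical else "Fail to reject H0")
```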
2. Chi-Square Test for Independence
This test checks whether two categorical variables are independent of each other. It’s often used in a contingency
table, which shows the frequency distribution of variables.
Steps for Chi-Square Test for Independence:

1. Hypotheses: - Null Hypothesis (H0 ): The two variables are independent.


- Alternative Hypothesis (Ha ): The two variables are dependent.

2. Expected Frequency:
- Calculate the expected frequency for each cell in the contingency table:
Eij = (Row Total × Column Total) / Grand Total

3. Degrees of Freedom (df):


- df = (r − 1) × (c − 1), where r is the number of rows, and c is the number of columns.
4. Test Statistic:
- Calculate the chi-square statistic using the formula:
χ² = ∑ (Oij − Eij)²/Eij

5. Compare to Critical Value:


- Find the critical value from the chi-square table using the degrees of freedom and the significance level α.
6. Decision:
- If χ2 exceeds the critical value, reject H0 . - Otherwise, fail to reject H0 .
Important Considerations:
• Expected Frequencies: Each expected frequency should ideally be 5 or more for the chi-square test to be
valid.
• Non-Negativity: Chi-square values are always non-negative because they are based on squared differences.
• Right-Skewed Distribution: The chi-square distribution is skewed to the right, especially for small degrees of
freedom.

3. Chi-Square Test for Variance


The Chi-Square Test for Variance is a statistical method used to test if the variance (or standard deviation) of
a population is equal to a specified value. This test is especially useful when we want to determine if a sample
variance differs significantly from a known or hypothesized population variance.
The chi-square distribution is used because the test statistic for the variance follows a chi-square distribution if
the population is normally distributed.
Formula for Chi-Square Test for Variance
The test statistic for the chi-square test for variance is:

χ² = (n − 1)s²/σ²
Where: - χ² = Chi-square test statistic, - n = Sample size, - s² = Sample variance, - σ² = Hypothesized population variance.
Key Hypotheses for the Test

1. Null Hypothesis (H0 ): The population variance is equal to a specific value.

H0 : σ 2 = σ02
2. Alternative Hypothesis (Ha ): The population variance is not equal to the specified value (can be
one-sided or two-sided).
Ha : σ² ≠ σ0² (for two-tailed test)
Ha : σ 2 > σ02 (for right-tailed test)
Ha : σ 2 < σ02 (for left-tailed test)

Steps for Conducting the Chi-Square Test for Variance

1. State the Hypotheses:


- Define the null and alternative hypotheses based on the problem.

2. Set the Significance Level (α):


- Common values for α are 0.05 or 0.01.
3. Calculate the Test Statistic:
- Use the formula:
χ² = (n − 1)s²/σ²
4. Degrees of Freedom (df):
- df = n − 1, where n is the sample size.
5. Find the Critical Value:
- Use the chi-square distribution table to find the critical value based on the significance level α and degrees
of freedom df .

6. Make a Decision:
- Compare the calculated chi-square value with the critical value(s):
- For a two-tailed test, if the calculated χ2 value falls outside the range of critical values, reject H0 .
- For a one-tailed test, if the calculated χ2 value is greater than the upper critical value (or less than the
lower critical value for a left-tailed test), reject H0 .

7. Conclusion:
- Based on the decision, either reject or fail to reject the null hypothesis.
