
Unit 5

1. Hypothesis Testing

Hypothesis testing is a statistical method used to make decisions or inferences about a population
based on sample data. It involves the following key steps:

1. Formulating Hypotheses:
   o Null Hypothesis ($H_0$): A statement of no effect or no difference. It assumes the population parameter equals a specific value.
   o Alternative Hypothesis ($H_1$): A statement contradicting the null hypothesis. It represents what the researcher aims to prove.
2. Setting the Significance Level ($\alpha$):
   o Commonly used values are 0.05 or 0.01, representing the probability of rejecting $H_0$ when it is true.
3. Calculating the Test Statistic:
   o Depends on the type of test and sample data (e.g., $t$-test, $Z$-test).
4. Making a Decision:
   o Compare the test statistic to a critical value, or use the $p$-value approach, to reject or fail to reject $H_0$ (see the sketch after these steps).
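
As an illustration, here is a minimal sketch of these four steps in Python, assuming numpy and scipy are available; the sample values, the hypothesized mean of 12.0, and $\alpha = 0.05$ are all hypothetical.

```python
# A minimal sketch of the four steps, using a one-sample t-test.
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9])  # hypothetical data
mu0 = 12.0    # Step 1: H0 says the population mean equals 12.0
alpha = 0.05  # Step 2: significance level

# Step 3: compute the test statistic (scipy also returns the p-value).
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)

# Step 4: decide using the p-value approach.
if p_value < alpha:
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}: reject H0")
else:
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}: fail to reject H0")
```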

2. Types of Errors in Hypothesis Testing

 Type I Error ($\alpha$): Rejecting the null hypothesis when it is true.
   o Probability of occurrence: $\alpha$ (the significance level).
   o Example: Convicting an innocent person.
 Type II Error ($\beta$): Failing to reject the null hypothesis when it is false.
   o Probability of occurrence: $\beta$.
   o Example: Acquitting a guilty person.

3. Tests for Hypothesis Testing

The choice of test depends on:

 Sample size (large or small).
 Distribution (normal or non-normal).
 Type of data (categorical or continuous).

a. Large Sample Tests


Used when the sample size is large ($n > 30$).

 $Z$-Test: Based on the standard normal distribution.
   o Tests for a population mean, a proportion, or a difference of means.

b. Small Sample Tests

Used when the sample size is small ($n \leq 30$).

 $t$-Test: Based on Student's $t$-distribution.
   o One-sample $t$-test: Compares the sample mean with a known population mean.
   o Two-sample $t$-test: Compares the means of two independent samples.
   o Paired $t$-test: Compares the means of two related samples (e.g., before-and-after observations).

c. $F$-Test

 Used to compare two variances to test if they are significantly different.
 Basis for analysis of variance (ANOVA).

d. Chi-Square ($\chi^2$) Test

 Used for categorical data to test:
   o Goodness of Fit: If observed frequencies match expected frequencies.
   o Independence: If two variables are independent.

4. Key Test Formulas

$Z$-Test:

For a population mean:

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

Where:

 $\bar{X}$: Sample mean
 $\mu$: Population mean
 $\sigma$: Population standard deviation
 $n$: Sample size
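
As a small numeric sketch (the values of $\bar{X}$, $\mu$, $\sigma$, and $n$ below are hypothetical, and scipy is assumed to be available):

```python
# Z-test for a population mean, computed directly from the formula.
import numpy as np
from scipy import stats

x_bar, mu, sigma, n = 52.0, 50.0, 8.0, 64   # hypothetical values
z = (x_bar - mu) / (sigma / np.sqrt(n))     # Z = (X-bar - mu) / (sigma / sqrt(n))
p_two_sided = 2 * (1 - stats.norm.cdf(abs(z)))
print(f"Z = {z:.3f}, two-sided p = {p_two_sided:.4f}")   # Z = 2.000, p ~ 0.0455
```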

$t$-Test:

For a one-sample mean:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

Where:

 $s$: Sample standard deviation

$F$-Test:

$$F = \frac{s_1^2}{s_2^2}$$

Where:

 $s_1^2$, $s_2^2$: Sample variances
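
A minimal sketch of the variance-ratio computation on two hypothetical samples (the two-sided p-value convention used here is one common choice):

```python
# F-test: ratio of two sample variances.
import numpy as np
from scipy import stats

s1 = np.array([4.1, 5.0, 3.8, 4.6, 5.2, 4.4])   # hypothetical sample 1
s2 = np.array([3.9, 4.0, 4.1, 3.8, 4.2, 4.0])   # hypothetical sample 2

f = np.var(s1, ddof=1) / np.var(s2, ddof=1)      # F = s1^2 / s2^2
df1, df2 = len(s1) - 1, len(s2) - 1
p = 2 * min(stats.f.cdf(f, df1, df2), stats.f.sf(f, df1, df2))
print(f"F = {f:.3f}, p = {p:.4f}")
```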

Chi-Square Test:

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

Where:

 $O_i$: Observed frequency
 $E_i$: Expected frequency
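
A short goodness-of-fit sketch using scipy's `chisquare`; the die-roll counts are hypothetical (note that the observed and expected totals must match):

```python
# Goodness of fit: is a die fair, given 120 hypothetical rolls?
from scipy import stats

observed = [18, 22, 20, 25, 17, 18]   # hypothetical frequencies, sum = 120
expected = [20] * 6                   # fair die: equal expected frequencies

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```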

5. Guidelines for Choosing the Right Test

| Test | Purpose | Conditions |
|------|---------|------------|
| $Z$-Test | Compare means or proportions | Large sample, known variance |
| $t$-Test | Compare means | Small sample, unknown variance |
| $F$-Test | Compare variances | Assumes normality |
| $\chi^2$-Test | Goodness of fit or independence for categorical data | Non-parametric test |

Unit 4

1. Theory of Probability

Probability is a measure of uncertainty and quantifies the likelihood of an event occurring.

Basic Terms
1. Experiment: A procedure that produces outcomes (e.g., rolling a die).
2. Sample Space ($S$): The set of all possible outcomes.
3. Event ($A$): A subset of the sample space.

Key Probability Formula

The probability of an event $A$:

$$P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}$$

where $0 \leq P(A) \leq 1$.

Axioms of Probability

1. $P(S) = 1$ (the probability of the sample space is 1).
2. $P(\emptyset) = 0$ (the probability of an impossible event is 0).
3. For mutually exclusive events $A$ and $B$: $P(A \cup B) = P(A) + P(B)$.

2. Addition and Multiplication Laws

Addition Law of Probability

For any two events $A$ and $B$:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

 If $A$ and $B$ are mutually exclusive ($P(A \cap B) = 0$):

$$P(A \cup B) = P(A) + P(B)$$

Multiplication Law of Probability

For two events $A$ and $B$:

$$P(A \cap B) = P(A) \cdot P(B|A)$$

 If $A$ and $B$ are independent: $P(A \cap B) = P(A) \cdot P(B)$
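
Both laws can be checked by enumerating a small sample space; the sketch below uses two fair dice, with illustrative events chosen so that $A$ and $B$ happen to be independent.

```python
# Verify the addition and multiplication laws by enumeration.
from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))    # sample space: two fair dice
P = lambda event: Fraction(sum(1 for s in S if event(s)), len(S))

A = lambda s: s[0] == 6                     # first die shows 6
B = lambda s: s[0] + s[1] == 7              # the dice sum to 7

# Addition law: P(A or B) = P(A) + P(B) - P(A and B)
assert P(lambda s: A(s) or B(s)) == P(A) + P(B) - P(lambda s: A(s) and B(s))

# Multiplication law for independent events: P(A and B) = P(A) * P(B)
assert P(lambda s: A(s) and B(s)) == P(A) * P(B)
print(P(A), P(B), P(lambda s: A(s) and B(s)))   # 1/6, 1/6, 1/36
```
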
3. Bayes’ Theorem

Bayes’ Theorem provides a way to update probabilities based on new information.

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}$$

Where:

 $P(A|B)$: Probability of $A$ given $B$.
 $P(B|A)$: Probability of $B$ given $A$.
 $P(A)$: Prior probability of $A$.
 $P(B)$: Total probability of $B$, calculated as $P(B) = \sum P(B|A_i) \cdot P(A_i)$.
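
A worked sketch of the theorem in Python; the diagnostic-test probabilities below are purely hypothetical.

```python
# Bayes' theorem: probability of disease given a positive test.
p_disease = 0.01              # P(A): prior probability of disease
p_pos_given_disease = 0.95    # P(B|A): test sensitivity
p_pos_given_healthy = 0.05    # P(B|not A): false-positive rate

# Total probability: P(B) = sum of P(B|A_i) * P(A_i)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")   # ~ 0.161
```

Even with a fairly accurate test, the posterior stays small here because the prior is small, which is exactly the updating the theorem captures.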

4. Probability Theoretical Distributions

Probability distributions describe how probabilities are distributed over values of a random
variable.

a. Binomial Distribution

Used when there are two possible outcomes (success or failure) in a fixed number of
independent trials.

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Where:

 $n$: Number of trials.
 $k$: Number of successes.
 $p$: Probability of success in a single trial.
 $\binom{n}{k}$: Binomial coefficient $\frac{n!}{k!(n-k)!}$.

Mean: $\mu = n \cdot p$
Variance: $\sigma^2 = n \cdot p \cdot (1-p)$
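
A quick sketch with scipy's binomial distribution; the choice of 10 fair coin flips is illustrative.

```python
# Binomial distribution: check the pmf, mean, and variance formulas.
from scipy import stats

n, p = 10, 0.5                     # hypothetical: 10 fair coin flips
X = stats.binom(n, p)

print(X.pmf(4))                    # P(X = 4) ~ 0.2051
print(X.mean(), n * p)             # mean = n*p = 5.0
print(X.var(), n * p * (1 - p))    # variance = n*p*(1-p) = 2.5
```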

b. Poisson Distribution

Used for modeling the number of events occurring in a fixed interval of time or space.

$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$$

Where:

 $\lambda$: Average rate of occurrence.
 $k$: Number of events.

Mean: $\mu = \lambda$
Variance: $\sigma^2 = \lambda$
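
A matching sketch for the Poisson distribution (the rate $\lambda = 3$ is hypothetical):

```python
# Poisson distribution: pmf, and the fact that mean = variance = lambda.
from scipy import stats

lam = 3.0                    # hypothetical: 3 calls per hour on average
X = stats.poisson(lam)

print(X.pmf(5))              # P(X = 5) = e^{-3} * 3^5 / 5! ~ 0.1008
print(X.mean(), X.var())     # both equal lambda = 3.0
```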

c. Normal Distribution

A continuous probability distribution used to model many natural phenomena. Its probability
density function (PDF) is given by:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Where:

 $\mu$: Mean (center of the distribution).
 $\sigma^2$: Variance.

Key Properties:

1. Symmetrical around the mean.
2. Total area under the curve is 1.
3. Approximately 68%, 95%, and 99.7% of the data lie within 1, 2, and 3 standard deviations of the mean, respectively.
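
Property 3 (the 68-95-99.7 rule) can be verified numerically with scipy's normal distribution:

```python
# Probability mass within k standard deviations of the mean.
from scipy import stats

Z = stats.norm(loc=0, scale=1)    # standard normal
for k in (1, 2, 3):
    prob = Z.cdf(k) - Z.cdf(-k)   # P(mu - k*sigma < X < mu + k*sigma)
    print(f"within {k} SD: {prob:.4f}")
# within 1 SD: 0.6827, within 2 SD: 0.9545, within 3 SD: 0.9973
```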

5. Applications

1. Binomial Distribution:
o Modeling success/failure experiments (e.g., coin flips, defect detection).
2. Poisson Distribution:
o Modeling rare events (e.g., number of phone calls per hour).
3. Normal Distribution:
o Modeling natural phenomena (e.g., heights, weights, test scores).

Unit 3
1. Correlation Analysis

Correlation measures the strength and direction of a linear relationship between two variables.

1.1 Rank Correlation Method (Spearman’s Rank Correlation Coefficient)

Used for ordinal data or when the relationship between variables is not linear. It measures the
degree of association between two ranked variables.

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

Where:

 $r_s$: Spearman's rank correlation coefficient.
 $d_i$: Difference between the ranks of corresponding values.
 $n$: Number of observations.

Key Features:

 The value of $r_s$ lies between $-1$ and $1$:
   o $r_s = 1$: Perfect positive correlation.
   o $r_s = -1$: Perfect negative correlation.
   o $r_s = 0$: No correlation.
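
A minimal sketch comparing scipy's built-in `spearmanr` with the rank-difference formula above; the data are hypothetical and contain no ties (the shortcut formula assumes untied ranks).

```python
# Spearman's rank correlation: library call vs. the d^2 formula.
import numpy as np
from scipy import stats

x = [86, 97, 99, 100, 101, 103, 106, 110, 112, 113]   # hypothetical values
y = [2, 20, 28, 27, 50, 29, 7, 17, 6, 12]

rs, _ = stats.spearmanr(x, y)

d = stats.rankdata(x) - stats.rankdata(y)              # rank differences d_i
n = len(x)
rs_manual = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))
print(rs, rs_manual)                                   # identical when there are no ties
```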

1.2 Karl Pearson’s Coefficient of Correlation

A widely used measure for linear correlation between two continuous variables.

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \cdot \sum (Y_i - \bar{Y})^2}}$$

Where:

 $X_i, Y_i$: Observed values.
 $\bar{X}, \bar{Y}$: Means of $X$ and $Y$.

Alternatively:

$$r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{[n \sum X^2 - (\sum X)^2] \cdot [n \sum Y^2 - (\sum Y)^2]}}$$

Properties:

1. $r$ lies between $-1$ and $1$.
2. $r > 0$: Positive correlation.
3. $r < 0$: Negative correlation.
4. $r = 0$: No linear correlation.
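
A sketch checking scipy's `pearsonr` against the definitional formula, on hypothetical data:

```python
# Pearson's r: library call vs. the deviation-product formula.
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5], dtype=float)   # hypothetical data
y = np.array([2, 4, 5, 4, 5], dtype=float)

r, _ = stats.pearsonr(x, y)

dx, dy = x - x.mean(), y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))
print(r, r_manual)                           # both ~ 0.7746
```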

2. Regression Analysis

Regression analysis estimates the relationship between dependent and independent variables.

2.1 Fitting a Regression Line

The regression equation of $Y$ on $X$:

$$Y = a + bX$$

Where:

 $a$: Intercept (the value of $Y$ when $X = 0$).
 $b$: Slope (the change in $Y$ for a unit change in $X$).

Formulas for $a$ and $b$:

$$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}, \qquad a = \bar{Y} - b \bar{X}$$

The regression equation of $X$ on $Y$:

$$X = c + dY$$

Where $c$ and $d$ are calculated similarly.

2.2 Interpretation of Results

1. The slope $b$ indicates the direction and strength of the relationship.
2. The intercept $a$ gives the value of $Y$ when $X = 0$.
3. Goodness of fit can be evaluated using the coefficient of determination ($R^2$), as in the sketch below.
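
A minimal sketch that fits the line of $Y$ on $X$ with the formulas above and computes $R^2$; the data points are hypothetical.

```python
# Fit Y = a + bX by least squares and evaluate the fit with R^2.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)   # hypothetical X
y = np.array([2, 4, 5, 4, 5], dtype=float)   # hypothetical Y

n = len(x)
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a = y.mean() - b * x.mean()
print(f"Y = {a:.2f} + {b:.2f} X")            # Y = 2.20 + 0.60 X

y_hat = a + b * x                            # fitted values
r2 = 1 - np.sum((y - y_hat)**2) / np.sum((y - y.mean())**2)
print(f"R^2 = {r2:.3f}")                     # 0.600
```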

3. Properties of Regression Coefficients


1. The regression coefficients ($b_{YX}$ and $b_{XY}$) have the following relationship: $b_{YX} \cdot b_{XY} = r^2$.
2. The sign of the regression coefficients is the same as the sign of the correlation coefficient ($r$).
3. The regression coefficients are not symmetric; $b_{YX}$ is not equal to $b_{XY}$.

4. Relationship Between Correlation and Regression

1. Correlation:
o Measures the strength and direction of the relationship.
o Does not differentiate between dependent and independent variables.
2. Regression:
o Provides a functional relationship between variables.
o Differentiates between dependent and independent variables.
3. If $r = 0$, there is no linear relationship, and the regression line will be horizontal (slope = 0).
4. The coefficient of determination:

$$R^2 = r^2$$

indicates the proportion of variance in $Y$ explained by $X$.
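
These identities can be verified numerically on the same hypothetical data used earlier:

```python
# Check that b_YX * b_XY = r^2 (= R^2).
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

cov = np.cov(x, y, ddof=1)[0, 1]
b_yx = cov / np.var(x, ddof=1)     # slope of the regression of Y on X
b_xy = cov / np.var(y, ddof=1)     # slope of the regression of X on Y
r = np.corrcoef(x, y)[0, 1]

print(b_yx * b_xy, r**2)           # equal: 0.600 and 0.600
```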

Summary Table

| Concept | Correlation | Regression |
|---------|-------------|------------|
| Purpose | Measure relationship strength | Predict the dependent variable |
| Key Metric | Correlation coefficient ($r$) | Regression coefficients ($a$, $b$) |
| Symmetry | Symmetric | Asymmetric (dependent vs. independent variables) |
| Range | $-1 \leq r \leq 1$ | No specific range |

Unit 2

Time Series Analysis

Concept
Time series analysis involves studying data points collected or recorded at specific time
intervals. It helps identify patterns, trends, and seasonality to make forecasts and informed
decisions.

Additive and Multiplicative Models

1. Additive Model:
   The observed value at any time ($Y_t$) is expressed as the sum of its components:

$$Y_t = T_t + S_t + C_t + R_t$$

   Where:

   o $T_t$: Trend component
   o $S_t$: Seasonal component
   o $C_t$: Cyclical component
   o $R_t$: Random or irregular component

2. Multiplicative Model:
   The observed value at any time is the product of its components:

$$Y_t = T_t \times S_t \times C_t \times R_t$$

   o Used when the variations in the seasonal or cyclical components are proportional to the level of the trend.

Components of Time Series

1. Trend: Long-term upward or downward movement in data.
2. Seasonality: Regular periodic fluctuations due to seasons, months, or days.
3. Cyclicality: Fluctuations occurring over periods longer than a year, related to business or economic cycles.
4. Random (Irregular): Unpredictable variations due to unforeseen factors.

Trend Analysis

Least Square Method


The least square method helps fit a trend line or curve to the time series data by minimizing the
sum of the squared deviations of observed values from the estimated values.

1. Linear Equation:

$$Y = a + bX$$

   Where:

   o $Y$: Estimated value
   o $X$: Time variable
   o $a$: Intercept
   o $b$: Slope (rate of change)

2. Non-Linear Equations:
   Examples include exponential ($Y = ab^X$) and quadratic ($Y = a + bX + cX^2$) models.

Steps:

 Determine the values of $a$ and $b$ using the formulas (a worked sketch follows below):

$$a = \frac{\sum Y - b \sum X}{n}, \qquad b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
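
A short sketch that fits a linear trend to hypothetical yearly sales using these formulas, then produces a one-step forecast:

```python
# Least-squares trend line for a time series, with X coded as 0, 1, 2, ...
import numpy as np

years = np.array([2018, 2019, 2020, 2021, 2022])
sales = np.array([120.0, 135.0, 150.0, 158.0, 175.0])   # hypothetical data

X = (years - years[0]).astype(float)
n = len(X)
b = (n * np.sum(X * sales) - np.sum(X) * np.sum(sales)) / (n * np.sum(X**2) - np.sum(X)**2)
a = (np.sum(sales) - b * np.sum(X)) / n
print(f"trend: Y = {a:.2f} + {b:.2f} X")        # Y = 121.00 + 13.30 X

print(f"forecast for 2023: {a + b * 5:.1f}")    # X = 5 gives 187.5
```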

Applications in Business Decision-Making

 Forecasting sales, demand, and revenue
 Production and inventory planning
 Identifying seasonal peaks and troughs
 Policy and strategy formulation

Index Numbers

Meaning

Index numbers measure relative changes in variables over time, helping compare different
periods or places.

Types of Index Numbers

1. Price Index: Measures changes in prices of goods over time.
2. Quantity Index: Tracks changes in quantities produced or consumed.
3. Value Index: Combines price and quantity changes.

Uses of Index Numbers


 Analyzing economic trends like inflation
 Studying consumer price changes
 Comparing production or sales over time

Construction of Index Numbers

1. Fixed Base Method

 Uses a fixed base year as the reference point.


 Formula for a price index:

$$PI = \frac{\text{Price in Current Year}}{\text{Price in Base Year}} \times 100$$

2. Chain Base Method

 Uses the previous year as the base year for each calculation.
 Formula:

$$CI = \frac{\text{Price in Current Year}}{\text{Price in Previous Year}} \times 100$$

Price Index (Laspeyres and Paasche):

1. Laspeyres Index:

$$PI_L = \frac{\sum (P_1 \times Q_0)}{\sum (P_0 \times Q_0)} \times 100$$

2. Paasche Index:

$$PI_P = \frac{\sum (P_1 \times Q_1)}{\sum (P_0 \times Q_1)} \times 100$$

Quantity Index:

1. The Laspeyres and Paasche formulas can also calculate quantity indices by interchanging $P$ with $Q$.
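
A minimal sketch computing both price indices on hypothetical prices and quantities for three goods:

```python
# Laspeyres (base-year weights) vs. Paasche (current-year weights).
import numpy as np

p0 = np.array([10.0, 8.0, 5.0])    # base-year prices
p1 = np.array([12.0, 9.0, 6.0])    # current-year prices
q0 = np.array([100, 50, 200])      # base-year quantities
q1 = np.array([110, 45, 210])      # current-year quantities

laspeyres = np.sum(p1 * q0) / np.sum(p0 * q0) * 100
paasche = np.sum(p1 * q1) / np.sum(p0 * q1) * 100
print(f"Laspeyres = {laspeyres:.1f}, Paasche = {paasche:.1f}")   # 118.8, 118.9
```

Swapping the roles of $P$ and $Q$ in the same expressions yields the corresponding quantity indices.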

Unit 1

Statistics

Meaning

Statistics is the study of collecting, organizing, analyzing, interpreting, and presenting numerical
data to make informed decisions.

Scope of Statistics
1. Descriptive Statistics: Summarizes and describes the main features of a dataset using
measures like mean, median, and standard deviation.
2. Inferential Statistics: Makes predictions or inferences about a population based on a
sample.
3. Applications:
o Business: Sales forecasting, quality control
o Economics: Inflation, GDP analysis
o Healthcare: Medical research, patient statistics
o Social Sciences: Survey analysis

Types of Statistics

1. Descriptive Statistics:
o Central tendency measures (mean, median, mode)
o Measures of dispersion (range, standard deviation)
2. Inferential Statistics:
o Hypothesis testing
o Regression and correlation
o Probability distributions

Functions of Statistics

1. Summarizes complex data into a comprehensible form.
2. Aids in decision-making through analysis and predictions.
3. Provides insights into trends and patterns.
4. Helps in formulating policies and strategies.

Limitations of Statistics

1. Cannot deal with qualitative data without quantification.
2. Results may be misinterpreted or biased.
3. Relies heavily on correct sampling techniques.
4. Provides only approximations, not absolute certainty.

Measures of Central Tendency

1. Mean (Arithmetic Mean)


The average of all data points.

$$\text{Mean} = \frac{\sum X}{n}$$

 Advantages: Simple to compute, considers all values.
 Disadvantages: Affected by extreme values.

2. Median

The middle value of a dataset when arranged in ascending order.

 Formula for position: $\text{Median Position} = \frac{n+1}{2}$
 Advantages: Not affected by extreme values.
 Disadvantages: Ignores data distribution.

3. Mode

The value that appears most frequently in the dataset.

 Useful for categorical data.
 Advantages: Easy to understand.
 Disadvantages: May not exist or may not be unique.

4. Quartiles

Divide the dataset into four equal parts.

 Q1: 25th percentile
 Q2: 50th percentile (median)
 Q3: 75th percentile

Measures of Dispersion

1. Range

Difference between the maximum and minimum values.

$$\text{Range} = \text{Max} - \text{Min}$$

2. Interquartile Range (IQR)

Difference between the third quartile ($Q_3$) and the first quartile ($Q_1$):

$$\text{IQR} = Q_3 - Q_1$$

3. Mean Deviation

The average of the absolute deviations from the mean or median.

$$\text{Mean Deviation} = \frac{\sum |X - \text{Mean}|}{n}$$

4. Standard Deviation (SD)

Measures the average deviation of data points from the mean.

$$\text{SD} = \sqrt{\frac{\sum (X - \text{Mean})^2}{n}}$$

5. Variance

The square of the standard deviation.

$$\text{Variance} = \frac{\sum (X - \text{Mean})^2}{n}$$

6. Coefficient of Variation (CV)

Expresses standard deviation as a percentage of the mean.

$$\text{CV} = \frac{\text{SD}}{\text{Mean}} \times 100$$

 Used for comparing variability across datasets.
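
The measures above can be computed directly with numpy; the dataset below is hypothetical, and the population SD (dividing by $n$) is used to match the formulas above.

```python
# Central tendency and dispersion for a small hypothetical dataset.
import numpy as np

data = np.array([4, 8, 6, 5, 3, 7, 9, 5], dtype=float)

print("mean   :", np.mean(data))
print("median :", np.median(data))
print("range  :", np.max(data) - np.min(data))
q1, q3 = np.percentile(data, [25, 75])
print("IQR    :", q3 - q1)
sd = np.std(data)                  # population SD: divides by n
print("SD     :", round(sd, 3))
print("CV (%) :", round(sd / np.mean(data) * 100, 1))
```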

Skewness and Kurtosis

1. Skewness

Measures the asymmetry of the data distribution.

 Positive Skew: Tail on the right.
 Negative Skew: Tail on the left.

$$\text{Skewness} = \frac{\frac{1}{n}\sum (X - \text{Mean})^3}{(\text{SD})^3}$$

2. Kurtosis
Measures the "tailedness" or sharpness of the peak of the data distribution.

 Leptokurtic: Sharper peak (Kurtosis > 3).
 Mesokurtic: Normal distribution (Kurtosis = 3).
 Platykurtic: Flatter peak (Kurtosis < 3).
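
A quick check with scipy on a simulated normal sample; note that `scipy.stats.kurtosis` reports excess kurtosis by default, so `fisher=False` is needed to match the convention above, in which a normal distribution has kurtosis 3.

```python
# Skewness and kurtosis of a large simulated normal sample.
import numpy as np
from scipy import stats

data = np.random.default_rng(0).normal(size=10_000)

print("skewness:", stats.skew(data))                    # ~ 0 for normal data
print("kurtosis:", stats.kurtosis(data, fisher=False))  # ~ 3 (mesokurtic)
```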
