Dsbda Unit2

The document explains key statistical concepts, including hypothesis testing, degree of freedom, skewness, kurtosis, and the Chi-square Goodness of Fit test. It outlines the steps for hypothesis testing, the significance of skewness and kurtosis in data analysis, and provides definitions and examples for various statistical measures and tests. Additionally, it differentiates between population and sample, and describes one-tailed and two-tailed t-tests with examples.

Uploaded by

surajjadhav3600
Copyright
© All Rights Reserved

Q 1) How does hypothesis testing work? Explain the steps.

Hypothesis testing is a statistical method for deciding whether a sample
provides enough evidence to infer that a certain condition holds for the
entire population.
In other words, it uses statistics to check a claim about data: based on the
sample, we either reject an assumption or fail to reject it.
Steps of Hypothesis Testing:
1. State the Hypothesis
o Null Hypothesis (H₀): This is the default assumption (no effect or no
difference).
o Alternative Hypothesis (H₁ or Ha): This is the claim you want to find
evidence for.
2. Set a Significance Level (α)
o This is the probability of rejecting H₀ when it is actually true.
o Common values are 0.05 (5%) or 0.01 (1%).
3. Collect & Analyze Data
o Gather sample data from experiments or observations and perform
the analysis on it.
4. Calculate the Test Statistic.
o Choose a statistical test (e.g., t-test, chi-square test) depending on
data type.
o The test statistic measures how much the sample data differs from
H₀.
5. Find the p-value & Make a Decision
o The p-value is the probability of getting a result at least as extreme as
the observed one if H₀ were true.
o If p-value ≤ α, reject H₀ (supports H₁).
o If p-value > α, fail to reject H₀ (not enough evidence for H₁).
6. Draw a Conclusion
o Based on the decision, conclude whether the claim is statistically
significant.
Example:
Imagine a company claims their new battery lasts 10 hours on average
(H₀: mean = 10 hours). A test on a sample shows an average of 9.5 hours with a
low p-value. If the p-value < 0.05, we reject H₀ and conclude that the battery
lasts less than 10 hours on average.
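The battery example can be sketched in Python as follows. This is only an illustration under stated assumptions: the ten lifetimes are made-up values chosen to average 9.5 hours, the test is a one-sample, one-tailed t-test, and the decision compares the test statistic with a t-table critical value (1.833 for α = 0.05, df = 9) instead of computing a p-value, because the standard library has no t-distribution.

```python
import math
import statistics

# Hypothetical battery lifetimes in hours (illustrative values, not real data).
lifetimes = [9.1, 9.8, 9.4, 9.6, 9.3, 9.7, 9.5, 9.2, 9.9, 9.5]

claimed_mean = 10.0  # Step 1: H0 says the mean lifetime is 10 hours
alpha = 0.05         # Step 2: significance level (fixes the critical value below)

# Steps 3 and 4: analyze the sample and compute the test statistic.
n = len(lifetimes)
sample_mean = statistics.mean(lifetimes)
sample_sd = statistics.stdev(lifetimes)      # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - claimed_mean) / standard_error

# Step 5: one-tailed critical value from a t-table for alpha = 0.05, df = n - 1 = 9.
t_critical = 1.833
reject_h0 = t_stat < -t_critical             # H1: mean lifetime < 10 hours

# Step 6: state the conclusion.
print(f"mean = {sample_mean:.2f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Comparing the test statistic with the critical value leads to the same decision as comparing the p-value with α.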
Q 2 )What is Degree of Freedom? Explain with Example
1. Degree of Freedom (DOF) is the number of independent values in a
statistical calculation that are free to vary while estimating a parameter.
2. In simple terms, it represents the number of choices or free variables
available after certain constraints are applied.
3. It is widely used in hypothesis testing, especially in the Chi-square test
and regression analysis.
Formula for Degree of Freedom:
DOF=n−k
Where:
 n = Total number of observations (sample size)
 k = Number of estimated parameters or constraints
Example:

 Imagine you have 5 numbers, and their average is 10.
 Let's say four of the numbers are 8, 9, 11, and 12.
 To keep the average at 10, the fifth number must be 10, since
(8+9+11+12+X)/5=10 gives X=10.
 Here, you were free to choose 4 numbers, but the 5th number was fixed
based on the average.
 So, the degree of freedom = Total numbers - 1 = 5 - 1 = 4.
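The same arithmetic can be checked with a few lines of Python. This is a minimal sketch of the example above; the variable names are chosen only for illustration.

```python
# Four values chosen freely; the average of all five values must be 10.
known_values = [8, 9, 11, 12]
n = 5
required_mean = 10

# The constraint (fixed mean) determines the last value: total = mean * n.
fifth_value = required_mean * n - sum(known_values)

# One constraint was applied, so DOF = n - k with k = 1.
degrees_of_freedom = n - 1

print(fifth_value, degrees_of_freedom)
```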

Q 3) Explain skewness and kurtosis. What is the purpose of finding the
skewness of data?

What is Skewness?

Skewness measures the asymmetry of a data distribution: whether it is
symmetrical or has a longer tail on one side.

 If skewness = 0, the data is symmetrical (like a bell curve).
 If skewness > 0 (positive skew), the data has a long tail on the right (higher
values are more spread out).
 If skewness < 0 (negative skew), the data has a long tail on the left (lower
values are more spread out).
Example of Skewness:

 If most students score 70-80 in an exam but a few score 100+, the data is
positively skewed.
 If most students score 50-60, but a few score below 30, the data is
negatively skewed.
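As a rough illustration, skewness can be computed from the second and third central moments. The sketch below assumes the population (moment) definition, skewness = m3 / m2^1.5, and uses made-up exam scores shaped like the two examples above; other software may apply a small-sample correction and give slightly different numbers.

```python
def skewness(values):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical exam scores (illustrative only).
right_tailed = [70, 72, 75, 75, 78, 80, 100]  # one very high score
left_tailed = [28, 50, 52, 55, 55, 58, 60]    # one very low score
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)  # long tail on the right
print(skewness(left_tailed) < 0)   # long tail on the left
print(skewness(symmetric))         # no skew
```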

What is Kurtosis?

Kurtosis measures how "sharp" or "flat" the peak of the data distribution is,
and how heavy its tails are, compared to a normal distribution.

 High kurtosis (>3): Tall and sharp peak with heavier tails (most values are
close to the mean, with a few extreme values).
 Low kurtosis (<3): Flatter peak with lighter tails (values are more evenly
spread out, with fewer extreme values).
 Normal kurtosis (=3): Similar to a normal bell curve.

Example of Kurtosis:

 A class where most students score around 75 with very few extreme scores
has high kurtosis (sharp peak).
 A class where scores are widely spread out with no clear peak has low
kurtosis (flat).
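A small sketch of the calculation, assuming the population definition kurtosis = m4 / m2² (the version for which a normal distribution gives 3). The two score lists are made up to mimic the classes described above.

```python
def kurtosis(values):
    """Moment-based (population) kurtosis: m4 / m2 ** 2. Normal data gives about 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

# Hypothetical scores (illustrative only).
peaked = [50, 75, 75, 75, 75, 75, 75, 75, 75, 100]  # most scores at 75, two extremes
flat = [1, 2, 3, 4, 5]                               # evenly spread, no clear peak

print(kurtosis(peaked))  # above 3: sharp peak
print(kurtosis(flat))    # below 3: flat
```

Some libraries report "excess kurtosis" (this value minus 3), so a normal distribution shows 0 there instead of 3.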

Why Find Skewness? (Purpose)

1. Understand Data Distribution – Helps to see if data is balanced or has
extreme values.
2. Make Better Predictions – Helps in statistical models and machine
learning.
3. Choose the Right Test – Some statistical tests require normally distributed
data.
4. Detect Bias in Data – Helps in finance, business, and research to avoid
misleading conclusions.
Q 4) Describe Chi-square Goodness of Fit test.

The Chi-Square Goodness of Fit test checks whether observed data matches
what we expected based on some assumption.

It is a statistical test used to determine whether a sample data set fits a specific
theoretical distribution.

It helps in analyzing differences in categorical data frequencies.

Formula for Chi-Square Goodness of Fit Test:

χ² = ∑ (O − E)² / E

Where:

 χ² = Chi-square test statistic
 O = Observed frequency
 E = Expected frequency
 ∑ = Summation over all categories
Steps for Chi-Square Goodness of Fit Test

1. Define Hypothesis
o Null Hypothesis (H₀): The observed data follows the expected
distribution.
o Alternative Hypothesis (H₁): The observed data does not follow the
expected distribution.
2. Collect and Organize Data
o Identify observed (O) and expected (E) frequencies for each category.
3. Apply the Chi-Square Formula
o Use the formula: χ² = ∑ (O − E)² / E
o Calculate the Chi-square test statistic.
4. Find the Critical Value
o Use the Chi-square table based on the significance level (α) and
degrees of freedom (df=n−1, where n is the number of categories).
5. Compare & Make Decision
o If χ² is greater than the critical value → Reject H₀ (Data does not fit
the distribution).
o If χ² is less than or equal to the critical value → Fail to Reject H₀ (Data
is consistent with the expected distribution).
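These steps can be sketched for a simple case: checking whether a die is fair. The observed counts below are made up for illustration, the statistic is χ² = ∑ (O − E)² / E, and 11.070 is the Chi-square table critical value for α = 0.05 with df = 5.

```python
# Hypothetical results of 60 die rolls (illustrative counts for faces 1 to 6).
observed = [8, 9, 12, 11, 10, 10]

# H0: the die is fair, so every face is expected equally often.
total_rolls = sum(observed)
expected = [total_rolls / len(observed)] * len(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = number of categories - 1; critical value from a Chi-square table
# for alpha = 0.05 and df = 5.
df = len(observed) - 1
critical_value = 11.070

reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```

Here the statistic is far below the critical value, so there is no evidence that the die is unfair.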
Q 5 ) List out measures of dispersion with their significance and mathematical
formulae.
1. Absolute Measures of Dispersion
 These are expressed in the units of the data (variance is in squared units).
(i) Range (R)
 Definition: Difference between maximum value and minimum value in
the dataset.
 Formula: R = Xmax − Xmin
 Significance:
1. Simple and easy to calculate.
2. Depends only on the two extreme values, so it ignores how the rest of
the data is distributed.
(ii) Mean Deviation (MD)
 Definition: The average of the absolute deviations from a central value.
 Formula: MD = ∑|Xi − M| / N
 Where:
o Xi = individual observations
o M = mean or median
o N = total number of observations
 Significance:
1. More useful than range as it considers every data point.
2. Uses absolute differences to avoid negative values canceling out.
(iii) Variance (σ²)
 Definition: Average of squared deviations from the mean.
 Formula: σ² = ∑(Xi − μ)² / N, where μ is the mean
 Significance:
o Provides a measure of spread around the mean.
o Squaring avoids negative deviations canceling positive ones.
(iv) Standard Deviation (σ)
 Definition: The square root of variance.
 Formula: σ = √( ∑(Xi − μ)² / N )
 Significance:
o Most commonly used measure of dispersion.
o Expressed in the same units as the data.
2. Relative Measure of Dispersion
(i) Coefficient of Variation (CV)
 Definition: Measures relative variability as a percentage.
 Formula: CV = (σ / μ) × 100%
 Significance:
o Useful for comparing datasets with different units or different means.
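The measures above can be computed with Python's standard library. This is a minimal sketch that assumes the population versions (dividing by N) and takes the mean as the central value for the mean deviation; the data set is made up for illustration.

```python
import statistics

# Hypothetical data set (illustrative values).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = statistics.mean(data)

data_range = max(data) - min(data)                     # R = Xmax - Xmin
mean_deviation = sum(abs(x - mean) for x in data) / n  # MD about the mean
variance = statistics.pvariance(data)                  # population variance
std_dev = statistics.pstdev(data)                      # population standard deviation
coeff_variation = std_dev / mean * 100                 # CV as a percentage

print(data_range, mean_deviation, variance, std_dev, coeff_variation)
```

For sample data, `statistics.variance` and `statistics.stdev` divide by N − 1 instead of N.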
Q 6) Write a short note on contingency table, explain with example.

Structure of a Contingency Table


A contingency table consists of:
 Rows: Represent one categorical variable.
 Columns: Represent another categorical variable.
 Cells: Show the frequency or count of occurrences for the combinations of
both variables.
This table allows researchers to calculate probabilities, relationships, and
dependency between variables using statistical methods like Chi-square tests.
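As an example, a small contingency table can be built by counting pairs of categories. The sketch below uses made-up survey answers (gender against preferred drink); the variable names and data are only for illustration.

```python
from collections import Counter

# Hypothetical survey responses: (gender, preferred drink).
responses = [
    ("Male", "Tea"), ("Male", "Coffee"), ("Female", "Tea"), ("Female", "Tea"),
    ("Male", "Coffee"), ("Female", "Coffee"), ("Male", "Tea"), ("Female", "Tea"),
]

# Count how often each (row category, column category) pair occurs.
counts = Counter(responses)
rows = sorted({gender for gender, _ in responses})
columns = sorted({drink for _, drink in responses})

# table[row][column] holds the cell frequency.
table = {r: {c: counts[(r, c)] for c in columns} for r in rows}

# Print the table with row labels and column headers.
print("Gender".ljust(8) + "".join(c.ljust(8) for c in columns))
for r in rows:
    print(r.ljust(8) + "".join(str(table[r][c]).ljust(8) for c in columns))
```

Each cell is the number of respondents with that combination, and these cell counts are the observed frequencies a Chi-square test would use.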
Q 7) With an example, explain Bayes' theorem. Also explain its key terms.

Key Terms in Bayes' Theorem.


1. Prior Probability (P(B)) – The initial probability of an event before
considering new evidence.
2. Likelihood (P(A∣B)) – The probability of obtaining evidence given that an
event has occurred.
3. Posterior Probability (P(B∣A)) – The updated probability of an event after
considering new evidence.
4. Evidence (P(A)) – The total probability of the evidence occurring under all
possible conditions.
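In the notation of these key terms, Bayes' theorem reads P(B∣A) = P(A∣B) × P(B) / P(A). A common illustration is a medical test, sketched below with made-up probabilities: B is "the person has the disease" and A is "the test is positive". The false positive rate is an extra assumption needed to compute the evidence P(A).

```python
# Hypothetical probabilities (illustrative numbers only).
prior = 0.01                # P(B): 1% of people have the disease
likelihood = 0.95           # P(A|B): test is positive when the disease is present
false_positive_rate = 0.05  # P(A|not B): test is positive without the disease

# Evidence P(A): total probability of a positive test under both conditions.
evidence = likelihood * prior + false_positive_rate * (1 - prior)

# Posterior P(B|A): probability of disease given a positive test.
posterior = likelihood * prior / evidence

print(f"P(A) = {evidence:.4f}, P(B|A) = {posterior:.3f}")
```

Even with a fairly accurate test, the posterior stays low here because the prior (how rare the disease is) is small.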

Q 8) What is a population & how does it differ from a sample?


1. Population:
 A population is the entire group of individuals, items, or data that you
want to study.
 It includes every possible member related to your study.
 Example: If you want to study the height of students in a school, then all
students in the school form the population.
2. Sample:
 A sample is a smaller group selected from the population.
 It is used to study and make conclusions about the entire population
(because studying the whole population is difficult).
 Example: Instead of measuring all students in the school, you select 50
students randomly. This group of 50 students is a sample.
Key Differences:
Feature     | Population                     | Sample
Definition  | The entire group being studied | A subset of the population
Size        | Large (can be millions)        | Smaller, manageable
Example     | All students in a school       | 50 students chosen from the school
Use         | Used for complete analysis     | Used to estimate population characteristics
Cost & Time | Expensive & time-consuming     | Cheaper & quicker
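The school example can be simulated as a rough sketch. The heights below are randomly generated (not real measurements), and the population size of 1000, the average of 150 cm, and the fixed seed are assumptions made only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Population: hypothetical heights (cm) of all 1000 students in a school.
population = [round(rng.gauss(150, 10), 1) for _ in range(1000)]

# Sample: 50 students chosen at random, without replacement.
sample = rng.sample(population, 50)

# The sample mean is used to estimate the population mean.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean = {population_mean:.1f}, sample mean = {sample_mean:.1f}")
```

The sample mean is usually close to the population mean but not identical, which is why conclusions drawn from a sample are estimates.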

Q 9) With an example, explain one-tailed & two-tailed t-tests.


1. One-Tailed t-Test
 A one-tailed t-test is used when we want to check a difference in one
specific direction: whether one group's mean is greater than the other, or
whether it is smaller.
 The direction (greater or smaller) is chosen before looking at the data.

Example:
A company claims that a new training program increases employee
productivity. You perform a one-tailed t-test with:
 Null Hypothesis (H₀): The training does not increase productivity.
 Alternative Hypothesis (H₁): The training increases productivity.
If the test result is significant, it means the training increased productivity. A
one-tailed test is used because we are only checking for an increase, not a
decrease.
2. Two-Tailed t-Test
 A two-tailed t-test checks both directions: whether one group's mean is
greater or smaller than the other.
 It does not assume which direction the difference might be.
Example:
A teacher wants to know if boys and girls score differently in a math test. You
perform a two-tailed t-test with:
 Null Hypothesis (H₀): Boys and girls score the same on average.
 Alternative Hypothesis (H₁): Boys and girls do not score the same (one group
may be higher or lower).
Since the test checks both higher and lower scores, a two-tailed test is used.
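The practical difference between the two tests shows up in the critical value. The sketch below is a simplified illustration, not the exact tests described above: it assumes a paired design (change in productivity for the same 10 employees, made-up numbers), so the t-statistic is computed on the differences, and it uses t-table critical values for α = 0.05 and df = 9 (1.833 one-tailed, 2.262 two-tailed).

```python
import math
import statistics

# Hypothetical change in productivity (after - before) for 10 employees.
differences = [1, 3, -1, 2, 4, 0, 2, -2, 3, 0]

n = len(differences)
mean_diff = statistics.mean(differences)
standard_error = statistics.stdev(differences) / math.sqrt(n)
t_stat = mean_diff / standard_error  # H0: mean difference = 0

# t-table critical values for alpha = 0.05 and df = n - 1 = 9.
ONE_TAILED_CRITICAL = 1.833  # all 5% placed in one tail
TWO_TAILED_CRITICAL = 2.262  # 2.5% placed in each tail

significant_one_tailed = t_stat > ONE_TAILED_CRITICAL       # H1: mean difference > 0
significant_two_tailed = abs(t_stat) > TWO_TAILED_CRITICAL  # H1: mean difference != 0

print(f"t = {t_stat:.3f}")
print("one-tailed significant:", significant_one_tailed)
print("two-tailed significant:", significant_two_tailed)
```

With these numbers the same t-statistic is significant in the one-tailed test but not in the two-tailed test, because the two-tailed test splits α across both tails and needs a larger value.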
