Dsbda Unit2
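What is Skewness?
Skewness measures how asymmetric a data distribution is around its mean.
If most students score 70-80 in an exam but a few score 100+, the long tail is on
the right, so the data is positively skewed.
If most students score 50-60 but a few score below 30, the long tail is on the
left, so the data is negatively skewed.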
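The sign of skewness can be checked numerically. The sketch below uses the moment coefficient g1 = m3 / m2^1.5 (population moments, no small-sample correction), which is one common definition among several. The exam scores are hypothetical, and the second list is an exact mirror image of the first.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical exam scores mirroring the two examples above.
right_tail = [70, 72, 75, 75, 78, 80, 80, 105, 110]  # a few very high scores
left_tail = [60, 58, 55, 55, 52, 50, 50, 25, 20]     # a few very low scores

print(round(skewness(right_tail), 3))  # positive value -> positively skewed
print(round(skewness(left_tail), 3))   # negative value -> negatively skewed
```

Because the two lists are mirror images, their skewness values have the same magnitude and opposite signs.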
What is Kurtosis?
Kurtosis measures how "sharp" or "flat" the peak of a data distribution is, and
how heavy its tails are, compared to a normal distribution.
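High kurtosis (>3, leptokurtic): Tall, sharp peak with heavy tails (most values
are close to the mean, but extreme values occur more often than in a normal
distribution).
Low kurtosis (<3, platykurtic): Flatter peak with light tails (values are more
evenly spread out, with fewer extreme values).
Normal kurtosis (=3, mesokurtic): Similar to a normal bell curve.
These thresholds refer to Pearson's kurtosis. "Excess kurtosis" subtracts 3, so a
normal distribution has excess kurtosis 0.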
Example of Kurtosis:
A class where most students score very close to 75, with only a handful of
extreme scores, has high kurtosis (sharp peak).
A class where scores are widely spread out with no clear peak has low
kurtosis (flat).
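Both classroom examples can be quantified with the moment-based (Pearson) kurtosis m4 / m2^2, which equals 3 for a normal distribution. This is a minimal sketch with hypothetical scores. Note that `scipy.stats.kurtosis` returns excess kurtosis by default (`fisher=True`), so its values are 3 lower than the ones computed here.

```python
def pearson_kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (population moments; normal -> 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2


# Hypothetical scores: tightly clustered at 75 with two extreme results.
sharp = [74, 75, 75, 75, 75, 75, 75, 76, 50, 100]
# Hypothetical scores: evenly spread from 40 to 100, no clear peak.
flat = list(range(40, 101, 5))

for name, scores in [("sharp", sharp), ("flat", flat)]:
    k = pearson_kurtosis(scores)
    print(f"{name}: kurtosis = {k:.3f}, excess = {k - 3:.3f}")
```

The clustered class comes out above 3 and the evenly spread class below 3, matching the high and low kurtosis cases described above.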
The Chi-Square Goodness of Fit test is a statistical test used to determine
whether a sample dataset fits a specific theoretical distribution, i.e., whether
the observed frequencies match the frequencies we would expect under some
assumption.
Steps to perform the test:
1. Define the Hypotheses
o Null Hypothesis (H₀): The observed data follows the expected
distribution.
o Alternative Hypothesis (H₁): The observed data does not follow the
expected distribution.
2. Collect and Organize Data
o Identify observed (O) and expected (E) frequencies for each category.
3. Apply the Chi-Square Formula
o Use the formula: $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$, where n is the
number of categories.
o Calculate the Chi-square test statistic.
4. Find the Critical Value
o Use the Chi-square table based on the significance level (α) and the
degrees of freedom (df = n − 1, where n is the number of categories).
5. Compare & Make Decision
o If χ² is greater than the critical value → Reject H₀ (the data does not fit
the expected distribution).
o If χ² is less than or equal to the critical value → Fail to reject H₀ (the
data is consistent with the expected distribution).
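The five steps above can be sketched in Python as follows. The die-roll counts are hypothetical, and the critical values are copied from a standard chi-square table for α = 0.05 (only df = 1 to 5 are included). In practice, `scipy.stats.chisquare(f_obs, f_exp)` computes the same statistic together with a p-value.

```python
# Upper-tail chi-square critical values at alpha = 0.05 (standard table, df 1-5 only).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_gof(observed, expected):
    """Return (chi2 statistic, df, critical value, reject H0?) at alpha = 0.05."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Step 3: sum of (O - E)^2 / E over all categories.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 4: df = n - 1, where n is the number of categories.
    df = len(observed) - 1
    critical = CRITICAL_05[df]
    # Step 5: reject H0 only if the statistic exceeds the critical value.
    return stat, df, critical, stat > critical


# Hypothetical data: a die rolled 60 times; H0 says it is fair (10 per face).
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6

stat, df, critical, reject = chi_square_gof(observed, expected)
print(f"chi2 = {stat:.2f}, df = {df}, critical = {critical}")
print("Reject H0" if reject else "Fail to reject H0")
```

Here χ² = (4 + 1 + 4 + 1 + 16 + 16) / 10 = 4.2 with df = 5. Since 4.2 is below 11.070, H₀ (a fair die) is not rejected.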
Q 5) List out the measures of dispersion with their significance and mathematical
formulae.
1. Absolute Measures of Dispersion
These are expressed in the same units as the data.
(i) Range (R)
Definition: Difference between maximum value and minimum value in
the dataset.
Formula: $R = X_{max} - X_{min}$
Significance:
1. Simple and easy to calculate.
2. Uses only the two extreme values, so it ignores how the rest of the data
is distributed and is highly sensitive to outliers.
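A minimal Python sketch of the range formula; the sample values are hypothetical. Appending a single outlier shows why the range reflects only the two extremes.

```python
def data_range(values):
    """Range R = Xmax - Xmin: depends only on the two extreme observations."""
    return max(values) - min(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

print(data_range(data))         # 9 - 2 = 7
print(data_range(data + [50]))  # one outlier changes R to 50 - 2 = 48
```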
(ii) Mean Deviation (MD)
Definition: The average of the absolute deviations from a central value (mean
or median).
Formula: $MD = \frac{1}{N}\sum_{i=1}^{N} |X_i - M|$
Where:
o Xi = individual observations
o M = mean or median
o N = total number of observations
Significance:
1. More useful than range as it considers every data point.
2. Uses absolute differences to avoid negative values canceling out.
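The mean deviation formula translates directly into Python. In this sketch the central value M is passed in explicitly, so the same function covers both the mean and the median. The data is hypothetical.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """MD = (1/N) * sum(|Xi - M|) about the chosen central value M."""
    return sum(abs(x - center) for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample (mean 5, median 4.5)

print(mean_deviation(data, mean(data)))    # about the mean
print(mean_deviation(data, median(data)))  # about the median
```

Using absolute values keeps the positive and negative deviations from cancelling. For this particular sample both versions happen to give 1.5.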
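(iii) Variance (σ²)
Definition: Average of squared deviations from the mean.
Formula: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$, where μ is the mean (for a
sample, divide by N − 1 instead of N).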
Significance:
o Provides a measure of spread around the mean.
o Squaring avoids negative deviations canceling positive ones.
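The sketch below computes the population variance by hand and compares it with Python's `statistics` module. The data is hypothetical. The first print shows the point made under Significance: the raw deviations from the mean sum to zero, which is why they are squared.

```python
import statistics


def variance(values):
    """Population variance: sigma^2 = (1/N) * sum((Xi - mu)^2)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
mu = sum(data) / len(data)

print(sum(x - mu for x in data))  # raw deviations cancel: 0.0
print(variance(data))             # 32 / 8 = 4.0
print(statistics.pvariance(data)) # standard-library population variance
print(statistics.variance(data))  # sample variance (divides by N - 1)
```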
(iv) Standard Deviation (σ)
Definition: The square root of variance.
Formula: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2}$, where μ is the mean
Significance:
o Most commonly used measure of dispersion.
o Expressed in the same units as the data.
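A short sketch of the standard deviation as the square root of the population variance, checked against `statistics.pstdev`. The values are hypothetical marks.

```python
import math
import statistics


def std_dev(values):
    """Population standard deviation: square root of the population variance."""
    n = len(values)
    mu = sum(values) / n
    var = sum((x - mu) ** 2 for x in values) / n
    return math.sqrt(var)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical marks

print(std_dev(data))            # sqrt(4.0) = 2.0, in the same unit as the marks
print(statistics.pstdev(data))  # standard-library equivalent
```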
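2. Relative Measure of Dispersion
These are unit-free, expressed as ratios or percentages.
(i) Coefficient of Variation (CV)
Definition: Measures relative variability by expressing the standard deviation
as a percentage of the mean.
Formula: $CV = \frac{\sigma}{\mu} \times 100\%$, where σ is the standard deviation and μ
is the mean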
Significance:
o Useful for comparing datasets with different units.
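To illustrate comparing datasets with different units, this sketch computes the CV of hypothetical heights (cm) and weights (kg). It uses the population standard deviation and assumes the mean is non-zero.

```python
import statistics


def coefficient_of_variation(values):
    """CV = (sigma / mean) * 100; assumes a non-zero mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


heights_cm = [150, 160, 170, 180, 190]  # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]       # hypothetical weights

print(f"heights: CV = {coefficient_of_variation(heights_cm):.1f}%")
print(f"weights: CV = {coefficient_of_variation(weights_kg):.1f}%")
```

Both lists have the same standard deviation (about 14.14), but relative to their means the weights vary far more (CV ≈ 20.2% vs ≈ 8.3%). The raw standard deviations alone could not show this, because they are in different units.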
Q 6) Write a short note on contingency table, explain with example.
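1. One-Tailed t-Test
A one-tailed t-test checks for a difference in only one direction (for example,
whether one group is greater than the other). The direction is specified in
advance.
Example: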
A company claims that a new training program increases employee
productivity. You perform a one-tailed t-test with:
Null Hypothesis (H₀): The training does not increase productivity.
Alternative Hypothesis (H₁): The training increases productivity.
If the test result is significant (the t statistic exceeds the one-tailed critical
value), we reject H₀ and conclude that the training increased productivity. A
one-tailed test is used because we are only checking for an increase, not a
decrease.
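A minimal sketch of this one-tailed test, assuming a paired design (the same 8 employees measured before and after training). The productivity scores are hypothetical, and the critical value t(0.05, df = 7) = 1.895 is copied from a standard t-table. With SciPy, `scipy.stats.ttest_rel(after, before, alternative="greater")` performs the same paired one-tailed test.

```python
import math
import statistics

# One-tailed critical value t(alpha = 0.05, df = 7) from a standard t-table.
T_CRIT_ONE_TAILED_05_DF7 = 1.895

# Hypothetical productivity scores for the same 8 employees.
before = [50, 52, 48, 55, 60, 47, 53, 51]
after = [54, 55, 50, 58, 61, 50, 57, 52]

# Paired design: test the per-employee improvement d = after - before.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
df = n - 1

# t = mean(d) / (s_d / sqrt(n)), with s_d the sample standard deviation.
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# H1 is "productivity increased", so only the upper tail can reject H0.
reject_h0 = t_stat > T_CRIT_ONE_TAILED_05_DF7
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

For these made-up numbers t is about 6.25, well above 1.895, so H₀ would be rejected. A large negative t would not count as evidence here, because the test looks in one direction only.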
2. Two-Tailed t-Test
A two-tailed t-test checks both directions: whether one group is greater than or
smaller than the other.
It does not assume in which direction the difference might be.
Example:
A teacher wants to know if boys and girls score differently in a math test. You
perform a two-tailed t-test with:
Null Hypothesis (H₀): Boys and girls score the same on average.
Alternative Hypothesis (H₁): Boys and girls do not score the same (one group
may be higher or lower).
Since the test checks both higher and lower scores, a two-tailed test is used.
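A minimal sketch of this two-tailed test, assuming two independent groups with equal population variances (Student's pooled t-test). The scores are hypothetical, and the critical value t(0.025 per tail, df = 10) = 2.228 is copied from a standard t-table. `scipy.stats.ttest_ind(boys, girls)` performs the same two-sided pooled test by default.

```python
import math
import statistics

# Two-tailed critical value for alpha = 0.05 (0.025 per tail), df = 10.
T_CRIT_TWO_TAILED_05_DF10 = 2.228

# Hypothetical math scores for two independent groups.
boys = [65, 70, 72, 68, 75, 71]
girls = [70, 74, 69, 77, 73, 78]

n1, n2 = len(boys), len(girls)
df = n1 + n2 - 2

# Pooled variance, assuming both groups share the same population variance.
sp2 = ((n1 - 1) * statistics.variance(boys) + (n2 - 1) * statistics.variance(girls)) / df
t_stat = (statistics.mean(boys) - statistics.mean(girls)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-tailed: a large |t| in either direction is evidence against H0.
reject_h0 = abs(t_stat) > T_CRIT_TWO_TAILED_05_DF10
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

For these made-up scores t is about −1.64. Since |t| < 2.228, H₀ (equal average scores) is not rejected, even though the girls' sample mean is higher.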
Q)