ADS EXP 1
AIM: Explore the descriptive and inferential statistics on the given dataset.
THEORY
A) DESCRIPTIVE STATISTICS
Descriptive statistics summarize and present data in a meaningful way. They help
understand the distribution, central tendency, and variability of the dataset.
1. Measures of Central Tendency
Mean
The mean is the average value of a dataset. It is calculated by adding all values and
dividing by the number of observations. The mean is useful but can be affected by
outliers.
Median
The median is the middle value when the dataset is arranged in ascending order. If the
dataset has an even number of observations, the median is the average of the two
middle values. It is more robust to outliers than the mean.
Mode
The mode is the most frequently occurring value in the dataset. A dataset can have one
mode (unimodal), two modes (bimodal), or multiple modes (multimodal).
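As an illustration, the sketch below computes all three measures with pandas on a small hypothetical set of values (the variable name scores and the numbers are placeholders, not taken from the experiment's dataset):

```python
import pandas as pd

# Hypothetical sample data (not from the experiment's dataset)
scores = pd.Series([72, 85, 85, 90, 68, 85, 77, 91, 60, 74])

print("Mean:  ", scores.mean())           # arithmetic average
print("Median:", scores.median())         # middle value of the sorted data
print("Mode:  ", scores.mode().tolist())  # most frequent value(s); may return several
```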
2. Measures of Dispersion
Minimum and Maximum
The minimum value represents the smallest observation in the dataset, while the maximum value represents the largest. These values help determine the range of the data.
Sum
The sum is the total of all values in the dataset.
Range
The range measures the spread of data by subtracting the minimum value from the maximum value. A larger range indicates greater variability.
Quartiles (Q1 and Q3)
Q1 is the value below which 25% of the data falls, while Q3 is the value below which 75% of the data falls. These quartiles help in understanding the data distribution.
Interquartile Range (IQR)
The IQR is the difference between Q3 and Q1. It represents the middle 50% of the data and helps detect outliers.
Standard Deviation
The standard deviation measures how much the data deviates from the mean. A high
standard deviation indicates that data points are spread out, while a low value suggests
they are close to the mean.
Variance
Variance is the average of the squared deviations from the mean (the square of the standard deviation). It is useful for comparing variability between datasets.
3. Measures of Association
Correlation
Correlation measures the strength and direction of the linear relationship between two variables. The correlation coefficient ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship), with 0 indicating no linear relationship.
Standard Error (SE) of the Mean
The SE of the mean estimates how much the sample mean is expected to vary from the true population mean. A smaller SE indicates a more precise estimate.
Trimmed Mean
The trimmed mean is calculated after removing extreme values from both ends of the
dataset. This helps reduce the influence of outliers.
Sum of Squares
Sum of squares represents the total squared deviation from the mean. It is useful in
variance and regression analysis.
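The sketch below illustrates these quantities with scipy.stats and NumPy on a hypothetical sample (the 10% trim proportion is an arbitrary choice for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical sample data; 9.9 acts as an outlier
x = np.array([4.1, 4.8, 5.0, 5.2, 5.5, 6.1, 6.4, 9.9])

print("SE of mean:    ", stats.sem(x))                    # standard error of the mean
print("Mean:          ", np.mean(x))
print("Trimmed mean:  ", stats.trim_mean(x, 0.1))         # drop 10% from each tail
print("Sum of squares:", np.sum((x - np.mean(x)) ** 2))   # total squared deviation from the mean
```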
4. Measures of Shape
Skewness
Skewness measures the asymmetry of a distribution around its mean:
● Positive skew: The right tail is longer, indicating more extreme high values.
● Negative skew: The left tail is longer, indicating more extreme low values.
● Zero skewness: A perfectly symmetrical distribution.
Kurtosis
Kurtosis measures how heavy or light the tails of the distribution are compared to a
normal distribution.
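As a rough illustration, scipy.stats provides skew and kurtosis; the exponential sample below is hypothetical and chosen only because it is clearly right-skewed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=1000)  # right-skewed hypothetical sample

print("Skewness:", stats.skew(data))      # > 0 indicates a longer right tail
print("Kurtosis:", stats.kurtosis(data))  # excess kurtosis (0 for a normal distribution)
```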
5. Data Visualization
Box-and-Whisker Plot
A box-and-whisker plot summarizes a dataset using the minimum, Q1, median, Q3, and maximum, making it easy to see the spread and spot potential outliers.
Scatter Plot
A scatter plot is used to visualize relationships between two numerical variables, helping
to identify patterns and correlations.
Correlation Matrix
A correlation matrix shows the pairwise correlation coefficients between numerical variables and is commonly visualized as a heatmap.
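One possible way to produce all three plots with matplotlib and pandas, using a small synthetic two-column dataset in place of the experiment's data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical two-variable dataset with a positive relationship
rng = np.random.default_rng(1)
x = rng.normal(50, 10, 200)
y = 0.8 * x + rng.normal(0, 5, 200)
df = pd.DataFrame({"x": x, "y": y})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].boxplot(df["x"])                  # box-and-whisker plot of x
axes[0].set_title("Box plot")

axes[1].scatter(df["x"], df["y"], s=10)   # scatter plot of x vs y
axes[1].set_title("Scatter plot")

im = axes[2].imshow(df.corr().values, vmin=-1, vmax=1, cmap="coolwarm")  # correlation heatmap
axes[2].set_xticks(range(len(df.columns)))
axes[2].set_xticklabels(df.columns)
axes[2].set_yticks(range(len(df.columns)))
axes[2].set_yticklabels(df.columns)
axes[2].set_title("Correlation matrix")
fig.colorbar(im, ax=axes[2])

plt.tight_layout()
plt.show()
```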
B) INFERENTIAL STATISTICS
1. Distributions
Normal Distribution
A normal distribution is bell-shaped, where most values are concentrated around the
mean. Many statistical tests assume normality in the data.
Poisson Distribution
The Poisson distribution models the probability of a specific number of events occurring
within a fixed interval. It is often used for rare events, such as the number of accidents
in a day.
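For illustration, scipy.stats exposes both distributions; the parameters below (mean 100, standard deviation 15, and an average rate of 3 events per day) are assumed values, not taken from the dataset:

```python
from scipy import stats

# Normal distribution: probability of observing a value at or below 115
mu, sigma = 100, 15
print("P(X <= 115) for N(100, 15):", stats.norm.cdf(115, loc=mu, scale=sigma))

# Poisson distribution: probability of exactly k events given an average rate lam
lam = 3  # assumed average of 3 events (e.g. accidents) per day
print("P(k = 5 | lambda = 3):", stats.poisson.pmf(5, mu=lam))
```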
Confidence Intervals
A confidence interval is a range of values likely to contain the true population parameter. A 95% confidence interval means that if the sampling were repeated many times, about 95% of the resulting intervals would contain the true value.
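A minimal sketch of a 95% confidence interval for the mean, using the t distribution from scipy.stats on a hypothetical sample:

```python
import numpy as np
from scipy import stats

# Hypothetical sample
sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.7, 11.9])

mean = np.mean(sample)
sem = stats.sem(sample)

# 95% confidence interval for the mean using the t distribution
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```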
2. Hypothesis Testing
Z-Test
A Z-test is used when the sample size is large and the population standard deviation is
known. It tests whether the sample mean significantly differs from the population mean.
T-Test
A T-test is used when the sample size is small and the population standard deviation is unknown. It tests the same kind of hypothesis as the Z-test but uses the t distribution to account for the additional uncertainty of estimating the standard deviation from the sample.
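The sketch below runs both tests against an assumed population mean of 50 on hypothetical data; note that statsmodels' ztest estimates the standard deviation from the sample rather than requiring a known population value:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ztest

# Hypothetical sample and a claimed population mean of 50
sample = np.array([52, 48, 51, 53, 47, 50, 55, 49, 54, 52])

# Z-test (statsmodels); the standard deviation is estimated from the sample
z_stat, z_p = ztest(sample, value=50)
print("Z-test: z =", round(z_stat, 3), " p =", round(z_p, 3))

# One-sample t-test (scipy): small sample, unknown population standard deviation
t_stat, t_p = stats.ttest_1samp(sample, popmean=50)
print("T-test: t =", round(t_stat, 3), " p =", round(t_p, 3))
```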
ANOVA (Analysis of Variance)
ANOVA is used to compare means across multiple groups. If the variation between groups is significantly greater than the variation within groups, the group means are considered different.
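A one-way ANOVA can be sketched with scipy.stats.f_oneway; the three groups below are made-up scores used only to illustrate the call:

```python
from scipy import stats

# Hypothetical scores from three independent groups
group_a = [85, 90, 88, 92, 87]
group_b = [78, 82, 80, 79, 81]
group_c = [90, 95, 93, 91, 94]

# One-way ANOVA: tests whether all group means are equal
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print("F =", round(f_stat, 3), " p =", round(p_value, 4))
# A small p-value (e.g. < 0.05) suggests at least one group mean differs
```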
CONCLUSION
Descriptive statistics provide insights into the structure and distribution of data, while
inferential statistics allow us to make predictions and test hypotheses.