
ADS EXP 1

This document outlines the principles of descriptive and inferential statistics, detailing measures of central tendency, dispersion, association, and data visualization techniques. It also covers inferential statistics concepts such as distributions, confidence intervals, and hypothesis testing methods like Z-tests and T-tests. The conclusion emphasizes the importance of these statistical concepts for data-driven decision-making across various industries.

Uploaded by ritzinator24
Copyright © All Rights Reserved

EXPERIMENT NO.

AIM: Explore the descriptive and inferential statistics on the given dataset.

THEORY

A) DESCRIPTIVE STATISTICS

Descriptive statistics summarize and present data in a meaningful way. They help us
understand the distribution, central tendency, and variability of the dataset.

1. Measures of Central Tendency

Mean

The mean is the arithmetic average of a dataset: the sum of all values divided by the
number of observations. Because every value contributes to it, the mean is sensitive to
outliers.

Median

The median is the middle value when the dataset is arranged in ascending order. If the
dataset has an even number of observations, the median is the average of the two
middle values. It is more robust to outliers than the mean.

Mode

The mode is the most frequently occurring value in the dataset. A dataset can have one
mode (unimodal), two modes (bimodal), or multiple modes (multimodal).
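The three measures of central tendency can be computed with Python's standard `statistics` module. The sketch below uses a small hypothetical list of prices, not the experiment's dataset. One extreme value was chosen on purpose so that it pulls the mean away from the median.

```python
from statistics import mean, median, multimode

# Hypothetical sample values (illustrative only, not from the experiment's dataset)
prices = [120, 150, 150, 180, 200, 950]

print(mean(prices))       # sum / count; pulled upward by the outlier 950
print(median(prices))     # even count, so the average of the two middle values (150 and 180)
print(multimode(prices))  # all most-frequent values; more than one means bi- or multimodal
```

Here the median (165) describes a "typical" value much better than the mean (about 291.7), which illustrates the median's robustness to outliers.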

2. Measures of Dispersion

Min & Max

The minimum value represents the smallest observation in the dataset, while the
maximum value represents the largest. These values help determine the range of the
data.

Sum

The sum is the total of all data points in a given variable.


Range

The range measures the spread of data by subtracting the minimum value from the
maximum value. A larger range indicates greater variability.
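Min, max, sum and range need only built-in functions. This is a minimal sketch on a hypothetical list of numbers.

```python
# Hypothetical sample (illustrative only)
data = [12, 7, 3, 18, 9]

smallest = min(data)               # 3
largest = max(data)                # 18
total = sum(data)                  # 49
data_range = largest - smallest    # 15: spread between the two extremes
print(smallest, largest, total, data_range)
```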

First Quartile (Q1) & Third Quartile (Q3)

Q1 is the value below which 25% of the data falls, while Q3 is the value below which
75% of the data falls. These quartiles help in understanding data distribution.

Interquartile Range (IQR)

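IQR is the difference between Q3 and Q1 and spans the middle 50% of the data. Because it
ignores the lowest and highest quarters, it is robust to outliers and is commonly used to
flag them: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are treated as potential
outliers.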
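The quartiles, IQR and the 1.5 × IQR outlier rule can be sketched with `statistics.quantiles`. Tools disagree slightly on how to interpolate quartiles. The `method="inclusive"` choice here, which uses linear interpolation between sorted values, is an assumption, and other software may report slightly different Q1 and Q3. The data are hypothetical.

```python
from statistics import quantiles

# Hypothetical sample with one extreme value (illustrative only)
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]

# Cut points for n=4 groups are Q1, the median (Q2) and Q3
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Rule of thumb: anything more than 1.5 * IQR beyond the box edges is a potential outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, outliers)  # 3.0 7.0 4.0 [100]
```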

Standard Deviation

The standard deviation is the square root of the variance and measures the typical
distance of data points from the mean, in the same units as the data. A high standard
deviation indicates that data points are spread out, while a low value suggests they are
close to the mean.

Variance

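Variance is the average of the squared deviations from the mean. For a sample, the sum
of squared deviations is divided by n - 1 instead of n. Because it is expressed in squared
units, it is usually reported alongside the standard deviation. It is useful for comparing
variability between datasets.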
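The sketch below works through the sum of squares, variance and standard deviation by hand on a hypothetical sample. It shows both the population (divide by n) and the sample (divide by n - 1) versions.

```python
import math

# Hypothetical sample (illustrative only)
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
m = sum(data) / n                        # mean = 5.0

ss = sum((x - m) ** 2 for x in data)     # sum of squared deviations = 32.0
pop_var = ss / n                         # population variance = 4.0
sample_var = ss / (n - 1)                # sample variance, about 4.571
pop_sd = math.sqrt(pop_var)              # population standard deviation = 2.0
sample_sd = math.sqrt(sample_var)        # sample standard deviation, about 2.138
print(ss, pop_var, sample_var, pop_sd, sample_sd)
```

The `n - 1` denominator matches what `statistics.variance` and `statistics.stdev` return. `statistics.pvariance` and `statistics.pstdev` use `n`.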

3. Measures of Association

Correlation

Correlation (most commonly Pearson's r) measures the strength and direction of the
linear relationship between two variables. It ranges from -1 to 1, where:

● A value close to 1 indicates a strong positive relationship.
● A value close to -1 indicates a strong negative relationship.
● A value near 0 indicates little to no linear correlation.
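Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below writes this out explicitly on hypothetical paired data. Python 3.10+ also provides `statistics.correlation` for the same calculation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the product of root sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired data (illustrative only)
area = [50, 60, 70, 80, 90]
price = [100, 120, 140, 160, 180]             # exactly linear in area
print(pearson_r(area, price))                 # 1, up to floating-point rounding
print(pearson_r(area, [-p for p in price]))   # -1, up to floating-point rounding
```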

4. Statistical Measures for Data Quality

Standard Error of Mean (SE of Mean)

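SE of Mean estimates how much the sample mean would vary from one random sample to
another. It is the sample standard deviation divided by the square root of the sample size
(s / √n). A smaller SE means the sample mean is a more precise estimate of the population
mean.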
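The s / √n formula can be sketched on a hypothetical sample. The second calculation repeats the same values four times to show that more observations shrink the standard error.

```python
import math
from statistics import stdev

# Hypothetical sample (illustrative only)
data = [2, 4, 4, 4, 5, 5, 7, 9]

se = stdev(data) / math.sqrt(len(data))   # sample SD / sqrt(n), about 0.756

# Same values repeated four times: similar spread, four times the observations
larger = data * 4
se_larger = stdev(larger) / math.sqrt(len(larger))  # roughly half as large
print(se, se_larger)
```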

Coefficient of Variation (CV)

CV is the ratio of the standard deviation to the mean, expressed as a percentage
(CV = standard deviation / mean × 100%). Because it is unitless, it is useful for comparing
variability across datasets measured on different scales.
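The sketch below computes CV as sample standard deviation over mean, times 100, for two hypothetical variables. The variables are on very different scales but have the same relative spread, which is exactly the comparison CV is designed for. The function assumes a non-zero mean.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage (mean must be non-zero)."""
    return stdev(values) / mean(values) * 100

# Two hypothetical variables on very different scales (illustrative only)
rooms = [2, 3, 3, 4, 5]
prices = [200_000, 300_000, 300_000, 400_000, 500_000]

cv_rooms = coefficient_of_variation(rooms)
cv_prices = coefficient_of_variation(prices)
print(cv_rooms, cv_prices)  # equal (up to rounding): same relative spread
```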

Missing & Total Counts (N missing, N total)

● N missing: The number of missing values in the dataset.
● N total: The total number of observations in the dataset.

Cumulative N & Cumulative Percent

● Cumulative N: A running total of the number of observations up to and including each
value, with values sorted in ascending order.
● Cumulative Percent: Cumulative N divided by the total number of observations,
expressed as a percentage.
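Cumulative N and cumulative percent come from a frequency count followed by a running total. This sketch uses a hypothetical list of bedroom counts, `Counter` for the frequencies and `itertools.accumulate` for the running sum.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical values, e.g. bedrooms per listing (illustrative only)
bedrooms = [1, 2, 2, 3, 3, 3, 4]

counts = Counter(bedrooms)
values = sorted(counts)                              # distinct values in ascending order
cum_n = list(accumulate(counts[v] for v in values))  # running totals of the frequencies
total = cum_n[-1]
cum_pct = [100 * c / total for c in cum_n]           # running totals as a percentage

for v, c, p in zip(values, cum_n, cum_pct):
    print(f"{v}: cumulative N = {c}, cumulative % = {p:.1f}")
```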

Trimmed Mean

The trimmed mean is the mean calculated after removing a fixed percentage of the smallest
and largest values (for example, 5% or 10% from each end) of the sorted dataset. This
reduces the influence of outliers.
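Here is a minimal sketch of a trimmed mean. It assumes that `floor(proportion * n)` values are dropped from each end. Other tools may round the cut differently, so their results can differ slightly on small samples. The data are hypothetical.

```python
import math

def trimmed_mean(values, proportion=0.1):
    """Drop floor(proportion * n) values from each end of the sorted data, then average the rest."""
    data = sorted(values)
    k = math.floor(proportion * len(data))
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Hypothetical sample with one extreme value (illustrative only)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sum(data) / len(data))     # ordinary mean: 14.5
print(trimmed_mean(data, 0.1))   # drops 1 and 100, averages 2..9: 5.5
```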

Sum of Squares

Sum of squares is the sum of the squared deviations of each value from the mean. Dividing
it by n - 1 gives the sample variance, and it is a building block of regression analysis and
ANOVA.

Skewness

Skewness measures the asymmetry of a dataset.

● Positive skew: The right tail is longer, indicating more extreme high values.
● Negative skew: The left tail is longer, indicating more extreme low values.
● Zero skewness: Consistent with a symmetrical distribution (a perfectly symmetrical
distribution has zero skewness).
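One common skewness measure is the Fisher-Pearson coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The sketch below assumes population (biased) moments. Some software applies a small-sample bias correction, so its numbers can differ slightly. The inputs are hypothetical.

```python
def skewness(values):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 using population (biased) moments."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))        # symmetric data: 0.0
print(skewness([1, 1, 2, 2, 3, 10]))    # long right tail: positive
```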

Kurtosis

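Kurtosis measures how heavy or light the tails of a distribution are compared to a normal
distribution, which has a kurtosis of 3 (an excess kurtosis of 0).

● High kurtosis: Heavier tails, so extreme values (outliers) are more likely.
● Low kurtosis: Lighter tails, so extreme values are less likely.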
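Excess kurtosis is usually computed as m4 / m2^2 - 3 so that a normal distribution scores about 0. This sketch uses population moments, as in the skewness example, and compares two hypothetical samples: one evenly spread and one concentrated at the center with a few extremes.

```python
def excess_kurtosis(values):
    """Population excess kurtosis m4 / m2**2 - 3; about 0 for normally distributed data."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - m) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]           # evenly spread, light tails: negative
heavy = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]      # mostly central, two extremes: positive
print(excess_kurtosis(flat), excess_kurtosis(heavy))
```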

5. Data Visualization

Box-and-Whisker Plot

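A boxplot visually represents the data distribution. The box spans Q1 to Q3 with a line
at the median. The whiskers typically extend to the most extreme values within 1.5 × IQR
of the box, and points beyond them are drawn individually as potential outliers.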

Scatter Plot

A scatter plot is used to visualize relationships between two numerical variables, helping
to identify patterns and correlations.

Correlation Matrix

A correlation matrix is a table of the pairwise correlation coefficients between multiple
variables. It is commonly displayed as a heatmap in which color encodes the strength and
direction of each relationship.
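A correlation matrix is Pearson's r evaluated for every pair of columns. The sketch below builds one by hand for three hypothetical columns. The diagonal is 1, since each variable correlates perfectly with itself, and the matrix is symmetric. In practice, pandas' `DataFrame.corr()` produces the same kind of table, which can then be plotted as a heatmap.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical columns (illustrative only)
columns = {
    "area": [50, 60, 70, 80, 90],
    "price": [110, 115, 150, 155, 190],
    "age": [30, 25, 20, 12, 5],
}
names = list(columns)
matrix = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]

for name, row in zip(names, matrix):
    print(f"{name:>6}", " ".join(f"{r:6.2f}" for r in row))
```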

B) INFERENTIAL STATISTICS

Inferential statistics help us make predictions or generalizations about a population
based on a sample.

1. Distributions

Normal Distribution

A normal distribution is bell-shaped, where most values are concentrated around the
mean. Many statistical tests assume normality in the data.

Poisson Distribution

The Poisson distribution models the probability of a given number of events occurring
within a fixed interval of time or space, when events occur independently at a constant
average rate (λ). It is often used for rare events, such as the number of accidents in a day.
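The Poisson probability of exactly k events is P(X = k) = λ^k e^(-λ) / k!. The sketch below evaluates this formula for a hypothetical average rate of 2 events per day.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with average rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Hypothetical rate: on average 2 events (e.g. accidents) per day (illustrative only)
lam = 2.0
for k in range(5):
    print(k, round(poisson_pmf(k, lam), 4))
```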

Population Parameters & Sampling Errors

● Population Parameters: Characteristics of the entire population, such as the mean or
variance.
● Sampling Error: The difference between a sample statistic and the true population
parameter due to random variation.

Confidence Intervals (CI)

A confidence interval is a range of values likely to contain the true population parameter.
A 95% confidence interval means that if we repeated the sampling many times and built an
interval from each sample, about 95% of those intervals would contain the true value.
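The repeated-sampling interpretation can be checked with a small simulation. This is a minimal sketch that assumes a normal-approximation interval (mean ± z × s / √n). For small samples, a t critical value would be more appropriate than z. The population (mean 10, SD 2) and the sample size of 50 are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation CI for the mean: mean +/- z * s / sqrt(n)."""
    se = stdev(values) / math.sqrt(len(values))
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    m = mean(values)
    return m - z * se, m + z * se

# Draw many samples from a hypothetical population and count how often
# the interval built from each sample contains the true mean.
rng = random.Random(42)
true_mu, trials, hits = 10.0, 1000, 0
for _ in range(trials):
    sample = [rng.gauss(true_mu, 2.0) for _ in range(50)]
    low, high = mean_confidence_interval(sample)
    hits += low <= true_mu <= high
coverage = hits / trials
print(f"coverage: {coverage:.3f}")  # should land near 0.95
```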

2. Hypothesis Testing

Hypothesis testing is used to make statistical decisions about a population based on
sample data.

Z-Test

A Z-test is used when the sample size is large and the population standard deviation is
known. It tests whether the sample mean significantly differs from the population mean.
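Below is a minimal two-sided one-sample Z-test. The statistic is z = (x̄ - μ0) / (σ / √n), and the p-value comes from the standard normal distribution. The sample, μ0 = 100 and σ = 6 are hypothetical, chosen so that z comes out to exactly 3.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(values, mu0, sigma):
    """Two-sided z-test of H0: population mean == mu0, with known population SD sigma."""
    n = len(values)
    z = (mean(values) - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # probability of a |z| at least this large under H0
    return z, p

# Hypothetical sample of 36 values with mean 103 (illustrative only)
values = [100, 106] * 18
z, p = one_sample_z_test(values, mu0=100, sigma=6)
print(z, p)  # z = 3.0, p is about 0.0027, so H0 is rejected at the 5% level
```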

T-Test

A T-test is used when the population standard deviation is unknown and must be estimated
from the sample, which is typically the case with small samples.

● One-sample t-test: Compares a sample mean to a known value.
● Two-sample t-test: Compares the means of two independent groups to check whether
they differ significantly.
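The two t statistics can be written out directly. The sketch below computes them for hypothetical data. The two-sample version is Welch's test, which does not assume equal variances; that choice is an assumption, since the text does not say which variant is meant. Turning t into a p-value needs the t distribution, which the standard library lacks. `scipy.stats.ttest_1samp` and `scipy.stats.ttest_ind(..., equal_var=False)` report both the statistic and the p-value.

```python
import math
from statistics import mean, stdev, variance

def one_sample_t(values, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(values)
    return (mean(values) - mu0) / (stdev(values) / math.sqrt(n)), n - 1

def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances (Welch's test)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical data (illustrative only)
t1, df1 = one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], mu0=4)
t2, df2 = welch_t([5, 6, 7], [1, 2, 3])
print(t1, df1, t2, df2)
```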

Type I & Type II Errors

● Type I Error (False Positive): Rejecting a true null hypothesis.
● Type II Error (False Negative): Failing to reject a false null hypothesis.

ANOVA (Analysis of Variance)

ANOVA is used to compare means across multiple groups. If the variation between groups
is significantly greater than the variation within groups (a large F statistic), we conclude
that at least one group mean differs from the others.
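One-way ANOVA compares the between-group and within-group variation through the ratio F = MS_between / MS_within. The sketch below computes it by hand for three hypothetical groups. `scipy.stats.f_oneway` returns the same F together with its p-value.

```python
def one_way_anova_f(*groups):
    """F = between-group mean square / within-group mean square, plus its degrees of freedom."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means))
    # Variation of each value around its own group mean
    ss_within = sum(sum((x - gm) ** 2 for x in g) for g, gm in zip(groups, group_means))

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

# Hypothetical groups (illustrative only)
f_stat, dfs = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f_stat, dfs)  # 27.0 (2, 6)
```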

CONCLUSION

In this experiment, we learned about descriptive and inferential statistics.

Descriptive statistics provide insights into the structure and distribution of data, while
inferential statistics allow us to make predictions and test hypotheses.

Understanding these concepts is essential for data-driven decision-making, particularly
in real estate and other industries.
