ADS EXP 1
AIM: Explore the descriptive and inferential statistics on the given dataset.
THEORY
A) DESCRIPTIVE STATISTICS
Descriptive statistics summarize and present data in a meaningful way. They help
understand the distribution, central tendency, and variability of the dataset.
1. Measures of Central Tendency
Mean
The mean is the average value of a dataset. It is calculated by adding all values and
dividing by the number of observations. The mean is useful but can be affected by
outliers.
Median
The median is the middle value when the dataset is arranged in ascending order. If the
dataset has an even number of observations, the median is the average of the two
middle values. It is more robust to outliers than the mean.
Mode
The mode is the most frequently occurring value in the dataset. A dataset can have one
mode (unimodal), two modes (bimodal), or multiple modes (multimodal).
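As an illustration, the sketch below computes all three measures with pandas on a small hypothetical set of values (the variable name scores and the numbers are placeholders, not taken from the experiment's dataset):

```python
import pandas as pd

# Hypothetical sample data (not from the experiment's dataset)
scores = pd.Series([72, 85, 85, 90, 68, 85, 77, 91, 60, 74])

print("Mean:  ", scores.mean())           # arithmetic average
print("Median:", scores.median())         # middle value of the sorted data
print("Mode:  ", scores.mode().tolist())  # most frequent value(s); may return several
```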
2. Measures of Dispersion
Minimum and Maximum
The minimum value represents the smallest observation in the dataset, while the maximum value represents the largest. These values help determine the range of the data.
Sum
The sum is the total of all values in the dataset.
Range
The range measures the spread of data by subtracting the minimum value from the maximum value. A larger range indicates greater variability.
Quartiles (Q1 and Q3)
Q1 is the value below which 25% of the data falls, while Q3 is the value below which 75% of the data falls. These quartiles help in understanding the data distribution.
Interquartile Range (IQR)
The IQR is the difference between Q3 and Q1. It represents the middle 50% of the data and helps detect outliers.
Standard Deviation
The standard deviation measures how much the data deviates from the mean. A high
standard deviation indicates that data points are spread out, while a low value suggests
they are close to the mean.
Variance
Variance is the average of the squared deviations from the mean (the square of the standard deviation). It is useful for comparing variability between datasets.
3. Measures of Association
Correlation
Correlation measures the strength and direction of the linear relationship between two variables. The correlation coefficient ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship), with 0 indicating no linear relationship.
Standard Error (SE) of the Mean
The SE of the mean estimates how much the sample mean is expected to vary from the true population mean. A smaller SE indicates a more precise estimate.
Trimmed Mean
The trimmed mean is calculated after removing extreme values from both ends of the
dataset. This helps reduce the influence of outliers.
Sum of Squares
Sum of squares represents the total squared deviation from the mean. It is useful in
variance and regression analysis.
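The sketch below illustrates these quantities with scipy.stats and NumPy on a hypothetical sample (the 10% trim proportion is an arbitrary choice for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical sample data; 9.9 acts as an outlier
x = np.array([4.1, 4.8, 5.0, 5.2, 5.5, 6.1, 6.4, 9.9])

print("SE of mean:    ", stats.sem(x))                    # standard error of the mean
print("Mean:          ", np.mean(x))
print("Trimmed mean:  ", stats.trim_mean(x, 0.1))         # drop 10% from each tail
print("Sum of squares:", np.sum((x - np.mean(x)) ** 2))   # total squared deviation from the mean
```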
4. Measures of Shape
Skewness
Skewness measures the asymmetry of a distribution around its mean:
● Positive skew: The right tail is longer, indicating more extreme high values.
● Negative skew: The left tail is longer, indicating more extreme low values.
● Zero skewness: A perfectly symmetrical distribution.
Kurtosis
Kurtosis measures how heavy or light the tails of the distribution are compared to a
normal distribution.
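As a rough illustration, scipy.stats provides skew and kurtosis; the exponential sample below is hypothetical and chosen only because it is clearly right-skewed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=1000)  # right-skewed hypothetical sample

print("Skewness:", stats.skew(data))      # > 0 indicates a longer right tail
print("Kurtosis:", stats.kurtosis(data))  # excess kurtosis (0 for a normal distribution)
```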
5. Data Visualization
Box-and-Whisker Plot
A box-and-whisker plot summarizes a dataset using the minimum, Q1, median, Q3, and maximum, making it easy to see the spread and spot potential outliers.
Scatter Plot
A scatter plot is used to visualize relationships between two numerical variables, helping
to identify patterns and correlations.
Correlation Matrix
A correlation matrix shows the pairwise correlation coefficients between numerical variables and is commonly visualized as a heatmap.
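One possible way to produce all three plots with matplotlib and pandas, using a small synthetic two-column dataset in place of the experiment's data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical two-variable dataset with a positive relationship
rng = np.random.default_rng(1)
x = rng.normal(50, 10, 200)
y = 0.8 * x + rng.normal(0, 5, 200)
df = pd.DataFrame({"x": x, "y": y})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].boxplot(df["x"])                  # box-and-whisker plot of x
axes[0].set_title("Box plot")

axes[1].scatter(df["x"], df["y"], s=10)   # scatter plot of x vs y
axes[1].set_title("Scatter plot")

im = axes[2].imshow(df.corr().values, vmin=-1, vmax=1, cmap="coolwarm")  # correlation heatmap
axes[2].set_xticks(range(len(df.columns)))
axes[2].set_xticklabels(df.columns)
axes[2].set_yticks(range(len(df.columns)))
axes[2].set_yticklabels(df.columns)
axes[2].set_title("Correlation matrix")
fig.colorbar(im, ax=axes[2])

plt.tight_layout()
plt.show()
```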
B) INFERENTIAL STATISTICS
1. Distributions
Normal Distribution
A normal distribution is bell-shaped, where most values are concentrated around the
mean. Many statistical tests assume normality in the data.
Poisson Distribution
The Poisson distribution models the probability of a specific number of events occurring
within a fixed interval. It is often used for rare events, such as the number of accidents
in a day.
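For illustration, scipy.stats exposes both distributions; the parameters below (mean 100, standard deviation 15, and an average rate of 3 events per day) are assumed values, not taken from the dataset:

```python
from scipy import stats

# Normal distribution: probability of observing a value at or below 115
mu, sigma = 100, 15
print("P(X <= 115) for N(100, 15):", stats.norm.cdf(115, loc=mu, scale=sigma))

# Poisson distribution: probability of exactly k events given an average rate lam
lam = 3  # assumed average of 3 events (e.g. accidents) per day
print("P(k = 5 | lambda = 3):", stats.poisson.pmf(5, mu=lam))
```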
Confidence Intervals
A confidence interval is a range of values likely to contain the true population parameter. A 95% confidence interval means that if the sampling were repeated many times, about 95% of the resulting intervals would contain the true value.
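A minimal sketch of a 95% confidence interval for the mean, using the t distribution from scipy.stats on a hypothetical sample:

```python
import numpy as np
from scipy import stats

# Hypothetical sample
sample = np.array([12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.7, 11.9])

mean = np.mean(sample)
sem = stats.sem(sample)

# 95% confidence interval for the mean using the t distribution
ci_low, ci_high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```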
2. Hypothesis Testing
Z-Test
A Z-test is used when the sample size is large and the population standard deviation is
known. It tests whether the sample mean significantly differs from the population mean.
T-Test
A T-test is used when the sample size is small and the population standard deviation is unknown. It tests the same kind of hypothesis as the Z-test but uses the t distribution to account for the additional uncertainty of estimating the standard deviation from the sample.
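The sketch below runs both tests against an assumed population mean of 50 on hypothetical data; note that statsmodels' ztest estimates the standard deviation from the sample rather than requiring a known population value:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ztest

# Hypothetical sample and a claimed population mean of 50
sample = np.array([52, 48, 51, 53, 47, 50, 55, 49, 54, 52])

# Z-test (statsmodels); the standard deviation is estimated from the sample
z_stat, z_p = ztest(sample, value=50)
print("Z-test: z =", round(z_stat, 3), " p =", round(z_p, 3))

# One-sample t-test (scipy): small sample, unknown population standard deviation
t_stat, t_p = stats.ttest_1samp(sample, popmean=50)
print("T-test: t =", round(t_stat, 3), " p =", round(t_p, 3))
```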
ANOVA (Analysis of Variance)
ANOVA is used to compare means across multiple groups. If the variation between groups is significantly greater than the variation within groups, the group means are considered different.
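A one-way ANOVA can be sketched with scipy.stats.f_oneway; the three groups below are made-up scores used only to illustrate the call:

```python
from scipy import stats

# Hypothetical scores from three independent groups
group_a = [85, 90, 88, 92, 87]
group_b = [78, 82, 80, 79, 81]
group_c = [90, 95, 93, 91, 94]

# One-way ANOVA: tests whether all group means are equal
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print("F =", round(f_stat, 3), " p =", round(p_value, 4))
# A small p-value (e.g. < 0.05) suggests at least one group mean differs
```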
CONCLUSION
Descriptive statistics provide insights into the structure and distribution of data, while
inferential statistics allow us to make predictions and test hypotheses.