Statistic Module 2

The document provides an overview of descriptive statistics, including measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation, interquartile range). It also discusses measures of association such as covariance and correlation, along with concepts of skewness and probability basics. Additionally, it explains uniform distribution and provides examples for calculating probabilities in various scenarios.

Uploaded by mauryaayush1511
Copyright © All Rights Reserved

Characterizing Data (descriptive statistics)

Descriptive statistics refers to a set of methods used to summarize and describe the main features of a dataset, such as its central tendency, variability, and distribution. These methods provide an overview of the data and help identify patterns and relationships.
1. Measures of Central Tendency
Mean
The mean is the average of the given numbers. It is calculated by dividing the sum of the numbers by how many numbers there are.
Formula: $\bar{x} = \frac{\sum x}{n}$
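As an illustration only, the formula x̄ = Σx / n translates directly into a few lines of Python. The function name `mean` and the example data are choices made for this sketch, not part of the original notes; Python's standard library also offers `statistics.mean` for the same purpose.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)

data = [10, 20, 30, 40, 50]
print(mean(data))  # 30.0
```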

Median
The median is the middle value when the data are arranged in ascending order. If there is an even number of observations, the median is the average of the two middle values.
Example:
Data: (10, 20, 30, 40, 50)
Median = 30 (the middle value)
For an even dataset: (10, 20, 30, 40)
Median = (20 + 30)/2 = 25
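A minimal sketch of the median rule above, assuming the data fit in memory and can be sorted; the function name `median` is hypothetical (the standard library's `statistics.median` follows the same odd/even rule).

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values

print(median([10, 20, 30, 40, 50]))  # 30
print(median([10, 20, 30, 40]))      # 25.0
```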
Mode
The mode is the most frequently occurring value in a dataset.
Example: Data: (2, 3, 3, 4, 5), mode = 3 (since it appears twice)
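The sketch below counts occurrences with `collections.Counter` and returns every value that ties for the highest count, since a dataset can have more than one mode. The helper name `modes` is illustrative; the example uses the dataset (2, 3, 3, 4, 5), in which 3 appears twice.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (a dataset can have several modes)."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

print(modes([2, 3, 3, 4, 5]))  # [3]
```

If every value occurs exactly once, this helper returns all of them; some textbooks would instead say such a dataset has no mode. Python 3.8+ also provides `statistics.multimode`, which behaves similarly.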
2. Measures of Dispersion
Dispersion measures how spread out the data are.
a) Range
The range is the difference between the maximum and minimum values.
Range = Max - Min
Example: Data: (10, 20, 30, 40, 50)
Range = 50 - 10 = 40
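Range is simply the maximum minus the minimum. In this sketch the helper is named `data_range` (a hypothetical name) so that it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    if not values:
        raise ValueError("the range of an empty dataset is undefined")
    return max(values) - min(values)

print(data_range([10, 20, 30, 40, 50]))  # 40
```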
b) Variance (σ² for population, s² for sample)

Variance measures how much the data points deviate from the mean, using the squared deviations from the mean.

Population variance formula:

$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

where μ is the population mean and N is the population size.

Sample variance formula:

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

where x̄ is the sample mean and n is the sample size.
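A sketch of the two variance formulas, assuming the data are a plain Python list of numbers; the `sample` flag (a name chosen for this example) switches the divisor from n to n - 1. The data reuse the test scores (10, 20, 30, 40, 50) that appear in these notes.

```python
def variance(values, sample=False):
    """Population variance (divide by n) or, with sample=True, sample variance (divide by n - 1)."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty dataset is undefined")
    if sample and n < 2:
        raise ValueError("sample variance needs at least two values")
    mu = sum(values) / n
    squared_deviations = sum((x - mu) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

scores = [10, 20, 30, 40, 50]
print(variance(scores))               # 200.0
print(variance(scores, sample=True))  # 250.0
```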
c) Standard Deviation
Standard deviation is the square root of the variance ($\sigma = \sqrt{\sigma^2}$ for a population, $s = \sqrt{s^2}$ for a sample) and measures spread in the same units as the data.

Example:

If the test scores are (10, 20, 30, 40, 50), the standard deviation measures how far the scores spread from their mean of 30.
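To finish the test-score example numerically, here is a minimal sketch that takes the square root of the variance; the function name `std_dev` and the `sample` flag are illustrative assumptions. With mean 30, the squared deviations sum to 1000, so σ = √(1000/5) = √200 and s = √(1000/4) = √250.

```python
import math

def std_dev(values, sample=False):
    """Square root of the population (or, with sample=True, sample) variance."""
    n = len(values)
    mu = sum(values) / n
    squared_deviations = sum((x - mu) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1 if sample else n))

scores = [10, 20, 30, 40, 50]
print(round(std_dev(scores), 2))               # 14.14
print(round(std_dev(scores, sample=True), 2))  # 15.81
```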
Question: A company tracks its daily sales (in thousands): 5, 8, 12, 18, 22. Find the range and the standard deviation.
* Range = 22 - 5 = 17

* Standard deviation calculation (population formula, dividing by n = 5):

* Mean: $\bar{x} = \frac{5 + 8 + 12 + 18 + 22}{5} = \frac{65}{5} = 13$

* Variance: $\sigma^2 = \frac{(5 - 13)^2 + (8 - 13)^2 + (12 - 13)^2 + (18 - 13)^2 + (22 - 13)^2}{5} = \frac{64 + 25 + 1 + 25 + 81}{5} = \frac{196}{5} = 39.2$

* Standard deviation: $\sigma = \sqrt{39.2} \approx 6.26$
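The worked answer can be checked step by step in Python; the variable names are illustrative. The script follows the notes in dividing by n = 5 (population variance) and, for comparison, also computes the sample standard deviation, which divides by n - 1 = 4 instead.

```python
import math

sales = [5, 8, 12, 18, 22]  # daily sales, in thousands

sales_range = max(sales) - min(sales)
mean = sum(sales) / len(sales)
squared_devs = [(x - mean) ** 2 for x in sales]
pop_variance = sum(squared_devs) / len(sales)        # divide by n, as in the worked example
pop_sd = math.sqrt(pop_variance)
sample_sd = math.sqrt(sum(squared_devs) / (len(sales) - 1))  # divide by n - 1, for comparison

print(sales_range)       # 17
print(mean)              # 13.0
print(squared_devs)      # [64.0, 25.0, 1.0, 25.0, 81.0]
print(pop_variance)      # 39.2
print(round(pop_sd, 2))  # 6.26
print(sample_sd)         # 7.0
```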

d) Interquartile Range (IQR)

IQR measures the range of the middle 50% of the data and is calculated as:
IQR = Q3 - Q1
where Q1 is the first quartile (25th percentile) and Q3 is the third quartile (75th percentile).
Outlier rule: a value is flagged as an outlier if it is greater than Q3 + 1.5 × IQR or less than Q1 - 1.5 × IQR.
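A sketch of the 1.5 × IQR outlier rule, under two stated assumptions: quartiles are computed by linear interpolation between sorted values (the convention used by NumPy's default `percentile` and by `statistics.quantiles(..., method="inclusive")`), and the nine data values are hypothetical, chosen only to show the rule flagging one extreme value. Other quartile conventions can give slightly different Q1 and Q3.

```python
import math

def quantile(sorted_data, p):
    """Linear-interpolation quantile (0 <= p <= 1) of already-sorted data."""
    pos = p * (len(sorted_data) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    frac = pos - lo
    return sorted_data[lo] + frac * (sorted_data[hi] - sorted_data[lo])

def iqr_outliers(values):
    """Return Q1, Q3, IQR and the values lying outside the 1.5 * IQR fences."""
    data = sorted(values)
    q1 = quantile(data, 0.25)
    q3 = quantile(data, 0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return q1, q3, iqr, outliers

# Hypothetical data, chosen only to illustrate the rule
q1, q3, iqr, outliers = iqr_outliers([12, 15, 17, 20, 22, 25, 28, 30, 95])
print(q1, q3, iqr, outliers)  # 17.0 28.0 11.0 [95]
```

Here the fences are 17 - 16.5 = 0.5 and 28 + 16.5 = 44.5, so only 95 is flagged.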

3. Measures of Association

Association measures how two or more variables relate to each other.

a) Covariance ($\sigma_{XY}$)

Covariance indicates the direction of the relationship between two variables.

$\sigma_{XY} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n}$ (divide by n - 1 instead for a sample)

Positive covariance: the variables move together (both increase or both decrease).

Negative covariance: one variable increases while the other decreases.
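A minimal covariance sketch, assuming paired lists of equal length; `covariance` and its `sample` flag are hypothetical names. The two tiny datasets are made up purely to show that the sign of the result matches the direction of the relationship.

```python
def covariance(xs, ys, sample=False):
    """Sum of products of paired deviations, divided by n (population) or n - 1 (sample)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

# Hypothetical data showing the sign of the covariance
print(covariance([1, 2, 3], [2, 4, 6]))  # positive: y rises with x
print(covariance([1, 2, 3], [6, 4, 2]))  # negative: y falls as x rises
```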


b) Correlation (Pearson's r)

Correlation measures the strength and direction of the linear relationship between two variables.

$r_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$

r = 1: perfect positive correlation.

r = -1: perfect negative correlation.

r = 0: no linear correlation.

r always lies between -1 and 1.
Example: height and weight usually have a positive correlation.
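One way to compute Pearson's r straight from the definition r = σXY / (σX σY) is sketched below. The helper name `pearson_r` is an assumption of this example, and the two small datasets are made up so that the expected answers (1 and -1) are easy to see. Because the same divisor n appears in the covariance and in both standard deviations, it cancels, so using n - 1 throughout would give the same r.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r = cov(X, Y) / (sigma_X * sigma_Y), using population formulas throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / (sd_x * sd_y)

print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))  # perfectly linear, increasing: close to 1
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly linear, decreasing: close to -1
```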
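A researcher wants to study the relationship between study hours and test scores. The collected data are:

Student | Study hours (X) | Test score (Y)
1 | 2 | 50
2 | 4 | 60
3 | 6 | 70
4 | 8 | 80
5 | 10 | 90

Solution:
Step 1: Mean of study hours: $\bar{X} = \frac{2 + 4 + 6 + 8 + 10}{5} = \frac{30}{5} = 6$
Step 2: Mean of test scores: $\bar{Y} = \frac{50 + 60 + 70 + 80 + 90}{5} = \frac{350}{5} = 70$

Step 3: Compute $(X - \bar{X})(Y - \bar{Y})$

Student | X - X̄ | Y - Ȳ | Product
1 | -4 | -20 | (-4) × (-20) = 80
2 | -2 | -10 | (-2) × (-10) = 20
3 | 0 | 0 | 0 × 0 = 0
4 | 2 | 10 | 2 × 10 = 20
5 | 4 | 20 | 4 × 20 = 80

$\sum (X - \bar{X})(Y - \bar{Y}) = 80 + 20 + 0 + 20 + 80 = 200$

Step 4: Compute $(X - \bar{X})^2$ and $(Y - \bar{Y})^2$

Student | (X - X̄)² | (Y - Ȳ)²
1 | (-4)² = 16 | (-20)² = 400
2 | (-2)² = 4 | (-10)² = 100
3 | 0² = 0 | 0² = 0
4 | 2² = 4 | 10² = 100
5 | 4² = 16 | 20² = 400

$\sum (X - \bar{X})^2 = 16 + 4 + 0 + 4 + 16 = 40$

$\sum (Y - \bar{Y})^2 = 400 + 100 + 0 + 100 + 400 = 1000$

Step 5: Compute r (the factor n in the covariance and the standard deviations cancels):

$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{200}{\sqrt{40 \times 1000}} = \frac{200}{200} = 1$

So study hours and test scores have a perfect positive correlation.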

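The five steps of the study-hours example can be reproduced in a short script; the variable names are illustrative. It uses the sum form r = Σ(X - X̄)(Y - Ȳ) / √(Σ(X - X̄)² Σ(Y - Ȳ)²), which equals σXY / (σX σY) because the factors of n cancel.

```python
import math

hours = [2, 4, 6, 8, 10]       # X: study hours
scores = [50, 60, 70, 80, 90]  # Y: test scores

mean_x = sum(hours) / len(hours)    # Step 1: 6.0
mean_y = sum(scores) / len(scores)  # Step 2: 70.0
dx = [x - mean_x for x in hours]    # -4, -2, 0, 2, 4
dy = [y - mean_y for y in scores]   # -20, -10, 0, 10, 20

sum_products = sum(a * b for a, b in zip(dx, dy))  # Step 3: 200
sum_dx_sq = sum(a * a for a in dx)                 # Step 4: 40
sum_dy_sq = sum(b * b for b in dy)                 # Step 4: 1000

r = sum_products / math.sqrt(sum_dx_sq * sum_dy_sq)  # Step 5
print(r)  # 1.0
```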

4. Skewness

Skewness measures the asymmetry of a dataset's distribution.

Positive skew (right-skewed): most values are concentrated on the left, with a long right tail, e.g. income distributions.
Negative skew (left-skewed): most values are concentrated on the right, with a long left tail.
Zero skewness: the data are symmetric.
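These notes describe skewness only qualitatively. As an assumption-labelled sketch, the code below uses one common numeric measure, the moment coefficient of skewness g1 = m3 / m2^(3/2), where m2 and m3 are the average squared and cubed deviations from the mean; other definitions (for example, bias-adjusted sample versions) exist. The function name and the small datasets are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 10]  # hypothetical: one large value stretches the right tail
print(skewness(right_tail) > 0)                # True  (positive / right skew)
print(skewness([-x for x in right_tail]) < 0)  # True  (mirror image: negative / left skew)
print(skewness([1, 2, 3, 4, 5]))               # 0.0   (symmetric)
```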
5. Probability Basics

Probability quantifies the likelihood of an event occurring.

Key Probability Rules:

1. Probability of an event: $0 \le P(A) \le 1$

2. The probabilities of all possible outcomes sum to 1.
3. Independent events: the occurrence of one event does not affect the probability of another.
4. Conditional probability: the probability of an event occurring given that another event has occurred.
Formula:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$
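To make the conditional-probability formula concrete, the sketch below enumerates the equally likely outcomes of a fair die and applies P(A | B) = P(A ∩ B) / P(B). The events A = "roll a 6" and B = "roll an even number", and the helper names, are illustrative choices not taken from the notes. Knowing the roll is even leaves three possibilities, one of which is 6, so the answer is 1/3.

```python
from fractions import Fraction

outcomes = range(1, 7)  # fair six-sided die, each outcome equally likely

def prob(event):
    """P(event) under equally likely outcomes = favorable / total."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_six(o):
    return o == 6

def is_even(o):
    return o % 2 == 0

# P(A | B) = P(A and B) / P(B), with A = "roll a 6" and B = "roll is even"
p_a_and_b = prob(lambda o: is_six(o) and is_even(o))
p_b = prob(is_even)
print(p_a_and_b / p_b)  # 1/3
```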
Question:
A bag contains 5 red balls, 3 blue balls and 2 green balls. If one ball is drawn at random, what is the probability that it is:
a) a red ball b) a blue ball c) a green ball
Answer: total balls = 5 + 3 + 2 = 10
Probability of a red ball = 5/10 = 0.5
Probability of a blue ball = 3/10 = 0.3
Probability of a green ball = 2/10 = 0.2
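The bag question can be checked with exact fractions; the dictionary layout and names are illustrative, and the counts come from the question.

```python
from fractions import Fraction

bag = {"red": 5, "blue": 3, "green": 2}  # counts from the question
total = sum(bag.values())                # 10

probabilities = {colour: Fraction(count, total) for colour, count in bag.items()}
for colour, p in probabilities.items():
    print(colour, p, float(p))
# red 1/2 0.5
# blue 3/10 0.3
# green 1/5 0.2
```

The three probabilities add up to 1, as rule 2 requires.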
Question:
A fair die is rolled. What is the probability of not getting a 6?
Answer:
Probability of rolling a 6 = 1/6
Probability of not rolling a 6 = 1 - 1/6 = 5/6 ≈ 83.33%
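The same complement calculation, P(not A) = 1 - P(A), written with Python's `fractions.Fraction` to keep the result exact; the names are illustrative.

```python
from fractions import Fraction

p_six = Fraction(1, 6)
p_not_six = 1 - p_six             # complement rule: P(not A) = 1 - P(A)
print(p_not_six)                  # 5/6
print(f"{float(p_not_six):.2%}")  # 83.33%
```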
6. Uniform Distribution
A uniform distribution is one in which all outcomes are equally likely.
Discrete uniform distribution:
If a random variable X has n equally likely outcomes, each outcome has probability
$P(X = x) = \frac{1}{n}$
Continuous uniform distribution:
A continuous random variable X is uniformly distributed between a and b if its probability density function is
$f(x) = \frac{1}{b - a}$ for $a \le x \le b$ (and 0 otherwise).
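A minimal sketch of the continuous uniform density and its cumulative distribution function (CDF), assuming a < b; the function names are hypothetical. The probability that X falls in a sub-interval [c, d] of [a, b] is the area under f over that interval, (d - c)/(b - a), which is exactly the difference of the two CDF values.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    if a >= b:
        raise ValueError("need a < b")
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x): 0 below a, rising linearly to 1 at b."""
    if a >= b:
        raise ValueError("need a < b")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

print(uniform_pdf(5, 0, 10))                                # 0.1
print(uniform_cdf(7.5, 0, 10) - uniform_cdf(2.5, 0, 10))   # 0.5
```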
Example: randomly picking a number between a and b.
A train arrives at a station at a random time between 12:00 PM and 12:30 PM. If a passenger arrives at the station at 12:00 PM, what is the probability that they wait less than 10 minutes for the train?

Solution:

The train's arrival time, measured in minutes after 12:00 PM, follows a uniform distribution between 0 and 30 minutes.

* The total length of the interval is 30 minutes.

* The event of waiting less than 10 minutes happens if the train arrives within 10 minutes after the passenger arrives.

* The probability is given by:

$P(\text{wait} < 10) = \frac{10}{30} = \frac{1}{3} \approx 0.3333 = 33.33\%$
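Under the same assumption as the solution (waiting time uniform on 0 to 30 minutes), the sketch below computes the exact probability from the uniform CDF and then checks it with a quick Monte Carlo simulation. The seed, trial count and variable names are arbitrary choices for this illustration.

```python
import random

a, b = 0, 30    # train arrival, in minutes after 12:00 PM, assumed uniform on [0, 30]
threshold = 10  # "wait less than 10 minutes"

exact = (threshold - a) / (b - a)  # area under f(x) = 1/30 from 0 to 10
print(round(exact, 4))             # 0.3333

# Monte Carlo check: simulate many arrival times and count waits under 10 minutes
rng = random.Random(42)
trials = 100_000
hits = sum(1 for _ in range(trials) if rng.uniform(a, b) < threshold)
estimate = hits / trials
print(estimate)  # close to 1/3; the exact digits depend on the seed
```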


2. Discrete Uniform Distribution (Dice Roll)
Question:

A fair 6-sided die is rolled. Since each outcome is equally likely, what is the probability of rolling

a) a number greater than 4?

b) an even number?

Solution:

The die outcomes are (1, 2, 3, 4, 5, 6).

Each number has an equal probability of occurring: P(X = x) = 1/6.

a) Probability of rolling a number greater than 4
Favorable outcomes: (5, 6)
P(X > 4) = favorable outcomes / total outcomes = 2/6 = 1/3 ≈ 33.33%

b) Probability of rolling an even number
Favorable outcomes: (2, 4, 6)
P(X is even) = favorable outcomes / total outcomes = 3/6 = 1/2 = 50%
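The die answers can be verified by listing the faces and counting favorable outcomes with exact fractions; the helper name `probability` is illustrative.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]  # fair die: discrete uniform, P(X = x) = 1/6

def probability(condition):
    """Favorable outcomes divided by total outcomes."""
    favorable = [f for f in faces if condition(f)]
    return Fraction(len(favorable), len(faces))

p_greater_than_4 = probability(lambda f: f > 4)
p_even = probability(lambda f: f % 2 == 0)
print(p_greater_than_4, p_even)  # 1/3 1/2
```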
