chapter 3 Statistical Description of Data (2)
Objectives
3.1 Introduction
Measures of central tendency (averages) help to condense a
mass of data into a single representative value.
An average is a single value intended to represent a data set
as a whole.
A good average should be:
• based on all the observations
• simple to understand and easy to interpret
• easily manipulated algebraically
• little affected by fluctuations of sampling
• not unduly influenced by extreme values
Measures of Central Tendency: Mean, Median, Mode
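3.3.1 Mean ($\bar{x}$)
Arithmetic mean
Let $x_1, x_2, \dots, x_n$ be the values of a variable X. The
arithmetic mean is:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{OR} \quad \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
where $f_i$ is the frequency of the value $x_i$.
• The population mean is:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$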
Example 1: The weights (in kg) of eight youths are 32, 37, 41, 39, 36, 43,
48 and 36. The mean weight is:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{312}{8} = 39 \text{ kg}$$
Example 2 (class work): Obtain the mean of the following numbers:
2, 7, 8, 2, 7, 3, 7
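The arithmetic mean can be checked with a few lines of code. The following is a minimal sketch using the Example 1 weights; the function names `arithmetic_mean` and `frequency_mean` are illustrative choices, not part of the original slides.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by their count."""
    return sum(values) / len(values)


def frequency_mean(values, freqs):
    """Mean of values x_i that occur with frequencies f_i."""
    total = sum(f * x for x, f in zip(values, freqs))
    return total / sum(freqs)


weights_kg = [32, 37, 41, 39, 36, 43, 48, 36]  # Example 1 data
print(arithmetic_mean(weights_kg))  # 39.0

# The same data written as distinct values with frequencies (36 occurs twice).
distinct = [32, 36, 37, 39, 41, 43, 48]
counts = [1, 2, 1, 1, 1, 1, 1]
print(frequency_mean(distinct, counts))  # 39.0
```

Both forms give the same result because the frequency form only groups repeated values before summing.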
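Weighted arithmetic mean:
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
where $w_i$ is the weight attached to the value $x_i$.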
Example:
A student obtained the following percentages in an examination: English 60,
Biology 75, Mathematics 63, Physics 59 and Chemistry 55. Find the student's
weighted arithmetic mean if weights 1, 2, 1, 3, 3 respectively are allotted to the
subjects.
$$\bar{X}_w = \frac{\sum_{i=1}^{5} W_i X_i}{\sum_{i=1}^{5} W_i} = \frac{60(1) + 75(2) + 63(1) + 59(3) + 55(3)}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$$
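The weighted mean in this example can be reproduced as a short sketch. The function name `weighted_mean` is an illustrative choice; the marks and weights are those given in the example.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# English, Biology, Mathematics, Physics, Chemistry
marks = [60, 75, 63, 59, 55]
weights = [1, 2, 1, 3, 3]

print(weighted_mean(marks, weights))  # 61.5
```

With all weights equal, the same function reduces to the simple arithmetic mean.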
Merits:
• It is rigidly defined.
• It is based on all observations.
• It is suitable for further mathematical treatment.
• It is a stable average, i.e. it is relatively little affected by
fluctuations of sampling.
• It is easy to calculate and simple to understand.
Demerits:
The harmonic mean of $X_1, X_2, X_3, \dots, X_n$ is denoted by H.M and given by:
$$H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
Example: A cyclist pedals from his house to his college at a speed of 10 km/hr and back
from the college to his house at 15 km/hr. Find the average speed.
Solution: Here the distance is constant, so the simple H.M is appropriate for this problem.
$X_1 = 10$ km/hr, $X_2 = 15$ km/hr
$$H.M = \frac{2}{\frac{1}{10} + \frac{1}{15}} = 12 \text{ km/hr}$$
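The cyclist result can be cross-checked against total distance divided by total time. This is a sketch only: the one-way distance of 30 km is a hypothetical value chosen for illustration (any positive distance gives the same average speed), and `harmonic_mean` is an illustrative function name.

```python
def harmonic_mean(values):
    """Number of observations divided by the sum of their reciprocals."""
    return len(values) / sum(1 / x for x in values)


speeds = [10, 15]  # km/hr, out and back
hm = harmonic_mean(speeds)

# Cross-check: average speed = total distance / total time.
distance_km = 30  # hypothetical one-way distance, not given in the example
total_time_hr = distance_km / speeds[0] + distance_km / speeds[1]
average_speed = 2 * distance_km / total_time_hr

print(round(hm, 6), round(average_speed, 6))  # both are 12 km/hr up to rounding
```

The arithmetic mean of the two speeds (12.5 km/hr) would overstate the average, because more time is spent at the slower speed.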
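The median ($\tilde{x}$) of n observations arranged in order is:
$$\tilde{x} = \begin{cases} \left(\frac{n+1}{2}\right)^{th} \text{ observation}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left[\left(\frac{n}{2}\right)^{th} \text{ observation} + \left(\frac{n+2}{2}\right)^{th} \text{ observation}\right], & \text{if } n \text{ is even} \end{cases}$$
Properties of median
• It is an average of position
• It is affected more by the number of observations than by extreme values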
Examples
Example 1: Odd Number of Observations
Suppose we have the data: 3, 7, 8, 12, 15 (5 observations).
1. Order the data (already ordered here).
2. Since there are 5 observations (an odd number), the median is the middle
value:
Median = 8 (the third value).
Example 2: Even Number of Observations
Suppose we have the data: 2, 5, 8, 12, 14, 18 (6 observations).
1. Order the data (already ordered here).
2. Since there are 6 observations (an even number), take the average of the two
middle values (8 and 12):
Median = (8 + 12)/2 = 10
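The two median examples follow the same procedure, which can be sketched in code. The function name `median` is an illustrative choice; the data are the two example data sets above.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)  # step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # ((n + 1) / 2)-th observation; index mid in 0-based counting
        return ordered[mid]
    # average of the (n / 2)-th and (n / 2 + 1)-th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 7, 8, 12, 15]))      # 8
print(median([2, 5, 8, 12, 14, 18]))  # 10.0
```

Sorting first means the function also works when the data are not already ordered.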
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
where $x_i$ is the class mark (class midpoint) and $f_i$ is the frequency of the i-th class.
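The class-mark formula can be illustrated with a small frequency table. The table below is hypothetical (it does not come from the slides) and `grouped_mean` is an illustrative function name; the sketch assumes each class is given by its lower and upper boundary.

```python
# Hypothetical frequency table: (lower boundary, upper boundary) and frequency.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 12, 6, 2]


def grouped_mean(classes, freqs):
    """Mean of grouped data, using the class mark (midpoint) of each class."""
    class_marks = [(lower + upper) / 2 for lower, upper in classes]
    return sum(f * x for x, f in zip(class_marks, freqs)) / sum(freqs)


print(grouped_mean(classes, freqs))  # 24.0
```

The result is an approximation of the mean of the raw data, since every observation in a class is treated as if it sat at the class mark.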
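Median
$$\tilde{x} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - cf\right)$$
where
• $L_{med}$ – LCB (lower class boundary) of the median class;
• $f_{med}$ – frequency of the median class;
• w – class width;
• cf – sum of the frequencies of the classes lower than the median class
• Median class is the class containing the (n/2)th observation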
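The grouped-median formula can be sketched as follows. The frequency table is hypothetical (not from the slides), and `grouped_median` is an illustrative name; the sketch assumes classes are listed in increasing order by their class boundaries.

```python
# Hypothetical frequency table: (lower boundary, upper boundary) and frequency.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 12, 6, 2]


def grouped_median(classes, freqs):
    """Median of grouped data: L_med + (w / f_med) * (n / 2 - cf)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    cf = 0  # cumulative frequency of the classes below the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= n / 2:  # this class contains the (n/2)-th observation
            width = upper - lower
            return lower + width / f * (n / 2 - cf)
        cf += f
    raise ValueError("median class not found")


print(round(grouped_median(classes, freqs), 4))  # 24.1667
```

Here n = 30, so the median class is 20 to 30 (cumulative frequency reaches 22), with cf = 10, f_med = 12 and w = 10.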
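Mode
$$\hat{x} = L_{mod} + W \, \frac{\Delta_1}{\Delta_1 + \Delta_2}$$
where $L_{mod}$ is the lower class boundary of the modal class, W is the class width,
$\Delta_1$ is the frequency of the modal class minus the frequency of the preceding class,
and $\Delta_2$ is the frequency of the modal class minus the frequency of the following class.
• The modal class is the class with the highest frequency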
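A sketch of the grouped-mode formula is given below. It assumes the usual definitions of the two differences (modal frequency minus the preceding class frequency, and modal frequency minus the following class frequency) and a single modal class; the frequency table is hypothetical and `grouped_mode` is an illustrative name.

```python
# Hypothetical frequency table: (lower boundary, upper boundary) and frequency.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 12, 6, 2]


def grouped_mode(classes, freqs):
    """Mode of grouped data: L_mod + W * d1 / (d1 + d2)."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of the modal class
    lower, upper = classes[k]
    f_prev = freqs[k - 1] if k > 0 else 0               # class before the modal class
    f_next = freqs[k + 1] if k + 1 < len(freqs) else 0  # class after the modal class
    d1 = freqs[k] - f_prev
    d2 = freqs[k] - f_next
    return lower + (upper - lower) * d1 / (d1 + d2)


print(round(grouped_mode(classes, freqs), 4))  # 24.5455
```

In this table the modal class is 20 to 30 with frequency 12, so d1 = 12 - 7 = 5 and d2 = 12 - 6 = 6.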
Measures of Variation (Dispersion)
3.1 Introduction
The scatter or spread of items of a distribution is known as
dispersion or variation.
The degree to which numerical data tend to spread about an
average value is called dispersion or variation of the data.
In accounting, variation or dispersion refers to the degree
of spread or fluctuation in financial data, which can indicate
inconsistency or variability in performance or outcomes.
Range: The difference between the highest and lowest values in a data set. It
provides a basic measure of variability or dispersion, helping to understand the
spread of financial figures such as revenues, costs, profits, or expenses over a
period.
• Example: Analyzing the highest and lowest monthly revenues of a company.
Example in Accounting:
• If a company’s monthly revenues for a year range from $50,000 (lowest) to
$120,000 (highest), the range is: 120,000 − 50,000 = 70,000
• This indicates a $70,000 variation in revenue between the highest and lowest
months.
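The range calculation is a one-liner in code. In the sketch below, only the lowest (50,000) and highest (120,000) figures come from the example; the other ten monthly revenues are hypothetical values added for illustration, and `value_range` is an illustrative name.

```python
def value_range(data):
    """Difference between the highest and the lowest value."""
    return max(data) - min(data)


# Twelve monthly revenues in dollars. Only the minimum and maximum are taken
# from the example; the remaining months are made-up illustrative figures.
monthly_revenue = [
    50_000, 64_000, 72_000, 81_000, 95_000, 120_000,
    110_000, 98_000, 87_000, 76_000, 69_000, 58_000,
]

print(value_range(monthly_revenue))  # 70000
```

Because the range uses only two observations, changing any of the other ten months leaves it unchanged.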
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
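Both distributions above run from 32 to 45, so they share the same range even though their values are spread differently. The following sketch compares the range with the sample standard deviation from Python's `statistics` module; the variable and function names are illustrative.

```python
import statistics

dist1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
dist2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]


def value_range(data):
    """Difference between the highest and the lowest value."""
    return max(data) - min(data)


for name, data in (("Distribution 1", dist1), ("Distribution 2", dist2)):
    # The range is identical; the standard deviation is not.
    print(name, value_range(data), round(statistics.stdev(data), 2))
```

The range depends only on the two extreme values, so it cannot show that most of Distribution 2 is packed between 32 and 35 with a single large value at 45.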
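Population variance and sample variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Equivalent computational forms:
$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}, \qquad s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$
Standard deviation:
$$\sigma = \sqrt{\sigma^2}, \qquad s = \sqrt{s^2}$$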
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
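The definitional and computational forms of the variance give the same number, which can be verified numerically. The sketch below reuses the eight weights from Example 1 of the mean; the function names are illustrative choices.

```python
import math

weights_kg = [32, 37, 41, 39, 36, 43, 48, 36]  # data from Example 1 (mean = 39)


def sample_variance(data):
    """Definitional form: squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Computational form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)


def population_variance(data):
    """Squared deviations from the mean, divided by N."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


s2 = sample_variance(weights_kg)
print(round(s2, 4), round(sample_variance_shortcut(weights_kg), 4))  # 24.5714 24.5714
print(population_variance(weights_kg))                               # 21.5
print(round(math.sqrt(s2), 4))                                       # sample standard deviation
```

The sum of squared deviations is 172 in both forms, giving 172/7 for the sample variance and 172/8 for the population variance.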
$$CV = \frac{s}{\bar{x}} \times 100\%$$
A smaller coefficient of variation indicates less variability, i.e. more
homogeneity (consistency).
CV (Biology) = 29.11%
CV (Chemistry) = 17.2%
Since CV (Chemistry) < CV (Biology), Chemistry students are more homogeneous.
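The comparison of two groups by coefficient of variation can be sketched as below. The raw Biology and Chemistry marks are not given in the text, so the two small data sets here are hypothetical and serve only to show the calculation; `coefficient_of_variation` is an illustrative name.

```python
import statistics


def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(data) / statistics.mean(data) * 100


# Hypothetical marks for two groups with the same mean but different spread.
group_a = [10, 20, 30]
group_b = [18, 20, 22]

cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)
print(round(cv_a, 2), round(cv_b, 2))

more_homogeneous = "group_a" if cv_a < cv_b else "group_b"
print(more_homogeneous)  # group_b
```

Because the CV is unit-free, it can also compare groups measured on different scales, which the standard deviation alone cannot do.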
Examples:
1. Suppose the mean, the mode, and the standard deviation of a certain distribution are 32,
30.5 and 10 respectively. What is the shape of the curve representing the distribution?
Solution
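The worked solution did not survive in this copy of the slides. One way to answer the question, assuming the intended measure is Pearson's mode-based coefficient of skewness, (mean − mode) / standard deviation, is sketched below; the function name is an illustrative choice.

```python
def pearson_mode_skewness(mean, mode, std_dev):
    """Pearson's coefficient of skewness based on the mode (assumed method)."""
    return (mean - mode) / std_dev


sk = pearson_mode_skewness(mean=32, mode=30.5, std_dev=10)

if sk > 0:
    shape = "positively skewed"
elif sk < 0:
    shape = "negatively skewed"
else:
    shape = "symmetrical"

print(round(sk, 2), shape)  # 0.15 positively skewed
```

Under that assumption the coefficient is (32 − 30.5)/10 = 0.15, a small positive value, so the curve is slightly skewed to the right.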
Kurtosis
A curve that is flatter than the normal curve is called platykurtic.
The normal distribution, which is neither very highly peaked nor very flat,
is called mesokurtic.
The moment coefficient of kurtosis:
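The formula itself is missing from this copy. A commonly used definition, assumed here, is the fourth central moment divided by the square of the second central moment, which equals 3 for a normal distribution. The sketch below applies it to a small hypothetical data set; the function names are illustrative.

```python
def central_moment(data, r):
    """r-th moment about the mean, divided by the number of observations."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** r for x in data) / len(data)


def moment_kurtosis(data):
    """Assumed definition: fourth central moment / (second central moment)^2."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


sample = [1, 2, 3, 4, 5]  # hypothetical data for illustration
beta2 = moment_kurtosis(sample)

if beta2 > 3:
    shape = "leptokurtic"
elif beta2 < 3:
    shape = "platykurtic"
else:
    shape = "mesokurtic"

print(round(beta2, 2), shape)  # 1.7 platykurtic
```

For this data the second central moment is 2 and the fourth is 6.8, so the coefficient is 6.8/4 = 1.7, below the normal-curve value of 3.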
Example:
Correlation is a statistical measure that describes the strength and
direction of a relationship between two variables.
Correlation is measured using a correlation coefficient, typically
denoted by r.
The measure of the degree of relationship between two continuous variables is known
as the Pearson correlation coefficient.
The population correlation coefficient is represented by ρ and its estimator by r.
Simple correlation analysis measures the strength and direction of the linear
relationship between two variables. It determines how closely the variables are related.
Suppose we have two variables, X and Y.
1. When higher values of X are associated with higher values of Y and lower values of X
are associated with lower values of Y, then the correlation is said to be positive or
direct.
Examples:
- Income and expenditure
- Number of hours spent in studying and the score obtained
- Height and weight
- Distance covered and fuel consumed by car.
2. When higher values of X are associated with lower values of Y and lower values of X
are associated with higher values of Y, then the correlation is said to be negative or
inverse.
Examples:
- Speed and travel time: As the speed of a vehicle (X) increases, the time
taken to travel a fixed distance (Y) decreases. Slower speeds result in longer travel
times.
- Demand and price (for normal goods): For many normal goods, as the price (X)
increases, the demand (Y) decreases. Conversely, as the price decreases, demand tends
to rise.
- Unemployment rate and consumer spending: When unemployment rates (X)
rise, consumer spending (Y) tends to fall.
The presence of correlation between two variables may be due to
three reasons:
1. One variable being the cause of the other. The cause is called the
“subject” or “independent” variable, while the effect is called the
“dependent” variable.
2. Both variables being the result of a common cause.
Example: Let X1 be the ESLCE result,
Y1 be the rate of surviving in the University, and
Y2 be the rate of getting a scholarship.
Y1 and Y2 are correlated because both depend on X1.
3. Chance:
The correlation that arises by chance is called spurious
correlation.
Examples:
Price of teff in Addis Ababa and grade of students in USA.
Weight of individuals in Ethiopia and income of individuals in
Kenya.
One common cause is the presence of a third variable, known
as a confounder (confounding variable).
Example 1: Smoking and Lung Cancer (with Age as a Confounder)
IV (independent variable): Smoking
DV (dependent variable): Lung cancer risk
Confounder: Age
Older individuals may smoke more frequently and also have a higher risk of
developing lung cancer, making age a potential confounding factor.
Example 2: Coffee Consumption and Heart Disease (with Stress as a
Confounder)
IV: Coffee consumption
DV: Risk of heart disease
Confounder: Stress
People who are stressed may drink more coffee and also be at higher risk for
heart disease, creating a confounding effect.
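The correlation coefficient between X and Y, denoted by r, is given by
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$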
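The Pearson correlation coefficient can be computed directly from deviations about the two means. The sketch below is illustrative only: the ten pairs of scores are hypothetical (they are not the data of any example in these slides), and `pearson_r` is an illustrative function name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the root of the
    product of the two sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores out of 50 for 10 students (illustration only).
mid_scores = [20, 25, 30, 35, 28, 40, 22, 33, 38, 27]
final_scores = [24, 28, 33, 36, 30, 44, 25, 35, 41, 29]

r = pearson_r(mid_scores, final_scores)
print(round(r, 3))  # a value close to +1 indicates a strong positive linear relationship
```

The coefficient always lies between −1 and +1; it is +1 or −1 only when the points fall exactly on a straight line.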
Example: Calculate the simple correlation between the mid-semester and final
exam scores of 10 students (both out of 50).
Solution