chapter 3 Statistical Description of Data (2)

Chapter 3 covers numerical descriptive techniques, focusing on measures of central tendency (mean, median, mode) and dispersion (range, variance, standard deviation). Students will learn to calculate and interpret these measures to analyze data variability and relationships between variables. The chapter also discusses the merits and demerits of different means, including arithmetic, geometric, and harmonic means.

Uploaded by

aliebraahim04

1-1

Chapter 3

Numerical Descriptive Techniques


1-2

Objectives

At the end of this chapter, students will be able to:

• calculate and interpret measures of central tendency (mean, median, mode) for
a dataset to identify typical values representing the data.
• assess and analyze measures of dispersion (range, variance, standard
deviation) to understand the spread or variability within the dataset.
• compare two or more groups of numbers in terms of their variability.
• investigate the statistical association between key variables using measures
such as correlation and contingency tables, establishing relationships and
patterns within the data.
1-3

3.1 Introduction

A measure of central tendency (an average) condenses a
mass of data into a single representative value.

An average is a single value intended to represent a data set
as a whole.

A good average is:

• based on all the observations
• simple to understand and easy to interpret
• easily manipulated algebraically
• little affected by fluctuations of sampling
• not unduly influenced by extreme values
1-4

3.3 Types of central tendency


Measures of Central Tendency
• Median
• Mode
• Mean
1-5

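3.3.1 Mean (x̄)
Arithmetic mean

Let x1, x2, …, xn be the values of a variable X. The
arithmetic mean is:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

or, when each value x_i occurs with frequency f_i,

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

• The population mean is:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$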
1-6


Example 1: The weights (in kg) of eight youths are 32, 37, 41, 39, 36, 43,
48 and 36. The mean weight is:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{312}{8} = 39 \text{ kg}$$

Example 2: Obtain the mean of the following numbers (class work):
2, 7, 8, 2, 7, 3, 7
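
The calculation in Example 1 can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is an illustrative choice and does not come from the slides.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Weights (in kg) of the eight youths from Example 1.
weights_kg = [32, 37, 41, 39, 36, 43, 48, 36]

print(arithmetic_mean(weights_kg))  # 312 / 8 = 39.0
```

The same function can be applied to the class-work data in Example 2.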
1-7

Weighted Arithmetic Mean

• Used to calculate the average when the relative importance
of the observations differs. This relative importance is
known as the weight.
• If x1, x2, …, xn have weights w1, w2, …, wn, respectively,
then the weighted arithmetic mean is:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
1-8

Example:
A student obtained the following percentages in an examination: English 60,
Biology 75, Mathematics 63, Physics 59, and Chemistry 55. Find the student's
weighted arithmetic mean if weights 1, 2, 1, 3, 3, respectively, are allotted to the
subjects.

$$\bar{X}_w = \frac{\sum_{i=1}^{5} W_i X_i}{\sum_{i=1}^{5} W_i} = \frac{60(1) + 75(2) + 63(1) + 59(3) + 55(3)}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$$
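
The weighted mean in this example can be reproduced with a short Python sketch. The helper name `weighted_mean` is hypothetical; the marks and weights are the ones given in the example.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# English, Biology, Mathematics, Physics, Chemistry
marks = [60, 75, 63, 59, 55]
weights = [1, 2, 1, 3, 3]

print(weighted_mean(marks, weights))  # 615 / 10 = 61.5
```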
1-9

Merits and Demerits of Arithmetic Mean

Merits:
• It is rigidly defined.
• It is based on all observations.
• It is suitable for further mathematical treatment.
• It is a stable average, i.e., it is relatively little affected by
fluctuations of sampling.
• It is easy to calculate and simple to understand.
1-10

Demerits:

• It is affected by extreme observations.
• It cannot be used in the case of open-end classes.
• It cannot be determined by the method of inspection.
• It cannot be used when dealing with qualitative characteristics, such as
intelligence, honesty, beauty.
• It can be a number which does not exist in the series.
• Sometimes it leads to a wrong conclusion if the details of the data from which it
is obtained are not available.
• It gives high weight to high extreme values and less weight to low extreme
values.
1-11

The Geometric Mean


1-12

The Harmonic Mean

The harmonic mean of X1, X2, X3, …, Xn is denoted by H.M and is given by:

$$\text{H.M} = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
1-13

Example: A cyclist pedals from his house to his college at a speed of 10 km/hr and back
from the college to his house at 15 km/hr. Find the average speed.
Solution: Here the distance is constant, so the simple H.M is appropriate for this problem.
X1 = 10 km/hr, X2 = 15 km/hr

$$\text{H.M} = \frac{2}{\frac{1}{10} + \frac{1}{15}} = 12 \text{ km/hr}$$
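
A minimal Python sketch of the cyclist example follows. It assumes the usual definition of the harmonic mean (the number of observations divided by the sum of their reciprocals) and uses exact fractions so the result is not blurred by floating-point rounding. The function name and the 30 km distance used for the cross-check are illustrative assumptions.

```python
from fractions import Fraction


def harmonic_mean(values):
    """Return n divided by the sum of reciprocals, as an exact fraction."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs at least one value, all positive")
    reciprocal_sum = sum(1 / Fraction(v) for v in values)
    return len(values) / reciprocal_sum


speeds_kmh = [10, 15]  # outbound and return speeds from the example
hm = harmonic_mean(speeds_kmh)
print(float(hm))  # 12.0

# Cross-check with an assumed one-way distance of 30 km:
# 30/10 = 3 h out, 30/15 = 2 h back, so 60 km in 5 h.
distance_km = 30
total_time_h = Fraction(distance_km, 10) + Fraction(distance_km, 15)
print(float(2 * distance_km / total_time_h))  # 12.0
```

The cross-check shows why the harmonic mean, not the arithmetic mean of 12.5, gives the true average speed when the distance is fixed.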
1-14

3.3.2 The Median (x̃)

It is the middle value for data arranged in order of magnitude.

Median for raw (ungrouped) data:

$$\tilde{x} = \begin{cases} \left(\dfrac{n+1}{2}\right)^{\text{th}} \text{ observation}, & \text{if } n \text{ is odd} \\[2ex] \dfrac{1}{2}\left[\left(\dfrac{n}{2}\right)^{\text{th}} + \left(\dfrac{n+2}{2}\right)^{\text{th}}\right] \text{ observation}, & \text{if } n \text{ is even} \end{cases}$$

Properties of median
• It is an average of position.
• It is affected more by the number of observations than by extreme values.
1-15

Example
Example 1: Odd number of observations
Suppose we have the data: 3, 7, 8, 12, 15 (5 observations).
1. Order the data (already ordered here).
2. Since there are 5 observations (an odd number), the median is the middle
value:
Median = 8 (the third value).
Example 2: Even number of observations
Suppose we have the data: 2, 5, 8, 12, 14, 18 (6 observations).
1. Order the data (already ordered here).
2. Since there are 6 observations (an even number), take the average of the two
middle values: Median = (8 + 12)/2 = 10.
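
The two cases above can be sketched in Python as follows. The function name `median` is an illustrative choice; the data are those of Examples 1 and 2.

```python
def median(values):
    """Return the middle value (odd n) or the mean of the two middle values (even n)."""
    ordered = sorted(values)  # step 1: order the data
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the ((n + 1) / 2)-th observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle ones


print(median([3, 7, 8, 12, 15]))      # 8
print(median([2, 5, 8, 12, 14, 18]))  # 10.0
```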
1-16

3.3.3 The Mode x̂



It is the most frequent value in the data set.
Properties of mode
• is easy to calculate and understand
• is not affected by extreme values
• is not based on all observations
• is not used in further analysis of data
• is not always unique (a data set can have more than one mode)
Example: Suppose we have the data: 4, 5, 5, 6, 8.

• The mode is 5, as it appears most frequently.
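
A short Python sketch of finding the mode, returning every value that shares the highest frequency, since the mode need not be unique. The function name `modes` and the second data set are illustrative assumptions, not taken from the slides.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent value(s)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([4, 5, 5, 6, 8]))  # [5]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (made-up data with two modes)
```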


1-17

The Mean, Median and Mode for grouped data

Mean

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

where $x_i$ is the class mark (class midpoint) and $f_i$ is the frequency of the i-th class.
1-18

Example: Calculate the mean time spent by automobile workers.

Time (class boundaries) | Class mark | Number of workers
15.5-21.5 | 18.5 | 3
21.5-27.5 | 24.5 | 6
27.5-33.5 | 30.5 | 8
33.5-39.5 | 36.5 | 4
39.5-45.5 | 42.5 | 3
45.5-51.5 | 48.5 | 1

The mean time spent is 30.74 minutes.
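
The grouped mean can be verified with the sketch below. It takes the first two classes as 15.5-21.5 and 21.5-27.5, which is what the class marks 18.5 and 24.5 imply; the variable names are illustrative.

```python
# Class boundaries (minutes) and the number of workers in each class.
class_boundaries = [
    (15.5, 21.5), (21.5, 27.5), (27.5, 33.5),
    (33.5, 39.5), (39.5, 45.5), (45.5, 51.5),
]
frequencies = [3, 6, 8, 4, 3, 1]

# The class mark is the midpoint of each class.
class_marks = [(lower + upper) / 2 for lower, upper in class_boundaries]

# Grouped mean = sum(f_i * x_i) / sum(f_i)
total_fx = sum(f * x for f, x in zip(frequencies, class_marks))
grouped_mean = total_fx / sum(frequencies)

print(class_marks)             # [18.5, 24.5, 30.5, 36.5, 42.5, 48.5]
print(round(grouped_mean, 2))  # 768.5 / 25 = 30.74
```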
1-19
1-20

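Median

$$\tilde{x} = L_{med} + \left(\frac{\frac{n}{2} - cf}{f_{med}}\right) w$$

where:
• Lmed is the lower class boundary (LCB) of the median class;
• fmed is the frequency of the median class;
• w is the class width;
• cf is the sum of the frequencies of the classes below the median class.
• The median class is the class containing the (n/2)th observation.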
1-21

Example: Find the median for the following frequency distribution.

Class boundary | Frequency
5.5 – 10.5 | 1
10.5 – 15.5 | 2
15.5 – 20.5 | 3
20.5 – 25.5 | 5
25.5 – 30.5 | 4
30.5 – 35.5 | 3
35.5 – 40.5 | 2

The median is 24.5.
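
The grouped-median formula can be traced step by step with the sketch below, which locates the median class by accumulating frequencies and then interpolates inside it. The function name and argument layout are assumptions made for illustration, and equal class widths are assumed.

```python
def grouped_median(lower_boundaries, width, frequencies):
    """Median for grouped data: L + ((n/2 - cf) / f) * w."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("empty frequency distribution")
    half = n / 2
    cumulative = 0  # cf: frequencies of the classes below the current one
    for lower, f in zip(lower_boundaries, frequencies):
        if cumulative + f >= half:  # this class holds the (n/2)-th observation
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("boundaries and frequencies do not match")


lower_boundaries = [5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5]
frequencies = [1, 2, 3, 5, 4, 3, 2]

# n = 20, n/2 = 10, median class 20.5-25.5, cf = 6, f = 5, w = 5
result = grouped_median(lower_boundaries, 5, frequencies)
print(result)  # 20.5 + (10 - 6) / 5 * 5 = 24.5
```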
1-22

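Mode

$$\hat{x} = L_{mod} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W$$

• The modal class is the class with the highest frequency.
• Lmod is the lower class boundary of the modal class and W is the class width.
• Δ1 is the frequency of the modal class minus the frequency of the class before it;
Δ2 is the frequency of the modal class minus the frequency of the class after it.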
1-23
1-24

Example: Calculate the modal time spent by the automobile workers.

Time (in minutes) | Number of workers
16-21 | 3
22-27 | 6
28-33 | 8
34-39 | 4
40-45 | 3
46-51 | 1
Total | 25

• The mode is 29.5 minutes.
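
The stated mode can be reproduced with the sketch below. It assumes the usual interpolation formula for grouped data, where Δ1 and Δ2 are the differences between the modal-class frequency and the frequencies of the neighbouring classes, and it converts the class limits 16-21, 22-27, … to boundaries 15.5-21.5, 21.5-27.5, … with width 6. The function name is illustrative.

```python
def grouped_mode(lower_boundaries, width, frequencies):
    """Mode for grouped data: L + d1 / (d1 + d2) * W."""
    k = max(range(len(frequencies)), key=lambda i: frequencies[i])  # modal class
    f_mod = frequencies[k]
    f_prev = frequencies[k - 1] if k > 0 else 0
    f_next = frequencies[k + 1] if k < len(frequencies) - 1 else 0
    d1 = f_mod - f_prev  # excess over the preceding class
    d2 = f_mod - f_next  # excess over the following class
    if d1 + d2 == 0:
        raise ValueError("the mode is not well defined for this distribution")
    return lower_boundaries[k] + d1 * width / (d1 + d2)


lower_boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5]
frequencies = [3, 6, 8, 4, 3, 1]

# Modal class 27.5-33.5: d1 = 8 - 6 = 2, d2 = 8 - 4 = 4
mode_minutes = grouped_mode(lower_boundaries, 6, frequencies)
print(mode_minutes)  # 27.5 + 2 * 6 / 6 = 29.5
```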
1-25

Cont.…

Measures of Variation
(Dispersion)
1-26

3.1 Introduction

The scatter or spread of items of a distribution is known as
dispersion or variation.

The degree to which numerical data tend to spread about an
average value is called dispersion or variation of the data.

In accounting, variation or dispersion refers to the degree
of spread or fluctuation in financial data, which can indicate
inconsistency or variability in performance or outcomes.
1-27

Feature | Absolute Measure | Relative Measure
Definition | Measures actual spread of values | Measures spread relative to the mean
Examples | Range, variance, standard deviation | Coefficient of variation, relative range
Unit of Measure | Same as data or its square | Unitless (often a percentage)
Best Used For | Single dataset with the same scale | Comparing variability across datasets with different scales
Influence of Mean Value | Independent of mean | Depends on the mean value

1-28

Types of measures of variation

• Range: The difference between the highest and lowest values in a data set. It
provides a basic measure of variability or dispersion, helping to understand the
spread of financial figures such as revenues, costs, profits, or expenses over a
period.
• Example: Analyzing the highest and lowest monthly revenues of a company.
Example in Accounting:
• If a company's monthly revenues for a year range from $50,000 (lowest) to
$120,000 (highest), the range is: 120,000 − 50,000 = 70,000.
• This indicates a $70,000 variation in revenue between the highest and lowest
months.
1-29
1-30

Cont.…
1-31

Relative Range (RR)

Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
1-32

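Variance and Standard Deviation

Population Variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$$

Sample Variance

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$

Standard Deviation

$$\sigma = \sqrt{\sigma^2}, \qquad s = \sqrt{s^2}$$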
1-33

Variance is the average of the squared deviations from the mean.

Standard deviation is the square root of the variance.
Example: The heights of nine students were measured in inches and the data are
presented below.
Height (x): 69 66 67 69 64 63 65 68 72
Calculate the population variance and standard deviation.
Variance = 7.11 inch²; S.D = 2.67 inch
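
The height example can be worked through in Python as a check. This is a sketch with illustrative function names; it computes both the population variance asked for in the example and, for comparison, the sample variance with the n − 1 divisor.

```python
import math


def population_variance(values):
    """Average squared deviation from the mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


heights_in = [69, 66, 67, 69, 64, 63, 65, 68, 72]  # mean is 603 / 9 = 67

var_pop = population_variance(heights_in)  # 64 / 9
sd_pop = math.sqrt(var_pop)

print(round(var_pop, 2))          # 7.11 (square inches)
print(round(sd_pop, 2))           # 2.67 (inches), i.e. 2.666... before rounding
print(sample_variance(heights_in))  # 64 / 8 = 8.0
```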

1-34

Special properties of Standard deviations

 ( X i  X )2 
n 1
1-35

Coefficient of variation (CV)

It is the relative measure of dispersion corresponding to the standard deviation.

It is used to compare the variability of two or more different groups.

$$CV = \frac{s}{\bar{x}} \times 100\%$$

A smaller coefficient of variation indicates less variability, i.e., a group that is
more consistent, more uniform, or more homogeneous.
1-36

Example: The students of Biology and Chemistry took the Stat 273 course.
The following information was recorded.

Departments | Mean | S.D
Biology | 79 | 23
Chemistry | 64 | 11

CV (Biology) = (23/79) × 100% = 29.11%
CV (Chemistry) = (11/64) × 100% = 17.19%

Chemistry students are more homogeneous.
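
The comparison can be reproduced with a small Python sketch. The function name `coefficient_of_variation` is an illustrative choice; the means and standard deviations are those in the table.

```python
def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


cv_biology = coefficient_of_variation(23, 79)
cv_chemistry = coefficient_of_variation(11, 64)

print(round(cv_biology, 2))    # 29.11
print(round(cv_chemistry, 2))  # 17.19

# The group with the smaller CV is the more homogeneous one.
more_homogeneous = "Chemistry" if cv_chemistry < cv_biology else "Biology"
print(more_homogeneous)        # Chemistry
```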
1-37
1-38
1-39
1-40
1-41

Cont.…
Examples:
1. Suppose the mean, the mode, and the standard deviation of a certain distribution are 32,
30.5 and 10, respectively. What is the shape of the curve representing the distribution?
Solution
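
One way to approach this question is sketched below. It assumes that Pearson's mode-based coefficient of skewness, (mean − mode) / standard deviation, is the intended measure; the slides that define the skewness measure did not survive extraction, so this choice is an assumption rather than the author's stated method. The function name is illustrative.

```python
def pearson_mode_skewness(mean, mode, sd):
    """Pearson's first coefficient of skewness: (mean - mode) / sd."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (mean - mode) / sd


sk = pearson_mode_skewness(mean=32, mode=30.5, sd=10)
print(sk)  # 0.15

if sk > 0:
    shape = "positively skewed (longer tail to the right)"
elif sk < 0:
    shape = "negatively skewed (longer tail to the left)"
else:
    shape = "symmetric"
print(shape)
```

Under this assumption the coefficient is positive because the mean exceeds the mode, which points to a slightly positively skewed curve.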
1-42

Kurtosis

• Kurtosis is the degree of peakedness of a distribution, usually taken
relative to a normal distribution.
• A distribution having a relatively high peak is called leptokurtic.
• If a curve representing a distribution is flat-topped, it is called
platykurtic.
• The normal distribution, which is neither very high-peaked nor flat-topped,
is called mesokurtic.


1-43

Cont.…
1-44

Cont..
• The moment coefficient of kurtosis:
1-45

Cont..
• Example:
1-46

Statistical measures of association

The purpose of this session is to answer the following questions statistically:

1. Are two or more variables linearly related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?

Regression analysis is a statistical technique that can be used to develop a
mathematical equation showing how variables are related.

Correlation analysis deals with measuring the closeness of the
relationship described by the regression equation.


1-47

Cont.….
• Correlation is a statistical measure that describes the strength and
direction of a relationship between two variables.
• Correlation is measured using a correlation coefficient, which lies
between -1 and +1.
• A value close to +1 indicates a strong positive relationship,
meaning as one variable increases, the other tends to increase as
well.
• A value close to -1 indicates a strong negative relationship,
meaning as one variable increases, the other tends to decrease.
• Correlation analysis is applicable when examining relationships
between specific types of variables, particularly quantitative
variables, and can also apply to certain types of ordinal variables.
1-48

Cont.……
• The measure of the degree of relationship between two continuous variables is known
as the Pearson correlation coefficient.
• The population correlation coefficient is represented by ρ and its estimator by r.
• Simple correlation analysis measures the strength and direction of the linear
relationship between two variables, without considering the influence of other
variables. It determines how closely the variables are related.
• Simple correlation is also known as bivariate correlation because it involves only two
variables.
• Suppose we have two variables X = (X1, X2, …, Xn) and Y = (Y1, Y2, …, Yn).

1. When higher values of X are associated with higher values of Y and lower values of X
are associated with lower values of Y, then the correlation is said to be positive or
direct.
1-49

Cont.……
Examples:
- Income and expenditure
- Number of hours spent studying and the score obtained
- Height and weight
- Distance covered and fuel consumed by a car
2. When higher values of X are associated with lower values of Y and lower values of X
are associated with higher values of Y, then the correlation is said to be negative or
inverse.
Examples:
- Speed and travel time: As the speed of a vehicle (X) increases, the time
taken to travel a fixed distance (Y) decreases. Slower speeds result in longer travel
times.
- Demand and price (for normal goods): For many normal goods, as the price (X)
increases, the demand (Y) decreases. Conversely, as the price decreases, demand tends
to rise.
- Unemployment rate and consumer spending: When unemployment rates (X)
increase, consumer spending (Y) often decreases. When unemployment decreases,
consumer spending tends to increase.
1-50

Cont.…
The presence of correlation between two variables may be due to
three reasons:
1. One variable being the cause of the other. The cause is called the
"subject" or "independent" variable, while the effect is called the
"dependent" variable.
2. Both variables being the result of a common cause.
Let X1 be the ESLCE result,
Y1 be the rate of surviving in the university, and
Y2 be the rate of getting a scholarship.

Both X1 & Y1 and X1 & Y2 have high positive correlation. Likewise,
Y1 & Y2 are positively correlated, but they are not directly related; they are
related to each other via X1.
1-51

Cont.…

3. Chance:
• A correlation that arises by chance is called spurious
correlation.
• Examples:
- Price of teff in Addis Ababa and grades of students in the USA.
- Weight of individuals in Ethiopia and income of individuals in
Kenya.
• Another common source of misleading correlation is the presence of a third
variable, known as a confounding variable, which influences both of the
apparently correlated variables. Failing to account for this
confounding variable can lead to a false impression of a causal
relationship between the original two variables.
1-52

Cont.….
• Example 1: Smoking and Lung Cancer (with Age as a Confounder)
- IV (independent variable): Smoking
- DV (dependent variable): Lung cancer risk
- Confounder: Age
- Older individuals may smoke more frequently and also have a higher risk of
developing lung cancer, making age a potential confounding factor.
• Example 2: Coffee Consumption and Heart Disease (with Stress as a
Confounder)
- IV: Coffee consumption
- DV: Risk of heart disease
- Confounder: Stress
- People who are stressed may drink more coffee and also be at higher risk for
heart disease, creating a confounding effect.
1-53

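Cont.……
• The correlation coefficient between X and Y, denoted by r, is given by

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$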

1-54

Cont.……
• Example: Calculate the simple correlation between the mid-semester and final
exam scores of 10 students (both out of 50).

Student | Mid Sem. Exam (X) | Final Sem. Exam (Y)
1 | 31 | 31
2 | 23 | 29
3 | 41 | 34
4 | 32 | 35
5 | 29 | 25
6 | 33 | 35
7 | 28 | 33
8 | 31 | 42
9 | 31 | 31
10 | 33 | 34
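
A minimal Python sketch for this example follows. It uses the standard Pearson product-moment formula (the sum of cross-deviations divided by the square root of the product of the sums of squared deviations); the function name `pearson_r` is an illustrative choice, and the scores are those in the table.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


mid_sem = [31, 23, 41, 32, 29, 33, 28, 31, 31, 33]    # X
final_sem = [31, 29, 34, 35, 25, 35, 33, 42, 31, 34]  # Y

r = pearson_r(mid_sem, final_sem)
print(round(r, 2))  # 0.36
```

A value of about 0.36 indicates a weak-to-moderate positive linear relationship between the two sets of scores.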
1-55

Cont.……
 Solution
