
King Khalid University

College of Sciences and Arts of Tanumah

Department of Computer Science

217CSM-3: Statistical Programming

Chapter 3 - Descriptive Statistics: Numerical Measures

Dr. Moheddine Imsatfia


THE MEAN, MEDIAN AND MODE

NUMERICAL MEASURES

● If the measures are computed for data from a sample, they are called sample statistics.
● If the measures are computed for data from a population, they are called population parameters.
● A sample statistic is referred to as the point estimator of the corresponding population parameter.


THE MEAN, MEDIAN AND MODE

● When given a set of raw data one of the most useful ways
of summarising that data is to find an average of that set
of data.
● An average is a measure of the centre of the data set.
● There are three common ways of describing the centre of
a set of numbers.
● They are the mean, the median and the mode and are
calculated as follows.


● The mean − add up all the numbers and divide by how many numbers there are.
● The median − is the middle number. It is found by putting the numbers in order and taking the actual middle number if there is one, or the average of the two middle numbers if not.
● The mode − is the most commonly occurring number.


● Let’s illustrate these by calculating the mean, median and mode for the following data.
● Weight of luggage presented by airline passengers at check-in (measured to the nearest kg).


● Central tendency describes the tendency of the observations to bunch around a particular value, or category.
● The mean, median and mode are all measures of central tendency.
● They are all measures of the ‘average’ of the distribution.
● The best one to use in a given situation depends on the type of variable given.


● For example, suppose a class of 20 students own among them a total of 17 pets as shown in the following table.

● Which measure of central tendency should we use here?


● If our focus of interest were on the type of pet owned, we would use the mode as our average. → ‘Cat’ would be described as the ‘modal category’, as this is the category that occurs most often.
● If, on the other hand, we were not interested in the type of pet kept but in the average number of pets owned, then the mean would be an appropriate measure of central tendency. → Here the mean is 17/20 = 0.85.


● The mean has some advantages over the median as a measure of central tendency of quantity variables.
● One of them is that all the observed values are used to calculate the mean. To calculate the median, by contrast, all the observed values are used in the ranking, but only the middle value or the middle two values are used in the calculation.
● Another is that the mean is fairly stable from sample to sample.
● This means that if we take several samples from the same population, their means are likely to vary less than their medians.


● However, the median is used as a measure of central tendency if there are a few extreme values observed.
● The mean is very sensitive to extreme values and it may not be an appropriate measure of central tendency in these cases.
● With the exception of cases where there are obvious extreme values, the mean is the value usually used to indicate the centre of a distribution.
● We can also think of the mean as the balance point of a distribution.


● For example, consider the distribution of students’ marks on a test:

● Without doing any calculation, we would guess the balance point of the distribution to be approximately 58. (Think of it as the centre of a see-saw.)


● Exercise 1:
Ten patients at a doctor’s surgery wait for the following lengths of time to see their doctor.
5 mins, 9 mins, 17 mins, 22 mins, 8 mins, 11 mins, 2 mins, 16 mins, 55 mins, 5 mins
(1) What are the mean, median and mode for these data?
(2) Which measure of central tendency would you use here?


● Exercise 1, solution:
1. Mean = 150/10 = 15 mins, Median = (9 + 11)/2 = 10 mins, Mode = 5 mins.
2. The median would be the preferred measure of central tendency to use here and not the mean, since there is an outlier of 55 mins. This assumes that the outlier is a freak value and should be disregarded. The mode would not be suitable, because it is just chance that two people waited for the same period of time, and all the others waited for different time periods.
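
The three measures for the waiting times in Exercise 1 can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the course material.

```python
from statistics import mean, median, multimode

# Waiting times in minutes, copied from Exercise 1.
waiting_times = [5, 9, 17, 22, 8, 11, 2, 16, 55, 5]

# Mean: add up all the values and divide by how many there are.
wait_mean = mean(waiting_times)

# Median: sort the values; with an even count, average the two middle ones.
wait_median = median(waiting_times)

# Mode: the most frequently occurring value(s), returned as a list.
wait_modes = multimode(waiting_times)

print("mean:", wait_mean, "median:", wait_median, "mode(s):", wait_modes)
```

Removing the 55-minute value and re-running the script is a quick way to see how strongly a single extreme value pulls the mean compared with the median.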


● Exercise 2:
2. What is the appropriate measure of central tendency to
use with these data?


● Exercise 2, solution:
The mode is the only possible measure of central tendency to use here, since we are dealing with categorical data. The modal category is ‘train’.


● Exercise 3:
Which measure of central tendency is best used to measure
the average house price in KSA?


● Exercise 3, solution:
• The median is used to indicate average house prices in
KSA.
• The inclusion of the very expensive houses (those worth
millions of SAR) in the calculation of the mean would
make the ‘average’ house price too high to be
representative of the general market.
• Nor is the mode suitable because it could happen by
chance that a very large number of houses all had the
same non-representative value.


● Exercise 4:
Without doing any calculation, estimate the mean of the distribution in the figure below:


● Exercise 4, solution:
• The actual value for the mean is 56.
• How close to this value did you get with your guess?

MEASURES OF DISPERSION

● The mean is the value usually used to indicate the centre of a distribution.
● If we are dealing with quantity variables, our description of the data will not be complete without a measure of the extent to which the observed values are spread out from the average.
● We will consider several measures of dispersion and discuss the merits and pitfalls of each.


THE RANGE

● One very simple measure of dispersion is the range.

● Let’s consider the two distributions given in A and B.

● They represent the marks of a group of thirty students on two tests.


● Here it is clear that the marks on test A are more spread out than the marks on test B, and we need a measure of dispersion that will accurately indicate this.
• On test A, the range of marks is 70 − 45 = 25.
• On test B, the range of marks is 65 − 45 = 20.
● Here the range gives us an accurate picture of the dispersion of the two distributions.


● However, as a measure of dispersion the range is severely limited.
● Since it depends only on two observations, the lowest and the highest, we will get a misleading idea of dispersion if these values are outliers.


● This is illustrated very well if the students’ marks are distributed as in C and D:


● On test C, the range is still 70 − 45 = 25.
● On test D, the range is now 72 − 40 = 32.
● Apart from the outliers, the distribution of marks on test D is clearly less spread out than that of C.
● We want a measure of dispersion that will accurately give a measure of the variability of the observations.
● We will concentrate now on the measure of dispersion most commonly used, the standard deviation.


STANDARD DEVIATION

● Suppose we have a set of data where there is no variability in the observed values.
● Each observation would have the same value, say 3, 3, 3, 3, and the mean would be that same value, 3.
● No observation would differ, or deviate, from the mean.
● Now suppose we have a set of observations where there is variability. The observed values would deviate from the mean by varying amounts.


● The standard deviation is a kind of average of these deviations from the mean.
● This is best explained by considering the following example.
• Take, for example, the following grades of 6 students: 56 48 63 60 51 52.
• Mean = 55.
• To find how much our observed values deviate from the mean, we subtract the mean from each.

Observed values        56   48   63   60   51   52
Deviations from mean   +1   −7   +8   +5   −4   −3


● We cannot, at this stage, simply take the average of the deviations, as their sum is zero.
(+1) + (−7) + (+8) + (+5) + (−4) + (−3) = 0
● We get around this difficulty by taking the square of the deviations.
● This gets rid of the minus signs.

Deviations           +1   −7   +8   +5   −4   −3
Squared deviations    1   49   64   25   16    9


● We can now take the mean of these squared deviations. This is called the variance.
• Variance = (1 + 49 + 64 + 25 + 16 + 9)/6 = 164/6 = 27.33.
● The variance is a very useful measure of dispersion for statistical inference, but for our purposes it has a major disadvantage. Because we squared the deviations, we now have a quantity in square units. So to get the measure of dispersion back into the same units as the observed values, we define standard deviation as the square root of the variance.
• Standard Deviation = √Variance = √27.33 = 5.228.
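
The worked example above can be reproduced step by step. The following is a minimal sketch in plain Python; the variable names are illustrative, and the variance divides by the number of observations, exactly as in the example.

```python
from math import sqrt

# Grades of the six students from the worked example.
grades = [56, 48, 63, 60, 51, 52]

n = len(grades)
mean_grade = sum(grades) / n

# Deviation of each observed value from the mean; these always sum to zero.
deviations = [x - mean_grade for x in grades]

# Squaring removes the minus signs.
squared_deviations = [d ** 2 for d in deviations]

# Variance: the mean of the squared deviations (divide by n, as in the example).
variance = sum(squared_deviations) / n

# Standard deviation: square root of the variance, back in the original units.
std_dev = sqrt(variance)

print(mean_grade, deviations, round(variance, 2), round(std_dev, 3))
```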


● Standard deviation for samples
● The variance and the standard deviation of samples are computed as follows:
● The variance of a sample of size n is represented by s² and is given by

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

● The standard deviation of a sample of size n is represented by s and is given by

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$


● Standard deviation for a population
● The variance and the standard deviation of a population are computed as follows:
● The variance of a population of size N is represented by σ² and is given by

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

● The standard deviation of a population of size N is represented by σ and is given by

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$


● The mean for a sample consisting of n observations is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

● and the mean for a population consisting of N observations is

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$


● The standard deviation may be thought of as the ‘give or take’ number.
● That is, in the grades example, a student’s grade will on average be 55, give or take about 5 marks.
● The standard deviation is a very good measure of dispersion and is the one to use when the mean is used as the measure of central tendency.


COEFFICIENT OF VARIATION

● The coefficient of variation (CV) is a helpful statistic for comparing the degree of variation from one data series to another, even if the means are considerably different from each other.
● The standard deviation of an exponential distribution is equal to its mean, so its coefficient of variation equals 1.
● Distributions with a coefficient of variation less than 1 are considered low-variance, whereas those with a coefficient of variation greater than 1 are considered high-variance.


● The coefficient of variation is equal to the standard deviation divided by the mean. The result is usually multiplied by 100 to express it as a percent.
● The coefficient of variation for a sample is given by

$$CV = \frac{s}{\bar{x}} \times 100\%$$

● and the coefficient of variation for a population is given by

$$CV = \frac{\sigma}{\mu} \times 100\%$$


● Example:
▪ A national sampling of prices for new and used cars found that the mean price for a new car is 20,100 SAR with a standard deviation of 6,125 SAR, and that the mean price for a used car is 5,485 SAR with a standard deviation of 2,730 SAR.
▪ In terms of absolute variation, the standard deviation of price for new cars is more than twice that of used cars.
▪ However, in terms of relative variation, there is more variation in the price of used cars than in new cars.
▪ The CV for used cars is 2,730 / 5,485 = 49.8%,
▪ and the CV for new cars is 6,125 / 20,100 = 30.5%.
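
The car-price comparison can be reproduced with a small helper. This is a sketch only; the function name `coefficient_of_variation` is an illustrative choice rather than part of any library.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage."""
    return std_dev / mean * 100


# Figures from the example (prices in SAR).
cv_new = coefficient_of_variation(std_dev=6125, mean=20100)
cv_used = coefficient_of_variation(std_dev=2730, mean=5485)

print(f"new cars:  {cv_new:.1f}%")
print(f"used cars: {cv_used:.1f}%")
```

Although the new-car standard deviation is larger in absolute terms, dividing by the mean shows that used-car prices vary more relative to their typical level.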


Z SCORES

● A z score is the number of standard deviations that a given observation, x, is below or above the mean.
● For sample data, the z score is

$$z = \frac{x - \bar{x}}{s}$$

● and for population data, the z score is

$$z = \frac{x - \mu}{\sigma}$$


● Example:
▪ The mean salary for deputies in Douglas County is $27,500 and the standard
deviation is $4,500.
▪ The mean salary for deputies in Hall County is $24,250 and the standard
deviation is $2,750.
▪ A deputy who makes $30,000 in Douglas County makes $1,500 more than a deputy in Hall County who makes $28,500. Which deputy has the higher salary relative to the county in which he works?


● Example (continued):
▪ For the deputy in Douglas County who makes $30,000, the z score is

$$z = \frac{30{,}000 - 27{,}500}{4{,}500} \approx 0.56$$

▪ For the deputy in Hall County who makes $28,500, the z score is

$$z = \frac{28{,}500 - 24{,}250}{2{,}750} \approx 1.55$$

▪ When the county of employment is taken into consideration, the $28,500 salary is a higher relative salary than the $30,000 salary.
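
The comparison of the two deputies can be checked with a few lines of code. This is a minimal sketch; `z_score` is an illustrative helper name, and the county means and standard deviations are those given in the example.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Salaries and county statistics from the example (US dollars).
z_douglas = z_score(30_000, mean=27_500, std_dev=4_500)
z_hall = z_score(28_500, mean=24_250, std_dev=2_750)

print(f"Douglas County deputy: z = {z_douglas:.2f}")
print(f"Hall County deputy:    z = {z_hall:.2f}")
```

The larger z score belongs to the Hall County deputy, which is why the $28,500 salary is the higher one relative to its county.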

THE INTERQUARTILE RANGE

● The interquartile range is another useful measure of dispersion or spread.
● It is used when the median is used as the measure of central tendency.
● It gives the range in which the middle 50% of the distribution lies.
● In order to describe this in detail, we first need to discuss what we mean by quartiles.


QUARTILES

● Suppose we start with a large set of data, say the heights of all adult males in Sydney.
● We can represent these data in a graph which, if smoothed out a bit, may look like:

● As the name ‘quartile’ suggests, we want to divide the data into four equal parts.
● In the above example, we want to divide the area under our curve into four equal areas.

THE SECOND QUARTILE OR MEDIAN

● It is easy to see how to divide the area in the last figure into two equal parts, since the graph is symmetric.
● The point which gives us 50% of the area to the left of it and 50% to the right of it is called the second quartile or median.
● This is illustrated as follows:

● This exactly corresponds to our previous idea of the median as the middle value.

THE FIRST QUARTILE

● The first quartile is the point which gives us 25% of the area to the left of it and 75% to the right of it.
● This means that 25% of the observations are less than or equal to the first quartile and 75% of the observations are greater than or equal to the first quartile.
● The first quartile is also called the 25th percentile.
● This is illustrated as follows:


THE THIRD QUARTILE

● The third quartile is the point which gives us 75% of the area to the left of it and 25% of the area to the right of it.
● This means that 75% of the observations are less than or equal to the third quartile and 25% of the observations are greater than or equal to the third quartile.
● The third quartile is also called the 75th percentile.
● This is illustrated as follows:


SUMMARY

● The first (Q1), second (Q2) and third (Q3) quartiles divide the distribution into four equal parts.
● This is illustrated as follows:

QUARTILES FOR SMALL DATA SETS

● Suppose now we have a small data set of twelve observations which we can write in ascending order as follows.
● A data set where the number of observations is a multiple of four has been chosen to avoid some technical difficulties.
15 18 19 20 20 20 21 23 23 24 24 25

● In this case, we want to divide the data into four equal sets, so that there are 25% of the observations in each.


● First, we find the median just as we did previously.
● The median is 20.5 (halfway between the 6th and 7th observations), and divides the data into two equal sets with exactly 50% of the observations in each: the 1st to the 6th observations in the first set and the 7th to 12th observations in the other.


● To find the first quartile we consider the observations less than the median.
15 18 19 20 20 20
● The first quartile is the median of these data. In this case, the first quartile is halfway between the 3rd and 4th observations and is equal to 19.5.
● Now, we consider the observations which are greater than the median.
21 23 23 24 24 25


● The third quartile is the median of these data and is equal to 23.5.
● So, for our small data set of 12 observations, the quartiles divide the set into four subsets.

● We will now use the quartiles to define a measure of spread called the interquartile range.

THE INTERQUARTILE RANGE

● The interquartile range quantifies the difference between the third and first quartiles.
● If we were to remove the median (Q2) from the previous figure, we would have a graph as follows:

● We see that 50% of the area is between the first and third quartiles.
● This means that 50% of the observations lie between the first and third quartiles.


● We define the interquartile range as:
The interquartile range = Third quartile − First quartile.
● For our small data set, the first quartile was 19.5 and our third quartile was 23.5.
● So, the interquartile range is 23.5 − 19.5 = 4.
● We will use the interquartile range later to draw a box-plot.
● For now we are interested in it as a measure of spread.
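
The quartile procedure described for the small data set (find the median, then take the median of each half) can be sketched as below. The function name `quartiles` is illustrative. For an odd number of observations the sketch leaves the middle value out of both halves, which is an assumption, since the slides deliberately avoid that case.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]       # observations below the median position
    upper = ordered[n - half:]   # observations above the median position
    return median(lower), median(ordered), median(upper)


# The twelve observations from the small data set.
small_data = [15, 18, 19, 20, 20, 20, 21, 23, 23, 24, 24, 25]

q1, q2, q3 = quartiles(small_data)
iqr = q3 - q1

print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3, "IQR =", iqr)
```

Library routines such as `statistics.quantiles` use interpolation rules and may give slightly different quartiles for the same data, so results should be compared with care.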


● The interquartile range is particularly useful to describe data sets where there are a few extreme values.
● Unlike the range, and to a lesser extent the standard deviation, it is not sensitive to extreme values as it relies on the spread of the middle 50% of the distribution.
● So, if there are data sets which have extreme values, it can be more appropriate to use the median to describe central tendency and the interquartile range to describe the spread.

EXERCISES

● For the following data sets, calculate the quartiles and find the interquartile range.
▪ 1 - The following numbers represent the time in minutes that twelve employees took to get to work on a particular day.
18 34 68 22 10 92 46 52 38 29 45 37
▪ 2 - The number of people killed in road traffic accidents in New South Wales from 1989 to 1996 is given below.
960 797 663 652 560 619 623 583
Source: Statistics–A Powerful Edge, Australian Bureau of Statistics, 1998.

▪ 3 - The following data are the final marks of 40 students for the University Preparation Course ‘Preparatory Mathematics’ in 1998.
61 77 51 85 55 77 70 56 41 61 28 87 23 22 86 63 99 94 38 25
90 59 87 53 29 86 33 87 75 50 59 77 77 71 99 78 70 93 78 93
Source: Mathematics Learning Centre, 1998.
▪ 4 - The curve below represents the marks of a large number of students on an English exam. Estimate the quartiles and calculate the interquartile range.

● Solutions:
1. First quartile = 25.5, Median = 37.5, Third quartile = 49, IQR = 23.5.
2. First quartile = 601, Median = 637.5, Third quartile = 730, IQR = 129.
3. First quartile = 52, Median = 70.5, Third quartile = 86, IQR = 34.
4. Our estimate puts the first quartile at 40, the median at 50 and the third quartile at 60. This gives an interquartile range of 20. This means that the middle 50% of marks lie within 20 marks of each other.

ESTIMATES OF THE MEAN AND VARIANCE
● We have, so far, concerned ourselves with the mean, variance,
and standard deviation of a population.
● However, in statistics we are mainly concerned with analysing
data from a sample taken from a population, in order to make
inferences about that population.
● Our data sets are usually random samples drawn from the
population.
● When we have a random sample of size n, we use the sample
information to estimate the population mean and population
variance in the following way.

● The mean of a sample of size n is written as x̄ (read ‘x bar’).
● To find the sample mean we add up all the sample scores and divide by the number of sample scores.
● This can be written using sigma notation as:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

● The sample mean is used to estimate the population mean.
● If we took many samples of size n from the population, calculated their sample means, and then averaged them, we would get a value very close to the population mean.
● We say that the sample mean is an unbiased estimator of the population mean.
● An estimate of the population variance from a sample of size n is given by s², where

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$


● Notice that we are dividing by n − 1 instead of n as we did to find the population variance.
● We need to do this because the value obtained if we divide by n tends to underestimate the population variance.
● Calculated in this way, s² is an unbiased estimator of the population variance.
● In fact, s² can be described as the estimated population variance.
● It is sometimes called the ‘sample variance’, but this is, strictly speaking, not accurate.
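
The claim that dividing by n underestimates the population variance can be illustrated exactly on a toy case. The sketch below uses a hypothetical population {1, 2, 3} (not from the slides) and lists every ordered sample of size 2 drawn with replacement. It then averages the two candidate estimates over all samples, using exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population, chosen only to keep the enumeration small.
population = [1, 2, 3]
N = len(population)
mu = Fraction(sum(population), N)
pop_variance = sum((x - mu) ** 2 for x in population) / N

n = 2
# Every ordered sample of size n drawn with replacement (equally likely).
samples = list(product(population, repeat=n))


def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)


# Average estimate over all samples when dividing by n and by n - 1.
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print("population variance:", pop_variance)
print("average when dividing by n:    ", avg_div_n)
print("average when dividing by n - 1:", avg_div_n_minus_1)
```

In this toy case the average of the n − 1 version equals the population variance, while the average of the n version falls below it.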

EXERCISES

1) The following values are the number of customers a restaurant served for lunch on ten consecutive days:
46 50 51 60 62 64 72 41 53 55
Suppose that this data set is a random sample of 10 days taken from the restaurant’s records. Calculate the estimated population variance, s², for these data.
2) The raw scores that eight students got on a history test were:
69 84 93 61 79 88 57 67
Suppose that these data are a random sample of scores on a history test. Calculate the mean, x̄, and the estimated population standard deviation, s, of these data.


● Solutions:
1) s² = 84.93.
2) x̄ = 74.75, s = 13.14.
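
Both answers can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n − 1. This is a sketch with illustrative variable names.

```python
from statistics import mean, stdev, variance

# Exercise 1: customers served for lunch on ten consecutive days.
customers = [46, 50, 51, 60, 62, 64, 72, 41, 53, 55]

# Exercise 2: raw scores of eight students on a history test.
scores = [69, 84, 93, 61, 79, 88, 57, 67]

# variance() and stdev() use the n - 1 divisor (estimated population values).
s2_customers = variance(customers)
xbar_scores = mean(scores)
s_scores = stdev(scores)

print(round(s2_customers, 2), xbar_scores, round(s_scores, 2))
```

The same module provides `pvariance` and `pstdev`, which divide by the number of observations and are the ones to use when the data form the whole population.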

THE BOX-PLOT
● The box-plot is another way of representing a data set
graphically.
● It is constructed using the quartiles, and gives a good indication
of the spread of the data set and its symmetry (or lack of
symmetry).
● It is a very useful method for comparing two or more data sets.
● The box-plot consists of a scale, a box drawn between the first
and third quartile, the median placed within the box, whiskers
on both sides of the box and outliers (if any).
● This is best illustrated using a diagram such as in the following
figure.

● The two dashed vertical lines in the figure above are the lower and upper outlier thresholds and are not normally included in a box-plot.
● The following data set was used to construct the box-plot in the above figure:
57 46 61 66 48 59 55
56 60 49 44 53 68 57
55 54 49 50 52 54 62
59 51 52 53 54 47 53


CONSTRUCTING A BOX-PLOT

● Step 1:
▪ Order the data and calculate the quartiles.
44 46 47 48 49 49 50
51 52 52 53 53 53 54
54 54 55 55 56 57 57
59 59 60 61 62 66 68
▪ Now we calculate the median, the first quartile and the third quartile.
▪ For these data, median = 54, the first quartile = 50.5 and the third quartile
= 58.
▪ With this information we can begin to construct the box-plot.


● Step 2:
▪ Draw the scale and mark the quartiles on it.
▪ Mark the median at the correct place above the scale with an asterisk, then draw a box around this asterisk with the left hand side of the box at the first quartile, 50.5, and the right hand side of the box at the third quartile, 58.
▪ This is illustrated in the following figure.


● Step 3:
▪ Calculate the interquartile range and determine the position of the outlier thresholds.
Interquartile range = third quartile − first quartile = 58 − 50.5 = 7.5.
▪ The position of the lower outlier threshold is found by subtracting the interquartile range from the first quartile, 50.5 − 7.5 = 43.
▪ The position of the upper outlier threshold is found by adding the interquartile range to the third quartile, 58 + 7.5 = 65.5. (Some texts add or subtract 1.5 × interquartile range.)
▪ We now add the outlier thresholds to our diagram. This is illustrated in the following figure.


● Step 4:
▪ Use the outlier thresholds to draw the whiskers.
▪ To draw the left hand whisker, we need the smallest data value that lies inside the outlier thresholds.
▪ In this example, it is the value 44. This is drawn on our diagram with a small cross level with the asterisk. A horizontal line is now drawn to the left hand side of the box.
▪ To draw the right hand whisker, we find the largest data value that lies inside the outlier thresholds.
▪ In this example, the value is 62. This is drawn on the right hand side of the box with a small cross and connected to the box by a horizontal line.


This is illustrated in the following figure:

CONSTRUCTING A BOX-PLOT

● Step 5:
▪ Determine the outliers and remove the outlier thresholds.
▪ Values (if any) that lie outside the outlier thresholds are called outliers. In
this example, 66 and 68 are outliers. These are placed on the diagram
using a small square or circle.
▪ Finally, the outlier thresholds are removed. The completed box-plot is
illustrated in the following figure:
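
Identifying the outliers in Step 5 amounts to keeping every value that falls outside the two thresholds. A minimal sketch, again using a hypothetical helper name (`find_outliers`) and hypothetical illustrative values rather than the data set from these notes:

```python
def find_outliers(data, lower, upper):
    """Return, in ascending order, the values lying outside the thresholds."""
    return sorted(x for x in data if x < lower or x > upper)


# Hypothetical values for illustration only (NOT the data from the notes).
sample = [44, 48, 50, 51, 53, 55, 57, 58, 60, 62, 66, 68]

# Outlier thresholds 43 and 65.5 are taken from Step 3.
print(find_outliers(sample, 43, 65.5))  # [66, 68]
```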

USING BOX-PLOTS TO COMPARE DATA SETS
● Box-plots are frequently used to compare data sets as the
differences in shape, spread and location are easily seen.
● For example, the following figure gives box-plots of the final marks in a
university preparation course, Preparatory Mathematics, for the years 1996,
1997 and 1998.

USING BOX-PLOTS TO COMPARE DATA SETS
● The marks from all years are left-skewed, but those from 1996
and 1997 quite markedly so. The 1996 marks had the highest median
score but the least spread.
● The marks from 1998 vary more than those in 1996 and 1997.
● In all years, over 75% of students passed the course.
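
A comparison like the one above can also be made numerically by computing the quartiles of each data set. The sketch below is illustrative only: the marks, the group labels and the pass mark of 50 are all assumptions, not the actual course results, and `quartile_summary` is a hypothetical helper. It uses the standard library's `statistics.quantiles`, whose quartile method may differ slightly from the hand method used in these notes.

```python
from statistics import quantiles


def quartile_summary(marks):
    """Return (Q1, median, Q3, IQR) for a list of marks."""
    q1, median, q3 = quantiles(marks, n=4, method="inclusive")
    return q1, median, q3, q3 - q1


# Hypothetical marks for illustration only (NOT the actual course results).
marks_by_year = {
    "year A": [48, 55, 60, 62, 65, 70, 72],
    "year B": [35, 50, 58, 66, 75, 80, 90],
}
PASS_MARK = 50  # assumed pass mark; not stated in the notes

for year, marks in marks_by_year.items():
    q1, median, q3, iqr = quartile_summary(marks)
    # If Q1 is at or above the pass mark, roughly 75% or more passed.
    print(year, "median:", median, "IQR:", iqr, "Q1 >= pass:", q1 >= PASS_MARK)
```

A larger interquartile range indicates more spread, and a first quartile at or above the pass mark corresponds to the statement that roughly 75% or more of students passed.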
