Chapter 2 Measures of Location

Notes for computer science.

Uploaded by

zarahrasheed1

WEEK 2

GRAPHICAL PRESENTATION OF DATA

A statistical graph is a tool used to learn about the shape or distribution of a sample or a population. A graph can be a more effective way of presenting data than a mass of numbers because we can see where the data cluster and where there are only a few data values. Newspapers and the Internet use graphs to show trends and to enable readers to compare facts and figures quickly. Statisticians often graph data first to get a picture of the data.

Graphs for Qualitative Data

There are no strict rules concerning which graphs to use. Two graphs that are used to display

qualitative data are pie charts and bar charts.

PIE CHART: Categories are represented by sectors of a circle, each proportional in size to the percent of individuals in that category. When creating a pie chart, each slice should be labeled with the category name and its percent.

Example:

The nutrients contained in a packet of healthy crisps are given below:

Nutrients Per 100g

Protein 6.1g

Fats 34.2g

Carbohydrate 48.1g

Dietary Fibre 11.6g

Vitamin and Minerals 5.5g

[Figure: Pie chart showing the nutrients per 100g. Protein 6%, Fats 32%, Carbohydrate 46%, Dietary Fibre 11%, Vitamin and Minerals 5%.]
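The percentages on a pie chart can be checked with a short calculation. The sketch below is only an illustration: the function name `pie_shares` is hypothetical, and it assumes each share is taken relative to the sum of the listed masses (which is 105.5 g here, not exactly 100 g).

```python
# Nutrient masses in grams, copied from the table above.
nutrients = {
    "Protein": 6.1,
    "Fats": 34.2,
    "Carbohydrate": 48.1,
    "Dietary Fibre": 11.6,
    "Vitamin and Minerals": 5.5,
}


def pie_shares(values):
    """Return {category: (percent of total, sector angle in degrees)}."""
    total = sum(values.values())
    return {name: (100 * v / total, 360 * v / total) for name, v in values.items()}


shares = pie_shares(nutrients)
for name, (percent, angle) in shares.items():
    print(f"{name}: {percent:.0f}% ({angle:.1f} degrees)")
```

Each sector angle is simply the category's share of 360 degrees, which is what a protractor-based drawing of the chart would use.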

BAR CHART: The length of the bar for each category is proportional to the number or percent

of individuals in each category. Bars may be vertical or horizontal. For vertical bars, the

categories are on the x-axis and frequency or relative frequency on the y-axis.

Example: The data below relate to people taking out mortgages, by type of property. Draw an appropriate bar chart for the "All Buyers" information.

Type All Buyers

Bungalow 10

Detached house 19

Semi-detached 31

Terraced house 31

Purpose built flat 7

Converted flat 3

[Figure: Bar chart showing the buyers' information. x-axis: type of property (Bungalow, Detached house, Semi-detached, Terraced house, Purpose built flat, Converted flat); y-axis: All Buyers, scaled from 0 to 35.]
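As a rough illustration of the idea that bar length is proportional to frequency, the following sketch prints a horizontal text bar chart of the mortgage data. The helper `text_bar_chart` is a hypothetical name, and using one `#` per buyer is just a convenient scale.

```python
# Frequencies from the "All Buyers" table above.
buyers = {
    "Bungalow": 10,
    "Detached house": 19,
    "Semi-detached": 31,
    "Terraced house": 31,
    "Purpose built flat": 7,
    "Converted flat": 3,
}


def text_bar_chart(freqs, symbol="#"):
    """Return one text line per category; bar length equals the frequency."""
    width = max(len(name) for name in freqs)
    return [f"{name:<{width}} | {symbol * count} {count}" for name, count in freqs.items()]


chart_lines = text_bar_chart(buyers)
print("\n".join(chart_lines))
```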

HISTOGRAM

A histogram consists of contiguous bars with no gaps between them. It has both a horizontal axis and a vertical axis. The horizontal axis is labeled with what the data represent (for instance, distance from your home to school) and uses the class boundaries. The vertical axis is labeled either frequency or relative frequency (percent or probability).

How to construct a histogram:

1. Create class boundaries for the grouped frequency distribution. Choose a starting point for the first interval that is less than the smallest data value. A convenient starting point is a lower value carried out to one more decimal place than the value with the most decimal places. For example, if the value with the most decimal places is 6.1 and this is the smallest value, a convenient starting point is 6.05, i.e. (6.1 – 0.05 = 6.05). We say that 6.05 has more precision. If all the data happen to be integers and the smallest value is 2, then a convenient starting point is 1.5, i.e. (2 – 0.5 = 1.5). Also, when the starting point and other boundaries are carried to one additional decimal place, no data value will fall on a boundary.

2. Place frequency or relative frequency on the y-axis. Scale is important.

3. Draw bars as high as the frequency for each class interval within each boundary.

Note: The plot of frequency against interval (class boundary) is called a histogram.

Example: 100 people were asked to record how many TV programmes they watched in a week. The results obtained are shown below. Draw a histogram to illustrate the data.

No of TV programmes No of Viewers

5 - 9 3

10 - 14 16

15 - 19 36

20 - 24 21

25 - 29 12

30 - 34 9

35 - 39 3

40 - 44 0

[Figure: Histogram showing the number of viewers. x-axis: class boundaries 4.5 - 9.5, 9.5 - 14.5, 14.5 - 19.5, 19.5 - 24.5, 24.5 - 29.5, 29.5 - 34.5, 34.5 - 39.5, 39.5 - 44.5; y-axis: No of Viewers, scaled from 0 to 40.]
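The class boundaries used on the horizontal axis can be derived from the class limits in the table. The sketch below assumes integer-valued data, so each limit is moved by half a unit (0.5); the function name `class_boundaries` and the text rendering of the bars are illustrative choices only.

```python
# Class limits and frequencies from the TV programmes table above.
classes = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]
freqs = [3, 16, 36, 21, 12, 9, 3, 0]


def class_boundaries(limits, unit=1):
    """Widen each class by half a measurement unit so adjacent bars touch."""
    half = unit / 2
    return [(lo - half, hi + half) for lo, hi in limits]


boundaries = class_boundaries(classes)
for (lo, hi), f in zip(boundaries, freqs):
    # One '#' per viewer gives a rough sideways histogram.
    print(f"{lo:>4} - {hi:<4} | {'#' * f} {f}")
```

Because the upper boundary of one class equals the lower boundary of the next, no integer data value can fall on a boundary.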

Stem-and-Leaf Graph:

The stem-and-leaf graph, or stem-plot, comes from the field of exploratory data analysis. It is a good choice when the data set is small.

To create the stem-and-leaf plot:

1. Arrange the data in ascending order of magnitude.

2. Divide each observation into a stem and a leaf. The leaf consists of the final significant digit. For instance, the number 23 has stem 2 and leaf 3. The number 432 has stem 43 and leaf 2. Likewise, the number 5,432 has stem 543 and leaf 2. The decimal 9.3 has stem 9 and leaf 3.

3. Write the stems in a vertical line from smallest to largest. Draw a vertical line to the right of

the stems. Then write the leaves in increasing order next to their corresponding stem.

Example 1

Consider the following data which give the marks scored by 14 pupils in a quiz test.

27, 36, 24, 17, 35, 18, 23, 25, 34, 25, 41, 18, 22, 24

Solution

Arranging in order of magnitude: 17, 18, 18, 22, 23, 24, 24, 25, 25, 27, 34, 35, 36, 41

Stem Leaf

1 7, 8, 8

2 2, 3, 4, 4, 5, 5, 7

3 4, 5, 6

4 1
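The three steps can be sketched in code as follows. This is a minimal illustration that assumes non-negative whole numbers, with the last digit as the leaf; `stem_and_leaf` is a hypothetical helper name.

```python
def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    ordered = sorted(values)  # step 1: ascending order
    # Include every stem between the smallest and largest, even if it has no leaves.
    plot = {stem: [] for stem in range(ordered[0] // 10, ordered[-1] // 10 + 1)}
    for v in ordered:  # step 2: split each value into stem and leaf
        plot[v // 10].append(v % 10)
    return plot


marks = [27, 36, 24, 17, 35, 18, 23, 25, 34, 25, 41, 18, 22, 24]
plot = stem_and_leaf(marks)
for stem, leaves in plot.items():  # step 3: stems in a column, leaves to the right
    print(f"{stem:>2} | {', '.join(str(leaf) for leaf in leaves)}")
```

For data with one decimal place, one option is to multiply each value by 10 and round before calling the function, so that the decimal digit becomes the leaf.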

Example 2

The data are the distances (in km) from a home to local supermarkets. Create a stem-plot using

the data:

1.1; 1.5; 2.3; 2.5; 2.7; 3.2; 3.3; 3.3; 3.5; 3.8; 4.0; 4.2; 4.5; 4.5; 4.7; 4.8; 5.5; 5.6; 6.5; 6.7;

12.3

Do the data seem to have any concentration of values?

Solution

The given data are already arranged in ascending order of magnitude. Taking the whole-number part as the stem and the decimal digit as the leaf, the stem-and-leaf diagram is:

Stem Leaf

1 1, 5

2 3, 5, 7

3 2, 3, 3, 5, 8

4 0, 2, 5, 5, 7, 8

5 5, 6

6 5, 7

7

8

9

10

11

12 3

From the diagram, the values are concentrated around 3 and 4 kilometres. The value 12.3 is an outlier (outliers are extreme values of a distribution).

MEASURES OF LOCATION/ CENTRAL TENDENCY

Measures of location, also known as measures of central tendency, are used to describe the centre of a data set. The most widely used measures of the centre of a data set are the MEAN (i.e. the average), the MEDIAN and the MODE. To calculate the mean weight of 10 students, add the weights of the 10 students together and divide by 10. To find the median weight of the 10 students, order the data and find the number that splits the data into two equal parts. The median is generally a better measure of the centre when there are extreme values or outliers because it is not affected by the precise numerical values of the outliers. The mean is the most common measure of the centre. The mode is the data value with the highest frequency, i.e. the value that occurs most often.

ARITHMETIC MEAN

Adding all the observations and dividing the sum by the number of observations gives the arithmetic mean. The symbol used to represent the sample mean is an x with a bar over it (pronounced “x bar”), i.e. $\bar{x}$. The Greek letter $\mu$ (pronounced "mew") represents the population mean. Symbolically, the arithmetic mean, also called the mean, is given as

$$\bar{x} = \frac{\sum x}{n}$$

where $x$ = values of the observations, and $n$ = number of observations.

Examples:

The number of books checked out from the library by 15 students is as follows:

7, 0, 5, 1, 2, 6, 1, 2, 4, 0, 3, 5, 6, 3, 8

$$\bar{x} = \frac{\sum x}{n} = \frac{53}{15} = 3.533$$

The formula given above is the basic definition of the arithmetic mean for ungrouped data.
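The library example can be reproduced in a few lines. The sketch below computes the mean directly from the definition and, as a cross-check, with `statistics.mean` from the Python standard library.

```python
from statistics import mean

# Number of books checked out by each of the 15 students.
books = [7, 0, 5, 1, 2, 6, 1, 2, 4, 0, 3, 5, 6, 3, 8]

# Definition: sum of the observations divided by the number of observations.
x_bar = sum(books) / len(books)

print(f"sum = {sum(books)}, n = {len(books)}, mean = {x_bar:.3f}")
print(f"statistics.mean gives {mean(books):.3f}")
```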

GROUPED DATA-ARITHMETIC MEAN

For grouped data, the arithmetic mean may be calculated using

$$\bar{x} = \frac{\sum fx}{\sum f},$$

where x is the mid-point of each class and f is the frequency of each class. The calculation of the arithmetic mean is shown below.
Example

The table below gives the marks of 58 students in Statistics. Calculate the average marks of this

group.

Marks No of Students

0 – 9 3

10 – 19 8

20 – 29 11

30 – 39 15

40 – 49 12

50 – 59 6

60 – 69 2

70 – 79 1

Solution:

Marks Midpoint (x) No of Students (f) fx

0-9 4.5 3 13.5

10 – 19 14.5 8 116

20 – 29 24.5 11 269.5

30 - 39 34.5 15 517.5

40 – 49 44.5 12 534

50 - 59 54.5 6 327

60 – 69 64.5 2 129

70 - 79 74.5 1 74.5

Total Σf = 58 Σfx = 1981

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1981}{58} = 34.155,$$ approximately 34 marks.

It may be noted that the mid-point of each class is taken as a good approximation of the true mean of each class interval.
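The grouped-mean calculation can be sketched as below. The frequencies are those of the solution table (which sum to 58), and `grouped_mean` is a hypothetical helper name.

```python
# Class limits and frequencies as used in the solution table.
classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [3, 8, 11, 15, 12, 6, 2, 1]


def grouped_mean(limits, frequencies):
    """Approximate the mean of grouped data using class mid-points."""
    midpoints = [(lo + hi) / 2 for lo, hi in limits]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return total_fx / sum(frequencies)


print(f"mean = {grouped_mean(classes, freqs):.3f}")
```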

CALCULATION OF ARITHMETIC MEAN USING ASSUMED MEAN

When the values are extremely large and/or fractional, the use of the above formula could be cumbersome. The arithmetic mean formula using the assumed mean method is given below:

$$\bar{x} = A + \frac{\sum fd}{\sum f}$$

where A = assumed mean,

f = frequency, and

d = deviation from the assumed mean (d = x - A).

Any value may be chosen as the assumed mean, but to simplify the calculations it is best to avoid extreme values, that is, values that are too small or too high. A value close to the arithmetic mean should be chosen.

Example:

Consider the example given above; calculate the average marks obtained by 58 students using

assumed mean method.

Marks Mid-point (x) Frequency (f) d = x - A fd

0 – 9 4.5 3 -30.5 -91.5

10 – 19 14.5 8 -20.5 -164

20 – 29 24.5 11 -10.5 -115.5

30 – 39 34.5 15 -0.5 -7.5

40 – 49 44.5 12 9.5 114

50 – 59 54.5 6 19.5 117

60 – 69 64.5 2 29.5 59

70 – 79 74.5 1 39.5 39.5

Total Σf = 58 Σfd = -49

Note that we have taken the arbitrary assumed mean A = 35 (a value close to the mid-point of the class with the highest frequency) and the deviations of the mid-points from it (d = x - A). In other words, the assumed mean has been subtracted from each mid-point value and the result is shown in the column labeled d.

$$\bar{x} = A + \frac{\sum fd}{\sum f} = 35 + \frac{-49}{58} = 35 + (-0.845) = 34.155,$$

approximately 34 marks, which agrees with the result obtained from the direct formula.
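A useful property of the assumed mean method is that, when the arithmetic is exact, the answer does not depend on the choice of A and should agree with the direct formula (Σfx / Σf). The sketch below illustrates this; `assumed_mean_method` is a hypothetical helper name.

```python
# Mid-points and frequencies of the eight classes in the marks example.
midpoints = [4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5]
freqs = [3, 8, 11, 15, 12, 6, 2, 1]


def assumed_mean_method(xs, fs, assumed):
    """Mean via deviations d = x - A from an assumed mean A."""
    total_fd = sum(f * (x - assumed) for f, x in zip(fs, xs))
    return assumed + total_fd / sum(fs)


direct = sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)
for A in (35, 34.5, 0):
    print(f"A = {A}: mean = {assumed_mean_method(midpoints, freqs, A):.3f}")
print(f"direct formula: {direct:.3f}")
```

Choosing A near the centre only keeps the deviations small for hand calculation; it does not change the result.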

MEDIAN

The median is defined as the value of the middle item (or the mean of the values of the two middle items) when the data are arranged in ascending or descending order of magnitude.

Thus, in an ungrouped frequency distribution if the n values are arranged in ascending or

descending order of magnitude, the median is the middle value if n is odd. When n is even, the

median is the mean of the two middle values. It is denoted by

Example:

Suppose we have the following data:

15, 19, 20, 7, 10, 3, 18, 25 and 5

We first have to arrange the data in either ascending or descending order. In ascending order the figures are:

3, 5, 7, 10, 15, 18, 19, 20, 25

The number of observations is odd. To find the position of the middle item, we use the formula

$$\frac{n + 1}{2}$$

where n is the number of items. In this case, n = 9 and n + 1 = 9 + 1 = 10, so the position is 10/2 = 5.

This implies that the 5th item among those arranged in ascending order is the median. This happens to be 15.

Suppose instead that the observations are: 3, 5, 7, 10, 15, 18, 19, 20, 22, 25.

The observations are already arranged in ascending order of magnitude. Applying the above formula with n = 10, the median occupies position (10 + 1)/2 = 5.5. Here, we have to take the average of the values of the 5th and 6th items. This means an average of 15 and 18, i.e. (15 + 18)/2, which gives the median as 16.5.

Note that (n + 1)/2 is not the formula for the median; it merely indicates the position of the median, namely, the number of items we have to count until we arrive at the item whose value is the median.
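Both cases (odd and even n) can be sketched in one small function. The name `median_of` is hypothetical; note that Python lists are indexed from 0, so the item at 1-based position (n + 1)/2 sits at index n // 2 when n is odd.

```python
def median_of(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # 1-based position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle items


odd_data = [15, 19, 20, 7, 10, 3, 18, 25, 5]
even_data = [3, 5, 7, 10, 15, 18, 19, 20, 22, 25]
print(median_of(odd_data))
print(median_of(even_data))
```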

MEDIAN FROM A GROUPED DATA

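The formula below is used to calculate the median from grouped data:

$$\text{Median} = L_1 + \left( \frac{\frac{N}{2} - CF_b}{F_m} \right) \times C$$

where L1 = lower class boundary of the median class,

N = total frequency,

CFb = cumulative frequency of the class before the median class,

Fm = frequency of the median class, and

C = class size/width.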

Example:

Using the data in the table below, calculate the median.

Class 5 - 9 10 - 14 15 - 19 20 - 24 25 - 29 30 - 34 35 - 39 40 - 44

Frequency 3 8 11 15 12 6 2 1
Solution

Class Frequency(F) Cumulative frequency(CF)

5–9 3 3

10 – 14 8 11

15 – 19 11 22

20 – 24 15 37

25 – 29 12 49

30 -34 6 55

35 – 39 2 57

40 - 44 1 58

Total N= 58

First, determine the class in which the median lies, i.e. the class containing the item at position

$$\frac{N}{2} = \frac{58}{2} = 29$$

From the cumulative frequency column, locate where 29 lies. It is observed that it lies in the 20 – 24 class. Thus, the lower class boundary of the median class is calculated as

L1 = 20 – 0.5 = 19.5

CFb = 22

Fm = 15

C = 5, which is obtained by simple counting: the interval 5 – 9 contains 5, 6, 7, 8, 9, i.e. 5 values, and the same applies to the other classes.

$$\text{Median} = 19.5 + \left( \frac{29 - 22}{15} \right) \times 5$$

= 19.5 + 2.333

= 21.833
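The same steps can be sketched in code. The function below is an illustration under stated assumptions: classes have integer limits one unit apart (so boundaries are found by subtracting 0.5), and `grouped_median` is a hypothetical helper name.

```python
classes = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]
freqs = [3, 8, 11, 15, 12, 6, 2, 1]


def grouped_median(limits, frequencies):
    """Median = L1 + ((N/2 - CFb) / Fm) * C for integer-limit classes."""
    half = sum(frequencies) / 2  # N / 2
    cum_before = 0  # CFb: cumulative frequency before the current class
    for (lo, hi), f in zip(limits, frequencies):
        if f > 0 and cum_before + f >= half:  # first class whose CF reaches N/2
            lower_boundary = lo - 0.5  # L1
            width = hi - lo + 1  # C
            return lower_boundary + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("no observations")


print(f"median = {grouped_median(classes, freqs):.3f}")
```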

MODE

The mode is another measure of central tendency. It is the value that occurs most often in a distribution or data set.

Example: Considering the following data, find the mode.

6, 8, 5, 11, 7, 9, 4, 9

There are eight observations in the data above, and 9 appears most often (twice). Therefore the mode is 9. The data above are ungrouped.
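Counting occurrences is easy to automate. The sketch below uses `collections.Counter` from the Python standard library and returns every value that shares the highest frequency, since a data set may have more than one mode; `modes` is a hypothetical helper name.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


data = [6, 8, 5, 11, 7, 9, 4, 9]
print(modes(data))
```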

MODE FROM GROUPED DATA

To determine the mode from grouped data, the formula is given as

$$\text{Mode} = L_{mod} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) \times C$$

where

Lmod = lower class boundary of the modal class,

$\Delta_1$ = difference between the frequency of the modal class and that of the class before it,

$\Delta_2$ = difference between the frequency of the modal class and that of the class after it, and

C = class size/width.

Example

Class 11 - 20 21 - 30 31 - 40 41 - 50 51 - 60 61 - 70

Frequency 6 20 12 10 9 9

Using the data above, calculate the mode.

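Solution:

The modal class is the class with the highest frequency, i.e. class 21 – 30 with a frequency of 20. Thus, the lower class boundary is calculated as

Lmod = 21 – 0.5 = 20.5

$\Delta_1$ = 20 – 6 = 14 (frequency of the modal class minus that of the class before it)

$\Delta_2$ = 20 – 12 = 8 (frequency of the modal class minus that of the class after it)

C = 10, i.e. the class 11 – 20 contains 10 values.

$$\text{Mode} = 20.5 + \left( \frac{14}{14 + 8} \right) \times 10 = 20.5 + 6.36$$

= 26.86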
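The grouped-mode formula can be checked numerically with a short sketch. It assumes integer class limits one unit apart, and it treats a missing neighbouring class (when the modal class is first or last) as having frequency 0, which is an assumption of this illustration rather than something stated in the notes; `grouped_mode` is a hypothetical helper name.

```python
classes = [(11, 20), (21, 30), (31, 40), (41, 50), (51, 60), (61, 70)]
freqs = [6, 20, 12, 10, 9, 9]


def grouped_mode(limits, frequencies):
    """Mode = Lmod + (d1 / (d1 + d2)) * C for integer-limit classes."""
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])  # modal class index
    lo, hi = limits[i]
    before = frequencies[i - 1] if i > 0 else 0  # assumption: no class means frequency 0
    after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - before  # modal frequency minus the one before
    d2 = frequencies[i] - after  # modal frequency minus the one after
    return (lo - 0.5) + d1 / (d1 + d2) * (hi - lo + 1)


print(f"mode = {grouped_mode(classes, freqs):.3f}")
```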
