2. Describing data with tables
Medical Statistics from Scratch: An Introduction for Health Professionals (3rd Ed.) by David Bowers
Descriptive statistics: What can we do with raw data?
• Raw data, as in Figure 1.1 (birthweight) or Figure 1.2 (gender), cannot easily
tell their story, and of course, the more data there are, the harder it
becomes to answer the questions that we may have.
• We are going to describe some methods for organizing and presenting the
data, so that we can answer more easily the questions of interest.
• An important consideration in this process is the type of data you are working
with.
• Some types of data are best described with a table, some with a chart and
some perhaps with both, whereas with other types of data, a numeric
summary might be more appropriate.
• In this chapter, we focus on organising raw data into what is known as a
frequency table.
• It will be easier if we take each data type in turn, starting with nominal
data.
1. Frequency tables – nominal data
• Consider Figure 1.2 in Chapter 1, with a count of male and female
babies.
• Male = 265
• Female = 235
The label at the top of the first (left-hand) column indicates the variable being described in the table. The remainder of
the first column is a list of the categories for this variable.
The second (right-hand) column is the frequency column. Frequency is another word for ‘count’ and lists, in this example,
the number of babies in each category, that is, males and females.
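As a minimal sketch of how such a table can be built from raw nominal data, the Python below counts the categories and prints a two-column table. The list raw_sex is rebuilt from the two counts quoted above (the order of observations is an assumption and does not matter), and the function name frequency_table is illustrative only.

```python
from collections import Counter

# Raw nominal data rebuilt from the counts quoted above (265 males, 235 females).
# Only the counts come from the text; the ordering of observations is arbitrary.
raw_sex = ["Male"] * 265 + ["Female"] * 235


def frequency_table(values):
    """Return (category, frequency) rows, most frequent category first."""
    return Counter(values).most_common()


table = frequency_table(raw_sex)
total = sum(freq for _, freq in table)

# First column: the variable and its categories. Second column: the frequencies.
header = f"Frequency (n={total})"
print(f"{'Sex':<10}{header:>18}")
for category, freq in table:
    print(f"{category:<10}{freq:>18}")
```

Showing the total in the header of the frequency column follows the advice given later in this chapter.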
a) The frequency distribution
• Consider another example: Figure 1.9 contains data from a nit lotion
study that compared two types of treatment for nits, malathion or
d-phenothrin, with a sample of 95 children.
• For each child, data were collected on nine variables, one being the child’s
hair colour: blonde, brown, red and dark.
• The frequency table (extracted from Figure 1.9) for the four colour
categories is shown in Figure 2.2.
• Notice that the total frequency (n = 95) is shown at the top of
the frequency column.
• You should always do this – it is helpful to any reader.
• Taken as a whole, Figure 2.2 tells us how the hair colour of each of
the 95 children is distributed across the four colour categories.
• In other words, Figure 2.2 describes the frequency distribution of the hair
colour data.
• We can see that the most common hair colour is brown and the
least common red.
b) Relative or percentage frequency
• Often of more use than the actual number of individuals in
each category are the percentages.
• Tables with this information are called relative or percentage
frequency tables.
• The third column of Figure 2.3 shows the percentage of children
in each hair colour category.
Figure 2.3 tells us that over half of the children (51.6 per cent) had brown hair.
This seems to be more helpful than knowing that 49 out of 95 children had brown hair.
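The percentage for a category is its frequency divided by the total frequency, multiplied by 100. A minimal Python sketch of that calculation follows. It uses the male and female counts quoted earlier in this chapter and the single brown-hair figure above; the function name relative_frequencies is illustrative only.

```python
def relative_frequencies(counts):
    """Map each category to its percentage of the total (1 decimal place)."""
    total = sum(counts.values())
    return {category: round(100 * freq / total, 1)
            for category, freq in counts.items()}


# Counts quoted earlier in the chapter for the 500 babies.
sex_counts = {"Male": 265, "Female": 235}
print(relative_frequencies(sex_counts))  # {'Male': 53.0, 'Female': 47.0}

# The single figure quoted above: 49 of the 95 children had brown hair.
brown_pct = round(100 * 49 / 95, 1)
print(brown_pct)  # 51.6
```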
Students’ task…..!
Figure 2.4 shows the frequency distribution for cause of blunt injury to limbs in 75 patients,
taken from a study of the treatment of pain after limb injury. Calculate the relative
frequencies. What percentage of patients had crush injuries?
Here is the answer…!
2. Frequency tables – ordinal data
• When the data in question are ordinal, we can allocate them into ordered
categories.
The frequency values indicate that more than half of the patients were happy with
their psychiatric nursing care, 282 patients (121+161) out of 475.
Much smaller numbers expressed dissatisfaction (51+52).
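Because ordinal categories are ordered, adjacent categories can be combined and expressed as a share of the total. The short Python check below reproduces the arithmetic above using only the counts quoted in the text; the variable names are illustrative.

```python
def percentage(count, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


n = 475                 # total number of patients quoted above
satisfied = 121 + 161   # the two categories expressing satisfaction
dissatisfied = 51 + 52  # the two categories expressing dissatisfaction

print(satisfied, percentage(satisfied, n))        # 282 59.4
print(dissatisfied, percentage(dissatisfied, n))  # 103 21.7
```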
Students’ task…..!
In a study comparing two treatments for a whiplash injury, one group of patients received the usual emergency department
care (normal consultation plus an advice leaflet) and the other group received ‘active management’ care (normal consultation
plus additional help). Twelve months after the initial contact, the patients were asked to rate the benefits they felt from their
treatment. The results are shown in Figure 2.6 for each group (the group with missing values has been omitted).
A) What percentage of patients felt ‘much better’ in each group?
B) What percentage felt ‘much worse’?
C) How do you think that the missing values might affect the reliability of results in general?
Here is the answer…!
A. Much better: 26.3 per cent and 30.3 per cent
a) Cumulative frequency
• The answer to the question, ‘What was the percentage of patients receiving
the biolimus-eluting stent that had lesions that required fewer than three
stents?’, is thus 89.38 per cent.
• We can also easily calculate the percentage of patients who had lesions that
required three or more stents as 100 – 89.38 = 10.62 per cent.
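A cumulative frequency is a running total: each row adds its own frequency to those of all the rows above it. The Python sketch below shows the idea. The counts in stent_counts are HYPOTHETICAL and chosen only for illustration; they are not the data from the stent study, so the percentages differ from those quoted in the text.

```python
from itertools import accumulate

# HYPOTHETICAL number of patients by number of stents (not the study's data).
stent_counts = {1: 50, 2: 30, 3: 15, 4: 5}

values = sorted(stent_counts)                # ordered values: 1, 2, 3, 4
freqs = [stent_counts[v] for v in values]
total = sum(freqs)

cum_freqs = list(accumulate(freqs))          # running totals
cum_pcts = [round(100 * c / total, 2) for c in cum_freqs]

for v, f, cf, cp in zip(values, freqs, cum_freqs, cum_pcts):
    print(f"{v:>6}{f:>6}{cf:>6}{cp:>8.2f}")

# 'Fewer than three stents' is the cumulative percentage at the value 2.
fewer_than_three = cum_pcts[values.index(2)]
three_or_more = round(100 - fewer_than_three, 2)
print(fewer_than_three, three_or_more)  # 80.0 20.0
```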
4. Frequency tables with continuous metric data – grouping the raw data
• Constructing frequency tables for continuous metric data is often more of a
problem than with discrete data because, as we saw in
Chapter 1, the number of possible values which the data can take is infinite
(recall the clock-face analogy).
• Organising raw metric continuous data (such as the birthweight data shown in
Figure 1.1) into a frequency table is usually impractical because there are such a
large number of possible values. Indeed, there may well be no value that occurs
more than once – particularly true if the values have decimal places.
• This means that the corresponding frequency table is likely to have a large, and
thus unhelpful, number of rows. Not of much help in uncovering any pattern in
the data therefore!
• The most useful approach with metric continuous data is to group them first and
then construct a frequency distribution of the grouped data. Let’s see how this
works.
• The choice of the number of groups is arbitrary but you do not want too few groups
(too much information is lost) or too many (not much more helpful than the raw
data). Experience will help but as a very rough rule of thumb, no fewer than five
groups and no more than 10.
• Of course, particular circumstances may cause these values to vary. For the first 100
values of the birthweight data in Figure 1.1, I have chosen seven groups as shown in
column 1 and determined the number of birthweights in each group – these values
are shown in column 2 (see Figure 2.12).
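The grouping step can be sketched in Python as below. The birthweights, the starting value of 2700 g and the group width of 300 g are all HYPOTHETICAL choices made for illustration; they are not the values from Figure 1.1 or Figure 2.12. Only the idea of seven equal-width groups comes from the text.

```python
# HYPOTHETICAL birthweights in grams (not the data from Figure 1.1).
birthweights = [2860, 3120, 3340, 3410, 3570, 3650,
                3720, 3890, 4010, 4230, 4560, 3480]

lower, width, n_groups = 2700, 300, 7  # assumed group boundaries

counts = [0] * n_groups
for w in birthweights:
    index = (w - lower) // width       # which group this value falls into
    if not 0 <= index < n_groups:
        raise ValueError(f"{w} g lies outside the chosen groups")
    counts[index] += 1

# Column 1: the groups. Column 2: the number of birthweights in each group.
print(f"{'Birthweight (g)':<18}{'Frequency (n=' + str(len(birthweights)) + ')':>18}")
for i, count in enumerate(counts):
    start = lower + i * width
    label = f"{start}-{start + width - 1}"
    print(f"{label:<18}{count:>18}")
```

Changing width and n_groups shows the trade-off described above: fewer, wider groups hide detail, while many narrow groups are little better than the raw data.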
Open-ended groups
• One problem arises when one or two values are a long way from the general
mass of the data, either much lower or much higher.
• These values are called outliers.
• Their presence can mean having a lot of empty or near-empty rows at one or both ends
of the frequency table.
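An open-ended group has no lower limit (first group) or no upper limit (last group), so an outlier is absorbed into an end group instead of forcing extra rows. The Python sketch below illustrates this with HYPOTHETICAL birthweights and assumed cut-points; neither is taken from the book's figures.

```python
from bisect import bisect_right

# HYPOTHETICAL birthweights in grams, including one very low and one very high value.
birthweights = [1450, 2860, 3120, 3340, 3410, 3570, 3650, 3720, 3890, 5600]

# Assumed cut-points; the first and last groups are open-ended.
edges = [3000, 3500, 4000]
labels = ["<3000", "3000-3499", "3500-3999", ">=4000"]

counts = [0] * len(labels)
for w in birthweights:
    # bisect_right counts the cut-points that are <= w,
    # which is exactly the index of the group containing w.
    counts[bisect_right(edges, w)] += 1

for label, count in zip(labels, counts):
    print(f"{label:<12}{count:>4}")
```

With closed groups of the same width, the values 1450 and 5600 would need several empty rows at each end of the table.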
• Sometimes, however, you will want to examine the association between two
variables, within a single group of individuals.
• You can do this by putting the data into a contingency table, also called a table of
cross-tabulations.
• In these tables, the rows represent the categories of one variable, usually an ‘outcome’
of some sort (e.g. a diagnosis of lung cancer – Yes or No), and the columns represent
the groups within a second variable (e.g. smokers and non-smokers).
Cross-tabulation – contingency tables
• To illustrate this idea, look at Figure 2.16. This is a contingency table of the
cross-tabulation of the variable ‘smoked while pregnant’ (Yes or No), against
three categories of the variable ‘birthweight’: <2500 g, 2500 g–3999 g, and
≥4000 g, for a random sample of 500 newborn babies.
• Here, the outcome (the rows) is birthweight, and the groups (the columns) are the
mothers who smoked while pregnant, and those who didn’t.
• This table would be called a 3 × 2 table because there are three rows and two
columns, although tables with more rows and columns are not unusual.
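A cross-tabulation counts how many individuals fall into each combination of the two variables. The Python sketch below builds such a table with birthweight categories as rows and smoking status as columns. The records are HYPOTHETICAL (ten made-up babies, not the sample of 500), and the helper weight_category is an illustrative name.

```python
from collections import Counter


def weight_category(grams):
    """Assign a birthweight to one of the three categories used above."""
    if grams < 2500:
        return "<2500 g"
    if grams < 4000:
        return "2500-3999 g"
    return ">=4000 g"


# HYPOTHETICAL records: (smoked while pregnant, birthweight in grams).
records = [("Yes", 2300), ("Yes", 3100), ("Yes", 2900), ("Yes", 4100),
           ("Yes", 2450), ("No", 3500), ("No", 4200), ("No", 3800),
           ("No", 2400), ("No", 3300)]

# Count each (row category, column group) pair.
cells = Counter((weight_category(g), smoked) for smoked, g in records)

rows = ["<2500 g", "2500-3999 g", ">=4000 g"]  # outcome: birthweight
cols = ["Yes", "No"]                           # groups: smoked while pregnant
table = [[cells[(r, c)] for c in cols] for r in rows]

print(f"{'Birthweight':<14}{'Yes':>5}{'No':>5}")
for r, row in zip(rows, table):
    print(f"{r:<14}{row[0]:>5}{row[1]:>5}")
```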
Thank you