
2. Describing data with tables

biostatistics lecture describing data with tables


Describing data with tables

Aamir Raoof Memon


DPT (RIU, Isb), Mphil* (IIRS, Isb)
Assistant Professor, IPRS, PUMHS SBA

Medical Statistics from Scratch: An Introduction for Health Professionals (3rd Ed.) by David Bowers
Descriptive statistics: What
can we do with raw data?
• Raw data, as in Figure 1.1 (birthweight) or Figure 1.2 (gender), cannot
tell their story on their own, and the more data there are, the harder it
becomes to answer the questions that we may have.

• We are going to describe some methods for organizing and presenting the
data, so that we can answer more easily the questions of interest.

• Collectively, these methods are called descriptive statistics


Descriptive statistics: What
can we do with raw data?
• These methods are a set of procedures that we can apply to raw data, so that
its principal characteristics and main features are revealed.
• This might include sorting the data by size, putting it into a table,
presenting it as a chart, or summarising it numerically.

• An important consideration in this process is the type of data you are working
with.
• Some types of data are best described with a table, some with a chart and
some perhaps with both, whereas with other types of data, a numeric
summary might be more appropriate.
Descriptive statistics: What
can we do with raw data?
• In this chapter, we focus on organising raw data into what is known as a
frequency table.

• It will be easier if we take each data type in turn, starting with nominal
data.
1. Frequency tables –
nominal data
• Consider Figure 1.2 in Chapter 1, with a count of male and female
babies.
• Male = 265
• Female = 235

• We can express this information in the more conventional form of a
frequency table, as in Figure 2.1.
1. Frequency tables –
nominal data

The label at the top of the first (left-hand) column indicates the variable being described in the table. The remainder of
the first column is a list of the categories for this variable.
The second (right-hand) column is the frequency column. Frequency is another word for ‘count’ and lists, in this example,
the number of babies in each category, that is, males and females.
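
As a rough illustration of the two-column layout just described, the short Python sketch below rebuilds a frequency table from the counts quoted earlier (265 male, 235 female). The column label "Sex" and the function name frequency_table are assumptions made for this example; they are not taken from Figure 2.1.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, frequency) rows, most frequent category first."""
    return Counter(values).most_common()


# Rebuild the raw nominal data from the counts quoted in the lecture.
sexes = ["Male"] * 265 + ["Female"] * 235
table = frequency_table(sexes)

# First column: the variable and its categories. Second column: the counts.
header = f"Frequency (n={len(sexes)})"
print(f"{'Sex':<8}{header:>18}")
for category, count in table:
    print(f"{category:<8}{count:>18}")
```

The total (n=500) is printed in the header of the frequency column so that the reader can see the sample size at a glance.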
a) The frequency
distribution
• Consider another example: Figure 1.9 contains data from a nit lotion
study that compared two types of treatment for nits, malathion or d-
phenothrin, with a sample of 95 children.
• For each child, data were collected on nine variables, one being the child’s
hair colour: blonde, brown, red and dark.

• The frequency table (extracted from Figure 1.9) for the four colour
categories is shown in Figure 2.2.
a) The frequency
distribution
a) The frequency
distribution
• Notice that the total frequency (n=95) is shown at the top of
the frequency column.
• You should always do this – it is helpful to any reader.

• Taken as a whole, Figure 2.2 tells us how the hair colour of each of
the 95 children is distributed across the four colour categories.
• In other words, Figure 2.2 describes the frequency distribution of the hair
colour data.
• We can see that the most common hair colour is brown and the
least common red.
b) Relative or percentage
frequency
• Often of more use than the actual number of individuals in
each category are the percentages.
• Tables with this information are called relative or
percentage frequency tables.
• The third column of Figure 2.3 shows the percentage of children
in each hair colour category.
b) Relative or percentage
frequency

Figure 2.3 tells us that over half of the children (51.6 per cent) had brown hair.
This seems to be more helpful than knowing that 49 out of 95 children had brown hair.
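
A percentage frequency is simply the category count divided by the total, multiplied by 100. The sketch below is one possible way to compute it in Python. The function name percentage_frequencies is hypothetical, and only counts quoted in this lecture are used (the baby counts and the 49 brown-haired children out of 95).

```python
def percentage_frequencies(frequencies, decimals=1):
    """Map each category to its percentage of the total frequency."""
    total = sum(frequencies.values())
    return {category: round(100 * count / total, decimals)
            for category, count in frequencies.items()}


# Counts of male and female babies quoted earlier in the lecture.
babies = {"Male": 265, "Female": 235}
baby_percentages = percentage_frequencies(babies)
print(baby_percentages)   # {'Male': 53.0, 'Female': 47.0}

# Brown hair: 49 of the 95 children in the nit lotion study.
brown_percentage = round(100 * 49 / 95, 1)
print(brown_percentage)   # 51.6
```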
Students’ task…..!

Figure 2.4 shows the frequency distribution for cause of blunt injury to limbs in 75 patients,
taken from a study of the treatment of pain after limb injury. Calculate the relative
frequencies. What percentage of patients had crush injuries?
Here is the
answer…!
2. Frequency tables –
ordinal data
• When the data in question are ordinal, we can allocate them into ordered
categories.

• As an example, 475 psychiatric in-patients were questioned about their
level of satisfaction with their psychiatric nursing care.
• ‘Level of satisfaction’ is clearly an ordinal variable.
• ‘Satisfaction’ cannot be properly measured, and has no units, but the categories can
be meaningfully ordered, as they have been ordered here.

• The resulting data are shown in Figure 2.5.


2. Frequency tables –
ordinal data

The frequency values indicate that more than half of the patients were happy with
their psychiatric nursing care, 282 patients (121+161) out of 475.
Much smaller numbers expressed dissatisfaction (51+52).
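
The sums quoted above can be turned into percentages in the same way as before. The sketch below uses only the four counts given in the text. The category labels in Figure 2.5 are not reproduced here, so the variable names satisfied and dissatisfied are assumptions standing in for the two categories at each end of the ordered scale.

```python
total = 475

# The two pairs of counts that are summed in the text above.
satisfied = 121 + 161
dissatisfied = 51 + 52


def pct(count, total):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(satisfied, pct(satisfied, total))          # 282 59.4
print(dissatisfied, pct(dissatisfied, total))    # 103 21.7

# Patients not covered by the four counts quoted in the text.
remaining = total - satisfied - dissatisfied
print(remaining)                                 # 90
```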
Students’ task…..!

• Calculate the relative frequencies for the frequency data
shown in Figure 2.5. What percentage of patients were ‘very
dissatisfied’ with their care?
Here is the
answer…!
Another students’
task…..!

In a study comparing two treatments for a whiplash injury, one group of patients received the usual emergency department
care (normal consultation plus an advice leaflet) and the other group received ‘active management’ care (normal consultation
plus additional help). Twelve months after the initial contact, the patients were asked to rate the benefits they felt from their
treatment. The results are shown in Figure 2.6 for each group (the group with missing values has been omitted).
A) What percentage of patients felt ‘much better’ in each group?
B) What percentage felt ‘much worse’?
C) How do you think that the missing values might affect the reliability of results in general?
Here is the answer…!
A. Much better: 26.3 and 30.3

B. Much worse: 0.6 and 0.5

C. Missing values, if there are a sizeable number of them, can lead to
inaccurate conclusions.
Today’s
lecture………!
3. Frequency tables –
metric data
• We have to consider two situations here, one with discrete metric
data and the other continuous.

• We will start with the discrete data case.


3. Frequency tables with
discrete metric
data
• Remember Chapter 1: discrete metric data result from counting.
• This means that the number of possible values is limited; the number of cells in the
human body may be very large, but it is not infinite.

• Parity, for example, is a discrete metric variable and is counted as 0, 1, 2, 3
and so on.
3. Frequency tables with
discrete metric data
The parity data shown in Figure 1.6, and reproduced below as Figure 2.8 (for
convenience), have values that range from 0 to 10 (i.e. there are 11 different
possible values).
3. Frequency tables with
discrete metric data
• If our question is:
• ‘How many women in the sample had a parity of 0?’
or
• ‘How many had a parity of 1?’,
• we can very easily answer these questions, and similar questions, if we
arrange these data into a frequency table.

• The result is shown in Figure 2.9.
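
To show how raw discrete values are arranged into a frequency table, here is a minimal Python sketch. The parity values are hypothetical and invented purely for illustration; they are NOT the data in Figure 2.8 or Figure 2.9.

```python
from collections import Counter

# Hypothetical parity values for 12 women (illustration only).
parity = [0, 1, 1, 2, 0, 3, 2, 1, 0, 2, 1, 4]

counts = Counter(parity)
n = len(parity)

print(f"Parity  Frequency (n={n})  Percentage")
# List every possible value in order, including any with zero frequency.
for value in range(min(parity), max(parity) + 1):
    freq = counts[value]
    print(f"{value:>6}  {freq:>16}  {100 * freq / n:>10.1f}")

# Questions such as 'How many women had a parity of 0?' become simple lookups.
print(counts[0], counts[1])
```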


3. Frequency tables with
discrete metric
data

Students’ task: What percentage of mothers had a parity
of either 0 or 1?
a) Cumulative
frequency
• Suppose that we want to know what percentage of the patients receiving
the biolimus-eluting stent had lesions that required fewer than
three stents.

• A question like this is more easily answered if we add a percentage
cumulative frequency column to the respective frequency table.
a) Cumulative
frequency
a) Cumulative
frequency
• The procedure, using data from Figure 2.10, is as follows:

• Step 1. Calculate the cumulative frequencies by adding up successively the
values in the frequency column: 1805, 1805+553=2358, 2358+168=2526,
and so on.

• Step 2. Calculate the percentage cumulative frequencies by dividing each
cumulative frequency value by the total (2638) and then multiplying by 100.
a) Cumulative
frequency
• The results are shown in Figure 2.11.

• The answer to the question, ‘What was the percentage of patients receiving
the biolimus-eluting stent that had lesions that required fewer than three
stents?’, is thus 89.38 percent.

• We can also easily calculate the percentage of patients who had lesions that
required three or more stents as 100 – 89.38 = 10.62%.
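
Steps 1 and 2 can be sketched in a few lines of Python. The first three frequencies (1805, 553, 168) and the total (2638) come from the text, and the row labels 1, 2 and 3 stents are inferred from the quoted answer. The remaining rows of Figure 2.10 are not reproduced here, so as an assumption the other 2638 – 2526 = 112 are lumped into a single ">3" row.

```python
from itertools import accumulate

# Rows for 1, 2 and 3 stents use the frequencies quoted in Step 1.
# ASSUMPTION: the remaining 112 (2638 - 2526) are pooled into one ">3" row.
stents = ["1", "2", "3", ">3"]
frequency = [1805, 553, 168, 112]

total = sum(frequency)

# Step 1: running totals of the frequency column.
cumulative = list(accumulate(frequency))

# Step 2: each cumulative frequency as a percentage of the total.
cumulative_pct = [100 * c / total for c in cumulative]

for label, f, c, p in zip(stents, frequency, cumulative, cumulative_pct):
    print(f"{label:>3} {f:>6} {c:>6} {p:>6.1f}")

# 'Fewer than three stents' is the cumulative percentage at the '2' row.
fewer_than_three = cumulative_pct[1]
three_or_more = 100 - fewer_than_three
print(f"{fewer_than_three:.1f}% fewer than three, {three_or_more:.1f}% three or more")
```

To one decimal place this gives about 89.4 per cent and 10.6 per cent, in line with the figures quoted above.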
a) Cumulative
frequency
4. Frequency tables with
continuous metric data –
grouping the raw data
• Constructing frequency tables for continuous metric data is often more of a
problem than constructing them for discrete data because, as we saw in
Chapter 1, the number of possible values which the data can take is infinite
(recall the clock-face analogy).

• Organising raw metric continuous data (such as the birthweight data shown in
Figure 1.1) into a frequency table is usually impractical because there are such a
large number of possible values. Indeed, there may well be no value that occurs
more than once – particularly true if the values have decimal places.

• This means that the corresponding frequency table is likely to have a large, and
thus unhelpful, number of rows. Not of much help in uncovering any pattern in
the data therefore!
4. Frequency tables with
continuous metric data –
grouping the raw data
• The most useful approach with metric continuous data is to group them first and
then construct a frequency distribution of the grouped data. Let’s see how this
works.

• The choice of the number of groups is arbitrary but you do not want too few groups
(too much information is lost) or too many (not much more helpful than the raw
data). Experience will help but as a very rough rule of thumb, no fewer than five
groups and no more than 10.

• Of course, particular circumstances may cause these values to vary. For the first 100
values of the birthweight data in Figure 1.1, I have chosen seven groups as shown in
column 1 and determined the number of birthweights in each group – these values
are shown in column 2 (see Figure 2.12).
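
The grouping idea can be sketched as follows. This is only an illustration under stated assumptions: the birthweights are hypothetical (not the values in Figure 1.1), the seven equal-width groups of 300 g starting at 2400 g are chosen for the example rather than taken from Figure 2.12, and the function name grouped_frequencies is made up.

```python
from collections import Counter


def grouped_frequencies(values, start, width, n_groups):
    """Count values falling in n_groups equal-width groups from start."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_groups:
            raise ValueError(f"{v} lies outside the chosen groups")
        counts[index] += 1
    # Upper limits are labelled as lower + width - 1, assuming whole grams.
    return [(start + i * width, start + (i + 1) * width - 1, counts[i])
            for i in range(n_groups)]


# Hypothetical birthweights in grams (illustration only).
weights = [2860, 3120, 3540, 4010, 2490, 3300,
           3675, 2950, 3410, 3890, 3205, 4420]

rows = grouped_frequencies(weights, start=2400, width=300, n_groups=7)

print(f"Birthweight (g)  Frequency (n={len(weights)})")
for lower, upper, freq in rows:
    print(f"{lower}-{upper:<10}  {freq:>6}")
```

Changing width and n_groups shows the trade-off described above: very wide groups hide detail, while very narrow groups are little better than the raw data.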
4. Frequency tables with
continuous metric data –
grouping the raw data
Open-ended
groups
• One problem arises when one or two values are a long way from the general
mass of the data, either much lower or much higher.
• These values are called outliers.
• Their presence can mean having a lot of empty or near-empty rows at one or both ends
of the frequency table.

• One possible solution is to use open-ended categories. Take as an example the
parity data in Figure 2.9. We see that there are three rows with zero
frequencies. The frequency table can be re-designed to display the data more
economically if we use an open-ended category as shown in Figure 2.15.
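
One way to build an open-ended final category is to pool every value at or above a chosen cut-off into a single row. The sketch below does this in Python. The parity values and the cut-off of 5 are hypothetical, so the output is not the table in Figure 2.15, and open_ended_table is a made-up name.

```python
from collections import Counter


def open_ended_table(values, cutoff):
    """Frequency table with every value >= cutoff pooled into one final row."""
    counts = Counter(min(v, cutoff) for v in values)
    rows = [(str(v), counts[v]) for v in range(min(values), cutoff)]
    rows.append((f">={cutoff}", counts[cutoff]))
    return rows


# Hypothetical parity values with one outlier of 10 (illustration only).
parity = [0, 1, 1, 2, 0, 3, 2, 1, 0, 2, 1, 4, 5, 10]

table = open_ended_table(parity, cutoff=5)

print(f"Parity  Frequency (n={len(parity)})")
for label, freq in table:
    print(f"{label:>6}  {freq:>6}")
```

Without the open-ended row, the table would need separate rows for 6, 7, 8 and 9, all with zero frequency, just to reach the single value of 10.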
Open-ended
groups
Cross-tabulation –
contingency tables
• Each of the frequency tables above provides us with a description of the
frequency distribution of a single variable.

• Sometimes, however, you will want to examine the association between two
variables, within a single group of individuals.
• You can do this by putting the data into a contingency table, also called a table of cross-
tabulations.
• In these tables, the rows represent the categories of one variable, usually an ‘outcome’
of some sort (e.g. a diagnosis of lung cancer – Yes or No), and the columns represent
the groups within a second variable (e.g. smokers and non-smokers).
Cross-tabulation –
contingency tables
• To illustrate this idea, look at Figure 2.16. This is a contingency table of the
cross-tabulation of the variable ‘smoked while pregnant’ (Yes or No), against
three categories of the variable ‘birthweight’: <2500 g, 2500 g–3999 g, and
≥4000 g, for a random sample of 500 newborn babies.
• Here, the outcome (the rows) is birthweight, and the groups (the columns) are the
mothers who smoked while pregnant, and those who didn’t.

• This table would be called a 3 × 2 table because there are three rows and two
columns, although tables with other numbers of rows and columns are not unusual.
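
A cross-tabulation counts how many individuals fall into each combination of row category and column group. The sketch below builds one with three birthweight rows and two smoking columns, matching the layout described for Figure 2.16. The eight records are hypothetical and do not come from the sample of 500 babies, and the helper weight_group is a made-up name.

```python
from collections import Counter

# Hypothetical (birthweight in g, smoked while pregnant) records.
records = [(2350, "Yes"), (3100, "No"), (3600, "No"), (4150, "No"),
           (2900, "Yes"), (2450, "No"), (3300, "Yes"), (4020, "No")]


def weight_group(grams):
    """Place a birthweight into one of the three categories used above."""
    if grams < 2500:
        return "<2500"
    if grams < 4000:
        return "2500-3999"
    return ">=4000"


rows = ["<2500", "2500-3999", ">=4000"]   # outcome: birthweight category
cols = ["Yes", "No"]                      # groups: smoked while pregnant

# Each cell counts the babies with that combination of row and column.
cells = Counter((weight_group(g), smoked) for g, smoked in records)

print(f"{'Birthweight (g)':<16}" + "".join(f"{c:>5}" for c in cols))
for r in rows:
    print(f"{r:<16}" + "".join(f"{cells[(r, c)]:>5}" for c in cols))
```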
Cross-tabulation –
contingency tables
Thank you
