Biostatistics Lecture Notes - 1 - 4

Biostatistics note

Uploaded by

Oma
Copyright
© © All Rights Reserved

BIOSTATISTICS LECTURE NOTES

1. Definition of Statistical Terms


1.1 Meaning of Statistics

Accurate information about patients, health care systems and society is of great
concern to nurses, other health workers, medical practitioners and those in allied
fields. Such information helps them make decisions that are at least roughly right,
and this is what motivates the use of statistics. The word statistics is derived
from the Italian word stato, which means state, while statista refers to a person
involved with the affairs of state. Statistics therefore originally meant the
collection of facts useful to a statista (Aczel 1999:9). Statistics in this sense
was widely used across Europe and the rest of the world to gather information not
just about the state but also about other areas of human endeavour, such as the
health care system.

According to Levine, Stephen, Krehbiel and Berenson (2013:32), statistics is the
branch of mathematics that transforms numbers into useful information for decision
makers. Statistics lets you know about the risk associated with making a decision
and allows you to understand and reduce the variation in the decision-making
process. The Ethiopia Public Health Training Initiative (EPHTI) manual 2005
defines statistics in two senses: statistical data and statistical methods.
Statistical data refers to numerical descriptions of things. These descriptions may
take the form of counts or measurements. Thus the statistics of malaria cases at
one of the malaria detection and treatment posts of Ethiopia include the number of
fever cases, the number of positives obtained, the sex and age distribution of
positive cases, etc.

Similarly, Afonja, Olubusoye, Ossai and Arinola (2014: 3) opine that the use of
statistics has permeated almost every facet of human life. It is about making
important everyday decisions and choices, turning numbers into useful information
and understanding uncertainties and risks. Aminu (1999:1) posits that the word
statistics is a plural, meaning a collection of more than one figure; the singular
of the word is statistic. Statistics is therefore the science of collecting,
organizing, presenting, analyzing and interpreting data to assist in making better
decisions under conditions of uncertainty. There are basically two senses in which
the word statistics is commonly used: as a numerical record of pieces of
information, or as a discipline (subject) of study. In this lecture, our concern is
with numerical record keeping in our hospitals. Statistics here is concerned with
abstracting data, classifying it and then comparing it with data obtained from
similar sources so that plans and control mechanisms can be implemented in our
health care system.

1.2 Types of Statistics

There exist two types of statistics: Descriptive statistics and inferential statistics.

(a) Descriptive Statistics: Descriptive statistics involves organizing, summarizing
and presenting data in a meaningful or usable form. For data to be meaningful, they
can be organized into a frequency distribution (Aminu 1999:2). The procedure for
constructing a frequency distribution is covered in sample problem one. Thereafter,
we can use such data to compute the mean, median, mode, standard deviation, etc.
The mean, median and mode are commonly called measures of central tendency, while
the standard deviation is a measure of dispersion. According to the EPHTI manual
2005, one branch of descriptive statistics of special relevance in medicine is that
of vital statistics, which records vital events: birth, death, marriage, divorce,
and the occurrence of particular diseases. They are used to characterize the health
status of a population. Coupled with the results of periodic censuses and other
special enumerations of populations, the data on vital events relate to an
underlying population and yield descriptive measures such as birth rates, morbidity
rates, mortality rates, life expectancies, and disease incidence and prevalence
rates that pervade both medical and lay literature.
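
The descriptive steps above (grouping into a frequency distribution, then computing
summary measures) can be sketched in Python. This is only an illustration under
stated assumptions: the ages are invented, the class width of 10 years is an
arbitrary choice, and the helper name frequency_distribution is hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical ages (in years) of 12 patients; invented for illustration only.
ages = [23, 25, 25, 31, 31, 31, 38, 40, 40, 45, 52, 60]

def frequency_distribution(values, width, start):
    """Count how many integer values fall in each class of the given width."""
    counts = Counter((v - start) // width for v in values)
    # Class limits are reported as inclusive integers, e.g. (20, 29).
    return {
        (start + k * width, start + (k + 1) * width - 1): counts[k]
        for k in sorted(counts)
    }

freq = frequency_distribution(ages, width=10, start=20)
for (low, high), count in freq.items():
    print(f"{low}-{high}: {count}")

# Measures of central tendency
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)

# Measure of dispersion (sample standard deviation)
sd_age = statistics.stdev(ages)
print(mean_age, median_age, mode_age, round(sd_age, 2))
```

The same pattern applies to any count or measurement recorded on patients; only the
class width and starting point need to be chosen to suit the data.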

(b) Inferential Statistics: According to Aminu (1999:2), inferential statistics is
used to find out something about a population based on a sample drawn from it. This
is why the words population and sample are so important in explaining inferential
statistics. A population, also called a universe, does not necessarily mean a
number of people; it is the collection of all possible individuals, objects or
measurements of interest. For instance, all patients at the University of Port
Harcourt Teaching Hospital (UPTH) in June 2021 constitute a population. A
population may also consist of a group of measurements. A characteristic of an
object that can be measured, such as the weight of patients or the height of
nurses, is referred to as a variable, while a characteristic that cannot be
measured but can only be counted is referred to as an attribute. Examples of
attributes are colour, religion, language and habit.

On the other hand, a sample, according to Aminu (1999:2), is a portion or part of
the population of interest. It is a subset of the population. Taking all patients
at the University of Port Harcourt Teaching Hospital (UPTH) in June 2021 as the
population, a sample could be the cancer patients, the HIV patients, etc. Note that
most studies are based on samples of a population instead of studying every member.

The branch of modern statistics that, according to the EPHTI manual 2005, is most
relevant to public health and clinical medicine is statistical inference. This
branch of statistics deals with techniques for drawing conclusions about the
population. Inferential statistics builds upon descriptive statistics. The
inferences are drawn from particular properties of the sample to particular
properties of the population. These are the types of statistics most commonly found
in research publications.
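
As a rough illustration of inferring a population property from a sample, the
sketch below draws a simple random sample and uses its mean to estimate the
population mean. Everything in it is an assumption made for illustration: the
simulated blood pressure values, the sample size of 100, and the use of a normal
approximation for the 95% confidence interval.

```python
import random
import statistics

random.seed(2021)  # fixed seed so the sketch is reproducible

# Hypothetical population: systolic blood pressure (mmHg) of 5,000 patients.
population = [random.gauss(125, 15) for _ in range(5000)]

# Draw a simple random sample without replacement.
sample = random.sample(population, 100)

# Descriptive statistics of the sample
sample_mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval for the population mean,
# using the standard normal critical value (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * std_error
ci_high = sample_mean + z * std_error

print(f"sample mean = {sample_mean:.1f}")
print(f"95% CI for population mean = ({ci_low:.1f}, {ci_high:.1f})")
```

The interval expresses the uncertainty that comes from studying only part of the
population; a larger sample would usually give a narrower interval.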

1.3 Steps in Statistical Methods

It is evident from the discussion above that statistics is concerned with making
use of data. Data are collections of pieces of information that help with the
administration of the state, hospitals, companies, etc. The domain of statistics
covers data collection, data management, data processing, data analysis, and report
writing. This process is aptly captured in circular form by Afonja et al (2014: 9)
in Figure 1.

[Figure 1 stages: Data Collection; Data Checking and Verification; Data Transfer
and Compilation; Data Quality Assessment; Data Analysis and Packaging; Data
Dissemination and Reporting; Use of Data]

Figure 1: The cycle of data collection, management, analysis, dissemination and use.

Data are the facts and figures that are collected, analyzed, and summarized when
you conduct a study. Most definitions of statistics tend to hover around these
statistical activities. By and large, whether they use descriptive or inferential
statistics, researchers adopt the schematic steps shown in Figure 2 in their
research work (Afonja et al 2014: 11).

[Figure 2 steps: Collection; Analysing (process); Decision/Conclusion
(storage/dissemination/use)]

Figure 2: Steps in Statistical Research

1.4 The Concept of Biostatistics

The National Cancer Institute defines biostatistics as “the science of collecting and
analyzing biologic or health data using statistical methods.” The use of statistics in
health care dates back more than a century to the earliest application of the
scientific method in medical research. Many health care decisions are based in
small or large part on the results of biostatistical research. What has changed in
recent years is the amount of health-related data available to researchers, the
technology available to translate the information into knowledge, and the need to
improve the quality and efficiency of health care.

The application of statistics to biological and medical data promises to have a
tremendous impact on the provision of health care and prevention of disease. The
accurate interpretation of biostatistical data can serve as the foundation for
efforts to improve public health and the quality of patient care. As with many
burgeoning technologies, however, there is much uncertainty among nursing
professionals about the role of biostatistics in health care.

A biostatistics definition from a practical nursing perspective must incorporate
the specific training and experience required to ensure that nurse practitioners
are able to apply the technology in their daily work. Understanding how
biostatistics data translates into improved health care delivery and patient
outcomes requires strong analytical and technological competencies to gather and
interpret the data. Well-developed communication skills enable accurate deployment
of custom patient care strategies because nursing staff must be trained to ensure
that the individual care plans created for their patients are being followed and
are having the desired effect (Refer to:
https://online.regiscollege.edu/blog/what-is-biostatistics-definition-and-application-of-a-key-medical-term/).

2. Scale of Measurement

2.1 Types of Data

2.1.1 Quantitative Data (or Numerical Data)

The field of statistics deals with measurements, some qualitative and others
quantitative. A quantitative variable can be described by a number for which
arithmetic operations such as averaging make sense. According to Brooks (2008:5),
there are broadly three types of data that can be employed in quantitative
analysis: time series data, cross-sectional data, and panel data.

(a) Time Series Data: Time series data, as the name suggests, are data that have
been collected over a period of time on one or more variables. Time series data
have associated with them a particular frequency of observation or collection of
data points. The frequency is simply a measure of the interval over, or the
regularity with which, the data are collected or recorded. Examples are hourly
injections for a patient, daily routine drug monitoring for patients, monthly
checkups, etc. It is generally required that all time series data used in a model
be of the same frequency of observation.

(b) Cross-sectional Data: Cross-sectional data are data on one or more variables
collected at a single point in time. For example, the data might come from a poll
on the uptake of vaccination against polio, measles, cholera, etc.

(c) Panel Data: Panel data have the dimensions of both time series and cross-
sections, e.g. the daily prices of a number of blue chip stocks over two years. The
estimation of panel regressions is an interesting and developing area.
2.1.2 Qualitative Data (or Categorical Data)

A qualitative (categorical) variable simply records a quality. If a number is used
for distinguishing members of different categories of a qualitative variable, the
number assignment is arbitrary (Aczel 1999:10). The EPHTI Manual 2005 posits that a
qualitative variable (or data) is a variable or characteristic which cannot be
measured in quantitative form but can only be identified by name or category, for
instance place of birth, ethnic group, type of drug, stages of breast cancer (I,
II, III, or IV), degree of pain (minimal, moderate, severe or unbearable), etc.

According to Afonja et al (2014: 24-26), measurement is the assignment of numbers
or symbols to objects or events in a systematic fashion. In order to understand
your variables, it is important to know their level of measurement. For example,
the number 3 might indicate a score of three; it might indicate that the object was
ranked third in the class; or it might indicate that the object was in the third
category. To help understand these differences, types or levels of variables have
been identified. Measurement exists at several levels depending on what is to be
measured, the instrument to be employed, the degree of accuracy or precision
desired and the method of measurement. Data come in four different levels or
scales: Nominal, Ordinal, Interval, and Ratio.

(a) Nominal Scale: Nominal variables are categorical variables that have two or
more possible levels with no natural ordering. In a nominal scale, no quantitative
information is conveyed and no ordering of the items is implied. The nominal scale
is the simplest; it involves only assignment to classes and does not imply
magnitude. It tells us the category or the name, with no specific order in mind.
The values are used as labels for groups or classes. Examples are the
classification of respondents into male and female, or of nurses into trained and
untrained. We can thus code the various categories as Male 1, Female 0; Trained 1,
Untrained 0. With nominal scale data, the obvious and intuitive descriptive summary
measure is the proportion or percentage of subjects who exhibit the attribute.
Table 1 shows a hypothetical example: the survival status of patients following
the administration of propranolol.

Table 1: Hypothetical Nominal Scale Data for Survival Status of
Propranolol-treated and Control Patients with Myocardial Infarction (MI)

Status after 28 days of      Propranolol-treated    Control
hospital admission           patients               patients

Dead                         11                     27
Alive                        31                     15
Total survival rate          73.81%                 35.7%

Table 1 shows the hypothetical placement of the MI patients. It shows clearly how
the individuals are simply placed in the proper category or group, and the number
in each category is counted. Each item fits into exactly one category. The survival
rate was 35.7% for the control patients and 73.81% for the experimental patients.
The two groups are dichotomous: one group receives propranolol while the other does
not.
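
The survival rates in Table 1 are simple proportions of the counts in each group. A
minimal sketch of the arithmetic, using the hypothetical counts from the table (the
dictionary layout and the function name survival_rate are illustrative choices):

```python
# Counts taken from Table 1 (hypothetical data).
table1 = {
    "propranolol": {"dead": 11, "alive": 31},
    "control": {"dead": 27, "alive": 15},
}

def survival_rate(counts):
    """Percentage of patients in a group who are alive at 28 days."""
    total = counts["dead"] + counts["alive"]
    return 100 * counts["alive"] / total

rates = {group: round(survival_rate(c), 2) for group, c in table1.items()}
print(rates)  # {'propranolol': 73.81, 'control': 35.71}
```

Each group has 42 patients, so the rates are 31/42 and 15/42 expressed as
percentages.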
(b) Ordinal Scale: Here, data elements are ordered according to their relative
quality or size. It is a level of measurement which classifies data into categories
that can be ranked. Precise differences between the ranks do not exist. The ordinal
scale is based on the natural order property of real numbers, which says that one
real number may be greater than, equal to or less than another real number. It
allows for classification and an indication of size on some predefined basis. We
can rank patients according to the magnitude of their ailment. An example is the
degree of injury of patients in accidents. Equal intervals on the scale do not
represent equal quantities, e.g., fatal/very severe: 1, severe: 2, not too
severe: 3, minor: 4.

(c) Interval Scale: It is a level of measurement which classifies data that can be
ranked, and differences are meaningful. However, there is no meaningful zero, so
ratios are meaningless. The interval scale possesses the order property of the
ordinal scale. The magnitude of the difference between adjacent intervals on the
scale is equal. Arithmetic operations permissible on this scale include all those
allowed on the ordinal scale; in addition, measurements can be added and
subtracted, and multiplied or divided by a constant, and still yield an
interpretable result. Comparison between intervals on this scale is meaningful, and
is independent of the unit of measurement or the system of assigning scores. An
example is temperature measured in degrees Celsius or Fahrenheit. Height and
weight, by contrast, have a true zero and belong to the ratio scale.

(d) Ratio Scale: This is the strongest scale of measurement. It is a level of
measurement which classifies data that can be ranked, differences are meaningful,
and there is a true zero. The zero here indicates the absence of the quality or
attribute being assessed. True ratios exist between the different units of measure.
The ratio scale has equal intervals as well as a zero mark which indicates a
complete lack of the quantity being measured. Examples are mass or weight (in grams
or pounds) and mortality rate (per 1,000). Table 2 contains the various
characteristics and examples of the four levels of measurement.

Table 2: Characteristics and Examples of the Four Levels of Measurement

Scale     Characteristics                                     Examples

Nominal   • Three or more unordered categories.               • Gender (male, female), a
          • Two ordered or unordered categories                 dichotomous variable
            (dichotomous).                                    • Ethnicity
          • Arithmetic relations (addition, subtraction,      • State of origin
            division and multiplication) have no meaning.     • Nationality
                                                              • Religion
                                                              • Colour

Ordinal   • Three or more ordered categories.                 • Competence scale
          • Difference in magnitude between levels is         • Intelligence quotient (IQ)
            not equal.                                        • Case of degree
          • Arithmetic relations (addition, subtraction,      • Educational status
            division and multiplication) have no meaning.

Interval  • Ordered levels.                                   • Temperature scale
          • Difference between levels is equal.                 (Celsius, Fahrenheit)
          • No true zero.                                     • Psychological scale
          • Addition and subtraction have meaning; ratios
            of values do not.

Ratio     • Ordered levels.                                   • Length
          • Difference between levels is equal.               • Weight
          • True zero exists.                                 • Temperature (kelvin)
          • Arithmetic relations (addition, subtraction,
            division and multiplication) have meaning.

Source: Adapted from Afonja et al (2014: 26)
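
The statement that ratios are meaningless on an interval scale but meaningful on a
ratio scale can be checked numerically. The sketch below is an illustration only;
the temperatures and weights are invented, and the conversion helpers c_to_f and
kg_to_lb are hypothetical names for the standard unit conversions.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def kg_to_lb(kg):
    """Convert kilograms to pounds (1 kg is about 2.20462 lb)."""
    return kg * 2.20462

# Interval scale: two temperatures, 20 and 40 degrees Celsius.
low_c, high_c = 20.0, 40.0
ratio_celsius = high_c / low_c                        # 2.0
ratio_fahrenheit = c_to_f(high_c) / c_to_f(low_c)     # 104 / 68, about 1.53
# The ratio changes with the unit, so "twice as hot" has no meaning.

# Differences, however, stay proportional when the unit changes.
diff_celsius = high_c - low_c                          # 20 degrees C
diff_fahrenheit = c_to_f(high_c) - c_to_f(low_c)       # 36 degrees F

# Ratio scale: two weights, 50 kg and 100 kg.
light_kg, heavy_kg = 50.0, 100.0
ratio_kg = heavy_kg / light_kg
ratio_lb = kg_to_lb(heavy_kg) / kg_to_lb(light_kg)
# The ratio is 2 in both units, so "twice as heavy" is meaningful.

print(ratio_celsius, round(ratio_fahrenheit, 2))
print(ratio_kg, round(ratio_lb, 2))
```

The Celsius and Fahrenheit zeros are arbitrary points on the scale, which is why
the two temperature ratios disagree, while zero weight means no weight in any unit.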

3. Basic Concepts, Principles and Methods of Collecting Data

3.1 Population and Samples

A population is the set of all objects (units) or observations about which
conclusions are to be drawn.

It may be made up of anything from one observation or unit to an infinite number.
When a population contains a definite number of observations or units, no matter
how large the number, the population is said to be finite. When no upper limit can
be put on the number in the population, it is said to be infinite. A sample is
necessarily finite.
With the definition of variables previously given, a population can also be seen as
the set of all the possible values of a variable under study.

3.1.2 Population Census

The term census means a complete enumeration of all units in the population. It is
a survey which includes every item or element in the population. It is sometimes
called a 100% count.

A familiar example of a census is that of the human population, carried out
periodically in every country. The topics on which information is generally sought
include:

• Household and family information
• Personal characteristics such as age, sex, etc.
• Economic and occupational characteristics
• Cultural characteristics like language, nationality and ethnic groupings
• Education.

Apart from human population, other types of censuses include those of houses,
farms, patients, road traffic, etc.

3.1.3 Sample Survey: This is an examination of part of a population about which we
want to draw inferences. It is a practical alternative to the complete count;
hence, it is generally used in place of a census. The steps in carrying out a
sample survey consist of the following:

• specification of the population of interest, the unit of observation and the
  measurements to make
• questionnaire design and administration
• determination of the sample size
• the procedure for the selection of the sample.

3.1.4 Questionnaire Design: The issues to consider relate to the number, type,
order and arrangement of the questions. The EPHTI Manual 2005 states that
“designing a good questionnaire always takes several drafts. In the first draft we
should concentrate on the content. In the second, we should look critically at the
formulation and sequencing of the questions. Then we should scrutinize the format
of the questionnaire. Finally, we should do a test-run to check whether the
questionnaire gives us the information we require and whether both the
respondents and we feel at ease with it. Usually the questionnaire will need some
further adaptation before we can use it for actual data collection”. These steps
are shown in Figure 3.

[Figure 3 boxes: Content/objective; Formulating questions; Sequencing of questions;
Formatting the questions; Translation of ...]

Figure 3: Steps in Questionnaire Design


3.1.5 Questionnaire Administration: There are two ways of administering a
questionnaire. When the respondent (the information supplier) personally records
his/her responses to prepared questions, the questionnaire is said to be
self-administered. But when the interviewer (the information seeker) asks the
questions and records the respondent’s answers, the administration is of the
interview type. The self-administered questionnaire could be conducted by mail,
telephone or internet (online). Similarly, the interview could be by telephone or
in person.

3.1.6 Sample Design: Sample size determination and the selection procedure are
sometimes regarded as constituting the sample design.

3.1.7 Survey Size: Both the questionnaire size and the sample size make up the
survey size.

3.1.8 Sample Selection: The list of all the units of the population from which a
sample is to be selected is called a frame. The units are called sampling units. A
common example of a frame is the list of members of staff at the University of Port
Harcourt Teaching Hospital (UPTH); from it, we can select from the list of doctors,
nurses and others.
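
As a point of reference for the selection methods discussed in this section, the
sketch below draws a simple random sample from a frame, so that every sampling unit
has the same chance of selection. The frame of 200 staff identifiers, the sample
size of 20 and the seed are all invented for illustration.

```python
import random

# Hypothetical sampling frame: identifiers for 200 members of staff.
frame = [f"STAFF-{i:03d}" for i in range(1, 201)]

# A seeded generator keeps the illustration reproducible.
rng = random.Random(42)

# Simple random sample of 20 units, drawn without replacement.
sample = rng.sample(frame, 20)
print(sample[:5])
```

Because the selection is made by a chance device rather than by the selector's
judgment, no unit in the frame is favoured over another.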

Following from the above discussion, sample selection may be broadly classified
into two types: random and non-random sampling, or probability and non-probability
sampling. Any sampling procedure differing from random sampling will be regarded as
non-random. In particular, any such non-random procedure will not employ or attempt
to employ chance devices to make the selection of units. We mention the more easily
recognizable ones.
(i) Haphazard Selection is one in which the selector thinks that he/she is making a
random selection. A good example is the sort of selection in public places often
made by various press agents. By stopping to interview people ‘at random’ without
following any prescribed sampling rules like those previously discussed, a
journalist claims to have a random sample of public opinion. You can think of many
possible biases that can come into such haphazard selection.

(ii) Systematic Sampling is one in which the sampling units are selected at fixed
intervals from the frame. Suppose it is decided to take a sample of 100 medical
practitioners using a medical directory as a frame. If the directory has 800 names
in it, then a systematic sample will be obtained by taking every eighth name.

Similarly, instead of selecting a random sample of ten students from your school,
you may select a systematic sample in which every ninth student on the school list
of ninety names is picked.

The main disadvantage of systematic sampling is the possibility of the selected
units having some resemblance because they coincide with some unexpected periodic
variation in the frame. Suppose the school list was arranged by classes and, within
each class, by the final examination results. If there are ten classes each having
nine students, then picking every ninth student on the list will mean picking the
worst student in each class. Surely the resulting sample cannot be said to be by
any means a representative one. There are, however, a few limited situations where
some advantages of systematic sampling are claimed.
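
Both examples above can be reproduced with a short sketch. The directory entries
and the school list are invented, the function name systematic_sample is
hypothetical, and the starting position is fixed here for clarity, although in
practice it is often chosen at random within the first interval.

```python
def systematic_sample(frame, interval, start):
    """Select every `interval`-th unit, beginning at 0-based position `start`."""
    return frame[start::interval]

# Medical directory with 800 names: taking every eighth name gives 100 units.
directory = [f"Practitioner {i}" for i in range(1, 801)]
practitioners = systematic_sample(directory, interval=8, start=7)
print(len(practitioners))  # 100

# School list: ten classes of nine students, each class ordered from
# best (rank 1) to worst (rank 9) in the final examination.
school_list = [(cls, rank) for cls in range(1, 11) for rank in range(1, 10)]
every_ninth = systematic_sample(school_list, interval=9, start=8)
print(every_ninth)  # ten students, every one of them ranked 9th in the class
```

The second sample shows the periodicity problem: the sampling interval coincides
with the class size, so only the lowest-ranked student of each class is selected.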

(iii) Quota Sampling attempts a fair representation of the different classes that
may exist in a given population. It is commonly used in public opinion and market
research surveys. In such surveys, the interviewer is required to ensure that a
specified number of units in various classes, such as sex, age, income group and
geographic location, are included in the sample.

(iv) Expert Selection is the one that, in the opinion of the expert, produces a
truly representative sample. The procedure is totally devoid of any standard
objective rule and leaves too much to the personal judgment of the enumerator. It
is hardly recommended in the statistical world.

(v) Experiments: An experimental study involves taking measurements of the system
under study, manipulating the system, and then taking additional measurements using
the same procedure to determine if the manipulation has modified the values of the
measurements. Experiments have been the main weapon for building and testing
scientific theories. In the experimental type, the investigator (experimenter) is
very often able to control some factors which are not relevant to the problem under
study; such factors are to be controlled so as to avoid bias in the conclusions.

4. Collection and Organization of Data


4.1 Methods of Data Collections

In any field of enquiry, the objective of the investigator determines the type and the
nature of data to be collected. This objective must be clearly stated. The statistical
objective therefore is to devise the best methods of collecting the data in order to
achieve the investigator’s objective bearing in mind: the cost of collecting the data;
practicability of the proposed methods of collection and representativeness of
samples, if sampling is done; time spent in collecting data and accuracy of
observations; and the ability to assess and minimize the various errors and biases
which could lead to uncertainty in the conclusion.
According to the EPHTI Manual 2005, data collection techniques allow us to
systematically collect data about our objects of study (people, objects, and
phenomena) and about the setting in which they occur. In the collection of data we
have to be systematic. If data are collected haphazardly, it will be difficult to
answer our research questions in a conclusive way. Various data collection
techniques can be used such as:

(a) Observation: This is a technique that involves systematically selecting,
watching and recording the behaviors of people or other phenomena, and aspects of
the setting in which they occur, for the purpose of gaining specified information.
It includes all methods from simple visual observation to the use of high-level
machines and measurements and sophisticated equipment or facilities, such as
radiographic and biochemical equipment, X-ray machines, microscopes, clinical
examinations, and microbiological examinations. The guidelines for the observations
should be outlined prior to actual data collection. Advantages: it gives relatively
more accurate data on behavior and activities. Disadvantages: the investigator’s or
observer’s own biases, prejudices, desires, etc. may affect the data, and more
resources and skilled human power are needed when high-level machines are used
(EPHTI Manual 2005).

(b) Face-to-face and self-administered: Interviews and self-administered
questionnaires, as explained by the EPHTI Manual 2005, are probably the most
commonly used research data collection techniques. Therefore, designing good
“questioning tools” forms an important and time-consuming phase in the development
of most research proposals. Once the decision has been made to use these
techniques, the following questions should be considered before designing our
tools: What exactly do we want to know, according to the objectives and variables
we identified earlier? Is questioning the right technique to obtain all answers, or
do we need additional techniques, such as observations or analysis of records?

(c) Postal or mail method and Telephone: Under this method, the investigator
prepares a questionnaire containing a number of questions pertaining to the field
of inquiry. The questionnaires are sent by post to the informants together with a
polite covering letter explaining in detail the aims and objectives of collecting
the information, and requesting the respondents to cooperate by furnishing the
correct replies and returning the questionnaire duly filled in. In order to ensure
a quick response, the return postage expenses are usually borne by the investigator
(EPHTI Manual 2005).

In sum, the EPHTI Manual 2005 states that face-to-face and telephone interviews
have many advantages. A good interviewer can stimulate and maintain the
respondent’s interest, and can create a rapport (understanding, concord) and an
atmosphere conducive to the answering of questions. If anxiety is aroused, the
interviewer can allay it. If a question is not understood, an interviewer can
repeat it and if necessary (and in accordance with guidelines decided in advance)
provide an explanation or alternative wording. Optional follow-up or probing
questions that are to be asked only if prior responses are inconclusive or
inconsistent cannot easily be built into self-administered questionnaires. In
face-to-face interviews, observations can be made as well. In general, apart from
their expense, interviews are preferable to self-administered questionnaires, with
the important proviso that they are conducted by skilled interviewers.

• Using available information

• Focus group discussions (FGD)


• Other data collection techniques: rapid appraisal techniques, nominal group
techniques, Delphi techniques, life histories, case studies, etc. (EPHTI Manual,
2005).

(d) Use of documentary sources: Clinical and other personal records, death
certificates, published mortality statistics, census publications, etc. Examples
include:

1. Official publications of the Central Statistical Authority

2. Publications of the Ministry of Health and other ministries

3. Newspapers and journals

4. International publications, such as those of WHO, the World Bank and UNICEF

5. Records of hospitals or any health institutions.

Although data from documents are less time-consuming to obtain and relatively low
in cost, care should be taken over the quality and completeness of the data. There
could be differences in objectives between the primary author of the data and the
user, etc. (EPHTI Manual, 2005).

4.2 Data Types

Data are facts, observations, and information that are obtained from
investigations. When a particular characteristic or attribute is measured or
observed in an object, the resulting value is regarded as an observation. Such a
characteristic or attribute is a variable, because it varies from one object to
another. For instance, the age of students in a class is a variable. The
classification of data/variables is illustrated in figure 3. There are two kinds of
data/variables: quantitative and qualitative.
4.2.1 Quantitative variable is one which is measured on a numerical scale. Data
collected on a quantitative variable are often referred to as metric data. For
example, heights, weights, school marks, market prices, daily temperatures and many
other things that can be measured on an object are quantitative. Quantitative
variables can be further divided into discrete and continuous. They are said to be
discrete if the observed values take whole numbers only. An example is family size.
On the other hand, they are said to be continuous if they can take any value within
a range of values. An example is height. This category of variables can be
subjected to arithmetic operations such as addition, subtraction, division and
multiplication.

4.2.2 Qualitative variable involves non-numerical items which are classified into
groups or categories. A qualitative variable describes observations as belonging to
one of a set of categories. Qualitative data such as the gender, eye colour, etc.
of a group of individuals cannot be subjected to arithmetic operations.

[Figure: Data/Variable divides into Quantitative Data/Variable (numerical) and
Qualitative Data/Variable (non-numerical, e.g. gender, religion, colour).
Quantitative Data/Variable divides into Discrete Data/Variable and Continuous
Data/Variable (e.g. height, weight, temperature).]

Figure 3: Classification of Data


4.3 Sources of Data

The first step in the use of data for evidence-based decision making is the search
for any available data. The data user may then decide to use some existing data or
collect fresh data. Existing data can be found in published documents or
unpublished administrative documents, while fresh data may come from censuses,
sample surveys, electronic sources such as the internet, and experiments.

(a) Unpublished Sources: These are sometimes referred to as administrative
sources. Data in their original form exist in the files, log-books and various
registration forms of many government and non-government departments.
Some organizations keep records of their daily activities, and monthly and yearly
operating and accounting records. Such routine data are kept in computer
data files for efficient entry, storage and retrieval of the information.
An example is the number of children admitted into UPTH in 2020.

(b) Published Sources: Published data are naturally more readily accessible than
unpublished ones. Listed below are the main sources of published data:

(i) Statistical abstracts, bulletins and reports issued by government
departments, especially their statistical units

(ii) Research reports and learned journals

(iii) Daily newspapers, magazines and miscellaneous periodicals

(iv) Miscellaneous reports of government and non-government agencies.

(c) Electronic Sources: Electronic sources are widely used because of the
tremendous growth in information technology (IT) globally. They include the
World Wide Web, the internet/intranet, direct data capturing machines, the Global
System for Mobile Communications (GSM), etc. Large volumes of data are now
available for downloading on the internet at little or no cost, with few or no
restrictions. All these are in vogue now, and they make the collection of data
fast, accurate and less costly.

(d) Statistical Survey: A statistical survey is a process whereby an investigation is
carried out in order to solve some social, business, academic or economic
problem. It consists of finding facts in particular fields of inquiry. The following
are three important types of surveys (in which the data collected are of a statistical
nature): social surveys, market research and public opinion polls. Surveys may be
administered in a variety of ways, e.g. personal interview, telephone interview,
self-administered questionnaire, and internet.

4.4 Tabular Presentation

This is an orderly and precise arrangement of numerical information in columns
and rows. Tabulation reduces and simplifies the details in a mass of data into a
form that brings out the main features and makes the assembled data easily
understood. Among the specific points it brings out are clustering, variation,
trends and relationships. This facilitates the interpretation of the assembled data.

4.4.1 Classification

This is the process of arranging observations into logical, meaningful, useful
categories in accordance with the nature of the property under study. The idea is to
group like things together to facilitate comparison of like with like and also to
reduce a large amount of data to a form suitable for statistical analysis and
processing. Ideally, a group or class must be homogeneous, that is, it should include
all items and only those items with the defining characteristics of the class. Each
item must have a class to which it belongs (Exhaustive). Each item must belong to
only one class (Exclusive).
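
The exhaustive and exclusive requirements can be checked mechanically. The sketch below is a minimal illustration, assuming hypothetical age classes with inclusive bounds; the function names `find_classes` and `is_valid_classification` are made up for this example.

```python
# Hypothetical age classes; both bounds are inclusive (an assumption of this sketch).
classes = [(0, 4), (5, 9), (10, 14), (15, 19)]

def find_classes(value, classes):
    """Return every class interval that contains the value."""
    return [(lo, hi) for lo, hi in classes if lo <= value <= hi]

def is_valid_classification(values, classes):
    """True if every value falls in exactly one class.

    Zero matches means the classes are not exhaustive;
    more than one match means they are not exclusive.
    """
    return all(len(find_classes(v, classes)) == 1 for v in values)

ages = [2, 7, 13, 19, 5]
print(is_valid_classification(ages, classes))

# Overlapping classes break exclusiveness: the value 5 belongs to both.
overlapping = [(0, 5), (5, 10)]
print(find_classes(5, overlapping))
```

Writing the classes as 0-4, 5-9, and so on, rather than 0-5, 5-10, is what keeps a boundary value such as 5 from landing in two classes.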

Classification can be qualitative when items are sorted in groups, each possessing
some attributes that cannot be expressed numerically, e.g. gender can be grouped
as male or female, etc. It can also be quantitative when items vary in respect of
some measurable characteristics. Most quantitative classifications form frequency
distributions.

There are geographical classifications (by state, regions and entities) which are
good for administrative purposes. Also, there are chronological classifications or
time series which give figures concerning a particular phenomenon at various
specified times. Table 3 is an example of a time series on road accidents in
Nigeria, by number of reported cases and persons involved, between 2002 and 2007.

Table 3: Road Accidents in Nigeria by Number of Reported Cases and Persons Involved, 2002-2007

        Reported Cases                       Persons Involved
Year    Fatal    Serious   Minor    Total    Killed   Injured   Total
2002    7206     8411      6778     22395    9240     22790     32210
2003    5401     7432      4373     17208    7697     16171     23868
2004    6362     8509      4740     19611    8161     20925     29086
2005    6132     7849      4676     18659    8980     16888     25868
2006    5806     8052      4804     18663    9131     19200     28331
2007    5789     7223      4785     17797    9390     17413     27803
Source: National Bureau of Statistics

4.3.3 Guidelines for Construction of Statistical Tables

We have seen from Table 2 above most of the desirable features of a good table.
The construction of statistical tables does not require expert knowledge or great
skill. All that is necessary is to pay attention to the more obvious and simple
points. The following guidelines for the drawing of tables should be noted:

i. The table should have a title, which should be short and self-explanatory.
ii. The table should be simple and unambiguous. It must be easily interpreted.
iii. It should present the data clearly, highlighting important details.
iv. It should save space but be attractively designed.
v. The table should be self-sufficient, in the sense of not requiring supporting
tables or evidence. This is because, quite often, a text table is used by
another person as reference material.
vi. Abbreviations and symbols should be avoided as much as possible.
Approximations and omissions should be explained in a footnote.
vii. Always indicate the source(s) of the data at the bottom of the table so that
readers know the origin of the data.

4.4 Frequency Table

Based on the purpose for which the table is designed and the complexity of the
relationship, a table can be either a simple frequency table or a cross tabulation.
The simple frequency table is used when the individual observations involve only
a single variable, whereas the cross tabulation is used to obtain the frequency
distribution of one variable by the subsets of another variable. In addition to the
frequency counts, the relative frequency is used to depict the distributional
pattern of the data clearly. It expresses a given frequency count as a percentage.
For simple frequency distributions (like Table 1), the denominator for the
percentages is the sum of all observed frequencies (EPHTI Manual, 2005).

On the other hand, the EPHTI Manual (2005) reports that in cross-tabulated
frequency distributions, where there are row and column totals, the choice of
denominator depends on the variable of interest that is to be compared across the
subsets of the other variable.
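
The effect of the choice of denominator can be seen in a small sketch. The counts below are hypothetical and are not taken from the EPHTI Manual; the helper names `row_percentages` and `column_percentages` are made up for this illustration.

```python
# Hypothetical cross tabulation: rows are sex, columns are vaccination status.
table = {
    "Male":   {"Vaccinated": 30, "Not vaccinated": 20},
    "Female": {"Vaccinated": 45, "Not vaccinated": 5},
}

def row_percentages(table):
    """Percentages that use each row total as the denominator."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * n / row_total for col, n in cells.items()}
    return result

def column_percentages(table):
    """Percentages that use each column total as the denominator."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }

row_pct = row_percentages(table)
col_pct = column_percentages(table)

# Same cell, two different questions:
print(row_pct["Male"]["Vaccinated"])   # share of males who are vaccinated
print(col_pct["Male"]["Vaccinated"])   # share of vaccinated people who are male
```

Row percentages compare vaccination status across the sexes, while column percentages describe the sex composition of each vaccination group, so the variable being compared decides which total to divide by.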

In a set of data, the number of times that a particular value occurs is called the
frequency of that value. If the ages of ten students are 4, 5, 6, 6, 7, 2, 2, 2, 2 and 2,
the frequency of the value 6 is 2 and that of 2 is 5. If we construct a table with the
age values and frequencies as entries, we have a frequency table as Table 3.

Table 3: Frequency Table

Age    Frequency
4      1
5      1
6      2
7      1
2      5

We can thus define a frequency table as one in which the variable of interest forms
the basis for classification and the entries are frequencies. A frequency table is
also called a frequency distribution because such a table shows how the values of
the variable are distributed. However, we need not use exact values as the classes.
We could use intervals, as we shall see very soon.

A frequency distribution is a tabular arrangement of data by classes together with the
corresponding class frequencies. The frequency distribution tables are of
fundamental importance in statistics. Apart from being a useful summary method
of presenting data, they form a basis for the discussion of probability ideas.
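
As a minimal sketch, the frequency table for the ten ages above can be built with Python's standard `collections.Counter`. The relative frequency column is added here for illustration, using the sum of all frequencies as the denominator.

```python
from collections import Counter

# Ages of the ten students from the text.
ages = [4, 5, 6, 6, 7, 2, 2, 2, 2, 2]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(ages)
total = sum(frequency.values())

print("Age  Frequency  Relative frequency (%)")
for age in sorted(frequency):
    count = frequency[age]
    print(f"{age:<4} {count:<10} {100 * count / total:.0f}")
```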

4.5 Diagram and Graphs

In medical research, diagrams and graphs are used to present information or data
in an attractive and colourful manner for the understanding of both users and
patients. They give a clearer display of the information produced by the
research process. Examples of graphs and diagrams are the pictograph, bar chart,
pie chart, etc. We look at each of them in the next sub-section.

(a) Pictograph/Ideograph
The pictograph presents data using pictures. In a pictograph, the number or
scale of the pictures bears a relationship to the magnitude of the variable
being presented. Its advantages are that it is easy to understand and easy to
draw. However, one of its major disadvantages is its inaccuracy for complex
statistical presentation and interpretation (Aminu, 1995:8).
Example 1: Suppose the following numbers of children suffered from pneumonia in
Obio/Akpor Local Government Area, Rivers State, over the last five years: 30, 60,
80, 40, 10. Present the information in the form of a pictograph.
Solution
1st Year    □ □ □                30
2nd Year    □ □ □ □ □ □          60
3rd Year    □ □ □ □ □ □ □ □      80
4th Year    □ □ □ □              40
5th Year    □                    10

Note: each box represents 10 patients
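
The scaling rule behind the pictograph (one symbol per 10 patients) can be sketched in a few lines of Python. The symbol `#` and the name `pictograph_row` are choices made for this illustration; the counts are those given in Example 1.

```python
# Pneumonia cases per year, as given in Example 1.
cases = {
    "1st Year": 30,
    "2nd Year": 60,
    "3rd Year": 80,
    "4th Year": 40,
    "5th Year": 10,
}
PATIENTS_PER_BOX = 10  # scale: one symbol stands for 10 patients

def pictograph_row(count, per_box=PATIENTS_PER_BOX):
    """Return one '#' symbol for every `per_box` patients."""
    return "#" * (count // per_box)

rows = {year: pictograph_row(n) for year, n in cases.items()}
for year, boxes in rows.items():
    print(f"{year:<9} {boxes}")
```

Counts that are not multiples of 10 would lose their remainder under this integer division, which is one concrete form of the inaccuracy noted above.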

(b) Bar Chart


Data generated from a research process can also be reported using a bar
chart. It is a chart that uses rectangles to represent the data. The height
(or length) of each rectangle must be proportional to the magnitude of the
data it represents. A bar chart can use either horizontal or vertical bars,
depending on the choice of the researcher. A bar chart can be simple,
multiple, component or sometimes percentage component. Aminu (1995:8)
opines that a bar chart can show both positive and negative values of data.
It is also easy for most people to understand and has no peculiar
disadvantage.
Simple Bar Chart
A simple bar chart presents information about a single variable in a
concise manner. Here, the bars can be drawn either horizontally or
vertically for the research report.
Example 2: Suppose the following data are recorded for the number of
students admitted to the Post Basic School of Nursing, University of Port
Harcourt Teaching Hospital (UPTH):
Year    Number of Students    Male    Female
2010    10                    3       7
2011    15                    5       10
2012    25                    11      14
2013    26                    16      10
2014    30                    12      18
Draw a simple bar chart to represent the information generated by the
researcher.

Solution
Here, you can use either a horizontal or vertical bar chart.
Steps to follow
Step 1: Choose a desired scale for the axes
Step 2: Plot the simple bar chart

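[Figure: simple bar charts of the number of students admitted, 2010-2014.
Panel a: Vertical Bar Chart (years on the horizontal axis, number of students from 0 to 35 on the vertical axis).
Panel b: Horizontal Bar Chart (years on the vertical axis, number of students from 0 to 40 on the horizontal axis).]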

Multiple Bar Chart


In multiple bar charts, variables that are alike are usually plotted together.
We can plot the information in Example 2 for male and female students
using a multiple bar chart.
Class illustration needed.

Component Bar Chart


The component bar chart is like the multiple bar chart. However, in
the case of the component bar chart, the bars are not separated: both male
and female students are plotted on the same bar but distinguished by different
colours. For example, red can be used for female and blue for male on the same bar.
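
A rough text-only sketch of the two layouts, using the male and female counts from Example 2, is given below. The letters `M` and `F` stand in for the two colours, and the function names are made up for this illustration; a real chart would normally be drawn by hand or with plotting software.

```python
# Admissions by year from Example 2, stored as (male, female).
admissions = {
    2010: (3, 7),
    2011: (5, 10),
    2012: (11, 14),
    2013: (16, 10),
    2014: (12, 18),
}

def multiple_bars(male, female):
    """Multiple bar chart: two separate bars, one for each group."""
    return ["M " + "#" * male, "F " + "#" * female]

def component_bar(male, female):
    """Component bar chart: one bar split into a male part and a female part."""
    return "M" * male + "F" * female

def percentage_components(male, female):
    """Percentage component bar chart: each part as a share of the year's total."""
    total = male + female
    return (100 * male / total, 100 * female / total)

for year, (male, female) in admissions.items():
    print(year, component_bar(male, female))
```

The length of each component bar equals the year's total, so the same picture shows both the total and its male and female parts.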

(c) Pie Chart


The pie chart is primarily a circular representation of data. The circle is
divided into sectors measured in degrees, which together total no more than
360°. To do this, each frequency is divided by the total and multiplied by 360°.
Using the data in Example 2, we can represent the number of students in a pie
chart as follows.
Step 1: Sum the total number of students: 10 + 15 + 25 + 26 + 30 = 106
Step 2: Convert each frequency to degrees
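\[ \text{2010:}\quad \frac{10}{106} \times 360^\circ = 33.96^\circ \]
\[ \text{2011:}\quad \frac{15}{106} \times 360^\circ = 50.94^\circ \]
\[ \text{2012:}\quad \frac{25}{106} \times 360^\circ = 84.91^\circ \]
\[ \text{2013:}\quad \frac{26}{106} \times 360^\circ = 88.30^\circ \]
\[ \text{2014:}\quad \frac{30}{106} \times 360^\circ = 101.89^\circ \]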

Step 3: You now construct a pie chart using the various degrees as follows

Key
Year    Angle (degrees)    Colour
2010    33.96              Yellow
2011    50.94              Blue
2012    84.91              Red
2013    88.30              Green
2014    101.89             Purple

[Figure: pie chart with sectors of 33.96°, 50.94°, 84.91°, 88.30° and 101.89°]
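
The conversion from frequencies to sector angles can be checked with a short Python sketch. The function name `pie_angles` is made up for this illustration; the figures are the student numbers from Example 2.

```python
# Number of admitted students per year, from Example 2.
students = {2010: 10, 2011: 15, 2012: 25, 2013: 26, 2014: 30}

def pie_angles(frequencies):
    """Sector angle for each category: (frequency / total) * 360 degrees."""
    total = sum(frequencies.values())
    return {label: 360 * n / total for label, n in frequencies.items()}

angles = pie_angles(students)
for year, angle in angles.items():
    print(f"{year}: {angle:.2f} degrees")

# The sectors should fill the whole circle.
print(round(sum(angles.values()), 2))
```

Checking that the angles add up to 360° is a quick way to catch an arithmetic slip before drawing the chart.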

Home Work
The table below gives a hypothetical distribution of Doctors and Nurses in UPTH.
Year    Doctors    Nurses
2015    40         80
2016    46         102
2017    60         120
2018    80         170
Use pie charts to describe the data above.

4.6 Frequency Distributions
Aside from the use of charts and pictures, another way to organize and group data
for decision making in the hospital is the frequency distribution table.
In Kalu's (2013:39) view, a frequency distribution is used to compress data by
recording each observation and counting the number of times it is repeated.
This count is called the frequency; it indicates how often a value of the
variable occurs in the data set. The data can also be organized in classes
with a specific width or size.
Example 3: In a hypothetical survey of 50 families in Alakahia,
Obio/Akpor L.G.A., the number of children per family was recorded as:
10 80 15 70 15 10 30 40 50 20
15 20 30 40 10 5 10 20 15 30
50 30 20 40 50 79 38 71 10 80
40 5 15 20 15 65 43 68 15 42
30 10 10 15 30 45 55 70 19 45
You are required to represent the above data in the form of a frequency
distribution.

Solution
Step 1: To make it easier, we begin by arranging the data in an array
(i.e. in ascending or descending order of magnitude). This also shows the
lowest and the highest values in the data (here 5 and 80). That is,
5 5 10 10 10 10 10 10 10 15
15 15 15 15 15 15 15 19 20 20
20 20 20 30 30 30 30 30 30 38
40 40 40 40 42 43 45 45 50 50
50 55 65 68 70 70 71 79 80 80
Step 2: Form the frequency distribution of the number of children.
Number of children    Tallies      Frequency
5                     //           2
10                    ///// //     7
15                    ///// ///    8
19                    /            1
20                    /////        5
30                    ///// /      6
38                    /            1
40                    ////         4
42                    /            1
43                    /            1
45                    //           2
50                    ///          3
55                    /            1
65                    /            1
68                    /            1
70                    //           2
71                    /            1
79                    /            1
80                    //           2
Total                              50
Note: the frequency distribution table above shows the number of times each
value occurs in the data.
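
The table can be verified, and the data further compressed into classes, with a short Python sketch. The class width of 20 and the class limits 1-20, 21-40, 41-60 and 61-80 are assumptions chosen for this illustration; they are not specified in the example.

```python
from collections import Counter

# Number of children per family, as recorded in Example 3.
children = [
    10, 80, 15, 70, 15, 10, 30, 40, 50, 20,
    15, 20, 30, 40, 10, 5, 10, 20, 15, 30,
    50, 30, 20, 40, 50, 79, 38, 71, 10, 80,
    40, 5, 15, 20, 15, 65, 43, 68, 15, 42,
    30, 10, 10, 15, 30, 45, 55, 70, 19, 45,
]

# Ungrouped frequency distribution: one entry per distinct value.
frequency = Counter(children)

CLASS_WIDTH = 20  # assumed class width, chosen only for illustration

def class_interval(value, width=CLASS_WIDTH):
    """Return the (lower, upper) limits of the class containing the value.

    Classes start at 1: 1-20, 21-40, 41-60, 61-80 for a width of 20.
    """
    lower = ((value - 1) // width) * width + 1
    return (lower, lower + width - 1)

# Grouped frequency distribution: one entry per class interval.
grouped = Counter(class_interval(v) for v in children)

for (lower, upper) in sorted(grouped):
    print(f"{lower}-{upper}: {grouped[(lower, upper)]}")
```

Grouping trades detail for compactness: four classes replace nineteen distinct values, and the class frequencies still add up to the 50 families surveyed.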
