FUNDAMENTALS OF
BIOSTATISTICS
S. a nikam
Statistics:
- It refers to the subject of scientific activity dealing with the
theories and methods of collection, compilation, analysis
and interpretation of data. Or: it is the collection, presentation
and interpretation of numerical data.
Bio-statistics:
- The branch of statistics applied to the medical field
- An art & science of collection, compilation, analysis and
interpretation of data.
“BIOSTATISTICS”
(1) Statistics arising out of biological sciences, particularly
from the fields of Medicine and public health.
(2) The methods used in dealing with statistics in the
fields of medicine, biology and public health for
planning, conducting and analyzing data which arise in
investigations of these branches.
Reasons to know about biostatistics:
Medicine is becoming increasingly quantitative.
The planning, conduct and interpretation of much of
medical research are becoming increasingly reliant on
the statistical methodology.
Statistics pervades the medical literature.
Data (sing. datum):
- A set of observations, usually obtained by measurement
or counting
TYPES OF DATA
QUALITATIVE DATA
DISCRETE QUANTITATIVE
CONTINUOUS QUANTITATIVE
Classification of data-
Qualitative/Attribute
Quantitative/Variable: Continuous & Discrete
Qualitative Data:
- Cannot be expressed in numbers
- Not measurable
- Can only be categorized under different categories &
frequencies
- E.g., Religion is an attribute; can be categorized into Hindu,
Muslim, Christian
- Human Blood Group: A,B,AB or O
- Sex: M/F
Quantitative Data/variable:
- In statistical language, any character,
characteristic or quality that varies is called
variable
- It has got magnitude
Continuous variable:
- It is expressed in numbers & can be measured
- Can take up infinite no. of values in a certain
range
- E.g., weight, height, blood sugar
Discrete variable:
- Countable only
- Takes only some isolated values
- E.g., no. of family members, no. of workers in
a factory, no. of persons suffering from a particular
disease
According to source-
Primary Data
Secondary Data
Primary Data:
- Collected directly from the field of enquiry
- original in nature
- E.g., measurement of BP, weight, height, blood sugar
Secondary Data:
- Collected previously by some other agency/organization
- Used afterwards by another
- E.g., hospital records, census data
Scale of measurement
Qualitative variable:
A categorical variable
Nominal (classificatory) scale
- gender, marital status, race
Ordinal (ranking) scale
- severity scale, good/better/best
Quantitative variable:
A numerical variable: discrete; continuous
Interval scale:
Data are placed in meaningful intervals and order. The units of measurement and the zero point are
arbitrary.
- Temperature: equal differences are equal (37 °C − 36 °C = 38 °C − 37 °C), but there is
no implication of ratio (30 °C is not twice as hot as 15 °C)
Ratio scale:
Data are placed in meaningful intervals and order, and the scale has a true zero, so a meaningful ratio exists.
- Age, weight, height, pulse rate
- A pulse rate of 120 is twice as fast as 60
- A person weighing 80 kg is twice as heavy as one weighing 40 kg
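The interval/ratio distinction can be checked with a little arithmetic. The sketch below is illustrative only: the kelvin conversion is not part of these notes and is brought in as an assumption, because kelvin is a temperature scale with a true zero.

```python
def celsius_to_kelvin(t_c: float) -> float:
    """Convert Celsius (interval scale) to kelvin (ratio scale, true zero)."""
    return t_c + 273.15

# Differences are meaningful on an interval scale
diff_a = 37 - 36
diff_b = 38 - 37
print("Equal differences:", diff_a == diff_b)

# A ratio taken directly in Celsius is misleading
naive_ratio = 30 / 15
# The same two temperatures compared on a scale with a true zero
kelvin_ratio = celsius_to_kelvin(30) / celsius_to_kelvin(15)
print("Celsius 'ratio':", naive_ratio)
print("Kelvin ratio:", round(kelvin_ratio, 3))

# Pulse rate is already on a ratio scale, so the ratio is meaningful
pulse_ratio = 120 / 60
print("Pulse ratio:", pulse_ratio)
```

On the kelvin scale 30 °C is only about 5% "hotter" than 15 °C, which is why "twice as hot" is not a valid statement for Celsius readings.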
Nominal scales
Ordinal Scales
Interval Scales
Ratio Scales
Nominal Scales:
- Used when data are classified by major categories or
subgroups of population
- Religion can be assigned to following categories- Muslim,
Hindu, Christian
- Outcome of treatment: cured or not cured; died or survived
Ordinal Scales:
- Assign rank order to categories placed in an order
- E.g., students rank in a class; Grades A,B,C,D;
- Literacy status: illiterate, just literate, primary, secondary,
higher secondary, graduate, post graduate
- Disease condition: mild, moderate, severe
Interval Scale:
- Distance between two measurements is defined, not their ratio
- E.g., intelligence score in IQ tests, temperature in Centigrade
Ratio Scale:
- Both the distance & ratio between two measurements are
defined
- E.g., length, weight, incidence of disease, no. of children in a
family
Dichotomy/ Binary Scale:
- A scale with only two categories
- E.g., disease→ present/absent; sex→male /female
Population:
- An aggregate of objects, animate or inanimate, under study
- A group of units defined according to aims & objective of
the study
Sample:
- a finite subset of or part of population
- Every member of population should have equal chance to be
included in sample
Parameter:
- constant, describes the characteristics of population
Statistic:
- Function of observation, which describes a sample
Quantity             Statistic (sample)   Parameter (population)
Mean                 x̄ (x bar)            µ (mu)
Standard deviation   s                    σ (sigma)
No. of subjects      n                    N
Proportion           p                    P
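The difference between a parameter and a statistic can be sketched with a simulated population. Everything numeric here is hypothetical (a made-up population of 1000 heights and a sample of 50); only the symbols µ, σ, x̄, s, N and n come from the table above.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights in cm of N = 1000 people
population = [random.gauss(170, 6) for _ in range(1000)]
N = len(population)
mu = statistics.mean(population)       # parameter: population mean
sigma = statistics.pstdev(population)  # parameter: population SD (divisor N)

# Simple random sample: every member has an equal chance of inclusion
sample = random.sample(population, 50)
n = len(sample)
x_bar = statistics.mean(sample)  # statistic: sample mean
s = statistics.stdev(sample)     # statistic: sample SD (divisor n - 1)

print(f"Population: N={N}, mu={mu:.2f}, sigma={sigma:.2f}")
print(f"Sample:     n={n}, x_bar={x_bar:.2f}, s={s:.2f}")
```

The parameters stay fixed for a given population, while the statistics change from one sample to the next.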
Main sources for collection of medical statistics are:
1. Experiments:
- Performed in the laboratories of physiology, biochemistry, pharmacology, clinical
pathology
- Hospital wards → for investigations & fundamental research
- Used in preparation of thesis/dissertation, scientific paper for publication in
scientific journals & books
2. Surveys:
- Carried out for epidemiological studies in the field by trained teams to find out
incidence or prevalence of health or disease situations in a community
- Used in operational research (OR) → assessment of existing conditions, how to follow a program, to study
merits of different methods adopted to control a disease
- Provide trends in health status, morbidity, mortality, nutritional status, health
practices, environmental hazards
- Provide feedback needed to modify policy
- Provide timely warning of public health hazards
3. Records:
- Maintained as a routine in registers or books over a long
period of time
- Used for keeping vital statistics: births, deaths, marriage,
hospitalization following illness,
- Used in demography & public health practices
- Collected data are qualitative
DATA → INFORMATION
Statistical data is usually presented in tabular form through
different types of tables, and in pictorial form: diagrams,
charts
Method of presentation:
A. Tabulation
B. Drawing
By forming a frequency distribution, we can summarize the data effectively. It is a
method of presenting the data in a summarized form.
Class interval =
(maximum mark − minimum mark) / number of classes
Tabular presentation:
- A form of presenting data from a mass of statistical data
- at first frequency distribution table is prepared
- Table can be simple or complex
Frequency distribution table or frequency table:
- All frequencies considered together form “frequency
distribution”
- No of person in each group is called the frequency of that
group
- Frequency distributions of most biological variables
follow a normal, binomial or Poisson distribution.
- Data needs consolidation by way of tabulation to
express some meaning
- Tabulation → a process of summarizing raw data &
displaying it in a compact form for further analysis
- Orderly management of data in columns & rows
Presentation of quantitative data is more cumbersome as
- Characteristic has a measured magnitude as well as
frequency
Table x: presentation of quantitative data of height in markings (each completed group of five marks is shown as /////)
Height of groups in cm   Markings                              Frequency of each group
160-162                  ///// /////                           10
162-164                  ///// ///// /////                     15
164-166                  ///// ///// ///// //                  17
166-168                  ///// ///// ///// ////                19
168-170                  ///// ///// ///// /////               20
170-172                  ///// ///// ///// ///// ///// /       26
172-174                  ///// ///// ///// ///// ///// ////    29
174-176                  ///// ///// ///// ///// ///// /////   30
176-178                  ///// ///// ///// ///// //            22
178-180                  ///// ///// //                        12
Total                                                          200
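A frequency distribution (f.d.) may be discrete or continuous.
For a continuous f.d.:
1) Class interval = (max value − min value) / number of classes
2) Class boundary correction = (lower limit of 2nd class − upper
limit of 1st class) / 2; subtract it from each lower limit and add it
to each upper limit to obtain the class boundaries
3) Mid point = (lower limit + upper limit of the class) / 2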
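A minimal sketch of the three class calculations, assuming the class interval is the range divided by the number of classes (as the later slide on class width states). The function names and the inclusive classes 10-19 and 20-29 are hypothetical; the 160-180 cm range in 10 classes matches the height table.

```python
def class_width(max_value: float, min_value: float, n_classes: int) -> float:
    """Class interval = range of observations / number of classes."""
    return (max_value - min_value) / n_classes

def boundary_correction(upper_first: float, lower_second: float) -> float:
    """Half the gap between two adjacent inclusive classes."""
    return (lower_second - upper_first) / 2

def mid_point(lower: float, upper: float) -> float:
    """Mid point of a class = (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

# Heights from 160 to 180 cm split into 10 classes
width = class_width(180, 160, 10)
print("Class width:", width)

# Hypothetical inclusive classes 10-19 and 20-29
corr = boundary_correction(upper_first=19, lower_second=20)
boundaries = (10 - corr, 19 + corr)
print("Correction:", corr, "Boundaries of first class:", boundaries)
print("Mid point of first class:", mid_point(10, 19))
```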
Cumulative Frequency Distribution:
It is a form of frequency distribution that represents the sum of
the frequency of a class and of all classes below it. Remember that a frequency
distribution is an overview of all distinct values (or classes of
values) and their respective number of occurrences.
CUMULATIVE RELATIVE FREQUENCY DISTRIBUTION
The cumulative relative frequency distribution of a quantitative
variable is a summary of the proportion of observations below a given level.
The relationship between cumulative frequency and cumulative
relative frequency is:
Cumulative relative frequency = cumulative frequency / total no. of observations
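As an illustration, the cumulative and cumulative relative frequencies can be computed for the height table given earlier (frequencies 10, 15, ..., 12; total 200). This is a sketch, taking cumulative relative frequency as cumulative frequency divided by the total number of observations.

```python
from itertools import accumulate

classes = ["160-162", "162-164", "164-166", "166-168", "168-170",
           "170-172", "172-174", "174-176", "176-178", "178-180"]
freqs = [10, 15, 17, 19, 20, 26, 29, 30, 22, 12]

total = sum(freqs)
cum_freqs = list(accumulate(freqs))           # running sum of frequencies
cum_rel = [cf / total for cf in cum_freqs]    # proportion at or below each class

print("Class     f   cum f  cum rel f")
for label, f, cf, crf in zip(classes, freqs, cum_freqs, cum_rel):
    print(f"{label}  {f:>3}  {cf:>5}  {crf:>9.3f}")
```

The last cumulative frequency equals the total, so the last cumulative relative frequency is always 1.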
Example: Jane is fond of playing games with dice. She throws a die
and notes the outcome each time. These are her observations:
4, 6, 1, 2, 2, 5, 6, 6, 5, 4, 2, 3
To know the exact number of times she
got each digit (1, 2, 3, 4, 5, 6) as the outcome, she classifies them into
categories. An easy way is to use tally marks. Note that a diagonal line
across 4 vertical lines counts as 5.
Outcome   Frequency
1         1
2         3
3         1
4         2
5         2
6         3
Total     12
A table of this kind is
known as a frequency distribution table. We can observe that all
the data that was gathered has been organized under two columns.
Thus, a frequency distribution table is a chart summarizing the
values and their frequencies. In other words, it is a tool to organize
data. This makes it easy for us to understand the given set of
information. Thus, frequency distribution in statistics helps us
condense data in a simpler form so that it is easy for us to observe
its features at a glance.
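Jane's tally can be reproduced in a few lines. This is only a sketch: it uses Python's `collections.Counter` and prints each completed group of five as `/////`, since plain text cannot show the diagonal stroke.

```python
from collections import Counter

throws = [4, 6, 1, 2, 2, 5, 6, 6, 5, 4, 2, 3]
freq = Counter(throws)  # maps each outcome to the number of times it occurred

print("Outcome  Tally     Frequency")
for outcome in range(1, 7):
    count = freq[outcome]
    bundles, rest = divmod(count, 5)
    # Completed groups of five first, then the leftover single marks
    marks = ["/////"] * bundles + (["/" * rest] if rest else [])
    print(f"{outcome:<8} {' '.join(marks):<9} {count}")

print("Total observations:", sum(freq.values()))
```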
Example 2: The weights
of 50 paracetamol tablets are between 480 and 520 mg.
The frequency distribution table shows measurement categories
and the number of observations in each category.
The range is divided into intervals called “class intervals”.
The width of the class is determined by dividing the range of
observations by the number of classes.
Frequency distribution table gives the cumulative and relative
frequency that helps to interpret the data more easily.
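A rough sketch of how such a table could be built for the tablet example. The individual weights are not given in these notes, so the 50 values below are simulated, and the choice of 8 classes is an assumption made only for illustration.

```python
import random

random.seed(1)  # reproducible hypothetical data

# Hypothetical weights (mg) of 50 tablets, all within 480-520 mg
weights = [random.uniform(480, 520) for _ in range(50)]

low, high = 480, 520
n_classes = 8                      # assumed number of classes
width = (high - low) / n_classes   # range / number of classes

freqs = [0] * n_classes
for w in weights:
    # A value exactly at the top limit is counted in the last class
    idx = min(int((w - low) // width), n_classes - 1)
    freqs[idx] += 1

total = len(weights)
cum = 0
print("Class (mg)   f   rel f  cum rel f")
for i, f in enumerate(freqs):
    cum += f
    lower = low + i * width
    print(f"{lower:.0f}-{lower + width:.0f}    {f:>3}  {f / total:>5.2f}  {cum / total:>9.2f}")
```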
General Principle in designing Table:
- Table should be numbered
- Brief & self-explanatory title should be there mentioning
time, place, person
- Headings of columns & rows should be clear & concise
- Data to be presented according to size or importance,
chronologically, alphabetically or geographically
- Data must be presented meaningfully
- Table should not be too large
- Foot notes given, if necessary
- Total no of observations ; the denominator should be
written
- Information obtained should be summarized in the table
Frequency distribution drawings:
- After class wise or group wise tabulation, the
frequencies of a characteristics can be presented by
two kinds of drawings
- Graphs & Diagrams
- May be shown by either lines, dots, figures
o Presentation of quantitative data is through graphs
o Presentation of qualitative, discrete, counted data is
through diagrams
1. Histogram
- Graphical presentation of frequency distribution
- Variable characters of different groups are indicated on
the horizontal line (x-axis), called the abscissa
- No. of observations is marked on the vertical line (y-axis),
called the ordinate
- Frequency of each group forms a rectangle (column)
2. Frequency Polygon:
- An area diagram of frequency distribution developed over
a histogram
- Mid points of the class intervals at the height of frequency
are joined by straight lines
- It gives a polygon, figure with many angles
3. Frequency Curve:
- If the no. of observations is very large & the group interval is
reduced
- Frequency polygon tends to lose its angulation
- Gives rise to a smooth curve → frequency curve
4. Line Chart or Graph:
- A frequency polygon presenting variation by line
- Shows trend of event occurring over a period of time
- Shows rise, fall or periodic fluctuations
- Vertical axis may not start from zero, but from some point above it
5. Cumulative Frequency Diagram or “Ogive”
- Graph of the cumulative frequency distribution
- An ordinary frequency distribution table is first converted into a cumulative
(relative) frequency table
- Cumulative frequency: total no. of persons in each particular
range from lowest value of the characteristic up to &
including any higher group value
6. Scatter or Dot Diagram:
- Prepared after tabulation in which frequencies of at least
two variables have been cross classified
- Shows nature of correlation between two variable characters
in the same person(s) (e.g., height & weight)
- Also called correlation diagram
1. Bar Diagram:
- Graphically present frequencies of different categories
of qualitative data
- Vertical/ horizontal
- May be descending/ascending order
- Widths should be equal
- Spacing between bars should also be equal
i. Simple Bar Diagram:
- Each bar represents frequency of a single category with a
distinct gap from one another
ii. Multiple bar diagram:
- Used to show comparison of two or more sets of related
statistical data
iii. Component/ proportional bar diagram:
- Used to compare sizes of different component parts
among themselves
- Also shows relation between each part & the whole
2. Pie/ sector Diagram:
- A circle whose area is divided into different segments by
different straight lines from centre to circumference
- Each segment expresses the proportional component of the
attribute
- Angle (°) of a sector is calculated by
class percentage × 3.6, or
(class frequency / total frequency) × 360
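A small sketch of the sector-angle calculation. The blood-group counts are hypothetical and chosen only to make the arithmetic easy to follow; both routes (frequency share × 360 and percentage × 3.6) are shown.

```python
# Hypothetical counts of blood groups among 100 people
counts = {"A": 30, "B": 40, "AB": 10, "O": 20}
total = sum(counts.values())

angles = {}
for group, f in counts.items():
    angle_from_freq = f * 360 / total   # (class frequency / total frequency) x 360
    percentage = f * 100 / total
    angle_from_pct = percentage * 3.6   # class percentage x 3.6
    angles[group] = angle_from_freq
    print(f"{group:<2} f={f:>3}  {percentage:>5.1f}%  "
          f"angle={angle_from_freq:.1f} (via %: {angle_from_pct:.1f})")

print("Sum of angles:", sum(angles.values()))
```

The angles always add up to 360°, which is a quick check on the arithmetic before drawing.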
MEASURES OF CENTRAL TENDENCY
After the process of classification and tabulation, the next
important objective of statistical analysis is to determine various
numerical measures that describe the inherent characteristics
of the data.
This can be achieved through statistical techniques such
as measures of central tendency and dispersion.
The concentration of values around the central value of a
distribution is known as central tendency.
The different measures that are used to study this
characteristic of the distribution are known as measures of
central tendency. Averages are among the most common
measures of central tendency; an average condenses a set of
numerical data into a single value that is representative of the
distribution.
When a series of observations has been tabulated in the
form of a frequency distribution
→ it is felt necessary to convert the series of observations into a
single value that describes the characteristics of that
distribution → called a Measure of Central Tendency
All data or values are clustered round it
These values enable comparisons to be made between one
series of observations and another
Individual values may overlap even when two distributions have
different central tendencies
E.g., average incubation period of measles is 10 days and
that of chicken pox is 15 days.
Requisites for a good measure of central tendency:
It should be rigidly defined.
It should be based on all observations.
It should be suitable for further algebraic treatment.
It should not be affected too much by extreme observations or sampling
fluctuations.
The following are some common measures of central tendency.
Arithmetic Mean (AM) – Simple and Weighted
Geometric Mean (GM)
Harmonic Mean (HM)
Median (M)
Mode (Z)
[Figure: Measures of central tendency: Mean (Arithmetic Mean (AM), Geometric Mean (GM), Harmonic Mean (HM)), Median, Mode]
Arithmetic Mean (AM)
Arithmetic Mean (AM) is also known as the simple
average.
It is denoted by x̄ (x bar).
AM is calculated by dividing the sum (total) of
all observations by the number of observations,
i.e., x̄ = Sum of all observations / Number of observations = Σx / n
Note: For calculating AM in case of continuous
frequency distribution the class intervals may be either
exclusive or inclusive type.
Arithmetic mean:
- Sum of all observations divided by number of
observations
- Mean (x̄) = Σx/n; x is a variable taking different
observational values & n = no. of observations
- DIRECT METHOD: x̄ = Σx/n
- INDIRECT METHOD: x̄ = A + Σd/n, where A is an assumed mean and d = x − A
- Example:
ESR of 7 subjects are 8, 7, 9, 10, 7, 7 & 6 mm for the 1st hr.
Calculate mean ESR.
- Mean (x̄) = (8+7+9+10+7+7+6)/7 = 54/7 = 7.7 mm
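The ESR example can be checked with both methods. In this sketch A is read as an assumed mean and the deviations as d = x − A; the value A = 8 is an arbitrary choice made for illustration, and any other value gives the same result.

```python
esr = [8, 7, 9, 10, 7, 7, 6]  # ESR in mm for the 1st hour, 7 subjects
n = len(esr)

# Direct method: sum of observations / number of observations
direct_mean = sum(esr) / n

# Indirect (assumed mean) method: A + sum of deviations / n
A = 8                              # assumed mean, arbitrary choice
deviations = [x - A for x in esr]  # d = x - A
indirect_mean = A + sum(deviations) / n

print("Sum of observations:", sum(esr))
print("Direct method:  ", round(direct_mean, 1), "mm")
print("Indirect method:", round(indirect_mean, 1), "mm")
```

The indirect method is mainly a convenience for hand calculation, since the deviations are smaller numbers than the observations.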
Applications of the harmonic mean (HM): If equal distances are covered at different speeds, then the average speed
can be calculated using HM. If the same amount of work is completed at different speeds,
then the average rate of completion can be calculated using weighted HM.
Result:
For any set of positive values, AM ≥ GM ≥ HM
For any two positive values, GM² = AM × HM
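A quick numerical check of these statements for two values. The speeds (30 and 60 km/h) and the 60 km distance per leg are hypothetical figures chosen for easy arithmetic.

```python
import math

a, b = 30.0, 60.0  # hypothetical speeds in km/h over two equal distances

am = (a + b) / 2           # arithmetic mean
gm = math.sqrt(a * b)      # geometric mean of two values
hm = 2 / (1 / a + 1 / b)   # harmonic mean of two values

# Average speed from first principles: total distance / total time
distance = 60.0                             # km for each leg
total_time = distance / a + distance / b    # hours
avg_speed = 2 * distance / total_time

print(f"AM={am:.2f}  GM={gm:.2f}  HM={hm:.2f}")
print(f"Average speed={avg_speed:.2f} km/h")
print(f"GM^2={gm ** 2:.1f}  AM x HM={am * hm:.1f}")
```

The average speed equals the HM (40 km/h), not the AM (45 km/h), because more time is spent travelling at the slower speed.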
Calculation of weighted arithmetic mean:
- The following methods are used when the no.
of observations is large
For Ungrouped Data:
- Suppose we have observations x₁, x₂, x₃, …, xₖ with
corresponding frequencies f₁, f₂, f₃, …, fₖ
- Mean: $\bar{x} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \cdots + x_k f_k}{f_1 + f_2 + f_3 + \cdots + f_k} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{n}$, where n = Σf is the total no. of observations
For grouped Data:
- Data are arranged in groups & a frequency
distribution table is prepared
- Mean (mid-point) value of each group (x) is multiplied by its
frequency (f)
- Sum of the products is divided by the total no. of
observations
- Mean so obtained is called the “weighted mean”
- Mean: $\bar{x} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \cdots + x_k f_k}{f_1 + f_2 + f_3 + \cdots + f_k} = \frac{\sum fx}{\sum f} = \frac{\sum fx}{n}$, where k is the no. of groups and n = Σf
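As a worked illustration, the weighted mean can be computed for the height table given earlier (classes 160-162 to 178-180 cm, total 200). This sketch takes the mid-point of each class as its representative value x.

```python
lower_limits = list(range(160, 180, 2))            # 160, 162, ..., 178
freqs = [10, 15, 17, 19, 20, 26, 29, 30, 22, 12]   # f for each class

# Mid-point of each 2 cm class: (lower + upper) / 2
mid_points = [(lo + (lo + 2)) / 2 for lo in lower_limits]

sum_fx = sum(f * x for f, x in zip(freqs, mid_points))  # sum of f * x
n = sum(freqs)                                          # sum of f
weighted_mean = sum_fx / n

print("Sum of fx:", sum_fx)
print("n:", n)
print("Weighted mean height:", round(weighted_mean, 2), "cm")
```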
Geometric mean:
- Used when data contain a few extremely large or small
values
- It is the nth root of the product of n observations
GM = ⁿ√(x₁ · x₂ · x₃ · … · xₙ)
Harmonic Mean:
- Reciprocal of the arithmetic mean of reciprocals of
observations
- Arithmetic mean of reciprocals of observations = Σ(1/x)/n
- HM = n / Σ(1/x)
- Has limited use
- AM ≥ GM ≥ HM
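The three means can be computed from their definitions and compared. This is a sketch using the ESR values from the arithmetic mean example; the helper names are hypothetical, and the results are cross-checked against Python's `statistics` module (version 3.8 or later).

```python
import math
import statistics

def arithmetic_mean(values):
    """Sum of observations divided by their number."""
    return sum(values) / len(values)

def geometric_mean(values):
    """nth root of the product of n positive observations."""
    return math.prod(values) ** (1 / len(values))

def harmonic_mean(values):
    """n divided by the sum of reciprocals of the observations."""
    return len(values) / sum(1 / x for x in values)

esr = [8, 7, 9, 10, 7, 7, 6]
am_val = arithmetic_mean(esr)
gm_val = geometric_mean(esr)
hm_val = harmonic_mean(esr)

print(f"AM={am_val:.3f}  GM={gm_val:.3f}  HM={hm_val:.3f}")
print("AM >= GM >= HM:", am_val >= gm_val >= hm_val)
print("Library GM:", round(statistics.geometric_mean(esr), 3))
print("Library HM:", round(statistics.harmonic_mean(esr), 3))
```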