
Fundamentals of Biostat (Notes)

Uploaded by

Sweta Singh
Copyright: © All Rights Reserved
FUNDAMENTALS OF

BIOSTATISTICS

S. a nikam
 Statistics:
- It refers to the subject of scientific activity dealing with the
theories and methods of collection, compilation, analysis
and interpretation of data. OR it is the collection, prediction
and interpretation of numerical data.

 Bio-statistics:
 The medical field of statistics
- An art & science of collection, compilation, analysis and
interpretation of data.
“BIOSTATISTICS”
 (1) Statistics arising out of biological sciences, particularly
from the fields of medicine and public health.
 (2) The methods used in dealing with statistics in the
fields of medicine, biology and public health for
planning, conducting and analyzing data which arise in
investigations of these branches.
Reasons to know about biostatistics:
 Medicine is becoming increasingly quantitative.
 The planning, conduct and interpretation of much of
medical research are becoming increasingly reliant on
statistical methodology.
 Statistics pervades the medical literature.
 Data (sing. datum):
- A set of observations, usually obtained by measurement
or counting
 TYPES OF DATA
 QUALITATIVE DATA
 DISCRETE QUANTITATIVE
 CONTINUOUS QUANTITATIVE
 Classification of data-
 Qualitative/Attribute
 Quantitative/Variable: Continuous & Discrete

 Qualitative Data:
- Cannot be expressed in numbers
- Not measurable
- Can only be categorized under different categories &
frequencies
- E.g., Religion is an attribute; can be categorized into Hindu,
Muslim, Christian
- Human Blood Group: A,B,AB or O
- Sex: M/F
 Quantitative Data/variable:
- In statistical language, any character,
characteristic or quality that varies is called a
variable
- It has magnitude
 Continuous variable:
- It is expressed in numbers & can be measured
- Can take an infinite no. of values in a certain range
- E.g., weight, height, blood sugar
 Discrete variable:
- Countable only
- Takes only some isolated values
- E.g., no. of family members, no. of workers in
a factory, no. of persons suffering from a particular
disease
According to source-
 Primary Data

 Secondary Data
 Primary Data:
- Collected directly from the field of enquiry
- original in nature
- E.g., measurement of BP, weight, height, blood sugar
 Secondary Data:
- Collected previously by some other agency/organization
- Used afterwards by another
- E.g., hospital records, census data
Scale of measurement
 Qualitative variable:
 A categorical variable
 Nominal (classificatory) scale
 - gender, marital status, race
 Ordinal (ranking) scale
 - severity scale, good/better/best
 Quantitative variable:
 A numerical variable: discrete; continuous
 Interval scale:
 Data are placed in meaningful intervals and order. The units of measurement are
arbitrary.
 - Temperature: the differences 37 °C − 36 °C and 38 °C − 37 °C are equal
 - No implication of ratio (30 °C is not twice as hot as 15 °C)
 Ratio scale:
 Data are presented in a frequency distribution in logical order. A meaningful ratio exists.
 - Age, weight, height, pulse rate
 - A pulse rate of 120 is twice as fast as 60
 - A person weighing 80 kg is twice as heavy as one weighing 40 kg
 Nominal scales
 Ordinal Scales
 Interval Scales
 Ratio Scales
 Nominal Scales:
- Used when data are classified by major categories or
subgroups of population
- Religion can be assigned to following categories- Muslim,
Hindu, Christian
- Outcome of treatment: cured or not cured; died or survived
 Ordinal Scales:
- Assign rank order to categories placed in an order
- E.g., students rank in a class; Grades A,B,C,D;
- Literacy status: illiterate, just literate, primary, secondary,
higher secondary, graduate, post graduate
- Disease condition: mild, moderate, severe
 Interval Scale:
- The distance between two measurements is defined, but not their ratio
- E.g., intelligence score in IQ tests, temperature in Centigrade
 Ratio Scale:
- Both the distance & ratio between two measurements are
defined
- E.g., length, weight, incidence of disease, no. of children in a
family
 Dichotomy/ Binary Scale:
- A scale with only two categories
- E.g., disease→ present/absent; sex→male /female

 Population:
- An aggregate of objects, animate or inanimate, under study
- A group of units defined according to the aims & objectives of
the study
 Sample:
- A finite subset or part of the population
- Every member of the population should have an equal chance of
being included in the sample
 Parameter:
- A constant that describes a characteristic of the population
 Statistic:
- A function of the observations that describes a sample

Quantity              Statistic     Parameter
Mean                  x̄ (x bar)     µ (mu)
Standard deviation    s             σ (sigma)
No. of subjects       n             N
Proportion            p             P
 Main sources for collection of medical statistics are:
1. Experiments:
- Performed in the laboratories of physiology, biochemistry, pharmacology, clinical
pathology
- Hospital wards → for investigations & fundamental research
- Used in preparation of theses/dissertations and scientific papers for publication in
scientific journals & books
2. Surveys:
- Carried out for epidemiological studies in the field by trained teams to find out the
incidence or prevalence of health or disease situations in a community
- Used in operational research (OR) → assessment of the existing condition, how to follow a program, to study
the merits of different methods adopted to control a disease
- Provide trends in health status, morbidity, mortality, nutritional status, health
practices, environmental hazards
- Provide feedback needed to modify policy
- Provide timely warning of public health hazards
3. Records:
- Maintained as a routine in registers or books over a long
period of time
- Used for keeping vital statistics: births, deaths, marriages,
hospitalization following illness
- Used in demography & public health practices
- Collected data are qualitative
 DATA → INFORMATION

 Statistical data are usually presented in tabular form, through
different types of tables, and in pictorial form, through diagrams
and charts
 Method of presentation:
A. Tabulation
B. Drawing

By forming a frequency distribution, we can summarize the data effectively. It is a
method of presenting the data in a summarized form.
Class interval = (maximum mark − minimum mark) / number of classes
 Tabular presentation:
- A form of presenting data from a mass of statistical data
- At first, a frequency distribution table is prepared
- A table can be simple or complex
 Frequency distribution table or frequency table:
- All frequencies considered together form the “frequency
distribution”
- The no. of persons in each group is called the frequency of that
group
- The frequency distributions of most biological variables
follow a normal, binomial or Poisson distribution
- Data need consolidation by way of tabulation to
express some meaning
- Tabulation → a process of summarizing raw data &
displaying it in a compact form for further analysis
- Orderly arrangement of data in columns & rows
 Presentation of quantitative data is more cumbersome, as
- The characteristic has a measured magnitude as well as a
frequency
- Table x: presentation of quantitative data of height in
markings

Height of groups in cm    Markings                              Frequency of each group
160-162                   //// ////                             10
162-164                   //// //// ////                        15
164-166                   //// //// //// //                     17
166-168                   //// //// //// ////                   19
168-170                   //// //// //// ////                   20
170-172                   //// //// //// //// //// /            26
172-174                   //// //// //// //// //// ////         29
174-176                   //// //// //// //// //// ////         30
176-178                   //// //// //// //// //                22
178-180                   //// //// //                          12
Total                                                           200
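Discrete frequency distribution (f.d.)
OR continuous frequency distribution (f.d.)
1) Class interval = (maximum value − minimum value) / number of classes

2) Class boundary correction = (lower limit of 2nd class − upper limit of 1st class) / 2;
subtract it from each lower limit and add it to each upper limit

3) Mid point = (lower class limit + upper class limit) / 2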
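As a rough illustration of the class boundary and mid point formulas, the sketch below uses hypothetical inclusive classes (10-19, 20-29, 30-39) that are not taken from these notes; the function names are also only illustrative.

```python
def boundary_correction(classes):
    """Half the gap between the upper limit of the 1st class
    and the lower limit of the 2nd class."""
    return (classes[1][0] - classes[0][1]) / 2


def class_boundaries(classes):
    """Subtract the correction from lower limits, add it to upper limits."""
    c = boundary_correction(classes)
    return [(lower - c, upper + c) for lower, upper in classes]


def mid_points(classes):
    """Mid point = (lower class limit + upper class limit) / 2."""
    return [(lower + upper) / 2 for lower, upper in classes]


# Hypothetical inclusive class limits, for illustration only
classes = [(10, 19), (20, 29), (30, 39)]

boundaries = class_boundaries(classes)
mids = mid_points(classes)
print(boundaries)
print(mids)
```

For exclusive classes such as 160-162, 162-164 the correction is zero, so the class limits and class boundaries coincide.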
Cumulative Frequency Distribution:

It is a form of frequency distribution that represents the sum of
the frequency of a class and of all classes below it. Remember that a frequency
distribution is an overview of all distinct values (or classes of
values) and their respective number of occurrences.
CUMULATIVE RELATIVE FREQUENCY DISTRIBUTION
The cumulative relative frequency distribution of a quantitative
variable is a summary of the proportion of observations below a given level.

The relationship between cumulative frequency and cumulative
relative frequency is:
cumulative relative frequency = cumulative frequency / total number of observations
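A minimal sketch of both distributions, using the frequencies of the height table given earlier in these notes. Each cumulative relative frequency is taken as the cumulative frequency divided by the total number of observations; the variable names are illustrative.

```python
from itertools import accumulate

# Height classes (cm) and frequencies from the height table in these notes
classes = ["160-162", "162-164", "164-166", "166-168", "168-170",
           "170-172", "172-174", "174-176", "176-178", "178-180"]
frequencies = [10, 15, 17, 19, 20, 26, 29, 30, 22, 12]

total = sum(frequencies)

# Running total: frequency of a class plus all classes below it
cumulative = list(accumulate(frequencies))

# Proportion of observations at or below each class
cumulative_relative = [cf / total for cf in cumulative]

for label, f, cf, crf in zip(classes, frequencies, cumulative, cumulative_relative):
    print(f"{label}  f={f:3d}  cum.f={cf:3d}  cum.rel.f={crf:.3f}")
```

The last cumulative frequency always equals the total number of observations, and the last cumulative relative frequency equals 1.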

Example: Jane is fond of playing games with dice. She throws the
dice and notes the observation each time. These are her observations:
4, 6, 1, 2, 2, 5, 6, 6, 5, 4, 2, 3. To know the exact number of times she
got each digit (1, 2, 3, 4, 5, 6) as the outcome, she classifies them into
categories. An easy way is to use tally marks. Note that a diagonal line
across 4 vertical lines counts as 5.
The resulting table of outcomes, tally marks and counts is
known as a frequency distribution table. We can observe that all
the data that were gathered have been organized under two columns.
Thus, a frequency distribution table is a chart summarizing the
values and their frequencies. In other words, it is a tool to organize
data. This makes it easy for us to understand the given set of
information. Thus, a frequency distribution in statistics helps us
condense data into a simpler form so that it is easy for us to observe
its features at a glance.
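The tally for Jane's throws can be sketched as follows, using the twelve observations listed in the example. Using `collections.Counter` is just one convenient way to count; plain `|` marks stand in for tally marks here.

```python
from collections import Counter

# Jane's observations, as listed in the example
observations = [4, 6, 1, 2, 2, 5, 6, 6, 5, 4, 2, 3]

# Count how many times each outcome occurred
frequency = Counter(observations)

print("Outcome  Tally  Frequency")
for outcome in sorted(frequency):
    count = frequency[outcome]
    print(f"{outcome:<7}  {'|' * count:<5}  {count}")

print("Total:", sum(frequency.values()))
```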
Example 2: The weights of 50 paracetamol tablets are between 480 and 520 mg.

The frequency distribution table shows the measurement categories
and the number of observations in each category.

The range is divided into intervals called “class intervals”.

The width of the class is determined by dividing the range of
observations by the number of classes.

The frequency distribution table gives the cumulative and relative
frequency, which helps to interpret the data more easily.
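A small sketch of the class-width rule for the 480-520 mg range. The choice of 8 classes is an assumption made only for illustration, since the notes do not state how many classes were used, and `class_intervals` is a hypothetical helper.

```python
def class_intervals(min_value, max_value, n_classes):
    """Split the range into n_classes intervals of equal width.

    Width = range of observations / number of classes.
    """
    width = (max_value - min_value) / n_classes
    return [(min_value + i * width, min_value + (i + 1) * width)
            for i in range(n_classes)]


# Range from the example; the number of classes (8) is assumed
intervals = class_intervals(480, 520, 8)

for lower, upper in intervals:
    print(f"{lower:.0f}-{upper:.0f} mg")
```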
 General Principles in designing a Table:
- Tables should be numbered
- A brief & self-explanatory title should be given, mentioning
time, place, person
- Headings of columns & rows should be clear & concise
- Data are to be presented according to size or importance,
chronologically, alphabetically or geographically
- Data must be presented meaningfully
- The table should not be too large
- Foot notes should be given, if necessary
- The total no. of observations (the denominator) should be
written
- Information obtained should be summarized in the table
 Frequency distribution drawings:
- After class-wise or group-wise tabulation, the
frequencies of a characteristic can be presented by
two kinds of drawings
- Graphs & Diagrams
- May be shown by lines, dots or figures
o Presentation of quantitative data is through graphs
o Presentation of qualitative, discrete, counted data is
through diagrams
1. Histogram
- Graphical presentation of frequency distribution
- Variable characters of the different groups are indicated on
the horizontal line (x-axis), called the abscissa
- No. of observations is marked on the vertical line (y-axis),
called the ordinate
- The frequency of each group forms a rectangle (column)
2. Frequency Polygon:
- An area diagram of frequency distribution developed over
a histogram
- Mid points of the class intervals at the height of frequency
are joined by straight lines
- It gives a polygon, a figure with many angles
3. Frequency Curve:
- If the no. of observations is very large & the group interval
is reduced
- The frequency polygon tends to lose its angulation
- Gives rise to a smooth curve → frequency curve
4. Line Chart or Graph:
- A frequency polygon presenting variation by line
- Shows the trend of an event occurring over a period of time
- Shows rise, fall or periodic fluctuations
- The vertical axis may not start from zero, but from some point above it


5. Cumulative Frequency Diagram or “Ogive”
- Graph of the cumulative frequency distribution
- An ordinary frequency distribution table → relative
frequency table
- Cumulative frequency: total no. of persons in each particular
range, from the lowest value of the characteristic up to &
including any higher group value
6. Scatter or Dot Diagram:
- Prepared after tabulation in which frequencies of at least
two variables have been cross classified
- Shows the nature of correlation between two variable characters
in the same person(s) (e.g., height & weight)
- Also called correlation diagram
1. Bar Diagram:
- Graphically present frequencies of different categories
of qualitative data
- Vertical/ horizontal
- May be descending/ascending order
- Widths should be equal
- Spacing between bars should also be equal
i. Simple Bar Diagram:
- Each bar represents frequency of a single category with a
distinct gap from one another
ii. Multiple bar diagram:
- Used to show comparison of two or more sets of related
statistical data
iii. Component/ proportional bar diagram:
- Used to compare sizes of different component parts
among themselves
- Also shows relation between each part & the whole
2. Pie/ sector Diagram:
- A circle whose area is divided into different segments by
straight lines from the centre to the circumference
- Each segment expresses a proportional component of the
attribute
- Angle (°) of a sector is calculated by
 Class percentage × 3.6, or
 (Class frequency / total frequency) × 360
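To illustrate the sector-angle calculation, the sketch below uses hypothetical blood group counts, which are not data from these notes. It computes each angle as (class frequency / total frequency) × 360 and, equivalently, as the class percentage × 3.6.

```python
# Hypothetical frequencies of blood groups, for illustration only
frequencies = {"A": 30, "B": 40, "AB": 10, "O": 20}

total = sum(frequencies.values())

# Angle of each sector = (class frequency / total frequency) x 360
angles = {group: f * 360 / total for group, f in frequencies.items()}

# Equivalent route: class percentage x 3.6
percentages = {group: f * 100 / total for group, f in frequencies.items()}
angles_from_percent = {group: p * 3.6 for group, p in percentages.items()}

for group, angle in angles.items():
    print(f"{group}: {angle:.1f} degrees")
```

Whatever the frequencies, the sector angles should add up to 360°.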


 MEASURES OF CENTRAL TENDENCY
 After classification and tabulation, the next
important objective of statistical analysis is to determine various
numerical measures that describe the inherent characteristics
of the data.
 This can be achieved through statistical techniques such
as measures of central tendency and dispersion.
 The concentration of values around the central value of a
distribution is known as central tendency.
 The different measures that are used to study the
characteristics of the distribution are known as measures of
central tendency. Averages are among the most common
measures of central tendency; they condense a set of
numerical data into a single value that is representative of the
distribution.
 When a series of observations has been tabulated in the
form of a frequency distribution
→ it is felt necessary to convert the series of observations into a
single value that describes the characteristics of that
distribution → called the Measure Of Central Tendency
 All data or values are clustered round it
 These values enable comparisons to be made between one
series of observations and another
 Individual values may overlap even though two distributions have
different central tendencies
 E.g., the average incubation period of measles is 10 days and
that of chicken pox is 15 days.
 Requisites for a good measure of central tendency:
 It should be rigidly defined.
 It should be based on all observations.
 It should be suitable for further algebraic treatment.
 It should not be affected too much by extreme observations or sampling
fluctuations.
 The following are some common measures of central tendency.
 Arithmetic Mean (AM) – Simple and Weighted
 Geometric Mean (GM)
 Harmonic Mean (HM)
 Median (M)
 Mode (Z)
[Diagram: Measures of central tendency → Mean, Median, Mode; Mean → Arithmetic Mean (AM), Geometric Mean (GM), Harmonic Mean (HM)]
 Arithmetic Mean (AM)
Arithmetic Mean (AM) is also known as the simple
average.
It is denoted by x̄.
AM is calculated by dividing the sum (total) of
all observations by the number of observations,
i.e., AM = Sum of all observations / Number of observations (N).
 Note: For calculating AM in case of continuous
frequency distribution the class intervals may be either
exclusive or inclusive type.
 Arithmetic mean:
- Sum of all observations divided by number of
observations
- Mean (x̄) = Σx/n; x is a variable taking different
observational values & n = no. of observations
- DIRECT METHOD: x̄ = Σx/n
- INDIRECT METHOD: x̄ = A + Σd/n, where A is an assumed mean
and d = x − A is the deviation of each observation from A
- Example:
 ESR of 7 subjects are 8, 7, 9, 10, 7, 7 & 6 mm for the 1st hr.
Calculate the mean ESR.
- Mean (x̄) = (8+7+9+10+7+7+6)/7 = 54/7 = 7.7 mm
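The ESR example can be checked with a short sketch of both methods. For the indirect method, A is read here as an assumed mean and the deviations as x − A; the value A = 8 is an arbitrary choice for illustration.

```python
# ESR readings (mm, 1st hr) from the example
esr = [8, 7, 9, 10, 7, 7, 6]
n = len(esr)

# Direct method: mean = sum of observations / number of observations
direct_mean = sum(esr) / n

# Indirect method: mean = A + sum(d) / n, with d = x - A
A = 8  # assumed mean, chosen arbitrarily for illustration
deviations = [x - A for x in esr]
indirect_mean = A + sum(deviations) / n

print(f"Direct method:   {direct_mean:.1f} mm")
print(f"Indirect method: {indirect_mean:.1f} mm")
```

Both methods give the same mean whatever value is chosen for A; the indirect method only keeps the arithmetic small when the observations are large.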
Applications of the harmonic mean (HM): If equal distances are covered at different speeds, the average speed
can be calculated using HM. If the same amount of work is completed at different speeds,
the average rate of completion can be calculated using weighted HM.
 Result:
 For any set of positive values, AM ≥ GM ≥ HM
 For any two positive values, GM² = AM × HM
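A quick numerical check of these two results, using Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The pair of values 4 and 9 is hypothetical and chosen only because the arithmetic is easy to follow.

```python
import math
from statistics import mean, geometric_mean, harmonic_mean

# Two hypothetical positive values, for illustration only
values = [4, 9]

am = mean(values)             # (4 + 9) / 2
gm = geometric_mean(values)   # square root of 4 * 9
hm = harmonic_mean(values)    # 2 / (1/4 + 1/9)

print(f"AM = {am:.4f}, GM = {gm:.4f}, HM = {hm:.4f}")

# AM >= GM >= HM for positive values
ordering_holds = am >= gm >= hm

# For two positive values, GM squared equals AM times HM
identity_holds = math.isclose(gm ** 2, am * hm)

print("AM >= GM >= HM:", ordering_holds)
print("GM^2 == AM x HM:", identity_holds)
```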
 Calculation of weighted arithmetic mean:
- The following methods are utilized in case of a large no.
of observations
 For Ungrouped Data:
- Suppose we have observations x₁, x₂, x₃, …, xₙ with
corresponding frequencies f₁, f₂, f₃, …, fₙ
- Mean = (x₁f₁ + x₂f₂ + x₃f₃ + ⋯ + xₙfₙ) / (f₁ + f₂ + f₃ + ⋯ + fₙ) = Σfx / Σf = Σfx / n, where n = Σf
 For Grouped Data:
- Data are arranged in groups & a frequency
distribution table is prepared
- The mean value (mid point) of each group is multiplied by its
frequency
- The sum of the products is divided by the total no. of
observations
- The mean so obtained is called the “weighted mean”
- Mean (x̄) = (x₁f₁ + x₂f₂ + x₃f₃ + ⋯ + xₙfₙ) / (f₁ + f₂ + f₃ + ⋯ + fₙ) = Σfx / Σf = Σfx / n, where n = Σf
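As a worked sketch of the grouped-data calculation, the code below applies Σfx / Σf to the height table given earlier in these notes, taking the mid point of each class as its representative value x. Using the mid point in this way is the usual convention and is assumed here.

```python
# Class limits (cm) and frequencies from the height table in these notes
classes = [(160, 162), (162, 164), (164, 166), (166, 168), (168, 170),
           (170, 172), (172, 174), (174, 176), (176, 178), (178, 180)]
frequencies = [10, 15, 17, 19, 20, 26, 29, 30, 22, 12]

# Representative value of each class: its mid point
mid_points = [(lower + upper) / 2 for lower, upper in classes]

# Weighted mean = sum(f * x) / sum(f)
sum_fx = sum(f * x for f, x in zip(frequencies, mid_points))
sum_f = sum(frequencies)
weighted_mean = sum_fx / sum_f

print(f"Sum of f*x = {sum_fx:.0f}, sum of f = {sum_f}")
print(f"Weighted mean height = {weighted_mean:.2f} cm")
```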
 Geometric mean:
- Used when data contain a few extremely large or small
values
- It is the nth root of the product of n observations
 GM = ⁿ√(x₁·x₂·x₃·…·xₙ)
 Harmonic Mean:
- Reciprocal of the arithmetic mean of the reciprocals of the
observations
 Arithmetic mean of reciprocals of observations = Σ(1/x)/n
- HM = n/Σ(1/x)
- Has limited use
- AM ≥ GM ≥ HM
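The two definitions can be written out directly, as in the sketch below. The input values are hypothetical: 2 and 8 for the geometric mean, and speeds of 40 and 60 km/h over two equal distances for the harmonic mean, following the average-speed application mentioned in these notes.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive observations."""
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the observations."""
    return len(values) / sum(1 / x for x in values)


# Hypothetical values, for illustration only
gm = geometric_mean([2, 8])

# Equal distances covered at 40 km/h and 60 km/h (hypothetical speeds)
speeds = [40, 60]
average_speed = harmonic_mean(speeds)

# Cross-check: total distance / total time for two 120 km legs
distance = 120
total_time = sum(distance / s for s in speeds)
check = (2 * distance) / total_time

print(f"GM of 2 and 8 = {gm:.2f}")
print(f"Average speed (HM) = {average_speed:.2f} km/h, check = {check:.2f} km/h")
```

The harmonic mean agrees with total distance divided by total time, whereas the arithmetic mean of the two speeds (50 km/h) would overstate the average speed.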
