
2dk9spxsgkmbj3llxgrw Signature Poli 170418194101

Biostatistics is a branch of statistics focused on biological data, involving concepts such as constants, variables, parameters, and statistics. It plays a crucial role in public health by evaluating data through various methodologies, including sampling and data presentation techniques. The document outlines the history, importance, and applications of biostatistics, as well as methods for data collection and analysis.

Uploaded by

Muzaffar Khan
Copyright
© © All Rights Reserved

BIOSTATISTICS

PRESENTED BY :
PRABLEEN ARORA
MDS STUDENT
STATISTICS- is the science of compiling, classifying, and
tabulating numerical data and expressing the results
in a mathematical and graphical form.

BIOSTATISTICS- is that branch of statistics concerned
with the mathematical facts and data related to
biological events.
• Constant
– Quantities that do not vary e.g. in biostatistics,
mean, standard deviation are considered constant
for a population

• Variable
– Characteristics which take different values for
different persons, places or things, such as height,
weight, blood pressure
• Parameter
– It is a constant that describes a population e.g. in a
college there are 40% girls. This describes the
population, hence it is a parameter.

• Statistic
– It is a value that describes a sample, e.g. if out of a
sample of 200 students of the same college 45% are girls, this 45%
is a statistic as it describes the sample

• Attribute
– A characteristic on the basis of which the population can
be divided into categories or classes, e.g. gender
HISTORY
• The science of statistics is said to have originated
from two main sources:
1 . Government records
2. Mathematics

• It developed from the registration of heads of families
in ancient Egypt to the Roman census on military
strength, births and deaths etc., and gradually found its
application in the field of health and
medicine.
• John Graunt, who was neither a physician nor a
mathematician, is regarded as the FATHER OF HEALTH
STATISTICS.
WHAT IS STATISTICS ??
• The following essential features of statistics are evident from
various definitions of statistics:

a) Principles and methods for the collection, presentation,
analysis and interpretation of numerical data of different
kinds:
1. Observational data, qualitative data.
2. Data that has been obtained by a repetitive operation.
3. Data affected to a marked degree by a multiplicity of
causes.

b) The science and art of dealing with variation in such a way as


c) Controlled objective methods whereby group
trends are abstracted from observations on
many separate individuals.

d) The science of experimentation which may be
regarded as mathematics applied to
observational data.
WHY STATISTICS ??
• Variability in measurement can be handled using statistics. E.g. an
investigator makes observations according to his judgement of the
situation (depending upon his skills, knowledge and experience).

• Epidemiology and Biostatistics are sister sciences or disciplines.

• Epidemiology collects facts relating to groups of population in places,
times and situations.

• Biostatistics converts all the facts into figures and at the end
translates them into facts, interpreting the significance of their
results.
• Epidemiology and biostatistics both deal with the
facts-figures-facts

QUANTITATIVE METHODOLOGY
USES OF BIOSTATISTICS
1. To test whether the difference between two populations is real or by
chance occurrence.

2. To study the correlation between attributes in the same population.

3. To evaluate the efficacy of vaccines.

4. To measure mortality and morbidity.

5. To evaluate the achievements of public health programs

6. To fix priorities in public health programs

7. To help promote health legislation and create administrative standards
for oral health.
COLLECTION OF DATA
• The collective recording of observations either
numerical or otherwise is called data.

• Demographic data comprises details of population
size, distribution, geographic distribution, ethnic
group, socio-economic factors and their trends
over time.
• It is obtained from census and other public service
reports.
• Depending upon the nature of the variable, data is
classified into:

1. Qualitative data- attributes or qualities.
2. Quantitative data- obtained through measurements, e.g. using
calipers.
a) discrete
b) continuous
Sources of statistical data

EXPERIMENTS
• Performed to collect data for investigations and research by one or more workers.

SURVEYS
• Carried out for epidemiological studies in the field by trained teams to find the incidence or prevalence of health or disease in a community.

RECORDS
• Records maintained as a routine in registers and books over a long period of time provide readymade data.

Data can be collected

PRIMARY
• Data obtained by the investigator himself.

SECONDARY
• Data that has already been recorded.
• E.g. hospital records
Primary data can be obtained using any
one of the following methods:

Direct personal interviews
• Face-to-face contact with the person.
• Subjective phenomena.
• Accurate and any ambiguity can be clarified.
• Cannot be used in extensive studies.

Oral health examination
• When information is needed on health status.
• Cannot be used in extensive studies.
• Includes treatment

Questionnaire method
• List of questions pertaining to the survey (the “questionnaire”) is prepared.
• Various informants are requested to supply the information.
Sampling and sample design
• Population: the group of all individuals who are the focus of
the investigation is known as the population.

• Census enumeration: when the information is obtained from
each and every individual in the population.

• Sample: the group of individuals who are actually
available for investigation.

• Sampling units: the individual entities that form the focus
of the study.
Sample selection

Purposive selection
• Representing the population as a whole.
• Great temptation to deliberately or purposively select the individuals who seem to represent the population under study.
• Easy to carry out.
• Does not need the preparation of a sampling frame.

Random selection
• Sample of units is selected in such a way that all the characteristics of the population are reflected in the sample.
• Random indicates the chance of the population unit being selected in the sample.
Sampling Design
BASED UPON TYPE AND NATURE OF THE POPULATION AND
THE OBJECTIVES OF THE INVESTIGATION.

1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multiphase sampling pathfinder survey
Simple random sampling
• Each and every unit in the population has an equal
chance of being included in the sample.
• Selection of unit is by chance only.

Two methods

Lottery method
• Population units are numbered on separate slips.
• The slips are shuffled and a blindfold selection is made.

Table of random numbers
• Random arrangement of digits from 0-9 in rows and columns.
• Selection is done either in a horizontal or vertical direction.
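The lottery method can be sketched in a few lines of Python. This is only an illustration: the frame of 200 numbered units, the sample size of 20 and the seed are assumptions made for the example, not figures from the slides.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)  # a seed is used only so the draw can be repeated
    return rng.sample(units, n)

# Hypothetical sampling frame: 200 numbered population units (the "slips").
population = list(range(1, 201))
sample = simple_random_sample(population, 20, seed=1)
print(sorted(sample))
```

Drawing without replacement mirrors picking slips from a shuffled box: no unit can be chosen twice.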
Systematic random sampling
• Select one unit at random and then select additional
units at evenly spaced intervals till the sample of required
size has been drawn.
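A minimal sketch of systematic selection, assuming the population is available as an ordered list and that the sample size does not exceed the population size. The function name and the numbers are hypothetical.

```python
import random

def systematic_sample(units, n, seed=None):
    """Pick a random start in the first interval, then every k-th unit."""
    k = len(units) // n                       # sampling interval
    start = random.Random(seed).randrange(k)  # random start among the first k units
    return [units[start + i * k] for i in range(n)]

# Hypothetical frame of 200 units, sample of 20, so every 10th unit is taken.
population = list(range(1, 201))
sample = systematic_sample(population, 20, seed=1)
print(sample)
```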

Stratified random sampling

• The population to be sampled is subdivided into groups
(age/sex/genetic) known as strata (i.e. each group is
homogeneous in characteristics).

• Then a simple random selection is done from each stratum.

• More representative, provides greater accuracy and
concentrates on a wider geographical area.
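The two steps above (divide into strata, then draw a simple random sample from each stratum) might look like this in Python. The population of 100 people, the 10% sampling fraction and the helper name are assumptions for illustration; proportional allocation is one common choice, not the only one.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # group units by stratum
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: 40 females and 60 males, stratified by sex.
population = [(i, "F" if i % 5 < 2 else "M") for i in range(100)]
sample = stratified_sample(population, lambda unit: unit[1], 0.1, seed=1)
```

Each stratum contributes in proportion to its size, so the sample keeps the 40:60 split of the population.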
Cluster sampling

• The population forms natural groups or clusters
such as villages, wards, blocks or children of a school.

• A sample of the clusters is selected and then all the
units in each of the selected clusters are surveyed.

• Simpler, less time and cost.

• Higher standard error.


Multiphase sampling
• Part of information is collected from the whole sample and
part from the sub sample.
• First phase: All the children in school are surveyed.

• Second phase: Only the ones with oral health problems.

• Third phase: the section that needs treatment is selected.

• Sub-samples further become smaller and smaller.

• Adopted when the interest is in any specific disease.


Multistage sampling
• First stage is to select the groups or clusters.

• Then subsamples are taken in as many subsequent
stages as necessary to obtain the desired sample.
Errors in sampling

Sampling errors
• Faulty sample design
• Small sample size

Non-sampling errors
• Coverage errors- due to non-response or non-cooperation of the informant.
• Observational errors: interview bias, imperfect experimental technique.
• Processing errors: statistical

Data presentation
Two main types of data presentation are:
• Tabulation
• Graphic representation - charts and diagrams
Tabulation
– Tables are a simple device used for the presentation of statistical
data.
PRINCIPLES:
– Tables should be as simple as possible (2-3 small tables).
– Data should be presented according to size or importance,
chronologically or alphabetically.
– Should be self explanatory.
– Each row and column should be labelled concisely and clearly.
– Specific unit of measure for the data should be given.
– Title should be clear, concise and to the point.
– Total should be shown.
– Every table should contain a title as to what is depicted in the
table.
– In a small table, vertical lines separating the columns may not be
necessary.
– If the data are not original, their source should be given in a
footnote.
TYPES OF TABLES

MASTER TABLE
• Contains all the data obtained from a survey.

SIMPLE TABLE
• One way tables which supply the answer to questions about one characteristic of data only.

FREQUENCY DISTRIBUTION TABLE
• Two column frequency table.
• First column lists the classes into which the data are grouped.
• Second column lists the frequency for each classification.
Charts and diagrams
• Most convincing and appealing ways of depicting statistical
results.
Principles
1. Every diagram must be given a title that is self explanatory.
2. Simple and consistent with the data.
3. The values of the variable are presented on the horizontal or X-axis
and frequency on the vertical or Y-axis.
4. Number of lines drawn in any graph should not be many.
5. Scale of presentation for X-axis and Y- axis should be
mentioned.
6. The scale of division of both the axes should be proportional
and the divisions should be marked along the details of the
variable and frequencies presented on the axes.
Bar chart

• Represents qualitative data.
• Bars can be either vertical or horizontal.
• Suitable scale is chosen
• Bars are usually equally spaced
• They are of three types:
• simple bar chart- represents only one variable.
• multiple bar chart- for each category of a variable
there is a set of bars.
• component /proportional bar chart- individual bar
is divided into 2 or more parts
Pie chart

• Entire graph looks like a pie.
• It is divided into different sectors corresponding to
the frequencies.
Line diagram

Useful to study changes of values in the variable over time and is the
simplest type of diagram.

Time such as hours, days , weeks , months or years


Histogram

• Pictorial presentation of frequency distribution
• No space between the cells on a histogram.
• Class intervals are given on the horizontal axis and frequencies on the vertical axis
• Area of each rectangle is proportional to the frequency
Frequency polygon

• Obtained by joining midpoints of histogram blocks
at the height of frequency by straight lines, usually
forming a polygon.
Frequency curve

• When the number of observations is very large and the class
interval is reduced, the frequency polygon loses its
angulations, becoming a smooth curve known as
frequency curve
Pictogram
• Popular method of presenting data to the
common man through small pictures or
symbols.

Spot map/shaded map/Cartogram


• These maps are prepared to show geographic
distribution of frequencies of characteristics
Measures of statistical averages or central tendency

• Central value around which all the other
observations are distributed.
• Main objective is to condense the entire mass of data
and to facilitate comparison.
• The most common measures of central tendency
that are used in dental sciences:
– mean
– median
– mode
Mean

• Refers to arithmetic mean
• It is obtained by adding the individual observations
and dividing by the total number of observations.
• Advantages – it is easy to calculate.
most useful of all the averages.
• Disadvantages – influenced by abnormal values.
Median

• When all the observations are arranged either in
ascending order or descending order, the middle
observation is known as the median.

• In case of an even number of observations, the average of the two
middle values is taken.

• Median is a better indicator of central value as it is
not affected by the extreme values.
Mode
• Most frequently occurring observation in a data set is called the mode
• Not often used in medical statistics.

• EXAMPLE
• Number of decayed teeth in 10 children
• 2,2,4,1,3,0,10,2,3,8

• Mean = 35 / 10 = 3.5

• Median = (0,1,2,2,2,3,3,4,8,10) = (2+3) / 2
• = 2.5

• Mode = 2 (occurs three times)
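The worked example can be checked with Python's standard `statistics` module. The data are the ten decayed-teeth counts from the slide, which sum to 35; the variable names are chosen here for illustration.

```python
import statistics

# Number of decayed teeth in 10 children (from the example above).
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

mean = statistics.mean(decayed_teeth)      # sum of values / number of values
median = statistics.median(decayed_teeth)  # average of the two middle values
mode = statistics.mode(decayed_teeth)      # most frequent value

print(mean, median, mode)
```

The two extreme values (8 and 10) pull the mean above the median, which is the sensitivity to abnormal values noted for the mean.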
Types of variability

• There are three types of variability
– Biological variability
– Real variability
– Experimental variability
Biological variability

• It is the natural difference which occurs in
individuals due to age, gender and other
attributes which are inherent
• This difference is small and occurs by chance
and is within certain accepted biological limits
• e.g. vertical dimension may vary from patient
to patient
Real Variability

• Such variability is more than the normal
biological limits
• The cause of difference is not inherent or
natural and is due to some external factors
• e.g. difference in incidence of cancer among
smokers and non smokers may be due to
excessive smoking and not due to chance only
Experimental Variability
• It occurs due to the experimental study
• they are of three types
– Observer error
• the investigator may alter some information or not record the
measurement correctly
– Instrumental error
• this is due to defects in the measuring instrument
• both the observer and the instrument error are called non sampling
error
– Sampling error or errors of bias
• this is the error which occurs when the samples are not chosen at
random from population.
• Thus the sample does not truly represent the population.
MEASURES OF DISPERSION
• Dispersion is the degree of spread or variation of
the variable about a central value.
• Helps to know how widely the observations are
spread on either side of the average.

• Most common measures of dispersion are:
1. RANGE
2. MEAN DEVIATION
3. STANDARD DEVIATION

RANGE
• Defined as the difference between the value of the largest item and the smallest item.
• Gives no information about the values that lie between the extreme values.

MEAN DEVIATION
• It is the average of the deviations from the arithmetic mean, ignoring their sign.
• $\text{M.D} = \frac{\sum |X - X_i|}{n}$
• Ʃ- sum of
• X- arithmetic mean
• Xi- value of each observation in the data
• n- number of observations in the data

STANDARD DEVIATION
• Most important and widely used measure of studying dispersion.
• Greater the S.D, greater will be the magnitude of dispersion from the mean.
• Smaller S.D means a higher degree of uniformity of the observations.
• $\text{S.D} = \sqrt{\frac{\sum (X - X_i)^2}{n}}$
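As a rough illustration, the three measures can be computed for the decayed-teeth data used in the central tendency example. The function name is hypothetical. The divisor n follows the slide; for small samples many texts divide by n - 1 instead.

```python
import math

def dispersion(values):
    """Return (range, mean deviation, standard deviation) with divisor n."""
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)
    # Absolute deviations are used: signed deviations from the mean sum to zero.
    mean_deviation = sum(abs(mean - x) for x in values) / n
    std_deviation = math.sqrt(sum((mean - x) ** 2 for x in values) / n)
    return value_range, mean_deviation, std_deviation

value_range, md, sd = dispersion([2, 2, 4, 1, 3, 0, 10, 2, 3, 8])
print(value_range, round(md, 2), round(sd, 2))
```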
Coefficient of variation

• It is used to compare attributes having two
different units of measurement e.g. height and
weight
• Denoted by CV
• CV = (SD / Mean) x 100
• and is expressed as percentage
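A short sketch of the comparison the CV allows. The height and weight figures below are invented purely for illustration.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: SD x 100 / Mean."""
    return sd * 100 / mean

# Hypothetical figures: height in cm, weight in kg.
cv_height = coefficient_of_variation(8.0, 160.0)
cv_weight = coefficient_of_variation(9.0, 60.0)
print(cv_height, cv_weight)
```

Because the units cancel, the two percentages can be compared directly. In this made-up example weight varies relatively more than height.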
Normal distribution/normal curve/ Gaussian distribution

• When the data is collected from a very large number of
people and a frequency distribution is made with
narrow class intervals, the resulting curve is smooth and
symmetrical- NORMAL CURVE.

• These limits on either side of measurement are called
confidence limits.
STANDARD NORMAL CURVE
• There may be many normal curves but only one standard
normal curve.
Characteristics
• Bell shaped
• Perfectly symmetrical
• Frequency increases from one side reaches its highest and
decreases exactly the way it had increased .
• Total area of the curve is one, its mean is zero and standard
deviation is one.
• The highest point denotes mean, median and mode which
coincide.
Z-TEST
• Used to test the significance of difference in means
for large samples.
Criteria:
1. Sample must be randomly selected.
2. Data must be quantitative.
3. The variable is assumed to follow a normal
distribution in the population.
4. Samples should be larger than 30.
Tests of significance

• When different samples are drawn from the same
population, the estimates might differ - sampling
variability.

• It deals with techniques to know how far the difference
between the estimates of different samples is due to
sampling variation.
a) Standard error of mean
b) Standard error of proportion
c) Standard error of difference between two means
d) Standard error of difference between two proportions.
1. Standard error of mean: Gives the standard
deviation of the means of several samples from
the same population.
Example : Let us suppose, we obtained a random
sample of 25 males, age 20-24 years whose mean
temperature was 98.14 deg. F with a standard
deviation of 0.6. What can we say of the true mean
of the universe from which the sample was drawn?
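One way to answer the question is to compute the standard error as SD / √n and place limits around the sample mean. The sketch below assumes the 1.96 multiplier that the slides use later for the zone of acceptance. With only 25 observations a slightly larger multiplier from the t-table would strictly apply.

```python
import math

n, sample_mean, sd = 25, 98.14, 0.6   # figures from the example above

se = sd / math.sqrt(n)                # standard error of the mean
lower = sample_mean - 1.96 * se       # assumed 95% limits (mean ± 1.96 SE)
upper = sample_mean + 1.96 * se

print(round(se, 2), round(lower, 2), round(upper, 2))
```

Under these assumptions the true mean of the universe would be expected to lie between roughly 97.9 and 98.4 deg. F.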
Standard Error of Proportion
•Standard error of proportion may be defined as a unit that
measures variation which occurs by chance in the proportions of a
character from sample to sample or from sample to population or
vice versa in a qualitative data.
Standard Error of Difference Between two Means

•The standard error of difference between the two means is 7.5.
•The actual difference between the two means is (370 - 318) = 52, which is more than
twice the standard error of difference between the two means, and therefore
"significant".
Standard Error of Difference Between Proportions

•The standard error of difference is 6 whereas the observed difference (24.4 - 16.2)
was 8.2.
• In other words the observed difference between the two groups is less than twice
the S.E. of difference, i.e., 2 x 6.
• There was no strong evidence of any difference between the efficacy of the two
vaccines. Therefore, the observed difference might be easily due to chance.
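Both worked examples apply the same rule of thumb: a difference larger than twice its standard error is taken as significant. A small sketch using the figures quoted on the slides (the function name is hypothetical):

```python
def is_significant(observed_difference, se_difference, multiplier=2.0):
    """True when the difference exceeds `multiplier` times its standard error."""
    return abs(observed_difference) > multiplier * se_difference

means = is_significant(370 - 318, 7.5)        # 52 against 2 x 7.5 = 15
proportions = is_significant(24.4 - 16.2, 6)  # 8.2 against 2 x 6 = 12
print(means, proportions)
```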
• A null hypothesis or hypothesis of no difference
(H0) asserts that there is no real difference between the
sample and the population in the particular matter
under consideration and that the difference found is
accidental and arose out of sampling variations.

• The alternative hypothesis of significant
difference (H1) states that there is a difference
between the two groups compared.
• A test of significance such as Z-test is performed to
accept the null hypothesis H0 or to reject it and
accept the alternative hypothesis H1.
• To make minimum error in rejection or acceptance
of H0, we divide the sampling distribution or the
area under the normal curve into two regions or
zones.
i. A zone of acceptance
ii. A zone of rejection.
• The distance from the mean at which H0 is rejected
is called the level of significance.

• It falls in the zone of rejection for H0, the shaded areas
under the curves, and it is denoted by the letter P, which
indicates the probability or relative frequency of
occurrence of the difference by chance.
• Greater the Z value, lesser will be the P.
i. Zone of acceptance: If the result of a sample falls in the plain area, i.e. within the
mean ± 1.96 SE, the null hypothesis is accepted, hence this area is called the zone of
acceptance for the null hypothesis.

ii. Zone of rejection: If the result of a sample falls in the shaded area, i.e. beyond mean
± 1.96 SE, it is significantly different from the universe value. Hence, the H0 of no
difference is rejected and the alternate H1 is accepted. This shaded area, therefore, is
called the zone of rejection for the null hypothesis.
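The link between Z and P can be sketched with the standard library alone. The helper below is an illustrative assumption: it gives the two-tailed area under the standard normal curve beyond ±Z via the complementary error function.

```python
import math

def two_tailed_p(z):
    """Probability of a standard normal value beyond ±z (both tails)."""
    return math.erfc(abs(z) / math.sqrt(2))

p_boundary = two_tailed_p(1.96)   # the edge of the zone of acceptance
p_large = two_tailed_p(52 / 7.5)  # Z from the difference-between-means example
print(round(p_boundary, 3), p_large < 0.05)
```

A Z of 1.96 corresponds to P of about 0.05, and P shrinks as Z grows, which is the "greater the Z value, lesser the P" rule.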
• Degree of freedom:
Defined as the number of independent members in
the sample.

EXAMPLE:-
(X+Y+Z)/3 = 5
Out of 3 values, we can choose only 2 of them
freely, but the choice of the third depends upon
the fact that the total of the three values should be
15.
SIGNIFICANCE OF DIFFERENCE BETWEEN MEANS OF
SMALL SAMPLES BY STUDENT’S t-TEST
• Small samples or their Z values do not follow the normal
distribution as the large ones do.

• So, the Z value based on the normal distribution will not give
the correct level of significance or probability of a small
sample value occurring by chance.

• In case of small samples, t-test is applied instead of Z-test.

• It was designed by W.S. Gosset, whose pen name was
Student. Hence, this test is also called Student’s t-test.
• There are two types of Student’s t-test:
Unpaired t test
Paired t test

Criteria for applying t-test
1. Random samples
2. Quantitative data
3. Variable normally distributed
4. Sample size less than 30.
Unpaired t test

• This test is applied to unpaired data of independent
observations made on individuals of two different
or separate groups or samples drawn from two
populations, to test if the difference between the
two means is real or can be attributed to sampling
variability.
• EXAMPLE: difference between the means of the control and
experimental groups.
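A minimal sketch of the unpaired t statistic, assuming equal variances in the two groups (the pooled-variance form). The control and experimental values are invented for illustration.

```python
import math
import statistics

def unpaired_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))  # SE of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, na + nb - 2

# Hypothetical measurements in two independent groups.
control = [5, 6, 7, 8, 9]
experimental = [7, 8, 9, 10, 11]
t, df = unpaired_t(experimental, control)
print(round(t, 2), df)
```

The calculated t is then compared with the tabulated value for the same degrees of freedom. For 8 degrees of freedom at the 5% level that value is 2.306, so this made-up difference would not be judged significant.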
Paired t test

• It is applied to paired data of dependent
observations from one sample only, when each
individual gives a pair of observations.

• The individual gives a pair of observations, i.e.
observations before and after taking a drug
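For paired data the test works on the differences within each pair. The sketch below uses invented before and after readings for five individuals; the names are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and degrees of freedom computed from within-pair differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return statistics.mean(diffs) / se, n - 1

# Hypothetical readings on the same five individuals.
before = [120, 122, 130, 128, 125]
after = [118, 119, 126, 127, 120]
t, df = paired_t(before, after)
print(round(t, 2), df)
```

A negative t here simply reflects that the readings fell after the drug. Its absolute size is what is compared with the tabulated value for n - 1 degrees of freedom.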
THE CHI SQUARE TEST FOR QUALITATIVE DATA (χ² TEST)

• Developed by Karl Pearson.

• Chi-square (χ²) test offers an alternate method of testing
the significance of difference between two proportions. It
has the advantage that it can also be used when more than
two groups are to be compared.

• It is most commonly used when data are in frequencies
such as in the number of responses in two or more
categories.
• Important applications in medical statistics as test
of:
• 1. Proportion
• 2. Association
• 3. Goodness of fit.

• Test of Proportions
• As an alternate test to find the significance of
difference in two or more than two proportions.
• Test of Association
• The test of association between two events in
binomial or multinomial samples is the most
important application of the test in statistical
methods. It measures the probability of association
between two discrete attributes.
• Two events can often be studied for their
association such as smoking and cancer, treatment
and outcome of a disease, vaccination and
immunity, nutrition and intelligence, etc.
• Test of Goodness of Fit
• Chi-square (χ²) test is also applied as a test of
“goodness of fit”, to determine if actual
numbers are similar to the expected or
theoretical numbers, i.e. goodness of fit to a
theory.
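As an illustration of the test of association, χ² can be computed from a table of observed frequencies by comparing each cell with the frequency expected if there were no association. The 2 x 2 table below is invented, and the function is a sketch without any continuity correction.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for an r x c frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under the hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical data: rows = vaccinated / not vaccinated,
# columns = attacked / not attacked.
chi2, df = chi_square([[10, 40], [20, 30]])
print(round(chi2, 2), df)
```

For 1 degree of freedom the tabulated χ² at the 5% level is 3.84, so a calculated value above that would be read as evidence of association.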
Analysis of Variance (ANOVA) Test

• Not confined to comparing two sample means, but
more than two samples drawn from corresponding
normal populations.

• E.g. in experimental situations where several different
treatments (various therapeutic approaches to a
specific problem or various drug levels of a particular
drug) are under comparison.

• It is the best way to test the equality of three or more


• Requirements
– Data for each group are assumed to be independent and
normally distributed
– Sampling should be at random

• One way ANOVA
– Where only one factor will affect the result between 2
groups

• Two way ANOVA
– Where we have 2 factors that affect the result or outcome

• Multi way ANOVA
– Three or more factors affect the result or outcomes between
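A one way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F ratio from first principles on three invented treatment groups. It is an illustration, not a full analysis.

```python
def one_way_anova(groups):
    """F ratio with its between-group and within-group degrees of freedom."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of group means about the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations about their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical outcomes under three treatments.
f, df_between, df_within = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f, 2), df_between, df_within)
```

A large F indicates that the group means differ by more than the scatter within the groups would explain.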
CORRELATION AND REGRESSION
• Correlation: When dealing with measurements on 2
sets of variables in the same person, one variable may
be related to the other in some way (i.e. change in
one variable may result in change in the value of the
other variable).
• Correlation is the relationship between two sets of
variables.
• Correlation coefficient is the magnitude or degree
of relationship between 2 variables (varies from -1
to +1).
• Obtained by plotting a scatter diagram (i.e. one variable
on the x-axis and the other on the y-axis).

• Perfect Positive Correlation
• In this, the two variables denoted by letters X and Y are
directly proportional and fully correlated with each
other.
• The correlation coefficient (r) = +1, i.e. both variables
rise or fall in the same proportion.

• Perfect Negative Correlation
• Values are inversely proportional to each other, i.e.
when one rises, the other falls in the same proportion,
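The two extreme cases can be reproduced with a short calculation of Pearson's correlation coefficient. The data are invented so that Y rises, or falls, in exact proportion to X.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_positive = pearson_r(x, [2, 4, 6, 8, 10])   # Y rises in step with X
r_negative = pearson_r(x, [10, 8, 6, 4, 2])   # Y falls as X rises
print(r_positive, r_negative)
```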
TYPES OF CORRELATION
Regression
• To know in an individual case the value of one variable,
knowing the value of the other, we calculate what is known
as the regression coefficient of one measurement to the
other.
• It is customary to denote the independent variate by x and
the dependent variate by y.

• The value of b is called the regression coefficient of y upon
x. Similarly, we can obtain the regression of x upon y.
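A sketch of how b might be obtained, assuming the usual least-squares line y = a + bx. The slides do not show the equation, so the form of the line, the intercept a and the data are assumptions for illustration.

```python
def regression_y_on_x(x, y):
    """Least-squares intercept a and regression coefficient b of y upon x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical paired measurements lying exactly on y = 1 + 2x.
a, b = regression_y_on_x([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
predicted = a + b * 6   # estimate y for an individual with x = 6
print(a, b, predicted)
```

Swapping the two arguments gives the regression of x upon y, which in general has a different coefficient.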
THANK YOU
