2. Biostatistics and Research Methodology
RESEARCH
METHODOLOGY-PART 1
Presented by-
Rochana Mahesh
1st MDS
BIOSTATISTICS
Contents:
• Introduction
• Definition
• History
• Applications
• Sources and collection of data
• Data sampling
• Presentation of data
• Measures of statistical average
• Types of variability
• Measures of dispersion
• Normal distribution
• Tests of significance
• Analysis and interpretation
• Correlation and Regression
• Conclusion
Introduction:
• Any science needs precision for its development.
• E.g. If we have to measure a cause-effect relationship, we need statistics.
• Hence, training in statistics has been termed “indispensable” for students
of medical science.
Definition:
Statistics: It is the science of compiling, classifying and tabulating
numerical data and expressing the results in a mathematical and
graphical form.
For e.g., in biostatistics the mean and standard deviation are considered constant for a
population.
Variable: A characteristic which takes different values for different persons, places or
things, such as height, weight or blood pressure.
For e.g., in an institution 25% of the members are men. This describes the population, hence it is
a parameter.
• Statistics is applied in:
• Pharmacology
• Community medicine
• Community dentistry
• Public health
Experiments: Performed to collect data for investigations and research by one or more workers.
Surveys: Carried out for epidemiological studies in the field by trained teams to find the incidence or prevalence of health or disease in a community.
Records: Maintained as a routine in registers and books over a long period of time, providing readymade data.
Collection of Data
Primary Source: Data obtained by the investigator himself.
Secondary Source: Already recorded data. E.g. hospital records, records from OPD.
Primary data can be obtained by:
Direct personal interviews:
-Face to face contact with the person.
-Subjective phenomena
-Accurate and ambiguity can be clarified
-Cannot be used in extensive studies
Oral health examination:
-When information is needed on health status
-Cannot be used in any extensive studies
-Includes treatment
Questionnaire method:
-List of questions pertaining to the survey, the “questionnaire”, is prepared.
-Various informants are requested to supply the information.
Sampling and Sample design:
• Population: Group of all individuals who are the focus of the
investigation is known as a population.
• Sample units: The elements of the sample are known as sample points
or sample units.
Sampling:
• Sampling can be explained as a specific principle used to select members of population
to be included in the study. It has been rightly noted that “because many populations of
interest are too large to work with directly, techniques of statistical sampling have been
devised to obtain samples taken from larger populations.”
• In non-probability sampling, sampling group members are selected in a
non-random manner, therefore not every population member has a chance to participate in
the study. Non-probability sampling methods include purposive, quota, convenience and
snowball sampling methods.
• PROBABILITY SAMPLING
SIMPLE RANDOM SAMPLING: Sample group members are selected in a random manner.
Randomness is ensured by:
a) Lottery method
b) Table of random numbers
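The lottery method can be sketched in a few lines of Python. This is only an illustration: the sampling frame of patient IDs 1 to 100 and the function name `simple_random_sample` are assumptions, not part of the lecture.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement (lottery method)."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return rng.sample(list(population), sample_size)


# Hypothetical sampling frame: patient IDs 1..100
patient_ids = range(1, 101)
sample = simple_random_sample(patient_ids, sample_size=10, seed=42)
print(sample)
```

Every member of the frame has the same chance of being drawn, which is the defining property of simple random sampling.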
Presentation of data:
• Tabulation
• Graphic representation
Tabulation:
• Tables are simple devices used for the presentation of statistical data.
• Principles:
• 3 types:
1. Simple table: a one-way table which supplies the answer to questions
about one characteristic of the data only.
2. Frequency distribution table: the data are split into convenient groups
(class intervals) and the number of items (frequency) which occur in
each group is shown in the adjacent column.
3. Master table
• For example:
• Number of decayed teeth in 10 children:
2,2,4,1,3,0,10,2,3,8
• Mean = 35/10 = 3.5
• Median (0,1,2,2,2,3,3,4,8,10) = (2+3)/2 = 2.5
• Mode = 2 (3 times)
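The three averages for the decayed-teeth example can be checked with Python's standard `statistics` module. The variable names are illustrative; the ten values are those listed above and sum to 35.

```python
import statistics

# Number of decayed teeth in 10 children (values from the example above)
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

mean_value = statistics.mean(decayed_teeth)      # sum of values / number of values
median_value = statistics.median(decayed_teeth)  # average of the two middle values
mode_value = statistics.mode(decayed_teeth)      # most frequent value

print(mean_value, median_value, mode_value)
```

Note how the two extreme values (8 and 10) pull the mean above the median, which is why the median is often preferred for skewed data.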
Errors
• Errors are the difference between a value obtained from a data collection process and
the ‘true’ value for the population.
• Three types:
1. Observer error – the investigator may alter some information or not record the
measurement correctly.
2. Instrumental error - this is due to defects in the measuring instrument. Both the
observer and the instrument error are called non sampling error.
3. Sampling error or error of bias – this occurs when the samples are not chosen at
random from a population. A sample must be representative of the whole
population.
Measures of Dispersion
• Dispersion is the degree of spread or variation of the variable, about a
central value.
• Helps to know how widely the observations are spread on either side
of the average.
• Most common measures of dispersion are:
1. Range
2. Mean deviation
3. Standard deviation
Range: Defined as the difference between the value of the largest item and the smallest item.
Mean Deviation: It is the average of the deviations from the arithmetic mean.
Standard Deviation: Most important and widely used measure for studying dispersion.
Coefficient of variation:
• Denoted by CV
• It is the standard deviation expressed as a percentage of the mean: CV = (SD / mean) × 100.
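A minimal sketch of the four measures, reusing the decayed-teeth counts from the averages example. It assumes the sample standard deviation (divisor n - 1); the lecture does not say which divisor it intends, and `dispersion_summary` is a hypothetical helper name.

```python
import statistics


def dispersion_summary(values):
    """Return range, mean deviation, sample SD and coefficient of variation."""
    mean = statistics.mean(values)
    value_range = max(values) - min(values)
    # Mean deviation: average absolute deviation from the arithmetic mean
    mean_deviation = sum(abs(x - mean) for x in values) / len(values)
    # Assumption: sample standard deviation, which divides by n - 1
    sd = statistics.stdev(values)
    cv_percent = sd / mean * 100
    return {
        "range": value_range,
        "mean_deviation": mean_deviation,
        "sd": sd,
        "cv_percent": cv_percent,
    }


summary = dispersion_summary([2, 2, 4, 1, 3, 0, 10, 2, 3, 8])
print(summary)
```

Because CV is a ratio, it allows the variability of measurements in different units to be compared.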
Normal Curve/ Normal Distribution/
Gaussian Distribution
• When the data is collected from a very large number of people and a frequency
distribution is made with narrow class intervals, the resulting curve is smooth and
symmetrical: the normal curve.
• The limits on either side of measurement are called confidence limits.
Standard Normal Deviation
• There may be many normal curves but only one standard normal curve.
• Characteristics:
Bell shaped
Perfectly symmetrical
Frequency increases from one side, reaches its highest and decreases
exactly the way it had increased.
Total area of the curve is one, its mean is zero and standard deviation
is one.
The highest point denotes mean, median and mode which coincide.
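The properties of the standard normal curve (mean 0, standard deviation 1, total area 1) can be explored with `statistics.NormalDist` from the Python standard library. The helper names `area_within` and `z_score` and the blood pressure figures are illustrative assumptions.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Proportion of observations lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def z_score(x, mean, sd):
    """Convert a value from any normal curve to the standard normal scale."""
    return (x - mean) / sd


for k in (1, 2, 3):
    print(f"mean ± {k} SD covers {area_within(k):.4f} of the area")

# Hypothetical reading: 130 mmHg where the mean is 120 and the SD is 10
print(z_score(130, mean=120, sd=10))
```

Any normal curve can be mapped onto the single standard normal curve by this z transformation, which is why one table of areas serves every normally distributed variable.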
Tests of Significance
• A statistical procedure by which one can conclude whether the observed results
from the sample are due to chance or not.
• When different samples are drawn from the same population, the
estimates might differ: sampling variability.
• It deals with techniques to know how far the difference between the
estimates of different samples is due to sampling variation.
Standard error of mean
Standard error of proportion
Standard error of difference between two means
Standard error of difference between two proportions
Standard error of mean
• Gives the standard deviation of the means of several samples from the same
population.
S.E = SD / √n, where SD is the standard deviation of the sample and n is the sample size.
Standard error of proportion
• It may be defined as a unit that measures the variation which occurs by
chance in the proportions of a character from sample to sample or
from sample to population or vice versa in qualitative data.
• If the observed difference falls in the zone of rejection for H0 (the shaded
areas under the curve), the null hypothesis is rejected. The probability or
relative frequency of occurrence of the difference by chance is denoted by
the letter P.
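As a rough sketch, both standard errors described above can be computed as follows. The formulas used are the usual textbook ones (SD / √n for a mean, √(pq / n) for a proportion, with q = 1 - p); the function names and the sample figures are assumptions for illustration.

```python
import math
import statistics


def standard_error_of_mean(values):
    """SE of the mean = sample SD / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def standard_error_of_proportion(p, n):
    """SE of a proportion = sqrt(p * q / n), where q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)


# Hypothetical sample of eight measurements
se_mean = standard_error_of_mean([2, 4, 4, 4, 5, 5, 7, 9])

# Hypothetical survey: 25% of 100 sampled subjects show the character
se_prop = standard_error_of_proportion(p=0.25, n=100)

print(round(se_mean, 4), round(se_prop, 4))
```

Both quantities shrink as n grows, which is why larger samples give more stable estimates.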
• Random samples
• Quantitative data
• Paired t-tests are used when the same item or group is tested twice, which is
known as a repeated measures t-test. Some examples of instances for which a
paired t-test is appropriate include:
• The before and after effect of a pharmaceutical treatment on the same group
of people.
• Standardized test results of a group of students before and after a study prep
course.
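A minimal sketch of the paired t statistic, computed from the differences within each pair. The before/after readings are invented for illustration, and the function returns only t and the degrees of freedom; the p value would still have to be read from a t table or statistical software.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired observations."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    # Standard error of the mean difference
    se_diff = statistics.stdev(differences) / math.sqrt(n)
    return mean_diff / se_diff, n - 1


# Hypothetical systolic pressures of the same five patients
before = [140, 152, 138, 145, 150]
after = [135, 147, 136, 140, 143]

t_value, df = paired_t_statistic(before, after)
print(t_value, df)
```

With 4 degrees of freedom the two-sided 5% critical value of t is 2.776, so an absolute t above that value would be declared significant at the 0.05 level.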
Unpaired t test
• An unpaired t-test (also known as an independent t-test) is a statistical procedure that compares
the averages/means of two independent or unrelated groups to determine if there is a significant
difference between the two.
• The hypotheses of an unpaired t-test are the same as those for a paired t-test. The two
hypotheses are:
• The null hypothesis (H0) states that there is no significant difference between the means of the
two groups.
• The alternative hypothesis (H1) states that there is a significant difference between the two
population means, and that this difference is unlikely to be caused by sampling error or chance.
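The unpaired comparison can be sketched in the same way. This version assumes equal variances in the two groups (the classical pooled-variance Student's t test), which the lecture does not specify; the data and names are hypothetical.

```python
import math
import statistics


def unpaired_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) using a pooled variance estimate."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Assumption: both groups share a common variance
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_diff / se_diff, n_a + n_b - 2


# Hypothetical scores from two independent groups
group_a = [10, 12, 14, 16, 18]
group_b = [8, 10, 12, 14, 16]

t_value, df = unpaired_t_statistic(group_a, group_b)
print(t_value, df)
```

Unlike the paired test, the two groups need not be the same size, because observations are not matched one to one.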
Analysis of Variance (ANOVA) test
• When more than two independent groups are to be compared on a continuous
outcome, ANOVA is used.
• It tests the equality of three or more group means.
• One way ANOVA: only one factor affects the result between groups.
• Two way ANOVA: two factors affect the result or outcome.
• Multiway ANOVA: three or more factors affect the result or outcome between
groups.
The CHI SQUARE test for qualitative data
(χ² test) – Non Parametric Test
• Developed by Karl Pearson
• Chi-square test offers an alternate method of testing the significance of difference between two
proportions. It has the advantage that it can also be used when more than 2 groups are to be
compared.
• It is most commonly used when data are in frequencies such as in the number of responses in
two or more categories.
• Important applications:
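The comparison of two proportions with the chi-square test can be sketched as below. The 2 × 2 table of counts is hypothetical, and the function returns only the statistic and its degrees of freedom; no continuity correction is applied.

```python
def chi_square_statistic(observed):
    """Return (chi-square, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, observed_count in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed_count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, df


# Hypothetical counts: rows = two groups, columns = caries present / absent
table = [[20, 30],
         [30, 20]]

chi_square, df = chi_square_statistic(table)
print(chi_square, df)
```

For 1 degree of freedom the 5% critical value is 3.84, so a larger statistic would be taken as a significant difference between the two proportions. The same function works for tables with more than two groups.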
Types of correlation
• Perfect negative correlation: values are inversely proportional to each other, i.e.
when one rises, the other falls in the same proportion. (r) = -1.
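A short sketch of Pearson's correlation coefficient r shows how a perfect negative correlation yields r = -1. The paired values are invented so that y falls by a fixed amount for each unit rise in x.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical data: y falls by 2 whenever x rises by 1
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
# Hypothetical data: y rises by 2 whenever x rises by 1
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

print(r_negative, r_positive)
```

Values of r always lie between -1 and +1, and a value near 0 indicates little or no linear relationship.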
Regression
• It is a statistical method for studying the relationship between a single
dependent variable and one or more independent variables.
• It is customary to denote the independent variate by x and the dependent
variate by y.
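For a single independent variable, the regression line y = a + bx can be fitted by ordinary least squares. The sketch below assumes that method, which the lecture does not name, and uses made-up data lying exactly on a line.

```python
def fit_line(x, y):
    """Least-squares estimates (intercept a, slope b) for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: independent variate x, dependent variate y
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

intercept, slope = fit_line(x, y)
predicted_at_6 = intercept + slope * 6
print(intercept, slope, predicted_at_6)
```

The slope b is the regression coefficient: the average change in y for a unit change in x.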
• In statistics, a Type I error is a false positive conclusion, while a Type II
error is a false negative conclusion.
• The probability of making a Type I error is the significance level, or alpha (α),
while the probability of making a Type II error is beta (β)
• Using hypothesis testing, you can make decisions about whether your data
support or refute your research predictions with null and alternative hypotheses.
The null hypothesis (H0) is that the new drug has no effect on symptoms of the
disease.
The alternative hypothesis (H1) is that the drug is effective for alleviating
symptoms of the disease.
• Then, you decide whether the null hypothesis can be rejected based on your data
and the results of a statistical test. Since these decisions are based on
probabilities, there is always a risk of drawing the wrong conclusion.
TYPE 1 ERROR
• A Type I error means rejecting the null hypothesis when it’s actually true. It
means concluding that results are statistically significant when, in reality, they
came about purely by chance or because of unrelated factors.
• Ex: You decide to get tested for COVID-19 based on mild symptoms. A
Type I error (false positive) occurs when the test result says you have coronavirus, but you
actually don’t.
• The risk of committing this error is the significance level (alpha or α) you choose.
That’s a value that you set at the beginning of your study to assess the statistical
probability of obtaining your results (p value).
• A result is called statistically significant when the observations are unlikely
to have occurred under the null hypothesis of a statistical test. This is judged by
comparing the p-value, or probability value, with the significance level.
• The significance level is usually set at 0.05 or 5%. This means that results at least
as extreme as yours have a 5% chance or less of occurring if the null hypothesis is actually true.
• If the p value of your test is lower than the significance level, it means your
results are statistically significant and consistent with the alternative hypothesis.
If your p value is higher than the significance level, then your results are
considered statistically non-significant.
• For Example : In your clinical study, you compare the symptoms of patients who
received the new drug intervention or a control treatment. Using a t test, you
obtain a p value of .035. This p value is lower than your alpha of .05, so you
consider your results statistically significant and reject the null hypothesis.
• However, the p value means that there is a 3.5% chance of your results occurring
if the null hypothesis is true. Therefore, there is still a risk of making a Type I
error.
• To reduce the Type I error probability, you can simply set a lower significance
level.
TYPE 2 ERROR
• A Type II error means not rejecting the null hypothesis when it’s actually false.
This is not quite the same as “accepting” the null hypothesis, because hypothesis
testing can only tell you whether to reject the null hypothesis.
• A Type II error means failing to conclude there was an effect when there actually
was. In reality, your study may not have had enough statistical power to detect
an effect of a certain size.
• Power is the extent to which a test can correctly detect a real effect when there is
one. A power level of 80% or higher is usually considered acceptable.
• The risk of a Type II error is inversely related to the statistical power of a study.
The higher the statistical power, the lower the probability of making a Type II
error.
• The Type II error rate is beta (β)
• For Example: When preparing your clinical study, you complete a power
analysis and determine that with your sample size, you have an 80% chance of
detecting an effect size of 20% or greater. An effect size of 20% means that the
drug intervention reduces symptoms by 20% more than the control treatment.
• However, a Type II error may occur if the true effect is smaller than this size. A smaller
effect size is unlikely to be detected in your study due to inadequate statistical
power.
Statistical power is determined by:
• Sample size: Larger samples reduce sampling error and increase power.
• To (indirectly) reduce the risk of a Type II error, you can increase the sample
size or the significance level.
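The link between sample size, significance level and power can be illustrated with a normal approximation for comparing two group means. This is a rough sketch under stated assumptions: the effect size here is a standardized difference (difference in means divided by the SD), which is not the same as the percentage effect in the example above, and `approximate_power` is a hypothetical helper rather than a full power analysis.

```python
import math
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z test.

    effect_size is the standardized difference between the group means.
    """
    z = NormalDist()
    z_critical = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability that the test statistic falls in either rejection region
    return z.cdf(shift - z_critical) + z.cdf(-shift - z_critical)


power_small = approximate_power(0.5, n_per_group=20)
power_large = approximate_power(0.5, n_per_group=64)
power_higher_alpha = approximate_power(0.5, n_per_group=20, alpha=0.10)
beta_large = 1 - power_large  # Type II error rate

print(round(power_small, 3), round(power_large, 3), round(power_higher_alpha, 3))
```

Under these assumptions, increasing the sample size or raising alpha both increase power and so reduce beta, but raising alpha does so at the cost of a higher Type I error risk.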
Analysis of Variance (ANOVA) test
• One way ANOVA: Suppose we want to know
whether or not three different exam prep programs
lead to different mean scores on a certain exam. To
test this, we recruit 30 students to participate in a
study and split them into three groups . The
students in each group are randomly assigned to
use one of the three exam prep programs for the
next three weeks to prepare for an exam. At the
end of the three weeks, all of the students take the
same exam.
The exam scores for each group are shown :
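The one-way ANOVA calculation can be sketched as follows. The actual exam scores are not reproduced here, so the three small groups below are purely hypothetical stand-ins chosen to keep the arithmetic easy to follow; the function returns the F statistic and its degrees of freedom only.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k = len(groups)
    n_total = len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical scores for three exam prep programs (not the lecture's data)
program_1 = [1, 2, 3]
program_2 = [2, 3, 4]
program_3 = [3, 4, 5]

f_value, df_between, df_within = one_way_anova_f([program_1, program_2, program_3])
print(f_value, df_between, df_within)
```

A large F means the group means differ by more than the scatter within groups would explain; the value is compared with the F distribution for the two degrees of freedom to obtain a p value.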
• Two way ANOVA: You are researching which type of fertilizer and planting
density produces the greatest crop yield in a field experiment. You assign different
plots in a field to a combination of fertilizer type (1, 2, or 3) and planting density
(1=low density, 2=high density), and measure the final crop yield in bushels per
acre at harvest time.
• You can use a two-way ANOVA to find out if fertilizer type and planting density
have an effect on average crop yield.
BIOSTATISTICS AND
RESEARCH
METHODOLOGY-PART 2
Contents: RESEARCH METHODOLOGY
• Introduction
• Types of research
• Objectives of research
• Steps involved in research
• Conclusion
• References
WHAT IS RESEARCH?
• Research is a logical and systematic search for new useful information on a
particular topic.
• Research is planned activity leading to the generation of information that will help
in answering a specific question.
• Research is a quest for knowledge through diligent search or investigation or
experimentation aimed at the discovery and interpretation of new knowledge.
-Health research methodology, WHO.
• Research is a systematized effort to gain new knowledge
- Redman and Mory
Types of Research
1. BASIC Vs APPLIED
2. OBSERVATIONAL Vs EXPERIMENTAL
3. QUALITATIVE Vs QUANTITATIVE
4. CONCEPTUAL Vs EMPIRICAL
Basic VS Applied
• Basic research is also called fundamental research.
It is a search for knowledge without a defined goal
of utility or purpose.
Ex: A study searching for the causative factors of malocclusion
• The definition of the subject of study and the target population should be
clearly spelt out.
• The inclusion and exclusion criteria should be decided in the beginning.
• Sample size is very important.
• The smaller the sample, the more the uncertainty.
• Sample size should be chosen in such a way that findings in the study should
reflect what is going on in the population.
• A well-designed but poorly analyzed study can be rescued by re-analysis, but
a poorly designed study, however well analyzed, is beyond the redemption of even
sophisticated statistics.
• To get valid and reliable results, an appropriate research design and
methodology is a prerequisite.
• Study design is the framework in which the investigation is planned and
carried out.
• Selection of design is based on the type of research question.
Ex: Treated orthodontic patients of age group 15-20 years
Decide on study design and methodology
1. Observational:
• Studies in which subjects are observed, including:
Descriptive study
Analytical study
1.Case report
2.Case study/ case series
3.Case control studies
4.Cross sectional
5.Cohort/ longitudinal
2. Experimental: Studies in which the effect of an intervention
is observed
Randomized Controlled trials
• Field trials
• Community trials
Research study designs
Case reports
Case series
Analysis of secular trends
Case –control study
Cohort studies
• Randomized clinical trials
Case report
• Reports of events in a single patient.
• Useful for raising hypotheses on drug effects. Leads to testing of the
drug with more rigorous study designs.
Case series
• Collection of patients, all of whom have had a single exposure, whose
outcomes are then evaluated and described.
• They can also be a collection of patients with a single outcome,
looking at their antecedent exposure.
• Useful for quantifying the incidence of an adverse reaction or whether
it occurs in a larger population.
• Provides only a clinical description of a disease or of patients who
receive an exposure.
Analysis of secular trends
• Also known as ecological studies.
• Examines trends in an exposure that is a presumed cause and trends
in a disease that is a presumed effect, and tests whether the trends
coincide.
• Vital statistics and record linkages are often used in these studies.
• Useful for rapidly providing evidence for or against a hypothesis.
• Unable to control confounding variables.
For e.g., lung cancer might be caused by cigarettes, but
occupational hazards cannot be ruled out.
Case-Control studies- retrospective
study
• Compares cases with the disease to controls without the disease, looking for differences in
exposure.
• Multiple causes of a single disease can be studied.
• Helps in studying relatively rare diseases and requires a smaller sample size.
• Information is generally obtained retrospectively from hospital records, questionnaires or
interviews.
• Limitations are validity of retrospective information and selection of controls is a
challenging task. Inappropriate control selection will result in incorrect conclusions.
Ex: Incidence of white spot lesion in patients who have undergone orthodontic treatment
Cohort studies
• Identify subsets of a defined population and follow them over time,
looking for differences in their outcome.
• Used to compare exposed patients to unexposed patients; can also be
used to compare one exposure to another or when multiple outcomes
from a single exposure are to be studied.
• Can be done prospectively or retrospectively.
• Requires a large sample size and can require a prolonged time period to
study delayed outcomes.
EXPOSURE DISEASE
Ex: A study on incidence of dental caries in patients undergoing
orthodontic treatment
Randomized clinical trials
• An experimental study- the investigator controls the therapy that is to
be administered to the participants.
• Major strength is the randomization.
• Disadvantages: ethical issues and could be expensive.
Ex: Intraligamentous injections of Vitamin D metabolite caused an
increase in the number of osteoclasts, which led to an increase in the
amount of tooth movement
Meta Analysis study
• Definition: Statistical analysis of collection of analytical results for the
purpose of integrating the findings.
• Uses:
Identify sources of variation among study findings.
To provide an overall measure of effect as a summary of those findings.
Most often used to assess the clinical effectiveness of healthcare
interventions. It does this by combining data from two or more randomized
controlled trials.
It provides a precise estimate of treatment interventions, giving due weight to
the size of the different studies involved.
Studies chosen for the inclusion of a meta analysis must be sufficiently similar
in a number of characteristics to accurately combine their results.
Evidence Based Dentistry
• It is an approach to dental practice that uses the results of patient care
research and other available objective evidence as a component of clinical
decision making.
Need for Evidence based Dentistry
• Daunting number of diseases
• Availability of a broad number of therapeutic options
• To keep ourselves updated in the field of expertise.
• Increase in the number of information sources
• To remain competent throughout our careers.
Writing the Protocol
• All the efforts put into the preceding steps culminate in the draft of the
research protocol, which incorporates all the information regarding the
research in a concise manner.
• The protocol should contain background information on the study,
objectives, ethical aspects, study design, study procedures, methods of
assessment, statistics and evaluation, administrative issues and
references.
• Once the protocol is ready, approval from the Ethics committee should be
obtained before the start of the study.
• Along with the protocol, the informed consent form and other documents
required should be submitted to the ethics committee for approval.
Collecting the data
• Once the protocol is finalized, the data should be collected.
• The data forms should be legibly filled, and they should be fully
completed.
• Ethical issues must be taken care of from the beginning to the end of the
study.
• In drug trials care must be taken to document the details of adverse
events if any.
• Proper documentation throughout the study is important to ensure
credibility of data.
Ex: Patients with white spot lesions (WSL) post orthodontic treatment
Analyze the data and apply statistical
significance
• The data should be scrutinized for internal consistency and external
validity.
• Data should be analyzed using the already decided data management
plan
Write the report
• The report should be sufficiently detailed to remove any doubt a
reader might have about any aspect of the results.
• It should be properly worded and adequately illustrated by
charts, diagrams or tables which enhance clarity.
• All the limitations need to be described openly.
Conclusion
• Research is a scientific method used to collect and analyze information
to increase our understanding or solve issues in a particular area.
• The research topic should be feasible, interesting, novel, ethical and
relevant.
• Ethical considerations should be taken care of while conducting the
research.
• The research results should not be biased; both negative and
positive results should be reported and published.
References
• Soben Peter. Essentials of Preventive Community Dentistry, Third Edition.
• Soben Peter. Essentials of Preventive Community Dentistry, Fourth Edition.
• Park’s Textbook of Preventive and Social Medicine, 22nd Edition.
• WHO. Health Research Methodology, 1993.
• T Bhaskara Rao. Methods of Biostatistics.
THANK YOU