Biostatistics Definitions

Copyright
© All Rights Reserved

1. Difference between discrete and continuous variables, with examples.

A discrete variable is a variable whose value is obtained by counting.
Example: Number of students present, number of red marbles in a jar.

A continuous variable is a variable whose value is obtained by measuring.
Example: Height of students in class, weight of students in class, time it takes to get to school, distance traveled between classes.

2. Define absolute and relative dispersion.

An absolute measure of dispersion is expressed in the same unit as the original data set. It describes variation in terms of the deviations of the observations, for example their average deviation from the mean. Absolute measures include the range, standard deviation, mean deviation and quartile deviation.

A relative measure of dispersion is unit-free, so it can be used to compare data sets measured in different units. The most common relative measure is the coefficient of variation, which is the ratio of the standard deviation to the arithmetic mean. In effect, it measures how much an observed variable deviates from its average value relative to the size of that average.
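
The contrast between the two kinds of measure can be sketched in Python. This is a minimal illustration, not a prescribed method: the data values and the helper name coefficient_of_variation are assumptions made for the example, and the sample standard deviation is used.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean (unit-free)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


# Hypothetical measurements, chosen only for illustration.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

# Absolute dispersion: carries the unit of the data (cm and kg).
sd_heights = statistics.stdev(heights_cm)
sd_weights = statistics.stdev(weights_kg)

# Relative dispersion: a pure ratio, comparable across units.
cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"SD: {sd_heights:.2f} cm vs {sd_weights:.2f} kg")
print(f"CV: {cv_heights:.3f} vs {cv_weights:.3f}")
```

Both lists have the same standard deviation, but the weights vary more relative to their mean, so their coefficient of variation is larger.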

3. Define biostatistics.

Biostatistics is the application of statistical processes and methods to the collection, analysis, and interpretation of biological data, especially data relating to human biology, health, and medicine.

4. Differentiate between descriptive and inferential statistics.

5. Define mean deviation.

MEAN DEVIATION:
The mean deviation is a statistical measure of the average absolute deviation of the values in a data set from their mean. It can be calculated using the procedure below.
Step 1: Find the mean of the given data values.
Step 2: Subtract the mean from each data value and take the absolute value of each difference (that is, ignore the minus sign).
Step 3: Find the mean of the values obtained in step 2.
Formula:
Mean Deviation = [Σ |X – µ|] / N
Here,
Σ represents the sum of the values
X represents each value in the data set
µ represents the mean of the data set
N represents the number of data values
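
The three steps can be sketched in Python as follows. The function name mean_deviation and the data set are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    n = len(values)
    mean = sum(values) / n                            # Step 1: the mean
    abs_deviations = [abs(x - mean) for x in values]  # Step 2: |X - mean|
    return sum(abs_deviations) / n                    # Step 3: mean of step 2


data = [2, 4, 6, 8, 10]  # hypothetical data set
result = mean_deviation(data)
print(result)  # 2.4
```

Here the mean is 6, the absolute deviations are 4, 2, 0, 2 and 4, and their mean is 12 / 5 = 2.4.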

6. Define variance.
VARIANCE:
Variance is a measure of how data points differ from the mean. In layman's terms, it measures how far a set of numbers is spread out from its mean (average) value.
The larger the variance, the more the data are scattered about the mean; the smaller the variance, the less they are scattered. It is therefore called a measure of the spread of the data about the mean.
Formula:
Population variance: σ² = Σ(X – μ)² / N
Sample variance: s² = Σ(x – x̄)² / (n – 1)

The variance is the square of the standard deviation:

Variance = (Standard deviation)² = σ²

Where X (or x) = value of each observation
μ = population mean of all values
n = number of observations in the sample set
x̄ = sample mean
N = total number of values in the population
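
A short Python sketch of the population and sample variance, using the symbols defined above. It assumes the usual conventions (divide by N for a population and by n – 1 for a sample); the function names and data are hypothetical.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 6, 8, 10]  # hypothetical data set
pop_var = population_variance(data)   # 40 / 5
samp_var = sample_variance(data)      # 40 / 4
print(pop_var, samp_var)
```

The standard library functions statistics.pvariance and statistics.variance compute the same two quantities.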

7. What is primary data?

Data that have been collected initially and have not undergone any statistical treatment are called primary data. Primary data are also called raw data, original data or ungrouped data.

8. What is secondary data?

Data that have undergone some form of statistical treatment at least once are called secondary data. Data that have been classified, tabulated or presented are secondary data.

9. What is a qualitative variable?

A variable that cannot be expressed numerically is called a qualitative variable. It is also called an attribute, a categorical variable or a non-numerical variable.
Example: Hair color of children, hobbies of children.

10. What is a quantitative variable?

A variable that can be expressed numerically is called a quantitative variable. It is also called a numerical variable.
Example: Weight of patients, heights of soldiers.

11. What is arithmetic mean?

The arithmetic mean is the number obtained by dividing the sum of the values in a set by the number of values in the set.
Formula:
x̄ = Σx / n
where Σx is the sum of all the values and n is the number of values.

12. Properties of Arithmetic mean.

⚫ The sum of the deviations of the items from their arithmetic mean is always zero, i.e. ∑(x – x̄) = 0.
⚫ The sum of the squared deviations of the items from the arithmetic mean (A.M.) is a minimum, i.e. it is less than the sum of the squared deviations of the items from any other value.
⚫ If each item in the series is replaced by the mean, the sum of these replacements equals the sum of the original items.
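
These three properties can be checked numerically. The sketch below uses a hypothetical data set and a helper name (sum_squared_deviations) introduced only for this illustration; it demonstrates the properties for one example rather than proving them.

```python
values = [4, 8, 15, 16, 23]  # hypothetical data set
n = len(values)
mean = sum(values) / n


def sum_squared_deviations(center):
    """Sum of squared deviations of the values from an arbitrary point."""
    return sum((x - center) ** 2 for x in values)


# Property 1: deviations from the mean sum to zero (up to rounding error).
sum_of_deviations = sum(x - mean for x in values)

# Property 2: squared deviations are smallest when taken about the mean.
about_mean = sum_squared_deviations(mean)
about_other = sum_squared_deviations(mean + 1)

# Property 3: replacing every item by the mean preserves the total.
total_of_replacements = mean * n

print(sum_of_deviations, about_mean, about_other, total_of_replacements)
```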

13. Merits of Arithmetic mean.

⚫ The arithmetic mean is simple to understand and easy to calculate.
⚫ It is influenced by the value of every item in the series.
⚫ It is rigidly defined.
⚫ It is capable of further algebraic treatment.
⚫ It is a calculated value and does not depend on the position of items in the series.
⚫ It is a good measure for comparing two sets of data.
⚫ It is less affected by sampling fluctuations than other averages.
⚫ It is widely used in statistical analysis.

14. Demerits of Arithmetic mean.

⚫ It is strongly affected by extreme items, i.e. very small and very large values.
⚫ It can rarely be identified by inspection.
⚫ In some cases, the A.M. does not correspond to any possible item. For example, the average number of patients admitted to a hospital may be 10.7 per day.
⚫ The arithmetic mean is not suitable for extremely asymmetrical distributions.
⚫ It cannot be calculated exactly when the data have open-ended classes.
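
The first demerit, sensitivity to extreme items, can be illustrated with a small Python sketch. The daily admission counts are invented for the example; the median is shown alongside only for contrast.

```python
import statistics

admissions = [10, 12, 11, 13, 12]   # hypothetical daily counts
with_extreme = admissions + [95]    # the same data plus one extreme value

mean_before = statistics.mean(admissions)     # 58 / 5
mean_after = statistics.mean(with_extreme)    # 153 / 6
median_before = statistics.median(admissions)
median_after = statistics.median(with_extreme)

print(mean_before, mean_after)      # the mean more than doubles
print(median_before, median_after)  # the median stays at 12
```

A single extreme value pulls the mean from 11.6 to 25.5, a figure that describes none of the typical days.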

15. What is mode?

The mode is the value that has the highest frequency in a given set of values, i.e. the value that appears the greatest number of times.
Example: In the data set 2, 4, 5, 5, 6, 7, the mode is 5, since it appears twice and every other value appears once.
More than one mode:

⚫ When there are two modes in a data set, the set is called bimodal.
⚫ For example, the modes of set A = {2,2,2,3,4,4,5,5,5} are 2 and 5, because both 2 and 5 are repeated three times in the set.
⚫ When there are three modes in a data set, the set is called trimodal.
⚫ For example, the modes of set A = {2,2,2,3,4,4,5,5,5,7,8,8,8} are 2, 5 and 8.
⚫ When there are four or more modes in a data set, the set is called multimodal.
Formula:
Ungrouped: the most repeated value
Grouped: Mode = L + [(f1 – f0) / (2f1 – f0 – f2)] × h
where L is the lower limit of the modal class, f1 is the frequency of the modal class, f0 and f2 are the frequencies of the classes before and after it, and h is the class width.
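
For ungrouped data, the mode or modes can be found by counting repetitions. A minimal Python sketch using the standard library function statistics.multimode (available from Python 3.8) on the example sets given above; the variable names are chosen only for this illustration.

```python
import statistics

single = [2, 4, 5, 5, 6, 7]
bimodal = [2, 2, 2, 3, 4, 4, 5, 5, 5]
trimodal = [2, 2, 2, 3, 4, 4, 5, 5, 5, 7, 8, 8, 8]

# multimode returns every value that shares the highest frequency,
# in the order in which the values first appear in the data.
modes_single = statistics.multimode(single)
modes_bimodal = statistics.multimode(bimodal)
modes_trimodal = statistics.multimode(trimodal)

print(modes_single)    # [5]
print(modes_bimodal)   # [2, 5]
print(modes_trimodal)  # [2, 5, 8]
```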

16. Merits of mode.

(1) EASY TO CALCULATE & SIMPLE TO UNDERSTAND
• It is very easy to calculate.
• In some cases it can be determined just by observation or inspection.
• Everyone understands the concept of majority. Since the mode is based on this concept, it is easy to understand.

(2) REPRESENTATIVE VALUE
• It is the value around which there is the maximum concentration of observations.
• Hence, it is the best representative of the data.

(3) NOT AFFECTED BY THE VALUE OF EXTREME ITEMS
• It is not affected by extreme values of the given data.
• It can be calculated even if these extreme observations are not known.

(4) NO NEED OF COMPLETE DATA
• We can find the mode even in the case of an open-ended frequency distribution.
• We basically need the point of maximum concentration of frequencies; it is not necessary to know all the values.

(5) USEFUL FOR BOTH QUANTITATIVE & QUALITATIVE DATA
• It can be used to describe quantitative as well as qualitative data.
• For example, in surveys it is used to measure the tastes and preferences of people for a particular brand of a commodity.

(6) GRAPHIC DETERMINATION
• It can be determined graphically with the help of a histogram.

17. Demerits of mode.

(1) NOT BASED ON ALL THE OBSERVATIONS OF THE SERIES
• The value of the mode is not based on each and every item of the series, as it considers only the highest concentration of frequencies.

(2) SOMETIMES IT IS INDETERMINATE OR ILL DEFINED
• The value of the mode cannot always be determined.
• Some distributions can be bimodal, trimodal or multimodal.

(3) NOT RIGIDLY DEFINED
• There are two methods of determining the mode, the Inspection Method and the Grouping Method. We may not get the same value of the mode by the two methods. So, it is not rigidly defined.

(4) AFFECTED BY THE FLUCTUATIONS OF SAMPLING
• The mode is affected by sampling fluctuations to a great extent.
• This effect is greater than in the case of the mean.

(5) COMPLEX GROUPING PROCESS
• Grouping of data is desirable for correct computation, but it is a complex process and involves many calculations.

(6) NOT CAPABLE OF ALGEBRAIC TREATMENT
• Since it is not based on all the observations and is not rigidly defined, it is not suitable for further algebraic treatment.

18. Properties of mode.

1) Although the mode is a popular measure of central tendency, there are cases where it is undefined.
2) Unlike the mean, it has no mathematical properties that allow further algebraic treatment.
3) The mode is affected by sampling fluctuations.

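19. What is median?

The median of a set of data is the middle value of the set once the data are arranged in order.
To find the median, the data must first be arranged in order, from least to greatest or from greatest to least.
The median is the value that separates the higher half of a data sample, a population or a probability distribution from the lower half. The median is different for different types of distribution.
Formula:
Ungrouped:
Median = value of the ((n + 1)/2)th item, if n is odd
Median = average of the values of the (n/2)th and (n/2 + 1)th items, if n is even
where n is the number of observations.

Grouped:
Median = L + [(n/2 – cf) / f] × h
where L is the lower limit of the median class, n is the total frequency, cf is the cumulative frequency of the class before the median class, f is the frequency of the median class, and h is the class width.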
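
For ungrouped data, the median is found by sorting the values and taking the middle one, or the average of the two middle ones when the number of observations is even. A minimal Python sketch; the function name median and the data are hypothetical.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)   # arrange from least to greatest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # the ((n + 1)/2)th item
    return (ordered[middle - 1] + ordered[middle]) / 2  # (n/2)th and (n/2 + 1)th items


odd_result = median([7, 3, 9, 1, 5])   # sorted: 1, 3, 5, 7, 9
even_result = median([7, 3, 9, 1])     # sorted: 1, 3, 7, 9
print(odd_result, even_result)         # 5 5.0
```

The standard library function statistics.median follows the same rule.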

20. Merits of median.

(1) Easy to calculate and understand
● It is easy to calculate and simple to understand.
● In many situations, the median can be located simply by inspection.
(2) Not affected by extreme values
● It is not affected by the extreme values, i.e., the largest and smallest values, because it is a positional average and does not depend on magnitude.
(3) Rigidly defined
● It has a definite and certain value because it is rigidly defined.
(4) Best average in case of qualitative data
● The median is the best measure of central tendency when we deal with qualitative data, where ranking is preferred instead of measurement or counting.
(5) Useful in case of an open-ended distribution
● It can be calculated even if the values of the extremes are not known. However, the number of items should be known.
(6) Represented graphically
● Its value can be determined or represented graphically with the help of ogive curves, whereas this is not possible in the case of the arithmetic mean.

21. Demerits of median.

(1) Arrangement of data is necessary
● Since the median is a positional average, the data must be arranged in ascending or descending order of magnitude, which is time-consuming for a large number of observations.
(2) Not based on all the observations
● It is a positional average and does not consider the magnitude of the items.
● It neglects the extreme values.
(3) Not a representative of the universe
● It is not dependent on all the observations, so it cannot be considered a good representative of them.
● If there is a big variation between the data values, it will not be able to represent the data.
(4) Affected by fluctuations in sampling
● It is affected by fluctuations in sampling, and this effect is greater than in the case of the arithmetic mean.
(5) Lack of further algebraic treatment
● It is a positional average, so further algebraic treatment is not possible. Example: We cannot compute the combined median of two groups of data.

22. Properties of median.

• The median does not depend on all the data values in a data set.
• The median is fixed by its position and is not affected by the magnitude of individual values.
• The sum of the absolute distances between the median and the rest of the values is no greater than the corresponding sum from any other point.
• Every array has a single median.
• The median cannot be manipulated algebraically. It cannot be weighted and combined.
• In a grouping procedure, the median is stable.
• The median is not applicable to qualitative data that cannot be ranked.
• The values must be arranged in order for computation.
• The median can be determined for ratio, interval and ordinal scales.
• Outliers and skewed data have less impact on the median.
• If the distribution is skewed, the median is a better measure than the mean.

23. What is Null hypothesis?

The null hypothesis is a statement about a population parameter that is tested against the given experimental data. It is either rejected or not rejected based on the evidence in the sample. In other words, the null hypothesis states that the sample observations result purely from chance. It is the statement that the investigators set out to test with the data.

Symbol:
In statistics, the null hypothesis is usually denoted by the letter H with subscript '0' (zero), written H0. It is pronounced H-null, H-zero or H-nought.
Remember that p0 is the proportion stated in the null hypothesis and p-hat (p̂) is the sample proportion.
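
One common statistic built from p0 and p-hat is the one-proportion z statistic, z = (p̂ – p0) / √(p0(1 – p0)/n). The notes do not show which formula they refer to, so the Python sketch below is an assumption about the intended test; the function name one_proportion_z and the numbers are hypothetical.

```python
import math


def one_proportion_z(successes, n, p0):
    """z statistic for testing H0: population proportion equals p0."""
    p_hat = successes / n                          # sample proportion
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # computed under H0
    return (p_hat - p0) / standard_error


# Hypothetical sample: 60 successes in 100 trials, with H0: p = 0.5.
z = one_proportion_z(successes=60, n=100, p0=0.5)
print(round(z, 3))  # 2.0
```

A z value far from zero indicates that the sample proportion is unlikely to have arisen by chance alone if the null hypothesis were true.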

24. Types of null hypothesis.
Simple Hypothesis
It completely specifies the population distribution. In this case, the sampling distribution is a function of the sample size.
Composite Hypothesis
A composite hypothesis is one that does not completely specify the population distribution.
Exact Hypothesis
An exact hypothesis defines the exact value of the parameter. For example, μ = 50.
Inexact Hypothesis
This type of hypothesis does not define the exact value of the parameter but specifies a range or interval. For example, 45 < μ < 60.

25. Difference between null and alternative hypothesis.

26. What is Alternative hypothesis?

The alternative hypothesis is a statement used in statistical inference. It contradicts the null hypothesis and is denoted by Ha or H1. We can also say that it is simply an alternative to the null.
Example:
Researchers observe the water quality of a river for one year. Under the null hypothesis, there is no change in water quality in the first half of the year as compared to the second half. Under the alternative hypothesis, the quality of water is poorer in the second half.

27. Types of alternative hypothesis.

There are three types of alternative hypothesis:
Left-tailed: The population proportion (π) is expected to be less than a specified value, denoted by π0:
H1 : π < π0
Right-tailed: The population proportion (π) is expected to be greater than the specified value π0:
H1 : π > π0
Two-tailed: The population proportion (π) is expected to differ from the specified value π0:
H1 : π ≠ π0
Note: The null hypothesis for all three alternative hypotheses is H0 : π = π0.
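
The choice of tail determines which area under the standard normal curve is used as the p-value. The Python sketch below assumes a z statistic has already been computed; the function name p_value and the value z = 2.0 are hypothetical.

```python
from statistics import NormalDist


def p_value(z, tail):
    """p-value of a z statistic for a left-, right- or two-tailed alternative."""
    cdf = NormalDist().cdf(z)          # area to the left of z
    if tail == "left":                 # H1: parameter < hypothesized value
        return cdf
    if tail == "right":                # H1: parameter > hypothesized value
        return 1 - cdf
    if tail == "two":                  # H1: parameter != hypothesized value
        return 2 * min(cdf, 1 - cdf)
    raise ValueError("tail must be 'left', 'right' or 'two'")


z = 2.0  # hypothetical test statistic
p_left = p_value(z, "left")
p_right = p_value(z, "right")
p_two = p_value(z, "two")
print(round(p_left, 4), round(p_right, 4), round(p_two, 4))
```

For the same z, the two-tailed p-value is twice the smaller one-tailed p-value.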

28. Properties of Normal distribution.

• In a normal distribution, the mean, median and mode are equal (i.e., Mean = Median = Mode).
• The total area under the curve is equal to 1.
• The curve is symmetric about the centre.
• Exactly half of the values lie to the right of the centre and exactly half lie to the left.
• The normal distribution is completely defined by its mean and standard deviation.
• The curve has only one peak (i.e., it is unimodal).
• The curve approaches the x-axis but never touches it, however far it extends from the mean.
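
Several of these properties can be checked numerically with statistics.NormalDist from the Python standard library (Python 3.8 or later). The mean of 100 and standard deviation of 15 are arbitrary values chosen for this sketch.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=15)  # hypothetical mean and standard deviation

# Mean, median and mode coincide.
centre_values = (dist.mean, dist.median, dist.mode)

# Half of the area lies to the left of the centre.
area_left_of_centre = dist.cdf(dist.mean)

# The curve is symmetric: equal heights at equal distances from the mean.
height_below = dist.pdf(100 - 15)
height_above = dist.pdf(100 + 15)

# The single peak is at the mean.
height_at_mean = dist.pdf(100)

# The curve never reaches the x-axis, even 10 standard deviations out.
height_far_tail = dist.pdf(100 + 10 * 15)

print(centre_values, area_left_of_centre)
print(height_below, height_above, height_at_mean, height_far_tail)
```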

29. Applications of Normal distribution.

Many quantities are approximately normally distributed, such as:
• Marks scored on a test
• Heights of different persons
• Sizes of objects produced by a machine
• Blood pressure, and so on.

30. Difference between correlation and regression.

Correlation is an analysis that helps us determine the association, or the absence of a relationship, between two variables, 'p' and 'q'.

Regression is also an analysis; it predicts the value of a dependent variable from the known value of the independent variable.
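
The difference can be seen by computing both on the same data. The Python sketch below assumes Pearson's correlation coefficient and simple least-squares linear regression, which the notes do not name explicitly; the function names and the data for 'p' and 'q' are hypothetical.

```python
import math


def pearson_r(p, q):
    """Strength and direction of the linear relationship between p and q."""
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    s_pq = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    s_pp = sum((a - mean_p) ** 2 for a in p)
    s_qq = sum((b - mean_q) ** 2 for b in q)
    return s_pq / math.sqrt(s_pp * s_qq)


def fit_line(p, q):
    """Least-squares slope and intercept for predicting q from p."""
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    s_pq = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    s_pp = sum((a - mean_p) ** 2 for a in p)
    slope = s_pq / s_pp
    return slope, mean_q - slope * mean_p


p = [1, 2, 3, 4, 5]   # independent variable (hypothetical)
q = [2, 4, 5, 4, 5]   # dependent variable (hypothetical)

r = pearson_r(p, q)               # correlation: one number between -1 and 1
slope, intercept = fit_line(p, q)
predicted = intercept + slope * 6  # regression: predict q for a new p = 6
print(round(r, 4), round(slope, 2), round(intercept, 2), round(predicted, 2))
```

Correlation summarizes how strongly the two variables move together, while regression yields an equation that can be used for prediction.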
