Basic Statistical Concepts and Methods
Ahmed-Refat AG Refat
FOM-ZU
Ahmed-Refat-ZU
Definition of Statistics
Statistics is the science of dealing with numbers.
It is used for the collection, summarization, presentation and analysis of data.
Statistics provides a way of organizing data to get information on a wider and more formal (objective) basis than relying on personal experience (subjective).
Uses of medical statistics
Medical statistics are used in:
1- Planning, monitoring and evaluating community health care programs.
2- Epidemiological research studies.
3- Diagnosis of community health problems.
4- Comparison of health status and diseases in different countries and in one country over the years.
5- Forming standards for the different biological measurements such as weight and height.
6- Differentiating between diseased and normal groups.
Types of data
Any aspect of an individual that is measured is called a variable. Variables are either:
1- Quantitative, or
2- Qualitative.
1-Quantitative data
Quantitative data: numerical data.
Two types:
A- Discrete data: usually whole numbers, such as the number of cases of a certain disease or the number of hospital beds (no decimal fraction).
B- Continuous data: measured on a continuous scale, e.g. height, weight, age (a decimal fraction can be present).
2- Qualitative data
Qualitative data: non-numerical data, subdivided into two types:
A- Categorical data: purely descriptive and imply no ordering of any kind, such as sex or area of residence.
B- Ordinal data: imply some kind of ordering, such as:
Level of education
Socio-economic status
Degree of severity of disease
Presentation Of Data
The first step in statistical analysis is to present the data in a way that is easy to understand.
The two basic ways of data presentation are:
1. Tabular presentation
2. Graphical presentation
Tabulation
Some rules for the construction of tables:
1- The table must be self-explanatory.
2- Title: written at the top of the table to define precisely the content, the place and the time.
3- Clear headings for the columns and rows, with units of measurement.
4- The size of the table depends on the number of classes, which usually lies between 2 and 10 rows. The choice depends on the form of the data and the requirements of the distribution: too few classes may obscure some information, and too many will not differ from the raw data.
Types of tables
For qualitative data, draw a simple table, e.g. a list table: count the number of observations (frequencies) in each category.
For quantitative data, we have to form a frequency distribution table.
Types of tables
List:
A table consisting of two columns, the first giving an identification of the observational unit and the second giving the value of the variable for that unit.
Example: number of patients in each hospital department:
Department       Number of patients
Medicine         100
Surgery          80
ENT              28
Ophthalmology    30
Frequency Distribution Tables
Frequency distribution tables (FDTs) are used for the presentation of qualitative (and quantitative discrete) data by recording the number of observations in each category. These counts are called frequencies.
No classes and no intervals are needed.
Frequency Distribution Tables
An FDT for quantitative continuous data consists of a series of classes (intervals) together with the number of observations (frequency) whose values fall within the interval of each class.
Frequency Distribution Tables
EXAMPLE (1): Assume we have a group of 20 individuals whose blood groups were as follows: A, AB, AB, O, B, A, A, B, B, AB, O, AB, AB, A, B, B, B, A, O, A. We want to present these data in a table.
What type of data is this?
How to Construct a Frequency Distribution Table
Four steps (title, table, number, %):
1- Put a title.
2- Draw columns and rows.
3- Enumerate the individuals in each category.
4- Calculate the relative frequency (%).
How to Construct a Frequency Distribution Table
1- Put a title, e.g. "Distribution of the studied individuals according to their blood group".
2- Draw a table (columns and rows):
First column heading: the studied variable (blood group)
Second column heading: frequency (number)
Third column heading: percentage (%)
Frequency Distribution Tables
3- Enumerate the individuals in each blood group, i.e. 6 individuals with blood group A, 6 with blood group B, 5 with AB and 3 with blood group O.
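Frequency Distribution Tables
4- Calculate the relative frequency (%) of each blood group by dividing the frequency of that group by the total number of individuals and multiplying by 100, i.e. the percentage of group A = 6/20 x 100 = 30%, and likewise group B = 6/20 x 100 = 30%, group AB = 5/20 x 100 = 25% and group O = 3/20 x 100 = 15%. The final table will be:
Distribution of the studied individuals according to their blood group
Blood group    Number    %
A              6         30
B              6         30
AB             5         25
O              3         15
Total          20        100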
Frequency Distribution Tables
What is your conclusion?
Frequency Distribution Tables
We can conclude from this table that blood groups A and B are the most common and group O is the rarest (based on the percentage of each group).
So presenting data in a table helps in deducing facts and simplifies the information compared with raw data.
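The four steps for Example (1) can also be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `frequency_table` and the choice to list the most frequent groups first are assumptions.

```python
from collections import Counter

# Blood groups of the 20 individuals in Example (1).
blood_groups = ["A", "AB", "AB", "O", "B", "A", "A", "B", "B", "AB",
                "O", "AB", "AB", "A", "B", "B", "B", "A", "O", "A"]

def frequency_table(values):
    """Return (category, frequency, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in counts.most_common()]

rows = frequency_table(blood_groups)
print(f"{'Blood group':<12}{'Number':>8}{'%':>8}")
for category, number, percent in rows:
    print(f"{category:<12}{number:>8}{percent:>8.1f}")
```

Counting first and converting to percentages afterwards mirrors steps 3 and 4 of the manual method.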
Frequency Distribution Tables
EXAMPLE (3): The following data are systolic blood pressure measurements (mmHg) of 30 patients with hypertension. Present these data in a frequency table:
150, 155, 160, 154, 162, 170, 165, 155, 190, 186, 180, 178,
195, 200, 180, 156, 173, 188, 173, 189, 190, 177, 186,
177, 174, 155, 164, 163, 172, 160.
What type of data is this?
Frequency Distribution Tables
Four steps:
1- Put a title, e.g. "Frequency distribution of blood pressure measurements (mmHg) among a group of hypertensive patients".
2- Draw a table (columns and rows).
Frequency Distribution Tables
3- In the first column we have to classify blood pressure into categories or classes, because we have a large sample (N = 30) and the measured variable is of continuous type (not discrete as in the previous examples).
Frequency Distribution Tables
Construction of classes
Calculate the range of the observations: subtract the lowest blood pressure value from the highest (the highest was 200 and the lowest was 150), so the difference is 50.
Let the class interval be 10, so we will have 50/10 = 5 classes.
Enumerate the frequency in each class by the tally method.
Calculate the exact frequency and the relative frequency.
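The class construction for Example (3) can be sketched as follows. The slides do not say which class the highest value (200) belongs to, so this sketch assumes it is counted in the last class (190-); the function name `class_frequencies` is also an assumption, and the code expects the range to be a whole multiple of the class width, as it is here.

```python
# Systolic blood pressure readings (mmHg) from Example (3).
readings = [150, 155, 160, 154, 162, 170, 165, 155, 190, 186, 180, 178,
            195, 200, 180, 156, 173, 188, 173, 189, 190, 177, 186,
            177, 174, 155, 164, 163, 172, 160]

def class_frequencies(values, width):
    """Group values into classes of the given width, starting at the minimum.

    Assumption: the maximum value is counted in the last class, so the
    number of classes equals range / width.
    """
    lowest, highest = min(values), max(values)
    n_classes = (highest - lowest) // width
    counts = [0] * n_classes
    for v in values:
        index = min((v - lowest) // width, n_classes - 1)
        counts[index] += 1
    return [(lowest + i * width, counts[i]) for i in range(n_classes)]

table = class_frequencies(readings, width=10)
total = len(readings)
for lower, freq in table:
    # Each row: lower class limit, frequency, relative frequency (%).
    print(f"{lower}-  {freq:>3}  {100 * freq / total:5.1f}%")
```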
2- Graphical Presentation
The diagram should:
Be simple
Be easy to understand
Save a lot of words
Be self-explanatory
Have a clear title indicating its content
Be fully labeled
The y axis (vertical) is usually used for frequency.
2- Graphical Presentation
Graphic presentations are used to illustrate and clarify information. Tables are essential in the presentation of scientific data, and diagrams are complementary: they summarize these tables in an easy, attractive and simple way.
Graphical Presentation
1- Bar chart
Simple
Multiple
Component
Graphical Presentation
1- Bar chart
[Figure: bar chart of mean age in years (vertical axis, 24 to 27) for the studied groups I, II and III (horizontal axis)]
Graphical Presentation
1- Bar chart: multiple
[Figure: multiple bar chart comparing males and females for cancer and anemia]
Graphical Presentation
1- Bar chart: component
[Figure: component bar chart comparing Egypt and the USA in socio-economic standard of living; the vertical axis is the percentage of the population (0% to 100%) and each bar is divided into high, moderate and low]
Graphical Presentation
2- Pie diagram:
Consists of a circle whose area represents the total frequency (100%), divided into segments.
Each segment represents a proportional component of the total frequency.
Graphical Presentation
2- Pie diagram:
[Figure: pie diagram of the percentage of causes of child death in Egypt: diarrhea 50%, chest infection 30%, congenital 10%, accident 10%]
Graphical Presentation
3- Histogram:
[Figure: histogram of the distribution of the studied group according to their height; the horizontal axis is height in cm in classes 100-, 110-, 120-, 130-, 140-, 150-, and the vertical axis is the number of individuals (0 to 30)]
Graphical Presentation
4- Frequency polygon
Derived from a histogram by connecting the midpoints of the tops of the rectangles; the connecting line is called the frequency polygon.
We can draw the polygon without the rectangles, which gives a simpler form of line graph.
A special type of frequency polygon is the normal distribution curve.
Graphical Presentation
5- Scatter diagram
Shows the relationship between two numeric measurements, each observation being represented by a point corresponding to its value on each axis.
This scatter diagram shows a positive (direct) relationship between NAG and the albumin/creatinine ratio among diabetic patients.
[Figure: scatter diagram of the correlation between NAG (vertical axis, 0 to 35) and albumin/creatinine ratio (horizontal axis, 0 to 0.35) in a group of early diabetics]
In negative correlation the points are scattered in a downward direction, meaning that the relation between the two studied measurements is inverse, i.e. if one measure increases the other decreases, as shown in the following graph.
[Figure: scatter diagram of the correlation between Doppler velocimetry (RI, vertical axis, 0 to 1) and baby birth weight in kg (horizontal axis, 1.5 to 4.5)]
Graphical Presentation
6- Line graph:
A diagram showing the relationship between two numeric variables (like the scatter diagram), but the points are joined together to form a line (either a broken line or a smooth curve).
[Figure: line graph of changes in the body temperature of a patient after use of an antibiotic; the vertical axis is temperature (36 to 39.5) and the horizontal axis is time in hours]
Normal Distribution Curve
Normal Distribution Curve
The normal distribution curve (NDC) is a graphical presentation (frequency polygon) of a quantitative biological variable measured in a large number of individuals.
It is a form of presentation of the frequency distribution of biological variables such as weight, height, hemoglobin level and blood pressure, or any continuous data.
It plays a major role in the techniques of statistical analysis.
Characteristics of the Normal Distribution Curve
1- It is a bell-shaped, continuous curve.
2- It is symmetrical, i.e. it can be divided vertically into two equal halves.
3- The tails never touch the base line but extend to infinity in either direction.
4- The mean, median and mode values coincide.
5- It is described by two parameters: the arithmetic mean determines the location of the center of the curve, and the standard deviation represents the scatter around the mean.
Skewed data
If we represent collected data by a frequency polygon and the resulting curve does not resemble the normal distribution curve (with all its characteristics), then these data are not normally distributed.
The curve may be skewed to the right or to the left.
This is because the data collected are from:
1.
2.
Therefore the results obtained from these data cannot be applied or generalized to the whole population.
The NDC can be used to distinguish between normal and abnormal measurements.
Example:
Suppose we have the NDC for hemoglobin levels in a population of normal adult males with mean ± SD = 11 ± 1.5.
We obtain a hemoglobin reading of 8.1 for an individual and want to know whether he is normal or anemic.
If this reading lies within the area under the curve covering 95% of normal values (i.e. mean ± 2 SD), he will be considered normal. If his reading is lower, he is anemic.
The normal range of hemoglobin for adult males is 11 ± (2 x 1.5), i.e. from 8 to 14.
Our reading (8.1) lies within this range, so this individual is normal: his reading lies within the 95% of his population.
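The same check can be written as a small Python sketch. The function names are illustrative assumptions; the multiplier of 2 SD is the one used in the example above.

```python
def normal_range(mean, sd, k=2.0):
    """Return the (lower, upper) limits of mean ± k SD."""
    return mean - k * sd, mean + k * sd

def is_within_normal_range(reading, mean, sd, k=2.0):
    """True if the reading lies inside mean ± k SD (limits included)."""
    lower, upper = normal_range(mean, sd, k)
    return lower <= reading <= upper

low, high = normal_range(11, 1.5)
print(low, high)                                      # 8.0 14.0
print(is_within_normal_range(8.1, mean=11, sd=1.5))   # True
```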
Data Summarization
To summarize data, we need one or two parameters that can describe the data:
1. Measures of central tendency, which describe the center of the data.
2. Measures of dispersion, which show how the data are scattered around its center.
1- The arithmetic mean:
The sum of the observations divided by the number of observations:
$\bar{x} = \frac{\sum x}{n}$
where:
$\bar{x}$ = the mean
$\sum$ denotes "the sum of"
$x$ = the values of the observations
$n$ = the number of observations
1- The arithmetic mean:
Example: in a study the ages of 5 students were 12, 15, 10, 17, 13.
Mean = sum of observations / number of observations
Then the mean $\bar{x}$ = (12 + 15 + 10 + 17 + 13) / 5 = 67 / 5 = 13.4 years.
Calculation of Mean for Frequency Distribution Data
In the case of frequency distribution data we calculate the mean by this equation:
$\bar{x} = \frac{\sum f x}{n}$
where $f$ = the frequency of each value $x$.
For example: we want to calculate the mean incubation period of this group.
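As a minimal sketch of the frequency-distribution formula, the snippet below computes the sum of f times x divided by n. The incubation-period table from the slides did not survive, so the values and frequencies here are hypothetical and chosen only to show the arithmetic; the function name `grouped_mean` is also an assumption.

```python
def grouped_mean(values, frequencies):
    """Mean of frequency distribution data: sum(f * x) / n, with n = sum(f)."""
    n = sum(frequencies)
    return sum(f * x for x, f in zip(values, frequencies)) / n

# Hypothetical incubation periods (days) and the number of patients with each.
periods = [2, 3, 4, 5]
patients = [3, 5, 8, 4]
print(grouped_mean(periods, patients))  # 3.65
```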
2- Median
The middle observation in a series of observations after arranging them in ascending or descending order.
The rank of the median is (n + 1)/2 if the number of observations is odd, and n/2 if the number is even.
2- Median
Calculate the median of the following data: 5, 6, 8, 9, 11 (n = 5, odd).
- The rank of the median = (n + 1)/2, i.e. (5 + 1)/2 = 3.
- The median is the third value when the data are arranged in ascending (or descending) order.
- So the median is 8 (the third value).
2- Median
- If the number of observations is even, the median is calculated as follows:
e.g. 5, 6, 8, 9 (n = 4)
- The rank of the median = n/2, i.e. 4/2 = 2, so the median is the second value. If the data are arranged in ascending order the second value is 6, and if arranged in descending order it is 8; therefore the median is the mean of both observations, i.e. (6 + 8)/2 = 7.
2- Median
For simplicity we can apply the same equation used for odd numbers, i.e. (n + 1)/2. The median rank will be (4 + 1)/2 = 2.5, i.e. the median lies between the second and the third values, 6 and 8; take their mean = 7.
3- Mode
The most frequently occurring value in the data is the mode.
Example: 5, 6, 7, 5, 10. The mode in this data is 5, since the number 5 is repeated twice.
Sometimes there is more than one mode, and sometimes there is no mode, especially in a small set of observations.
3- Mode
Example: 20, 18, 14, 20, 13, 14, 30, 19. There are two modes, 14 and 20.
Example: 300, 280, 130, 125, 240, 270. This set has no mode.
A distribution may be unimodal, bimodal or have no mode.
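The worked examples for the three measures of central tendency can be checked with Python's standard `statistics` module. The helper `modes` is an assumption added here so that a set with no repeated value reports no mode, as in the slides.

```python
import statistics
from collections import Counter

def modes(values):
    """Return the most frequent values, or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return [v for v, c in counts.items() if c == highest]

print(statistics.mean([12, 15, 10, 17, 13]))      # 13.4
print(statistics.median([5, 6, 8, 9, 11]))        # 8
print(statistics.median([5, 6, 8, 9]))            # 7.0
print(modes([5, 6, 7, 5, 10]))                    # [5]
print(modes([20, 18, 14, 20, 13, 14, 30, 19]))    # [20, 14]
print(modes([300, 280, 130, 125, 240, 270]))      # []
```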
Measures of Dispersion
The measures of dispersion describe the degree of variation (scatter) of the data around its central value (dispersion = variation = spread = scatter).
1. Range (R)
2. Variance (V)
3. Standard deviation (SD)
4. Coefficient of variation (C.V.)
1- Range:
The difference between the largest and smallest values.
It is the simplest measure of variation.
Disadvantages: it is based on only two of the observations and gives no idea of how the other observations are arranged between these two.
Also, it tends to be larger when the size of the sample increases.
2- Variance
A first attempt would be $V = \frac{\sum (\bar{x} - x)}{n}$.
The value of this expression is always zero, because the differences between each value and the mean have negative and positive signs that cancel out on algebraic summation.
2- Variance
To overcome this zero we square the difference between the mean and each value, so the sign will always be positive. Thus we get:
$V = \frac{\sum (\bar{x} - x)^2}{n - 1}$
3- Standard Deviation (SD)
The main disadvantage of the variance is that it is in the square of the units used. So it is more convenient to express the variation in the original units by taking the square root of the variance. This is called the standard deviation (SD).
Therefore $SD = \sqrt{V}$, i.e.
$SD = \sqrt{\frac{\sum (\bar{x} - x)^2}{n - 1}}$
4- Coefficient of Variation (C.V.)
C.V. = SD / mean x 100
The C.V. is useful when we are interested in the relative size of the variability in the data.
Example: for the observations 5, 7, 10, 12 and 16 the mean is 50/5 = 10 and SD = $\sqrt{(25 + 9 + 0 + 4 + 36)/(5 - 1)} = \sqrt{74/4}$ = 4.3, so C.V. = 4.3/10 x 100 = 43%.
Example
Calculate the mean, variance, SD and C.V. from the following measurements: 5, 7, 10, 12 and 16.
Mean = (5 + 7 + 10 + 12 + 16)/5 = 50/5 = 10
Variance = (25 + 9 + 0 + 4 + 36)/(5 - 1) = 74/4 = 18.5
SD = $\sqrt{18.5}$ = 4.3
C.V. = 4.3/10 x 100 = 43%
Example
Another set of observations is 2, 2, 5, 10 and 11. Their mean = 30/5 = 6.
SD = $\sqrt{(16 + 16 + 1 + 16 + 25)/(5 - 1)} = \sqrt{74/4}$ = 4.3
C.V. = 4.3/6 x 100 = 71.7%
Both sets of observations have the same SD but differ in C.V., because the data in the first set are homogeneous (so the C.V. is not high), while the data in the second set are heterogeneous (so the C.V. is high).
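The two examples can be reproduced with the standard `statistics` module, whose `variance` and `stdev` functions use the same n - 1 denominator as the formulas above. The helper name `coefficient_of_variation` is an assumption made for this sketch.

```python
import statistics

def coefficient_of_variation(values):
    """C.V. = SD / mean x 100, using the sample SD (n - 1 denominator)."""
    return statistics.stdev(values) / statistics.mean(values) * 100

first = [5, 7, 10, 12, 16]
second = [2, 2, 5, 10, 11]

for data in (first, second):
    sd = statistics.stdev(data)
    cv = coefficient_of_variation(data)
    # Same SD for both sets, but a different C.V. because the means differ.
    print(round(sd, 1), round(cv, 1))
```

The first set gives a C.V. of about 43% and the second about 71.7%, matching the hand calculation.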
Example
In a study where age was recorded, the observed values were 6, 8, 9, 7, 6 (5 observations).
Calculate the mean, SD, range, mode and median.
The mean = sum of observations / their number = (6 + 8 + 9 + 7 + 6)/5 = 36/5 = 7.2
Examples
The variance = sum of the squared differences (mean minus observation) / (number of observations - 1):
$V = \frac{(7.2 - 6)^2 + (7.2 - 8)^2 + (7.2 - 9)^2 + (7.2 - 7)^2 + (7.2 - 6)^2}{5 - 1}$
which is equal to $\frac{(1.2)^2 + (-0.8)^2 + (-1.8)^2 + (0.2)^2 + (1.2)^2}{4} = \frac{6.8}{4} = 1.7$
- So the variance = 1.7
Examples
- The SD = $\sqrt{1.7}$ = 1.3
- Range = 9 - 6 = 3
- The mode is 6.
- The median: first we have to arrange the data in ascending order, i.e. 6, 6, 7, 8, 9. The rank of the median = (n + 1)/2, i.e. (5 + 1)/2 = 3, therefore the median is the third value, i.e. median = 7.
Inferential statistics
Inference involves making a generalization about a larger group of individuals on the basis of a subset or sample.
Inferential statistics
Hypothesis Testing
In hypothesis testing we want to find out whether the observed variation among samples is explained by chance alone (i.e. random sampling variation) or is due to a real difference between groups.
Hypothesis Testing
It involves conducting a test of statistical significance, which quantifies the chance that random sampling variation accounts for the observed results.
In hypothesis testing we are asking whether, for example, the sample mean is consistent with a certain hypothesized value for the population mean.
Hypothesis Testing
The method of assessing a hypothesis is known as a significance test.
Hypothesis Testing Steps
1- Formulate the hypothesis
2- Collect the data
3- Test your hypothesis
4- Accept or reject your hypothesis
General principles of significance tests
1. Set up a null hypothesis and its alternative.
2. Find the value of the test statistic.
3. Refer the value of the test statistic to a known distribution which it would follow if the null hypothesis were true.
General principles of significance tests
4. Conclude that the data are consistent or inconsistent with the null hypothesis.
If the data are not consistent with the null hypothesis, the difference is said to be statistically significant. If the data are consistent with the null hypothesis, we say that we accept it, i.e. the difference is statistically insignificant.
General principles of significance tests: P < 0.05
In medicine, we usually consider that differences are significant if the probability is less than 0.05. This means that if the null hypothesis is true, we shall make a wrong decision less than 5 in a hundred times.
Tests of significance
The selection of the test of significance depends essentially on the type of data that we have.
1- Quantitative data (means and SD): t-test, paired t-test
Tests of significance
Comparison of means:
1- Comparing two means of large samples using the normal distribution (z test or SND, standard normal deviate):
If we have a large sample size, i.e. 60 or more, and it follows a normal distribution, then we have to use the z-test.
z = (population mean - sample mean) /
Tests of significance
The normal range for any biological reading lies within the population mean ± 2 SD (this range includes 95% of the area under the normal distribution curve).
Student's t-test
2- Comparing two means of small samples using the t-test:
If we have a small sample size (less than 60), we can use the t distribution instead of the normal distribution.
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}}$
t-test
The value of t is compared with the values in the table of the t distribution at the corresponding degrees of freedom. If the value of t is less than that in the table, the difference between the samples is insignificant.
If the t value is larger than that in the table, the difference is significant, i.e. the null hypothesis is rejected.
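A minimal sketch of the t statistic for two means, computed from summary statistics as (mean1 - mean2) divided by the square root of (SD1 squared / n1 + SD2 squared / n2). The slides give no worked data for this test, so the numbers below are hypothetical, and the function name `t_statistic` is an assumption. The comparison against the t table is left to the reader.

```python
import math

def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """t = (mean1 - mean2) / sqrt(SD1^2 / n1 + SD2^2 / n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Hypothetical summary statistics, invented for illustration only.
t = t_statistic(mean1=12.0, sd1=2.0, n1=16, mean2=10.0, sd2=3.0, n2=9)
print(round(t, 2))  # 1.79
```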
Paired t-test
3- Paired t-test:
If we are comparing repeated observations in the same individual, or the difference between paired data, we have to use the paired t-test, where the analysis is carried out using the mean and standard deviation of the differences between each pair.
ANOVA
4- Comparing several means:
Sometimes we need to compare more than two means. This can be done by the use of several t-tests, which is not only tedious but can also lead to spurious significant results. Therefore we have to use what we call analysis of variance, or ANOVA.
ANOVA
4- Comparing several means:
There are two main types: one-way analysis of variance and two-way analysis of variance. One-way analysis of variance is appropriate when the subgroups to be compared are defined by just one factor, for example a comparison between the means of different socio-economic classes. Two-way analysis of variance is used when the subdivision is based upon more than one factor.
ANOVA
The main idea in the analysis of variance is that we take into account the variability within the groups and between the groups. The value of F is equal to the ratio of the between-groups mean square to the within-groups mean square:
F = between-groups MS / within-groups MS
Chi-Squared Test
b- Qualitative variables:
1) Chi-squared test:
Qualitative data are arranged in a table formed by rows and columns. The categories of one variable define the rows and the categories of the other variable define the columns.
Chi-Squared Test
A chi-squared test is used to test whether there is an association between the row variable and the column variable or, in other words, whether the distribution of individuals among the categories of one variable is independent of their distribution among the categories of the other.
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Chi-Squared Test
1) Chi-squared test:
Degrees of freedom = (rows - 1)(columns - 1)
O = observed value in the table
E = expected value, calculated as follows:
E = Rt x Ct / GT, i.e. total of row x total of column / grand total
Chi-Squared Test
From tables of $\chi^2$ significance at degrees of freedom (3 rows - 1) x (3 columns - 1) = 2 x 2 = 4, the critical value at the 0.05 level is 9.49. Therefore we conclude that there is a significant relation between socio-economic level and the degree of intelligence (because the calculated value of $\chi^2$ is greater than that of the table).
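The chi-squared calculation can be sketched in plain Python: expected values from E = Rt x Ct / GT, the statistic as the sum of (O - E) squared over E, and the degrees of freedom from (rows - 1)(columns - 1). The socio-economic table from the slides did not survive, so the 2 x 2 table below is hypothetical, and the function name `chi_squared` is an assumption.

```python
def chi_squared(observed):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected value: E = Rt x Ct / GT
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Hypothetical 2 x 2 table of counts, invented for illustration only.
table = [[20, 10],
         [10, 20]]
statistic, df = chi_squared(table)
print(round(statistic, 2), df)  # 6.67 1
```

The statistic is then compared with the tabulated critical value at the computed degrees of freedom, as described above.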
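Z Test
2) Z test for comparing two percentages:
$z = \frac{p_1 - p_2}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}}$
where $p$ is the percentage in each group, $q = 100 - p$ and $n$ is the number of individuals in the group.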
Z Test
Example: the number of anemic patients in group 1, which includes 50 patients, is 5, and the number of anemic patients in group 2, which contains 60 patients, is 20. To find whether groups 1 and 2 are statistically different in the prevalence of anemia, we calculate the z test.
p1 = 5/50 = 10%, p2 = 20/60 = 33%
q1 = 100 - 10 = 90, q2 = 100 - 33 = 67
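Z Test
$z = \frac{10 - 33}{\sqrt{\frac{10 \times 90}{50} + \frac{33 \times 67}{60}}} = \frac{-23}{\sqrt{18 + 36.85}} = \frac{-23}{7.4}$
so the size of z is 23/7.4 = 3.1.
Therefore there is a statistically significant difference between the percentages of anemia in the studied groups (because the size of z is greater than 2).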
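The anemia example can be checked with a short Python sketch of the same formula. The function name `z_two_percentages` is an assumption; the code keeps the unrounded percentage for group 2 (33.33 rather than 33), which gives the same result to one decimal place.

```python
import math

def z_two_percentages(cases1, n1, cases2, n2):
    """z = (p1 - p2) / sqrt(p1*q1/n1 + p2*q2/n2), with p and q in percent."""
    p1 = 100 * cases1 / n1
    p2 = 100 * cases2 / n2
    q1, q2 = 100 - p1, 100 - p2
    standard_error = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    return (p1 - p2) / standard_error

z = z_two_percentages(5, 50, 20, 60)
print(round(abs(z), 1))  # 3.1
```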
Correlation & Regression
c- Correlation and regression:
Correlation measures the closeness of the association between two continuous variables, while linear regression gives the equation of the straight line that best describes the association and enables the prediction of one variable from the other.
Correlation & Regression
1- Correlation:
In correlation, the closeness of the association is measured by the correlation coefficient, r. The values of r range between +1 and -1.
A value of one (with either sign) means perfect correlation, while 0 means no correlation. If the r value is near zero the correlation is weak, while near one it is strong. The sign (- or +) denotes the direction of the correlation.
Correlation
1- Correlation:
Positive (+ve) correlation means that if one variable increases the other one increases as well, while negative (-ve) correlation means that when one variable increases the other one decreases.
Linear regression
2- Linear regression:
Similar to correlation, linear regression is used to determine the relation between two variables and to predict the change in one variable due to changes in the other. For linear regression, the independent variable has to be distinguished from the dependent variable.
Linear regression
2- Linear regression:
Linear regression not only allows assessment of the presence of an association between the independent and dependent variables, but also allows the prediction of the dependent variable for a particular value of the independent variable. However, regression should not be used for prediction outside the range of the original data. A t-test is also used for the assessment of the level of significance. The dependent variable in linear regression must be a continuous one.
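A minimal sketch of both ideas: the correlation coefficient r and the least-squares line used for prediction. The slides show these only as figures, so the data below are hypothetical and the function name `pearson_r_and_line` is an assumption. The formulas used are the standard Pearson coefficient and ordinary least squares.

```python
import math

def pearson_r_and_line(xs, ys):
    """Return (r, slope, intercept) for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical paired measurements, invented for illustration only.
x = [1, 2, 3, 4, 5]   # independent variable
y = [2, 4, 5, 4, 5]   # dependent variable
r, slope, intercept = pearson_r_and_line(x, y)
print(round(r, 2), round(slope, 2), round(intercept, 2))  # 0.77 0.6 2.2

# Predict only inside the observed range of x (1 to 5).
print(round(intercept + slope * 3.5, 2))  # 4.3
```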
[Figure: scatter diagram of the correlation between Doppler velocimetry (RI, vertical axis, 0 to 1) and baby birth weight in kg (horizontal axis, 1.5 to 4.5)]
Multiple regression
3- Multiple regression:
Situations frequently occur in which we are interested in the dependency of a dependent variable on several independent variables, not just one. The test of significance used is the analysis of variance (F test).
outpatient clinic
4. A random 20 females and 20 males out of a group of 100 persons
5. All workers in a factory chosen from all factories in a certain governorate
The weight (kg) of a pregnant