
CHAPTER THREE

3.0 METHODOLOGY

3.1 Source of data

The data for this study were obtained from the publications of the Central Statistical Agency (CSA) on the 2005 Ethiopian Demographic and Health Survey (EDHS). The 2005 EDHS was the second national demographic and health survey conducted by the CSA. Its prime objective was to generate health and demographic information on family planning, female circumcision, fertility levels and determinants, infant, child, adult and maternal mortality, child and maternal nutrition, malaria, women's empowerment, and knowledge of HIV/AIDS, along with other household characteristics, in the nine regions and two city administrations, at both rural and urban levels. The study used data on a total of 14,070 women in the age group 15-49 years.

3.2 Variables in the study

The dependent variable is a dichotomous random variable: "currently using a contraceptive method" (coded as 1) and "currently using no contraceptive method" (coded as 0). The factors hypothesized to influence contraceptive use that are included in the study are: age of the woman, place of residence, religion, partner's education level, woman's education, whether she heard family planning (FP) information on the radio in the month before the survey, frequency of listening to the radio, whether she was visited by an FP worker in the last 12 months, whether she is currently working, region, and wealth index.

3.3 Methods of Data Analysis

After the data have been collected through a designed questionnaire and face-to-face interviews, they must be analyzed with appropriate statistical tools. Data analysis is the critical step by which information is extracted from the collected data. The methods of data analysis used in this study are both descriptive and inferential statistics.
3.3.1. Descriptive Statistics

In this study, frequency distributions and graphs were used. Bar charts give a diagrammatic representation of nominal and ordinal data; in this study they are used to display the nominal variables.
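As an illustration only, a frequency distribution and a bar chart for a nominal variable such as region could be produced in Python as follows; the variable and category values are hypothetical, not EDHS data:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical nominal variable: region of residence for a few respondents.
region = pd.Series(["Oromia", "Amhara", "Oromia", "SNNP", "Amhara", "Oromia"])

# Frequency distribution: counts and percentages per category.
freq = region.value_counts()
pct = region.value_counts(normalize=True).mul(100).round(1)
print(pd.DataFrame({"Frequency": freq, "Percent": pct}))

# Bar chart, the diagrammatic representation used for nominal data.
freq.plot(kind="bar", ylabel="Frequency", title="Respondents by region")
plt.tight_layout()
plt.show()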

3.3.2. Inferential Statistics

Based on the nature of the study and the objectives stated in the introduction, the following inferential statistical methods were used.

3.3.2.1. Chi-Square Tests

The chi-square test is used to test whether there is an association between the use of family planning and each predictor variable considered in the study. The chi-square test statistic is:

\[ \chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

with degrees of freedom equal to (R − 1)(C − 1),

where: R is the number of rows and C the number of columns,

O_ij is the observed frequency in cell (i, j), and

E_ij is the expected frequency in cell (i, j).

The null hypothesis of the chi-square test is that there is no association between the categorical variables; the alternative hypothesis is that there is an association. If the calculated value is greater than the tabulated value, the null hypothesis is rejected in favor of the alternative.

Basic assumptions of the chi-square test

 The sample must be randomly selected from the population.
 The population must be normally distributed for the variable under study.
 Each cell and every individual observation must be independent of the others, i.e. the observations should be independent of each other.
 The expected frequency in each cell must be sufficiently large.
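A minimal sketch of this test, using scipy's chi2_contingency on a hypothetical cross-tabulation of residence by contraceptive use; the counts are illustrative, not EDHS figures:

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 table: rows = residence (urban, rural),
# columns = contraceptive use (yes, no). Counts are made up.
observed = np.array([[120, 380],
                     [ 90, 910]])

# correction=False gives the plain Pearson statistic defined above.
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.4f}")
# If p < 0.05, reject the null hypothesis of no association.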

3.4 Logistic regression

Logistic regression analysis extends the techniques of multiple regression analysis to research situations in which the outcome variable is categorical. Generally, the response variable in logistic regression is binary (use or non-use, presence or absence, success or failure, etc.). Logistic regression is a statistical technique that examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence; it describes the relationship between a dichotomous response variable and a set of explanatory variables. The logistic model is a special case of the generalized linear model [6], used where the assumptions of normality and constant variance of the residuals are not satisfied. Binary (binomial) logistic regression is the form of regression used when the dependent variable is dichotomous and the predictor variables are of any type.
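As a sketch of how such a model could be fitted in practice, the following uses statsmodels on simulated data; the variable names merely stand in for the EDHS variables of Section 3.2 and the coefficients are arbitrary:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated stand-ins for the study variables (not EDHS data).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(15, 50, n),
    "educ_years": rng.integers(0, 13, n),
    "urban": rng.integers(0, 2, n),
})
eta = -3 + 0.03 * df["age"] + 0.15 * df["educ_years"] + 0.8 * df["urban"]
df["use"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))  # 1 = uses contraception

# Binary logistic regression of use on the predictors.
X = sm.add_constant(df[["age", "educ_years", "urban"]])
result = sm.Logit(df["use"], X).fit()
print(result.summary())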

3.4.1 Basic assumptions of logistic regression

Once the model is fitted, the validity of inferences drawn from it depends on the assumptions of the statistical model being satisfied. The following assumptions should be considered for the efficient use of logistic regression [7]:

 Logistic regression assumes meaningful coding of the variables. Logistic coefficients are difficult to interpret if the variables are not coded meaningfully. The convention for binomial logistic regression is to code the class of interest on the dependent variable as 1 and the other class as 0.
 Logistic regression does not assume a linear relationship between the dependent and independent variables.
 The dependent variable must be categorical.
 The independent variables need not be interval-scaled, normally distributed, linearly related, or of equal variance within each group.
 The groups must be mutually exclusive and exhaustive; a case can only be in one group and every case must be a member of one of the groups.
 Larger samples are needed than for linear regression because maximum likelihood coefficients are large-sample estimates.
 The regression equation should have a linear relationship with the logit form of the dependent variable.
 There should be no multicollinearity among the predictors.
 A linear relationship between the dependent and independent variables is not needed; logistic regression can handle all sorts of relationships because it applies a nonlinear log transformation to the predicted odds.
 The error terms need to be binomially distributed.
 The assumption of homoscedasticity is not necessary; logistic regression can handle ordinal and nominal data as independent variables.
 Logistic regression requires the dependent variable to be categorical (mostly binary).

Since logistic regression assumes that P(y = 1) is the probability of the event occurring, the dependent variable must be coded accordingly: the level coded 1 should represent the desired outcome. Logistic regression also assumes linearity of the independent variables in the log odds; otherwise, it underestimates the strength of the relationship and too easily fails to detect it (not rejecting the null hypothesis where it should be rejected). Logistic regression requires quite large sample sizes.

3.5 Odds ratio

The odds of an event happening (e.g. the event that Y = 1) are defined as the probability that the event will occur divided by the probability that it will not occur. That is, the odds of an event A are given by:

\[ \text{odds}(A) = \frac{P(A)}{P(A')} = \frac{P(A)}{1 - P(A)} \]

where A is an event defined in relation to the use of family planning.
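A short worked example of this definition, with hypothetical probabilities: if P(A) = 0.20, the odds of A are 0.20/0.80 = 0.25, i.e. one user for every four non-users. The odds ratio of the section title compares the odds of two groups:

# Hypothetical probability that a woman currently uses contraception.
p_use = 0.20
odds_use = p_use / (1 - p_use)          # odds(A) = P(A) / (1 - P(A)) = 0.25
print(f"odds of use = {odds_use:.2f}")

# Odds ratio comparing two hypothetical groups, e.g. urban vs rural.
p_urban, p_rural = 0.35, 0.15
odds_ratio = (p_urban / (1 - p_urban)) / (p_rural / (1 - p_rural))
print(f"urban vs rural odds ratio = {odds_ratio:.2f}")   # about 3.05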


3.6 The Binary Logistic Regression Model

The dependent variable in logistic regression is usually dichotomous; that is, it can take the value 1 with probability of success π, or the value 0 with probability of failure 1 − π. This type of variable is called a Bernoulli (or binary) variable. As mentioned previously, the independent or predictor variables in logistic regression can take any form. The relationship between the predictor and response variables is not a linear function in logistic regression; instead, the logistic function, whose inverse is the logit transformation of π, is used. Consider a collection of p explanatory variables denoted by the vector X' = (X_1, X_2, ..., X_p), and let the conditional probability that the outcome is present be denoted by P(Y = 1 | X) = π. Then

\[ \pi = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)} \]

and the logit, or log-odds, of having Y = 1 is modeled as a linear function of the explanatory variables:

\[ \ln\!\left( \frac{\pi}{1 - \pi} \right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p, \qquad 0 \le \pi \le 1 \]

where β_0 is the constant of the equation and β_1, ..., β_p are the coefficients of the predictor variables. The first equation above is known as the logistic function.
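A small numeric check that the two equations above are inverses of each other; the coefficient and covariate values are arbitrary:

import numpy as np

beta = np.array([-1.5, 0.04, 0.6])   # beta_0, beta_1, beta_2 (arbitrary)
x = np.array([1.0, 30.0, 1.0])       # leading 1 multiplies the constant

eta = x @ beta                            # linear predictor
pi = np.exp(eta) / (1 + np.exp(eta))      # logistic function gives pi
logit = np.log(pi / (1 - pi))             # logit of pi recovers eta

print(f"eta = {eta:.4f}, pi = {pi:.4f}, logit(pi) = {logit:.4f}")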

3.7 Method of estimation

The most common method used to estimate the unknown parameters in linear regression is Ordinary Least Squares (OLS). Under the usual assumptions, least squares estimators have several desirable properties; but when the OLS method is applied to a model with a dichotomous outcome, the estimators no longer have these properties. In that situation, the most commonly used method of estimating the parameters of a logistic regression model is Maximum Likelihood (ML). In logistic regression the likelihood equations are non-linear in the unknown parameters and have no explicit solution, so they are solved with the very effective and well-known Newton-Raphson iterative method, which in this setting is known as the iteratively reweighted least squares (IRLS) algorithm. In general, the sample likelihood function is defined as the joint probability function of the random variables. Specifically, suppose (y_1, y_2, ..., y_n) are n independent observations corresponding to the random variables (Y_1, Y_2, ..., Y_n). Since Y_i is a Bernoulli random variable, its probability function is

\[ f_i(y_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}, \qquad y_i = 0, 1; \quad i = 1, \ldots, n. \]

Since the Y_i are assumed to be independent, the joint probability, or likelihood, function is

\[ g(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} \]

and the log-likelihood function is

\[ L(\beta_0, \beta_1, \ldots, \beta_p) = \sum_{i=1}^{n} y_i (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) - \sum_{i=1}^{n} \ln\{ 1 + \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \}. \]

The Newton-Raphson iterative method solves the resulting likelihood equations.
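A minimal sketch of these Newton-Raphson iterations on simulated data, assuming a design matrix X whose first column is ones; a real analysis would use a packaged routine:

import numpy as np

# Simulated data: n observations, p coefficients including the constant.
rng = np.random.default_rng(1)
n, p = 400, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([-0.5, 1.0, -0.7])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

beta = np.zeros(p)                      # starting values
for _ in range(25):
    pi = 1 / (1 + np.exp(-X @ beta))    # current fitted probabilities
    W = pi * (1 - pi)                   # Bernoulli variances = IRLS weights
    score = X.T @ (y - pi)              # gradient of the log-likelihood
    info = X.T @ (X * W[:, None])       # Fisher information matrix
    step = np.linalg.solve(info, score) # Newton-Raphson update
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:    # stop when the update is negligible
        break

print("ML estimates:", np.round(beta, 3))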

3.8 Test of Overall Model Fit

3.8.1 R² for Logistic Regression

In logistic regression there is no true R² value as there is in OLS regression. However, because the deviance can be thought of as a measure of how poorly the model fits (i.e., the lack of fit between observed and predicted values), an analogy can be made to the residual sum of squares in ordinary least squares: the proportion of unaccounted-for variance that is reduced by adding variables to the model plays the same role as the proportion of variance accounted for, or R².

\[ R^2_{\text{logistic}} = \frac{-2LL_{\text{null}} - (-2LL_k)}{-2LL_{\text{null}}} \]

where the null model is the logistic model with just the constant and the k model contains all k predictors. In SPSS, two modified versions of this basic idea are reported, one developed by Cox and Snell and the other by Nagelkerke. Writing L_null and L_k for the likelihoods (not the log-likelihoods) of the null and full models, the Cox and Snell R-square is computed as:

\[ \text{Cox and Snell pseudo-}R^2 = 1 - \left[ \frac{L_{\text{null}}}{L_k} \right]^{2/n} \]

Because this R-squared value cannot reach 1.0, Nagelkerke modified it. The correction divides the Cox and Snell version by its maximum attainable value, so that 1.0 becomes a possible value for R-squared (Hosmer and Lemeshow, 2000):

\[ \text{Nagelkerke pseudo-}R^2 = \frac{1 - \left[ L_{\text{null}} / L_k \right]^{2/n}}{1 - \left[ L_{\text{null}} \right]^{2/n}} \]
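The three measures above could be computed from the null and fitted log-likelihoods as follows; llf_null and llf_k would come from fitted models (e.g. the .llf attribute in statsmodels) and the values shown here are made up:

import numpy as np

llf_null, llf_k, n = -980.0, -905.0, 1500   # hypothetical log-likelihoods

# Deviance-based R2 from the first formula of this section.
r2_logistic = ((-2 * llf_null) - (-2 * llf_k)) / (-2 * llf_null)

# Cox and Snell: 1 - (L_null / L_k)^(2/n), written with log-likelihoods.
cox_snell = 1 - np.exp(2 * (llf_null - llf_k) / n)

# Nagelkerke: Cox-Snell divided by its maximum attainable value.
nagelkerke = cox_snell / (1 - np.exp(2 * llf_null / n))

print(f"R2_logistic = {r2_logistic:.3f}, Cox-Snell = {cox_snell:.3f}, "
      f"Nagelkerke = {nagelkerke:.3f}")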

3.8.2 The Hosmer-Lemeshow Test

In order to assess the overall goodness of fit, Hosmer and Lemeshow proposed grouping subjects based on the values of the estimated probabilities. The Hosmer-Lemeshow goodness-of-fit test divides subjects into deciles based on their predicted probabilities and computes a chi-square statistic from the observed and expected frequencies. Using this grouping strategy, the Hosmer-Lemeshow goodness-of-fit statistic Ĉ is obtained by calculating the Pearson chi-square statistic from the g × 2 table of observed and estimated expected frequencies:

\[ \hat{C} = \sum_{k=1}^{g} \frac{(o_k - n'_k \bar{\pi}_k)^2}{n'_k \bar{\pi}_k (1 - \bar{\pi}_k)} \]

where g denotes the number of groups, n'_k is the number of observations in the kth group, o_k is the sum of the Y values for the kth group, and π̄_k is the average of the estimated probabilities for the kth group. Hosmer and Lemeshow (1980) demonstrated that, under the null hypothesis that the fitted logistic regression model is the correct model, the distribution of the statistic Ĉ is well approximated by the chi-square distribution with g − 2 degrees of freedom. This test is more reliable and robust than the traditional chi-square test (Agresti, 2002).
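A sketch of the decile-of-risk calculation, assuming arrays y (observed 0/1 outcomes) and pi_hat (estimated probabilities) taken from a fitted model; the simulated values exist only to make the example run:

import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, pi_hat, g=10):
    """Hosmer-Lemeshow C-hat and its chi-square(g-2) p-value."""
    order = np.argsort(pi_hat)                      # sort by estimated risk
    y, pi_hat = y[order], pi_hat[order]
    groups = np.array_split(np.arange(len(y)), g)   # g roughly equal deciles
    c_hat = 0.0
    for idx in groups:
        n_k = len(idx)                  # n'_k: observations in group k
        o_k = y[idx].sum()              # o_k: observed events in group k
        pi_bar = pi_hat[idx].mean()     # average estimated probability
        c_hat += (o_k - n_k * pi_bar) ** 2 / (n_k * pi_bar * (1 - pi_bar))
    return c_hat, chi2.sf(c_hat, df=g - 2)

# Illustrative call on simulated outcomes and probabilities.
rng = np.random.default_rng(2)
pi_hat = rng.uniform(0.05, 0.95, 1000)
y = rng.binomial(1, pi_hat)
c_hat, p_value = hosmer_lemeshow(y, pi_hat)
print(f"C-hat = {c_hat:.2f}, p = {p_value:.3f}")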

3.8.3 The likelihood ratio test

The likelihood ratio (LR) test is performed by estimating two models and comparing the fit of one to the fit of the other. Removing predictor variables from a model will almost always make the model fit less well (i.e., give it a lower log-likelihood), so it is necessary to test whether the observed difference in fit is statistically significant. The likelihood ratio test does this by comparing the log-likelihoods of the two models; if the difference is statistically significant, the less restrictive model (the one with more variables) is said to fit the data significantly better than the more restrictive model. If one has the log-likelihoods of the two models, the likelihood ratio statistic is easy to calculate. To test the overall significance of all the coefficients in the model, the test statistic is:

\[ G^2 = (-2 \ln L_0) - (-2 \ln L_1) \]

where L_0 is the likelihood of the null model and L_1 is the likelihood of the fitted model. The statistic G² plays the same role in logistic regression as the numerator of the partial F-test does in linear regression. Under the global null hypothesis H_0: β_1 = β_2 = ... = β_p = 0, the statistic G² follows a chi-square distribution with p degrees of freedom and measures how well the independent variables jointly explain the response variable.
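Given the two log-likelihoods, the test is a one-line computation; the values below are illustrative and p is the number of predictors:

from scipy.stats import chi2

ll_null, ll_full, p = -980.0, -905.0, 11    # hypothetical values

G2 = (-2 * ll_null) - (-2 * ll_full)        # likelihood ratio statistic
p_value = chi2.sf(G2, df=p)                 # reference: chi-square with p df
print(f"G2 = {G2:.1f}, p = {p_value:.3g}")
# A small p-value means the predictors jointly improve the model fit.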

3.9 Tests of a Single Predictor

3.9.1 The Wald test

The Wald test approximates the likelihood ratio test, but with the advantage that it requires estimating only one model. It works by testing the null hypothesis that a parameter (or set of parameters) is equal to some value. If the test fails to reject the null hypothesis, this suggests that removing the variable from the model will not substantially harm its fit, since a predictor whose coefficient is very small relative to its standard error is generally not doing much to help predict the dependent variable. The Wald statistic is:

\[ W = \frac{\hat{\beta}}{SE(\hat{\beta})} \]

Under the null hypothesis H_0: β_i = 0, the square of W, W², is approximately distributed as chi-square with one degree of freedom.
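For a single coefficient the computation is direct; the estimate and standard error below are hypothetical:

from scipy.stats import chi2

beta_hat, se = 0.42, 0.11                # hypothetical estimate and SE
W = beta_hat / se                        # Wald statistic
p_value = chi2.sf(W ** 2, df=1)          # W^2 ~ chi-square(1) under H0
print(f"W = {W:.2f}, W^2 = {W**2:.2f}, p = {p_value:.4f}")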

3.9.2 The Lagrange multiplier or score test

As with the Wald test, the Lagrange multiplier test requires estimating only a single model. The difference is that the model estimated for the Lagrange multiplier test does not include the parameter(s) of interest. The test statistic is calculated from the slope of the likelihood function at the observed values of the variables in the model; this estimated slope, or "score", is the reason the Lagrange multiplier test is sometimes called the score test. The scores are used to estimate the improvement in model fit if additional variables were included. Because it tests for an improvement in model fit when currently omitted variables are added to the model, the Lagrange multiplier test is also sometimes referred to as a test for omitted variables.

If the MLE equals the hypothesized value p_0, then p_0 would maximize the likelihood and U(p_0) = 0. The score statistic measures how far from zero the score function is when evaluated at the null hypothesis. The test statistic is:

\[ S = \frac{U(p_0)^2}{I(p_0)} \]

where U is the score function and I is the Fisher information, both evaluated at p_0. The statistic S is approximately distributed as chi-square with one degree of freedom.
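For the single-parameter case above, e.g. testing a binomial proportion H_0: p = p_0, the score function is U(p_0) = (y − n·p_0)/(p_0(1 − p_0)) and the information is I(p_0) = n/(p_0(1 − p_0)); a sketch with made-up counts:

from scipy.stats import chi2

y, n, p0 = 210, 1000, 0.25               # hypothetical successes, trials, H0

U = (y - n * p0) / (p0 * (1 - p0))       # score function at p0
I = n / (p0 * (1 - p0))                  # Fisher information at p0
S = U ** 2 / I                           # score (Lagrange multiplier) statistic
p_value = chi2.sf(S, df=1)               # reference: chi-square with 1 df
print(f"S = {S:.3f}, p = {p_value:.4f}")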

The likelihood ratio, Wald, and score tests of the significance of a single predictor are said to be "asymptotically" equivalent, which means that their significance values converge as N grows. With small samples, however, they are not likely to be equal and may sometimes lead to different statistical conclusions. The likelihood ratio test for a single predictor is usually recommended by texts as the most powerful (although some authors have stated that neither the Wald nor the LR test is superior). Wald tests are known to have low power (higher Type II error rates) and can be biased when there are insufficient data (i.e., the expected frequency is too low) for some category or value of X. Nevertheless, very few researchers use the likelihood ratio test for tests of individual predictors. One reason may be that statistical packages do not provide this test for each predictor, making hand computation and multiple analyses necessary, which is inconvenient, especially for larger models. If the analysis has a large N, researchers are likely to be less concerned about the differences (Agresti, 2002).
