Model 4 Presentation
6. No multicollinearity
Dataset
Variable code | Storage type | Variable label
Area | Byte | Area
Ciudad | Long | City (code)
Conglomerado | Text | Conglomerate
Panelm | Text | Panelm
Vivienda | Text | Housing
P02 | Byte | Gender
P03 | Byte | Age
P06 | Byte | Marital status
P11 | Byte | Knows how to read and write
P12a | Byte | Received a diploma for studies
P12b | Long | Level of education
P15 | Byte | Ethnicity
P20 – P43 | Byte | Employment variables
P44a – P44k | Byte, Int | Social and labor benefits
P44f | Int | Receives social security ("Recibe Seguro Social")
P63 – P78 | Byte, Int | Income variables
Sd01 – ced01a | Byte – Int | Unemployment variables
Ced01a | Byte | Has an identification card
nnivins | Byte | Level of education
Ingrl | Long | Labor income
Ingpc | Double | Per capita income
Condact | Byte | Labor activity condition

Labor Market Variables of Model
Variable code | Storage type | Variable label
Gender | Byte | Gender
Age | Float | Age
Education | Float | Level of education
Occupation | Float | Labor market activity
agejob | Float | Years working
Is this plausible for the proposed application to social security affiliation? (INEC, 2023)
• According to the National Employment, Unemployment and Underemployment Survey (INEC, 2023), the dataset and the model proposed in class fit the context of social security affiliation. The survey includes variables such as employment, social and labor benefits, and labor activity condition, and together these make it possible to develop the preliminary estimate presented in class.
• Based on government literature, it is possible to suggest that the labor market characteristics considered in the model are not correlated with the error term.
Jobincome, Jobtype and diploma were identified as relevant variables that could
have been omitted in the model developed in class.
3A. Justify the additional variables of the model and implement their computation in Stata.
[Figure: distributions of the additional variables. Panels: log_jobincome (density), Diploma (frequency, values 0 and 1) and Jobtype (frequency by category: Appointment, Permanent Contract, Temporary Contract, Per job, Per hour, Per day).]
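The slides implement the new variables in Stata. The Python sketch below illustrates the two transformations in a language-neutral way, under stated assumptions:

- log_jobincome is the natural log of labor income, and Ingrl is assumed to be the income source.
- The diploma dummy is built from P12a, assuming that code 1 means "yes".

Both helper names and this coding are hypothetical and should be checked against the INEC codebook. In Stata, `generate log_jobincome = ln(jobincome)` behaves like the first helper, because `ln()` returns missing for non-positive values.

```python
import math
from typing import Optional


def log_jobincome(jobincome: Optional[float]) -> Optional[float]:
    # ln() is only defined for positive values: zero, negative or missing
    # income is returned as None (treated as missing), like Stata's ln().
    if jobincome is None or jobincome <= 0:
        return None
    return math.log(jobincome)


def diploma_dummy(p12a: Optional[int], yes_code: int = 1) -> Optional[int]:
    # ASSUMPTION (hypothetical coding): yes_code=1 means "received a diploma".
    # Verify the actual P12a codes in the survey codebook before using this.
    if p12a is None:
        return None
    return 1 if p12a == yes_code else 0


# Hypothetical records: "ingrl" is labor income, "p12a" the diploma question.
rows = [{"ingrl": 450.0, "p12a": 1}, {"ingrl": 0.0, "p12a": 2}]
for r in rows:
    r["log_jobincome"] = log_jobincome(r["ingrl"])
    r["diploma"] = diploma_dummy(r["p12a"])
```

Treating zero income as missing drops non-earners from any regression that uses log_jobincome. This changes the estimation sample, a trade-off worth stating when the model is compared with the one from class.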
3C. Re-estimate the model seen in class with the variables in question and interpret the resulting marginal effects.
The Pseudo R² of both models increased, to 0.5621 (logit) and 0.5509 (probit). The difference is not significant.
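Stata's `logit` and `probit` output reports McFadden's pseudo R². It is defined as one minus the ratio of the model log-likelihood to that of an intercept-only model. The minimal sketch below uses hypothetical input values, since the null log-likelihood is not reported in these slides.

```python
def mcfadden_pseudo_r2(ll_model: float, ll_null: float) -> float:
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept-only model)."""
    # For a binary outcome both log-likelihoods are negative.
    if ll_null >= 0:
        raise ValueError("the null log-likelihood must be negative")
    return 1.0 - ll_model / ll_null


# Hypothetical values: the model halves the (negative) log-likelihood.
example = mcfadden_pseudo_r2(-50.0, -100.0)
```

Because both log-likelihoods depend on the estimation sample, pseudo R² values are only directly comparable across models fitted to the same observations.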
In the logit model, at a 5% significance level, we conclude that the new variables are significant for the model.
Logit model: Occupation | -0.1024 | -0.0020 | Yes | Yes
Probit model: Occupation | -0.09107 | -0.0134 | Yes | Yes
• The sign of the marginal effect changed for gender in the logit model (LM) and for job years in the probit model (PM).
• Gender and job years are no longer significant in the probit model.
• The magnitude of the marginal effects decreased substantially for both significant and non-significant variables.
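For a continuous regressor, the marginal effects discussed above equal the coefficient scaled by the density of the link function at x'β. The minimal sketch below computes average marginal effects (AMEs), assuming the coefficients and data are already available. The function name and inputs are hypothetical. Stata's `margins, dydx(*)` reports comparable AMEs, although for factor variables declared with `i.` it computes a discrete change instead of a derivative.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    # Logistic CDF; its density is p * (1 - p).
    return 1.0 / (1.0 + math.exp(-z))


def std_normal_pdf(z: float) -> float:
    # Standard normal density, the derivative of the probit link.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def average_marginal_effects(beta: Sequence[float],
                             X: Sequence[Sequence[float]],
                             model: str) -> List[float]:
    """AMEs of continuous regressors in a logit or probit model.

    beta[0] is the intercept; beta[1:] match the columns of each row in X.
    dP/dx_j = beta_j * g(x'beta); the AME averages g(x'beta) over the sample.
    """
    if model not in ("logit", "probit"):
        raise ValueError("model must be 'logit' or 'probit'")
    total = 0.0
    for row in X:
        z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        if model == "logit":
            p = logistic(z)
            total += p * (1.0 - p)
        else:
            total += std_normal_pdf(z)
    mean_density = total / len(X)
    return [b * mean_density for b in beta[1:]]


# Hypothetical coefficients and data, for illustration only.
ames = average_marginal_effects([-1.2, 0.8], [[0.5], [1.5], [2.0]], "logit")
```

The AME shares the sign of the coefficient because the density is always positive. Its magnitude, however, depends on where the observations sit on the index, which is one reason marginal effects can shrink when new regressors shift x'β.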
4. Assume that there is (at least) one factor omitted in the regression developed in point 3. Document with scientific literature what types of factors can motivate the omission (use references in APA format).
An omitted factor in the regression is a variable that is not included in the model but that has a significant effect on the dependent variable and is correlated with at least one of the independent variables (Hanck & Arnold, 2023).
The omission of such factors can bias the estimators of the regression coefficients and undermine the validity of the inferences (Buck, 2015).
Some types of factors that may motivate the omission (Granados, 2016):
1. Factors that are unknown or ignored due to lack of information or prior knowledge.
2. Factors that are deliberately excluded for practical reasons, such as model simplicity, data availability, or computational cost.
3. Unobservable or difficult-to-measure factors, such as motivation, ability, or individual preferences.
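For the linear case, the textbook omitted-variable-bias formula makes the two conditions above explicit: the omitted factor must affect y and must be correlated with an included regressor. The sketch assumes a true model with two regressors in which x_2 is omitted. For logit and probit, the bias does not reduce to this simple expression.

```latex
\begin{aligned}
&\text{True model:} && y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon \\
&\text{Short regression (omits } x_2\text{):} && \hat{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 \\
&\text{Auxiliary regression:} && \hat{x}_2 = \tilde{\delta}_0 + \tilde{\delta}_1 x_1 \\
&\text{Bias:} && \mathbb{E}\big[\tilde{\beta}_1 \mid x_1, x_2\big] = \beta_1 + \beta_2 \tilde{\delta}_1
\end{aligned}
```

The bias term β₂δ̃₁ vanishes only if the omitted factor has no effect (β₂ = 0) or is uncorrelated with x_1 in the sample (δ̃₁ = 0).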
5. Pose (formally) a model that is capable of dealing with the omission indicated in point 4, as well as the assumptions that justify it (use references in APA format).
To deal with omitted variables, one could employ a multiple regression approach that incorporates all relevant variables in the analysis. This model should take into account the correlation between the independent variables to provide a more accurate representation of the underlying relationship (Middela & Ramadurai, 2024). The model would be as follows:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon$$
where:
• $y$ is the dependent variable;
• $x_1, \dots, x_k$ are the explanatory variables;
• $\beta_0, \beta_1, \dots, \beta_k$ are the regression coefficients;
• $\epsilon$ is the random error term.
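A small simulation can illustrate why including the relevant variables helps. The sketch below uses OLS on simulated data. All values are made up for illustration, and the presentation's own models are logit and probit. Here x_2 is correlated with x_1, so omitting it pushes the estimated slope on x_1 away from its true value, while the regression that includes both recovers it.

```python
import numpy as np


def ols(y, *regressors):
    # Least-squares fit with an intercept column; returns [b0, b1, ...].
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # correlated with x1 (delta_1 = 0.5)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)  # true beta_1 = 2, beta_2 = 3

short = ols(y, x1)       # omits x2: slope on x1 absorbs part of x2's effect
long_ = ols(y, x1, x2)   # includes every relevant regressor
```

With δ₁ = 0.5 and β₂ = 3, the short-regression slope should settle near 2 + 3 × 0.5 = 3.5. This matches the standard omitted-variable-bias result β₁ + β₂δ₁, while the long regression's slope stays near 2.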
1. Once the model was re-estimated with the new variables, the gender variable stood out as a particular case: it was not significant in the model.
2. To check this, a correlation test was performed to detect multicollinearity between the explanatory variables (Çağlayan, 2012).
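A correlation matrix can be complemented with variance inflation factors (VIFs). VIFs capture collinearity with combinations of regressors, not only with individual pairs. The minimal NumPy sketch below runs on simulated data, and the variables are hypothetical. In Stata, `correlate` or `pwcorr` gives the correlation matrix, and `estat vif` after `regress` reports VIFs.

```python
import numpy as np


def correlation_and_vif(X):
    """Pairwise correlations and VIFs for the columns of X (n x k)."""
    R = np.corrcoef(X, rowvar=False)
    # For standardized regressors, VIF_j is the j-th diagonal element of R^-1,
    # which equals 1 / (1 - R_j^2) from regressing column j on the others.
    vif = np.diag(np.linalg.inv(R))
    return R, vif


rng = np.random.default_rng(1)
n = 5_000
a = rng.normal(size=n)
b = rng.normal(size=n)                  # independent of a
c = a + 0.1 * rng.normal(size=n)        # nearly collinear with a
R, vif = correlation_and_vif(np.column_stack([a, b, c]))
```

A common rule of thumb treats VIF values above 10 as a sign of problematic multicollinearity. In this example, the two nearly collinear columns exceed that threshold while the independent one stays close to 1.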
and justified.
Logit model and marginal effects
The marginal effects of gender, occupation, log_jobincome, jobtype and diploma are significant.
Probit model and marginal effects
The marginal effects of all labor market variables are significant.
Logit model
• In terms of significant variables, jobyears was the only variable that was not significant in both models; in point 6, instruction was not significant either.
• The pseudo R² barely changed (from 0.5621 to 0.5607), and neither did the log pseudolikelihood (from -2083 to -2090).
• On the other hand, if the models are evaluated according to their log pseudolikelihood, the second model seems to be a better fit.
Probit model
• In terms of significant variables, gender and jobyears became significant in point 6, whereas in point 3 they were not.
• The pseudo R² barely changed (from 0.5509 to 0.5493), and neither did the log pseudolikelihood (from -2136 to -2144).
• On the other hand, if the models are evaluated according to their log pseudolikelihood, the second model seems to be a better fit.
Comparison of both models
• Both the logit and the probit results indicate that the models used in point 3 and point 6 have a similar ability to explain affiliation to social security. If the models are evaluated by their pseudo R², the first model seems to fit the data better.
• The differences are minimal, and these values are only a rough measure of model quality; they do not provide a complete indication of the models' accuracy or validity.