Advanced Quantitative Methods
Descriptive statistics
Understand the theory first from the literature that you are reading
Price and quantity are inversely related – law of demand
One man’s idea then becomes a statement, and the further it spreads, the more it becomes a conjecture.
Concepts – abstract stage. Also called factors/constructs/latent variables.
Example: ability is a construct; IQ is an item (measured variable).
99% interval – when sampling a population, you are not able to sample all of it (e.g., out of 1000, at most 500), so you estimate with an interval.
Construct – something that you don’t know how to measure or can
be measured in multiple ways
Moderating variable – changes the strength or direction of a relationship, so the effect can differ across groups or categories.
Categorical:
Nominal – labels turned into numbers (e.g., 1 = male, 2 = female); the numbers are just codes.
Ordinal – the values have an order, but you can’t do any mathematical operations on ordinal or nominal data.
Both are discrete.
Scale:
Interval – no true zero, so ratios are meaningless: 20 degrees doesn’t mean it’s twice as hot as 10 degrees.
Ratio – has a true zero, so all mathematical operations can be done. Here 10% is half of 20%.
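A small check of the interval vs ratio point, using the temperature example above. The Kelvin conversion (add 273.15) is standard physics, not something from the lecture; Kelvin has a true zero, so it behaves as a ratio scale, while Celsius is an interval scale.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return c + 273.15


hot_c, cold_c = 20.0, 10.0

# Ratio on the Celsius scale: looks like "twice as hot", but 0 °C is not "no heat".
naive_ratio = hot_c / cold_c

# Ratio on the Kelvin scale: meaningful, because 0 K is a true zero.
true_ratio = celsius_to_kelvin(hot_c) / celsius_to_kelvin(cold_c)

# Differences are fine on both scales (a 10-degree gap is a 10-kelvin gap).
diff_c = hot_c - cold_c
diff_k = celsius_to_kelvin(hot_c) - celsius_to_kelvin(cold_c)

print(f"Celsius ratio: {naive_ratio:.3f}")  # 2.000
print(f"Kelvin ratio:  {true_ratio:.3f}")   # 1.035
print(f"Differences:   {diff_c:.1f} vs {diff_k:.1f}")
```

So "twice as hot" only makes sense once the scale has a true zero; differences, on the other hand, are valid on an interval scale.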
YOU CAN’T TRIM MORE THAN 20% OF DATA EVEN IF IT’S A VERY BIG
DATA SET.
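A sketch of a trimmed mean that enforces the 20% limit from the note. Assumption: the limit is applied per tail (20% cut from each end), which is the usual meaning of a "20% trimmed mean"; the note doesn't say whether it is per tail or in total. The function name and the data are illustrative.

```python
MAX_TRIM = 0.20  # limit from the lecture note: never trim more than 20%


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the data from EACH tail (assumed per-tail)."""
    if not 0 <= proportion <= MAX_TRIM:
        raise ValueError(f"trim proportion must be between 0 and {MAX_TRIM}")
    data = sorted(values)
    k = int(proportion * len(data))  # points cut from each tail, rounded down
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


data = [2, 4, 4, 5, 5, 6, 6, 7, 8, 95]  # made-up data with one extreme outlier
print(sum(data) / len(data))     # 14.2  (ordinary mean, pulled up by 95)
print(trimmed_mean(data, 0.10))  # 5.625 (one value cut from each end)
print(trimmed_mean(data, 0.20))  # 5.5   (two values cut from each end)
```

Asking for more than 20% (e.g., `trimmed_mean(data, 0.3)`) raises a `ValueError`, regardless of how big the data set is.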
Percentile – gives you the location (relative position) of a value within the ordered data.
Upper and lower 95% mean – the bounds of the 95% confidence interval for the mean; we are 95% confident that the population mean lies between these two values.
As the confidence level increases (e.g., from 95% to 99%), the interval gets wider, so the estimate becomes less precise.
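A sketch of the lower and upper mean at different confidence levels, showing that a higher level gives a wider interval. Assumption: it uses the normal (z) critical value from Python's `statistics.NormalDist`; JMP's "Lower/Upper 95% Mean" uses the t distribution, which gives a slightly wider interval for small samples. The readings are made up.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_ci(sample, level=0.95):
    """Approximate lower/upper mean for a confidence level (z-based sketch)."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 2.58 for 99%
    return m - z * se, m + z * se


# Made-up BP readings, for illustration only.
sample = [118, 125, 130, 122, 140, 135, 128, 119, 131, 127,
          124, 133, 126, 138, 121, 129, 132, 120, 136, 123]

for level in (0.90, 0.95, 0.99):
    lower, upper = mean_ci(sample, level)
    print(f"{level:.0%}: lower {lower:.2f}, upper {upper:.2f}, width {upper - lower:.2f}")
```

The width grows from 90% to 95% to 99%: more confidence, less precision.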
95% CI means that if I took 100 samples and built an interval from each one, about 95 of those intervals would contain the true population mean.
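A simulation of that interpretation: draw many samples from a population whose mean we know, build a 95% interval from each, and count how often the interval contains the true mean. Assumptions: the population is normal with mean 50 and SD 10, each sample has n = 30, and the interval uses the z value (so coverage comes out slightly under 95%). Many more than 100 repetitions are used to reduce noise.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def coverage(true_mean=50.0, sd=10.0, n=30, level=0.95, trials=2000, seed=1):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = mean(sample)
        se = stdev(sample) / math.sqrt(n)
        if m - z * se <= true_mean <= m + z * se:
            hits += 1
    return hits / trials


print(coverage())            # close to 0.95
print(coverage(level=0.99))  # higher, because each interval is wider
```

The "95%" describes the procedure over repeated samples, not the chance that one particular interval is right.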
Confidence level and precision are inversely related. The choice of confidence level (90%, 95%, 99%) can be subjective.
HYPOTHESIS
The null hypothesis always carries the equal sign. If the alternative uses a not-equal sign (H1: μ ≠ μ0), it’s a two-tailed test; if it uses < or >, it’s a one-tailed test.
Univariate analysis – analysis of just one characteristic (variable).
Parametric test – assumes the population is normally distributed; use a z or t test.
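A sketch of a one-sample z test showing the two-tailed vs one-tailed p-value. Assumptions: the population SD (sigma) is known, which is what makes it a z test rather than a t test, and all the numbers are hypothetical. The function name is illustrative.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p) for H0: mu = mu0 with known population sigma.

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0),
    or "less" (H1: mu < mu0).
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z)))  # both tails
    elif alternative == "greater":
        p = 1 - std_normal.cdf(z)             # upper tail only
    elif alternative == "less":
        p = std_normal.cdf(z)                 # lower tail only
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p


# Hypothetical: sample mean 103 from n = 36, testing mu0 = 100 with sigma = 9.
z, p_two = one_sample_z_test(103, 100, 9, 36)
_, p_one = one_sample_z_test(103, 100, 9, 36, alternative="greater")
print(f"z = {z:.2f}, two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

Here z = 2.0; the two-tailed p-value is twice the one-tailed one, so the same data can be significant one-tailed but borderline two-tailed.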
Non-parametric test – no normality assumption; in JMP, tick the Wilcoxon Signed Rank box.
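A pure-Python sketch of the Wilcoxon signed-rank statistic, for testing whether the median equals a hypothesized value mu0. It only computes W+ and W-; the p-value step (exact tables or a normal approximation) is left out. The function name and data are illustrative, not JMP's internals.

```python
def wilcoxon_signed_rank(sample, mu0=0.0):
    """Return (W_plus, W_minus, n_used) for H0: median = mu0.

    Zero differences are dropped; tied |differences| get the average rank.
    """
    diffs = [x - mu0 for x in sample if x != mu0]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus, len(diffs)


w_plus, w_minus, n_used = wilcoxon_signed_rank([1.5, -0.5, 2.0, 3.0, -1.0, 2.0])
print(w_plus, w_minus, n_used)  # 18.0 3.0 6
```

Only ranks and signs are used, not the raw values, which is why no normality assumption is needed.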
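1. t-test: compares the means of 2 categories.
2. One-way ANOVA: compares means across more than 2 categories of one factor.
3. Two-way ANOVA: two factors (e.g., gender and drug), each with 2 or more categories.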
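A sketch of the one-way ANOVA F statistic (between-group mean square over within-group mean square) for H0: all group means are equal. With exactly 2 groups, F equals the square of the pooled-variance t statistic, which is why the t-test covers the 2-category case. The p-value is left out; if SciPy is available, `scipy.stats.f.sf(f, df1, df2)` gives it. Data are made up.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for H0: all group means are equal."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    # Variation of group means around the grand mean (weighted by group size).
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f, df1, df2 = one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f, df1, df2)  # 12.0 2 6
```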
Effect tests: gender and drug on their own are the main effects, and the last one (gender*drug) is called the interaction effect.
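A descriptive sketch of main effects vs interaction in a 2 x 2 design (gender x drug), using cell means only. The data are hypothetical BP reductions. This shows what the effects mean; it is not the effect test itself, which needs the full two-way ANOVA (e.g., from Fit Model).

```python
def cell_mean(values):
    return sum(values) / len(values)


# Hypothetical BP reductions in a 2 x 2 design (gender x drug).
cells = {
    ("male", "placebo"): [2, 3, 1, 2],
    ("male", "drug"): [10, 12, 11, 11],
    ("female", "placebo"): [3, 2, 4, 3],
    ("female", "drug"): [6, 5, 7, 6],
}
m = {key: cell_mean(vals) for key, vals in cells.items()}

# Drug effect (drug minus placebo) within each gender.
drug_in_male = m[("male", "drug")] - m[("male", "placebo")]
drug_in_female = m[("female", "drug")] - m[("female", "placebo")]

# Main effects: one factor's effect averaged over the levels of the other.
drug_main = (drug_in_male + drug_in_female) / 2
gender_main = ((m[("male", "placebo")] - m[("female", "placebo")])
               + (m[("male", "drug")] - m[("female", "drug")])) / 2

# Interaction: does the drug effect differ between genders?
interaction = drug_in_male - drug_in_female

print(drug_main, gender_main, interaction)  # 6.0 2.0 6.0
```

A non-zero interaction (here the drug works better for men) means the main effects alone don't tell the whole story.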
Linear regression – the least squares method chooses the coefficients that minimize the sum of the squared errors (residuals).
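A sketch of least squares with one predictor, using the closed-form slope and intercept. The BP example in these notes has two predictors (age and cholesterol), which uses the matrix form of the same idea; one predictor keeps it short. Data are made up.

```python
def least_squares_line(x, y):
    """Intercept b0 and slope b1 minimising sum((y - (b0 + b1*x))**2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared errors for a given line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")  # y_hat = 2.20 + 0.60 x
# Nudging either coefficient away from the least squares values raises the SSE.
print(sse(x, y, b0, b1), sse(x, y, b0, b1 + 0.1), sse(x, y, b0 + 0.1, b1))
```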
Fit model (in JMP: Analyze – Fit Model)
R square – how well the model fits: how much of the variation in BP is explained by cholesterol and age. Here, age and cholesterol explain only 13.8% of the variation in BP, so the regression wasn’t very successful; it’s not a well-fitting model.
RMSE (root mean square error) – an important measure of how well the model fits, i.e., the accuracy of its predictions. It should be as close to 0 as possible (it is in the units of the response).
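A sketch of R square and RMSE computed from observed and fitted values. Assumption: SSE is divided by the error degrees of freedom (n minus the number of estimated parameters, intercept included), which is how a regression report's root mean square error is usually defined; some texts divide by n instead. The fitted values below come from the line 2.2 + 0.6x on made-up data.

```python
import math


def r_squared(y, y_hat):
    """Share of the variation in y explained by the model."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_error / ss_total


def rmse(y, y_hat, n_params):
    """Root mean square error with SSE divided by error degrees of freedom."""
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(ss_error / (len(y) - n_params))


y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # fitted values from y_hat = 2.2 + 0.6 x, x = 1..5
print(r_squared(y, y_hat))          # about 0.6
print(rmse(y, y_hat, n_params=2))   # about 0.894 (intercept + slope estimated)
```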
Hypothesis here: for each term, H0: coefficient = 0 (e.g., age = 0) vs H1: coefficient ≠ 0; this is what the probability (Prob>|t|) column tests. Here, all p-values are below 0.05, so we reject the null in favor of the alternative.
1. Another way to improve the fit is to increase the sample size, but that is often not a practical solution.
Adjusted R square is stricter than R square (it penalizes extra predictors), so even a change of 0.1 is meaningful. Comparing it with R square helps tell whether we have underfit or overfit the model.
Adjusted R square takes degrees of freedom into account: every parameter you estimate from the data uses up one degree of freedom.
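A sketch of the adjusted R square formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of predictors (intercept not counted). The 0.138 comes from the BP example in these notes; n = 50 is a hypothetical sample size, since the notes don't give one.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R square; n = observations, p = predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# R square 0.138 with 2 predictors (age, cholesterol); n = 50 is hypothetical.
print(adjusted_r_squared(0.138, n=50, p=2))
# Same R square but one extra, useless predictor: adjusted R square drops further.
print(adjusted_r_squared(0.138, n=50, p=3))
```

Each extra predictor uses up a degree of freedom, so unless it raises R square enough, the adjusted value goes down.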
If you don’t include the intercept, you can’t compute R square or adj R square in the usual way.
Since my RMSE has gone up from my previous model, this is not a better
model.
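Confirmatory Factor Analysis (CFA) – confirms whether all latent variables are properly constructed, i.e., whether the items measure the constructs they are meant to. In JMP: Analyze – Multivariate Methods – Structural Equation Models.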
1. Model comparison:
-2 log likelihood tells you how badly the model fits: the higher the number, the worse the fit. It shows how good the model is based on how close it is to the unrestricted model.
AICc and BIC tell you how well the model fits while penalizing extra parameters; a lower value implies a better-fitting model.
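A sketch of how AIC, AICc and BIC are built from -2 log likelihood: AIC = -2LL + 2k, AICc = AIC + 2k(k + 1)/(n - k - 1), BIC = -2LL + k ln(n), with k estimated parameters and n observations. These are the standard formulas; exactly how a given program counts k is an assumption here. The two models are hypothetical.

```python
import math


def information_criteria(minus2ll, k, n):
    """Return (AIC, AICc, BIC) from -2 log likelihood, k parameters, n observations."""
    aic = minus2ll + 2 * k
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = minus2ll + k * math.log(n)
    return aic, aicc, bic


# Hypothetical models fitted to the same n = 200 cases:
# model B fits slightly better (lower -2LL) but uses 4 extra parameters.
model_a = information_criteria(1000.0, k=10, n=200)
model_b = information_criteria(996.0, k=14, n=200)
for name, (aic, aicc, bic) in (("A", model_a), ("B", model_b)):
    print(f"Model {name}: AIC {aic:.1f}, AICc {aicc:.1f}, BIC {bic:.1f}")
```

Model B has the better (lower) -2LL, but A wins on all three criteria because the extra parameters cost more than they gain.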
Chi-square tells you whether it’s a good-fitting model. You can’t tell anything from the value itself, so check whether the null is accepted. Here the null is that the model fits well: p = 0.7624 (> 0.05) means the null is accepted, hence it’s a good-fitting model.
Comparative Fit Index (CFI) – the closer to 1 the better (preferably 0.9-1).
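A sketch of the usual CFI formula: it compares how far the model's chi-square exceeds its df with how far the baseline (independence) model's chi-square exceeds its df. The chi-square values are hypothetical; software output may differ slightly in the details.

```python
def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index from model and baseline chi-square values."""
    d_model = max(chi2_model - df_model, 0.0)                  # model misfit
    d_base = max(chi2_baseline - df_baseline, d_model, 0.0)    # baseline misfit
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


print(cfi(60.0, 40, 800.0, 45))  # about 0.974, inside the preferred 0.9-1 range
print(cfi(30.0, 40, 800.0, 45))  # 1.0: chi-square below its df means no misfit
```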
Heat map (residuals) – if there are too many dark red or dark blue boxes, the map is heated, meaning large misfit. We want a cool map.
Assess measurement model – only available for CFA. It tells you about the reliability and validity of the model. Cronbach’s alpha can also be used for this.
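A sketch of Cronbach's alpha: alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score), for k items answered by the same respondents. The scores are made up; a common rule of thumb treats alpha of about 0.7 or higher as acceptable.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Made-up answers from 5 respondents to 3 items of one construct.
item1 = [4, 5, 3, 4, 2]
item2 = [4, 4, 3, 5, 2]
item3 = [5, 5, 2, 4, 3]
print(round(cronbach_alpha([item1, item2, item3]), 3))  # 0.886
```

When items move together (consistent answers), the total score varies much more than the individual items, which pushes alpha towards 1.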
If all items surpass the threshold level, each item is individually reliable. We drop a question (item) if it falls too far below the threshold. Reliability indicates consistency.
Then check whether each latent variable crosses the threshold line. Here we accept it, as it almost meets the red line.
If the values on the diagonal are greater than the values above the diagonal, it indicates good (discriminant) validity. There will be nothing below the diagonal.
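A sketch of that diagonal check (the Fornell-Larcker idea), assuming the diagonal holds each construct's average variance extracted (AVE) and the cells above it hold squared correlations between constructs; check your report's legend, since layouts differ. AVE here is the mean of squared standardized loadings. Loadings and the correlation are hypothetical.

```python
def ave(loadings):
    """Average variance extracted: mean of squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def fornell_larcker_ok(ave_a, ave_b, corr_ab):
    """Discriminant validity: each AVE must exceed the squared correlation."""
    r2 = corr_ab ** 2
    return ave_a > r2 and ave_b > r2


leadership = ave([0.80, 0.75, 0.70])    # hypothetical loadings
satisfaction = ave([0.85, 0.80, 0.78])  # hypothetical loadings
print(round(leadership, 3), round(satisfaction, 3))
print(fornell_larcker_ok(leadership, satisfaction, 0.6))  # True: 0.36 is below both AVEs
print(fornell_larcker_ok(leadership, satisfaction, 0.9))  # False: 0.81 is above both AVEs
```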
Bad model:
- Change your latent variables: here we could drop one of them, e.g. leadership. Look at the correlations to judge: for example, among the items under leadership, one might be negatively correlated, or the correlations might be weak.
2. Parameter estimates
For SEM, you must interconnect (add covariances between) your independent variables. If there are many dependent variables, you must interconnect them too.
Chi-square – the null is accepted, hence it’s a good model; the other criteria are then not needed. RMSEA almost equal to 0 means a good model.
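A sketch of the RMSEA point estimate, sqrt(max(chi² - df, 0) / (df × (N - 1))). It is 0 whenever chi-square is at or below its df, which is why a non-significant chi-square tends to go with RMSEA near 0. Some programs use N instead of N - 1; the values below are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (point estimate)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


print(rmsea(35.2, 40, 200))          # 0.0: chi-square below df, no misfit
print(round(rmsea(90.0, 40, 200), 3))  # 0.079: noticeable misfit
```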
A 1-unit increase in leadership leads to an average 0.634-unit increase in satisfaction, keeping conflict constant.
Mid-term:
- Here, the null is that the means across all six categories are equal to one another.