Advanced Quantitative Methods

The document outlines advanced quantitative methods, focusing on descriptive statistics, hypothesis testing, and regression analysis. It discusses various statistical concepts such as confidence intervals, types of data, and the importance of sample size, as well as methods like ANOVA and Structural Equation Modeling (SEM). Key points include the significance of p-values, the Central Limit Theorem, and the relationship between variables in regression models.

Uploaded by

saina

Advanced Quantitative Methods

Descriptive statistics
 Understand the theory first from the literature that you are reading
 Price and quantity are inversely related – law of demand
 One person’s idea first becomes a statement, and the further it
spreads, the more it becomes a conjecture
 Concepts – Abstract stage. Factors/Constructs/Latent Variable
 Ability – construct, IQ – item/ measured variable
 99% interval – while sampling a population, you are not able to
sample everyone: out of 1000 you may reach 500 at most
 Construct – something that you don’t know how to measure or can
be measured in multiple ways
 Moderating variable can have different effect on different groups or
categories.

Categorical:
Nominal – the label is changed into a number.
Ordinal – the values are in an order, but you can’t do any arithmetic
operations on ordinal or nominal data.
Both are discrete.

Scale:
Interval – has no true zero, so ratios are not meaningful. 20
degrees doesn’t mean it’s twice as hot as 10 degrees.
Ratio – has a true zero; all mathematical operations can be
done. Here 10% is half of 20%.

YOU CAN’T TRIM MORE THAN 20% OF DATA EVEN IF IT’S A VERY BIG
DATA SET.
Percentile – gives you the location

Standard deviation – average deviation/dispersion of the data from the mean


The mean absolute deviation (MAD) is a measure of variability
that indicates the average distance between each observation and
the mean.
Knowing whether your data is normal or not helps with inference –
confidence interval and hypotheses
Skewness is a measure of the asymmetry of a distribution.
Kurtosis – how heavy the tails are, i.e. how prone the data is to extreme values
Normal data is mesokurtic; low kurtosis is platykurtic, high kurtosis is leptokurtic
Standard error of the mean – dispersion of the sample means across samples. I would
want this to be 0 or as low as possible.
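The measures above can be computed by hand. The following is a minimal sketch using only the Python standard library; the data values are made up, and the skewness and kurtosis use the simple moment formulas, so JMP's bias-adjusted figures may differ slightly.

```python
import math
import statistics

# Hypothetical sample; any list of numbers works.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)                  # sample standard deviation (n - 1 denominator)
mad = sum(abs(x - mean) for x in data) / n   # mean absolute deviation
sem = sd / math.sqrt(n)                      # standard error of the mean

# Moment-based skewness and excess kurtosis.
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3   # 0 mesokurtic, below 0 platykurtic, above 0 leptokurtic

print(mean, sd, mad, sem, skewness, excess_kurtosis)
```

A positive skewness here means the right tail is longer, and a negative excess kurtosis means the sample is slightly platykurtic.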

The Weight, Freq and By options in JMP are very rarely used.


Bar graph – qualitative (categorical) data, histogram – continuous (interval or ratio) data

After running JMP, first we quote the data.


If the standard deviation is too close to the range, then it’s too high.

Upper and lower 95% mean – we are 95% confident that the population
mean lies between these two values.
As my confidence level increases, the interval gets wider, so my
precision decreases.
95% CI means that if I had taken 100 samples and built an interval from
each one, about 95 of those intervals would contain the population
mean.
Confidence level and precision are inversely related. The choice of
confidence level can be subjective.
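The trade-off between confidence and precision can be seen by computing the interval at several confidence levels. This is a rough sketch with made-up ages, and it assumes a normal (z) approximation; JMP reports t-based limits, which are a little wider for small samples. The function name `mean_ci` is only illustrative.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(sample, confidence):
    """Normal-approximation confidence interval for the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * sem, mean + z * sem

sample = [48, 52, 55, 47, 50, 53, 49, 51, 54, 46, 50, 52]  # hypothetical ages
for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(sample, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f}), width {high - low:.2f}")
```

The printed widths grow from the 90% to the 99% interval: more confidence, less precision.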

Central Limit theorem – If your sample size is large (n > 30), the
sampling distribution of the sample mean becomes more and more normally
distributed.
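A small simulation illustrates the theorem. This sketch assumes an exponential population (strongly right skewed) and a fixed random seed; both are choices made for the example, not part of the notes.

```python
import random
import statistics

random.seed(0)

def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5

# Raw draws from a right-skewed population with mean 1.
raw = [random.expovariate(1.0) for _ in range(5000)]

# Means of 5000 samples, each of size n = 30, from the same population.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(30)])
    for _ in range(5000)
]

print("skewness of raw data:    ", round(skewness(raw), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
```

The raw data stays heavily skewed, while the sample means are much closer to symmetric and centre on the population mean.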

HYPOTHESIS
If the null uses an equal to sign (alternative: not equal to), it’s a two tailed
test, otherwise it’s a one-tailed test.
Univariate analysis – just one characteristic.
Parametric test – assuming that population is normal – use z or t
test.
Non-parametric test – tick the Wilcoxon Signed Rank box in JMP

P value tells you the actual (observed) level of significance; if the p value is less
than alpha (5% by default) then the null is rejected.

When we do a two tailed test, the two-tailed p value (twice the one-tailed p) is
compared with alpha.


Don’t write that null is accepted, always write rejected or not
rejected because while there is a high chance our calculation is
correct, our confidence level is never 100%.
If null is rejected, then it’s colored on JMP.
If the sample mean is greater than your hypothesized mean,
it is a right tail rejection (null – age ≤ 50); if it is smaller, it’s a left
tail rejection (null – age ≥ 50).
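The relationship between the right-tail, left-tail and two-tailed p values can be sketched as follows. The ages are hypothetical, and the example assumes a large-sample z approximation rather than the t test JMP runs by default; `one_sample_z_test` is an illustrative helper, not a library function.

```python
import math
import statistics
from statistics import NormalDist

def one_sample_z_test(sample, hypothesized_mean):
    """Return (z, right-tail p, left-tail p, two-tailed p)."""
    n = len(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.mean(sample) - hypothesized_mean) / sem
    p_right = 1 - NormalDist().cdf(z)   # alternative: mean > hypothesized
    p_left = NormalDist().cdf(z)        # alternative: mean < hypothesized
    p_two = 2 * min(p_right, p_left)    # alternative: mean != hypothesized
    return z, p_right, p_left, p_two

ages = [52, 55, 49, 58, 51, 54, 56, 50, 53, 57, 52, 55]  # hypothetical sample
z, p_right, p_left, p_two = one_sample_z_test(ages, 50)

alpha = 0.05
print("z =", round(z, 2))
print("right-tail test:", "rejected" if p_right < alpha else "not rejected")
print("two-tailed test:", "rejected" if p_two < alpha else "not rejected")
```

Because the sample mean is above 50, only the right-tail and two-tailed tests can reject; the left-tail p value is close to 1.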

In social science fields we want more inclusivity so 99% confidence,
but in the medical field we need more precision so we can use 95%
or 90% confidence.

If we want the null not to be rejected, then we will choose a lower alpha,
and if we want the null to be rejected, then we will choose a higher
alpha.

Chi square test – used for nominal data; the chi square distribution is positively skewed.

1. T-test: 2 categories
2. One-way ANOVA: more than 2 categories
3. Two-way ANOVA: two categorical x factors

Fit Y by X is used when I have a single dependent variable y (BMI) and a
single independent variable (gender).

If p value is greater than 0.05 then null is not rejected.
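For comparing means across more than two categories, the one-way ANOVA F ratio compares the variation between groups with the variation within groups. Below is a minimal sketch with made-up BMI readings; it stops at the F ratio and leaves the p value to JMP, since the standard library has no F distribution.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical BMI readings for three categories.
groups = [
    [22.1, 24.3, 23.8, 25.0],
    [27.5, 26.9, 28.2, 29.1],
    [24.0, 25.5, 23.2, 24.8],
]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group noise would suggest.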


For two-way ANOVA – add Cross in JMP because you want the interaction
between the two variables.

In the effects test, gender and drug on their own are main effects, and the last one
(gender crossed with drug) is the interaction effect.

When my correlation is 0, this indicates there is no linear relationship.

Linear regression - Least square method tries to minimize the sum of the
squared errors.
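The least squares idea can be checked directly: the fitted line has a smaller sum of squared errors than any nearby line. This is a sketch for one x variable with made-up data; `fit_line` and `sse` are illustrative names.

```python
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sse(x, y, intercept, slope):
    """Sum of squared errors for a candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical observations
b0, b1 = fit_line(x, y)

print("fitted line SSE: ", round(sse(x, y, b0, b1), 4))
print("shifted line SSE:", round(sse(x, y, b0 + 0.5, b1), 4))
print("tilted line SSE: ", round(sse(x, y, b0, b1 + 0.1), 4))
```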

Correlation: Analyze – Multivariate

If it’s low, then the relationship is weak or there is a high chance it could be non-linear.
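A correlation of 0 rules out a linear relationship only, not a relationship altogether. The sketch below uses made-up values where y is completely determined by x (y = x squared) and the correlation is still 0.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * a + 1 for a in x]   # perfectly linear in x
y_curved = [a ** 2 for a in x]      # fully determined by x, but not linear

print("linear:", pearson_r(x, y_linear))
print("curved:", pearson_r(x, y_curved))
```

This is why it is worth looking at the scatterplot before concluding that two variables are unrelated.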

Steep fit lines show stronger positive or negative relationships, flatline
means it’s a weak relationship.

Fit model

You use cross in fit model if one of your x variables is categorical.

Used for linear regression

R square – how well the model fits; how much of the variation in BP is
explained by cholesterol and age. Here, age and cholesterol only explain
13.8% of the variation in BP, therefore my regression wasn’t very successful; it’s
not a very fitting model.

RMSE – an important measure of how well the model fits; it checks the
accuracy of the model. It should be as close to 0 as possible.

If age and cholesterol are equal to 0 then we can expect the BP to be
67.10.
If my age changes by 1 unit, on average I can expect my BP to go up by
0.308, keeping other factors constant.

If the total cholesterol changes by 1 unit, then BP will go up
by 0.066 on average, keeping all other factors constant.
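These three readings all come from one prediction equation, BP = 67.10 + 0.308 × age + 0.066 × cholesterol. A small sketch using the coefficients quoted in the notes; the age and cholesterol inputs are made up and `predicted_bp` is an illustrative name.

```python
# Coefficients as reported in the notes.
INTERCEPT = 67.10
B_AGE = 0.308
B_CHOLESTEROL = 0.066

def predicted_bp(age, cholesterol):
    """Predicted BP from the fitted regression equation."""
    return INTERCEPT + B_AGE * age + B_CHOLESTEROL * cholesterol

base = predicted_bp(40, 200)           # hypothetical person
one_year_older = predicted_bp(41, 200)

print("intercept only:      ", predicted_bp(0, 0))
print("effect of +1 on age: ", round(one_year_older - base, 3))
```

Holding cholesterol fixed and raising age by 1 changes the prediction by exactly the age coefficient, which is what "keeping other factors constant" means.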

Hypothesis here: the null is that the coefficient of age equals 0, and the same for the rest; this is what is
tested in the probability column. Here, the alternate holds since all p values are below 0.05.

Here, the intercept is significant, but there is no theoretical justification for it. So, in my
next model I will remove the intercept and run the model.

1. Another solution here is to increase your sample size but it’s not a
plausible solution.

2. Another solution would be to increase my independent variables.

3. Another would be to include a categorical variable.

Adjusted R square tells you if you should be adding an explanatory
variable to a model – your adjusted R square should go up from the
previous one.

Adj R square is very rigid, hence even a 0.1 variation is good. It tells us if
we have underfit or overfit a model.
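Adjusted R square penalises R square for every explanatory variable added, which is why it only goes up when a new variable earns its place. The sketch below uses the standard formulas; pairing R square = 0.138 with n = 442 and 2 explanatory variables is an assumption based on the figures quoted elsewhere in these notes.

```python
import math

def adjusted_r_squared(r_squared, n, k):
    """n = number of observations, k = number of explanatory variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

def rmse(actual, predicted, k):
    """Root mean square error using n - k - 1 residual degrees of freedom."""
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sse / (len(actual) - k - 1))

# Assumed inputs: R square of 0.138, 442 rows, age and cholesterol as x variables.
print(round(adjusted_r_squared(0.138, 442, 2), 4))

# Adding a third variable that leaves R square unchanged lowers adjusted R square.
print(round(adjusted_r_squared(0.138, 442, 3), 4))
```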

Adjusted R square takes the degrees of freedom into account; the more you estimate from the data,
the more it loses its freedom.

3 of my 442 degrees of freedom have been used up (intercept plus 2 slopes, so model DF = 3 - 1 = 2).

If you don’t include the intercept, then you can’t compute R square or adj R
square, so compare RMSE instead.
Since my RMSE has gone up from my previous model, this is not a better
model.

Typically, when you add a variable, it could make another variable
insignificant and multicollinearity could happen.

Colored prob means null is rejected, hence it’s significant.

Categorical variable – dummy variable

Given that age and cholesterol are 0, for gender = 0, BP = 49.928

Given that age and cholesterol are 0, for gender = 1, BP = 49.928 - 2.371 =
47.557
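A dummy variable simply switches a coefficient on or off. The sketch below reproduces the two calculations using the intercept and gender coefficient quoted in the notes; the age and cholesterol coefficients of this model are not given, so they are left out and both variables are held at 0.

```python
INTERCEPT = 49.928   # from the notes
B_GENDER = -2.371    # from the notes; applies only when the dummy equals 1

def bp_at_zero_age_and_cholesterol(gender):
    """gender is the 0/1 dummy; age and cholesterol are held at 0."""
    return INTERCEPT + B_GENDER * gender

for gender in (0, 1):
    print(gender, round(bp_at_zero_age_and_cholesterol(gender), 3))
```

The gender coefficient is the difference in expected BP between the two categories, with everything else held constant.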

Structural Equation Modelling


Latent variable – hidden; it cannot be measured directly but can be inferred
from other measured variables (decision making, financial literacy)

Difference from linear regression – linear regression does not have latent variables

SEM resolves multicollinearity

Measured variables, called items, are also in it.

Latent variables follow either a reflective (JMP) or a formative model.

Reflective model – the latent variable is reflected in the measured
variables; these variables will have high correlation.

Formative model – the latent variable is formed by the measured variables. They need not be correlated.


You can have multiple independent and dependent variables. Latent
variables can be measured by latent variables.

Linear regression – 1 dependent variable

Structural model – relationship between two latent variables.

Confirmatory Factor Analysis – CFA will confirm whether all latent variables
are properly constructed. Analyze – multivariate – SEM

First you have to form the diagram. Then run it

1. Model comparison:

 Unrestricted model – the best fitting model for the
data. All variables can interact with each other.
 Independence model – the worst fitting model
 Model 1 (our model: we name it) – lies in between the best and worst
fitting model

-2 log likelihood tells you how badly the model fits. The higher the
number, the worse it is. This gives you an indication of how good your
model is, based on its closeness to the unrestricted model.

AICc and BIC tell you how well your model fits. A lower value implies a
better fitting model.
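Both criteria start from -2 log likelihood and add a penalty for the number of estimated parameters. The sketch below uses the standard textbook formulas; the inputs are made up, and whether JMP's SEM report uses exactly these definitions is an assumption.

```python
import math

def information_criteria(neg2_log_likelihood, k, n):
    """Return (AICc, BIC) for k estimated parameters and n observations."""
    aic = neg2_log_likelihood + 2 * k
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)   # small-sample correction
    bic = neg2_log_likelihood + k * math.log(n)
    return aicc, bic

# Hypothetical comparison: the second model fits slightly better but uses more parameters.
print(information_criteria(1500.0, 10, 200))
print(information_criteria(1495.0, 20, 200))
```

In this made-up comparison the smaller model wins on both criteria, because the gain in fit does not pay for the extra parameters.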

Chi Square in the model tells you if it’s a good fitting model. You can’t
tell anything from the value, so you have to check if null is rejected or
not. Here the null is that it’s a good fitting model. A p value of 0.7624 means null is not rejected,
hence it’s a good fitting model.
Comparative Fit Index (CFI) – closer to 1 (0.9-1 preferably)

Root Mean Square Error of Approximation (RMSEA) – should be less than
0.1. Here, it’s 0 hence it’s a good fitting model.
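Both indices are built from the chi square values and their degrees of freedom. This sketch uses the commonly quoted formulas with made-up inputs; some software divides by N rather than N - 1 in RMSEA, so treat the exact definitions as an assumption.

```python
import math

def rmsea(chi_square, df, n):
    """Root Mean Square Error of Approximation; 0 whenever chi square <= df."""
    return math.sqrt(max(chi_square - df, 0) / (df * (n - 1)))

def cfi(chi_model, df_model, chi_independence, df_independence):
    """Comparative Fit Index of our model against the independence model."""
    d_model = max(chi_model - df_model, 0)
    d_base = max(chi_independence - df_independence, d_model)
    return 1.0 if d_base == 0 else 1 - d_model / d_base

# Hypothetical values: chi square below its degrees of freedom.
print(rmsea(5.0, 8, 200), cfi(5.0, 8, 300.0, 15))

# Hypothetical values for a poorer fit.
print(rmsea(20.0, 10, 101), cfi(20.0, 10, 110.0, 10))
```

An RMSEA of exactly 0 happens whenever the chi square is no larger than its degrees of freedom, which is consistent with a high chi square p value.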

Heat map – if there are too many dark red or blue boxes then it’s heated.
We want a cooled map.

Assess measurement model – only visible for CFA. It tells you about the
reliability and validity of the model. Cronbach’s alpha can also be used
for this.

If all the items are surpassing the threshold level, then all the items are
individually reliable. We will drop the question if it’s too far from the
data. Reliability indicates consistency.
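Cronbach's alpha is one way to put a number on that consistency. A minimal sketch with made-up survey responses; `cronbach_alpha` is an illustrative helper, and the item names are borrowed from the notes only as labels.

```python
import statistics

def cronbach_alpha(items):
    """items: one list of responses per item, same respondents in the same order."""
    k = len(items)
    item_variances = sum(statistics.variance(item) for item in items)
    totals = [sum(responses) for responses in zip(*items)]  # total score per respondent
    return (k / (k - 1)) * (1 - item_variances / statistics.variance(totals))

# Hypothetical 1-5 responses from six respondents to three items.
goal = [4, 5, 3, 4, 2, 5]
work = [4, 4, 3, 5, 2, 5]
interact = [5, 5, 2, 4, 3, 4]

print(round(cronbach_alpha([goal, work, interact]), 3))
```

Values closer to 1 mean the items move together; a common rule of thumb treats about 0.7 or above as acceptable.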

If the latent variable is crossing the threshold line, it is reliable. Here we will accept
it as it’s almost meeting the red line.
If the values on the diagonal are greater than the values above the diagonal, it
indicates high validity. There will be nothing below the diagonal.

Bad model:

- change your latent variable: Here we could drop one of the latent
variables like leadership. We will look at correlation to judge. For
example, the variables that come under leadership, one might be
negative, or the correlation might be weak.

2. Parameter estimates

The first loading is fixed to 1 by default. Factor loading – my factor is
represented or loaded against goal, work and interact.
You can’t interpret factor loadings directly, so you change them to correlations by
going to show – show estimates – make it standardized (it’s
unstandardized by default)

Positive and high correlations. If I am seeing a very high correlation
between latent variables then it might be a problem later. It might
indicate multicollinearity.
SEM

For SEM you must interconnect your independent variables with one
another. If there are many dependent variables you must interconnect them
also.

Chi square – null is not rejected hence it’s a good model. All other criteria
are useless. RMSEA is almost equal to 0, so it’s a good model.
A 1 unit increase in leadership will lead to a 0.634 unit average increase in satisfaction,
keeping conflict constant.

A 1 unit increase in conflict will lead to a 0.357 unit fall in satisfaction, keeping
leadership constant.

Solid line – significant, dashed line – insignificant.

Here the null is that beta is equal to 0.

Mid-term:

Comparing means – use ANOVA

t-test: pair wise comparison

two-way ANOVA – single dependent variable and 2 x factors

- Here, null is that the means across all six categories are equal to one another
