When to use
1. Correlation
- To measure the degree and direction of the linear association between
two variables.
- Variables' Characteristics:
+ Both variables should be quantitative (numerical), either interval or
ratio scale.
+ The relationship between the variables should ideally be linear.
+ Observations should be independent, and there is no
dependent/independent variable distinction, since you are only
examining association.
+ No causation is assumed: correlation measures only the strength
and direction of the relationship, not how one variable influences
another.
Examples:
+ Studying the relationship between temperature and ice cream sales.
+ Finding the association between hours studied and test scores.
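As a concrete sketch, Pearson's r for the temperature/ice-cream example can be computed directly from its definition; the daily observations below are made up for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical daily observations: temperature (Celsius) and ice cream sales (units).
temp = [15, 18, 21, 24, 27, 30]
sales = [28, 35, 40, 49, 55, 58]

r = pearson_r(temp, sales)  # close to +1: strong positive linear association
```

Because the sales figures rise almost in lockstep with temperature, r comes out close to +1; a value near 0 would indicate no linear association, and a value near -1 a strong inverse one.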
2. Factor Analysis
- To identify underlying patterns or latent factors in the data by examining
relationships among multiple variables. It reduces dimensionality and
identifies groups of related variables.
- Variables' Characteristics:
+ It requires multiple continuous variables (or ordinal variables
that can be treated as continuous).
+ These variables should exhibit some degree of correlation with
each other (high correlation indicates that factor analysis may be
appropriate).
+ Often assumes that the variables are normally distributed,
especially for some methods of factor extraction (e.g., maximum
likelihood); principal component extraction is less strict about this.
+ Used for dimensionality reduction: grouping related variables that
measure the same underlying concept (e.g., survey items measuring
"job satisfaction").
Examples:
+ Identifying key factors that explain students’ academic performance (e.g.,
intelligence, motivation, and study habits) from multiple variables.
+ Grouping survey questions into categories such as satisfaction, loyalty,
and experience.
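A full factor extraction needs more machinery, but the first diagnostic step named above, checking that the variables correlate with each other, can be sketched in a few lines. The survey items and responses below are hypothetical:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical survey responses (1-5 Likert scale, treated as continuous).
items = {
    "pay_satisfaction":  [4, 5, 3, 2, 5, 4, 1, 3],
    "role_satisfaction": [4, 4, 3, 2, 5, 5, 2, 3],
    "commute_minutes":   [20, 45, 30, 60, 15, 25, 50, 35],
}

# Pairwise correlation matrix: high off-diagonal values suggest the items
# share a latent factor and that factor analysis may be appropriate.
names = list(items)
corr = {(a, b): pearson_r(items[a], items[b]) for a in names for b in names}

r_sat = corr[("pay_satisfaction", "role_satisfaction")]
```

Here the two satisfaction items correlate strongly with each other, hinting at a shared "satisfaction" factor, while an unrelated item like commute time would typically load elsewhere; in practice this screening is followed by a dedicated routine (e.g., maximum likelihood extraction in a statistics package).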
3. Regression
- To explore and model the relationship between a dependent variable and
one or more independent variables. Often used for prediction and
determining the extent to which independent variables influence the
dependent variable.
- Variables' Characteristics:
+ Dependent Variable:
● Must be quantitative (interval or ratio scale) for linear
regression.
● Can be categorical for logistic regression and other variants.
+ Independent Variable:
● Can be quantitative or categorical (categorical variables
typically need to be dummy-coded for linear regression).
+ Coefficients are often interpreted causally, but regression itself
only models association; causal claims about how the independent
variables influence the dependent variable depend on the study design.
+ Assumes a linear relationship between the dependent variable and
the predictors (for linear regression).
+ Observations must be independent, and multicollinearity (high
correlation among the predictors) should be avoided.
Examples:
+ Predicting house prices based on features like square footage, number of
bedrooms, and location.
+ Analyzing the effect of education level, gender, and work experience on
salary.
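The house-price example can be sketched with a minimal one-predictor ordinary least squares fit. The data below are fabricated and constructed to lie exactly on a line, so the recovered coefficients are exact:

```python
def fit_ols(x, y):
    """Ordinary least squares with a single predictor: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: square footage (hundreds of sq ft) vs price ($1000s),
# constructed as price = 10 * sqft + 20 so the fit recovers those values.
sqft = [10, 15, 20, 25, 30]
price = [120, 170, 220, 270, 320]

slope, intercept = fit_ols(sqft, price)  # slope = 10.0, intercept = 20.0

def predict(x):
    """Predicted price for a house of x hundred square feet."""
    return slope * x + intercept
```

With more predictors (bedrooms, location dummies) the same idea generalizes to multiple regression, which is normally fitted with matrix algebra or a statistics package rather than by hand.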