Correlation
Correlation is a statistical measure that expresses the extent to which two variables are linearly
related (meaning they change together at a constant rate). It’s a common tool for describing simple
relationships without making a statement about cause and effect.
The sample correlation coefficient, r, quantifies the strength of the relationship. Correlations are also
tested for statistical significance.
Put another way, correlation describes how two variables move in relation to one another. It can be
applied to many kinds of data sets. In some cases you might have predicted how things will
correlate, while in others the relationship will be a surprise. It is important to understand that
correlation does not mean the relationship is causal.
To understand how correlation works, it's important to understand the following terms:
Positive correlation: The coefficient lies between 0 and +1, with +1 indicating a perfect
positive correlation. The two variables move up or down in the same direction together.
Negative correlation: The coefficient lies between -1 and 0, with -1 indicating a perfect
negative correlation. The two variables move in opposite directions.
Zero or no correlation: A correlation of zero means there is no linear relationship between the
two variables; as one variable moves, the other shows no related movement.
Types of Correlation
A scatter plot illustrates the correlation between two attributes or variables and shows how
closely the two variables are connected. Three situations can arise when examining the
relation between the two variables –
Positive Correlation – the values of the two variables move in the same direction, so that an
increase/decrease in the value of one variable is accompanied by an increase/decrease in the
value of the other variable.
Negative Correlation – the values of the two variables move in opposite directions, so that an
increase/decrease in the value of one variable is accompanied by a decrease/increase in the
value of the other variable.
No Correlation – there is no linear dependence or relation between the two variables.
Pearson Correlation Coefficient Formula
The most common measure is the Pearson correlation coefficient, used for linear dependence
between data sets. For paired observations (xᵢ, yᵢ) with means x̄ and ȳ, it is

    r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ]

The value of the coefficient lies between -1 and +1. When the coefficient is close to zero, the
data are considered unrelated; a value of +1 indicates a perfect positive correlation, and -1 a
perfect negative correlation.
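As a rough illustration, r can be computed directly from the formula above or with a library
routine that also reports the significance test mentioned earlier. The sketch below uses Python;
the paired values are invented for illustration and are not data from this text.

    import numpy as np
    from scipy import stats

    # Hypothetical paired observations (illustrative values only)
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Pearson r from the formula: co-movement scaled by the spreads of x and y
    r_manual = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
        np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
    )

    # scipy also returns a p-value for the test of statistical significance
    r, p_value = stats.pearsonr(x, y)
    print(r_manual, r, p_value)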
What Is a Regression?
Regression is a statistical method used in finance, investing, and other disciplines that attempts to
determine the strength and character of the relationship between one dependent variable (usually
denoted by Y) and a series of other variables (known as independent variables).
Also called simple regression, and usually estimated by ordinary least squares (OLS), linear
regression is the most common form of this technique. Linear regression establishes the linear
relationship between two variables based on a line of best fit (y = a + bx). It is therefore depicted
graphically as a straight line, with the slope b defining how a change in one variable impacts a
change in the other, and the y-intercept a representing the value of one variable when the value of
the other is zero. Non-linear regression models also exist, but they are far more complex.
Regression analysis is a powerful tool for uncovering the associations between variables observed in
data, but cannot easily indicate causation. It is used in several contexts in business, finance, and
economics. For instance, it is used to help investment managers value assets and understand the
relationships between factors such as commodity prices and the stocks of businesses dealing in
those commodities.
Regression as a statistical technique should not be confused with the concept of regression to the
mean (mean reversion).
KEY TAKEAWAYS
A regression model is able to show whether changes observed in the dependent variable are
associated with changes in one or more of the explanatory variables.
It does this by essentially fitting a best-fit line and seeing how the data is dispersed around
this line.
Regression helps economists and financial analysts with tasks ranging from asset valuation
to making predictions.
In order for regression results to be properly interpreted, several assumptions about the
data and the model itself must hold.
Regression analysis is used for prediction and forecasting, which has substantial overlap with the
field of machine learning. This statistical method is used across different industries, such as:
Financial Industry – understand trends in stock prices, forecast prices, and evaluate risks in the
insurance domain.
Marketing – understand the effectiveness of marketing campaigns, and forecast pricing and sales of
a product.
Manufacturing – evaluate the relationships among the variables that determine engine design in
order to deliver better performance.
Medicine – evaluate different combinations of medicines in order to prepare generic medicines for
diseases.
Simple linear regression
Linear regression is the most basic and most commonly used form of predictive analysis. One
variable is considered to be an explanatory variable, and the other is considered to be a dependent
variable. For example, a modeler might want to relate the weights of individuals to their heights
using a linear regression model.
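A minimal sketch of that height/weight example, using scipy's linregress to fit the line of best fit
by ordinary least squares. The heights and weights below are invented for illustration.

    import numpy as np
    from scipy import stats

    # Hypothetical heights (cm) and weights (kg); illustrative values only
    heights = np.array([155, 160, 165, 170, 175, 180, 185])
    weights = np.array([52, 56, 61, 64, 70, 74, 80])

    # Fit weight = intercept + slope * height
    result = stats.linregress(heights, weights)
    print(result.slope, result.intercept, result.rvalue)

    # Predict the weight of a hypothetical 172 cm individual from the fitted line
    print(result.intercept + result.slope * 172)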
Factor Analysis
Factor analysis is a statistical technique that reduces a set of variables by extracting their
commonalities into a smaller number of factors. It can also be called data reduction.
When observing vast numbers of variables, some common patterns emerge, which are known as
factors. These serve as an index of all the variables involved and can be utilized for later analysis.
Factor analysis is thus a statistical technique used to examine how a group of variables shares a
common variance. While it is mostly used in psychological research, it can also be applied in areas
like business and market research to understand customer satisfaction or employee job satisfaction,
and in finance to study the fluctuation of stock prices.
While studying customer satisfaction related to a product, a researcher will usually pose several
questions about the product through a survey. These questions cover variables regarding the
product’s features, ease of purchase, usability, pricing, visual appeal, and so forth, and are
typically quantified on a numeric scale. What the researcher looks for, however, are the underlying
dimensions or “factors” of customer satisfaction: mostly psychological or emotional attitudes
toward the product that cannot be measured directly. Factor analysis uses the variables from the
survey to determine them indirectly.
Exploratory Factor Analysis: In exploratory factor analysis, the researcher makes no prior
assumptions about the relationships between variables and factors; any variable can be
related to any factor. This helps identify complex relationships among variables and
group them based on common factors.
Confirmatory Factor Analysis: Confirmatory factor analysis, on the other hand, assumes
that variables are related to specific factors and uses pre-established theory to test
whether the data confirm the expected model.
Factor analysis makes use of several assumptions in order to produce valid outcomes:
The sample size must be greater than the number of factors.
Since the method depends on the interrelationships among variables, there must be no
perfect multicollinearity between any of the variables.
A sequence of random variables is homoscedastic when all the variables have the same
finite variance. Because factor analysis works as a linear function, it does not require
homoscedasticity between variables.
There is an assumption of linearity. This means that even non-linear variables can be used,
but they must first be transformed into linear variables.
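As a rough sketch of the idea, the snippet below generates synthetic "survey" responses driven by
two hidden factors and recovers the loadings with scikit-learn's FactorAnalysis. The data, the
loading matrix, and the choice of two factors are all assumptions made for illustration.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)

    # Synthetic survey: 200 respondents, 6 items driven by 2 latent factors
    latent = rng.normal(size=(200, 2))
    loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],   # items tied to factor 1
                         [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])  # items tied to factor 2
    items = latent @ loadings.T + 0.3 * rng.normal(size=(200, 6))

    # Extract 2 factors; components_ holds the estimated loadings
    fa = FactorAnalysis(n_components=2, random_state=0)
    scores = fa.fit_transform(items)   # per-respondent factor scores
    print(fa.components_.round(2))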
Cluster Analysis
Cluster analysis rests on one of the most fundamental, simple, and very often
unnoticed ways (or methods) of understanding and learning: grouping “objects” into
“similar” groups. The process includes a number of different algorithms and methods for
making clusters of a similar kind, and it is also a part of data management in statistical analysis.
When we group a set of objects that share similar characteristics or attributes, these
groups are called clusters, and the process is called clustering. Getting to know the
properties of every individual object is very difficult; instead, it is easier to group similar
objects and describe the common structure of properties that the group follows.
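The sketch below illustrates the idea with k-means, one common clustering algorithm (the text
does not name a specific algorithm, so k-means is chosen here as an example). The two-group
data are synthetic, and the cluster centers play the role of the "common structure" each group
follows.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(1)

    # Synthetic data: two groups of points around different centers
    group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
    group_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
    points = np.vstack([group_a, group_b])

    # k-means assigns each point to the nearest of k centroids; k is chosen in advance
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    print(km.cluster_centers_)           # one centroid per cluster
    print(km.labels_[:5], km.labels_[-5:])  # cluster membership of sample points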
Discriminant Analysis
Discriminant analysis (DA) is a multivariate technique used to separate two or more groups of
observations (individuals) based on variables measured on each experimental unit (sample), and to
discover the contribution of each variable in separating the groups.
In addition, newly observed cases can be predicted, or allocated, to the previously specified groups
by using a linear or quadratic function that determines which group each individual belongs to.
A system for determining membership in a group may be constructed using discriminant analysis.
The method comprises a discriminant function (or, for more than two groups, a set of discriminant
functions) based on linear combinations of the predictor variables that offer the best
discrimination between the groups. After the functions have been constructed using a sample of
cases for which the group membership is known, they may be applied to fresh cases that contain
measurements for the predictor variables but whose group membership is unknown.
Assumptions
Cases are presumed to belong to exactly one group: group membership is mutually exclusive
(no case belongs to more than one group) and exhaustive (all cases are members of some
group).
Types
Linear Discriminant Analysis, often known as LDA, is a supervised approach that attempts to predict
the class of the dependent variable by utilizing a linear combination of the independent variables.
It is predicated on the hypothesis that the independent variables have a normal distribution
(continuous and numerical) and that each class has the same variance and covariance. Both
classification and dimensionality reduction may be accomplished with the assistance of this method.
Quadratic Discriminant Analysis (QDA) is a variant that uses quadratic combinations of the
independent variables to predict the class of the dependent variable. The assumption of normal
distribution is maintained, but QDA does not presume that the classes have an equal covariance, so
it produces a quadratic decision boundary.
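A minimal sketch contrasting LDA and QDA with scikit-learn on synthetic two-class data. The
unequal class spreads are an assumption chosen here to show where QDA's relaxed covariance
assumption matters; everything below is illustrative.

    import numpy as np
    from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                               QuadraticDiscriminantAnalysis)

    rng = np.random.default_rng(2)

    # Two synthetic classes; class 1 has a larger spread (unequal covariances)
    class0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
    class1 = rng.normal(loc=[2.0, 2.0], scale=2.0, size=(100, 2))
    X = np.vstack([class0, class1])
    y = np.array([0] * 100 + [1] * 100)

    # LDA assumes equal class covariances (linear boundary);
    # QDA drops that assumption (quadratic boundary)
    lda = LinearDiscriminantAnalysis().fit(X, y)
    qda = QuadraticDiscriminantAnalysis().fit(X, y)

    # Allocate a fresh, previously unseen case to one of the groups
    new_point = [[1.0, 1.0]]
    print(lda.predict(new_point), qda.predict(new_point))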
Application
Discriminant analysis is not limited to solving classification problems. It also makes it
possible to establish the informativeness of particular classification characteristics and assists in
selecting a sensible set of geophysical parameters or research methodologies.
Businesses use discriminant analysis as a tool to help glean meaning from data sets. This
enables enterprises to drive innovative and competitive solutions supporting the consumer
experience, customization, advertising, prediction, and many other common strategic
purposes.
In human resources, it can evaluate potential candidates’ job performance by using
background information to predict how well candidates would perform once employed.
Based on many performance metrics, an industrial facility can forecast when individual machine
parts may fail or require maintenance.
In sales and marketing, it can help anticipate market trends that will have an impact on new
products or services.
Multidimensional Scaling
Multidimensional Scaling is a family of statistical methods that focus on creating mappings of items
based on distance. Inside Multidimensional Scaling, there are methods for different types of data:
Individual Differences Scaling is a type of Multidimensional Scaling that applies when you
have multiple (different) estimates of the distances between items. This is often the case
when multiple individuals each give an estimation of the distances between all the pairs.
The most interesting (statistically speaking) applications of Multidimensional Scaling are those in
which you have multiple participants giving (slightly or very) different estimates. Therefore, I will go
a bit faster with Metric and Nonmetric Multidimensional Scaling and I’ll spend a bit more time on
Individual Differences Scaling and Multidimensional Analysis of Preference.
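As a closing sketch, metric Multidimensional Scaling can be run on a precomputed distance matrix
with scikit-learn's MDS. The four-item symmetric distance matrix below is invented for
illustration; the method recovers 2-D coordinates whose pairwise distances approximate it.

    import numpy as np
    from sklearn.manifold import MDS

    # Hypothetical symmetric distances between four items (illustrative values)
    distances = np.array([
        [0.0, 1.0, 4.0, 5.0],
        [1.0, 0.0, 3.0, 4.0],
        [4.0, 3.0, 0.0, 1.5],
        [5.0, 4.0, 1.5, 0.0],
    ])

    # Metric MDS: embed the items in 2-D so the map preserves the given distances
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(distances)
    print(coords.round(2))

With multiple participants, Individual Differences Scaling would instead take one such distance
matrix per person; the single-matrix case above shows only the basic metric mapping.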