ML NOTES
Factor analysis is a powerful statistical method for understanding the underlying structure or
patterns of complex datasets. Its primary objective is to condense many observed variables into a
smaller set of unobserved variables called factors. These factors aim to capture the essential
information from the original variables, simplifying the understanding and interpretation of the data.
Factor analysis hinges on several pivotal concepts underpinning its functionality and application
across diverse domains. Understanding these core concepts is fundamental to grasping the essence
of this statistical technique.
• Variables: These are the measurable quantities or items used in an analysis, such as survey
responses, test scores, or economic indicators.
• Observed Data: The data matrix containing measurements or responses across multiple
variables for each observation or individual.
• Confirmatory Factor Analysis (CFA): Validates pre-existing theories or hypotheses about the
structure of relationships among variables by testing and confirming a specific factor
structure.
• Objective: FA aims to identify latent factors that underlie the observed variables. It assumes
that these unobserved factors influence the observed variables and explain the correlations
among them.
• Usage: It is often used in social sciences, psychology, and market research to identify
underlying constructs, understand relationships between variables, and uncover hidden
patterns in data.
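To make the objective concrete, here is a minimal Python sketch, assuming scikit-learn is available; the simulated data, the six observed variables, and the choice of two factors are all illustrative assumptions, not taken from any particular study:

# A minimal sketch of exploratory factor analysis, assuming scikit-learn;
# the dataset and the choice of 2 factors are purely illustrative.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulate 200 observations of 6 observed variables driven by 2 latent factors.
latent = rng.normal(size=(200, 2))              # unobserved factors
loadings = rng.normal(size=(2, 6))              # how factors drive the variables
observed = latent @ loadings + 0.1 * rng.normal(size=(200, 6))

fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(observed)             # estimated factor scores

print(fa.components_.shape)  # (2, 6): estimated loadings per factor
print(scores.shape)          # (200, 2): each observation in factor space

The estimated loadings show how strongly each observed variable is driven by each latent factor, which is the "hidden pattern" the method is meant to uncover.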
Factor Analysis finds wide-ranging applications across numerous fields due to its ability to unveil
underlying structures within complex datasets. Some prominent areas where Factor Analysis is
extensively utilized include:
• Financial Analysis: Reducing many financial indicators into key underlying factors influencing
market performance or economic trends.
• Test Development: Validating test items and determining underlying constructs measured by
assessments.
• Opinion Polls and Surveys: Analyzing public opinions or perceptions on social, political, or
environmental issues.
Dimension Reduction-
• It is the process of converting a dataset with a large number of dimensions into a dataset
with fewer dimensions.
• It ensures that the converted dataset conveys similar information concisely.
Example-
In machine learning,
• We convert the dimensions of data from 2 dimensions (x1 and x2) to 1 dimension (z1).
• It makes the data easier to explain; a sketch of this 2D → 1D reduction is given below.
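A minimal sketch of this 2D → 1D reduction, assuming scikit-learn is available; the correlated sample data is made up for the example:

# Reducing 2 dimensions (x1, x2) to 1 dimension (z1) with PCA, assuming
# scikit-learn; the synthetic data is purely illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two correlated features x1 and x2 (100 samples).
x1 = rng.normal(size=100)
x2 = 2.0 * x1 + 0.1 * rng.normal(size=100)
X = np.column_stack([x1, x2])                  # shape (100, 2)

pca = PCA(n_components=1)
z1 = pca.fit_transform(X)                      # shape (100, 1): the new axis z1

# Most of the variance survives the reduction, so little information is lost.
print(pca.explained_variance_ratio_)           # e.g. [0.99...]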
Benefits-
• It compresses the data and thus reduces the storage space requirements.
• It reduces the time required for computation, since fewer dimensions require less computation.
The Least Square method is a fundamental mathematical technique widely used in data analysis,
statistics, and regression modeling to identify the best-fitting curve or line for a given set of data
points. By minimizing the overall squared error, it yields an accurate model for predicting future
data trends.
In statistics, when data can be represented on a Cartesian plane using the independent and
dependent variables as the x and y coordinates, it is called scatter data. On its own, such data is of
limited use for making interpretations or for predicting values of the dependent variable from the
independent variable. So, we try to find the equation of a line that best fits the given data points
with the help of the Least Square Method.
In this article, we will learn about the Least Square method, its formula, its graph, and solved examples.
The Least Square Method is used to derive a generalized linear equation between two variables
when the values of the dependent and independent variables are represented as the y and x
coordinates in a 2D Cartesian coordinate system. Initially, the known values are marked on a plot;
the plot obtained at this point is called a scatter plot.
Then, we try to represent all the marked points as a straight line, or a linear equation, obtained
with the help of the Least Square method. This is done to get the value of the dependent variable
for values of the independent variable that were initially unknown, which lets us make predictions
for the dependent variable.
The Least Squares method is a statistical technique used to find the equation of the best-fitting
curve or line for a set of data points by minimizing the sum of the squared differences between the
observed values and the values predicted by the model.
This method aims at minimizing the sum of squares of deviations as much as possible. The line
obtained from such a method is called a regression line or line of best fit.
Formula for Least Square Method
The Least Square Method formula is used to find the best-fitting line through a set of data points.
For simple linear regression, the line has the form y = mx + c, where y is the dependent variable,
x is the independent variable, m is the slope of the line, and c is the y-intercept. The slope and
intercept are chosen to minimize the sum of squared deviations
S = Σ (yi − (mxi + c))²,
which yields the formulas:
m = (n Σxiyi − Σxi Σyi) / (n Σxi² − (Σxi)²)
c = Y − mX
where n is the number of data points and X and Y denote the means of the xi and yi values.
The steps to find the line of best fit using the least square method are discussed below:
• Step 1: Denote the independent variable values as xi and the dependent ones as yi.
• Step 2: Calculate the means of the xi values and of the yi values, denoted X and Y.
• Step 3: Presume the equation of the line of best fit as y = mx + c, where m is the slope of the
line and c represents the intercept of the line on the Y-axis.
• Step 4: Calculate the slope m using the formula given above.
• Step 5: Calculate the intercept from
c = Y − mX
Thus, we obtain the line of best fit as y = mx + c, where values of m and c can be calculated from the
formulae defined above.
These formulas calculate the parameters of the line that best fits the data according to the least
squares criterion, minimizing the sum of the squared differences between the observed values and
the values predicted by the linear model.
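Here is a minimal Python sketch of these formulas (the five data points are invented for illustration), cross-checked against numpy.polyfit:

# Computing the least-squares slope and intercept from the summation
# formulas above; the small dataset is made up for illustration.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

n = len(x)
# Slope from m = (n Σxy − Σx Σy) / (n Σx² − (Σx)²)
m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
# Intercept from c = Y − mX, with X and Y the means.
c = np.mean(y) - m * np.mean(x)
print(m, c)                      # slope and y-intercept of the line of best fit

# Cross-check with numpy's built-in degree-1 polynomial fit.
print(np.polyfit(x, y, 1))       # [m, c], should match the values above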
Let us have a look at how the data points and the line of best fit obtained from the Least Square
method look when plotted on a graph.
The red points in the above plot represent the data points for the sample data
available. Independent variables are plotted as x-coordinates and dependent ones are plotted as y-
coordinates. The equation of the line of best fit obtained from the Least Square method is plotted as
the red line in the graph.
We can conclude from the above graph how the Least Square method helps us find a line that best
fits the given data points, which can then be used to predict the value of the dependent variable
where it is not known initially.
The Least Square method assumes that the data is evenly distributed and does not contain outliers
when deriving a line of best fit. It does not provide accurate results for unevenly distributed data
or for data containing outliers.
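As a small demonstration of this sensitivity (again with made-up data), adding a single outlier to otherwise perfectly linear data visibly shifts the fitted slope and intercept:

# How one outlier distorts the least-squares line; the data is invented.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])    # perfectly linear: y = 2x
print(np.polyfit(x, y, 1))                  # ~[2.0, 0.0]

# Add one outlier point far above the trend and refit.
x_out = np.append(x, 6.0)
y_out = np.append(y, 30.0)
print(np.polyfit(x_out, y_out, 1))          # slope and intercept shift noticeably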
MARKOV CHAIN MONTE CARLO METHODS
REINFORCEMENT LEARNING