Machine Learning
Supervised learning trains a model on a labelled dataset, one that contains both the input features and the corresponding output labels.
The algorithm learns the relationship between the inputs and the outputs so that it can make accurate predictions on new, unseen data.
Types of Supervised Learning
Regression
Used to predict continuous numerical values based on input
features.
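Simple Linear Regression: This is the simplest form of linear regression. It involves only one independent variable and one dependent variable. The equation for simple linear regression is:
$$y = \beta_0 + \beta_1 x$$
where $y$ is the dependent variable, $x$ is the independent variable, $\beta_0$ is the intercept and $\beta_1$ is the slope.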
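Multiple Linear Regression: This involves more than one independent variable and one dependent variable. The equation for multiple linear regression is:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$
where $x_1, \dots, x_n$ are the $n$ independent variables, $\beta_1, \dots, \beta_n$ are their coefficients and $\beta_0$ is the intercept.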
Best-Fit Line: Our primary objective when using linear regression is to find the best-fit line, that is, the line for which the error between the predicted and actual values is kept to a minimum.
The best-fit line equation describes a straight line that represents the relationship between the dependent and independent variables.
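The text does not say how the error is measured; the usual choice, assumed here, is ordinary least squares, which picks the line that minimises the sum of squared differences between actual and predicted values. Below is a minimal pure-Python sketch for the simple (one-variable) case using the closed-form slope and intercept. The helper names `fit_line` and `sum_squared_error` and the data points are hypothetical, for illustration only.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sum_squared_error(xs, ys, b0, b1):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(xs, ys)
print(f"best-fit line: y = {b0:.3f} + {b1:.3f}x")
print("SSE of best-fit line:", sum_squared_error(xs, ys, b0, b1))
print("SSE of y = 2x:       ", sum_squared_error(xs, ys, 0.0, 2.0))
```

Any other line, such as y = 2x, gives a sum of squared errors at least as large as the fitted line's on the same data.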
Problem
Given the data below, build a machine learning model that predicts home prices based on area in square feet (use linear regression).
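The dataset for this problem is not reproduced here, so the sketch below uses made-up values purely to show the workflow with scikit-learn's `LinearRegression`. The areas, prices and the 3300 sq ft query are hypothetical; replace them with the real data.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data (NOT the problem's dataset): area in sq ft -> price
areas = [[2600], [3000], [3200], [3600], [4000]]  # 2-D: one row per house, one feature column
prices = [490000, 550000, 580000, 640000, 700000]

model = LinearRegression()
model.fit(areas, prices)

# Slope (price per extra square foot) and intercept of the fitted line
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)

# Predict the price of a hypothetical 3300 sq ft home
predicted = model.predict([[3300]])[0]
print(f"predicted price for 3300 sq ft: {predicted:.0f}")
```

`fit` expects a 2-D feature array, which is why each area is wrapped in its own list.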
Logistic Regression
Logistic regression is used for binary classification. It applies the sigmoid function to a weighted combination of the independent variables and produces a probability value between 0 and 1.
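A minimal sketch of how a trained logistic regression model turns inputs into a probability and then a class. The weights, bias and 0.5 decision threshold are hypothetical values chosen for illustration (in practice the weights are learned from labelled data), and the helper names are also hypothetical.

```python
import math


def sigmoid(z):
    """Squash any real number z into the range 0..1."""
    # Two branches avoid overflow in math.exp for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability of the positive class: sigmoid of a weighted sum of the inputs."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_class(features, weights, bias, threshold=0.5):
    """Binary decision: class 1 if the probability reaches the threshold, else class 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical parameters, standing in for values learned during training
weights = [0.8, -0.4]
bias = -0.5
sample = [2.0, 1.0]

print(predict_proba(sample, weights, bias))  # probability between 0 and 1
print(predict_class(sample, weights, bias))
```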
K-Nearest Neighbours (K-NN)
The K-NN algorithm stores all the available data and classifies a new data point based on its similarity to the stored points. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
Among the K nearest neighbours of the new data point, count the number of data points in each category.
Assign the new data point to the category with the largest number of neighbours.
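The steps above can be sketched in pure Python as follows. The text does not fix a similarity measure, so Euclidean distance is assumed here; `knn_classify`, the sample points and k = 3 are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # K-NN keeps every labelled point and only compares them at prediction time
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], new_point),
    )
    nearest_labels = [label for _, label in neighbours[:k]]
    # Count the neighbours in each category and return the most common one
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D points in two categories
points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, (2, 2), k=3))
print(knn_classify(points, labels, (6, 5), k=3))
```

With two categories, an odd k avoids tied votes.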
Decision Tree
A decision tree is a non-parametric supervised learning algorithm,
which is utilised for both classification and regression tasks.
Selecting the Best Attribute: Using a metric like Gini impurity, entropy, or information
gain, the best attribute to split the data is selected.
Splitting the Dataset: The dataset is split into subsets based on the selected attribute.
Repeating the Process: The process is repeated recursively for each subset, creating a
new internal node or leaf node until a stopping criterion is met.
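A sketch of the "Selecting the Best Attribute" step, using Gini impurity (one of the metrics listed above). It assumes numeric features split by a `value <= threshold` test; a full tree would call `best_split` again on each resulting subset until a stopping criterion is met, which is not shown. The function names and data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0.0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the split
    'row[feature_index] <= threshold' with the lowest weighted Gini impurity."""
    n = len(rows)
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        # Try every observed value of feature f as a candidate threshold
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a useful split must produce two non-empty subsets
            # Weight each child's impurity by its share of the rows
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, threshold, score)
    return best


# Made-up data: two numeric features, binary label
rows = [(2.0, 1.0), (3.0, 1.5), (3.5, 0.5), (7.0, 1.0), (8.0, 2.0), (9.0, 0.5)]
labels = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(rows, labels))
```

Here splitting feature 0 at 3.5 separates the two labels perfectly, so its weighted impurity is 0.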
Random Forest
A random forest is an ensemble learning method that combines the predictions from multiple decision trees to produce a more accurate and stable prediction.
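One way to try this is scikit-learn's `RandomForestClassifier`, sketched below on made-up data. Each tree is trained on a bootstrap sample of the rows and considers a random subset of features at each split; note that scikit-learn combines the trees by averaging their predicted class probabilities rather than by a simple majority vote. The data and the choice of 50 trees are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier

# Made-up training data: two well-separated groups of 2-D points
X = [[1, 1], [1, 2], [2, 1], [2, 2], [6, 6], [6, 7], [7, 6], [7, 7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# 50 trees, each fitted on a bootstrap sample of the rows;
# random_state fixes the randomness so results are reproducible
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

print("number of trees:", len(forest.estimators_))
predictions = forest.predict([[1.5, 1.5], [6.5, 6.5]])
print("predictions:", predictions)
```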
Unsupervised Machine Learning
In unsupervised learning, the model itself finds hidden patterns and insights in the given data.
It aims to find the underlying structure of the dataset and to group the data according to similarities.
Unsupervised learning is helpful for finding useful insights in the data.
We do not always have input data with corresponding outputs; unsupervised learning is needed to handle such cases.
Types of Unsupervised Learning Algorithms
Clustering
Clustering is a method of grouping objects into clusters such that the most similar objects remain in the same group.
Compute the distance of every point from the centroids and assign each point to the cluster of its nearest centroid.
Adjust the centroids so that they become the centre of gravity of their clusters.
Re-cluster every point based on its distance from the updated centroids.
Recompute the centroids and repeat until no data point changes cluster.
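These steps match the standard K-means procedure. The text does not say how K or the starting centroids are chosen, so the sketch below assumes K is given and picks K random data points as the initial centroids; `kmeans` and the sample points are hypothetical. It needs Python 3.8+ for `math.dist`.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster points (tuples of equal length) into k groups; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct data points chosen at random
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Assign every point to the cluster of its nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Stop once no point changes cluster
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Move each centroid to the mean (centre of gravity) of its cluster
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments


# Hypothetical points forming two obvious groups
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

The cluster numbers themselves are arbitrary; only which points share a cluster matters.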
Elbow Method
Association
An association rule is an unsupervised learning method that is used to find relationships between variables in large databases.
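The text does not say how a rule is evaluated; the sketch below assumes the two standard measures, support (how often the items appear together) and confidence (how often the rule holds when its left-hand side appears), as used by algorithms such as Apriori. The shopping baskets and the rule {bread} -> {milk} are made up.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent and consequent together) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies
    return support(transactions, set(antecedent) | set(consequent)) / base


# Made-up shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

print(support(transactions, {"bread", "milk"}))       # share of baskets with both items
print(confidence(transactions, {"bread"}, {"milk"}))  # rule {bread} -> {milk}
```

Here 3 of 5 baskets contain both bread and milk (support 0.6), and 3 of the 4 baskets with bread also contain milk (confidence 0.75).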