4 ML
Regression Analysis in Machine Learning
Regression analysis is a statistical method for modelling the relationship between a
dependent (target) variable and one or more independent (predictor) variables. More
specifically, regression analysis helps us understand how the value of the dependent
variable changes corresponding to one independent variable while the other independent
variables are held fixed. It predicts continuous/real values such as temperature, age,
salary, price, etc.
We can understand the concept of regression analysis using the below example:
Example: Suppose a marketing company spends money on advertisement every year and
records the corresponding sales. Now, the company plans to spend $200 on advertisement
in the year 2019 and wants to know the prediction about the sales for this year. To solve
such prediction problems in machine learning, we need regression analysis.
In regression, we plot a graph between the variables that best fits the given data
points. Using this plot, the machine learning model can make predictions about the
data. In simple words, "Regression shows a line or curve that passes through all the
data points on the target-predictor graph in such a way that the vertical distance
between the data points and the regression line is minimum." The distance between
the data points and the line tells whether the model has captured a strong relationship or not.
o Regression estimates the relationship between the target and the independent
variable.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing regression, we can confidently determine the most important
factor, the least important factor, and how each factor affects the other
factors.
Types of Regression
There are various types of regressions which are used in data science and machine
learning. Each type has its own importance in different scenarios, but at the core, all
regression methods analyze the effect of the independent variables on the dependent
variable. Here we discuss some important types of regression, which are given
below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
Linear Regression:
Mathematically, the linear regression model can be represented as:
Y = aX + b
Logistic Regression:
Logistic regression uses the sigmoid (logistic) function to model the data. When we
provide the input values (data) to the function, it gives the S-curve as follows:
o It uses the concept of threshold levels: values above the threshold level are
rounded up to 1, and values below the threshold level are rounded down to 0 (see
the sketch after this list).
On the basis of the categories, logistic regression can be classified into three types:
o Binary (0/1, pass/fail)
o Multi (cats, dogs, lions)
o Ordinal (low, medium, high)
Polynomial Regression:
Polynomial regression models a non-linear dataset using a linear model: the original
features are transformed into polynomial features of the required degree, and a linear
model is then fitted on them.
Note: This differs from multiple linear regression in that, in polynomial regression, a
single variable is raised to different degrees instead of having multiple variables with
the same degree.
Decision Tree Regression:
o Decision Tree is a supervised learning algorithm which can be used for solving
both classification and regression problems.
o It can solve problems for both categorical and numerical data.
o Decision tree regression builds a tree-like structure in which each internal node
represents a "test" on an attribute, each branch represents the result of the test,
and each leaf node represents the final decision or result.
o A decision tree is constructed starting from the root node/parent node (dataset),
which splits into left and right child nodes (subsets of the dataset). These child nodes
are further divided into their own child nodes, and themselves become the parent
nodes of those nodes. Consider the below image:
The above image shows an example of decision tree regression; here, the model is trying
to predict the choice of a person between a sports car and a luxury car.
Random Forest Regression:
o Random forest is one of the most powerful supervised learning algorithms, capable
of performing regression as well as classification tasks.
o Random forest regression is an ensemble learning method which combines
multiple decision trees and predicts the final output based on the average of
each tree's output. The combined decision trees are called base models, and the
prediction can be represented more formally as:
g(x) = (1/N) × (f1(x) + f2(x) + ... + fN(x)), where f1, f2, ..., fN are the individual base trees.
Ridge Regression:
o Ridge regression is one of the most robust versions of linear regression, in which
a small amount of bias is introduced so that we can get better long-term
predictions.
o The amount of bias added to the model is known as the ridge regression penalty.
We can compute this penalty term by multiplying lambda with the squared
weight of each individual feature.
o The equation (cost function) for ridge regression will be:
Cost = Σ(yᵢ − ŷᵢ)² + λ × Σ(weightⱼ)²
Lasso Regression:
o Lasso regression is another regularization technique, similar to ridge regression,
except that its penalty term contains the absolute weights instead of the squared
weights; as a result, it can shrink some coefficients exactly to zero.
Classification Algorithms:
The main goal of a classification algorithm is to identify the category of a given
dataset, and these algorithms are mainly used to predict the output for categorical
data.
Classification algorithms can be better understood using the below diagram. In the
below diagram, there are two classes, Class A and Class B. Within each class, the data
points have features similar to each other and dissimilar to those of the other class.
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives
the test dataset. In the lazy learner's case, classification is done on the basis of the most
closely related data stored in the training dataset. It takes less time in training but more
time for predictions.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners: Eager learners develop a classification model based on a training
dataset before receiving a test dataset. In contrast to lazy learners, an eager learner takes
more time in learning and less time in prediction. Example: Decision Trees, Naïve Bayes,
ANN.
Types of ML Classification Algorithms:
o Linear Models
  o Logistic Regression
  o Support Vector Machines
o Non-linear Models
  o K-Nearest Neighbours
  o Kernel SVM
  o Naïve Bayes
  o Decision Tree Classification
  o Random Forest Classification
Note: We will learn the above algorithms in later chapters.
Evaluating a Classification Model:
1. Log Loss or Cross-Entropy Loss: −(y·log(p) + (1 − y)·log(1 − p)), where y is the actual
label and p is the predicted probability; for a good binary classification model, the log
loss value should be near 0.
2. Confusion Matrix: The confusion matrix is a table describing the performance of the
model on the test data, counting the correct and incorrect predictions:

                     Actual Positive    Actual Negative
Predicted Positive   True Positive      False Positive
Predicted Negative   False Negative     True Negative
3. AUC-ROC Curve:
o ROC curve stands for Receiver Operating Characteristic curve, and AUC stands
for Area Under the Curve.
o It is a graph that shows the performance of the classification model at different
thresholds.
o To visualize the performance of the multi-class classification model, we use the AUC-ROC
curve.
o The ROC curve is plotted with TPR and FPR, where TPR (True Positive Rate) is on the
Y-axis and FPR (False Positive Rate) is on the X-axis.
The linear regression algorithm shows a linear relationship between a dependent (y)
variable and one or more independent (x) variables, hence it is called linear regression.
Since linear regression shows a linear relationship, it finds how the value of the
dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship
between the variables. Consider the below image:
y = a0 + a1x + ε
Here, y is the dependent (target) variable, x is the independent (predictor) variable,
a0 is the intercept of the line, a1 is the linear regression coefficient, and ε is the
random error.
The values for the x and y variables are training datasets used for the linear regression
model representation.
The different values for the weights or coefficients of the line (a0, a1) give different
lines of regression, so we need to calculate the best values for a0 and a1 to find the
best fit line. To calculate this, we use the cost function.
Cost function:
o The different values for the weights or coefficients of the line (a0, a1) give different
lines of regression, and the cost function is used to estimate the values of the
coefficients for the best fit line.
o The cost function optimizes the regression coefficients or weights. It measures how a
linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which maps
the input variable to the output variable. This mapping function is also known
as the Hypothesis function.
For linear regression, we use the Mean Squared Error (MSE) cost function, which is
the average of the squared errors between the predicted values and the actual values. It
can be written as:
MSE = (1/N) × Σ (yᵢ − (a0 + a1xᵢ))²
Where N is the total number of observations, yᵢ is the actual value, and (a0 + a1xᵢ) is
the predicted value.
Residuals: The distance between an actual value and the predicted value is called the
residual. If the observed points are far from the regression line, the residuals will be
high, and so the cost function will be high. If the scatter points are close to the
regression line, the residuals will be small, and hence the cost function will be small
as well.
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost
function.
o A regression model uses gradient descent to update the coefficients of the line by
reducing the cost function.
o This is done by randomly selecting initial values for the coefficients and then
iteratively updating those values to reach the minimum of the cost function (a small
sketch follows this list).
Model Performance:
The Goodness of fit determines how the line of regression fits the set of observations.
The process of finding the best model out of various models is called optimization. It
can be achieved by the below method:
1. R-squared method:
R-squared is a statistical measure of the goodness of fit: it represents the proportion of
the variation in the dependent variable that is explained by the model, so a value close
to 1 (100%) indicates a good model.
Regression vs. Classification:
The main difference between regression and classification algorithms is that regression
algorithms are used to predict continuous values such as price, salary, age, etc., whereas
classification algorithms are used to predict/classify discrete values such as Male
or Female, True or False, Spam or Not Spam, etc.
Classification:
The task of the classification algorithm is to find the mapping function to map the
input (x) to the discrete output (y).
Example: The best example to understand the Classification problem is Email Spam
Detection. The model is trained on the basis of millions of emails on different
parameters, and whenever it receives a new email, it identifies whether the email is spam
or not. If the email is spam, then it is moved to the Spam folder.
Some popular classification algorithms are given below:
o Logistic Regression
o K-Nearest Neighbours
o Support Vector Machines
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
Regression:
Regression is a process of finding the correlations between dependent and independent
variables. It helps in predicting the continuous variables such as prediction of Market
Trends, prediction of House prices, etc.
The task of the Regression algorithm is to find the mapping function to map the input
variable(x) to the continuous output variable(y).
Example: Suppose we want to do weather forecasting, so for this, we will use the
Regression algorithm. In weather prediction, the model is trained on the past data, and
once the training is completed, it can easily predict the weather for future days.
Regression Algorithm | Classification Algorithm
In Regression, the output variable must be of continuous nature or real value. | In Classification, the output variable must be a discrete value.
The task of the regression algorithm is to map the input value (x) with the continuous output variable (y). | The task of the classification algorithm is to map the input value (x) with the discrete output variable (y).
Regression algorithms are used with continuous data. | Classification algorithms are used with discrete data.
In Regression, we try to find the best fit line, which can predict the output more accurately. | In Classification, we try to find the decision boundary, which can divide the dataset into different classes.
Regression algorithms can be used to solve regression problems such as Weather Prediction, House Price Prediction, etc. | Classification algorithms can be used to solve classification problems such as Identification of spam emails, Speech Recognition, Identification of cancer cells, etc.
The regression algorithm can be further divided into Linear and Non-linear Regression. | The classification algorithms can be divided into Binary Classifier and Multi-class Classifier.
Note: Logistic regression uses the concept of predictive modeling just as regression
does; therefore, it is called logistic regression. However, it is used to classify samples,
so it falls under the classification algorithms.
Logistic Function (Sigmoid Function):
o The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
o It maps any real value into another value within a range of 0 and 1.
o The value of the logistic regression must be between 0 and 1, which cannot go beyond
this limit, so it forms a curve like the "S" form. The S-form curve is called the Sigmoid
function or the logistic function.
o In logistic regression, we use the concept of the threshold value, which defines the
probability of either 0 or 1. Values above the threshold value tend to 1, and values
below the threshold value tend to 0.
o The equation of a straight line can be written as: y = b0 + b1x1 + b2x2 + ... + bnxn
o In logistic regression, y can be between 0 and 1 only, so let's divide the above
equation by (1 − y): y / (1 − y), which is 0 for y = 0 and infinity for y = 1.
o But we need a range between −infinity and +infinity; taking the logarithm of the
equation, it becomes:
log[y / (1 − y)] = b0 + b1x1 + b2x2 + ... + bnxn
Example: There is a given dataset which contains information about various users,
obtained from a social networking site. A car manufacturer has recently launched a new
SUV, and the company wants to check how many users from the dataset want to
purchase the car.
For this problem, we will build a machine learning model using the Logistic Regression
algorithm. The dataset is shown in the below image. In this problem, we will predict
the purchased variable (dependent variable) by using age and salary (independent
variables).
Steps in Logistic Regression: To implement the Logistic Regression using Python, we
will use the same steps as we have done in previous topics of Regression. Below are the
steps:
1. Data Pre-processing step: In this step, we will pre-process/prepare the data so that
we can use it in our code efficiently. It is the same as we have done in the Data
Pre-processing topic. The code for this is given below:
By executing the above lines of code, we will get the dataset as the output. Consider the
given image:
Now, we will extract the dependent and independent variables from the given dataset.
Below is the code for it:
In the above code, we have taken [2, 3] for x because our independent variables are age
and salary, which are at indices 2 and 3. And we have taken 4 for the y variable because
our dependent variable is at index 4. The output will be:
Now we will split the dataset into a training set and test set. Below is the code for it:
#feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)   # fit the scaler on the training set and scale it
x_test = st_x.transform(x_test)         # apply the same scaling to the test set
We have prepared our dataset well, and now we will train the model using the training
set. To fit the model to the training set, we will import
the LogisticRegression class of the sklearn library.
After importing the class, we will create a classifier object and use it to fit the
logistic regression model to the training set. Below is the code for it:
Output: By executing the above code, we will get the below output:
Out[5]:
Our model is well trained on the training set, so we will now predict the result by using
test set data. Below is the code for it:
In the above code, we have created a y_pred vector to predict the test set result.
Output: By executing the above code, a new vector (y_pred) will be created under the
variable explorer option. It can be seen as:
The above output image shows the corresponding predicted users who want to
purchase or not purchase the car.
Now we will create the confusion matrix to check the accuracy of the classification.
To create it, we need to import the confusion_matrix function of the sklearn library.
After importing the function, we will call it and store the result in a new variable cm.
The function takes two parameters, mainly y_true (the actual values) and y_pred (the
predicted values returned by the classifier). Below is the code for it:
By executing the above code, a new confusion matrix will be created. Consider the
below image:
We can find the accuracy of the predicted result by interpreting the confusion matrix.
From the above output, we can see that 65 + 24 = 89 predictions are correct and
8 + 3 = 11 are incorrect, giving an accuracy of 89%.
Finally, we will visualize the training set result. To visualize the result, we will
use ListedColormap class of matplotlib library. Below is the code for it:
In the above code, we have imported the ListedColormap class of the Matplotlib library
to create the colormap for visualizing the result. We have created two new
variables, x_set and y_set, to replace x_train and y_train. After that, we have used
the nm.meshgrid command to create a rectangular grid, which ranges from each feature's
minimum value minus 1 to its maximum value plus 1. The pixel points we have taken are
of 0.01 resolution.
To create a filled contour, we have used the mtp.contourf command; it creates regions
of the provided colors (purple and green). In this function, we have passed
classifier.predict to color each grid point according to the class predicted by the classifier.
Output: By executing the above code, we will get the below output:
We have successfully visualized the training set result for the logistic regression, and our
goal for this classification is to divide the users who purchased the SUV car and who did
not purchase the car. So from the output graph, we can clearly see the two regions
(Purple and Green) with the observation points. The Purple region is for those users who
didn't buy the car, and Green Region is for those users who purchased the car.
Linear Classifier:
As we can see from the graph, the classifier is a straight line, i.e., linear in nature,
because we have used a linear model for logistic regression. In later topics, we will
learn about non-linear classifiers.
Our model is well trained using the training dataset. Now, we will visualize the result
for new observations (the test set). The code for the test set remains the same as above,
except that here we will use x_test and y_test instead of x_train and y_train. Below is
the code for it:
#Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
# plot the actual test observations on top of the predicted regions
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Logistic Regression (Test set)')
mtp.legend()
mtp.show()
Output:
The above graph shows the test set result. As we can see, the graph is divided into two
regions (Purple and Green). And Green observations are in the green region, and Purple
observations are in the purple region. So we can say it is a good prediction and model.
Some of the green and purple data points fall in the opposite regions; these are the
misclassifications we have already counted using the confusion matrix (11 incorrect outputs).
Hence our model is pretty good and ready to make new predictions for this
classification problem.