Logistic Regression 5
Classification Algorithm in Machine Learning
• Finally, we will visualize the training set result. To do so, we will use the ListedColormap class of
the Matplotlib library. Below is the code for it:
#Visualizing the training set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha = 0.75, cmap = ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c = ListedColormap(('purple', 'green'))(i), label = j)
mtp.title('Logistic Regression (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
• In the above code, we have imported the ListedColormap class
of the Matplotlib library to create the colormap for visualizing the
result. We have created two new variables x_set and y_set to
replace x_train and y_train. After that, we have used
the nm.meshgrid command to create a rectangular grid that
spans from 1 below the minimum of each feature to 1 above its
maximum, with pixel points at a resolution of 0.01.
• To create a filled contour, we have used the
mtp.contourf command, which fills the regions with the provided
colors (purple and green). To this function we have passed
classifier.predict, so each pixel of the grid is colored
according to the class the classifier predicts for it.
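• The grid construction used above can be illustrated on a tiny example (a minimal sketch in plain NumPy; the short coordinate ranges here are made up for illustration, standing in for the scaled Age and Estimated Salary axes):

```python
import numpy as np

# Two short coordinate ranges standing in for the Age and Salary axes
ages = np.arange(0.0, 1.0, 0.5)      # [0.0, 0.5]
salaries = np.arange(0.0, 1.5, 0.5)  # [0.0, 0.5, 1.0]

# meshgrid pairs every age with every salary, giving one 2-D array per axis
x1, x2 = np.meshgrid(ages, salaries)
print(x1.shape, x2.shape)  # both (3, 2): rows follow salaries, columns follow ages

# Stacking the raveled grids yields the (n_pixels, 2) matrix that is fed
# to classifier.predict(); the predictions are then reshaped back to
# x1.shape so contourf can color each pixel of the grid
pixels = np.array([x1.ravel(), x2.ravel()]).T
print(pixels.shape)  # (6, 2)
```

This is exactly why the tutorial code calls `.reshape(x1.shape)` on the predictions: they come back as a flat vector of length n_pixels and must be restored to the grid's shape before contouring.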
• Output: By executing the above code, we will
get the below output:
• The graph can be explained in the below points:
• In the above graph, we can see that there are some Green points within the green region
and Purple points within the purple region.
• All these data points are the observation points from the training set, which show the result
for the Purchased variable.
• This graph is made by using two independent variables, i.e., Age on the x-axis and Estimated
Salary on the y-axis.
• The purple points are observations for which Purchased (the dependent variable) is 0,
i.e., users who did not purchase the SUV car.
• The green points are observations for which Purchased (the dependent variable) is 1,
i.e., users who purchased the SUV car.
• We can also estimate from the graph that the users who are younger with low salary, did not
purchase the car, whereas older users with high estimated salary purchased the car.
• But there are also some purple points in the green region (buying the car) and some green
points in the purple region (not buying the car). These are the exceptions the model
misclassifies: for example, some younger users with a high estimated salary did purchase the
car, while some older users with a low estimated salary did not.
• The goal of the classifier:
• We have successfully visualized the training set
result for the logistic regression model, and our goal
for this classification is to separate the users who
purchased the SUV car from those who did not.
From the output graph, we can clearly see the two
regions (purple and green) with the observation
points: the purple region is for users who didn't buy
the car, and the green region is for users who
purchased it.
• Linear Classifier:
• As we can see from the graph, the decision boundary is a straight
line, i.e., linear in nature, because we have used a linear model
for logistic regression. In further topics, we will learn about
non-linear classifiers.
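• The linearity of the boundary can also be checked numerically: a fitted scikit-learn LogisticRegression exposes coef_ and intercept_, and the boundary is the straight line w1·x1 + w2·x2 + b = 0. Below is a minimal sketch on synthetic two-feature data (the data and variable names are illustrative, not the tutorial's dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic stand-ins for the two scaled features (Age, Estimated Salary)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels separable by a straight line

clf = LogisticRegression().fit(X, y)
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]

# The decision boundary satisfies w1*x1 + w2*x2 + b = 0, i.e. the line
# x2 = -(w1*x1 + b) / w2
print(f"boundary: x2 = -({w1:.2f}*x1 + {b:.2f}) / {w2:.2f}")

# A point lying exactly on that line gets a predicted probability of 0.5
x1_val = 1.0
x2_val = -(w1 * x1_val + b) / w2
p = clf.predict_proba([[x1_val, x2_val]])[0, 1]
print(round(p, 3))  # 0.5 on the boundary
```

Because the boundary depends on the features only through this linear expression, the colored regions in the contour plot are always separated by a straight line.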
• Visualizing the test set result:
• Our model is well trained on the training dataset. Now
we will visualize the result for new observations (the test set).
The code for the test set remains the same as above, except
that here we use x_test and y_test instead of x_train
and y_train. Below is the code for it:
#Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha = 0.75, cmap = ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c = ListedColormap(('purple', 'green'))(i), label = j)
mtp.title('Logistic Regression (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output:
• The above graph shows the test set result. As we can
see, the graph is divided into two regions (purple and
green), and the green observations lie in the green
region while the purple observations lie in the purple
region. So we can say the model makes good
predictions. Some of the green and purple data points
fall in the wrong regions, but this error was already
quantified by the confusion matrix
(11 incorrect predictions).
• Hence our model is pretty good and ready to make
new predictions for this classification problem.
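• The "11 incorrect" count referenced above comes from a confusion matrix computed on the test set earlier in the tutorial. As a reminder, such a count can be obtained from scikit-learn as follows (a minimal sketch with illustrative true and predicted labels, not the tutorial's actual results):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative true and predicted labels (not the tutorial's data)
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 0, 1])

cm = confusion_matrix(y_true, y_pred)
print(cm)  # diagonal entries are correct predictions per class

# Off-diagonal entries are the misclassified observations
errors = cm.sum() - np.trace(cm)
print(errors)  # 2 incorrect predictions in this toy example
```

The misclassified points visible in the scatter plot correspond exactly to these off-diagonal counts, which is why they can be acknowledged rather than treated as a new problem.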