Machine Learning SVM - Supervised
SVM stands for Support Vector Machine. SVMs are typically used for
classification tasks, similar to what we did with K Nearest Neighbors. They
work very well on high dimensional data and allow us to classify data
that is not linearly separable, for example a data set like the one below.
Hyper-Planes
When we create a hyper-plane we need to do the following. We must pick
two points that are known as our support vectors. These points must be the
two closest points to the hyper-plane and their distances from the hyper-
plane must be identical. In the example above we can see that the two
circled points are our support vectors: their distances to the hyper-plane
are the same and they are also the closest points. Following this rule we can
actually create an infinite number of hyper-planes (see below).
All of the images above are valid hyper-planes.
Picking a Hyper-Plane
Once we create a hyper-plane we are going to use it to classify our data. If
a test point is on the left side of the plane we would classify it as red (in our
examples above) and if it is on the right we would classify it as green. So
how can we pick a hyper-plane that will give us the best classification
predictions?
Have a look at the hyper-planes above and determine which you think would
give the best classification for a mystery test point. What do you notice
about that hyper-plane?
Well, the best possible hyper-plane is the one in the first image on this page.
Notice that the distance between the support vectors and the hyper-plane is far
greater than for the other generated hyper-planes.
When we pick a hyper-plane we want to pick one that has the greatest
possible margin.
Margin
The margin is the distance between the hyper-plane and the closest points on
either side of it (our support vectors). The blue lines below show the margin
for this particular data and hyper-plane. Typically, the greater our margin
the better our classification will be.
Note: Imagine the blue lines are parallel to the black…
Kernels
So you now have a very basic understanding of how an SVM works. It seems
pretty simple in theory, but in practice we can run into a lot of issues.
Let’s say our data isn’t as pretty and we have some points that look like
this:
Can you determine which hyper-plane would be the best for this data? Even
if you could it would make a horrible classifier. This is where we introduce
something called kernels.
Kernels provide a way for us to create a hyper-plane for data like that seen
above. We use a kernel to bring our data up to a higher dimension (in this
case from 2D to 3D). We hope that by doing this our points will be plotted
in a way that lets us divide them using a hyper-plane.
By applying a kernel to our data above we hope to get something that looks
like the following:
You can see that we can now divide our points with a plane in 3D. By
applying the kernel our data has become separable.
What Is A Kernel?
A kernel is simply a function that takes as input our features (x1 and x2 in our
example) and returns a value equal to the third dimensional coordinate (x3).
An example of a kernel could be the equation:
x3 = (x1)^2 + (x2)^2
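To make this concrete, here is a small sketch (purely illustrative, not from any library) of what applying that mapping to a couple of 2D points could look like:

# Illustrative only: map each 2D point (x1, x2) to 3D by adding x3 = x1^2 + x2^2
def lift_to_3d(points):
    return [(x1, x2, x1 ** 2 + x2 ** 2) for (x1, x2) in points]

# Points near the origin get a small x3 and points far away get a large x3,
# so a flat plane in 3D may now be able to separate the two groups.
print(lift_to_3d([(0.1, 0.2), (3.0, -2.0)]))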
Typically when we use a kernel we use a pre-existing one. There is much
debate about which kernel is the best but here are some examples of
popular kernels.
– Linear
– Polynomial
– Circular
– Hyperbolic Tangent (Sigmoid)
Importing Modules
Before we start we need to import a few things from sklearn.
import sklearn
from sklearn import svm
from sklearn import datasets
Loading Data
In previous tutorials we did quite a bit of work to load in our data sets from
places like the UCI Machine Learning Repository. That is a very useful skill
and is something you will often have to do when applying these algorithms to
your own data. However, now that we have learned this, we will use the data
sets that come with sklearn. These are much nicer to work with and have
some nice methods that make loading in data very quick.
For this tutorial we will be using a breast cancer data set. It consists of
many features describing a tumor and classifies each one as either cancerous
or non-cancerous.
To load our data we will simply do the following.
cancer = datasets.load_breast_cancer()
Splitting Data
Now that we have loaded in our data set it is time to split it into training and
testing data. We will do this as in previous tutorials.
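As a quick reminder, the split might look something like this (assuming train_test_split from sklearn.model_selection, with a test_size of 0.2 as an example value):

from sklearn.model_selection import train_test_split

x = cancer.data      # the features describing each tumor
y = cancer.target    # the labels (cancerous or not)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)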
If we want to have a look at our data we can print the first few instances.
print(x_train[:5], y_train[:5])
Implementing a SVM
Implementing the SVM is actually fairly easy. We can simply create a new
model and call .fit() on our training data.
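A minimal sketch of that (the variable name clf is just a choice, and an un-tuned SVC is assumed):

clf = svm.SVC()               # a support vector classifier with default settings
clf.fit(x_train, y_train)     # train it on the training data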
To score our data we will use a useful tool from the sklearn module.
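One such tool is accuracy_score from sklearn.metrics; a sketch of how it could be used here:

from sklearn import metrics

y_pred = clf.predict(x_test)                  # predict labels for the test set
acc = metrics.accuracy_score(y_test, y_pred)  # compare predictions to the true labels
print(acc)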
And that is all we need to do to implement our SVM. Now we can run the
program and take note of our amazing accuracy!
Wait... Our accuracy is close to 60% and that is horrible! Looks like we need
to add something else.
Adding a Kernel
The reason we received such a low accuracy score is that we forgot to specify a
kernel! We need to choose which kernel to use to increase our
accuracy.
In machine learning, kernel methods are a class of algorithms for pattern analysis, whose
best known member is the support vector machine (SVM). The general task of pattern
analysis is to find and study general types of relations (for
example clusters, rankings, principal components, correlations, classifications) in datasets.
For many algorithms that solve these tasks, the data in raw representation have to be
explicitly transformed into feature vector representations via a user-specified feature map:
in contrast, kernel methods require only a user-specified kernel, i.e., a similarity
function over pairs of data points in raw representation.
Kernel Options:
- linear
- poly
- rbf
- sigmoid
- precomputed
If we run the program again with one of these kernels specified, our accuracy should be much higher.
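For example, a sketch re-using the clf variable from before with a linear kernel:

clf = svm.SVC(kernel="linear")
clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)
print(metrics.accuracy_score(y_test, y_pred))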
Changing the Margin
By default our model has a parameter C with value 1, which controls how soft
the margin is. Decreasing C gives a softer margin (more misclassified training
points are tolerated), while increasing it pushes towards a hard margin; C must
stay greater than 0. Playing with this value should alter your results slightly.
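For example (a sketch; the value 2 is arbitrary):

clf = svm.SVC(kernel="linear", C=2)   # larger C -> harder margin
clf.fit(x_train, y_train)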
If you want to play around with some other parameters have a look here.
Comparing to K Nearest Neighbors
If we want to see how this algorithm runs in comparison to KNN we can run
the KNN classifier on this data-set and compare our accuracy values.
To change to the KNN classifier is quite simple.
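A sketch of what that swap could look like (the choice of 9 neighbors is just an example value):

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=9)
clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)
print(metrics.accuracy_score(y_test, y_pred))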
The accuracy is worse, but note that KNN still does reasonably well on this
data set, hovering around the 90% mark.
HANDWRITTEN NUMBER RECOGNITION
The data that we are interested in is made of 8x8 images of digits. Let's have a look at the first
image, stored in the `images` attribute of the dataset. If we were working from image files, we
could load them using matplotlib.pyplot.imread. Note that each image must have the same
size. For these images, we know which digit each one represents: it is given in the 'target' of the
dataset.
Preparing the Data
In this case we split the data 50/50, which seems a bit high; the test set should not exceed 20%.
Keep in mind that each time the split is made, different elements are chosen (it is random), so
the final results may change.
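A minimal sketch of that 50/50 split, using sklearn's train_test_split; the names digits, X_train, X_test, y_train and y_test are the ones the code below relies on:

from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.5)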
Training the Model
Evaluating the Model
VISUALIZATION
CHANGING THE MODEL
PREDICTING AN ELEMENT
It says it is a 1.
Let's look at the tested image graphically. To do so, we have to reshape the given
vector to 8x8.
import matplotlib.pyplot as plt
from sklearn import svm, metrics

# `digits` and the train/test split (X_train, X_test, y_train, y_test) come
# from the "Preparing the Data" snippet above.
# Have a look at the first image and the digit it represents, stored in the
# `images` and `target` attributes of the dataset.
print(digits.images[0])
print(digits.target[0])

images_and_labels = list(zip(digits.images, digits.target))
print("Number of samples:", len(images_and_labels))

# Show the first 6 images together with their labels
_, axes = plt.subplots(2, 6)
for ax, (image, label) in zip(axes[0, :], images_and_labels[:6]):
    ax.set_axis_on()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title('Train: %i' % label)
plt.show()
# Train and score an SVM with a linear kernel
classifier = svm.SVC(kernel="linear")
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
acc = metrics.accuracy_score(y_test, predicted)
print("accuracy with linear kernel:", acc)

# Polynomial kernel
classifier = svm.SVC(kernel="poly")
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
acc = metrics.accuracy_score(y_test, predicted)
print("accuracy with poly kernel:", acc)

# Sigmoid kernel
classifier = svm.SVC(kernel="sigmoid")
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
acc = metrics.accuracy_score(y_test, predicted)
print("accuracy with sigmoid kernel:", acc)

# RBF kernel
classifier = svm.SVC(kernel="rbf")
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
acc = metrics.accuracy_score(y_test, predicted)
print("accuracy with rbf kernel:", acc)

# Default settings (the default kernel is rbf)
classifier = svm.SVC()
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
acc = metrics.accuracy_score(y_test, predicted)
print("accuracy:", acc)
# Predict a single test element and show it as an 8x8 image
test = X_test[750].reshape(1, -1)
prediction = classifier.predict(test)
test8x8 = test.reshape(8, 8)
plt.imshow(test8x8, cmap=plt.cm.gray_r, interpolation='nearest')
plt.title('Prediction: %i' % prediction)
plt.show()

# Try another test element
test = X_test[770].reshape(1, -1)
prediction = classifier.predict(test)
print(prediction)
test8x8 = test.reshape(8, 8)
plt.imshow(test8x8, cmap=plt.cm.gray_r, interpolation='nearest')
plt.title('Prediction: %i' % prediction)
plt.show()