Annotated 3
Ali Kadhem
The dataset was split into a training set and a test set, with 80% of the samples used for training and 20% for testing. The training error of the nearest neighbour classifier is irrelevant, since it is always zero. The mean test error over 100 trials with different random partitions into training and test sets was 15%. To display a face image and a non-face image, the code in listing 2 was used. The images chosen were the first and the last image among the test images; the first is a face and the last is not.
colormap(gray)
% Non-face example: classify it and compare with the true label.
imNotF=X_test(:,ind(2));
if Y_test(ind(2))==classify(imNotF,classification_data)
    disp('SAME!')
else
    disp('NOT SAME!')
end
% Reshape the pixel column vector back into a 19x19 image and display it.
imNotF=reshape(X_test(:,ind(2)),[19 19]);
figure
imagesc(imNotF)
colormap(gray)
[Figure: the first test image (a face, left) and the last test image (a non-face, right), displayed as 19 × 19 pixel images.]
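The 15% figure quoted above is the mean test error over the 100 random partitions. A minimal sketch of how such an estimate can be computed is given below; it is not the code used in the report, and it assumes the full dataset is stored with one image per column of a matrix X, the labels in a vector Y, and that pdist2 from the Statistics Toolbox is available.
% Sketch: mean 1-NN test error over 100 random 80/20 splits
% (hypothetical variable names X and Y).
nTrials = 100;
testErr = zeros(nTrials,1);
N = size(X,2);
for t = 1:nTrials
    idx    = randperm(N);
    nTrain = round(0.8*N);
    Xtr = X(:,idx(1:nTrain));      Ytr = Y(idx(1:nTrain));
    Xte = X(:,idx(nTrain+1:end));  Yte = Y(idx(nTrain+1:end));
    D = pdist2(Xte', Xtr');        % distances between test and training images
    [~, nn] = min(D, [], 2);       % index of the nearest training image
    Ypred = Ytr(nn);
    testErr(t) = mean(Ypred(:) ~= Yte(:));
end
mean(testErr)                      % mean test error over the trials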
Mean error rates for the training set were:
For the test set, the nearest neighbour classifier and the regression tree have the same performance, while the support vector machine is much better. The support vector machine performs better because it constructs a (hyper)plane that separates the classes well (for example by mapping the data into a higher-dimensional non-linear feature space), whereas nearest neighbour relies only on a simple distance function. The training error is zero for nearest neighbour, since this method does not involve any real training, and a training error of zero for the support vector machine means that it was possible to separate the training data completely into their classes with a hyperplane.
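One way to obtain the three classifiers is MATLAB's Statistics and Machine Learning Toolbox; the sketch below is only illustrative and does not claim to reproduce the assignment's own implementation. It assumes the training and test samples are stored as rows of Xtr and Xte with labels Ytr and Yte, and it uses a classification tree in place of the regression tree.
% Hypothetical comparison of the three classifiers (one sample per row).
knnMdl  = fitcknn(Xtr, Ytr, 'NumNeighbors', 1);  % nearest neighbour
treeMdl = fitctree(Xtr, Ytr);                    % decision tree
svmMdl  = fitcsvm(Xtr, Ytr);                     % SVM (linear kernel by default)

trainErr = [resubLoss(knnMdl) resubLoss(treeMdl) resubLoss(svmMdl)]
testErr  = [loss(knnMdl,Xte,Yte) loss(treeMdl,Xte,Yte) loss(svmMdl,Xte,Yte)]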
Figure 2: Training accuracy of the simple CNN model for one trial.
The CNN model performs much like the SVM for this face classification problem.
4 Line fitting
A line can be fitted to a set of data points in different ways. The least-squares method minimizes the sum of squared vertical distances from the data points to the fitted line. This makes the fit susceptible to outliers in the data, so the fitted line can misrepresent the data, as seen in figure 3.
Figure 3: Line fits using least squares and RANSAC. The least-squares line is y = 1.7422x + 5.4217 and the RANSAC line is y = 3.2671x − 3.0998.
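The sensitivity to outliers can be illustrated with a few lines of MATLAB on synthetic data (this is not the data set used for figure 3):
% A single strong outlier pulls the least-squares line away from the trend.
x = (0:10)';
y = 3*x - 3 + 0.2*randn(size(x));   % points roughly on y = 3x - 3
y(end) = y(end) + 30;               % one outlier
p = polyfit(x, y, 1);               % least-squares fit: slope p(1), intercept p(2)
plot(x, y, 'o', x, polyval(p, x), '-')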
A method that tries to solve this problem is RANSAC. For a line fit, RANSAC chooses two random points and fits a line through them. The number of data points lying close to this line is then counted, and if there are enough such points the line is accepted as the fitted line. If not, a new pair of points is chosen at random and the number of points close to the resulting line is evaluated again. These steps are, however, repeated only a limited number of times (a minimal sketch of this procedure is given at the end of this section). The least-squares errors (sum of squared vertical errors) and the total least-squares errors (sum of squared orthogonal errors) were found to be
The code showing how the errors are computed is seen in listing 3. Least squares minimizes the vertical errors,
which explains why this error is smaller for the least-squares fit than for RANSAC. The total least-squares error is smaller for RANSAC, since this method tries to find a line that is close to as many points as possible (or as many as one desires). In general, considering only vertical errors can result in very large errors, depending on the slope of the fitted line; this is seen in the difference between the values for the two error types.
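Concretely, and using the same distance formula as in listing 3: for a fitted line y = kx + m and a data point (x0, y0), the vertical distance is |y0 − (kx0 + m)| while the orthogonal distance is |y0 − (kx0 + m)|/√(1 + k²), so the vertical error is √(1 + k²) times the orthogonal one and grows with the slope of the line.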
Listing 3: Code for calculating the least-squares error and the total least-squares error for the least-squares fit and for RANSAC.
% Least-squares fit: sum of squared vertical errors.
y_leastVal=p_ls(1)*xm+p_ls(2);   % predicted values (analogous to y_RANVal below)
ls_error=sum(abs(y_leastVal-ym).^2)
% Least-squares fit: orthogonal distances to the line a*x + b*y + c = 0.
a=-p_ls(1);
b=1;
c=-p_ls(2);
d=(1/sqrt(a^2+b^2)).*abs(a.*xm+b.*ym+c);
tls_error_least=sum(d.^2)
% RANSAC fit: sum of squared vertical errors.
y_RANVal=p_ransac(1)*xm+p_ransac(2);
ransac_error=sum(abs(y_RANVal-ym).^2)
% RANSAC fit: orthogonal distances.
a=-p_ransac(1);
b=1;
c=-p_ransac(2);
d2=(1/sqrt(a^2+b^2)).*abs(a.*xm+b.*ym+c);
tls_error_ransac=sum(d2.^2)
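The RANSAC fit itself is not reproduced in the listings above; the following is a minimal sketch of the loop described in the text, reusing the variable names xm, ym and p_ransac from listing 3. The distance threshold, the required number of inliers and the maximum number of iterations are arbitrary illustrative values.
% Hypothetical RANSAC line fit: repeatedly fit a line through two random
% points and accept it once enough points lie close to it.
thresh  = 1;                 % max orthogonal distance for an inlier
minInl  = 0.6*numel(xm);     % how many inliers are "enough"
maxIter = 100;               % give up after this many attempts
p_ransac = [];
for k = 1:maxIter
    idx = randperm(numel(xm), 2);          % pick two distinct points
    p   = polyfit(xm(idx), ym(idx), 1);    % line through the two points
    a = -p(1); b = 1; c = -p(2);           % line written as a*x + b*y + c = 0
    d = abs(a*xm + b*ym + c)/sqrt(a^2 + b^2);   % orthogonal distances to the line
    if sum(d < thresh) >= minInl
        p_ransac = p;                      % accept this line
        break
    end
end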