Classification
Generative classifiers
• A generative classifier tries to learn the model that
generates the data behind the scenes by estimating the
distributions and assumptions of that model.
• It then uses this learned model to predict unseen data,
on the assumption that the model it learned captures the
real data-generating process.
• A generative model learns the joint probability
distribution p(x,y).
• An example is the Naive Bayes classifier, sketched below, which
additionally assumes that all features are conditionally
independent given the class.
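As a concrete illustration, here is a minimal sketch (not from the original slides) of a generative classifier in scikit-learn; the toy data and variable names are invented for the example.

```python
# Minimal sketch: Gaussian Naive Bayes as a generative classifier.
# The toy data below is invented for illustration.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Feature vectors X and class labels y
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

# Fitting estimates per-class feature distributions, i.e. it models
# p(x,y) = p(x|y) p(y) rather than learning p(y|x) directly.
clf = GaussianNB()
clf.fit(X, y)

print(clf.predict([[1.1, 2.0]]))        # predicted class for an unseen point
print(clf.predict_proba([[1.1, 2.0]]))  # posterior p(y|x) via Bayes' rule
```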
Discriminative Classifiers
• A discriminative classifier learns the conditional probability
distribution p(y|x) directly, modelling the boundary between
classes rather than how the data was generated.
• Examples are logistic regression and the SVM.
Underfitting: when the model is too simple, both training and test errors are large
(the model has "not learned enough").
Overfitting: when the model is too complex, the training error is small but the test error is
large (the model has learned "too much").
Overfitting in Machine Learning
• Overfitting refers to a model that models the training data too well.
• Overfitting happens when our model doesn't generalize well from our training data to
unseen data.
• Overfitting is the case where the training cost is very small but the generalization of
the model is unreliable, because the model has learned "too much" from the training data set.
• Overfitting happens when a model learns the detail and noise in the training data to the extent that it
negatively impacts the performance of the model on new data.
• If we train for too long, then we will overfit the data, which means that we have learned
the noise and inaccuracies in the data as well as the actual function. The model
that we learn will therefore be much too complicated and won't be able to generalize.
• The problem is that these noisy details do not apply to new data and negatively impact the
model's ability to generalize.
• If our model does much better on the training set than on the test set, then we’re likely
overfitting.
• Overfitting is more likely with nonparametric and nonlinear models that have more flexibility when
learning a target function, as the sketch below illustrates.
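The train/test gap described above can be seen directly in code. Below is a minimal sketch, with invented synthetic data, comparing a shallow and an unrestricted decision tree; the deep tree fits the training set almost perfectly but does worse on held-out data.

```python
# Minimal sketch of overfitting: train accuracy rises with model
# complexity while test accuracy falls. Data is synthetic/invented.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, None):  # shallow tree vs. unrestricted depth
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
# The unrestricted tree typically scores ~1.0 on the training data but
# noticeably lower on the test data: a symptom of overfitting.
```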
Reasons for Model Overfitting
• Limited Training Size
Classification vs. Regression
• Regression: a real number associated with a
feature vector.
e.g., predicting a house price from training data.
• Classification: a discrete value associated with
a feature vector.
e.g., predicting gender.
A short code sketch contrasting the two tasks follows.
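The distinction can be made concrete in code: the same feature matrix can feed a regressor (real-valued output) or a classifier (discrete output). A minimal sketch with invented toy data:

```python
# Invented toy data: regression predicts a real number,
# classification predicts a discrete label.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[50.0], [80.0], [120.0], [200.0]])  # e.g. house size in m^2

prices = np.array([150000., 230000., 330000., 520000.])  # real-valued target
reg = LinearRegression().fit(X, prices)
print(reg.predict([[100.0]]))   # outputs a real number (a price estimate)

labels = np.array([0, 0, 1, 1])                          # discrete target
clf = LogisticRegression().fit(X, labels)
print(clf.predict([[100.0]]))   # outputs a discrete class label
```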
What is classification?
A machine learning task that deals with identifying the class to which an
instance belongs
[Diagram: a test instance is passed to the classifier, which outputs a discrete-valued class label.]
• Eager learners
• Lazy learners
Lazy learners
• Lazy learners simply store the training data and
wait until a test instance appears.
• When it does, classification is conducted based
on the most closely related instances in the stored
training data.
• Compared to eager learners, lazy learners spend
less time training but more time predicting.
• Ex. k-nearest neighbor (sketched below), case-based reasoning
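A minimal k-nearest-neighbor sketch (toy data invented for illustration): note that fit() essentially just stores the training set, and the distance computations happen at prediction time, which is what makes the learner "lazy".

```python
# Minimal sketch of a lazy learner: k-NN stores the training data at
# fit() time and defers the real work to predict(). Toy data invented.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)          # cheap: essentially stores the data

print(knn.predict([[0.8, 0.9]]))   # costly part: neighbor search happens here
```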
Eager learners
• Eager learners construct a classification model from
the training data before receiving any test instances.
• Compared to lazy learners, they spend more time
training but less time predicting.
• Ex. decision trees, Naïve Bayes, neural networks
Training phase: learning the classifier from the available data, using a labeled 'training set'.
Testing phase: testing how well the classifier performs, using a 'testing set'.
1. The Classification challenge
Learning of binary classification
• Given: a set of m examples (x_i, y_i), i = 1, 2, …, m,
sampled from some distribution D, where x_i ∈ R^n and
y_i ∈ {-1, +1}.
• Find: a function f: R^n → {-1, +1} which classifies
'well' examples x_j sampled from D.
Comments
• The function f is usually a statistical model whose
parameters are learnt from the set of examples.
• The set of examples is called the 'training set'.
• y is called the 'target variable', or simply the 'target'.
• Examples with y_i = +1 are called 'positive examples';
examples with y_i = -1 are called 'negative examples'.
A short code sketch of this setup is given below.
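As a hedged sketch of this setup: the snippet below draws m examples with labels in {-1, +1}, fits a statistical model (logistic regression, an arbitrary choice here), and uses it as the function f; the data and names are invented.

```python
# Sketch of the formal setup: learn f: R^n -> {-1, +1} from m labelled
# examples. The model choice (logistic regression) and data are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
m, n = 100, 5
X = rng.normal(size=(m, n))                 # x_i in R^n
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # y_i in {-1, +1}

f = LogisticRegression().fit(X, y)          # parameters learnt from examples
x_new = rng.normal(size=(1, n))
print(f.predict(x_new))                     # outputs -1 or +1
```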
Some real-life applications
• Systems biology: gene expression microarray data
• Text categorization: spam detection
• Face detection
• Signature recognition
• Customer discovery
• Medicine: predicting whether a patient has heart ischemia from a
spectral analysis of his/her ECG
• Fraud detection
Microarray data
Separate malignant from healthy
tissues based on the mRNA
expression profile of the tissue.
Face detection
• Discriminating human faces from non-faces.
Signature recognition
• Recognize signatures by structural similarities
which are difficult to quantify.
• Does a signature belong to a specific person, say
Tony Blair, or not?
Classification problem
[Scatter plot: two classes of points in the (x1, x2) plane, with several unlabeled points marked '?' whose class must be predicted.]
Classification algorithms
– Fisher linear discriminant
– KNN
– Decision tree
– Neural networks
– SVM
– Naïve Bayes
– AdaBoost
– Many, many more…
Confusion Matrix
• It is the easiest way to measure the performance of a
classification problem where the output can be of two or
more classes.
• A confusion matrix is simply a table with two
dimensions, "Actual" and "Predicted"; its cells count the
"True Positives (TP)", "True Negatives (TN)", "False
Positives (FP)", and "False Negatives (FN)".
• True Positives (TP) − the case when both the actual and the
predicted class of the data point are 1.
• True Negatives (TN) − the case when both the actual and the
predicted class of the data point are 0.
• False Positives (FP) − the case when the actual class of the data
point is 0 but the predicted class is 1.
• False Negatives (FN) − the case when the actual class of the
data point is 1 but the predicted class is 0.
We can use the confusion_matrix function of sklearn.metrics to
compute the confusion matrix of our classification model, as in the sketch below.
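For instance, a minimal sketch with invented labels:

```python
# Minimal sketch: computing a confusion matrix with scikit-learn.
# The actual/predicted labels below are invented for illustration.
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_actual, y_predicted)
print(cm)

# For binary 0/1 labels, ravel() unpacks the four cells in this order:
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)
```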
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Specificity (true negative rate) = TN / (TN + FP)
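Continuing in the same spirit, a small sketch computing these ratios from invented counts:

```python
# Sketch: the standard metrics as ratios of the four confusion-matrix
# counts. The counts below are invented example values.
tp, tn, fp, fn = 35, 50, 10, 5

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # fraction of correct predictions
precision   = tp / (tp + fp)                   # of predicted positives, how many are real
specificity = tn / (tn + fp)                   # true negative rate

print(accuracy, precision, specificity)
```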
Advantages of k-NN
• It is a very simple algorithm to understand and
interpret.
• It is very useful for nonlinear data because it makes
no assumptions about the data.
• It is a versatile algorithm: we can use it for
classification as well as regression.
• It has relatively high accuracy, though there are much
better supervised learning models than k-NN.
Disadvantages of k-NN
• It is computationally expensive at prediction time
because it must compare each test instance against all of
the stored training data.
• It requires high memory storage compared to
other supervised learning algorithms.
• Prediction is slow when the training set is large (big N).
• It is very sensitive to the scale of the data as well
as to irrelevant features (see the scaling sketch below).
• Choosing k can be tricky.
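Because of the scale sensitivity noted above, features are commonly standardized before applying k-NN. A minimal sketch (the pipeline and toy data are invented for illustration):

```python
# Sketch: standardizing features before k-NN so that no single
# large-scale feature dominates the distance. Toy data invented.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# The second feature is on a much larger scale than the first;
# without scaling it would dominate the Euclidean distance.
X = np.array([[1.0, 1000.0], [1.2, 2000.0], [3.0, 1500.0], [3.2, 2500.0]])
y = np.array([0, 0, 1, 1])

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)
print(model.predict([[1.1, 1800.0]]))
```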