Classification

Classifiers

Generative classifiers
• A generative classifier tries to learn the model that generates the data behind the scenes by estimating the assumptions and distributions of the model.
• It then uses this to predict unseen data, because it assumes that the model it learned captures the real model.
• A generative model such as Naive Bayes assumes that all the features are conditionally independent given the class.
• A generative model learns the joint probability distribution p(x, y).
• An example is the Naive Bayes classifier (see the sketch below).
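A minimal sketch (not from the slides), assuming scikit-learn and its built-in iris dataset: GaussianNB is a generative classifier in the sense above, since it estimates a class prior p(y) and per-class feature distributions p(x|y), i.e. the joint distribution p(x, y).

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

model = GaussianNB()
model.fit(X, y)              # estimates class priors and per-class feature Gaussians
print(model.predict(X[:5]))  # predicts the class y that maximises p(y) * p(x|y)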
Discriminative Classifiers

• A discriminative classifier tries to model the class directly from the observed data. It makes fewer assumptions about the distributions but depends heavily on the quality of the data.
• A discriminative model does not assume anything about the independence of the features.
• A discriminative model learns the conditional probability distribution p(y|x).
• Examples are Logistic Regression, Support Vector Machines and Nearest Neighbour (see the sketch below).
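A minimal sketch (not from the slides), again assuming scikit-learn and the iris dataset: LogisticRegression is a discriminative classifier in the sense above, since it models p(y|x) directly without modelling how x is generated.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)                    # learns weights for p(y|x) directly
print(clf.predict_proba(X[:3]))  # conditional class probabilities p(y|x)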
When to use a generative classifier or a discriminative classifier?

• In practice, discriminative classifiers outperform generative classifiers if you have a lot of data.
• Machine learning models have one sole purpose: to generalize well.
• Generalization is the model's ability to give sensible outputs for sets of inputs that it has never seen before.
• A model that generalizes well is a model that is neither underfit nor overfit.
Underfitting and overfitting
The cause of poor performance in machine learning is either
overfitting or underfitting the data.

Underfitting: when the model is too simple, both training and test errors are large
(the model has "not learned enough").

Overfitting: when the model is too complex, the training error is small but the test error is
large (the model has learned "too much").
Overfitting in Machine Learning

• Overfitting refers to a model that models the training data too well.
• Overfitting happens when our model does not generalize well from our training data to unseen data.
• Overfitting is the case where the overall cost is very small, but the generalization of the model is unreliable. This is due to the model learning "too much" from the training data set.
• Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.
• If we train for too long, we will overfit the data, which means that we have learnt about the noise and inaccuracies in the data as well as the actual function. The model that we learn will therefore be much too complicated and will not be able to generalise.
• The problem is that these details do not apply to new data and negatively impact the model's ability to generalize.
• If our model does much better on the training set than on the test set, then we are likely overfitting (see the sketch below).
• Overfitting is more likely with nonparametric and nonlinear models that have more flexibility when learning a target function.
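A minimal sketch (illustrative, not from the slides), assuming scikit-learn and its breast-cancer dataset: a very flexible model such as 1-nearest-neighbour scores perfectly on its own training data but noticeably worse on held-out data, and that gap is the overfitting signal described above.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))  # 1.0: every training point is its own nearest neighbour
print("test accuracy: ", clf.score(X_test, y_test))    # noticeably lower: a large gap suggests overfitting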
Reasons for Model Overfitting
• Limited Training Size

• High Model Complexity


Underfitting in Machine Learning

• Underfitting refers to a model that can neither model the training data nor generalize to new data.
• An underfit machine learning model is not a suitable model, and this will be obvious because it will have poor performance on the training data.
• Underfitting is the case where the model has "not learned enough" from the training data, resulting in low generalization and unreliable predictions.
How To Limit Overfitting

• Both overfitting and underfitting can lead to poor model performance, but by far the most common problem in applied machine learning is overfitting.
• There are two important techniques that you can use when evaluating machine learning algorithms to limit overfitting:
1. Use a resampling technique to estimate model accuracy (see the sketch below).
2. Hold back a validation dataset (cross-validation).
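A minimal sketch (not from the slides), assuming scikit-learn and the iris dataset: k-fold cross-validation is one such resampling technique, where each fold is held back once as a validation set and the accuracies are averaged.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: the data is split into 5 folds and each fold is held back once
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())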
Cross validation
• Leave one out:
• Take (n − 1) data values as the training set and one data value for the test set.
• Then again take (n − 1) data values, leaving a different single data value out for testing.
• Repeat this for each data value and then take the average (see the sketch below).
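A minimal sketch (not from the slides), assuming scikit-learn and the iris dataset: LeaveOneOut generates exactly the splits described above, with each of the n samples used once as the test set.

from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# One split per sample: train on n-1 values, test on the one left out, then average
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=LeaveOneOut())
print(scores.mean())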
Repeated random subsampling
• Select the ratio of the data which will act as the training set and which will act as the test set.
For example: 80% training set & 20% test set. Repeat with several random splits and average the results (see the sketch below).
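A minimal sketch (not from the slides), assuming scikit-learn and the iris dataset: ShuffleSplit draws a fresh random 80/20 split on every repetition, which is the repeated random subsampling described above.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_iris(return_X_y=True)

splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)  # 10 random 80%/20% splits
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=splitter)
print(scores.mean())  # accuracy averaged over the 10 random splits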
Supervised learning
• Supervised learning typically begins with an
established set of data and a certain
understanding of how that data is classified.
• Supervised learning is intended to find
patterns in data that can be applied to an
analytics process.
• This data has labeled features that define the
meaning of data.
• For example, there could be millions of images of animals, each including an explanation of what the animal is, and from this you can create a machine learning application that distinguishes one animal from another.
• By labeling this data about types of animals, you may have hundreds of categories of different species.
• When the label is continuous, it is a regression.
• When the data comes from a finite set of values, it is known as classification.
Supervised learning

Supervised learning divides into two tasks: Classification and Regression.
• Regression: a real number associated with a feature vector.
e.g. to predict a house price from training data.
• Classification: a discrete value associated with a feature vector.
e.g. to predict gender.
What is classification?
A machine learning task that deals with identifying the class to which an
instance belongs

A classifier performs classification

[Diagram: a test instance with attributes (a1, a2, … an) is fed to the classifier, which outputs a discrete-valued class label, e.g. Issue Loan? {Yes, No}]
Types of learners in classification
There are two types of learners in classification: lazy learners and eager learners.
• Lazy learners

• Eager learners
Lazy learners
• Lazy learners simply store the training data and wait until a test instance appears.
• When it does, classification is conducted based on the most related data in the stored training data.
• Compared to eager learners, lazy learners have less training time but take more time to predict.
• Ex. k-nearest neighbour, case-based reasoning
Eager learners

• Eager learners construct a classification model based on the given training data before receiving data for classification.
• An eager learner must be able to commit to a single hypothesis that covers the entire instance space.
• Due to the model construction, eager learners take a long time to train and less time to predict.
• Ex. Decision Tree, Naive Bayes, Artificial Neural Networks
Classification learning

Training phase: learning the classifier from the available data, using the 'training set' (labeled).
Testing phase: testing how well the classifier performs, using the 'testing set'.
1. The Classification challenge
Learning of binary classification
• Given: a set of m examples (xi, yi), i = 1, 2, …, m, sampled from some distribution D, where xi ∈ R^n and yi ∈ {−1, +1}.
• Find: a function f: R^n → {−1, +1} which classifies 'well' examples xj sampled from D.

Comments
• The function f is usually a statistical model whose parameters are learnt from the set of examples.
• The set of examples is called the 'training set'.
• Y is called the 'target variable', or 'target'.
• Examples with yi = +1 are called 'positive examples'; examples with yi = −1 are called 'negative examples'.
Some Real life applications
• Systems Biology – Gene expression microarray data:
• Text categorization: spam detection
• Face detection: Signature recognition: Customer discovery
• Medicine: Predict if a patient has heart ischemia by a
spectral analysis of his/her ECG.
• Fraud detection
Microarray data
Separate malignant from healthy
tissues based on the mRNA
expression profile of the tissue.
Face detection
• Discriminating human faces from non-faces.
Signature recognition
• Recognize signatures by structural similarities
which are difficult to quantify.
• Does a signature belong to a specific person, say Tony Blair, or not?
Classification problem

[Scatter plot: feature axes x1 and x2, with unlabelled points marked '?' that need to be classified]
Classification algorithms
– Fisher linear discriminant
– KNN
– Decision tree
– Neural networks
– SVM
– Naïve Bayes
– AdaBoost
– Many, many more…

– Each one has its own properties with respect to bias, speed, accuracy, transparency…
Nearest neighbor
• The simplest approach.
• Makes no a priori assumptions.
• Remembers the training set.
• Calculates the distance from the new x to each stored example and assigns it the label of its nearest neighbour.
• Problem: with noisy data we can get wrong answers.
Properties of KNN
• The following two properties define KNN well −
• Lazy learning algorithm − KNN is a lazy learning algorithm because it does not have a specialized training phase and uses all of the data during classification.
• Non-parametric learning algorithm − KNN is also a non-parametric learning algorithm because it doesn't assume anything about the underlying data.
KNN in simple terms
• We take some number k of nearest neighbours and assign the new x the majority label among its k nearest neighbours (see the sketch below).
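A minimal sketch (not from the slides), assuming scikit-learn and the iris dataset: KNeighborsClassifier stores the training data at fit time and classifies a new point by a majority vote over its k nearest training points.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 nearest neighbours
knn.fit(X_train, y_train)                  # "training" just stores the data (lazy learner)
print(knn.score(X_test, y_test))           # accuracy on held-out data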
Performance of ML algorithms
There are various metrics which we can use to evaluate the performance of ML algorithms, for classification as well as regression algorithms. We must carefully choose the metrics for evaluating ML performance because −

• How the performance of ML algorithms is measured and compared will depend entirely on the metric you choose.

• How you weight the importance of various characteristics in the result will be influenced completely by the metric you choose.
Performance Metrics for Classification Problems

Confusion Matrix
• It is the easiest way to measure the performance of a classification problem where the output can be of two or more types of classes.
• A confusion matrix is nothing but a table with two dimensions, viz. "Actual" and "Predicted"; furthermore, both dimensions have "True Positives (TP)", "True Negatives (TN)", "False Positives (FP)" and "False Negatives (FN)".
• True Positives (TP) − the case when both the actual class and the predicted class of the data point are 1.
• True Negatives (TN) − the case when both the actual class and the predicted class of the data point are 0.
• False Positives (FP) − the case when the actual class of the data point is 0 and the predicted class is 1.
• False Negatives (FN) − the case when the actual class of the data point is 1 and the predicted class is 0.
We can use the confusion_matrix function of sklearn.metrics to compute the confusion matrix of our classification model, as sketched below.
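A minimal sketch with illustrative labels (not from the slides): sklearn's confusion_matrix puts the actual classes in the rows and the predicted classes in the columns.

from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]  # actual class labels (illustrative values)
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]  # the classifier's predictions

# For binary 0/1 labels the result is laid out as:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_actual, y_predicted))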
Accuracy

• It is the most common performance metric for classification algorithms. It may be defined as the number of correct predictions made as a ratio of all predictions made. We can easily calculate it from the confusion matrix with the help of the following formula (see the sketch below):

Accuracy = (TP + TN) / (TP + TN + FP + FN)
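A minimal sketch (not from the slides), reusing the illustrative labels above: accuracy computed from the confusion-matrix counts matches sklearn.metrics.accuracy_score.

from sklearn.metrics import accuracy_score, confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]  # illustrative labels
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
print((tp + tn) / (tp + tn + fp + fn))        # accuracy from the formula above
print(accuracy_score(y_actual, y_predicted))  # the same value from sklearn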
If the class data is unbalanced
• Some classes have many more values than other classes.
• If +ve values > −ve values, or −ve values > +ve values, then accuracy is not a perfect measure of performance.
• Recall and precision can be used as performance measures instead.
Performance measures
• Recall may be defined as the number of positives returned by our ML model. We can easily calculate it from the confusion matrix with the help of the following formula:

Recall = sensitivity = TP / (TP + FN)

• Precision, used in document retrieval, may be defined as the fraction of correct documents among those returned by our ML model. We can easily calculate it from the confusion matrix with the help of the following formula (see the sketch below):

Precision = TP / (TP + FP)

(Note: specificity is a different measure, TN / (TN + FP), and should not be confused with precision.)
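A minimal sketch (not from the slides), reusing the same illustrative labels: recall and precision via sklearn.metrics.

from sklearn.metrics import precision_score, recall_score

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]  # illustrative labels
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]

print(recall_score(y_actual, y_predicted))     # TP / (TP + FN)
print(precision_score(y_actual, y_predicted))  # TP / (TP + FP)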
Advantages of KNN
• It is a very simple algorithm to understand and interpret.
• It is very useful for nonlinear data because there is no assumption about the data in this algorithm.
• It is a versatile algorithm, as we can use it for classification as well as regression.
• It has relatively high accuracy, but there are much better supervised learning models than KNN.
Disadvantages of KNN
• It is a computationally somewhat expensive algorithm because it stores all the training data.
• It requires high memory storage compared to other supervised learning algorithms.
• Prediction is slow when N (the number of training examples) is large.
• It is very sensitive to the scale of the data as well as to irrelevant features.
• Choosing k can be tricky.
