ML Unit 1

MACHINE LEARNING

Machine Learning
How it differs from traditional programming:

In traditional programming, we feed the input and the program logic to the machine and run the program to get the output.

In machine learning, we feed the input and the output to the machine during training, and the machine creates its own logic (the model), which is then evaluated during testing. A minimal sketch of the contrast is given below.
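
The sketch below is an illustration only, contrasting a hand-written rule with a learned model; it assumes scikit-learn is available, and the e-mails, labels, and keyword "lottery" are made-up examples, not part of the original notes.

# Traditional programming: the logic is written by hand.
def is_spam_rule(email: str) -> bool:
    # Hard-coded rule supplied by the programmer.
    return "lottery" in email.lower()

# Machine learning: the logic is learned from (input, output) pairs.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win the lottery now", "meeting at noon", "free lottery bonus", "project status report"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham (illustrative data)

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)      # inputs
model = MultinomialNB().fit(X, labels)    # the machine creates its own "logic"
print(model.predict(vectorizer.transform(["claim your lottery prize"])))  # likely [1]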
INGREDIENTS OF MACHINE LEARNING

Build a model using the right features to achieve the right task. The three ingredients are:

1. FEATURES
2. TASKS
3. MODELS
1. FEATURES
● Play an important role in machine learning.

● Features are the language in which we describe the relevant objects in our domain.

Example: the objects may be e-mails or complex organic molecules; the features are the measurements we use to describe them.


2. TASK

1. An abstract representation of a problem we want to solve, e.g., classifying objects into two or more classes.

2. A task is represented as a mapping from data points to outputs.


3. MODEL

1. A model is the output of a machine learning algorithm applied to training data.

2. Machine learning involves a relatively small number of tasks and only a few different types of features, but a wide variety of models.

3. Models lend the machine learning field diversity, while tasks and features give it unity.
Task: the problem that can be solved with machine learning.
CLASSIFICATION
Types (a minimal sketch follows this list):
1. Binary Classification: two classes.
Examples: Boolean (Yes/No); Binary (1/0); Result (Pass/Fail).

2. Multi-Class Classification: more than two classes.
Example: classifying e-mails as spam, work-related, or private.
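
As a rough illustration of multi-class classification, the sketch below assumes scikit-learn; the feature vectors (keyword counts) and the three class labels are made-up values, not from the notes.

from sklearn.linear_model import LogisticRegression

# Toy feature vectors (e.g., counts of two keywords) and three classes:
# 0 = spam, 1 = work-related, 2 = private.
X = [[5, 0], [4, 1], [0, 5], [1, 4], [2, 2], [1, 1]]
y = [0, 0, 1, 1, 2, 2]

clf = LogisticRegression().fit(X, y)        # handles multi-class automatically
print(clf.predict([[3, 0], [0, 3]]))        # expected roughly [0, 1]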
REGRESSION
Learn a real-valued function from training examples labelled with true function values.
Example: instead of assigning an e-mail in the inbox a class label, assign it a score on a numeric scale (e.g., 0 to 10, 10 to 20, 20 to 30).
Supervised Learning: labelled data.
If the model output involves the target variable, it is a PREDICTIVE MODEL.

Unsupervised Learning: unlabelled data.
If the model output does not involve the target variable, it is a DESCRIPTIVE MODEL.
OVERVIEW OF MACHINE LEARNING SETTINGS

                          Predictive Model              Descriptive Model
Supervised Learning       Classification, Regression    Subgroup Discovery
Unsupervised Learning     Predictive Clustering         Descriptive Clustering,
                                                        Association Rule Discovery
EVALUATING PERFORMANCE ON A TASK
ACCURACY
1. Divide the dataset to calculate accuracy.
2. Use 80% of the dataset for training.
3. Use 20% of the dataset for testing.

Accuracy = (data samples correctly predicted) / (total data samples)

ERROR RATE

Error Rate = (data samples wrongly predicted) / (total data samples)

A short sketch of this split and the two measures follows.
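
The sketch below is a minimal illustration of an 80/20 split with accuracy and error rate; it assumes scikit-learn and uses its bundled iris dataset and a k-nearest-neighbour classifier purely as stand-ins.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = KNeighborsClassifier().fit(X_train, y_train)
correct = (model.predict(X_test) == y_test).sum()

accuracy = correct / len(y_test)    # correctly predicted / total samples
error_rate = 1 - accuracy           # wrongly predicted / total samples
print(accuracy, error_rate)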


2. MODELS
What is being learned from data in order to solve the task?

1. Geometric Model.

2. Probabilistic Model.

3. Logical Model
GEOMETRIC MODEL
● INSTANCE SPACE:
The set of all possible instances, whether or not they are present in the dataset.

● Geometric concepts that apply to high-dimensional spaces are usually prefixed with "hyper-"; for instance, a decision boundary in an unspecified number of dimensions is called a HYPERPLANE.

● Geometric models are constructed directly in instance space using concepts such as lines, planes, and distances.

● Easy to visualize.
LINEAR DECISION BOUNDARY: linearly separable data

● A linear classifier divides the two classes with a linear boundary.

● The boundary is defined by w · x = t, where

w -> vector perpendicular to the decision boundary,

x -> any point on the boundary,

t -> decision threshold.

A minimal sketch of this decision rule follows.
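
The sketch below illustrates the linear decision rule w · x = t; the values of w and t are assumed by hand here, whereas in practice they would be learned (e.g., by a perceptron or an SVM).

import numpy as np

w = np.array([1.0, 2.0])   # vector perpendicular to the decision boundary (assumed)
t = 3.0                    # decision threshold (assumed)

def classify(x):
    # Points with w . x > t fall on the positive side of the hyperplane.
    return "positive" if np.dot(w, x) > t else "negative"

print(classify(np.array([2.0, 2.0])))   # w.x = 6 > 3  -> positive
print(classify(np.array([1.0, 0.5])))   # w.x = 2 < 3  -> negative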


LARGE MARGIN CLASSIFIER
A linear classifier that maximises the margin between the decision boundary and the nearest training instances.
NEAREST NEIGHBOUR
• A key geometric concept in machine learning is the notion of distance.

• If the distance between two instances is small, the instances are similar in terms of their features, so nearby instances receive the same classification or belong to the same cluster.

• To classify a new instance, retrieve the most similar training instance and assign its class (e.g., class A).

EUCLIDEAN DISTANCE: d(x, y) = sqrt(sum_i (x_i - y_i)^2)

A minimal nearest-neighbour sketch using this distance follows.
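The sketch below is a minimal nearest-neighbour classifier using Euclidean distance; the training points and their class labels are made-up illustrative data.

import numpy as np

train_X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5]])
train_y = np.array(["A", "A", "B", "B"])

def nearest_neighbour(x):
    # Euclidean distance to every training instance, then pick the closest.
    distances = np.sqrt(((train_X - x) ** 2).sum(axis=1))
    return train_y[np.argmin(distances)]

print(nearest_neighbour(np.array([1.1, 0.9])))  # "A"
print(nearest_neighbour(np.array([4.8, 5.2])))  # "B"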
2. PROBABILISTIC MODEL
● Probabilistic in nature.
● Let X -> instance or feature variables,
Y -> target variable or class.

● X is known for a particular instance, but Y may not be.

● Conditional probability: P(Y | X).

For example: Y -> whether an e-mail is spam or ham,
X -> keywords such as lottery, free, Viagra
(the Y values depend on the X values).

● P(Y | X); Y = spam/ham; X = lottery, bonus.

Posterior Probability:

● P(Y | lottery, bonus) -> posterior probability
(after knowing the X values, we can predict Y).

● P(Y | lottery = 1, bonus = 0)
or
● P(Y = spam | lottery = 1, bonus = 0)
DECISION RULE
A simple probabilistic model.
BAYES' RULE

Bayes' rule relates the conditional probabilities:

P(Y | X) = P(X | Y) P(Y) / P(X)

● P(Y) -> prior probability of Y, before observing the data X.

● P(X) -> the probability of X (it does not depend on Y).

● P(X | Y) -> likelihood function.


Maximum A Posteriori (MAP)

● Predict the class with the maximum posterior probability.

For example, among e-mails with lottery = 1:
SPAM = 20
HAM = 10
so the MAP prediction is spam (a minimal sketch follows).
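
The sketch below works through the MAP decision for the lottery example above; the counts (20 spam, 10 ham) come from the example, everything else is plain Python for illustration.

counts = {"spam": 20, "ham": 10}            # e-mails with lottery = 1
total = sum(counts.values())

posteriors = {cls: n / total for cls, n in counts.items()}
print(posteriors)                           # roughly {'spam': 0.67, 'ham': 0.33}
print(max(posteriors, key=posteriors.get))  # MAP prediction: 'spam'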
3. Logical Models
● Rules are easy to understand.

● Rules are constructed using a feature tree.

● The tree is built by iteratively selecting features to split on.

● The leaves of the tree are labelled with class labels (or real values).

● A feature tree that always branches in the same direction (a linear tree) is called a FEATURE LIST or DECISION LIST.

A minimal feature-tree sketch follows.
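
The sketch below illustrates a feature tree learned from data; it assumes scikit-learn's DecisionTreeClassifier, and the [lottery, bonus] feature values and labels are made-up examples.

from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 1], [1, 0], [0, 1], [0, 0]]        # [lottery, bonus] indicators (illustrative)
y = ["spam", "spam", "spam", "ham"]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["lottery", "bonus"]))  # readable if-then rules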
3. FEATURES
(The workhorses of machine learning)
● A feature is a kind of measurement that can be performed on any instance in the instance space.
● Domain of a feature: real values, integers, Booleans.

● USES:
1. Feature as a split.
2. Feature as a predictor.
Feature as a SPLIT

1. UNIVARIATE: the split is based on a single feature.

2. BINARY: the split creates two subsets
(for example, instances that satisfy the condition and those that do not).

3. NON-BINARY: the split creates more than two subsets.

Feature as a PREDICTOR

● Useful in supervised learning.

For example, in a decision tree the selected features are used to

● predict the target class.


FEATURE CONSTRUCTION AND FEATURE TRANSFORMATION

FEATURE CONSTRUCTION: constructing or creating new features from the raw data.

For example: Bag of Words (a minimal sketch follows this list)

1. Scan the entire text and split it into individual words.
2. Ignore grammar and word order, but keep multiplicity.
3. Count the number of times each word appears.
4. Use these word frequencies as features.
5. Build the model on these features.
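
The sketch below builds a bag-of-words representation with only the Python standard library; the example text is made up.

from collections import Counter

text = "free lottery bonus free prize"
bag = Counter(text.lower().split())   # word order is discarded, counts are kept
print(bag)                            # Counter({'free': 2, 'lottery': 1, 'bonus': 1, 'prize': 1})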
FEATURE TRANSFORMATION

● If a feature does not fit the model, convert it.

● Transform the feature into the required format.

● For example:
real-valued features are converted into discrete ones (discretisation); a minimal sketch follows.
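
The sketch below discretises real-valued scores into three bins; the scores, bin edges, and the low/medium/high labelling are assumptions made for illustration.

import numpy as np

scores = np.array([0.3, 4.2, 5.1, 7.8, 9.6])
bins = [0, 3.3, 6.6, 10]                       # edges for low / medium / high (assumed)
labels = np.digitize(scores, bins[1:-1])       # 0 = low, 1 = medium, 2 = high
print(labels)                                  # [0 1 1 2 2]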
BINARY CLASSIFICATION AND RELATED TASKS
Binary Classification
● A classifier maps the instance space to a finite set of class labels:
ĉ: X -> C
where X is the instance space and C is the set of classes Ci.

For a labelled example (x, c(x)):
c(x) -> actual class
ĉ(x) -> predicted class
If c(x) == ĉ(x), the classification is correct.
● The class labels form a finite set.

● In binary classification there are only two possible class labels.

● Example:
E-mail recognition:
SPAM -> positive,
HAM -> negative.
Performance of a Classifier

● Confusion matrix or contingency table: SPAM: +ve, HAM: -ve

rows -> actual class
columns -> predicted class
ACCURACY
● Proportion of correctly classified positives and negatives out of the total number of instances.

Accuracy = (30 + 40)/100 = 70/100 = 0.7 (70%)

● True Positive (TP): prediction is +ve and is correct.
● True Negative (TN): prediction is -ve and is correct.
● False Positive (FP): prediction is +ve and is wrong.
● False Negative (FN): prediction is -ve and is wrong.
                      Predicted Positive         Predicted Negative          MARGINALS
Actual Positive (+)   30  TRUE POSITIVE (TP)     20  FALSE NEGATIVE (FN)     50
Actual Negative (-)   10  FALSE POSITIVE (FP)    40  TRUE NEGATIVE (TN)      50
MARGINALS             40                         60                          100

● TPR (True Positive Rate)  = 30/50 = 0.6
● FPR (False Positive Rate) = 10/50 = 0.2
● FNR (False Negative Rate) = 20/50 = 0.4
● TNR (True Negative Rate)  = 40/50 = 0.8

A short sketch computing these rates from the confusion matrix follows.
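
The sketch below simply recomputes the accuracy and the four rates from the confusion matrix above (TP = 30, FN = 20, FP = 10, TN = 40).

TP, FN, FP, TN = 30, 20, 10, 40

accuracy = (TP + TN) / (TP + FN + FP + TN)   # 0.7
TPR = TP / (TP + FN)                         # 30/50 = 0.6
FPR = FP / (FP + TN)                         # 10/50 = 0.2
FNR = FN / (TP + FN)                         # 20/50 = 0.4
TNR = TN / (FP + TN)                         # 40/50 = 0.8
print(accuracy, TPR, FPR, FNR, TNR)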


● If the numbers of positives and negatives are unequal, calculate a weighted average.

                      Predicted Positive   Predicted Negative   MARGINALS
Actual Positive       60 (TP)              15 (FN)              75
Actual Negative       10 (FP)              15 (TN)              25
MARGINALS             70                   30                   100

● Calculate the weighted average:
TPR = 60/75 = 0.80,  TNR = 15/25 = 0.60
Accuracy = 0.75 * 0.80 + 0.25 * 0.60 = 0.75
Visualising Performance (Coverage Plots)

● Given the number of positives and the number of negatives, the contingency table has two degrees of freedom: the number of true positives and the number of false positives.
● A coverage plot therefore plots true positives against false positives.

Comparing classifiers in a coverage plot:
C1: more TP, fewer FP, so C1 is better than C2.

C2: fewer TP, more FP.

C3: 40 TP, 20 FP.

● If the priority is a high TP count, choose C3.
● If the priority is a low FP count, choose C1.
● If both matter, choose either C1 or C3.
Scoring And Ranking
● A scoring classifier is a mapping from the instance space to a k-vector of real numbers:
ŝ: X -> R^k

● The score ŝi(x) indicates how likely it is that instance x belongs to class Ci.

● For binary classification we use the score for the positive class, ŝ(x).

Example: leaf scores of a feature tree, using the logarithm of the positive-to-negative ratio in each leaf:
● Left leaf:   20/40 -> ratio 1/2 -> log2(1/2) = -1
● Middle leaf: 10/5  -> ratio 2   -> log2(2)  = 1
● Right leaf:  20/5  -> ratio 4   -> log2(4)  = 2

● c(x) = +1 for positive examples,
  c(x) = -1 for negative examples.

● z(x) = c(x) * ŝ(x)
  z(x) > 0 -> correct prediction
  z(x) < 0 -> incorrect prediction

● z(x) is the margin:
a large positive margin should be rewarded,
a large negative margin should be penalised.

● 0-1 loss is a simple loss function. (A minimal margin and 0-1 loss sketch follows.)
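
The sketch below computes margins z(x) = c(x) * ŝ(x) and the resulting 0-1 loss; the true classes and scores are made-up illustrative values.

import numpy as np

c = np.array([+1, +1, -1, -1])            # true classes
s_hat = np.array([2.0, -1.0, -3.0, 0.5])  # predicted scores (assumed)

z = c * s_hat                             # margins: [ 2.  -1.   3.  -0.5]
zero_one_loss = (z < 0).mean()            # fraction of incorrect predictions: 0.5
print(z, zero_one_loss)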


Assessing and Visualising Ranking Performance

● Ranking error rate: 29%

● Ranking accuracy: 71%
ROC and AUC
● An ROC curve plots sensitivity (TPR) against 1 - specificity (FPR).

● Positive recall (sensitivity): out of all positive samples, how many did the classifier pick up?
  TPR = TP / (TP + FN)

● Negative recall (specificity): out of all negative samples, how many did the classifier pick up?
  TNR = TN / (TN + FP)

● 1 - Specificity = FP / (FP + TN) = FPR

Worked example with two positive and two negative samples (a code sketch follows this list):

1. Threshold = 0:   TPR = 2/(2+0) = 1,   FPR = 2/(0+2) = 1
2. Threshold = 0.2: TPR = 1,             FPR = 1
3. Threshold = 0.4: TPR = 1,             FPR = 0.5
4. Threshold = 0.6: TPR = 0.5,           FPR = 0.5
5. Threshold = 0.8: TPR = 0.5,           FPR = 0
6. Threshold = 1:   TPR = 0,             FPR = 0
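
The sketch below assumes scikit-learn; the labels and scores are an illustrative reconstruction chosen to be consistent with the threshold table above (the actual scores are not given in the notes).

from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [1, 0, 1, 0]            # 1 = positive, 0 = negative
y_score = [0.9, 0.7, 0.5, 0.3]    # classifier scores (assumed values)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(thresholds, tpr, fpr)))   # (threshold, TPR, FPR) points on the ROC curve
print(roc_auc_score(y_true, y_score))    # AUC = 0.75 for this toy example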
