ML Unit 1
Machine Learning
How it differs from traditional programming: a traditional program applies hand-coded rules to data, while a machine learning model learns those rules from data. The three main ingredients of machine learning:
FEATURES
TASKS
MODELS
1. Features play an important role in machine learning.
2. There is only a small number of tasks, and they use only a few different types of features.
3. Models lend the machine learning field diversity, but tasks and features give it unity.
Task: the problem that can be solved with machine learning.
CLASSIFICATION
Types:
1. Binary Classification: Two Classes
Examples: Boolean: Yes/No; Binary: 1/0; Result: Pass/Fail
ERROR RATE
The fraction of data samples that are wrongly predicted:
Error rate = (number of wrongly predicted samples) / (total number of samples)
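A minimal sketch in Python of the definition above; the label lists are made-up illustrative data.

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # actual classes (made-up)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # predicted classes (made-up)

# Error rate = wrongly predicted samples / total samples.
errors = sum(t != p for t, p in zip(y_true, y_pred))
error_rate = errors / len(y_true)
print(error_rate)  # 0.2 -> 2 of the 10 samples are wrongly predicted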
MODELS
Types:
1. Geometric Model
2. Probabilistic Model
3. Logical Model
GEOMETRIC MODEL
● INSTANCE SPACE: the set of all possible instances, whether or not they appear in the dataset.
● Easy to visualise.
LINEAR DECISION BOUNDARY (linearly separable data):
● w · x = t, where w is the weight vector and t is the threshold; instances with w · x > t are classified as positive, the rest as negative.
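A minimal sketch of such a linear classifier, assuming a two-dimensional instance space; the weight vector w and threshold t below are made-up values.

w = [2.0, 1.0]   # weight vector (made-up)
t = 3.0          # decision threshold (made-up)

def classify(x):
    score = sum(wi * xi for wi, xi in zip(w, x))  # w . x
    return "positive" if score > t else "negative"

print(classify([2.0, 0.5]))  # w . x = 4.5 > 3  -> positive
print(classify([0.5, 1.0]))  # w . x = 2.0 <= 3 -> negative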
● If the distance between instances is small, the instances are similar in terms of their features, so nearby instances will receive the same classification or belong to the same cluster.
(Figure: training instance (A) in the instance space)
EUCLIDEAN DISTANCE:
d(x, y) = sqrt( Σᵢ (xᵢ − yᵢ)² )
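A short sketch combining the Euclidean distance with the nearest-neighbour idea above; the two labelled training instances are made-up.

import math

# Euclidean distance: d(x, y) = sqrt(sum_i (x_i - y_i)^2)
def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Made-up labelled training instances.
training = [([1.0, 1.0], "A"), ([5.0, 5.0], "B")]

# Nearest-neighbour rule: an instance gets the class of its closest neighbour.
def nearest_neighbour(x):
    return min(training, key=lambda inst: euclidean(x, inst[0]))[1]

print(nearest_neighbour([1.5, 0.8]))  # closest to (1, 1) -> "A"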
2. PROBABILISTIC MODEL
● Probabilistic in nature.
● Let X → instance (feature variables), Y → target variable (class).
Posterior probability: the conditional probability P(Y | X) of class Y given instance X. By Bayes' rule:
P(Y | X) = P(X | Y) · P(Y) / P(X)
For example, suppose the word 'lottery' occurs (lottery = 1) in 20 spam e-mails (SPAM = 20) and 10 ham e-mails (HAM = 10). Then
P(SPAM | lottery = 1) = 20 / (20 + 10) ≈ 0.67,
so such an e-mail is classified as spam.
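The same posterior computed in code from the counts above.

spam_count = 20  # e-mails with lottery = 1 that are spam
ham_count = 10   # e-mails with lottery = 1 that are ham

# P(SPAM | lottery = 1) and P(HAM | lottery = 1) from the counts.
p_spam = spam_count / (spam_count + ham_count)
p_ham = ham_count / (spam_count + ham_count)
print(p_spam, p_ham)  # 0.666... 0.333... -> predict SPAM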
3. LOGICAL MODELS
● Built from if-then rules, which are easy to understand.
● The rules can be organised into a tree whose leaves carry the class labels (or class probabilities).
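A minimal sketch of such a rule-based model for the spam example; the conditions and leaf labels are made-up illustrations.

# A logical model as human-readable if-then rules.
def classify_email(email):
    if "lottery" in email["words"]:
        return "SPAM"    # leaf 1 (made-up rule)
    if email["known_sender"]:
        return "HAM"     # leaf 2 (made-up rule)
    return "SPAM"        # leaf 3 (default, made-up)

print(classify_email({"words": ["win", "lottery"], "known_sender": False}))  # SPAM
print(classify_email({"words": ["meeting"], "known_sender": True}))          # HAM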
FEATURES
(Workhorses of Machine Learning)
● A feature is a kind of measurement that can be performed on any instance in the instance space.
● Domain of a feature: real values, integers, Booleans.
● USES:
1. Features as splits.
2. Features as predictors.
Features as Splits
● For example, real-valued features are converted into discrete ones by discretisation, and the discrete values are then used to split the instance space (see the sketch below).
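A small sketch of discretisation; the bin edges are made-up.

# Discretisation: map a real-valued feature into discrete bins,
# which can then serve as splits. Bin edges are made-up.
def discretise(value, edges=(10.0, 20.0)):
    if value < edges[0]:
        return "low"
    if value < edges[1]:
        return "medium"
    return "high"

print([discretise(v) for v in [3.5, 14.2, 27.9]])  # ['low', 'medium', 'high']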
BINARY CLASSIFICATION AND RELATED TASKS
Binary Classification
● A classifier maps the instance space to a finite set of class labels:
Ĉ: X → C, where X is the instance space and C = {C1, …, Ck} is the set of classes.
● A labelled example has the form (x, C(x)), where
C(x) → actual class
Ĉ(x) → predicted class
● If Ĉ(x) == C(x), the instance is correctly classified.
● Example: e-mail recognition:
SPAM → positive class
HAM → negative class
Performance of a Classifier
Contingency table (the marginals are the row and column totals):

              Predicted +   Predicted −   Total
Actual +           30            20          50
Actual −           10            40          50
Marginals          40            60         100

Accuracy = (TP + TN) / Total = (30 + 40) / 100 = 70 / 100 = 0.7 (70%)
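The same computation in code, reading the four cells off the table above.

# Cells of the contingency table above.
TP, FN = 30, 20   # actual positives: 50
FP, TN = 10, 40   # actual negatives: 50

total = TP + FN + FP + TN
accuracy = (TP + TN) / total
print(accuracy)  # (30 + 40) / 100 = 0.7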
From the contingency table we can read off:
● the degrees of freedom of the table,
● the number of positives and the number of negatives (from the marginals),
● the true positives and the false positives (from the cells).
Visualising Performance (e.g. in a coverage plot)
● C1: more TP, fewer FP → C1 is better than C2
● C2: fewer TP, more FP
● C3: 40 TP, 20 FP
● A scoring classifier outputs a score ŝᵢ(x) for each class Cᵢ.
● z(x) is the margin: with c(x) = +1 for positive and −1 for negative instances, z(x) = c(x) · ŝ(x).
A large positive margin earns a reward (confidently correct);
a large negative margin earns a penalty (confidently wrong).
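A brief sketch of the margin under that ±1 convention; the scores are made-up.

# Margin: z(x) = c(x) * s_hat(x), with c(x) in {+1, -1}.
def margin(true_class, score):
    return true_class * score

print(margin(+1, 2.5))  #  2.5 -> large positive margin: reward
print(margin(-1, 2.5))  # -2.5 -> large negative margin: penalty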
● Positive recall: out of all positive samples, how many the classifier was able to pick up:
TP / (TP + FN)
● Negative recall: out of all negative samples, how many the classifier was able to pick up:
TN / (TN + FP)
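Both recalls computed from the contingency table above.

# Recall from the contingency table above.
TP, FN, FP, TN = 30, 20, 10, 40

pos_recall = TP / (TP + FN)  # fraction of positives picked up
neg_recall = TN / (TN + FP)  # fraction of negatives picked up
print(pos_recall, neg_recall)  # 0.6 0.8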