ML - Unit-2 - Machine Learning Algorithm

Uploaded by

pabrimoo
Copyright
© © All Rights Reserved

Unit-2

MACHINE LEARNING – ALGORITHMS AND APPLICATIONS

22MCAP401

Dr. A. Prabhakaran
Assistant Professor, Department of Computer Application / MITS
CONTENTS
• Simple Linear Regression
• Multiple Linear Regression
• Classification Methods
• Logistic Regression
• Nearest Neighbor Classifier
• Decision Trees
• Support Vector Machine
• Genetic Algorithm
What is linear regression?
1. Regression analysis is one of the most widely used methods for prediction. Linear
regression is probably the most fundamental machine learning method out there and a
starting point for the advanced analytical learning path of every aspiring data scientist.
2. A linear regression is a linear approximation of a causal relationship between two or
more variables. Regression models are highly valuable, as they are one of the most
common ways to make inferences and predictions. Apart from this, regression analysis
is also employed to determine and assess factors that affect a certain outcome in a
meaningful way.
3. Like many other statistical techniques, regression models help us make predictions about
the population based on sample data.
Sum of squares total (SST)
Sum of squares regression (SSR)
Sum of squares error (SSE)
R-Squared

R-Squared (R² or the coefficient of determination) is a statistical measure in a
regression model that determines the proportion of variance in the dependent variable
that can be explained by the independent variable.
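
As a rough illustration of how SST, SSR, SSE and R² relate, the sketch below fits a simple linear regression by ordinary least squares and computes all four quantities. The data points and the function names (`fit_simple_linear`, `sums_of_squares`) are made up for this example and are not taken from the slides.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least squares line through the data."""
    slope, intercept = fit_simple_linear(xs, ys)
    mean_y = sum(ys) / len(ys)
    predictions = [intercept + slope * x for x in xs]
    sst = sum((y - mean_y) ** 2 for y in ys)                       # total variation
    ssr = sum((p - mean_y) ** 2 for p in predictions)              # explained by the line
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))       # left unexplained
    return sst, ssr, sse


# Hypothetical toy data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sst, ssr, sse = sums_of_squares(xs, ys)
r_squared = ssr / sst  # equivalently 1 - sse / sst for a least squares fit
print(sst, ssr, sse, r_squared)
```

For a least squares fit with an intercept, SST = SSR + SSE, so R² can be read as the share of the total variation that the fitted line accounts for.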
Types of cluster analysis: hierarchical, centroid-based clustering, distribution-based
clustering, density-based clustering.
Support Vector Machine (SVM) Algorithm

Support Vector Machine (SVM) is a powerful machine learning algorithm used for
linear or nonlinear classification, regression, and even outlier detection tasks.

• Text classification
• Image classification, spam detection
• Handwriting identification, gene expression analysis
• Face detection
• Anomaly detection

Support Vector Machine (SVM) is a supervised machine learning algorithm used for both
classification and regression.
Classification Vs Regression

Classification:
Classification is a process of finding a function which helps in dividing the dataset
into classes based on different parameters.

Types of ML Classification Algorithms:
• Logistic Regression
• K-Nearest Neighbours
• Support Vector Machines
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification

Regression:
Regression is a process of finding the correlations between dependent and independent
variables.

Types of Regression Algorithm:
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
Support Vector Machine
The main objective of the SVM algorithm is to find the optimal hyperplane in an
N-dimensional space that separates the data points of different classes in the feature
space. The hyperplane is chosen so that the margin between the closest points of
different classes is as large as possible.
How does SVM work?

1. Maximum-margin hyperplane / hard margin
2. Soft margins
3. Linearly separable?
4. Kernel (new variable)
Support Vector Machine Terminology
1. Hyperplane: The hyperplane is the decision boundary used to separate the data points
of different classes.
2. Support Vectors: Support vectors are the data points closest to the hyperplane.
3. Margin: The margin is the distance between the support vectors and the hyperplane.
4. Kernel: A kernel is a mathematical function used in SVM to map the original input data
points into a high-dimensional feature space.
5. Hard Margin: The maximum-margin (hard margin) hyperplane is a hyperplane that
separates the data points of different categories without any misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM
permits a soft margin, which tolerates some margin violations or misclassifications.
7. C: The regularization parameter C balances margin maximization against
misclassification penalties.
8. Hinge Loss: Hinge loss is a typical loss function in SVMs. It penalizes incorrect
classifications and margin violations.
9. Dual Problem: The dual formulation enables the use of the kernel trick and more
efficient computation.
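
To make the terms margin, hinge loss and C more concrete, here is a minimal sketch of the usual soft-margin objective, 0.5·||w||² + C·Σ max(0, 1 − y·(w·x + b)). The weight vector, bias and data points are hypothetical values picked for illustration, and the function names are not from the slides; no training is performed, the objective is only evaluated for a fixed hyperplane.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(w, b, x, y):
    """Zero when the point is on the correct side and outside the margin."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))


def soft_margin_objective(w, b, data, C):
    """0.5 * ||w||^2 (margin term) plus C times the total hinge loss."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    penalty = sum(hinge_loss(w, b, x, y) for x, y in data)
    return regularizer + C * penalty


def margin(w):
    """Distance from the hyperplane to the margin boundary: 1 / ||w||."""
    return 1.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane and labelled points (labels are +1 or -1).
w, b = [1.0, 0.0], 0.0
data = [
    ([2.0, 0.0], +1),   # correct, outside the margin: loss 0
    ([-2.0, 1.0], -1),  # correct, outside the margin: loss 0
    ([0.5, 0.0], +1),   # correct but inside the margin: loss 0.5
    ([0.5, 1.0], -1),   # misclassified: loss 1.5
]

print(margin(w))
print(soft_margin_objective(w, b, data, C=1.0))
print(soft_margin_objective(w, b, data, C=10.0))
```

Raising C makes the same violations cost more, which pushes a trained model toward fewer violations at the price of a narrower margin.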
Types of Support Vector Machine

Linear SVM: Linear SVMs use a linear decision boundary to separate the data
points of different classes. Linear SVMs are well suited to data that is perfectly
linearly separable, meaning that a single straight line (in 2D) or a
hyperplane (in higher dimensions) can entirely divide the data points into their
respective classes.

Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot
be separated into two classes by a straight line (in the case of 2D). By using
kernel functions, nonlinear SVMs can handle nonlinearly separable data.
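
The idea of adding a new variable so that nonlinearly separable data becomes linearly separable can be sketched in one dimension. The data set, the mapping x → x² and the helper names below are assumptions made for illustration only; a real kernel SVM would not build the mapped features explicitly.

```python
def separable_by_threshold(values, labels):
    """True if a single threshold puts all +1 points on one side and all -1 on the other."""
    positives = [v for v, label in zip(values, labels) if label == 1]
    negatives = [v for v, label in zip(values, labels) if label == -1]
    return max(positives) < min(negatives) or max(negatives) < min(positives)


def feature_map(x):
    """Hypothetical new variable: the square of the original input."""
    return x * x


def polynomial_kernel(x, z):
    """Degree-2 kernel for 1D inputs; equals feature_map(x) * feature_map(z)."""
    return (x * z) ** 2


# The -1 class sits between the +1 points, so no single threshold on x works.
xs = [-3, -2, -1, 0, 1, 2, 3]
labels = [1, 1, -1, -1, -1, 1, 1]

print(separable_by_threshold(xs, labels))                            # original space
print(separable_by_threshold([feature_map(x) for x in xs], labels))  # mapped space

# The kernel gives the product in the mapped space without computing the mapping.
print(polynomial_kernel(2, 3) == feature_map(2) * feature_map(3))
```

In the mapped space any threshold between 1 and 4 on x² separates the two classes, which is the one-dimensional analogue of finding a separating hyperplane after applying a kernel.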
Decision Tree
• A decision tree is a non-parametric supervised learning algorithm for
classification and regression tasks.
• It has a hierarchical tree structure (root node, branches, internal nodes, and leaf
nodes).

Terminology:
• Root Node
• Decision Nodes
• Leaf Nodes
• Pruning
• Branch / Sub-Tree
• Parent and Child Node
Decision Tree Assumptions
• Binary Splits
• Recursive Partitioning
• Feature Independence
• Homogeneity
• Top-Down Greedy Approach
• Categorical and Numerical Features
• Overfitting
• Impurity Measures
• No Missing Values
• Equal Importance of Features
• No Outliers
• Sensitivity to Sample Size

How do decision tree algorithms work?
• Starting at the Root
• Asking the Best Questions
• Branching Out
• Repeating the Process
Entropy

Entropy(S) = −p+ log2(p+) − p– log2(p–)

Here,
p+ is the probability of the positive class
p– is the probability of the negative class
S is the subset of training examples
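Information Gain
To compute information gain, first calculate the entropy of S, then subtract the
weighted entropy of the subsets produced by the split.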
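
A small sketch of both quantities, assuming the standard definitions used by ID3-style decision trees: entropy is −Σ p·log2(p) over the classes, and information gain is the parent entropy minus the size-weighted entropy of the subsets created by a split. The labels and feature values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(labels, feature_values):
    """Entropy of the parent set minus the weighted entropy after splitting on a feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(group) / n * entropy(group) for group in groups.values())
    return entropy(labels) - weighted


# Hypothetical training subset S: four positive and four negative examples.
labels = ["+", "+", "+", "+", "-", "-", "-", "-"]

# Feature A separates the classes perfectly; feature B tells us nothing.
feature_a = ["x", "x", "x", "x", "y", "y", "y", "y"]
feature_b = ["x", "x", "y", "y", "x", "x", "y", "y"]

print(entropy(labels))
print(information_gain(labels, feature_a))
print(information_gain(labels, feature_b))
```

A tree builder that asks "the best question" at each node would pick feature A here, since it yields the larger information gain.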
Genetic Algorithm

• Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the
larger class of evolutionary algorithms.
• They are based on the ideas of natural selection and genetics.
• Genetic algorithms simulate the process of natural selection: species that can adapt to
changes in their environment survive, reproduce, and go on to the next generation.
• “Survival of the fittest”
• Each generation consists of a population of individuals.
Foundation of Genetic Algorithms

1. Individuals in the population compete for resources and mate.

2. Those individuals who are successful (fittest) then mate to create more offspring
than others.

3. Genes from the “fittest” parents propagate throughout the generation; that is,
sometimes parents create offspring that are better than either parent.

4. Thus each successive generation is better suited to its environment.

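Search space
• Each individual is a candidate solution in the search space, encoded as a vector of
variable components (analogous to a chromosome).
• These variable components are analogous to genes. Thus a chromosome (individual) is
composed of several genes (variable components).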
Operators of Genetic Algorithms
• Selection Operator
• Crossover Operator
• Mutation Operator
The whole algorithm can be summarized as –
1) Randomly initialize population p
2) Determine fitness of population
3) Until convergence repeat:
   o Select parents from population
   o Crossover and generate new population
   o Perform mutation on new population
   o Calculate fitness for new population
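
The steps above can be sketched as a short program. Everything specific here is an assumption chosen for illustration: the toy "count the ones" fitness function, tournament selection, single-point crossover, bit-flip mutation, and the parameter values. The slides do not prescribe any of these choices.

```python
import random


def fitness(chromosome):
    """Toy fitness: number of genes set to 1 (higher is better)."""
    return sum(chromosome)


def select_parent(population, rng, tournament_size=3):
    """Tournament selection: the fittest of a few randomly drawn individuals."""
    return max(rng.sample(population, tournament_size), key=fitness)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent joined to the tail of the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rng, rate=0.02):
    """Flip each gene independently with a small probability."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def run_ga(n_genes=20, pop_size=30, max_generations=200, seed=0):
    rng = random.Random(seed)
    # 1) Randomly initialize the population.
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    # 2) Determine fitness of the population (tracked via the best individual seen).
    best = max(population, key=fitness)
    generation = 0
    # 3) Repeat until convergence (all ones) or the generation budget runs out.
    while fitness(best) < n_genes and generation < max_generations:
        new_population = []
        for _ in range(pop_size):
            parent_a = select_parent(population, rng)     # select parents
            parent_b = select_parent(population, rng)
            child = crossover(parent_a, parent_b, rng)    # crossover
            new_population.append(mutate(child, rng))     # mutation
        population = new_population
        candidate = max(population, key=fitness)          # fitness of new population
        if fitness(candidate) > fitness(best):
            best = candidate
        generation += 1
    return best, generation


best, generations = run_ga()
print(fitness(best), generations)
```

The sketch keeps the best individual seen so far only for reporting; it does not copy it into the next generation, so it follows the summarized loop without adding elitism.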
IQ
1. Convex Hull in light of SVMs?
2. Hard Margin SVM and Soft Margin SVM?
3. What is Hinge Loss?
4. What’s the “kernel trick” and how is it useful?
5. What is Polynomial kernel?
6. What is the role of C in SVM? How does it affect the
bias/variance trade-off?
IQ

1. What is a genetic algorithm (GA) and how is it inspired by biological evolution?
2. Explain the terms 'chromosome,' 'gene,' and 'allele' in the context of GAs.
3. What is 'elitism' in GAs and why might it be used?
4. Define 'hypermutation' and its role in GAs.
5. Explain the concept of 'genetic drift' in GAs.
