DSF Unit 3

UNIT III

MACHINE LEARNING
The modeling process - Types of machine learning -
Supervised learning – Unsupervised learning -Semi-
supervised learning- Classification, regression -
Clustering – Outliers and Outlier Analysis.
MODELING PROCESS
 Ten steps are involved in building a better
machine learning model:
1. Problem Definition
2. Data Collection
3. Data Exploration and Preprocessing
4. Feature Selection
5. Model Selection
6. Model Training
7. Model Evaluation
8. Model Tuning
9. Model Deployment
10. Model Maintenance
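The steps above, from data collection through evaluation, can be sketched as a scikit-learn pipeline. This is only an illustrative sketch on a synthetic dataset; the particular estimators and parameters (StandardScaler, SelectKBest with k=5, LogisticRegression) are example choices, not part of the process itself.

```python
# Illustrative sketch of steps 2-7 of the modeling process with scikit-learn.
# The dataset is synthetic and every estimator choice here is an example.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 2 (data collection): a synthetic stand-in for a real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Steps 3-5: preprocessing, feature selection, and model selection,
# chained as one pipeline.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(k=5)),
    ("model", LogisticRegression()),
])

pipe.fit(X_train, y_train)                          # step 6: model training
acc = accuracy_score(y_test, pipe.predict(X_test))  # step 7: model evaluation
```

Step 8 (tuning) would typically wrap this pipeline in a cross-validated search; steps 9 and 10 (deployment and maintenance) happen outside the training script.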
Machine learning
 Machine learning is a subset of AI, which enables the
machine to automatically learn from data, improve
performance from past experiences, and make
predictions.
 Machine learning uses algorithms and data sets to
teach computers to learn from data and improve with
experience.
 In simple words, ML teaches the systems to think and
understand like humans by learning from the data.
Types of Machine Learning
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
Supervised Learning
 Supervised machine learning is based on
supervision. In this, we train the machines using
a "labelled" dataset, and based on that
training, the machine predicts the output.
 First, we train the machine with the input and
corresponding output, and then we ask the
machine to predict the output using the test
dataset.
 The main goal of the supervised learning
technique is to map the input variable (x) to
the output variable (y). Some real-world
applications of supervised learning are risk
assessment, fraud detection, spam filtering, etc.
TYPES OF SUPERVISED MACHINE LEARNING
 Supervised machine learning can be classified
into two types of problems, which are given
below:
1. Classification
2. Regression
CLASSIFICATION:
 Classification deals with predicting categorical
target variables, which represent discrete
classes or labels.
 Eg: classifying emails as spam or not spam,
or predicting whether a patient has a high risk
of heart disease.
There are two types of Classifications:
 Binary Classifier: If the classification problem
has only two possible outcomes, then it is
called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM
or NOT SPAM, CAT or DOG, etc.
 Multi-class Classifier: If a classification
problem has more than two outcomes, then it is
called a Multi-class Classifier.
Examples: classification of types of crops,
classification of types of music.
Some popular classification algorithms are
given below:
 1. Random Forest Algorithm
 2. Decision Tree Algorithm
 3. Logistic Regression Algorithm
 4. Support Vector Machine Algorithm
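As a brief illustration of one of these algorithms, the sketch below trains a logistic regression binary classifier on a synthetic dataset; the data and every parameter value are made up for demonstration.

```python
# Hedged sketch of binary classification: a logistic regression model learns
# a two-class (0/1) mapping from synthetic data; all values are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=1)
clf = LogisticRegression().fit(X, y)   # learn the input -> label mapping
pred = clf.predict(X[:5])              # predicted class labels, each 0 or 1
```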
REGRESSION:
 Regression algorithms are used if there is a
relationship between the input variable and the
output variable.
 It is used for the prediction of continuous
variables, such as weather forecasting, market
trends, etc.
 Below are some popular Regression algorithms:
1. Linear Regression
2. Regression Trees
3. Non-Linear Regression
4. Bayesian Linear Regression
5. Polynomial Regression
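A minimal linear regression sketch, assuming a single input variable with a linear relationship to the output: the closed-form least-squares fit below recovers an assumed slope of 3 and intercept of 2 from noisy synthetic data.

```python
# Minimal linear regression sketch: fit y = w*x + b by closed-form least
# squares on synthetic data (the true slope 3 and intercept 2 are assumed).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, 50)      # linear signal plus noise

A = np.column_stack([x, np.ones_like(x)])       # design matrix [x, 1]
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)  # estimated slope, intercept
```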
UNSUPERVISED LEARNING
 As its name suggests, there is no need for
supervision: in unsupervised machine learning,
the machine is trained using an unlabeled
dataset, and it predicts the output without any
supervision.
 Here, the models are trained with data that is
neither classified nor labelled, and the model
acts on that data without any supervision.
 It analyzes and clusters unlabeled datasets
using machine learning algorithms. These
algorithms find hidden patterns in data
without any human intervention.
 In other words, we do not give outputs to our
model. The training data has only input
parameter values, and the model discovers
groups or patterns on its own.
TYPES OF UNSUPERVISED MACHINE LEARNING
 Unsupervised Learning can be further
classified into two types, which are given
below:
1. Clustering
2. Association
CLUSTERING
 Clustering in unsupervised machine learning
is the process of grouping unlabeled data into
clusters based on their similarities.
 The goal of clustering is to identify patterns
and relationships in the data without any prior
knowledge of the data's meaning.
 Some common clustering algorithms:
1) K-means Clustering: partitioning data into
K clusters
2) Hierarchical Clustering: building a
hierarchical structure of clusters
3) Density-Based Clustering (DBSCAN):
identifying clusters based on density
4) Mean-Shift Clustering: finding clusters
based on mode seeking
5) Spectral Clustering: utilizing spectral
graph theory for clustering
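A brief K-means sketch on synthetic 2-D data; the two blobs and the choice K = 2 are illustrative assumptions.

```python
# K-means sketch: partition unlabeled 2-D points into K = 2 clusters.
# The two synthetic blobs and the choice K = 2 are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.5, (30, 2)),   # blob around (0, 0)
                 rng.normal(5, 0.5, (30, 2))])  # blob around (5, 5)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pts)
labels = km.labels_            # cluster index (0 or 1) for every point
```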
 ASSOCIATION:
 Association rule learning finds interesting relations
among variables within a large dataset. The main
aim is to find the dependency of one data item on
another data item and map those variables
accordingly so that it can generate maximum profit.
 Eg: Market Basket analysis, Web usage mining,
continuous production, etc.
 Some common association rule learning algorithms:
 1) Apriori Algorithm
 2) FP-Growth Algorithm
 3) Eclat Algorithm
 4) Efficient Tree-based Algorithms
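The core of these algorithms is support counting. The sketch below computes the support of item pairs in a toy market-basket dataset; the baskets are invented for illustration, and a full Apriori implementation would iterate to larger itemsets and prune by a minimum-support threshold.

```python
# Sketch of the first step of association rule mining: counting the support
# of item pairs in a toy market-basket dataset (the baskets are invented).
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = fraction of baskets containing both items.
support = {pair: n / len(baskets) for pair, n in pair_counts.items()}
```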
SEMI-SUPERVISED LEARNING
 Semi-Supervised learning is a type of Machine
Learning algorithm that represents the
intermediate ground between Supervised and
Unsupervised learning algorithms.
 It uses a combination of labeled and
unlabeled datasets during the training period.
 Assumptions followed by Semi-Supervised
Learning:
Continuity Assumption:
 Also known as the smoothness assumption, this
assumption states that data points that are close
to each other are likely to have the same label.
Cluster Assumption:
 This assumption states that data naturally form
discrete clusters, and that points in the same
cluster are more likely to have a common label.
Manifold Assumption:
 This assumption states that the data lies roughly
in a lower-dimensional space than the input space.
APPLICATIONS OF SEMI-SUPERVISED LEARNING
 Text document classification
 Anomaly detection - fraud detection
 NLP - sentiment analysis
 Speech recognition
 Medical imaging - tumor detection, disease
classification
 Drug discovery
OUTLIER
 Outliers in machine learning refer to data
points that are significantly different from the
majority of the data. These data points can be
anomalous, noisy, or errors in measurement.
 An outlier is a data point that significantly
deviates from the rest of the data.
 It can be either much higher or much lower
than the other data points, and its presence
can have a significant impact on the results of
machine learning algorithms.
 They can be caused by measurement or
execution errors. The analysis of outlier data
is referred to as outlier analysis or outlier
mining.
TYPES OF OUTLIERS
1. Global outliers:
 Isolated data points that are far away from the
main body of the data.
 Often easy to identify and remove.
2. Contextual outliers:
 These are unusual in a specific context but
may not be outliers in a different context.
 Often more difficult to identify and may
require additional information or domain
knowledge to determine their significance.
Outlier Detection Methods in Machine Learning
1. Statistical Methods:
 Z-Score: This method calculates how many standard
deviations each data point lies from the mean, and
identifies outliers as those with Z-scores exceeding
a certain threshold (typically 3 or -3).
 Interquartile Range (IQR): IQR identifies outliers as
data points falling outside the range defined by
Q1 - k*(Q3-Q1) and Q3 + k*(Q3-Q1), where Q1 and Q3 are
the first and third quartiles, and k is a factor
(typically 1.5).
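Both statistical methods can be sketched in a few lines of NumPy; the sample data and its single planted outlier are illustrative.

```python
# Z-score and IQR outlier checks, using the thresholds from the text
# (|z| > 3, k = 1.5). The sample and its planted outlier are illustrative.
import numpy as np

data = np.array([10, 12, 11, 13] * 5 + [95.0])   # 95 is the planted outlier

z = (data - data.mean()) / data.std()            # standard scores
z_outliers = data[np.abs(z) > 3]

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr          # the IQR "fences"
iqr_outliers = data[(data < lo) | (data > hi)]
```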
2. Distance-Based Methods:
 K-Nearest Neighbors (KNN): KNN identifies outliers as
data points whose K nearest neighbors are far away
from them.
 Local Outlier Factor (LOF): This method calculates the
local density of data points and identifies outliers as
those with significantly lower density compared to their
neighbors.
3. Clustering-Based Methods:
 Density-Based Spatial Clustering of
Applications with Noise (DBSCAN):
DBSCAN clusters data points based on their
density and identifies outliers as points not
belonging to any cluster.
 Hierarchical clustering:
Hierarchical clustering involves building a
hierarchy of clusters by iteratively merging or
splitting clusters based on their
similarity. Outliers can be identified as clusters
containing only a single data point or clusters
significantly smaller than others.
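A short DBSCAN sketch using scikit-learn, where points assigned the label -1 are the outliers; the eps and min_samples values are illustrative assumptions.

```python
# DBSCAN sketch: dense points form a cluster, and any point assigned the
# label -1 is treated as an outlier. eps and min_samples are illustrative.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
dense = rng.normal(0, 0.3, (40, 2))      # one dense blob of points
stray = np.array([[8.0, 8.0]])           # an isolated point far from the blob
pts = np.vstack([dense, stray])

labels = DBSCAN(eps=1.0, min_samples=5).fit(pts).labels_
outliers = pts[labels == -1]             # the isolated point lands here
```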
Techniques for Handling Outliers in Machine Learning
1. Removal:
 This involves identifying and removing outliers
from the dataset before training the
model. Common methods include:
◦ Thresholding: Outliers are identified as data points
exceeding a certain threshold (e.g., Z-score > 3).
◦ Distance-based methods: Outliers are identified
based on their distance from their nearest neighbors.
◦ Clustering: Outliers are identified as points not
belonging to any cluster or belonging to very small
clusters.
2. Transformation:
 This involves transforming the data to reduce
the influence of outliers. Common methods
include:
◦ Scaling: Standardizing or normalizing the data to
have a mean of zero and a standard deviation of one.
◦ Winsorization: Replacing outlier values with the
nearest non-outlier value.
◦ Log transformation: Applying a logarithmic
transformation to compress the data and reduce the
impact of extreme values.
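Winsorization and the log transformation can be sketched with NumPy as follows; the sample values and the 5th/95th-percentile limits are illustrative choices.

```python
# Transformation sketches: winsorization clips extremes to percentile limits,
# and a log transform compresses the long tail. Values here are illustrative.
import numpy as np

data = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 100.0])

lo, hi = np.percentile(data, [5, 95])    # 5th/95th-percentile limits
winsorized = np.clip(data, lo, hi)       # 100.0 is pulled down to `hi`

logged = np.log1p(data)                  # log(1 + x), safe at zero
```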
3. Robust Estimation:
 This involves using algorithms that are less
sensitive to outliers. Some examples include:
◦ Robust regression: Algorithms like L1-regularized
regression or Huber regression are less influenced by
outliers than least squares regression.
◦ M-estimators: These algorithms estimate the model
parameters based on a robust objective function that
down-weights the influence of outliers.
◦ Outlier-insensitive clustering
algorithms: Algorithms like DBSCAN are less
susceptible to the presence of outliers than K-means
clustering.
4. Modeling Outliers:
 This involves explicitly modeling the outliers as
a separate group. This can be done by:
◦ Adding a separate feature: Create a new feature
indicating whether a data point is an outlier or not.
◦ Using a mixture model: Train a model that assumes
the data comes from a mixture of multiple
distributions, where one distribution represents the
outliers.
TECHNIQUES FOR OUTLIER ANALYSIS
1. Visual inspection: using plots to identify
outliers
2. Statistical methods: using metrics like
mean, median, and standard deviation to detect
outliers
3. Machine learning algorithms: using
algorithms like One-Class SVM, Local Outlier
Factor (LOF), and Isolation Forest to detect
outliers.
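As a closing sketch, the Isolation Forest detector from scikit-learn flags points that random splits isolate quickly; the synthetic data and the contamination value are illustrative.

```python
# Isolation Forest sketch: points that random splits isolate quickly score
# as outliers (-1); inliers score +1. Data and contamination are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),   # inlier cloud
               [[10.0, 10.0]]])              # one extreme point

iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = iso.predict(X)                       # -1 = outlier, +1 = inlier
```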
