ALGORITHM SELECTION

COMPARISON OF ML ALGORITHMS
VARIOUS ML ALGORITHMS

PROS & CONS


Random Forest
Best at: almost any machine learning problem; bioinformatics
Pros: can work in parallel; seldom overfits; automatically handles missing values; no need to transform any variable; no need to tweak parameters; can be used by almost anyone with excellent results
Cons: difficult to interpret; weaker on regression when estimating values at the extremities of the distribution of response values; biased in multiclass problems toward more frequent classes

Linear regression
Best at: baseline predictions; econometric predictions; modelling marketing responses
Pros: simple to understand and explain; seldom overfits; L1 & L2 regularization is effective in feature selection; fast to train; easy to train on big data thanks to its stochastic version
Cons: you have to work hard to make it fit nonlinear functions; can suffer from outliers

Support Vector Machines
Best at: character recognition; image recognition; text classification
Pros: automatic nonlinear feature creation; can approximate complex nonlinear functions
Cons: difficult to interpret when applying nonlinear kernels; suffers from too many examples (after 10,000 examples it starts taking too long to train)

K-nearest Neighbors
Best at: computer vision; multilabel tagging; recommender systems; spell checking problems
Pros: fast, lazy training; can naturally handle extreme multiclass problems (like tagging text)
Cons: slow and cumbersome in the predicting phase; can fail to predict correctly due to the curse of dimensionality

Naive Bayes
Best at: face recognition; sentiment analysis; spam detection; text classification
Pros: easy and fast to implement; doesn't require too much memory; can be used for online learning; easy to understand; takes prior knowledge into account
Cons: strong and unrealistic feature independence assumptions; fails at estimating rare occurrences; suffers from irrelevant features

Logistic regression
Best at: ordering results by probability; modelling marketing responses
Pros: simple to understand and explain; seldom overfits; L1 & L2 regularization is effective in feature selection; the best algorithm for predicting probabilities of an event; fast to train; easy to train on big data thanks to its stochastic version
Cons: you have to work hard to make it fit nonlinear functions; can suffer from outliers

K-means
Best at: segmentation
Pros: fast in finding clusters; can detect outliers in multiple dimensions
Cons: suffers from multicollinearity; clusters are spherical and it cannot detect groups of other shapes; unstable solutions that depend on initialization
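These trade-offs can also be checked empirically. Below is a minimal sketch, assuming scikit-learn and its built-in breast cancer dataset (an illustrative choice, not part of the case material), that cross-validates several of the classifiers from the table on a single dataset:

```python
# Minimal sketch: cross-validating several of the classifiers compared above
# on one dataset. Dataset choice and 5-fold setup are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Naive Bayes": GaussianNB(),
}

# Mean 5-fold cross-validated accuracy for each model.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:20s} mean accuracy = {scores.mean():.3f}")
```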


CALCULATIONS AND INTERPRETATION OF RESULTS

NAÏVE BAYES

NAÏVE BAYES ALGORITHM: THEORY AND CALCULATIONS

Bayes' theorem provides a way of calculating the posterior probability, P(c|x) = P(x|c) × P(c) / P(x), from the prior P(c), the evidence P(x), and the likelihood P(x|c). The Naive Bayes classifier assumes that the effect of the value of a predictor (x) on a given class (c) is independent of the values of the other predictors. This assumption is called class conditional independence.

 P(c|x) is the posterior probability of the class (target) given the predictor (attribute).
 P(c) is the prior probability of the class.
 P(x|c) is the likelihood: the probability of the predictor given the class.
 P(x) is the prior probability of the predictor.

The posterior probability is calculated by first constructing a frequency table for each attribute against the target, then transforming the frequency tables into likelihood tables, and finally using the naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the prediction.


The joint posterior probability for 8 independent variables is a logical extension of the two-variable case: take the product of all 8 conditional probabilities. Since a Python library or any statistical software will do the arithmetic, demonstrating the logic with one sample calculation is adequate.
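As a hedged illustration of that logic, the sketch below computes posteriors for two classes from a two-attribute observation; the attribute names, values, and probabilities are made-up assumptions for demonstration, not figures from the Diabetes case, and the 8-variable version simply multiplies more likelihood terms:

```python
# Minimal sketch of the naive Bayes posterior calculation, assuming a
# hypothetical binary target ("yes"/"no" for diabetic) and made-up
# likelihood tables; real values would come from the Diabetes case data.

priors = {"yes": 0.35, "no": 0.65}  # P(c): class priors from the training set

# P(x|c): per-attribute likelihoods read off the likelihood tables.
# Keys are (attribute, observed value); values are P(value | class).
likelihoods = {
    "yes": {("glucose", "high"): 0.60, ("bmi", "high"): 0.55},
    "no":  {("glucose", "high"): 0.20, ("bmi", "high"): 0.30},
}

observation = [("glucose", "high"), ("bmi", "high")]

# Unnormalized posterior: P(c) times the product of P(x_i | c) over attributes.
scores = {}
for c, prior in priors.items():
    score = prior
    for attr_value in observation:
        score *= likelihoods[c][attr_value]
    scores[c] = score

# Normalize by P(x) so the posteriors sum to 1, then pick the largest.
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}
print(posteriors)
print("Predicted class:", max(posteriors, key=posteriors.get))
```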

PRESENTATION OF RESULTS: CONFUSION MATRIX

Learners are expected to build likelihood tables for all 8 variables. (Depending on the time available, they may do so for only one or two variables.)

 True positives (TP): These are cases in which we predicted yes (they have the disease),
and they do have the disease.
 True negatives (TN): We predicted no, and they don't have the disease.
 False positives (FP): We predicted yes, but they don't actually have the disease. (Also
known as a "Type I error.")
 False negatives (FN): We predicted no, but they actually do have the disease. (Also
known as a "Type II error.")

Sample confusion matrix (exact numbers to be populated from the Diabetes case; the illustrative counts below match the calculations in the next section):

                Predicted: no    Predicted: yes    Total
Actual: no      TN = 50          FP = 10           60
Actual: yes     FN = 5           TP = 100          105
Total           55               110               165
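A minimal sketch of producing such a matrix in code, assuming scikit-learn and made-up binary labels (the real labels would come from the Diabetes case), might look like this:

```python
# Hedged sketch: computing a confusion matrix and derived scores with
# scikit-learn. The label vectors here are illustrative assumptions.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # 1 = has the disease, 0 = does not
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # classifier output

# For binary 0/1 labels, ravel() yields the four cells in TN, FP, FN, TP order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print("Accuracy: ", accuracy_score(y_true, y_pred))   # (TP+TN)/total
print("Recall:   ", recall_score(y_true, y_pred))     # TP / actual yes
print("Precision:", precision_score(y_true, y_pred))  # TP / predicted yes
```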


INTERPRETATION OF RESULTS: TESTING ACCURACY

Accuracy is not a single number; depending on the precise question being answered, it can be any of the following:

 Accuracy: Overall, how often is the classifier correct?
o (TP+TN)/total = (100+50)/165 = 0.91
 Misclassification Rate: Overall, how often is it wrong?
o (FP+FN)/total = (10+5)/165 = 0.09
o equivalent to 1 minus Accuracy
o also known as "Error Rate"
 True Positive Rate: When it's actually yes, how often does it predict yes?
o TP/actual yes = 100/105 = 0.95
o also known as "Sensitivity" or "Recall"
 False Positive Rate: When it's actually no, how often does it predict yes?
o FP/actual no = 10/60 = 0.17
 Specificity: When it's actually no, how often does it predict no?
o TN/actual no = 50/60 = 0.83
o equivalent to 1 minus False Positive Rate
 Precision: When it predicts yes, how often is it correct?
o TP/predicted yes = 100/110 = 0.91
 Prevalence: How often does the yes condition actually occur in our sample?
o actual yes/total = 105/165 = 0.64
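These rates are plain arithmetic on the four cell counts; the sketch below reproduces the numbers above from TP = 100, TN = 50, FP = 10, FN = 5:

```python
# Reproducing the accuracy metrics above from the sample confusion matrix.
TP, TN, FP, FN = 100, 50, 10, 5
total = TP + TN + FP + FN                               # 165

print("Accuracy:              ", (TP + TN) / total)     # 0.91
print("Misclassification rate:", (FP + FN) / total)     # 0.09
print("True positive rate:    ", TP / (TP + FN))        # 0.95 (recall)
print("False positive rate:   ", FP / (FP + TN))        # 0.17
print("Specificity:           ", TN / (FP + TN))        # 0.83
print("Precision:             ", TP / (TP + FP))        # 0.91
print("Prevalence:            ", (TP + FN) / total)     # 0.64
```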

SUPPORT VECTOR MACHINES

THEORY AND CALCULATIONS

The algorithm is best explained using visualization and basic concepts in coordinate geometry.
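A minimal sketch of such a visualization, assuming scikit-learn and matplotlib and using synthetic two-dimensional data (an illustrative assumption), plots the separating line and the support vectors:

```python
# Hedged sketch: visualizing a linear SVM's separating line in 2-D.
# The synthetic blobs stand in for any two-class, two-feature dataset.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# Separating line in coordinate-geometry form: w0*x + w1*y + b = 0,
# rearranged to y = -(w0*x + b) / w1.
w, b = clf.coef_[0], clf.intercept_[0]
xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.plot(xs, -(w[0] * xs + b) / w[1], "k-")
# Circle the support vectors, the points that define the margin.
plt.scatter(*clf.support_vectors_.T, s=120, facecolors="none", edgecolors="k")
plt.title("Linear SVM: separating line and support vectors")
plt.show()
```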


INTERPRETATION OF WEIGHTS

The magnitude of each weight, after due normalization of the variables, helps in deciding feature importance. The interpretation of the weights can be explained using OLS (ordinary least squares) regression as an analogy.
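As a hedged sketch of that idea, the example below fits a linear SVM on standardized features and ranks features by the magnitude of their weights; the dataset is an illustrative assumption:

```python
# Hedged sketch: reading feature importance off a linear SVM's weights.
# With standardized inputs, larger |coefficient| means a more influential
# feature. Dataset choice is an illustrative assumption.
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names

# Standardize first so the weight magnitudes are comparable across features.
model = make_pipeline(StandardScaler(), LinearSVC(dual=False))
model.fit(X, y)

weights = model.named_steps["linearsvc"].coef_.ravel()
# Top 5 features by absolute weight.
for name, w in sorted(zip(names, weights), key=lambda t: -abs(t[1]))[:5]:
    print(f"{name:25s} {w:+.3f}")
```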


A detailed worked example is available at:

https://charlienewey.github.io/manually-calculating-an-svms-support-vectors/

INTERPRETATION OF RESULTS: TESTING ACCURACY

While the exact results will vary between Naïve Bayes and SVM, the interpretation of testing accuracy is the same; refer to the previous section on Naïve Bayes accuracy.


HEALTHCARE APPLICATIONS USING ML ALGORITHMS

The value of machine learning in healthcare is its ability to process huge datasets beyond the scope of human
capability, and then reliably convert analysis of that data into clinical insights that aid physicians in planning and
providing care, ultimately leading to better outcomes, lower costs of care, and increased patient satisfaction.

It has been estimated that big data and machine learning in pharma and medicine could generate a value of up to
$100B annually, based on better decision-making, improved efficiency of research/clinical trials, and new tool
creation for physicians, consumers, insurers, and regulators.

a. Disease identification / diagnosis - A clinical decision support system (CDSS) is a platform that analyzes data and loops it back to physicians in real time to aid clinical decision making. A physician sees a patient and enters symptoms, data, and test results into the EMR; machine learning behind the scenes looks at everything about that patient and prompts the doctor with useful information for making a diagnosis, ordering a test, or suggesting a preventive screening. In the long term, bigger sets of data will be incorporated and analyzed in real time to provide all kinds of information to the provider and patient.
b. Show causal relationships in disease prognosis and help in predictions.
c. Patient risk profile – depending on various signs and symptoms and lifestyle factors.
d. Gather public health data and predict epidemic outbreaks.
e. Reduce 1-year mortality - Health systems can reduce 1-year mortality rates by predicting the likelihood of death within one year of discharge and then matching patients with appropriate interventions, care providers, and support.

NEW SERVICES & PRODUCTS - CLINICAL


1. Applying ML classification algorithms to the diabetes dataset is a representative case study of classifying any clinical dataset into infected/not-infected categories: for example, the likelihood of cancer or hypertension, through careful selection of the variables relevant to predicting that particular disease.
2. Predict chronic disease - Machine learning can help hospital systems identify patients with
undiagnosed or misdiagnosed chronic disease, predict the likelihood that patients will develop
chronic disease, and present patient-specific prevention interventions.
3. If time-series data is available, diseases can be predicted from historic data using ML.
4. New products can be developed by integrating ML algorithms into existing diagnostic solutions. For example, medical image classification uses similar ML classifiers to label retinal images as diabetic retinopathy absent/present. Such diagnostic reports can be made more informative for radiologists by adding ML-derived insights.

NEW SERVICES & PRODUCTS - HEALTHCARE MANAGEMENT


a. Reduce readmissions - Machine learning can reduce readmissions in a targeted, efficient, and patient-
centered manner. Clinicians can receive daily guidance as to which patients are most likely to be readmitted
and how they might be able to reduce that risk.
b. Prevent hospital acquired infections (HAIs). Clinicians can monitor high risk patients and intervene to
reduce that risk by focusing on patient-specific risk factors.


c. Reduce hospital length-of-stay (LOS). Health systems can reduce LOS and improve other outcomes like patient satisfaction by identifying patients who are likely to have an increased LOS and then ensuring that best practices are followed.
d. Predict propensity-to-pay - Health systems can determine who needs reminders, who needs financial
assistance, and how the likelihood of payment changes over time and after particular events.
e. Predict no-shows - Health systems can create accurate predictive models to assess, with each scheduled
appointment, the risk of a no-show, ultimately improving patient care and the efficient use of resources.
