Machine Learning Classification for Breast Cancer Diagnosis:
A Comparative Study of Ensemble and Nonlinear Classifiers

Name: Karthick R    Roll No: 242SP015

1. Introduction
In machine learning, classification tasks are fundamental to
predictive modeling: the goal is to predict labels for input
samples based on their features. In binary classification, the
task is to assign each sample to one of two classes, a setting
common in fields such as healthcare and finance. Healthcare, for
example, often uses binary classification to distinguish cancerous
from non-cancerous cells.

Ensemble learning methods and nonlinear classifiers have gained
significant attention due to their ability to improve predictive
performance, particularly on complex, high-dimensional datasets.
Ensemble methods combine the results of multiple models to produce
more robust and accurate predictions. This report explores several
classifiers: a single Decision Tree as a baseline, the ensemble
methods Random Forest and AdaBoost built on it, and the nonlinear
models Support Vector Machine (SVM) and k-Nearest Neighbors (k-NN).
All models are evaluated on the well-known Breast Cancer Wisconsin
dataset.

2. Dataset Description
The dataset used for evaluation is the Breast Cancer Wisconsin
(Diagnostic) dataset, which has been widely used for benchmarking
binary classification models.

Number of Samples: 569

Features: 30 numeric attributes, including measurements like
radius, texture, perimeter, area, smoothness, compactness,
concavity, and symmetry.

Target Classes (as encoded in scikit-learn's load_breast_cancer):

0: Malignant (cancerous)

1: Benign (non-cancerous)

The dataset is moderately imbalanced, with 357 benign and 212
malignant samples, and its clean, well-characterized features make
it a standard benchmark for classification models.
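
A minimal sketch of loading the data, assuming the copy bundled
with scikit-learn is used (the 80/20 stratified split and random
seed are illustrative assumptions, reused by the classifier
sketches below):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    import numpy as np

    # Load the Breast Cancer Wisconsin (Diagnostic) dataset.
    data = load_breast_cancer()
    X, y = data.data, data.target

    print(X.shape)  # (569, 30): 569 samples, 30 numeric features
    # In scikit-learn's encoding, 0 = malignant and 1 = benign.
    print(dict(zip(data.target_names, np.bincount(y))))
    # {'malignant': 212, 'benign': 357}

    # Stratified 80/20 train/test split, reused in later sketches.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)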

3. Ensemble Classifiers
Ensemble methods combine multiple individual models to produce a
final prediction, often improving accuracy and robustness. This
section first describes the decision tree, the base learner used by
both ensembles below, and then the Random Forest and AdaBoost
ensembles built on it.

3.1 Decision Tree

A decision tree splits the data into subsets based on the most
significant feature at each node. These splits are applied
recursively until each subset is pure, i.e., contains instances of
only one class, or until a stopping criterion such as a maximum
depth is reached.

Pros:

Easy to interpret and visualize.

Can model both linear and nonlinear relationships.

Cons:

Can overfit if not pruned or limited in depth.

Sensitive to noise in the data.
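
A minimal sketch, reusing the split from Section 2 (the max_depth
value is an assumed example, not a tuned setting); capping the
depth prunes the tree and mitigates the overfitting noted above:

    from sklearn.tree import DecisionTreeClassifier

    # A shallow tree trades some training accuracy for better
    # generalization on unseen data.
    tree = DecisionTreeClassifier(max_depth=4, random_state=42)
    tree.fit(X_train, y_train)
    print("Decision Tree test accuracy:", tree.score(X_test, y_test))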

3.2 Random Forest

Random Forest is an ensemble of decision trees. Each tree is built
on a random bootstrap sample of the data, with a random subset of
features considered at each split, and the trees' predictions are
aggregated (majority voting for classification, averaging for
regression) to produce the final result.

Pros:

Reduces overfitting compared to individual decision trees.

Robust to noise and outliers.

Works well for both classification and regression.

Cons:

Less interpretable than a single decision tree.

Can be slower to train and predict due to the large number of
trees.
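
A minimal sketch, reusing the split from Section 2 (n_estimators=100
is an illustrative default, not a tuned value):

    from sklearn.ensemble import RandomForestClassifier

    # 100 trees, each grown on a bootstrap sample with a random
    # feature subset at every split; the final class is the
    # majority vote across trees.
    forest = RandomForestClassifier(n_estimators=100, random_state=42)
    forest.fit(X_train, y_train)
    print("Random Forest test accuracy:", forest.score(X_test, y_test))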

3.3 AdaBoost

AdaBoost (Adaptive Boosting) works by iteratively training weak
learners, typically shallow decision trees, and focusing on the
errors made by previous learners: after each round, misclassified
samples receive higher weight, so each subsequent model concentrates
on correcting the mistakes of its predecessors.

Pros:

Often achieves high accuracy, even with weak learners.

Can reduce bias and variance.

Cons:

Sensitive to outliers and noisy data.

May overfit if the model complexity is too high.
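
A minimal sketch, reusing the split from Section 2; scikit-learn's
AdaBoostClassifier uses a depth-1 decision stump as its default weak
learner (n_estimators=100 is an illustrative assumption):

    from sklearn.ensemble import AdaBoostClassifier

    # Each boosting round re-weights the training samples so that
    # previously misclassified points get more attention.
    ada = AdaBoostClassifier(n_estimators=100, random_state=42)
    ada.fit(X_train, y_train)
    print("AdaBoost test accuracy:", ada.score(X_test, y_test))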

4. Nonlinear Classifiers
Nonlinear classifiers are especially useful when the boundary
between classes cannot be described by a straight line or
hyperplane.

4.1 Support Vector Machine (SVM) with RBF Kernel

SVM aims to find a hyperplane that separates the classes with the
maximum margin. The Radial Basis Function (RBF) kernel allows SVM to
operate in a higher-dimensional space, enabling it to model complex
decision boundaries.

Pros:

Effective in high-dimensional spaces.

Performs well in cases where the classes are not linearly
separable.

Cons:

Computationally expensive, especially with large datasets.

Requires careful tuning of hyperparameters (C and gamma).
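
A minimal sketch, reusing the split from Section 2; the RBF kernel
is sensitive to feature scale, so the features are standardized
first, and a small illustrative grid search covers the C and gamma
hyperparameters mentioned above:

    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import GridSearchCV

    svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

    # Illustrative grid, not the report's actual search space.
    param_grid = {"svc__C": [0.1, 1, 10],
                  "svc__gamma": ["scale", 0.01, 0.1]}
    grid = GridSearchCV(svm, param_grid, cv=5)
    grid.fit(X_train, y_train)
    print("SVM (RBF) test accuracy:", grid.score(X_test, y_test))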

4.2 k-Nearest Neighbors (k-NN)

k-NN classifies a sample by taking the majority class among the k
nearest training samples in the feature space. It is a simple,
intuitive algorithm that makes predictions based on the proximity of
data points.

Pros:

No training phase (instance-based learning).

Simple and easy to understand.

Cons:

Computationally expensive during inference (as it requires
calculating distances to all training points).

Sensitive to the scale of features and the choice of k.

Can be slow with large datasets and noisy data.
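
A minimal sketch, reusing the split from Section 2; since k-NN is
sensitive to feature scale, the features are standardized first
(k=5 is a common default, assumed here rather than tuned):

    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    # An odd k avoids ties in binary majority voting.
    knn = make_pipeline(StandardScaler(),
                        KNeighborsClassifier(n_neighbors=5))
    knn.fit(X_train, y_train)  # "fitting" just stores the samples
    print("k-NN test accuracy:", knn.score(X_test, y_test))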


5. Evaluation Metrics
To assess the performance of each classifier, we use several
standard evaluation metrics:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F1-score = 2 * (Precision * Recall) / (Precision + Recall)

Where:

TP: True positives

TN: True negatives

FP: False positives

FN: False negatives
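
As a quick illustration of these formulas (the labels below are
hypothetical, not results from this study), the four counts can be
read off a confusion matrix:

    from sklearn.metrics import confusion_matrix

    # Hypothetical labels for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

    # For binary labels, ravel() yields TN, FP, FN, TP.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.75
    precision = tp / (tp + fp)                          # 0.75
    recall = tp / (tp + fn)                             # 0.75
    f1 = 2 * precision * recall / (precision + recall)  # 0.75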

6. Code
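
Below is a minimal end-to-end sketch of the experiment, assuming
scikit-learn; the hyperparameter values and the stratified 80/20
split are illustrative assumptions rather than the report's exact
settings:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
    from sklearn.svm import SVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, f1_score)

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Scale-sensitive models (SVM, k-NN) get a StandardScaler step.
    models = {
        "Decision Tree": DecisionTreeClassifier(max_depth=4,
                                                random_state=42),
        "Random Forest": RandomForestClassifier(n_estimators=100,
                                                random_state=42),
        "AdaBoost": AdaBoostClassifier(n_estimators=100,
                                       random_state=42),
        "SVM (RBF)": make_pipeline(StandardScaler(),
                                   SVC(kernel="rbf")),
        "k-NN": make_pipeline(StandardScaler(),
                              KNeighborsClassifier(n_neighbors=5)),
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        print(f"{name:13s}"
              f" acc={accuracy_score(y_test, y_pred):.3f}"
              f" prec={precision_score(y_test, y_pred):.3f}"
              f" rec={recall_score(y_test, y_pred):.3f}"
              f" f1={f1_score(y_test, y_pred):.3f}")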

7. Results
The performance of the models on the Breast Cancer Wisconsin dataset
was evaluated using the metrics mentioned above.

Best Models:

Random Forest and AdaBoost achieved the highest accuracy and
robustness. Random Forest's ensemble nature helped reduce
overfitting, while AdaBoost's iterative focus on correcting errors
yielded strong performance even with very simple weak learners.

SVM with the RBF kernel also performed well, particularly because
it can handle data that is not linearly separable. However, it
required more tuning and computational power.

Model Comparison:

Decision Tree performed well but was prone to overfitting and had
lower accuracy compared to the ensemble methods.

k-NN showed good performance on small datasets but struggled with
larger, more complex data due to its sensitivity to feature scaling
and its computational cost.

8. Conclusion
Ensemble learning techniques like Random Forest and AdaBoost offer
significant robustness and accuracy, making them ideal for real-world
applications in domains such as healthcare, where precision is
critical. Nonlinear models like SVM are valuable when the data is
highly complex and non-linearly separable, while k-NN provides a
simple but effective solution for smaller, less complex datasets. The
choice of model depends on the dataset's characteristics, problem
requirements, and computational resources.

End of the Report
