Study of Ensemble Classifiers
1. Introduction
In machine learning, classification is a fundamental predictive
modeling task: the goal is to predict a label for each input sample
from its features. Binary classification assigns each sample to one
of two classes and is common in fields such as healthcare and
finance. In healthcare, for example, it is used to distinguish
cancerous from non-cancerous cells.
2. Dataset Description
The dataset used for evaluation is the Breast Cancer Wisconsin
(Diagnostic) dataset, which has been widely used for benchmarking
binary classification models.
The class labels are encoded as follows:
0: Benign (non-cancerous)
1: Malignant (cancerous)
3. Ensemble Classifiers
Ensemble methods combine the predictions of multiple individual
models into a final prediction, which often improves accuracy and
robustness compared with any single model.
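One simple way to combine base models is majority voting. The sketch
below is illustrative only: the three threshold rules are hypothetical
stand-ins for trained models, and the helper names majority_vote and
ensemble_predict are assumptions, not part of the study's code.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the base models' predictions."""
    # most_common(1) yields [(label, count)] for the top-ranked label.
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(models, sample):
    """Collect one prediction per base model and combine them by vote."""
    return majority_vote([model(sample) for model in models])


# Hypothetical base models: each maps a feature vector to
# 0 (benign) or 1 (malignant). Real base models would be trained.
models = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.3),
    lambda x: int(x[0] + x[1] > 1.0),
]

print(ensemble_predict(models, [0.6, 0.2]))
print(ensemble_predict(models, [0.9, 0.4]))
```

With an odd number of base models and two classes, a vote can never
tie, which is one reason small ensembles often use an odd count.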
A decision tree splits the data into subsets at each node, using the
feature (and threshold) that best separates the classes. The splits
are applied recursively until the data in each subset is pure, i.e.,
it contains instances of only one class, or until a stopping
criterion such as a maximum depth is reached.
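The idea of a pure subset can be made concrete with an impurity
measure. The sketch below assumes Gini impurity as the split
criterion (the text does not say which criterion was used) and
searches one feature for the threshold that minimizes the weighted
impurity of the two resulting subsets. The toy values and the helper
names gini and best_threshold are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels; 0.0 means the subset is pure."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split on one feature.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit impurity, threshold is None.
    """
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Toy single-feature data (assumed for illustration, not from the dataset).
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_threshold(values, labels)
print(threshold, impurity)
```

A full tree would repeat this search over every feature at every
node and recurse into the left and right subsets.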
Pros:
Cons:
Cons:
3.3 AdaBoost
Pros:
Cons:
4. Nonlinear Classifiers
Nonlinear classifiers are especially useful when the relationship
between features and classes cannot be described by a straight line
or hyperplane.
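A classic illustration is XOR-style data, where the class depends on
whether two features have opposite signs. The sketch below uses four
hypothetical points and a small grid of candidate linear rules. No
linear rule classifies all four points correctly, while a rule based
on the product of the two features does.

```python
# XOR-style toy data: the label is 1 exactly when the signs differ.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [0, 1, 1, 0]


def linear_rule(w1, w2, b):
    """Classifier that predicts 1 when w1*x1 + w2*x2 + b is positive."""
    return lambda x: int(w1 * x[0] + w2 * x[1] + b > 0)


def accuracy(rule):
    """Fraction of the toy points that the rule labels correctly."""
    return sum(rule(p) == y for p, y in zip(points, labels)) / len(points)


# Try every linear rule with weights and bias drawn from a small grid.
grid = [-1, 0, 1]
best_linear = max(
    accuracy(linear_rule(w1, w2, b))
    for w1 in grid for w2 in grid for b in grid
)


def nonlinear_rule(x):
    """Nonlinear rule: predict 1 when the product of the features is negative."""
    return int(x[0] * x[1] < 0)


print(best_linear, accuracy(nonlinear_rule))
```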
4.1 Support Vector Machine (SVM) with RBF Kernel
An SVM finds the hyperplane that separates the classes with the
maximum margin. The Radial Basis Function (RBF) kernel implicitly
maps the data into a higher-dimensional space, so a linear separator
in that space corresponds to a complex, nonlinear decision boundary
in the original feature space.
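The RBF kernel measures similarity as
K(x, z) = exp(-gamma * ||x - z||^2), and a kernel SVM classifies a
point by the sign of a weighted sum of kernel values against its
support vectors. The sketch below is a minimal illustration under
stated assumptions: the support vectors, coefficients, bias, and
gamma value are hand-picked, not learned from the dataset, and the
function names are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and z)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, coeffs, bias=0.0, gamma=0.5):
    """Kernel SVM decision function: sum_i coeff_i * K(sv_i, x) + bias."""
    return sum(
        c * rbf_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, coeffs)
    ) + bias


# Hand-picked (not trained) support vectors on XOR-style data.
# Each coefficient plays the role of alpha_i * y_i with y_i in {-1, +1}.
support_vectors = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
coeffs = [-1.0, 1.0, 1.0, -1.0]

for point in support_vectors:
    score = decision_value(point, support_vectors, coeffs)
    print(point, "class 1" if score > 0 else "class 0")
```

A larger gamma makes the kernel decay faster with distance, so each
support vector influences a smaller neighborhood and the boundary
becomes more flexible.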
Pros:
Cons:
Pros:
Cons:
Where:
6. Code
7. Results and Conclusion
The performance of the models on the Breast Cancer Wisconsin dataset
was evaluated using the metrics mentioned above.
Best Models:
Model Comparison: