Ensemble Classifier | Data Mining
Last Updated :
10 Jan, 2022
Ensemble learning improves machine learning results by combining several models, yielding better predictive performance than any single model alone. The basic idea is to learn a set of classifiers (experts) and let them vote.
Advantage : Improved predictive accuracy.
Disadvantage : An ensemble of classifiers is harder to interpret than a single model.
Why do ensembles work?
Dietterich (2002) showed that ensembles help overcome three problems -
- Statistical Problem -
The Statistical Problem arises when the hypothesis space is too large for the amount of available data. Hence, there are many hypotheses with the same accuracy on the data and the learning algorithm chooses only one of them! There is a risk that the accuracy of the chosen hypothesis is low on unseen data!
- Computational Problem -
The Computational Problem arises when the learning algorithm cannot guarantee finding the best hypothesis.
- Representational Problem -
The Representational Problem arises when the hypothesis space does not contain any good approximation of the target class(es).
Main Challenge for Developing Ensemble Models
The main challenge is not to obtain highly accurate base models, but rather to obtain base models which make different kinds of errors. For example, if ensembles are used for classification, high accuracies can be accomplished if different base models misclassify different training examples, even if the base classifier accuracy is low.
Methods for Independently Constructing Ensembles -
- Majority Vote
- Bagging and Random Forest
- Randomness Injection
- Feature-Selection Ensembles
- Error-Correcting Output Coding
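Of the methods above, majority vote is the simplest to demonstrate. A minimal sketch using scikit-learn's VotingClassifier with hard (majority) voting over three heterogeneous base models; the dataset and model choices here are illustrative, not prescribed by the article:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# A synthetic binary classification problem for illustration.
X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three different base models; voting="hard" picks the majority class.
vote = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",
)
vote.fit(X_tr, y_tr)
print("ensemble accuracy:", round(vote.score(X_te, y_te), 2))
```

Because the base models make different kinds of errors, the majority vote can beat each individual model, which is exactly the diversity argument made above.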
Methods for Coordinated Construction of Ensembles -
Reliable Classification: Meta-Classifier Approach
Co-Training and Self-Training
Types of Ensemble Classifier -
Bagging:
Bagging (Bootstrap Aggregation) is used to reduce the variance of a decision tree. Given a set D of d tuples, at each iteration i a training set Di of d tuples is sampled with replacement from D (a bootstrap sample). A classifier model Mi is then learned from each training set Di. Each classifier Mi returns its class prediction. The bagged classifier M* counts the votes and assigns the class with the most votes to the unknown sample X.
Implementation steps of Bagging -
- Multiple subsets of equal size are created from the original data set by sampling observations with replacement.
- A base model is created on each of these subsets.
- Each model is learned in parallel from its training set, independently of the others.
- The final predictions are determined by combining the predictions from all the models.
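The steps above can be sketched directly with NumPy and scikit-learn decision trees; the number of bootstrap rounds (25) and the synthetic dataset are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, random_state=0)

models = []
for i in range(25):
    # Step 1: sample d tuples with replacement from D (bootstrap sample Di).
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: learn a base model Mi on the subset.
    models.append(DecisionTreeClassifier(random_state=i).fit(X[idx], y[idx]))

# Steps 3-4: collect each model's vote and take the majority per sample.
votes = np.stack([m.predict(X) for m in models])
bagged_pred = np.apply_along_axis(
    lambda col: np.bincount(col).argmax(), 0, votes
)
print("training accuracy:", (bagged_pred == y).mean())
```

In practice you would use sklearn's BaggingClassifier, which implements the same loop (and out-of-bag evaluation) for you.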
Random Forest:
Random Forest is an extension over bagging. Each classifier in the ensemble is a decision tree classifier and is generated using a random selection of attributes at each node to determine the split. During classification, each tree votes and the most popular class is returned.
Implementation steps of Random Forest -
- Multiple subsets are created from the original data set, selecting observations with replacement.
- At each node, a subset of features is selected randomly, and the feature that gives the best split is used to split the node; this is repeated as the tree grows.
- Each tree is grown to its largest extent (without pruning).
- The above steps are repeated for n trees, and the final prediction is obtained by aggregating the predictions of all n trees.
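The same steps in scikit-learn's RandomForestClassifier; max_features="sqrt" is the per-node random feature subset described above, and the parameter values here are illustrative defaults, not requirements:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 trees; each split considers a random sqrt(n_features)-sized subset.
rf = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", random_state=0
)
rf.fit(X_tr, y_tr)
print("test accuracy:", round(rf.score(X_te, y_te), 2))
```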
You can read more in the scikit-learn documentation.