2020_BigData_Hybrid_Decision_Tree-Neural_Network
Abstract—As the Age of Information has evolved over the last several decades, the demand for technology that stores, analyzes, and utilizes data has increased substantially. Countless industries, such as medicine, retail, and aviation, rely on this technology to guide their decision making. In the present paper, we propose a hybrid machine learning algorithm consisting of Decision Trees and Neural Networks that can effectively and efficiently classify data of varying volume and variety. The hybrid algorithm is structured as a decision tree in which each node is a neural network trained to classify a specific category of the output using binary classification. The data we used to train and test the classification ability of our algorithm is the Federal Aviation Administration's (FAA's) Boeing 737 maintenance dataset, which consists of 137,236 unique records, each composed of 72 variables. We classify the discrepancy, or cause, of each incident according to whether or not the incident occurred during scheduled maintenance operations, and then further classify specific details relating to the incident. Our results indicate that the hybrid algorithm classifies incidents with high accuracy and precision. Additionally, the algorithm is able to identify the most significant inputs for a classification, allowing for higher performance and greater optimization. This demonstrates the algorithm's applicability in real-world scenarios while also showcasing the benefits of combining decision trees and neural networks as opposed to using them individually.

Index Terms—Decision Trees, Machine Learning, Neural Networks, Hybrid Learning, Supervised Learning

I. INTRODUCTION

Machine Learning (ML), a subfield of Artificial Intelligence (AI) and Data Science, provides effective algorithms for prediction and classification from structured and unstructured data. However, as data volume and complexity have grown in recent years, the field faces many challenges in designing and implementing algorithms capable of processing such data efficiently and effectively. As a result, significant research has been conducted to create more robust algorithms. ML consists of two main stages: training/learning and testing/predicting. Training/learning involves feeding a dataset with known characteristics into an ML model in order to "train" it. Afterwards, the model is able to make predictions on similar data given to it in the testing/predicting stage. Several recent overviews of ML in different domains are available [1]–[3]. To alleviate the weaknesses of individual machine learning techniques, hybrid approaches are used.

A hybrid model, which combines two models, has started gaining significant attention in machine learning because it can improve classification and prediction performance compared with a single learning model [4]–[7]. Neural networks have already been used in numerous fields of science and engineering because they can predict on new data after being trained on an existing dataset. Decision trees, on the other hand, are well suited to feature classification. In the present work, we propose a hybrid approach combining decision trees and neural networks, in which a neural network is integrated in each node of the decision tree. The method uses multiple neural models that are each trained individually and combined so that the results are better than they would be without the hybrid method. The children of any given node represent subcategories derived from the parent node, which may be classified as well. This allows the algorithm to classify data at varying degrees of granularity.

The proposed hybrid approach uses binary classification and turns each neural network into an individual node of a binary tree. Each node in the tree predicts a specific detail, and the further down the tree the predictions go, the more detailed the prediction becomes. The height of the binary tree therefore represents the amount of detail that the outcome predicts.

The proposed ensemble learning method is tested on a public Boeing 737 dataset. It is essential to train and test the neural model on a public dataset rather than on a dataset created in a controlled environment. Training the models on a dataset obtained from the Boeing 737 validates the results produced and shows that this ensemble learning approach can provide improved results on a real-world dataset. The dataset included 72 unique variables and 137,236 Boeing 737 records, which, after preprocessing, were used as training input; each network has a single binary output neuron that classifies whether a record belongs to the specified category. After preprocessing, the data was split into training, testing, and validation sets. The testing data is used to measure the accuracy and F1 score of the neural network, and the validation data is used to guard against overfitting.
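The binary tree of per-node classifiers described above can be sketched as a small data structure. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `DTNNNode`, `build_tree`, `predict_path`, and `majority_by_feature` are hypothetical, records are plain dicts with 0/1 labels, and the `train` callback stands in for the per-node neural network (an H2O model in this paper). Following the crack/fuselage example of Section III-B, each node's data is split on that node's true label, and a child model for the next detail is trained on each half.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, int]
Classifier = Callable[[Record], int]
Trainer = Callable[[List[Record], str], Classifier]


@dataclass
class DTNNNode:
    """One tree node: a binary classifier for a single detail (label)."""
    label: str                              # detail predicted here, e.g. "crack"
    model: Classifier                       # stand-in for the node's neural network
    negative: Optional["DTNNNode"] = None   # child trained on records labeled 0
    positive: Optional["DTNNNode"] = None   # child trained on records labeled 1


def build_tree(records: List[Record], labels: List[str],
               train: Trainer) -> Optional[DTNNNode]:
    """Train one binary model per node; the tree height equals len(labels)."""
    if not labels or not records:
        return None  # no more detail to predict, or no data reached this branch
    label, rest = labels[0], labels[1:]
    node = DTNNNode(label, train(records, label))
    # Split this node's data on its true label and train the next level on each half.
    node.negative = build_tree([r for r in records if r[label] == 0], rest, train)
    node.positive = build_tree([r for r in records if r[label] == 1], rest, train)
    return node


def predict_path(node: Optional[DTNNNode], record: Record) -> Dict[str, int]:
    """Route a record from the root; each node's prediction selects the next branch."""
    path: Dict[str, int] = {}
    while node is not None:
        y = node.model(record)
        path[node.label] = y
        node = node.positive if y == 1 else node.negative
    return path


def majority_by_feature(records: List[Record], label: str) -> Classifier:
    """Hypothetical toy trainer: majority label for each value of feature "x"."""
    counts: Dict[int, List[int]] = {}
    for r in records:
        counts.setdefault(r["x"], [0, 0])[r[label]] += 1
    table = {x: int(c[1] > c[0]) for x, c in counts.items()}
    return lambda r: table.get(r["x"], 0)


if __name__ == "__main__":
    data = [
        {"x": 0, "crack": 0, "fuselage": 0},
        {"x": 1, "crack": 1, "fuselage": 0},
        {"x": 2, "crack": 1, "fuselage": 1},
        {"x": 3, "crack": 0, "fuselage": 1},
    ]
    tree = build_tree(data, ["crack", "fuselage"], majority_by_feature)
    print(predict_path(tree, {"x": 2}))  # {'crack': 1, 'fuselage': 1}
```

Each level adds one detail to the prediction, so a two-level tree answers "crack?" and then "fuselage?" rather than training a single model on the combined label. Because each child sees only its parent's branch, the amount of data per model shrinks with depth, which is why the paper notes the method suits large datasets.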
II. RELATED WORK

The first concepts of machine learning started with the study of the brain and how neurons fire when certain external events occur [8]. Ensemble learning is the concept of using multiple neural or other trained models together to create a more accurate algorithm than any one of them working alone. Our algorithm uses binary-classifying neural models in ensemble with a binary decision tree in order to provide a more accurate prediction than a single neural model.

Fattah proposed a hybrid model integrating the maximum entropy model, support vector machine, and naive Bayes for multi-document text summarization [9]. To improve classification accuracy, Polat and Güneş [10] proposed a hybrid machine learning model for multi-class problems. The method consists of the C4.5 decision tree classifier and the one-against-all approach. The efficacy of the hybrid method was shown on the image segmentation, dermatology, and lymphography open-source datasets. A recently developed hybrid method also used decision tree, random forest, and gradient boosting models for water quality prediction. The method is based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Two variants of the method were proposed: one based on gradient boosting (CEEMDAN-XGBoost) and the other based on random forest (CEEMDAN-RF). Interestingly, both methods proved superior for predicting different parameters [11]. Arabasadi et al. [12] combined a neural network with a genetic algorithm to increase the performance of the neural network and applied the hybrid method to a heart disease dataset. A machine learning ensemble technique was proposed by Pham et al. [13] to assess landslide susceptibility. In that ensemble method, a multilayer perceptron (MLP) neural network is integrated with AdaBoost, Bagging, Dagging, MultiBoost, Rotation Forest, and Random SubSpace. Another hybrid prediction model was developed by Chen et al. [14]. In that method, K-means clustering is integrated with the J48 decision tree for the diagnosis of Type 2 diabetes. Lin et al. [15] dealt with data from manufacturing industries for condition-based maintenance using MapReduce and multiple classifier types, including decision trees, with dynamic weight adjustment. A software defect prediction method using ensemble learning was proposed by Laradji et al. [16]. The method is also capable of handling redundancy and data imbalance in addition to ensuring robustness. A study on just-in-time defect prediction also used ensemble learning: Yang et al. proposed a two-layer ensemble learning approach (TLEL) based on decision trees [17]. The outer layer uses different random forest models for training, whereas the inner layer integrates decision trees and bagging to build a random forest model. Another recent study [18] used ensemble learning to improve the prediction of the remaining useful life (RUL) of aircraft engines, combining multiple base learners: random forests (RFs), classification and regression trees (CART), recurrent neural networks (RNN), an autoregressive (AR) model, an adaptive network-based fuzzy inference system (ANFIS), a relevance vector machine (RVM), and elastic net (EN). Moreover, the method used particle swarm optimization (PSO) and sequential quadratic programming (SQP) to find the best combination of weights for the base learners. An extension of the above study was made by Li et al. [19], who used a directed acyclic graph (DAG) network combining long short-term memory (LSTM) and a convolutional neural network (CNN) to predict the RUL. The method was tested on a turbofan engine degradation simulation dataset provided by NASA. In earlier work [20], we proposed a predictive maintenance strategy for Boeing 737 aircraft using the integrated decision tree-neural network model. Marcello et al. [21] proposed an ensemble-learning-based big data model for the failure rate of equipment subject to different operating conditions. Another recently developed method [22] also used ensemble learning and is capable of handling imbalanced data; it uses adaptive boosting (AdaBoost) and random forests (RF). Surveys of the state of the art in ensemble learning can be found in [23]–[25]. Taking this extensive literature into consideration, we propose a technique that integrates a neural network in each node of the decision tree.

III. DECISION TREE-NEURAL NETWORK (DT-NN)

A. Integrating Decision Trees and Neural Networks

In the present work, we hybridize a neural network with a decision tree. The motivation is to optimize each single decision within a classification. A neural network is integrated at each node of the decision tree, as shown in Fig. 1. This integration of the decision tree and the neural network is superior to either individual method. Neural networks are well suited to classification into categories whose boundaries are less distinct; however, their performance decreases as the number of categories increases. The decision tree, in contrast, works well with a large number of distinctly separated categories. The decision tree is constructed from binary outcomes (0 or 1), and at each split the result attribute with the highest information gain is selected. To define information gain, we first define a measure commonly used in information theory, called entropy, which characterizes the (im)purity of an arbitrary collection of examples.

Entropy H(S) is a measure of the amount of uncertainty in a dataset S:

$$H(S) = -\sum_{c \in C} p(c)\,\log_2 p(c)$$

where
S - the dataset for which entropy is being calculated in the current iteration;
C - the set of classes in S, C = \{0, 1\};
p(c) - the ratio of the number of elements in class c to the number of elements in S.

If H(S) = 0, the set S is perfectly classified, i.e., all of its elements belong to a single class.
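The entropy H(S) defined above, and the information gain IG(A, S) that the following paragraph builds from it, can both be computed directly from class counts. This is a minimal sketch, assuming each record is a Python dict in which both the class label and the candidate splitting attribute are keys; the function names and the toy records are hypothetical. With two classes, H(S) is 1 bit for a 50/50 mix and 0 for a pure set.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List

Record = Dict[str, Hashable]


def entropy(labels: List[Hashable]) -> float:
    """H(S) = -sum_c p(c) log2 p(c), where p(c) is the fraction of S in class c."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(records: List[Record], attribute: str, target: str) -> float:
    """IG(A, S) = H(S) - sum_t p(t) H(t), with subsets t partitioning S by A's value."""
    n = len(records)
    subsets: Dict[Hashable, List[Hashable]] = {}
    for r in records:
        subsets.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(t) / n * entropy(t) for t in subsets.values())
    return entropy([r[target] for r in records]) - weighted


def best_attribute(records: List[Record], attributes: List[str], target: str) -> str:
    """Choose the attribute with the largest information gain for the next split."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


if __name__ == "__main__":
    S = [
        {"part": "A", "region": "N", "crack": 1},
        {"part": "A", "region": "S", "crack": 1},
        {"part": "B", "region": "N", "crack": 0},
        {"part": "B", "region": "S", "crack": 0},
    ]
    print(entropy([r["crack"] for r in S]))                 # 1.0
    print(best_attribute(S, ["region", "part"], "crack"))  # part
```

In the toy data, splitting on `part` separates the crack and no-crack records perfectly (IG = 1.0), while `region` leaves each subset evenly mixed (IG = 0.0), so `part` is chosen.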
Fig. 1: Hybrid Decision Tree-Neural Network (DT-NN) Model
Information gain IG(A, S) measures the difference in entropy before and after the set S is split on a result attribute A, that is, how much the uncertainty in S is reduced by splitting S on A:

$$IG(A, S) = H(S) - \sum_{t \in T} p(t)\,H(t)$$

where
H(S) - the entropy of set S;
T - the subsets created by splitting S on result attribute A, such that $S = \bigcup_{t \in T} t$;
p(t) - the ratio of the number of elements in t to the number of elements in S;
H(t) - the entropy of subset t.

The information gain can be computed for each remaining attribute, and the attribute with the largest information gain is used to split the set S at each iteration. A neural network is then built for the split with the largest information gain. For each binary classification, a neural network is constructed to classify only whether the problem occurs. Each neural network at a node of the decision tree takes all of the result attributes as input and produces an output of either 0 or 1.

B. Outline of the Method

The structure of the hybrid binary tree classifier gives us the flexibility to address more or less detailed classification problems. The structure is arranged so that the height of the tree represents the complexity of the prediction: the larger the tree, the more detailed the issue the network can predict. Each level of the tree is trained on different data drawn from the original dataset. For example, if the first node in the tree predicts a crack in the airplane, every entry in the original dataset with a crack problem is labeled 1 and every other entry is labeled 0. To continue the tree and predict entries with both a crack and a fuselage problem, we would separate the original dataset into two datasets, one with crack data and one without, label each entry in both with 1 or 0 according to whether it has a fuselage problem, and train a new model for each of the two subcategories. Since the dataset is split multiple times, this method works best in an environment with a large amount of data. Predicting crack and then fuselage in this binary tree format performs better than predicting, from the beginning, whether or not an entry contains both a crack and a fuselage problem.

C. Data Preprocessing

Preprocessing is essential for the neural network to predict accurately. For the neural network to process the data, the data must first be converted into numeric form. To do this, the data is processed in chunks of 5000 records at a time. Each record in each chunk is examined, and every unique value is added to a dictionary that keeps track of it; for each record, the dictionary is checked to see whether a value has already been added. When a new value is added to the dictionary, it is given a unique identifier that starts at 100 and increases by 100 for every new unique value recorded in that column. If any entry in a record is null, it is assigned -1 to separate it from the non-null data. Values that are already numeric are left unchanged. After the dictionary is created, it is used with a mapping function on the dataframe
to map each value found in the dictionary to its corresponding numeric value. This remapping is likewise performed on one 5000-record chunk of the dataset at a time, which ensures that reading too many values into memory at once does not cause a memory error. Assigning a unique numeric value to every non-numeric value in the data is an essential step for feeding the data into the neural network, training it, and producing a working model.

D. Pseudocode for the Neural Network

The pseudocode for the neural network used in the hybrid approach is given in the following algorithm, written against the H2O Python API. The file names, the output column name, and the hidden-layer sizes are placeholders. Its variables are:
csvData: data read in from a CSV file.
trainData: subset of csvData used for training the neural network.
testData: subset of csvData used for testing the neural network.
validData: subset of csvData used for validating the neural network.
nn: H2O DeepLearningEstimator neural network model.
nnMetrics: performance metrics from testing the neural network.
resultsCSV: CSV file containing the neural network performance metrics.

import csv
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
# csvData: the preprocessed, all-numeric CSV file
csvData = h2o.import_file("csvfile.csv")
outVar = "Discrepancy"  # output variable (column name assumed)
inVars = [c for c in csvData.columns if c != outVar]  # the 72 input variables
csvData[outVar] = csvData[outVar].asfactor()  # binary (0/1) classification target
# 75% training, 15% testing, remaining 10% validation (random split)
trainData, testData, validData = csvData.split_frame(ratios=[0.75, 0.15])
hiddenLayers = [200, 200]  # assumed: H2O's default; sizes are not stated here
activationFunction = "Tanh"  # e.g. "Tanh", "TanhWithDropout", "RectifierWithDropout"
nn = H2ODeepLearningEstimator(hidden=hiddenLayers, activation=activationFunction)
nn.train(x=inVars, y=outVar, training_frame=trainData, validation_frame=validData)
# nnMetrics: performance of the trained model on the held-out test set
nnMetrics = nn.model_performance(test_data=testData)
with open("results.csv", "w", newline="") as resultsCSV:
    writer = csv.writer(resultsCSV)
    writer.writerow(["accuracy", "auc", "f1"])
    # accuracy() and F1() return [[threshold, value]] at the best threshold
    writer.writerow([nnMetrics.accuracy()[0][1], nnMetrics.auc(), nnMetrics.F1()[0][1]])

IV. EXPERIMENTS

A. Dataset and Preprocessing

The dataset used for the experiments, Boeing 737 data from the Federal Aviation Administration, contains 137,236 records, each with 73 variables (listed in Table I). These records describe aircraft that suffered mechanical issues. We chose this data for two main reasons. First, it is a real-world dataset in which each issue was documented by hand; showing that the algorithm works on a hand-recorded dataset demonstrates its robustness. Second, the dataset is large, and training on a large number of real-world records allows us to verify that the algorithm can handle complex inputs and perform with high accuracy. We divided randomly chosen data into 75% training, 15% testing, and 10% validation sets. These percentages allow the network to be trained so that the final output model is the best in terms of accuracy and F1 score.

TABLE I: Accident and Incident Data
Operator Control Number | Difficulty Date
Submission Date | Operator Designator
Submitter Designator | Submitter Type Code
Receiving Region Code | Receiving District Office
SDR Type | JASC Code
Nature Of Condition A | Nature Of Condition B
Nature Of Condition C | Precautionary Procedure A
Precautionary Procedure B | Precautionary Procedure C
Precautionary Procedure D | Stage Of Operation Code
How Discovered Code | Registry N Number
Aircraft Make | Aircraft Model
Aircraft Serial Number | Aircraft Total Time
Aircraft Total Cycles | Engine Make
Engine Model | Engine Serial Number
Engine Total Time | Engine Total Cycles
Propeller Total Time | Propeller Total Cycles
Part Make | Part Name
Part Number | Part Serial Number
Part Condition | Part Location
Part Total Time | Part Total Cycles
Part Time Since | Part Since Code
Component Make | Component Model
Component Name | Component Part Number
Component Serial Number | Component Location
Component Total Time | Component Total Cycles
Component Time Since | Component Since Code
Fuselage Station From | Fuselage Station To
Stringer From | Stringer From Side
Stringer To | Stringer To Side
Wing Station From | Wing Station From Side
Wing Station To | Wing Station To Side
Butt Line From | Butt Line From Side
Butt Line To | Butt Line To Side
Water Line From | Water Line To
Crack Length | Number Of Cracks
Corrosion Level | Structural Other
Discrepancy |

The first 72 variables are given as input to the neural network, and the last variable, Discrepancy, is the output of the network. Before the data can be fed in, it must be formatted by a preprocessing algorithm that turns string data into corresponding numerical keys, all of which are recorded in a dictionary. The algorithm starts by reading the data from a CSV file in chunks of 5000 records at a time into a pandas DataFrame. Each unique entry in a column is entered into the dictionary with a corresponding numerical value, so that each identifier maps back to the string originally stored in that column. Each time another unique value is found in that column, the identifier increases by 100 and is assigned to the new value. If a value in the DataFrame is NULL, it is assigned the number -1. If a numerical value is encountered, it is left unencoded, because it could be an important value in determining the result. After all of the data is translated to its numerical identifiers, it is used to generate a CSV file containing only numerical values. The final step of preprocessing is to create a text file containing, for each column, the key that maps each numerical value back to its original value. For example, if the value encountered is 115.31 and the cell's type is a real number, the value is left as is and is not entered in the key. If the value is a string, it is entered into the text file exactly as it was documented in the CSV file.

The Pearson correlation was used to identify linear relationships between the input variables, and the Spearman correlation was used to identify monotonic relationships between them. For both correlation algorithms, heatmaps were produced showing the level of correlation between the variables. Additionally, the variables with the highest correlation with the Discrepancy output were identified using both correlation algorithms.
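The two correlation measures differ only in what they are computed on: Pearson's r uses the raw values, while Spearman's ρ is Pearson's r applied to the ranks of the values, which is why it captures any monotonic relationship rather than only linear ones. The sketch below uses only the Python standard library; the function names and toy columns are hypothetical, and tied values receive their average rank. In practice, pandas computes the full variable-by-variable matrices behind such heatmaps with `DataFrame.corr(method="pearson")` and `DataFrame.corr(method="spearman")`.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r: covariance of x and y over the product of their spreads.

    Undefined (ZeroDivisionError here) if either column is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [v ** 3 for v in x]  # monotonic but not linear
    print(round(spearman(x, y), 6), round(pearson(x, y), 2))  # 1.0 0.94
```

For the cubic toy relationship, Spearman reports a perfect monotonic association while Pearson reports a weaker linear one, which is the distinction the two heatmaps are meant to expose.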
VI. RESULTS

A. Number of Significant Inputs Correlation

Figure 2a compares the classification results for the Maintenance and Non-Maintenance categories by number of inputs. In particular, it shows the accuracy and AUC when using all 72 input variables as opposed to only the top eleven input variables by mean weight. The results are further broken down by the six activation functions tested in the NN. The figure shows that, for each activation function, the accuracy is nearly identical whether all 72 input variables or only the top eleven are used. The accuracy difference ranged from 0.12% lower accuracy with all 72 input variables using the tanh activation function to 4.77% higher accuracy with only the top eleven mean-weight input variables using the rectifier with dropout activation function. Figure 2a also shows that the AUC comparisons were more varied: the AUC difference ranged from -1.76% using the tanh activation function with all variables to 4.44% using tanh with dropout with the top eleven mean-weight input variables.

In Figure 2b we show the mean weight of the leading inputs between the input layer and the first hidden layer of the NN. The top 11 variables, enclosed in the box, have a mean weight of 0.15 or greater and are the main variables behind the NN's high-accuracy results. The results show that variables such as Part Make, Receiving Region Code, and Part Total Time can clearly be identified as the most relevant classifiers, while Aircraft Model is identified as a less distinctive classifier for the type of issue involved.

B. Number of Total Inputs Correlation

Figure 3 details our findings on how the number of inputs relates to AUC, accuracy, recall, and precision. Figures 3a and 3b show that both the NN's AUC and accuracy increase initially as the number of inputs to the NN increases. However, both AUC and accuracy are less affected as the number of inputs continues to increase: AUC increases only until 16 total inputs are used, and the accuracy of the NN does not improve beyond the 11 inputs that were identified as the important variables.

Fig. 3: (a) AUC vs. Inputs, (b) Accuracy vs. Inputs

The AUC difference can be viewed as a less reliable measure of performance. In this case, it can be attributed to the small number of points measured to construct the curve, which could explain the difference between measuring the area under the curve and comparing a single accuracy result.

Figure 3c shows a small drop in recall, from 100% to 85%, as more variables are added. Figure 3d shows precision as variables are added in order of decreasing mean weight: precision increases and then stabilizes once the top 11 weighted variables are used. The results show that precision is determined by the previously identified significant variables, while recall is only slightly affected by an increase in input variables.

These findings are more evident in the F1 score shown in Figure 4a. F1 increases as strongly weighted variables are introduced, peaks at 11 variables, and remains stable at around 90% with just those 11 variables. Additionally, Figure 4b shows that most outcomes are clustered in the top right, except for the high-recall initial values.

The results show that the significant inputs are correctly identified by ranking the mean weights in descending order. The additional input variables, which do not seem to improve the results, can be attributed to constant values, variables that depend on other inputs, or values that are inconsistent with the expected results.

C. Correlation of Variables to All Other Variables

The heatmaps in Tables II and III show the correlation of each variable with every other variable, computed with the Pearson and Spearman statistical correlation algorithms, to determine how closely each variable is associated with another. In these tables, a sample of 25 of the 72 input variables is shown. The closer the number is to +1, the stronger the positive correlation with the variable being compared and the greener the area in the heatmap. The closer the number is to -1, the stronger the negative correlation with the variable being compared and the redder the area in the heatmap. When the number is zero,
it means that there is no correlation between the two variables, and the area is yellow. Thus, the closer the magnitude of the number is to 1, the more strongly the two variables are associated with each other.

Using both the Spearman and Pearson correlation tests, we also determined the variables with the highest correlation to Discrepancy. The Pearson test results in Figure 5 show a linear correlation coefficient of 0.15 or greater for the input variables AircraftTotalCycles, PartSinceCode, AircraftTotalTime, and StringerFromSide. The Spearman test in Figure 6 shows a correlation coefficient of 0.15 or greater for the input variables AircraftTotalCycles, AircraftTotalTime, PartSinceCode, StringerFromSide, StringerFrom, and SubmitterDesignator. Analyzing the results of the combined tests reveals that AircraftTotalCycles, PartSinceCode, and AircraftTotalTime are the three variables with the highest degree of correlation, and AircraftTotalCycles was the most significant variable in both the Pearson and the Spearman tests. In the future, these results could be used to determine which significant inputs to use for training the neural networks.

VII. CONCLUSION

In this paper, we propose a hybrid learning strategy that integrates a neural network with a decision tree. The hybrid algorithm is tested with Federal Aviation Administration (FAA) data for the Boeing 737. Several simulated experiments have been performed to test the efficacy of the proposed hybrid method, using various network architectures, activation functions, and hidden-layer configurations. The hybrid method is also verified by selecting only the contributing input features; the similar prediction results confirm that it successfully identified the redundant features.
To optimize our NN, three correlations were examined with regard to classifying Discrepancy: 1) the number of significant inputs versus classification accuracy, 2) the total number of inputs versus classification accuracy, and 3) the correlation of each variable with all other variables. The first showed that the number of inputs could be reduced from 72 to eleven significant inputs without reducing the accuracy of the NN. The second further validated this by showing that AUC, accuracy, recall, and precision stabilize with the eleven significant inputs and gradually deteriorate as new inputs are added. The third showed, via heatmaps, that only a small number of inputs have a high correlation with Discrepancy. Using these three correlations, we show that the significant input features can be identified and that the total number of features can be reduced without affecting the accuracy of the NN. In the future, the hybrid learning method can be tested in more case studies, as well as for transfer learning in different domains.

REFERENCES
[1] A. H. Vo, T. R. Van Vleet, R. R. Gupta, M. J. Liguori, and M. S. Rao, “An overview of machine learning and big data for drug toxicity evaluation,” Chemical Research in Toxicology, vol. 33, no. 1, pp. 20–37, 2019.
[2] F. Samie, L. Bauer, and J. Henkel, “From cloud down to things: An overview of machine learning in internet of things,” IEEE Internet of Things Journal, vol. 6, no. 3, pp. 4921–4934, 2019.
[3] I. H. Sarker, A. Kayes, S. Badsha, H. Alqahtani, P. Watters, and A. Ng, “Cybersecurity data science: an overview from machine learning perspective,” Journal of Big Data, vol. 7, no. 1, pp. 1–29, 2020.
[4] T. Shon and J. Moon, “A hybrid machine learning approach to network anomaly detection,” Information Sciences, vol. 177, no. 18, pp. 3799–3821, 2007.
[5] R. R. Bies, M. F. Muldoon, B. G. Pollock, S. Manuck, G. Smith, and M. E. Sale, “A genetic algorithm-based, hybrid machine learning approach to model selection,” Journal of Pharmacokinetics and Pharmacodynamics, vol. 33, no. 2, pp. 195–221, 2006.
[6] S. Mohan, C. Thirumalai, and G. Srivastava, “Effective heart disease prediction using hybrid machine learning techniques,” IEEE Access, vol. 7, pp. 81542–81554, 2019.
[7] C. Qi, H.-B. Ly, Q. Chen, T.-T. Le, V. M. Le, and B. T. Pham, “Flocculation-dewatering prediction of fine mineral tailings using a hybrid machine learning approach,” Chemosphere, vol. 244, p. 125450, 2020.
[8] J. Qiu, Q. Wu, G. Ding, Y. Xu, and S. Feng, “A survey of machine learning for big data processing,” EURASIP Journal on Advances in Signal Processing, vol. 2016, no. 1, p. 67, 2016.
[9] M. A. Fattah, “A hybrid machine learning model for multi-document summarization,” Applied Intelligence, vol. 40, no. 4, pp. 592–600, 2014.
[10] K. Polat and S. Güneş, “A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems,” Expert Systems with Applications, vol. 36, no. 2, pp. 1587–1592, 2009.
[11] H. Lu and X. Ma, “Hybrid decision tree-based machine learning models for short-term water quality prediction,” Chemosphere, vol. 249, p. 126169, 2020.
[12] Z. Arabasadi, R. Alizadehsani, M. Roshanzamir, H. Moosaei, and A. A. Yarifard, “Computer aided decision making for heart disease detection using hybrid neural network-genetic algorithm,” Computer Methods and Programs in Biomedicine, vol. 141, pp. 19–26, 2017.
[13] B. T. Pham, D. T. Bui, I. Prakash, and M. Dholakia, “Hybrid integration of multilayer perceptron neural networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS,” Catena, vol. 149, pp. 52–63, 2017.
[14] W. Chen, S. Chen, H. Zhang, and T. Wu, “A hybrid prediction model for type 2 diabetes using K-means and decision tree,” in 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), pp. 386–390, IEEE, 2017.
[15] C.-C. Lin, L. Shu, D.-J. Deng, T.-L. Yeh, Y.-H. Chen, and H.-L. Hsieh, “A MapReduce-based ensemble learning method with multiple classifier types and diversity for condition-based maintenance with concept drifts,” IEEE Cloud Computing, vol. 4, no. 6, pp. 38–48, 2017.
[16] I. H. Laradji, M. Alshayeb, and L. Ghouti, “Software defect prediction using ensemble learning on selected features,” Information and Software Technology, vol. 58, pp. 388–402, 2015.
[17] X. Yang, D. Lo, X. Xia, and J. Sun, “TLEL: A two-layer ensemble learning approach for just-in-time defect prediction,” Information and Software Technology, vol. 87, pp. 206–220, 2017.
[18] Z. Li, K. Goebel, and D. Wu, “Degradation modeling and remaining useful life prediction of aircraft engines using ensemble learning,” Journal of Engineering for Gas Turbines and Power, vol. 141, no. 4, 2019.
[19] J. Li, X. Li, and D. He, “A directed acyclic graph network combined with CNN and LSTM for remaining useful life prediction,” IEEE Access, vol. 7, pp. 75464–75475, 2019.
[20] J. Carson, K. Hollingsworth, R. Datta, and A. Segev, “Failing & falling (F&F): Learning to classify accidents and incidents in aircraft data,” in 2019 IEEE International Conference on Big Data (Big Data), pp. 4357–4365, IEEE, 2019.
[21] B. Marcello, C. Davide, F. Marco, G. Roberto, M. Leonardo, and P. Luca, “An ensemble-learning model for failure rate prediction,” Procedia Manufacturing, vol. 42, pp. 41–48, 2020.
[22] P. Zuvela, M. Lovric, A. Yousefian-Jazi, and J. J. Liu, “Ensemble learning approaches to data imbalance and competing objectives in design of an industrial machine vision system,” Industrial & Engineering Chemistry Research, vol. 59, no. 10, pp. 4636–4645, 2020.
[23] R. Polikar, “Ensemble learning,” in Ensemble Machine Learning, pp. 1–34, Springer, 2012.
[24] O. Sagi and L. Rokach, “Ensemble learning: A survey,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 4, p. e1249, 2018.
[25] X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma, “A survey on ensemble learning,” Frontiers of Computer Science, pp. 1–18, 2020.