2020_BigData_Hybrid_Decision_Tree-Neural_Network

A Hybrid Decision Tree-Neural Network (DT-NN)

Model for Large-Scale Classification Problems

Jarrod Carson Kane Hollingsworth Rituparna Datta George Clark Aviv Segev
Department of Computer Science
University of South Alabama
Mobile, AL, USA
{jmc1627@jagmail., kmh1622@jagmail., rdatta@, georgewclark@, segev@}southalabama.edu

Abstract—As the Age of Information has evolved over the last several decades, the demand for technology that stores, analyzes, and utilizes data has increased substantially. Countless industries, such as the medical, retail, and aircraft industries, rely on this technology to guide their decision making. In the present paper, we propose a hybrid machine learning algorithm consisting of Decision Trees and Neural Networks which can effectively and efficiently classify data of varying volume and variety. The structure of the hybrid algorithm consists of a decision tree where each node of the tree is a neural network trained to classify a specific category of the output using binary classification. The data we used to train and test the classification ability of our algorithm is the Federal Aviation Administration’s (FAA’s) Boeing 737 maintenance dataset, which consists of 137,236 unique records each composed of 72 variables. We perform this by classifying the discrepancy, or cause, of the incident into whether or not the incident occurred during scheduled maintenance operations and then further classifying specific details relating to the incident. Our results indicate that our hybrid algorithm is able to effectively classify incidents with high accuracy and precision. Additionally, the algorithm is able to identify the most significant inputs regarding a classification, allowing for higher performance and greater optimization. This demonstrates the algorithm’s applicability in real-world scenarios while also showcasing the benefits of combining decision trees and neural networks as opposed to using them individually.

Index Terms—Decision Trees, Machine Learning, Neural Networks, Hybrid Learning, Supervised Learning

I. INTRODUCTION

Machine Learning (ML), a subfield of Artificial Intelligence (AI) and Data Science, provides effective algorithms which accomplish the task of prediction and classification from structured and unstructured data. However, due to the increase in data volume and complexity, in recent years this field has faced many challenges when designing and implementing algorithms capable of efficiently and effectively processing the data. As such, there has been significant research conducted to create more robust algorithms. ML consists of two main stages: training/learning and testing/predicting. Training/learning involves inserting a dataset with known characteristics into an ML model in order to “train” it. Afterwards the model will be able to make predictions based on similar data given to it in the testing/predicting stage. There are a few recent overviews of ML in different domains [1]–[3]. To alleviate the weaknesses of individual machine learning techniques, hybrid approaches are used.

A hybrid model, which is a combination of two models, has started gaining significant attention in machine learning because it can improve classification and prediction performance compared to a single learning model [4]–[7]. Neural networks have already been used in numerous fields of science and engineering due to their capability of prediction for new data after the training is performed with an existing dataset. On the other hand, a decision tree has the capability for feature classification. In the present work, a hybrid approach is proposed combining a decision tree and neural networks; a neural network is integrated in each node of the decision tree. The method utilizes multiple neural models which are each individually trained and meshed together in such a way that the results are better than if the hybrid method was not implemented. The children of any given node represent subcategories derived from the parent node which may be classified as well. This allows the algorithm to classify data of varying degrees of granularity.

The hybrid approach that we propose utilizes the binary classification category of machine learning and transforms each neural network into an individual node in a binary tree. Each node in the tree predicts a specific detail. The further down the tree the predictions go, the more detailed the prediction becomes. The height of the binary tree represents the amount of detail that the outcome predicts.

The proposed ensemble learning method is tested on a public Boeing 737 dataset. It is essential to train and test the neural model on a public dataset rather than a dataset that was made in a controlled environment. Training the models on a dataset obtained from Boeing 737 records validates the results that are produced and shows that this ensemble learning approach can provide improved results when used on a dataset that is found in the real world. The dataset included 137,236 Boeing 737 records with 72 unique variables, which were preprocessed and used as input for training; each network has a single binary output neuron that classifies whether the record belongs to the specified category or not. After being preprocessed, the data was split into training, testing, and validation sets. The testing data is used to test the accuracy and F1 of the neural network. The validation data is used to check that there is no overfitting.
II. RELATED WORK

The first concepts of machine learning started out with the study of the brain and how neurons fire off when certain external events occur [8]. Ensemble learning is the concept of using multiple neural models/training models together to create a more accurate algorithm than the original one working alone. Our algorithm involves using binary classifying neural models in ensemble with a binary decision tree in order to provide a more accurate prediction than using a single neural model for the predictions.

Fatath proposed a hybrid model integrating the maximum entropy model, support vector machine, and naive-Bayes for multi document text summarization [9]. To improve the classification accuracy, Polat and Gunes [10] proposed a hybrid machine learning model for multi-class problems. The method consists of the C4.5 decision tree classifier and the one-against-all approach. The efficacy of the hybrid method was shown on image segmentation, dermatology, and lymphography open source datasets. A recently developed hybrid method also used decision tree, random forest, and gradient boosting for water quality prediction. The method is known as complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Two variants of the method are proposed; one is based on gradient boost (CEEMDAN-XGBoost) while the other is based on random forest (CEEMDAN-RF). Interestingly, both methods proved their superiority in predicting different parameters [11]. Arabasadi et al. [12] combined a neural network with a genetic algorithm to increase the performance of the neural networks, and the hybrid method was applied on a heart disease dataset. A machine learning ensemble technique is proposed by Pham et al. [13] to assess landslide susceptibility. In the ensemble method, a multi layer perceptron (MLP) neural network is integrated with AdaBoost, Bagging, Dagging, MultiBoost, Rotation Forest, and Random SubSpace. Another hybrid prediction model is developed by Chen et al. [14]. In that method, K-means clustering is integrated with the J48 decision tree for the diagnosis of Type 2 diabetes. Lin et al. [15] dealt with data from manufacturing industries for condition based maintenance using MapReduce and multiple classifier types, including decision trees, with dynamic weight adjustment. A software defect prediction method was proposed by Laradji et al. [16] using ensemble learning. The prediction method is also capable of handling redundancy and data imbalance in addition to ensuring robustness. Another study, based on just-in-time prediction, also used ensemble learning. The authors proposed a two-layer ensemble learning approach (TLEL) based on decision trees [17]. The outer layer uses different Random Forest models for training whereas the inner layer is an integration of decision tree and bagging to build a Random Forest model. Another recent study [18] used ensemble learning for better predictive performance of remaining useful life (RUL) of aircraft engines and used multiple base learners, including random forests (RFs), classification and regression tree (CART), recurrent neural networks (RNN), autoregressive (AR) model, adaptive network-based fuzzy inference system (ANFIS), relevance vector machine (RVM), and elastic net (EN). Moreover, the method also used particle swarm optimization (PSO) and sequential quadratic optimization (SQP) to achieve the best combination of weights to be used in base learners. An extension of the above study is done by Li et al. [19]. The study used a directed acyclic graph (DAG) hybridized with long short term memory (LSTM) and a convolutional neural network (CNN) to predict the RUL. The method is tested with a turbofan engine degradation simulation dataset provided by NASA. We [20] proposed a predictive maintenance strategy for Boeing 737 aircraft using the integrated decision tree-neural network model. Marcello et al. [21] proposed an ensemble learning based big data model for failure rate of equipment subject to different operating conditions. Another recently developed method [22] also used ensemble learning and is capable of handling imbalanced data. The method uses adaptive boosting (AdaBoost) and random forests (RF). The state of the art in ensemble learning can be found in [23]–[25]. Taking the vast literature into consideration, we propose a technique that integrates a neural network in each node of the decision tree.

III. DECISION TREE-NEURAL NETWORK (DT-NN)

A. Integrating Decision Trees and Neural Networks

We hybridize a neural network with a decision tree in the present work. The motivation is the optimization of a single decision in a classification. A neural network is integrated at each node of the decision tree as shown in Fig. 1. This integration of the decision tree and the neural network approaches is superior to either individual method. Neural networks are well suited for classification into categories where the boundaries of classification are less distinct. However, the performance of a neural network decreases as the number of categories increases. The decision tree works with a large number of categories which are distinctly classified. The decision tree is constructed based on a set of binary possible outcomes such as 0, 1. To achieve this, the result attribute with the highest information gain is selected. To define information gain, we define a measure commonly used in information theory, called entropy, which characterizes the (im)purity of an arbitrary collection of examples.

Entropy H(S) is defined as a measure of the amount of uncertainty in any dataset S:

$$H(S) = \sum_{c \in C} -p(c) \log_2 p(c)$$

Where
S - The dataset for which entropy is being calculated in the current iteration.
C - The set of the classes in S, C = {0, 1}.
p(c) - The proportion of the number of elements in class c to the number of elements in set S.
If H(S) = 0 then the set S is perfectly classified.
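The entropy definition above can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function name `entropy` and the small label lists are illustrative assumptions. It computes H(S) for a list of class labels and shows the two limiting cases, a perfectly classified set and an evenly split one.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(S), in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # Sum of -p(c) * log2(p(c)) over the classes that actually occur in S.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Illustrative label sets (assumed values, not taken from the FAA data).
pure = entropy([1, 1, 1, 1])          # one class only: perfectly classified
balanced = entropy([0, 1, 0, 1])      # even split: maximum uncertainty
skewed = entropy([1] * 6 + [0] * 2)   # 6 positives, 2 negatives
```

With binary labels the value ranges from 0 bits (pure set) to 1 bit (even split); the skewed example falls in between, at roughly 0.81 bits.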
Fig. 1: Hybrid Decision Tree-Neural Network (DT-NN) Model
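
The structure in Fig. 1 can be pictured as a binary tree whose nodes each hold one binary classifier. The sketch below illustrates only that arrangement and makes several assumptions: the `Node` class, the `classify` function, and the keyword-based stand-in predictors are hypothetical placeholders for trained neural networks, and the crack/fuselage categories simply mirror the example used in Section III-B.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """One tree node: a binary classifier plus subtrees for its 1 and 0 outcomes."""
    name: str
    predict: Callable[[dict], int]    # stand-in for a trained binary neural network
    if_one: Optional["Node"] = None   # model trained on the records labelled 1
    if_zero: Optional["Node"] = None  # model trained on the records labelled 0


def classify(node, record):
    """Walk down the tree, collecting one (category, 0/1) decision per level."""
    path = []
    while node is not None:
        outcome = node.predict(record)
        path.append((node.name, outcome))
        node = node.if_one if outcome == 1 else node.if_zero
    return path


# Hypothetical stand-ins: a real node would call its trained network here.
def has_crack(record):
    return int("crack" in record["text"])


def has_fuselage(record):
    return int("fuselage" in record["text"])


tree = Node(
    "crack", has_crack,
    if_one=Node("fuselage (crack subset)", has_fuselage),
    if_zero=Node("fuselage (no-crack subset)", has_fuselage),
)

result = classify(tree, {"text": "crack found in fuselage skin"})
```

Each level of the returned path adds one more detail to the prediction, so a deeper tree yields a more specific classification.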

Information gain IG(A, S) is defined as the difference in entropy from before to after the set S is split on a result attribute A. This quantity measures how much the uncertainty in S was reduced after splitting set S on result attribute A.

$$IG(A, S) = H(S) - \sum_{t \in T} p(t) H(t)$$

Where
H(S) - Entropy of set S.
T - The subsets created from splitting set S by result attribute A such that $S = \bigcup_{t \in T} t$.
p(t) - The proportion of the number of elements in t to the number of elements in S.
H(t) - Entropy of subset t.

The information gain can be estimated for each remaining attribute. The attribute with the largest information gain is used to split the set S on each iteration. Thereafter, a neural network is built for the attribute with the largest information gain. For each binary classification, a neural network is constructed to classify only whether the problem occurs. Each neural network at each node of the decision tree includes all the result attributes which could lead to either 0 or 1.

B. Outline of the Method

The structure of the hybrid binary tree classifier gives us the flexibility to address more detailed or less detailed classification problems. The structure is arranged so that the height of the tree represents the complexity of the prediction of the problem. The larger the tree, the more detailed an issue that the network can predict on. Each level of the tree is trained on different data that is pulled from the original dataset. If the first node in the tree was predicting for a crack in the airplane, it would assign each entry in the original dataset with a crack problem a 1 and all of the other entries without a crack problem a 0. If we continued the tree to predict for something with a crack and a fuselage problem, we would separate the original dataset into two datasets, one with crack data and one without crack data, assign each entry a 1 or a 0 according to whether it has a fuselage problem, and train a new model for each of the two subcategories. Since we split the dataset multiple times, this method works best in an environment where there is a lot of data. This method of predicting for crack and then predicting for fuselage in the binary tree format performs better than predicting from the beginning whether entries contain both crack and fuselage or not.

C. Data Preprocessing

Preprocessing the data is essential for the neural network to predict accurately. In order for the neural network to process the data, it must be converted into a numeric form. To do this, the data is processed in chunks of 5000 records at a time. Each record in each chunk is examined and every unique value is added to a dictionary and kept track of. For each record, the dictionary is checked to see whether that unique value has already been added. When a new value is added to the dictionary it is given a unique identifier which starts at 100 and increases by 100 for every further unique value recorded. If any entry in the record is found to be null, it is assigned a -1 to separate it from the data that is not null. If a value that is already in numeric form is encountered, it is ignored by the encoding and left unchanged. After the dictionary is created, the dictionary is used with a mapping function alongside the dataframe
to map each of the values found in the dictionary to its corresponding numeric value. This remapping is done for each 5000 record chunk in the dataset in turn. Processing only 5000 records at a time ensures that there is no memory error from reading too many values into memory at once. Assigning a unique numeric value to all of the non-numeric values in the data is an essential step for the data to be fed into the neural network for training and producing a working model.

D. Pseudocode for Preprocessing

The pseudocode for the neural network used in the hybrid approach is given in the following algorithm:

  Definitions:
    CSVData: Data read in from a csv file.
    trainData: Subset of CSVData used for training neural network.
    testData: Subset of CSVData used for testing neural network.
    validData: Subset of CSVData used for validating neural network.
    nn: H2O DeepLearningEstimator neural network model.
    nnMetrics: H2O dataframe containing performance metrics from testing the neural network.
    resultsCSV: CSV file for containing neural network performance metrics.

  Steps:
    Import H2O, H2ODeepLearningEstimator
    CSVData = open("csvfile.csv", "read")
    H2O.init()
    H2O.read(CSVData)
    nn = DeepLearningEstimator(hiddenLayers, activationFunction)
    nn.train(invars, outvars, trainData, validData)
    nnMetrics = nn.test(testData).performanceMetrics
    resultsCSV = open("results.csv", "write")
    resultsCSV.write(nnMetrics)

IV. EXPERIMENTS

A. Dataset and Preprocessing

The dataset that we used for the experiments, Boeing 737 data from the Federal Aviation Administration, contains 137,236 records with each record having 73 variables. These records include data from aircraft that suffered mechanical issues. We chose this data for two main reasons. The first is that it is a real-world dataset in which the issue was documented by hand. Showing that this algorithm can work on a hand-recorded dataset shows the robustness of the algorithm. The second reason is the sheer size of the data on which our algorithm is trained. Having a dataset which contains a large number of real-world records allows us to make sure that the algorithm is able to handle complex inputs and perform with high accuracy. We sectioned off this data for the neural network by randomly assigning 75% to training, 15% to testing, and 10% to validation. Using these percentages allows the network to be trained so that the final output model is the best in terms of accuracy and F1 score.

TABLE I: Accident and Incident Data

Operator Control Number       Difficulty Date
Submission Date               Operator Designator
Submitter Designator          Submitter Type Code
Receiving Region Code         Receiving District Office
SDR Type                      JASC Code
Nature Of Condition A         Nature Of Condition B
Nature Of Condition C         Precautionary Procedure A
Precautionary Procedure B     Precautionary Procedure C
Precautionary Procedure D     Stage Of Operation Code
How Discovered Code           Registry N Number
Aircraft Make                 Aircraft Model
Aircraft Serial Number        Aircraft Total Time
Aircraft Total Cycles         Engine Make
Engine Model                  Engine Serial Number
Engine Total Time             Engine Total Cycles
Propeller Total Time          Propeller Total Cycles
Part Make                     Part Name
Part Number                   Part Serial Number
Part Condition                Part Location
Part Total Time               Part Total Cycles
Part Time Since               Part Since Code
Component Make                Component Model
Component Name                Component Part Number
Component Serial Number       Component Location
Component Total Time          Component Total Cycles
Component Time Since          Component Since Code
Fuselage Station From         Fuselage Station To
Stringer From                 Stringer From Side
Stringer To                   Stringer To Side
Wing Station From             Wing Station From Side
Wing Station To               Wing Station To Side
Butt Line From                Butt Line From Side
Butt Line To                  Butt Line To Side
Water Line From               Water Line To
Crack Length                  Number Of Cracks
Corrosion Level               Structural Other
Discrepancy

The first 72 variables are given as input into the neural network with the last variable, discrepancy, being the output for the network. To feed the data that we are working with, we must first format it using a preprocessing algorithm. The goal of our preprocessing algorithm is to turn string data into corresponding numerical keys which are all recorded in a dictionary. The algorithm starts by reading the data from a CSV file in chunks of 5000 records at a time using the DataFrame from the Pandas library. Each of the records that are read in will then have corresponding values in a dictionary. Each unique entry per column is entered into a dictionary with a corresponding numerical value. This unique identifying value for that column then maps back to the string that was originally there. When another unique value is found in that column, the identifier increases by 100 and is assigned to that value. If a value in the Pandas dataframe is NULL, the number -1 is assigned to it. If a numerical value is encountered, that value is left as is because it could be an important value in determining the result. After all of this data is translated to its unique numerical identifier, it is then used to generate a CSV file with all of the numerical
values. The final step in the preprocessing stage is the creation of the text file which contains the key to what numerical value maps to what for each column. If, for example, the data that was encountered was 115.31 and the type for that cell was a real number, then that value is ignored. If a piece of data was a string, the value is entered into the text file just as it was documented in the CSV file.

V. CORRELATIONS

The Boeing 737 incident data contains 73 variables relating to mechanical issues. These variables are listed in Table I. These variables were categorized into 72 input variables, and the remaining variable, discrepancy, which represents the actual incident in the reports, was categorized as the lone output variable. The structure of our complete Neural Network (NN) consists of 72 inputs in an input layer that is connected to layers of hidden neurons producing one output for classification. During the training process, the NN performs two tasks: it determines the weight associated with each input and optimizes the classification decision. These tasks were examined with the goal of identifying and testing three correlations: 1) number of significant inputs to classification accuracy, 2) total number of inputs to classification accuracy, and 3) variables to all other variables.

A. Number of Significant Inputs Correlation

In order to determine the correlation of significant inputs to classification accuracy, the NN’s determination of input weights was analyzed. During the NN training phase, input variables that do not contribute to the optimization of the classification are increasingly ignored and therefore the weights between the input layer and the first hidden layer of the NN are reduced. It is expected that in the final NN, input variables or features with low mean weight values have less effect on the optimized classification. For our analysis, we organize the input variables in descending weight order and test the NN as input variables are added and removed one at a time. The values for Area Under the Curve (AUC) and Accuracy for each of the six types of activation functions were then compared.

B. Number of Total Inputs Correlation

To determine the correlation of the total number of input variables to classification accuracy, we analyzed the NN’s accuracy as the number of inputs is increased. The input variables were again organized in descending order of the mean value of the weight connecting the input layer and the first hidden layer. Again, input variables were added and removed one at a time. A comparison was made for the values of Area Under the Curve (AUC), Accuracy, Precision, Recall, and F1 versus the total number of inputs.

C. Correlation of Variables to all Other Variables

The correlation of each input variable with every other input variable of the NN was determined by using the Pearson and Spearman statistical correlation algorithms. The Pearson correlation was used to identify any linear relationships between the input variables. The Spearman correlation was employed to identify monotonic relationships between the input variables. For both correlation algorithms, heatmaps were produced showing the level of correlation of the variables. Additionally, the variables with the highest correlation with the Discrepancy output were identified using both correlation algorithms.

TABLE II: Pearson Correlation

Variable Name             Correlation with Discrepancy
OperatorControlNumber     -0.162920734
OperatorDesignator        -0.191397581
SubmitterDesignator       -0.139558448
SubmitterTypeCode         -0.114425491
ReceivingRegionCode       -0.030500605
JASCCode                   0.098274881
StageOfOperationCode      -0.238238549
HowDiscoveredCode          0.032754564
RegistryNNumber           -0.351326718
AircraftModel             -0.248164469
AircraftSerialNumber      -0.061595756
AircraftTotalTime          0.225559051
AircraftTotalCycles        0.425399651
EngineMake                -0.217505143
PropellerTotalCycles      -0.035912637
PartSerialNumber          -0.015229973
PartTotalTime             -0.084394422
PartTimeSince             -0.062439797
PartSinceCode              0.28765802
ComponentSinceCode        -0.017284174
FuselageStationFrom        0.029334189
FuselageStationTo         -0.028950023
StringerFrom               0.009530162
StringerFromSide           0.19310386
CorrosionLevel            -0.148433297

TABLE III: Spearman Correlation

Variable Name             Correlation with Discrepancy
OperatorControlNumber     -0.162920813
OperatorDesignator         0.106873514
SubmitterDesignator        0.158101481
SubmitterTypeCode         -0.129345355
ReceivingRegionCode        0.048436281
JASCCode                   0.118254811
StageOfOperationCode      -0.271904763
HowDiscoveredCode          0.048619706
RegistryNNumber           -0.340778447
AircraftModel             -0.199877064
AircraftSerialNumber       0.010545024
AircraftTotalTime          0.34909541
AircraftTotalCycles        0.43531384
EngineMake                -0.236875842
PropellerTotalCycles      -0.035912637
PartSerialNumber          -0.071002955
PartTotalTime             -0.239128945
PartTimeSince             -0.117941956
PartSinceCode              0.341907897
ComponentSinceCode        -0.021837315
FuselageStationFrom        0.111152493
FuselageStationTo         -0.0815809
StringerFrom               0.188622223
StringerFromSide           0.206177495
CorrosionLevel            -0.151251426
(a) AUC/Accuracy of All Inputs vs. Significant Inputs (b) Average Mean for Leading First Layer Input Weights
Fig. 2: Significant Inputs
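
The ranking behind Fig. 2b, ordering inputs by the mean weight between the input layer and the first hidden layer, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `rank_inputs_by_mean_weight` is hypothetical, the weight values are invented for the example, and taking the absolute value before averaging is an assumption (the paper does not say whether signed or absolute weights were averaged). The 0.15 cut-off is the threshold the paper reports for its significant inputs.

```python
def rank_inputs_by_mean_weight(weights):
    """Rank inputs by mean absolute weight to the first hidden layer.

    `weights` maps an input name to the list of weights connecting that
    input to each neuron of the first hidden layer.
    """
    means = {
        name: sum(abs(w) for w in ws) / len(ws)
        for name, ws in weights.items()
    }
    # Highest mean weight first, as in the descending order used in Section V.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Invented weights for three of the paper's variable names (illustrative only).
first_layer = {
    "PartMake": [0.40, -0.30, 0.20],
    "AircraftModel": [0.05, -0.02, 0.03],
    "PartTotalTime": [0.10, 0.25, -0.25],
}

ranking = rank_inputs_by_mean_weight(first_layer)
significant = [name for name, mean in ranking if mean >= 0.15]
```

Inputs can then be added to the network one at a time in this order to observe how Accuracy and AUC change with the number of inputs.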

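The difference between the two correlation measures used in Section V-C can be seen on a small example. The sketch below is a plain-Python illustration, not the tooling used in the paper (which is not stated): the function names and the sample data are assumptions, and both functions assume non-constant inputs, since the coefficients are undefined for a constant variable. Spearman is computed as the Pearson coefficient of the ranks, with tied values sharing their average rank.

```python
import math


def pearson(x, y):
    """Pearson coefficient: strength of the linear relationship between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks of the values; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson coefficient of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Because y always increases with x, the Spearman coefficient is 1 (up to rounding), while the Pearson coefficient is slightly lower because the relationship is not a straight line.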
VI. RESULTS

A. Number of Significant Inputs Correlation

Figure 2a shows the comparison of classification results into Maintenance and Non-Maintenance categories by the number of inputs. In particular, the figure shows the Accuracy and AUC results when using all 72 input variables as opposed to using the top eleven mean weight input variables. The results are further broken down by the six activation functions tested in the NN. The figure shows that the accuracy is nearly identical for the activation functions regardless of whether all 72 input variables are used or only the top eleven mean weight input variables are used. The accuracy difference ranged from 0.12% less accurate with all 72 input variables with the tanh activation function to 4.77% more accurate when only the top eleven mean weight input variables were used with the rectifier with dropout activation function. Figure 2a also shows that the AUC comparisons were more varied. In this case, the range in AUC was -1.76% using the tanh activation function with all variables to 4.44% using the tanh with dropout for the top eleven mean weight input variables.

In Figure 2b we show the average mean of the leading inputs’ weight values between the input layer and the first layer of the NN. The top 11 variables, enclosed in the box in the figure, have a mean weight of 0.15 or greater and are the main variables relevant for high accuracy results from the NN. The results show that variables such as Part Make, Receiving Region Code, and Part Total Time can clearly be identified as the most relevant classifiers. The Aircraft Model is already identified as a less unique classifier for the type of issue involved.

B. Number of Total Inputs Correlation

Figure 3 details our findings for the correlation of the number of inputs to AUC, Accuracy, Recall, and Precision. Figures 3a and 3b show that both the NN’s AUC and accuracy increase initially as the number of inputs into the NN increases. However, both AUC and accuracy are less affected as the number of inputs continues to increase. AUC increases only until 16 total inputs are used, and the accuracy of the NN does not improve after the 11 inputs which were identified as the important variables.

The AUC difference can be viewed as a less accurate value for measuring performance. In this case, it can be attributed to the low number of values measured to create the curve. This could explain the difference when measuring the area with AUC versus comparing a single Accuracy result.

Figure 3c shows that there is a small drop in recall from 100% to 85% as more values are added. Figure 3d shows the Precision as the number of variables increases with the mean weight decreasing. The precision value increases and then stabilizes once the top 11 weighted variables are used. The results show that precision is determined by the previously identified significant variables while Recall is only slightly affected by an increase in input variables.

When viewing the F1 value appearing in Figure 4a, these findings are more evident. F1 increases as strongly weighted variables are introduced and peaks at 11 variables. The F1 becomes stable at around 90% using just 11 variables. Additionally, Figure 4b shows that most outcomes are clustered in the top right except for the high-recall initial values.

The results show the correct identification of the significant inputs by the method of classifying mean weights in descending order. The additional input variables, which do not seem to improve the results, can be attributed to constant values, variables which are dependent on other inputs, or values which are inconsistent with the expected results.

Fig. 3: AUC, Accuracy, Recall, and Precision vs. Number of Inputs. Panels: (a) AUC vs. Inputs, (b) Accuracy vs. Inputs, (c) Recall vs. Inputs, (d) Precision vs. Inputs.

Fig. 4: F1, Precision vs. Recall vs. Number of Inputs. Panels: (a) F1 vs. Inputs, (b) Precision vs. Recall.

Fig. 5: Pearson correlation data bar chart

Fig. 6: Spearman correlation data bar chart

C. Correlation of Variables to All Other Variables

The heatmaps that are represented in Tables II and III demonstrate the correlation of each variable to every other variable using the Pearson/Spearman statistical correlation algorithm to find out how closely associated each variable is with another. In these tables, a 25 input sampling of the 72 input variables is shown. The closer the number is to +1, the higher the positive linear correlation is with the variable being compared and the greener the area is in the heatmap. The closer the number is to -1, the higher the negative linear correlation is with the variable being compared and the redder the area is on the heatmap. When the number is zero,
it means that there is no linear correlation between the two variables and the area is yellow. So, the closer the absolute value of the number is to 1, the stronger the association between the variables.

Using both the Spearman and Pearson correlation tests, the variables with the highest correlation to Discrepancy were also determined. The Pearson test results shown in Figure 5 show a linear correlation coefficient of 0.15 or greater for the input variables AircraftTotalCycles, PartSinceCode, AircraftTotalTime, and StringerFromSide. The Spearman test results shown in Figure 6 show a correlation coefficient of 0.15 or greater for the input variables AircraftTotalCycles, AircraftTotalTime, PartSinceCode, StringerFromSide, StringerFrom, and SubmitterDesignator. Analyzing the results of the combined tests reveals that AircraftTotalCycles, PartSinceCode, and AircraftTotalTime are the three variables with the highest degree of correlation. AircraftTotalCycles was shown to be the most significant variable with both the Pearson and the Spearman test. These results could be applied in the future to determine which significant inputs could be used for training the neural networks.

VII. CONCLUSION

In this paper we propose a hybrid learning strategy by integrating a neural network with a decision tree. The hybrid algorithm is tested with the Federal Aviation Administration (FAA) data for Boeing 737. Several simulated experiments have been performed to test the efficacy of the proposed hybrid method. The method is tested with various network architectures, activation functions, and different hidden layers. The hybrid method is also verified by selecting the contributing input features, and the similar prediction results confirm that it successfully identified the redundant features.
To optimize our NN, three correlations were examined with regard to classifying Discrepancy: 1) the number of significant inputs to classification accuracy, 2) the total number of inputs to classification accuracy, and 3) the correlation of variables to all other variables. The first correlation showed that the number of inputs could be reduced from 72 to 11 significant inputs without a reduction in accuracy of the NN. The second correlation further validated this by showing that AUC, Accuracy, Recall, and Precision stabilize with the 11 significant inputs and gradually deteriorate with the addition of new inputs. The third correlation further showed via heatmaps that only a small number of inputs have a high correlation with Discrepancy. Using these three correlations, we show that the significant input features can be identified and that the total number of features can be reduced without affecting the accuracy of the NN. The hybrid learning method can be tested in more case studies in the future. Moreover, the method can also be tested for transfer learning in different domains.
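The correlation-based screening of input features summarized above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names (`pearson`, `average_ranks`, `spearman`, `screen_inputs`), the toy columns, and the choice to compare the absolute value of each coefficient against the cutoff are all assumptions made for the example. Only the 0.15 cutoff is taken from the text.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def screen_inputs(
    columns: Dict[str, Sequence[float]],
    target: Sequence[float],
    threshold: float = 0.15,  # cutoff quoted in the text
) -> Dict[str, Tuple[float, float]]:
    """Keep inputs whose |Pearson| or |Spearman| with the target meets the cutoff."""
    selected = {}
    for name, values in columns.items():
        r, rho = pearson(values, target), spearman(values, target)
        if abs(r) >= threshold or abs(rho) >= threshold:
            selected[name] = (r, rho)
    return selected


# Synthetic toy data (hypothetical, not the FAA dataset).
target = [0, 0, 1, 0, 1, 1, 1, 0]
columns = {
    "feature_a": [1, 2, 6, 3, 7, 8, 9, 2],  # rises with the target
    "feature_b": [5, 5, 5, 5, 5, 5, 5, 5],  # constant column
    "feature_c": [1, 2, 1, 2, 2, 1, 2, 1],  # unrelated to the target
}
selected = screen_inputs(columns, target)
print(sorted(selected))
```

Accepting an input when either coefficient passes the cutoff mirrors the way the two tests are reported side by side in the text. Requiring both to pass would be a stricter alternative.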
[22] P. Zuvela, M. Lovric, A. Yousefian-Jazi, and J. J. Liu, “Ensemble
R EFERENCES learning approaches to data imbalance and competing objectives in
[1] A. H. Vo, T. R. Van Vleet, R. R. Gupta, M. J. Liguori, and M. S. design of an industrial machine vision system,” Industrial & Engineering
Rao, “An overview of machine learning and big data for drug toxicity Chemistry Research, vol. 59, no. 10, pp. 4636–4645, 2020.
evaluation,” Chemical Research in Toxicology, vol. 33, no. 1, pp. 20–37, [23] R. Polikar, “Ensemble learning,” in Ensemble machine learning, pp. 1–
2019. 34, Springer, 2012.
[2] F. Samie, L. Bauer, and J. Henkel, “From cloud down to things: An [24] O. Sagi and L. Rokach, “Ensemble learning: A survey,” Wiley Interdisci-
overview of machine learning in internet of things,” IEEE Internet of plinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 4,
Things Journal, vol. 6, no. 3, pp. 4921–4934, 2019. p. e1249, 2018.
[3] I. H. Sarker, A. Kayes, S. Badsha, H. Alqahtani, P. Watters, and [25] X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma, “A survey on ensemble
A. Ng, “Cybersecurity data science: an overview from machine learning learning,” Frontiers of Computer Science, pp. 1–18, 2020.
perspective,” Journal of Big Data, vol. 7, no. 1, pp. 1–29, 2020.
[4] T. Shon and J. Moon, “A hybrid machine learning approach to network
anomaly detection,” Information Sciences, vol. 177, no. 18, pp. 3799–
3821, 2007.
[5] R. R. Bies, M. F. Muldoon, B. G. Pollock, S. Manuck, G. Smith,
and M. E. Sale, “A genetic algorithm-based, hybrid machine learning
approach to model selection,” Journal of pharmacokinetics and phar-
macodynamics, vol. 33, no. 2, pp. 195–221, 2006.
[6] S. Mohan, C. Thirumalai, and G. Srivastava, “Effective heart disease
prediction using hybrid machine learning techniques,” IEEE Access,
vol. 7, pp. 81542–81554, 2019.
[7] C. Qi, H.-B. Ly, Q. Chen, T.-T. Le, V. M. Le, and B. T. Pham,
“Flocculation-dewatering prediction of fine mineral tailings using a
hybrid machine learning approach,” Chemosphere, vol. 244, p. 125450,
2020.
[8] J. Qiu, Q. Wu, G. Ding, Y. Xu, and S. Feng, “A survey of machine
learning for big data processing,” EURASIP Journal on Advances in
Signal Processing, vol. 2016, no. 1, p. 67, 2016.
[9] M. A. Fattah, “A hybrid machine learning model for multi-document
summarization,” Applied intelligence, vol. 40, no. 4, pp. 592–600, 2014.
[10] K. Polat and S. Güneş, “A novel hybrid intelligent method based on
c4. 5 decision tree classifier and one-against-all approach for multi-
class classification problems,” Expert Systems with Applications, vol. 36,
no. 2, pp. 1587–1592, 2009.
[11] H. Lu and X. Ma, “Hybrid decision tree-based machine learning
models for short-term water quality prediction,” Chemosphere, vol. 249,
p. 126169, 2020.
[12] Z. Arabasadi, R. Alizadehsani, M. Roshanzamir, H. Moosaei, and A. A.
Yarifard, “Computer aided decision making for heart disease detection
using hybrid neural network-genetic algorithm,” Computer methods and
programs in biomedicine, vol. 141, pp. 19–26, 2017.
[13] B. T. Pham, D. T. Bui, I. Prakash, and M. Dholakia, “Hybrid integration
of multilayer perceptron neural networks and machine learning ensem-
bles for landslide susceptibility assessment at himalayan area (india)
using gis,” Catena, vol. 149, pp. 52–63, 2017.
[14] W. Chen, S. Chen, H. Zhang, and T. Wu, “A hybrid prediction model
for type 2 diabetes using k-means and decision tree,” in 2017 8th IEEE
International Conference on Software Engineering and Service Science
(ICSESS), pp. 386–390, IEEE, 2017.
