A Hybrid Machine-Learning Ensemble For Anomaly Detection in Real-Time Industry 4.0 Systems
ABSTRACT Detecting faults and anomalies in real-time industrial systems is a challenge due to the difficulty of sufficiently covering an industrial system's complexity. Today, Industry 4.0 makes it possible to tackle these problems through emerging technologies such as the Internet of Things and Machine Learning. This paper proposes a hybrid machine-learning ensemble real-time anomaly-detection pipeline that combines three Machine Learning models (Local Outlier Factor, One-Class Support Vector Machine, and Autoencoder) through a weighted average to improve anomaly detection. The ensemble model was tested with three air-blowing machines, obtaining F1-score values of 0.904, 0.890, and 0.887, respectively. The ensemble model showed improved performance metrics compared with those of the individual models. A novelty of this model is that it consists of two stages inspired by a standard industrial system: i) a manufacturing stage and ii) an operation stage.
INDEX TERMS Anomaly detection, industry 4.0, machine learning, predictive maintenance, real-time.
i) Models that take into account physical principles and ii) models based on historical observations. One of the techniques used in the second group consists of the early detection of abnormal behavior in industrial equipment. This early detection can avoid possible breakdowns of equipment and reduce associated maintenance costs.

Anomaly detection is being researched in several application fields. Some of the associated research fields are disease detection, intrusion detection, fraud prediction, and fault detection in industrial equipment [4]. Through anomaly detection, it is possible to identify anomalous states that do not match the normality data, which usually correspond to the predominant states.

The detection of anomalous states is a challenging task. The detection becomes more complicated than usual if it is to be done in real-time due to the restrictive features of streaming data. Unlike batch learning, where all the historical data are available and no new information is added to the models already built, stream learning has five restrictions that must be taken into account [5]. i) Streaming data samples arrive online and can be read at most one time, which is a strong restriction for processing them, since the system has to decide whether the current data sample is discarded or archived. ii) Past data samples can only be accessed if stored in memory; otherwise, a forgetting mechanism in charge of discarding past samples is applied. iii) Since not all data samples can be stored, a decision made on past samples cannot be undone. iv) The data processing time of each data sample should be short and constant. v) The data processing algorithm must produce a model equivalent to what a batch algorithm would produce.

The former five restrictions are why most anomaly detection algorithms for batch processing do not apply to stream processing. Nonetheless, there are hybrid approaches that use batch-learning algorithms to build an initial model as the first step and then apply streaming anomaly-detection algorithms as the second step.

The contribution of this work is the evaluation and comparison of different methods to detect anomalies that, through their performance-control metrics, establish the weight (or incidence) of each method in the final combined model, thus responding better and more efficiently to the challenge of real-time anomaly detection. Specifically, the present work combines the predicted outputs of three Machine Learning (ML) models, Local Outlier Factor (LOF), One-Class Support Vector Machine (OCSVM), and Autoencoder, by means of a weighted average, using as weight the F1-score value of each model. The goal of the combined model is the detection of anomalies in industrial systems in real-time. The proposed hybrid model was implemented using a data set from a real industrial system of air-blowing machines. Thus, the proposed hybrid anomaly detection model applies to Industry 4.0 systems as well as other industrial frameworks where real-time data acquisition systems are available.

The rest of the article is divided into four sections. The state-of-the-art section reviews existing approaches and research for anomaly detection in real-time. Next, the third section gives a detailed explanation of the proposed hybrid anomaly detection. Finally, the results section describes the scores obtained by applying the hybrid anomaly detection methodology to a testing data set. A Conclusions section ends this paper with some concluding remarks and a future work proposal.

II. STATE OF THE ART
According to [6], [7], an anomaly can be defined as a point in time where the system's behavior is unusual and significantly different from previous, normal behavior. An anomaly may imply an adverse change in the system, for instance, a fluctuation in a jet engine's turbine rotation frequency, which possibly means an imminent failure. An anomaly may also mean positive behavior; for instance, many web clicks on a new product page imply higher demand. In both cases, anomalies in data provide an insight into abnormal behavior that can be translated into potentially useful information.

The challenge of detecting anomalies in an industrial environment is twofold: firstly, to propose a method to understand the different data obtained from various sensors, often with excessive noise; secondly, to obtain an overview of normal behavior in order to characterize such behavior from historical data. Therefore, to correctly detect anomalies in a data set, one must first characterize and define normal data behavior [8]. Normal behavior can be characterized by the following three stages: (i) consider data describing normal behavior through historical data (without considering anomalies), segmented into different classes according to the context in which they were recorded; (ii) extract the most frequent behaviors, thus characterizing each class; (iii) detect anomalies in newly recorded data based on previous knowledge.

In general, anomalies are classified into three types: point, contextual, and collective [9]–[11]. A point anomaly occurs when a single data point is recognized as anomalous with respect to the rest of the data. According to [10], these anomalies must be identified before processing or analyzing the data.
• Contextual anomalies are those where the data are considered anomalous in a specific context (e.g., the same sample data are "normal" in a given scenario but anomalous in another context). These types of anomalies are more common in time-series data flows [10].
• Collective anomalies are those that occur when a collection of related data is considered anomalous with respect to the total data. Collective anomalies can also be spatial, if they are outside a typical range, or temporal, where the value is not outside the typical range but the sequence in which it occurs is unusual.

Anomaly detection methods can be distinguished as supervised, semi-supervised, and unsupervised. Using one method or another usually depends on the existence or not of descriptive labels of the anomaly. The labels can be categorical,
e.g., we can have a case of binary or all/nothing labels such as "anomalous behavior" (1) and "non-anomalous / normal behavior" (0), or numerical, e.g., a value of "anomaly score" ranging from 0 ("non-anomalous / normal") to 1 ("totally anomalous"). While anomaly detection could be posed as a supervised learning problem, this is generally not the case, as there is often no or little data labeled with the anomalous behavior [12].

Once the data are available, a series of transformations of the data normally needs to be performed before starting the anomaly detection process [13].
• Aggregation methods: A set of consecutive values from time-series data is replaced by a corresponding representative value. This provides benefits such as reducing dimensionality, although it can make detecting anomalies in subsequent steps difficult.
• Discretisation methods: Time-series data are converted into a discrete sequence of finite alphabets. Techniques such as symbolic sequence and editing distance can be applied to detect anomalies.
• Digital Signal Processing (DSP) techniques (such as Fourier transform, Gabor, and Wavelet filters): Time-series data are transformed into a lower-dimensional representation of the input data where anomaly detection can take place.

A common type of problem that may be present in the data is noise and outliers. Noise among normal data may cause the model not to obtain the desired optimal predictions. Outliers are data points that may be caused by noise or may have an irregular pattern of behavior. Therefore, this unusual behavior must first be identified and a decision made on whether it should be considered an anomaly or an outlier.

Usually, data are created by one or more generation processes representing the system's activities. When the generation process behaves unusually, it creates anomalies. Therefore, an anomaly often contains valuable information about the abnormal characteristics of the systems and elements that impact the generation process [11].

A. CLASSIFICATION OF TECHNIQUES FOR ANOMALY DETECTION
There are currently six families of techniques to detect anomalies: i) Statistics, ii) Classification, iii) Clustering, iv) Similarity-based, v) Soft Computing, and vi) Knowledge-based and Combined Techniques, as explained in [13]. In Table 1, these techniques, and some examples of the algorithms used, can be seen in detail. The most relevant ones for this work are detailed next.

TABLE 1. Classification of the different techniques for anomaly detection [13].

1) STATISTICS BASED ANOMALY DETECTION TECHNIQUES
Statistical techniques adjust a predefined distribution to given data and apply statistical inference to determine whether an instance belongs to that model. Instances with a low probability are reported as anomalies [14].

The two typologies covered by this technique are parametric and non-parametric. The first assumes an underlying data distribution. Although somewhat less efficient in finding anomalies, the second is preferred because, a priori, it does not define any model structure, as this is determined from the data.

The most common parametric techniques are divided into those based on Gaussian models and those based on regression models. If a non-parametric approach is to be followed, such a classification can be made based on histograms or kernels.

Statistical techniques work well for simple structured data with small dimensions and volume. In such cases, several methods can be used [13], such as box-plots, the Blum Floyd Pratt Rivest Tarjan (BFPRT) algorithm and similar central-value estimations on data streams; Medcouple and the Grubbs test (for univariate data); comparison of distributions (QQ charts, Kolmogorov-Smirnov test, Kruskal-Wallis test, and Wilcoxon signed rank tests); auto-regressive techniques (Auto-regressive Integrated Moving Average, ARIMA, and Auto-regressive Moving Average, ARMA); ML-based methods; Bayesian networks; and Principal Component Analysis (PCA) / Independent Component Analysis (ICA) (e.g., sequence micro-batch analysis).

2) CLASSIFICATION BASED ANOMALY DETECTION TECHNIQUES
Classification-based anomaly detection techniques perform two main stages called training and testing. In the training phase, the system learns from the available samples and generates a classifier. In the testing phase, samples that the classifier has not seen are tested to measure the model's performance. According to the labels available for training, classifiers can be grouped into two categories: i) one-class
and ii) multi-class. Examples of single- and multi-class classifiers are neural networks, Bayesian networks, Support Vector Machines (SVM), and decision trees. These, together with fuzzy logic, are also methods that present a good performance in the presence of strong noise [15]–[18].

Classification-based techniques have the advantage of being able to distinguish between observations that belong to different anomalies (instead of an overall class called "anomaly"), and their testing phase is quick, as the test instance is compared to the predefined model [19]. However, classification techniques rely on the availability of labels for the various normal and abnormal classes, which is a difficult task. Also, these techniques assign labels to test data, which can be a disadvantage when an anomaly score is desired.

Classification-based techniques can also be categorized according to the type of anomaly. Radial Basis Functions (RBF), SVM, and their derivatives are commonly used for individual anomalies. RBFs are very accurate and fast, particularly for the supervised classification of individual anomalies. For multiple anomalies, Deep Neural Networks (DNN), induction rules, and decision trees are used. DNNs can provide exceptional recognition rates in static scenarios but can have problems with data that vary over time.

The above is because the Euclidean distance does not work well in high-dimensional sets, and measurements such as the Mahalanobis, Hamming, or Chebyshev distances are used instead. The k-NN algorithm is based on the data score given by the distance to most of the data around it, so new data are classified according to this score. However, there are some considerations to be taken into account in this type of technique [13]: i) A shortage of data can be seen as an anomaly in unsupervised techniques. ii) The performance is a function of the distance method chosen; therefore, the criteria must be clear when choosing a metric. iii) It is valid only in cases of low-dimensional data; defining a measure of the distance between instances can be complicated when the data dimension is increased.

Another essential similarity-based anomaly detection technique is based on relative density rather than distance. This technique estimates the neighborhoods' density so that a data item in a low-density neighborhood will be anomalous while one in a high-density neighborhood will be considered normal. An existing method for the above is the Local Outlier Factor (LOF), which introduces the concept of local outliers and is based on scoring a data sample according to the average ratio of the neighborhood's density to the instance's density [20].
windows. In this case, the authors start from "concept drift", which is a common occurrence when handling streaming data in dynamic and non-stationary environments, producing a change in the distribution of the data [27]. "Concept drift" is a problem that occurs when the statistical properties of the target variable change over time and the anomaly detection model is no longer compatible with the data the model handles, resulting in less accurate predictions. Therefore, to maintain the anomaly detection effectively, the model needs to be retrained and updated based on the new data the model receives [27].

Another research work on anomaly detection is proposed by [28], which is based on a Hoeffding tree (HT), an inductive-incremental decision-tree algorithm used for anomaly detection. A handicap of this algorithm is that it needs class labels to be available for training.

Another work to be highlighted is that carried out by a group of Yahoo researchers [29]. Their system, called Extensible Generic Anomaly Detection System (EGADS), allows precise, flexible, scalable, and extensible detection of anomalies in time series. The system makes it possible to separate forecasting, anomaly detection, and alerting into three separate components.

Finally, another interesting work is that contributed by [30], in which, through the integration of various technologies, the development of a disease in the leaf of a Colombian-coffee variety is evaluated and diagnosed. The project contribution relied on a model ensemble comprising four sub-models that received the data according to their nature. Once the prediction of each sub-model was made, the results were combined by calculating a weighted average, where the weight of each sub-model was a value associated with its F1-score value in the final model.

Most of the approaches to detect anomalies existing in the literature are based on models that first build a profile of what is "normal" and then point out those instances that do not fit that normal profile as anomalies (statistical methods, classification-based methods, or cluster-based methods use this approach).

A contribution of this work is to build an ensemble model that uses different algorithms and, by combining their results, generates a new model to detect anomalies. Ensemble learning, either for classification or regression, refers to methods that generate multiple models that are combined to make a prediction [31]. Ensembles have been extensively used in the last decades as they are considered to provide greater accuracy and increased robustness [32]. Additionally, multiple ensemble approaches have been proposed, and several studies have reported that model diversity enhances the ensemble model's performance, as different learners generalize in different ways [33].

III. PROPOSED METHODOLOGY
The proposed ML hybrid pipeline for real-time anomaly detection, as seen in Fig. 1, consists of two stages: i) the Manufacturing stage and ii) the Operation stage.

The manufacturing stage or pipeline of the Hybrid Anomaly Detection model construction process takes its name from the manufacturing process of an industrial machine. At this stage, an ML model is trained on the machines' quality-control process data to validate whether the machine meets its design standards or not [34]. Thus, the objective of completing this manufacturing-stage model construction task is twofold: (i) to use the trained model for detecting machine design/manufacturing anomalies; (ii) to later deploy it in the operation stage of the machine, when it is integrated into an industrial production process, for performing a machine operation anomaly detection task. This model construction manufacturing stage is equivalent to the design phase of a classical ML workflow. The metric chosen for measuring the models' performance is the F1-score of label L. The available data set is slightly imbalanced (see Table 2 for class size percentages), with more machine "normal data" than "anomalous data", for which the F1-score metric is considered appropriate. The F1-score is a value in the [0, 1] range, and it is calculated as the harmonic mean of the estimator's precision and recall with respect to L (see Equation (1)):

\[ \text{F1-score}_L = \frac{2 \times \text{precision}_L \times \text{recall}_L}{\text{precision}_L + \text{recall}_L} \quad (1) \]

Finally, each model's F1-score (F1_i) performance ratio with respect to the sum of all F1-scores (∑_j F1_j) (see Equation (2)) is calculated and used as the weight (w_i) for the weighted average of the predictions, where each model's prediction is multiplied by its computed weight. This weighted average assembles the Hybrid Anomaly Detection model at the manufacturing stage.

\[ w_i = \frac{\text{F1-score}_i}{\sum_j \text{F1-score}_j} \quad (2) \]

The operation stage or pipeline refers to the phase when the machine is already running in production; in terms of a classical ML pipeline, it represents the deployment phase. Thus, this pipeline requires the machine to be able to measure the same variables taken at the manufacturing stage through industrial sensors. Once these sensors' data are captured in real-time, they are used as inputs for the Hybrid Anomaly Detector, already trained during the manufacturing stage. This detector will produce a diagnosis based on the data received and generate an alarm for the operator in case of an anomaly. The detector can also be tuned in operation through a supervised action of the operator. If this action is triggered, the data are captured during a time window and labeled as "normal" data. The models within the hybrid anomaly detector are retrained when the data capture is complete. Once the calibration is finished, the system will be able to continue detecting anomalies in real-time.

A. MANUFACTURING-STAGE PIPELINE
As previously mentioned, this stage is executed when the machine is in the factory. The proposed pipeline requires that the manufactured machine goes through a quality control process [34], where sensors can capture information about the
TABLE 2. Air-Blowing machines' data set characteristics.
TABLE 3. Variables pre-processing at manufacturing stage.
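To make the weighting scheme of Equations (1) and (2) concrete, the sketch below computes each sub-model's F1-score on a labelled validation set, normalizes the scores into the weights w_i, and combines the sub-models' {-1, +1} outputs through the weighted average. It is a minimal sketch under assumptions not stated in the paper: scikit-learn-style estimators whose predict method returns -1/+1, per-class weighting reduced to a single label, and a decision threshold of zero on the weighted sum.

```python
import numpy as np
from sklearn.metrics import f1_score


def ensemble_weights(models, X_val, y_val, label=-1):
    """Equation (2): w_i = F1_i / sum_j F1_j, with each F1_i given by Equation (1)."""
    f1s = {name: f1_score(y_val, m.predict(X_val), pos_label=label)
           for name, m in models.items()}
    total = sum(f1s.values())
    return {name: score / total for name, score in f1s.items()}


def hybrid_predict(models, weights, X):
    """Weighted average of the sub-models' {-1, +1} predictions; a negative
    weighted sum is reported as an anomaly (-1), otherwise as normal (+1)."""
    weighted = np.stack([weights[name] * models[name].predict(X) for name in models])
    return np.where(weighted.sum(axis=0) < 0, -1, 1)
```

With the weights obtained for a given machine, hybrid_predict reproduces the behavior described later in the illustrative example: each sub-model votes -1 or +1, and the sign of the weighted sum gives the ensemble's output.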
with cross-validation. Finally, the newly trained models are updated in the Hybrid Anomaly Detector. It should be noted that only the weights (obtained through the F1-scores) that were acquired in the manufacturing process are used because, in the operation process, there are usually no anomalous data with which to measure this performance. The operation stage pipeline can be seen in Fig. 3.

C. EXPERIMENTAL SETUP
The proposed ML hybrid real-time anomaly detection pipeline was tested on three different industrial air-blowing machines from the local industry, with a data set generated by the quality-control process; these machines are currently operational.

The period for collecting the machines' data is between 7 January 2020 and 2 October 2020. The data are recorded and stored at 2-second intervals. The final data set comprises 16 columns (15 variables and timestamps) with 1990 observations for Machine A, 2009 observations for Machine B, and 2132 observations for Machine C. The above-mentioned data set characteristics are shown in Table 2.

The sensors' data set was composed of the variables measured by sensors installed in each machine in the quality-control stage. The measured variables were Flow Rate, Power, Water Temperature, Nozzle Temperature, Input Pressure, Output Pressure, Flow Temperature, Machine Vibrations, RPM, Active Power, Cos Phi, Motor Current, Motor Voltage, Ambient Humidity, Ambient Temperature, and Atmospheric Pressure.

The pre-processing step selects the variables shared by the manufacturing and operation stages. The variables' pre-processing can be seen in Table 3, with a total of 11 variables selected (those with ticks in both manufacturing and operation). Additionally, samples with invalid or missing values were checked and removed from the data set in the pre-processing stage.

Afterward, the pre-processed data set was normalized to scale the variables' values, as is recommended for data preparation in ML since some of the variables have different ranges [40]. The normalization used for this experiment was Min-Max scaling, which scaled the data to values between 0 and 1.
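The pre-processing just described (dropping invalid samples and Min-Max scaling the shared variables) can be sketched as follows; the use of pandas and scikit-learn is an assumption, and the column names and values are placeholders rather than the machines' actual data.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Placeholder sensor readings; real column names would follow Table 3's selection.
raw = pd.DataFrame({
    "flow_rate": [12.1, 11.8, None, 12.4],
    "power": [3.2, 3.1, 3.3, 3.0],
    "water_temperature": [41.0, 40.5, 40.9, 41.2],
})

# Remove samples with invalid or missing values, as done in the pre-processing stage.
clean = raw.dropna()

# Min-Max scaling to the [0, 1] range; the scaler fitted here could then be reused
# on operation-stage data so that both stages share the same scale (an assumption).
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(clean), columns=clean.columns)
print(scaled.round(2))
```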
TABLE 4. Outlier detection using DBSCAN.
TABLE 5. Labelled data sets final samples observations.
TABLE 6. Hyper-parameters selection table.
TABLE 9. Hyperparameters and F1-score for each generated submodel of Machine C.
The labeled data set was then separated into three sets: a 20% Validation set, a 60% Training set (with only normal data), and a 20% Test set, as explained in the Manufacturing-stage pipeline section. For the Training set, a grid search with cross-validation was performed with five folds (k = 5), where a set of hyper-parameters for each model was defined so that the search algorithm finds the best ones according to their respective F1-score. These initial hyper-parameters are displayed in Table 6.

Tables 7, 8, and 9 show the selected hyper-parameters and the obtained F1-score values for the three machines.

The last step of the proposed ML pipeline consisted of implementing an ensemble of the three models, LOF, OCSVM, and Autoencoder, through a weighted average. The Autoencoder's architecture is detailed in Table 10. Table 11 shows the weights for the predictions of each model, which were determined as the ratio of each F1-score value in Tables 7, 8, and 9 with respect to the sum of all F1-score values for each class ("-1" and "1"). As an illustrative example, for a given sample, the LOF model predicted an anomaly (-1), the OCSVM predicted normality (1), and the Autoencoder predicted an anomaly (-1) again, each output being multiplied by its corresponding weight so that the weighted average determines the ensemble's final prediction.

IV. RESULTS
In addition to the pipeline proposed for real-time anomaly detection, the proposed hybrid model must present improved performance metrics over the individual models. In this case, the precision, recall, and F1-score values, as well as the Area Under the ROC Curve (AUC), of all models were compared.

A. MANUFACTURING-PIPELINE RESULTS
Three machines were selected, corresponding to three different model versions, to check that the hybrid models worked equally well on heterogeneous equipment.

The confusion matrix allows checking which types of hits and errors (type I or false-positive errors and type II or false-negative errors) the current models make through their different metrics, such as accuracy, precision, sensitivity, and specificity. Finally, the confusion matrix of the ensemble model was analyzed to check whether it improves the individual models' performance or not. In this respect, we focus on two metrics: i) Precision: anomaly data are classified as normal, also known as the False Positive Rate (FP) or Type I error. ii) Recall: normal data are classified as an anomaly, also known as the False Negative Rate (FN) or Type II error.
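For the hyper-parameter selection described above (grid search with five-fold cross-validation scored by F1), a minimal sketch for one of the sub-models is shown below. It assumes scikit-learn, a synthetic labelled data set standing in for the machines' sensor readings, and a hypothetical search grid; the actual grids and selected values are those of Tables 6 to 9, and the paper's exact split into training, validation, and test sets is simplified here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import OneClassSVM
from sklearn.metrics import make_scorer, f1_score

# Synthetic data: 11 scaled features, +1 = normal, -1 = anomalous.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.3, 0.1, (450, 11)), rng.normal(0.8, 0.1, (50, 11))])
y = np.concatenate([np.ones(450), -np.ones(50)])
shuffle = rng.permutation(len(y))
X, y = X[shuffle], y[shuffle]

# Hypothetical search space for the OCSVM sub-model.
param_grid = {"gamma": [0.01, 0.1, 1.0], "nu": [0.01, 0.05, 0.1]}

# Candidates are ranked by the cross-validated F1-score of the anomaly class (-1).
scorer = make_scorer(f1_score, pos_label=-1)
search = GridSearchCV(OneClassSVM(kernel="rbf"), param_grid, scoring=scorer, cv=5)
search.fit(X, y)  # labels are used only for scoring, not by OneClassSVM.fit
print(search.best_params_, round(search.best_score_, 3))
```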
TABLE 12. Machine A - confusion matrix (test set).
TABLE 15. Machine A - metrics table (test set).
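The metrics reported in the following tables (confusion matrix, precision, recall, F1-score, and AUC) can be reproduced from a model's outputs as in the short sketch below; the labels and scores shown are placeholders rather than the paper's results, and scikit-learn is again assumed.

```python
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Placeholder test labels and hybrid-detector outputs (+1 = normal, -1 = anomaly).
y_true = [1, 1, -1, 1, -1, -1, 1, -1]
y_pred = [1, 1, -1, 1, 1, -1, 1, -1]
scores = [0.9, 0.8, 0.1, 0.7, 0.6, 0.2, 0.95, 0.05]  # higher means "more normal"

print(confusion_matrix(y_true, y_pred, labels=[-1, 1]))
print("precision(-1):", precision_score(y_true, y_pred, pos_label=-1))
print("recall(-1):", recall_score(y_true, y_pred, pos_label=-1))
print("F1(-1):", f1_score(y_true, y_pred, pos_label=-1))
print("AUC:", roc_auc_score(y_true, scores))
```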
TABLE 18. Performance results of each model in microseconds.

same data for the hybrid model and analyzed the computation time needed to process the data. The results are presented in Table 18.

As expected, the hybrid model was slower than the individual ones. Nevertheless, its time response is still within the real-time response threshold defined for a run-of-the-mill computer of 2020 (under 200 milliseconds in the worst loop of the batch analysis), thus achieving the objective established for the operation stage: real-time anomaly detection.

V. CONCLUSION
This research work has developed and presented a Hybrid Machine-Learning Ensemble for Anomaly Detection for a Real-Time Industry 4.0 System. This ensemble consists of implementing two stages inspired by a standard industrial system: i) a Manufacturing Stage and ii) an Operation Stage. To the best of our knowledge, there are no other ML methods that consider these industrial stages. The ensemble system was tested on three machines, presenting an increased F1-score value and AUC with respect to the individual ML sub-models (LOF, OCSVM, and Autoencoder). The ensemble model for Machine A presented an F1-score value of 0.904 for anomalies (-1), an F1-score value of 0.944 for normal data (1), and an AUC value of 0.913; the ensemble model for Machine B presented an F1-score value of 0.890 for anomalies (-1), an F1-score value of 0.946 for normal data (1), and an AUC value of 0.905; finally, the ensemble model for Machine C presented an F1-score value of 0.887 for anomalies (-1), an F1-score value of 0.889 for normal data (1), and an AUC value of 0.897.

The proposed system allows vertical scaling in the number of algorithms used for the ensemble. As seen in the Results section, subsection B, the hybrid model presented a maximum computation time of approximately 190 milliseconds, fast enough for real-time anomaly detection. Concerning the individual models' performance, the Autoencoder results showed a low F1-score value, so it is proposed to test other algorithms (e.g., Isolation Forest, Elliptic Envelope) to improve the overall performance of the whole assembly. However, a study of the computational cost linked to the retraining of more types of algorithms must be carried out.

Future work is proposed to study system retraining in the Operation Stage pipeline and its computational cost. It is also proposed to study the proposed system on machines with different levels of degradation. Additionally, a data imputation study should be carried out to generate synthetic samples for systems where some information is missing (a loss of data due to communication breakdowns is a common problem in industrial systems). Deep Learning techniques could be considered when creating meta-classifiers using different base classifiers, such as recurrent neural networks like LSTMs, where time series need to be considered. Furthermore, a study with a larger number of machines must be carried out to see how well the hybrid model generalizes compared with the individual sub-models. In cases where the hybrid model does not provide any improvement, other ensemble strategies, such as taking the best of the individual sub-models, are considered.

Finally, as this project focuses on single-type anomaly detection, a challenge to be addressed in future work will be to classify or categorize different types of faults. For that, the authors might use appropriate methods such as explainable ML or correspondingly labeled datasets.

ACKNOWLEDGMENT
The authors would like to thank the Vicomtech Foundation for providing the necessary resources for the proper execution of this research project and University EAFIT for the research grant awarded to the principal author.

REFERENCES
[1] M. Xu, J. M. David, and S. H. Kim, "The fourth industrial revolution: Opportunities and challenges," Int. J. Financial Res., vol. 9, no. 2, pp. 92–95, 2018.
[2] M. Reis and G. Gins, "Industrial process monitoring in the big data/industry 4.0 era: From detection, to diagnosis, to prognosis," Processes, vol. 5, p. 35, Jun. 2017. [Online]. Available: http://www.mdpi.com/2227-9717/5/3/35
[3] S. H. An, G. Heo, and S. H. Chang, "Detection of process anomalies using an improved statistical learning framework," Expert Syst. Appl., vol. 38, no. 3, pp. 1356–1363, Mar. 2011.
[4] A. Boukerche, L. Zheng, and O. Alfandi, "Outlier detection: Methods, models, and classification," ACM Comput. Surv., vol. 53, no. 3, pp. 1–37, May 2021.
[5] J. A. Silva, E. R. Faria, R. C. Barros, E. R. Hruschka, A. C. D. Carvalho, and J. Gama, "Data stream clustering: A survey," ACM Comput. Surv., vol. 46, no. 1, pp. 1–31, 2013.
[6] S. Ahmad, A. Lavin, S. Purdy, and Z. Agha, "Unsupervised real-time anomaly detection for streaming data," Neurocomputing, vol. 262, pp. 134–147, Nov. 2017.
[7] V. Chandola, V. Mithal, and V. Kumar, "Comparative evaluation of anomaly detection techniques for sequence data," in Proc. 8th IEEE Int. Conf. Data Mining, Dec. 2008, pp. 743–748.
[8] J. Rabatel, S. Bringay, and P. Poncelet, "Anomaly detection in monitoring sensor data for preventive maintenance," Expert Syst. Appl., vol. 38, no. 6, pp. 7003–7015, Jun. 2011.
[9] V. Vercruyssen, W. Meert, G. Verbruggen, K. Maes, R. Baumer, and J. Davis, "Semi-supervised anomaly detection with an application to water analytics," in Proc. IEEE Int. Conf. Data Mining (ICDM), Nov. 2018, pp. 527–536.
[10] M. Fahim and A. Sillitti, "Anomaly detection, analysis and prediction techniques in IoT environment: A systematic literature review," IEEE Access, vol. 7, pp. 81664–81681, 2019, doi: 10.1109/ACCESS.2019.2921912. [Online]. Available: https://ieeexplore.ieee.org/document/8733806
[11] B. R. Priyanga and D. Kumari, "A survey on anomaly detection using unsupervised learning techniques," Int. J. Creative Res. Thoughts (IJCRT), vol. 6, no. 2, pp. 2320–2882, 2018. [Online]. Available: http://www.ijcrt.org/papers/IJCRT1812118.pdf
[12] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection for discrete sequences: A survey," IEEE Trans. Knowl. Data Eng., vol. 24, no. 5, pp. 823–839, 2012, doi: 10.1109/TKDE.2010.235. [Online]. Available: https://ieeexplore.ieee.org/document/5645624
[13] A. I. Rana, G. Estrada, M. Sole, and V. Muntes, "Anomaly detection guidelines for data streams in big data," in Proc. 3rd Int. Conf. Soft Comput. Mach. Intell. (ISCMI), Nov. 2016, pp. 94–98.
[14] M. Hubert and E. Vandervieren, "An adjusted boxplot for skewed distributions," Comput. Statist. Data Anal., vol. 52, no. 12, pp. 5186–5201, 2008. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167947307004434
[15] S. Agrawal and J. Agrawal, "Survey on anomaly detection using data mining techniques," Proc. Comput. Sci., vol. 60, pp. 708–713, Jan. 2015. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1877050915023479
[16] S. Ferreiro, B. Sierra, I. Irigoien, and E. Gorritxategi, "A Bayesian network for burr detection in the drilling process," J. Intell. Manuf., vol. 23, no. 5, pp. 1463–1475, Oct. 2012, doi: 10.1007/s10845-011-0502-z.
[17] B. Sierra, E. Lazkano, E. Jauregi, and I. Irigoien, "Histogram distance-based Bayesian network structure learning: A supervised classification specific approach," Decis. Support Syst., vol. 48, no. 1, pp. 180–190, Dec. 2009.
[18] Y. Yuan, S. Li, X. Zhang, and J. Sun, "A comparative analysis of SVM, naive Bayes and GBDT for data faults detection in WSNs," in Proc. IEEE Int. Conf. Softw. Qual., Rel. Secur. Companion (QRS-C), Jul. 2018, pp. 394–399.
[19] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, Jul. 2009, doi: 10.1145/1541880.1541882.
[20] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: Identifying density-based local outliers," ACM SIGMOD Rec., vol. 29, no. 2, pp. 93–104, Jun. 2000.
[21] P.-Y. Chen, S. Yang, and J. A. McCann, "Distributed real-time anomaly detection in networked industrial sensing systems," IEEE Trans. Ind. Electron., vol. 62, no. 6, pp. 3832–3842, Jun. 2015.
[22] S. Lee, G. Kim, and S. Kim, "Self-adaptive and dynamic clustering for online anomaly detection," Expert Syst. Appl., vol. 38, no. 12, pp. 14891–14898, Nov. 2011.
[23] E. H. M. Pena, S. Barbon, J. J. P. C. Rodrigues, and M. L. Proenca, "Anomaly detection using digital signature of network segment with adaptive ARIMA model and paraconsistent logic," in Proc. IEEE Symp. Comput. Commun. (ISCC), Jun. 2014, pp. 1–6.
[24] S. C. Tan, K. M. Ting, and T. F. Liu, "Fast anomaly detection for streaming data," in Proc. IJCAI Int. Joint Conf. Artif. Intell., 2011, pp. 1511–1516.
[25] N. Ding, H. Ma, H. Gao, Y. Ma, and G. Tan, "Real-time anomaly detection based on long short-term memory and Gaussian mixture model," Comput. Electr. Eng., vol. 79, Oct. 2019, Art. no. 106458.
[26] F. T. Liu, K. M. Ting, and Z. H. Zhou, "Isolation-based anomaly detection," ACM Trans. Knowl. Discovery Data, vol. 6, no. 1, pp. 1–39, 2012.
[27] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Comput. Surv., vol. 46, no. 4, Mar. 2014, Art. no. 44, doi: 10.1145/2523813. [Online]. Available: https://dl.acm.org/doi/10.1145/2523813
[28] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proc. 7th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), 2001, pp. 97–106.
[29] N. Laptev, S. Amizadeh, and I. Flint, "Generic and scalable framework for automated time-series anomaly detection," in Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2015, pp. 1939–1947.
[30] D. Velásquez, A. Sánchez, S. Sarmiento, M. Toro, M. Maiza, and B. Sierra, "A method for detecting coffee leaf rust through wireless sensor networks, remote sensing, and deep learning: Case study of the caturra variety in Colombia," Appl. Sci., vol. 10, no. 2, p. 697, Jan. 2020.
[31] J. Mendes-Moreira, C. Soares, A. M. Jorge, and J. F. D. Sousa, "Ensemble approaches for regression: A survey," ACM Comput. Surv., vol. 45, no. 1, p. 10, Nov. 2012.
[32] N. Garcia-Pedrajas, C. Hervas-Martinez, and D. Ortiz-Boyer, "Cooperative coevolution of artificial neural network ensembles for pattern classification," IEEE Trans. Evol. Comput., vol. 9, no. 3, pp. 271–302, Jun. 2005.
[33] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 3, pp. 226–239, Mar. 1998.
[34] H. Judi, R. Jenal, and D. Genasan, Quality Control Implementation in Manufacturing Companies: Motivating Factors and Challenges. London, U.K.: IntechOpen, Apr. 2011, ch. 25.
[35] A. Jovic, K. Brkic, and N. Bogunovic, "A review of feature selection methods with applications," in Proc. 38th Int. Conv. Inf. Commun. Technol., Electron. Microelectron. (MIPRO), May 2015, pp. 1200–1205.
[36] E. Schubert, J. Sander, M. Ester, H. Kriegel, and X. Xu, "DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN," ACM Trans. Database Syst., vol. 42, no. 3, pp. 1–21, 2017.
[37] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Comput., vol. 13, no. 7, pp. 1443–1471, Jul. 2001.
[38] D. M. Tax and R. P. Duin, "Uniform object generation for optimizing one-class classifiers," J. Mach. Learn. Res., vol. 2, pp. 155–173, Dec. 2002.
[39] T. Amarbayasgalan, B. Jargalsaikhan, and K. Ryu, "Unsupervised novelty detection using deep autoencoders with density based clustering," Appl. Sci., vol. 8, no. 9, p. 1468, Aug. 2018.
[40] A. Zheng and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, 1st ed. Sebastopol, CA, USA: O'Reilly Media, 2018.
[41] D. Freedman, R. Pisani, and R. Purves, Statistics: Fourth International Student Edition. New York, NY, USA: W. W. Norton & Company, 2007. [Online]. Available: https://books.google.es/books?id=mviJQgAACAAJ

DAVID VELÁSQUEZ received the B.S. degree in mechatronics engineering from the University Escuela de Ingeniería de Antioquia (EIA), in 2011, and the master's degree in engineering from Universidad EAFIT, with emphasis on technical systems integrated design, in 2014. He is currently pursuing the Ph.D. degree in informatics with the University of the Basque Country, Spain, in collaboration with research projects from the VICOMTECH Research Center. He is also working as an Assistant Professor with the Department of Systems and Informatics Engineering and as a Researcher with the TICs Development and Innovation Research Group (GIDITIC) and the Design Engineering Research Group (GRID), Universidad EAFIT. His research interests include adaptive systems control design, mechatronics design, industry 4.0, machine learning, computer vision, electronics optimization, embedded systems, the Internet of Things implementation, and biomedical signal processing applications.

ENRIQUE PÉREZ received the graduate degree in information technology engineering from the Universidad Nacional de Educación a Distancia (UNED), in 2019. He is currently pursuing the master's degree in data science with the Universitat Oberta de Catalunya (UOC), carrying out the external end-of-master's work in the field of artificial intelligence, developing a proposal for intelligent services for industrial blowers with the VICOMTECH Research Center, Data Intelligence for Energy and Industrial Processes Department, through an educational cooperation agreement between the university and the company. He carried out the end-of-degree project (PFG) in the field of machine learning (ML) associated with predictive maintenance in industry 4.0 environments. His research interests include machine learning and deep learning (DL), the creation of predictive models through advanced analytics in Industrial Internet of Things (IIoT) systems, data visualization, and their practical application for industry 4.0.

XABIER OREGUI received the Ph.D. degree in telecommunications engineering from the Centro de Estudios e Investigaciones Técnicas (CEIT), University of Navarra, more precisely in the area of electronics and communications, where he researched multi-source virtual machine management and automatic scaling. After a short break from research, during which he applied his knowledge to developing educational games and "serious games" for the company Ikasplay, in 2016 he came back to the research world at Vicomtech, in the area of data intelligence and industrial processes. Since his return to Vicomtech, he has been working on multiple projects focused on data management in industrial environments using different kinds of protocols, and on projects oriented to the management and analysis of big data and the visualization of that information from that same environment.
ARKAITZ ARTETXE received the degree in computer engineering and the M.Sc. degree in computational engineering and intelligent systems from the University of the Basque Country (UPV/EHU), San Sebastian, in 2011 and 2014, respectively, and the Ph.D. degree in computer science, with emphasis on the application of knowledge engineering and machine learning to the medical domain, from the University of the Basque Country, in 2017. Since 2011, he has been working as a Researcher in the field of biomedical applications with the Technological Centre Vicomtech. Since 2018, he has been working as a Researcher with the Data Intelligence for Energy and Industrial Processes Department, Vicomtech. His research interests include machine learning, imbalanced classification, and data fusion techniques in the context of industry 4.0.

MAURICIO TORO received the B.S. degree in computer science and engineering from Pontificia Universidad Javeriana, Colombia, in 2009, and the Ph.D. degree in computer science from the Université de Bordeaux, France, with emphasis on artificial intelligence, in 2012. He has been a Postdoctoral Fellow with the Computer Science Department, University of Cyprus, since 2013. Since 2014, he has been working as an Assistant Professor with the Department of Systems and Informatics Engineering and as a Researcher with the TICs Development and Innovation Research Group (GIDITIC), Universidad EAFIT. His research interests include artificial intelligence, industry 4.0, machine learning, computer vision, and agricultural applications.