
Received 9 June 2022, accepted 29 June 2022, date of publication 4 July 2022, date of current version 13 July 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3188102

A Hybrid Machine-Learning Ensemble for Anomaly Detection in Real-Time Industry 4.0 Systems
DAVID VELÁSQUEZ1,2,3, ENRIQUE PÉREZ2, XABIER OREGUI2, ARKAITZ ARTETXE2, JORGE MANTECA4, JORDI ESCAYOLA MANSILLA5, MAURICIO TORO1, MIKEL MAIZA2, AND BASILIO SIERRA3
1 RID on Information Technologies and Communications Research Group, Universidad EAFIT, Medellín 050022, Colombia
2 Department of Data Intelligence for Energy and Industrial Processes, Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), 20014 Donostia-San Sebastián, Spain
3 Department of Computer Science and Artificial Intelligence, University of Basque Country (UPV/EHU), 20018 Donostia-San Sebastián, Spain
4 Technical Department Direction, Mapner, 20115 Astigarraga, Spain
5 Department of Statistics and Operational Research, Universitat Oberta de Catalunya, Rambla del Poblenou, 08018 Barcelona, Spain

Corresponding author: David Velásquez ([email protected])


This work was supported in part by Vicomtech Foundation and in part by Universidad EAFIT.

ABSTRACT Detecting faults and anomalies in real-time industrial systems is a challenge due to the difficulty of sufficiently covering an industrial system's complexity. Today, Industry 4.0 makes it possible to tackle these problems through emerging technologies such as the Internet of Things and Machine Learning. This paper proposes a hybrid machine-learning ensemble real-time anomaly-detection pipeline that combines three Machine Learning models –Local Outlier Factor, One-Class Support Vector Machine, and Autoencoder– through a weighted average to improve anomaly detection. The ensemble model was tested with three air-blowing machines, obtaining F1-score values of 0.904, 0.890, and 0.887, respectively. The ensemble model showed improved performance metrics with respect to those of the individual models. A novelty of this model is that it consists of two stages inspired by a standard industrial system: i) a manufacturing stage and ii) an operation stage.

INDEX TERMS Anomaly detection, industry 4.0, machine learning, predictive maintenance, real-time.

I. INTRODUCTION
Thanks to the fourth industrial revolution (4IR), traditional industrial processes face new challenges: improving current processes or establishing new ones that efficiently use novel technologies and fully exploit their potential. 4IR, or Industry 4.0, is viewed as a disruptive innovation in a highly competitive market that positively impacts several industrial sectors by incorporating new enabling technologies: 3D printing, the Internet of Things (IoT), Cyber-Physical Systems (CPS), Artificial Intelligence (AI), Big Data, Robotics, Nanotechnology, and Quantum Computing are examples of these technologies [1]. In industrial machines, high volumes of data are generated and acquired by data acquisition systems, such as a Supervisory Control and Data Acquisition (SCADA) system or an embedded system. AI algorithms can then process these data to generate new knowledge of the process and identify new machine conditions, which represents one of the advancements provided by Industry 4.0. Predictive maintenance is an industrial process that is the subject of the work presented in this article and benefits greatly from the Industry 4.0 technologies mentioned above [2].

Nowadays, most industrial companies face problems arising from maintaining their systems. However, multiple techniques –involving predictive or condition-based maintenance (CBM)– allow predicting critical situations to reduce these problems. According to An et al. [3], in terms of diagnosis, predictive maintenance is divided into two categories:

The associate editor coordinating the review of this manuscript and approving it for publication was Mehul S. Raval.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
72024 VOLUME 10, 2022
D. Velásquez et al.: Hybrid Machine-Learning Ensemble for Anomaly Detection

i) models that take into account physical principles, and ii) models based on historical observations. One of the techniques used in the second group consists of the early detection of abnormal behavior in industrial equipment. This early detection can avoid possible equipment breakdowns and reduce the associated maintenance costs.

Anomaly detection is being researched in several application fields, among them disease detection, intrusion detection, fraud prediction, and fault detection in industrial equipment [4]. Through anomaly detection, it is possible to identify anomalous states that do not match the normality data, which usually correspond to the predominant states.

Detecting anomalous states is a challenging task, and it becomes even more complicated when it must be done in real-time, due to the restrictive features of streaming data. Unlike batch learning, where all the historical data are available and no new information is added to models already built, stream learning has five restrictions that must be taken into account [5]: i) streaming data samples arrive online and can be read at most once, which is a strong restriction for processing them, since the system has to decide whether the current data sample is discarded or archived; ii) past data samples can only be accessed if stored in memory; otherwise, a forgetting mechanism in charge of discarding past samples is applied; iii) since not all data samples can be stored, a decision made on past samples cannot be undone; iv) the processing time of each data sample should be short and constant; and v) the data processing algorithm must produce a model equivalent to what a batch algorithm would produce.

These five restrictions are the reason why most anomaly-detection algorithms –designed for batch processing– do not apply to stream processing. Nonetheless, there are hybrid approaches that use batch-learning algorithms to build an initial model as a first step and then apply streaming anomaly-detection algorithms as a second step.

The contribution of this work is the evaluation and comparison of different methods to detect anomalies that, based on their performance-control metrics, establish the weight (or incidence) of each method in the final combined model, thus responding better and more efficiently to the challenge of real-time anomaly detection. Specifically, the present work combines the predicted output of three Machine Learning (ML) models –Local Outlier Factor (LOF), One-Class Support Vector Machine (OCSVM), and Autoencoder– through a weighted average, using as weight the F1-score value of each model. The goal of the combined model is to detect anomalies in industrial systems in real-time. The proposed hybrid model was implemented using a data set from a real industrial system of air-blowing machines. Thus, the proposed hybrid anomaly-detection model applies to Industry 4.0 systems as well as to other industrial frameworks where real-time data acquisition systems are available.

The remainder of the article is divided into four sections. The state-of-the-art section reviews existing approaches and research on anomaly detection in real-time. Next, the third section gives a detailed explanation of the proposed hybrid anomaly detection. Finally, the results section describes the scores obtained by applying the hybrid anomaly-detection methodology to a testing data set. A Conclusions section ends the paper with some concluding remarks and a future-work proposal.

II. STATE OF THE ART
According to [6], [7], an anomaly can be defined as a point in time where the system's behavior is unusual and significantly different from previous, normal behavior. An anomaly may imply an adverse change in the system, for instance, a fluctuation in a jet engine's turbine rotation frequency, which possibly means an imminent failure. An anomaly may also indicate positive behavior; for instance, many web clicks on a new product page imply higher demand. In both cases, anomalies in data provide insight into abnormal behavior that can be translated into potentially useful information.

The challenge of detecting anomalies –in an industrial environment– is twofold: firstly, to propose a method to understand the different data obtained from various sensors, often with excessive noise; secondly, to obtain an overview of normal behavior in order to characterize it from historical data. Therefore, to correctly detect anomalies in a data set, one must first characterize and define normal data behavior [8]. Normal behavior can be characterized through the following three stages: (i) consider data describing normal behavior through historical data (without anomalies), segmented into different classes according to the context in which they were recorded; (ii) extract the most frequent behaviors, thus characterizing each class; and (iii) detect anomalies in newly recorded data based on the previous knowledge.

In general, anomalies are classified into three types: point (specific), contextual, and collective [9]–[11].
• Point anomalies occur when a single data point is recognized as anomalous with respect to the rest of the data. According to [10], these anomalies must be identified before processing or analyzing the data.
• Contextual anomalies are those where the data are considered anomalous in a specific context (e.g., the same sample data are ''normal'' in a given scenario but anomalous in another context). These types of anomalies are more common in time-series data flows [10].
• Collective anomalies occur when a collection of related data is considered anomalous with respect to the total data. Collective anomalies can also be spatial, if they are outside a typical range, or temporal, where the value is not outside the typical range but the sequence in which it occurs is unusual.

Anomaly-detection methods can be distinguished as supervised, semi-supervised, and unsupervised. Using one method or another usually depends on the existence or not of descriptive labels of the anomaly. The labels can be categorical,


e.g., we can have a case of binary or all-or-nothing labels, such as ''anomalous behavior'' (1) and ''non-anomalous / normal behavior'' (0), or numerical, e.g., an ''anomaly score'' ranging from 0 (''non-anomalous / normal'') to 1 (''totally anomalous''). While anomaly detection could be posed as a supervised learning problem, this is –generally– not the case, as there is often little or no data labeled with the anomalous behavior [12].

Once the data are available, a series of transformations of the data normally needs to be performed before starting the anomaly-detection process [13]:
• Aggregation methods: a set of consecutive values from a time series is replaced by a corresponding representative value. This provides benefits such as reduced dimensionality, although it can make detecting anomalies in subsequent steps difficult.
• Discretisation methods: time-series data are converted into a discrete sequence over a finite alphabet. Techniques such as symbolic sequences and edit distance can then be applied to detect anomalies.
• Digital Signal Processing (DSP) techniques (such as Fourier transforms and Gabor and Wavelet filters): time-series data are transformed into a lower-dimensional representation of the input data, where anomaly detection can take place.

A common type of problem that may be present in the data is noise and outliers. Noise among normal data may cause the model not to obtain the desired optimal predictions. Outliers are data points that may be caused by noise or may have an irregular pattern of behavior. Therefore, this unusual behavior must first be identified before deciding whether it should be considered an anomaly or an outlier.

Usually, data are created by one or more generation processes representing the system's activities. When a generation process behaves unusually, it creates anomalies. Therefore, an anomaly often contains valuable information about the abnormal characteristics of the systems and elements that impact the generation process [11].

TABLE 1. Classification of the different techniques for anomaly detection [13].

A. CLASSIFICATION OF TECHNIQUES FOR ANOMALY DETECTION
There are currently six families of techniques to detect anomalies: i) statistics-based, ii) classification-based, iii) clustering-based, iv) similarity-based, v) soft-computing-based, and vi) knowledge-based and combined techniques, as explained in [13]. Table 1 presents these techniques –and some examples of the algorithms used– in detail. The most relevant ones for this work are detailed next.

1) STATISTICS BASED ANOMALY DETECTION TECHNIQUES
Statistical techniques fit a predefined distribution to given data and apply statistical inference to determine whether an instance belongs to that model. Instances with a low probability are reported as anomalies [14].

The two typologies covered by this family are parametric and non-parametric. The first assumes an underlying data distribution. Although somewhat less efficient in finding anomalies, the second is often preferred because, a priori, it does not define any model structure, as this is determined from the data.

The most common parametric techniques are divided into those based on Gaussian models and those based on regression models. If a non-parametric approach is to be followed, such a classification can be made based on histograms or kernels.

Statistical techniques work well for simply structured data of small dimension and volume. In such cases, several methods can be used [13], such as box-plots, the Blum-Floyd-Pratt-Rivest-Tarjan (BFPRT) algorithm and similar central-value estimations on data streams; the Medcouple and Grubbs tests (for univariate data); comparisons of distributions (QQ charts, the Kolmogorov-Smirnov test, the Kruskal-Wallis test, and Wilcoxon signed-rank tests); auto-regressive techniques (Auto-regressive Integrated Moving Average, ARIMA; Auto-regressive Moving Average, ARMA); ML-based methods; Bayesian networks; and Principal Component Analysis (PCA) / Independent Component Analysis (ICA) (e.g., sequence micro-batch analysis).

2) CLASSIFICATION BASED ANOMALY DETECTION TECHNIQUES
Classification-based anomaly-detection techniques perform two main stages, called training and testing. In the training phase, the system learns from the available samples and generates a classifier. In the testing phase, samples that the classifier has not seen are tested to measure the model's performance. According to the labels available for training, classifiers can be grouped into two categories: i) one-class


and ii) multi-class. Examples of single-class and multi-class classifiers are neural networks, Bayesian networks, Support Vector Machines (SVM), and decision trees. These, together with fuzzy logic, are also methods that perform well in the presence of strong noise [15]–[18].

Classification-based techniques have the advantage of being able to distinguish between observations that belong to different anomalies (instead of an overall class called ''anomaly''), and their testing phase is quick, as the test instance is compared to the predefined model [19]. However, classification techniques depend on the availability of labels for the various normal and abnormal classes, which are difficult to assign. Also, these techniques assign labels to test data, which can be a disadvantage when an anomaly score is desired.

Classification-based techniques can also be categorized according to the type of anomaly. Radial-Basis Functions (RBF), SVM, and their derivatives are commonly used for individual anomalies. RBFs are very accurate and fast, particularly for the supervised classification of individual anomalies. For multiple anomalies, Deep Neural Networks (DNN), induction rules, and decision trees are used. DNNs can provide exceptional recognition rates in static scenarios but can have problems with data that vary over time.

3) CLUSTERING-BASED ANOMALY DETECTION TECHNIQUES
Clustering techniques are generally divided into two stages: first, the data are grouped with clustering algorithms, and then the degree of deviation is analyzed according to the results obtained by the clustering [4]. There are some prior assumptions about the data instances in these unsupervised techniques. On the one hand, normal-data samples belong to global clusters; on the other hand, anomalies do not belong to any defined cluster. In addition, normal-data samples are near the centroids of the closest cluster, while anomalous data are further away. Finally, normal-data samples belong to large, dense groups, whereas anomalies belong to local, small, disparate groups.

Cluster-based methods are applied in both supervised and unsupervised learning. Most techniques work well for complex, large-sized, and voluminous data and –optimally– if the anomalies do not form significant clusters in a short time series. Examples of this type of algorithm are k-Means, Shared Nearest Neighbour (SNN), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Self-Organizing Map (SOM), and Clustering-based Dynamic indexing Tree (CD-Tree) [4].

4) SIMILARITY BASED ANOMALY DETECTION TECHNIQUES
These techniques are the most widely used to detect anomalies. One of them is known as k Nearest Neighbours (k-NN). k-NN is a non-parametric method that requires a distance metric to measure the similarity between data observations. Although Euclidean distance is the most commonly used metric for data with continuous attributes, it is not usually employed at a practical level. This is because the Euclidean distance does not work well in high-dimensional sets, and measures such as the Mahalanobis, Hamming, or Chebyshev distances are used instead. The k-NN algorithm scores each data sample by its distance to most of the data around it, so new data are classified according to this score. However, there are some considerations to be taken into account in this type of technique [13]: i) a shortage of data can be seen as an anomaly in unsupervised techniques; ii) the performance is a function of the distance measure chosen, so the criteria must be clear when choosing a metric; and iii) it is valid only in cases of low-dimensional data, since defining a measure of the distance between instances can be complicated when the data dimension increases.

Another essential similarity-based anomaly-detection technique is based on relative density rather than distance. This technique estimates the neighborhoods' density, so that a data item in a low-density neighborhood will be anomalous, while one in a high-density neighborhood will be considered normal. An existing method following this approach is the Local Outlier Factor (LOF), which introduces the concept of local outliers and is based on scoring a data sample according to the average ratio of the neighborhood's density to the instance's density [20].

B. RELATED WORKS
Many studies on anomaly detection in static data sets exist in the literature. Examples of supervised approaches are SVM and Decision Trees [12], or cluster-based methods such as the Distributed Matching-based Grouping Algorithm (DMGA) [21]. Other examples use self-adaptive and dynamic clustering to learn weights for anomaly detection [22], or statistical methods such as auto-regressive techniques (e.g., ARIMA models [23]).

The problem with these methods is that they are not designed to process streaming data, as they need to have the data set previously stored in main memory. Therefore, in many cases these traditional techniques have been adapted first and then applied to streaming-data environments.

In this sense, Tan et al. [24] propose a fast one-class anomaly-detection method that uses only normal data and works well when anomalous data are rare. To do this, they use the Half-Space Trees (HS-Trees) algorithm, which builds a set of random HS trees. Each HS tree consists of a set of nodes, where each node captures the number of data elements (called mass) within a subspace of the data stream. The mass is used to profile the degree of an anomaly, as it is quick and straightforward to calculate compared to other methods based on distance or density. The tree structure is constructed without any data, making it very efficient, as it does not require restructuring the model once it is running on streaming data. HS-Trees only need normal data for training.

Another technique worth mentioning is the isolation-Forest Algorithm for Streaming Data (iForestASD) [25], based on the Isolation-Forest algorithm [26]. This method handles streaming data using sliding


windows. In this case, the authors start from ''concept drift'', a common occurrence when handling streaming data in dynamic and non-stationary environments that produces a change in the distribution of the data [27]. ''Concept drift'' is a problem that occurs when the statistical properties of the target variable change over time and the anomaly-detection model is no longer compatible with the data it handles, resulting in less accurate predictions. Therefore, to keep the anomaly detection effective, the model needs to be retrained and updated based on the new data it receives [27].

Another research work on anomaly detection is proposed by [28], based on a Hoeffding tree (HT), an inductive-incremental decision-tree algorithm used for anomaly detection. A handicap of this algorithm is that it needs class labels to be available for training.

Another work worth highlighting is that carried out by a group of Yahoo researchers [29]. Their system –called Extensible Generic Anomaly Detection System (EGADS)– allows precise, flexible, scalable, and extensible detection of anomalies in time series. The system makes it possible to separate forecasting, anomaly detection, and alerting into three separate components.

Finally, another interesting work is that contributed by [30], in which, through the integration of various technologies, the development of a disease in the leaf of a Colombian coffee variety is evaluated and diagnosed. The project's contribution relied on a model ensemble comprising four sub-models that received the data according to their nature. Once the prediction of each sub-model was made, the results were combined by calculating a weighted average, where the weight of each sub-model was a value associated with its F1-score in the final model.

Most of the approaches to detect anomalies existing in the literature are based on models that first build a profile of what is ''normal'' and then point out those instances that do not fit that normal profile as anomalies (statistical methods, classification-based methods, and cluster-based methods use this approach).

A contribution of this work is to build an ensemble model that uses different algorithms and, by combining their results, generates a new model to detect anomalies. Ensemble learning, either for classification or regression, refers to methods that generate multiple models that are combined to make a prediction [31]. Ensembles have been –extensively– used in the last decades, as they are considered to provide greater accuracy and increased robustness [32]. Additionally, multiple ensemble approaches have been proposed, and several studies have reported that model diversity enhances the ensemble model's performance, as different learners generalize in different ways [33].

III. PROPOSED METHODOLOGY
The proposed ML hybrid pipeline for real-time anomaly detection, as seen in Fig. 1, consists of two stages: i) the Manufacturing stage and ii) the Operation stage.

The manufacturing stage, or pipeline, of the Hybrid Anomaly Detection model construction process takes its name from the manufacturing process of an industrial machine. At this stage, an ML model is trained on the machines' quality-control process data to validate whether or not the machine meets its design standards [34]. Thus, the objective of this manufacturing-stage model construction task is twofold: (i) to use the trained model for detecting machine design/manufacturing anomalies; and (ii) to later deploy it in the operation stage of the machine, when it is integrated into an industrial production process, to perform a machine-operation anomaly-detection task. This model-construction manufacturing stage is equivalent to the design phase of a classical ML workflow. The metric chosen for measuring the models' performance is the F1-score of label L. The available data set is slightly imbalanced (see Table 2 for class-size percentages), with more ''normal data'' than ''anomalous data'' from the machines, for which the F1-score metric is considered appropriate. The F1-score is a value in the [0, 1] range, calculated as the harmonic mean of the estimator's precision and recall with respect to L (see Equation (1)):

F1-score_L = (2 × precision_L × recall_L) / (precision_L + recall_L)    (1)

Finally, each model's F1-score (F1_i) performance ratio with respect to the sum of all F1-scores (Σ_j F1_j) (see Equation (2)) is calculated and used as the weight (w_i) for the weighted average, in which the prediction of each model is multiplied by its computed weight. This weighted average assembles the Hybrid Anomaly Detection model at the manufacturing stage:

w_i = F1-score_i / Σ_j F1-score_j    (2)

The operation stage, or pipeline, refers to the phase when the machine is already running in production; in terms of a classical ML pipeline, it represents the deployment phase. Thus, this pipeline requires the machine to be able to measure the same variables taken at the manufacturing stage through industrial sensors. Once these sensors' data are captured in real-time, they are used as inputs for the Hybrid Anomaly Detector, already trained during the manufacturing stage. This detector produces a diagnosis based on the data received and generates an alarm for the operator in case of an anomaly. The detector can also be tuned in operation through a supervised action of the operator. If this action is triggered, the data are captured during a time window and labeled as ''normal'' data. The models within the hybrid anomaly detector are retrained when the data capture is complete. Once the calibration is finished, the system is able to continue detecting anomalies in real-time.

A. MANUFACTURING-STAGE PIPELINE
As previously mentioned, this stage is executed when the machine is in the factory. The proposed pipeline requires that the manufactured machine goes through a quality-control process [34], where sensors can capture information about the


manufactured machine's operation during a period of time. The data captured by the sensors during the quality-control process will be called the sensor data set.

FIGURE 1. Higher-level representation of the proposed Hybrid-ML pipeline for Anomaly Detection in real-time.

Once the sensors' data are stored, the data are pre-processed for data-cleaning purposes, i.e., those features that the system cannot capture with sensors when the machine is in operation are removed.

The pre-processed data are then normalized so that all features are on the same scale and comparable in later stages of the pipeline. A feature selection is then carried out to extract those variables relevant to the study; this step includes, as a first filter, the domain-knowledge expert, who can give an initial selection of which variables should be maintained or discarded. Then an automatic algorithm [35] to remove redundant features is applied. Following the above, a dimensionality reduction is performed using Principal Component Analysis (PCA) to extract the data's most representative characteristics.

The next stage is to apply a clustering algorithm, the K-means algorithm with k = 2, which allows a distinction between a group of data samples belonging to the transient state and another group belonging to the steady state. To correctly label the groups generated by the clustering algorithm, the cluster value assigned to the sample with the lowest timestamp of the data set is first identified. This value will correspond to the Transient Data Group and, therefore, all the samples containing this same cluster value will correspond to this same state. The rest of the values will be labeled as the Steady-State Data Group.

It is also proposed to apply an outlier-detection algorithm to the steady-state data group. In this case, it is proposed to use a density-based algorithm called DBSCAN, which is useful for detecting outliers in applications with noise, commonly found in industrial sensor data [36]. Once the data group belonging to the transient state, stable

The previous data set is then divided at random and stratified into three sets: training, validation, and test. The training set corresponds to 60% of all the data, where only the normal data are used to build each ML model with cross-validation, which allows for testing its intermediate performance and tuning model hyper-parameters.

For this pipeline, the following three ML algorithms were used, selected as a result of the authors' research on the state of the art of one-class anomaly detection for real-time systems, as they present an optimum balance of computation cost, implementation complexity, and performance [6]–[8], [12], [19]: i) LOF, which finds anomalous data points using the local deviation of a given data point with respect to its neighbors [20]; ii) One-Class SVM (OCSVM), which finds a frontier that encloses the vast majority of the data (normal data), so that new upcoming data that lie outside the frontier are considered abnormal [37], [38]; and iii) Autoencoder, which reduces the input data's dimensionality by encoding the information into a smaller space; from this compressed space, it is decoded back to the same dimensions as the original input, and the reconstruction error in this process determines a possible anomaly [39]. Normal data are used for the training because the proposed pipeline is designed to identify anomalies based on a single class for novelty detection, and the individual ML models use unsupervised algorithms.

The validation set, which corresponds to 20% of the data set, is used to obtain the definitive performance (in this case, the F1-score value) of each trained model. The weights for the predictions of each model are then determined as the ratio of each F1-score value (obtained using the validation set). The weights are stored to be later used for the rounded weighted average of the Hybrid Anomaly Detector component. The test set corresponds to the final 20% of the data set and is reserved for measuring the performance of the hybrid anomaly detector. The manufacturing-stage pipeline is shown in Fig. 2.

B. OPERATION-STAGE PIPELINE
This stage is executed when the machine is in operation. The operating machine generates real-time data from previously installed sensors during this process, corresponding to the same sensors used in the manufacturing stage. Each execution cycle is pre-processed and delivered to the previously obtained hybrid model, giving a diagnosis of whether the machine is in normal condition or whether any anomalies should be reported through an alarm.

The operation stage also allows for calibrating the Hybrid Anomaly Detection models, which is required in industrial systems that degrade over time and can be planned (e.g., every time maintenance is carried out). The operator must verify that the machine is in a stable state and under optimal conditions of normality and activate the ML models' calibration routine
state, and outliers (in the stable state) have been identified, to carry out this process. Once this process is activated,
a data set with new labels is generated. Furthermore, a depu- the system will collect data during a period of time, which
ration stage is carried out to obtain the final label for the data will depend on each system’s dynamics. Each data will be
set. The transient state and outliers are labeled with a value stored with the normality label in the data set. This data set
of -1, and the normal stable data is labeled with a value of 1. with normal data is then used to retrain each ML algorithm

VOLUME 10, 2022 72029


D. Velásquez et al.: Hybrid Machine-Learning Ensemble for Anomaly Detection

FIGURE 2. ML manufacturing stage pipeline.

with cross-validation. Finally, the newly trained models are updated in the Hybrid Anomaly Detector. It should be noted that only the weights (obtained through the F1-scores) acquired in the manufacturing process are used, because in the operation process there are usually no anomalous data with which to measure this performance. The operation stage pipeline can be seen in Fig. 3.

TABLE 2. Air-blowing machines' data set characteristics.

TABLE 3. Variables pre-processing at the manufacturing stage.

C. EXPERIMENTAL SETUP
The proposed hybrid ML real-time anomaly-detection pipeline was tested on three different industrial air-blowing machines from the local industry, with a data set generated by the quality-control process; these machines are currently operational.

The period for collecting the machines' data runs from 7 January 2020 to 2 October 2020. The data are recorded and stored at 2-second intervals. The final data set comprises 16 columns (15 variables and timestamps), with 1990 observations for Machine A, 2009 observations for Machine B, and 2132 observations for Machine C. The above-mentioned data set characteristics are shown in Table 2.

The sensors' data set was composed of the variables measured by sensors installed in each machine in the quality-control stage. The measured variables were Flow Rate, Power, Water Temperature, Nozzle Temperature, Input Pressure, Output Pressure, Flow Temperature, Machine Vibrations, RPM, Active Power, Cos Phi, Motor Current, Motor Voltage, Ambient Humidity, Ambient Temperature, and Atmospheric Pressure.

The pre-processing step selects the variables shared by the manufacturing and operation stages. The variables' pre-processing can be seen in Table 3, with a total of 11 variables selected (those with ticks in both manufacturing and operation). Additionally, samples with invalid or missing values were checked and removed from the data set in the pre-processing stage.

Afterward, the pre-processed data set was normalized to scale the variables' values, as is recommended for data preparation in ML, since some of the variables have different ranges [40]. The normalization used for this experiment was Min-Max scaling, which scaled the data to values between 0 and 1.
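As a concrete illustration (not the authors' code), the per-column Min-Max transform can be sketched as follows; the behavior matches scikit-learn's MinMaxScaler fitted on the same data, and a zero-range column would need a guard against division by zero:

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column of X to the [0, 1] range."""
    x_min = X.min(axis=0)   # per-variable minimum
    x_max = X.max(axis=0)   # per-variable maximum
    return (X - x_min) / (x_max - x_min)
```

For example, a column with values 1, 2, 3 maps to 0, 0.5, 1, so variables with very different ranges become directly comparable.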


FIGURE 3. ML operation stage pipeline.
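Conceptually, the real-time scoring loop summarized in Fig. 3 reduces to pre-processing each execution cycle and raising an alarm on an anomalous vote. A minimal sketch follows, in which `read_cycle`, `preprocess`, `hybrid_model`, and `raise_alarm` are hypothetical stand-ins for the system's actual interfaces:

```python
def operation_loop(read_cycle, preprocess, hybrid_model, raise_alarm):
    """Score each real-time execution cycle with the stored hybrid model.

    Hypothetical interfaces: read_cycle yields raw sensor samples,
    preprocess applies the same cleaning/scaling as the manufacturing
    stage, hybrid_model returns -1 (anomaly) or 1 (normal), and
    raise_alarm reports the offending sample.
    """
    for raw_sample in read_cycle():
        x = preprocess(raw_sample)
        if hybrid_model(x) == -1:   # anomalous vote triggers the alarm
            raise_alarm(raw_sample)
```

The loop itself adds negligible overhead; the latency budget discussed in the results is dominated by the model evaluations.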

TABLE 4. Outlier detection using DBSCAN.

TABLE 5. Labeled data sets' final sample observations.

The "Standard Scaler" (Z-score normalization) was not used as the normalization method for two main reasons: i) in the presence of outliers, the "Standard Scaler" does not guarantee balanced feature scales, because outliers influence the calculation of the empirical mean and standard deviation; and ii) the "Standard Scaler" assumes a normally distributed data set, which is not the case for our data set. In cases where the distribution is not Gaussian or the standard deviation is small, Min-Max scaling works better [41]. Besides, Min-Max scaling preserves the original distribution, does not significantly change the information embedded in the original data, and does not reduce the importance of outliers.

Following data normalization, a feature-selection step was carried out, in which all the data features were validated with the expert in the domain of the machines tested. The expert determined that the "environmental" variables (Ambient Humidity, Ambient Temperature, and Atmospheric Pressure) should not be taken into account, since they can present changes not necessarily related to the machine's behavior and generate information that can disturb the final prediction of the system. The variable Cos Phi was removed because it had zero variance. The Motor Voltage variable was also removed as redundant, since it could be explained through the Motor Current. Finally, seven variables remained, none of them with zero variance, so no additional variable-selection step was required.

A dimensionality reduction was performed using a two-component PCA on the selected features, which explained 90% of the variance for each machine. Clustering was then performed using k-Means, with k = 2 groups, to separate the data between the transient state and the steady state. Furthermore, the Silhouette coefficient was used to measure the clustering's quality, presenting a value of 0.6547 for machine A, 0.5895 for machine B, and 0.6744 for machine C.

Once the transient and steady-state data groups were separated, outliers were detected in the steady-state part using DBSCAN. This algorithm requires two parameters, minimum samples (min_samples) and epsilon (eps), which are assigned lists of initial values; the best values are then found automatically by maximizing the Silhouette coefficient. The lists of initial values for the three machines are displayed in equations (3) and (4).

initial_min_samples = [2, 3, 4, 5, 6, 7, 8]    (3)
initial_eps = [0.010, 0.011, 0.012, ..., 0.029, 0.030]    (4)

The selected DBSCAN parameters, their performance, and the resulting number of outliers for the three machines are shown in Table 4.

Afterward, the labeled data set was created for each machine. The previously identified transient group and the outliers are labeled as anomalies ("-1"), and the rest of the steady-state group is labeled as normal data ("1"). The final sample observations of the three labeled data sets are shown in Table 5.
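The automatic parameter search over the grids in (3) and (4) amounts to a brute-force maximization of the Silhouette coefficient. A sketch, assuming a scikit-learn environment (the grid values below are illustrative stand-ins, not the paper's exact lists):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

def tune_dbscan(X, eps_grid, min_samples_grid):
    """Return the (eps, min_samples) pair that maximizes the Silhouette coefficient."""
    best_params, best_score = None, -1.0
    for eps in eps_grid:
        for ms in min_samples_grid:
            labels = DBSCAN(eps=eps, min_samples=ms).fit_predict(X)
            if len(set(labels)) < 2:   # Silhouette needs at least two groups
                continue
            score = silhouette_score(X, labels)
            if score > best_score:
                best_params, best_score = (eps, ms), score
    return best_params, best_score
```

DBSCAN marks noise points with the label -1, so in this sketch the noise set is simply treated as one more group when scoring; other conventions (e.g., excluding noise from the score) are equally defensible.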


TABLE 6. Hyper-parameters selection table.

TABLE 7. Hyperparameters and F1-score for each generated submodel of Machine A.

TABLE 8. Hyperparameters and F1-score for each generated submodel of Machine B.

TABLE 9. Hyperparameters and F1-score for each generated submodel of Machine C.

TABLE 10. Autoencoder's architecture.

TABLE 11. Weights for the predictions of each submodel.

The labeled data set was then separated into three sets: 20% validation set, 60% training set (with only normal data), and 20% test set, as explained in the manufacturing-stage pipeline section. For the training set, a grid search with cross-validation was performed with five folds (k = 5), where a set of hyper-parameters was defined for each model so that the search algorithm finds the best ones according to their respective F1-score. These initial hyper-parameters are displayed in Table 6.

Tables 7, 8, and 9 show the selected hyper-parameters and the obtained F1-score values for the three machines.

The last step of the proposed ML pipeline consisted of implementing an ensemble of three models, LOF, OCSVM, and Autoencoder, through a weighted average. The Autoencoder's architecture is detailed in Table 10. Table 11 shows the weights for the predictions of each model, which were determined as the ratio of each F1-score value in Tables 7, 8, and 9 with respect to the sum of all F1-score values for each class ("-1" and "1"). As an illustrative example, suppose that for a given sample the LOF model predicted an anomaly (-1), the OCSVM predicted normality (1), and the Autoencoder again predicted an anomaly (-1); each output is multiplied by its respective weight, thus computing the final classification of the hybrid model. Considering the weights from Table 11, the output of the hybrid model will be 0.8. Since this value is greater than 0, the hybrid model will classify the sample as a normal data point ("1").

IV. RESULTS
In addition to the pipeline proposed for real-time anomaly detection, the proposed hybrid model must present improved performance metrics over the individual models. In this case, the precision, recall, and F1-score values, as well as the Area Under the ROC Curve (AUC), of all models were compared.

A. MANUFACTURING-PIPELINE RESULTS
Three machines corresponding to three different model versions were selected to check that the hybrid models worked equally well on heterogeneous equipment.

The confusion matrix allows checking which types of hits and errors (Type I, or false-positive, errors and Type II, or false-negative, errors) the current models make through different metrics, such as accuracy, precision, sensitivity, and specificity. Finally, the confusion matrix of the ensemble model was analyzed to check whether or not it improves the individual models' performance. In this respect, we focus on two metrics: i) precision, which penalizes Type I errors (false positives, i.e., normal data classified as anomalous); and ii) recall, which penalizes Type II errors (false negatives, i.e., anomalous data classified as normal).
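The rounded weighted-average vote described in the ensemble step can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: it uses a single weight per sub-model, whereas the paper derives weights per class, and the F1-derived weights in the usage example below (0.05, 0.9, 0.05) are hypothetical values chosen to reproduce the 0.8 example:

```python
import numpy as np

def hybrid_predict(votes, f1_scores):
    """Weighted-average vote of the sub-models (-1 = anomaly, 1 = normal).

    Each weight is the ratio of a sub-model's F1-score to the sum of all
    F1-scores, mirroring how the ensemble weights are derived.
    """
    weights = np.asarray(f1_scores, dtype=float)
    weights = weights / weights.sum()            # normalize to ratios
    score = float(np.dot(weights, votes))        # weighted average of votes
    return (1 if score > 0 else -1), score
```

With votes (-1, 1, -1) and the hypothetical weights above, the weighted average is 0.8, which is greater than 0, so the sample is classified as normal (1).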


TABLE 12. Machine A - confusion matrix (test set).

TABLE 13. Machine B - confusion matrix (test set).

TABLE 14. Machine C - confusion matrix (test set).

TABLE 15. Machine A - metrics table (test set).

TABLE 16. Machine B - metrics table (test set).

TABLE 17. Machine C - metrics table (test set).

The confusion matrices for machines A, B, and C are shown in Tables 12, 13, and 14, respectively. The confusion matrices show a generalized improvement of the hybrid model's performance compared to the other models in all three machines, both for recall and precision. For the experiments being analyzed, precision should be maximized as much as possible, since it indicates how reliable the anomalies detected by the system are.

Tables 15, 16, and 17 show the models' summary results, both individually and jointly, using their metrics for comparison.

As seen in the above tables, the performance obtained by the hybrid model improves on the performance of the individual models. This justifies integrating the models into a hybrid model through a weighted average, as it improves the whole pipeline's final performance. It should also be noted that the results presented by the Autoencoder are relatively low compared to the other models; this is because the Autoencoder operates better for anomaly detection when using time windows and a convolutional network architecture, which is not the case here. The problem with a convolutional architecture is that the time windows it requires could add significant delay in the operation stage, and the transformation of the training, validation, and testing data needed for this type of model would make its metrics difficult to compare with those of the rest of the models.

B. OPERATION PIPELINE RESULTS
The above anomaly detection algorithm would not be useful if it could not process the trained models smoothly in a standard, real-time operation environment.

In order to measure performance, a data batch comprising 2012 samples was run for all individual models on a common computer (8 GB RAM and a minimum of an Intel Core i5 or equivalent; no graphics card required), and the computation time needed to get the results was measured. After that, we ran the


same data for the hybrid model and analyzed the computation time needed to process the data. The results are presented in Table 18.

TABLE 18. Performance results of each model in microseconds.

As expected, the hybrid model was slower than the individual ones. Nevertheless, its time response is still within the real-time response threshold defined for a run-of-the-mill computer of 2020 (under 200 milliseconds in the worst loop of the batch analysis), thus achieving the objective established for the operation stage: real-time anomaly detection.

V. CONCLUSION
This research work has developed and presented a Hybrid Machine-Learning Ensemble for Anomaly Detection for a Real-Time Industry 4.0 System. This ensemble consists of implementing two stages inspired by a standard industrial system: i) a Manufacturing Stage and ii) an Operation Stage. To the best of our knowledge, there are no other ML methods that consider these industrial stages. The ensemble system was tested on three machines, presenting increased F1-score and AUC values with respect to the individual ML sub-models (LOF, OCSVM, and Autoencoder). The ensemble model for Machine A presented an F1-score value of 0.904 for anomalies (-1), an F1-score value of 0.944 for normal data (1), and an AUC value of 0.913; the ensemble model for Machine B presented an F1-score value of 0.890 for anomalies (-1), an F1-score value of 0.946 for normal data (1), and an AUC value of 0.905; finally, the ensemble model for Machine C presented an F1-score value of 0.887 for anomalies (-1), an F1-score value of 0.889 for normal data (1), and an AUC value of 0.897.

The proposed system allows vertical scaling in the number of algorithms used for the ensemble. As seen in Section IV-B, the hybrid model presented a maximum computation time of approximately 190 milliseconds, fast enough for real-time anomaly detection. Concerning the individual models' performance, the Autoencoder results showed a low F1-score value, so it is proposed to test other algorithms (e.g., Isolation Forest, Elliptic Envelope) to improve the overall performance of the whole ensemble. However, a study of the computational cost linked to the retraining of more types of algorithms must be carried out.

Future work is proposed to study system retraining in the Operation Stage pipeline and its computational cost. It is also proposed to evaluate the proposed system on machines with different levels of degradation. Additionally, a data-imputation study should be carried out to generate synthetic samples for systems where some information is missing (loss of data due to communication breakdowns is a common problem in industrial systems). Deep Learning techniques could be considered when creating meta-classifiers using different base classifiers, such as recurrent neural networks (e.g., LSTMs), where time series need to be considered. Furthermore, a study with a larger number of machines must be carried out to see how well the hybrid model generalizes against the individual sub-models. In cases where the hybrid model does not provide any improvement, other ensemble strategies, such as taking the best of the individual sub-models, could be considered.

Finally, as this project focuses on single-type anomaly detection, a challenge to be addressed in future work will be to classify or categorize different types of faults. For that, the authors might use appropriate methods such as explainable ML or correspondingly labeled datasets.

ACKNOWLEDGMENT
The authors would like to thank the Vicomtech Foundation for providing the necessary resources for the proper execution of this research project and Universidad EAFIT for the research grant awarded to the principal author.

REFERENCES
[1] M. Xu, J. M. David, and S. H. Kim, "The fourth industrial revolution: Opportunities and challenges," Int. J. Financial Res., vol. 9, no. 2, pp. 92–95, 2018.
[2] M. Reis and G. Gins, "Industrial process monitoring in the big data/industry 4.0 era: From detection, to diagnosis, to prognosis," Processes, vol. 5, p. 35, Jun. 2017. [Online]. Available: http://www.mdpi.com/2227-9717/5/3/35
[3] S. H. An, G. Heo, and S. H. Chang, "Detection of process anomalies using an improved statistical learning framework," Expert Syst. Appl., vol. 38, no. 3, pp. 1356–1363, Mar. 2011.
[4] A. Boukerche, L. Zheng, and O. Alfandi, "Outlier detection: Methods, models, and classification," ACM Comput. Surv., vol. 53, no. 3, pp. 1–37, May 2021.
[5] J. A. Silva, E. R. Faria, R. C. Barros, E. R. Hruschka, A. C. D. Carvalho, and J. Gama, "Data stream clustering: A survey," ACM Comput. Surv., vol. 46, no. 1, pp. 1–31, 2013.
[6] S. Ahmad, A. Lavin, S. Purdy, and Z. Agha, "Unsupervised real-time anomaly detection for streaming data," Neurocomputing, vol. 262, pp. 134–147, Nov. 2017.
[7] V. Chandola, V. Mithal, and V. Kumar, "Comparative evaluation of anomaly detection techniques for sequence data," in Proc. 8th IEEE Int. Conf. Data Mining, Dec. 2008, pp. 743–748.
[8] J. Rabatel, S. Bringay, and P. Poncelet, "Anomaly detection in monitoring sensor data for preventive maintenance," Expert Syst. Appl., vol. 38, no. 6, pp. 7003–7015, Jun. 2011.
[9] V. Vercruyssen, W. Meert, G. Verbruggen, K. Maes, R. Baumer, and J. Davis, "Semi-supervised anomaly detection with an application to water analytics," in Proc. IEEE Int. Conf. Data Mining (ICDM), Nov. 2018, pp. 527–536.
[10] M. Fahim and A. Sillitti, "Anomaly detection, analysis and prediction techniques in IoT environment: A systematic literature review," IEEE Access, vol. 7, pp. 81664–81681, 2019, doi: 10.1109/ACCESS.2019.2921912.
[11] B. R. Priyanga and D. Kumari, "A survey on anomaly detection using unsupervised learning techniques," Int. J. Creative Res. Thoughts (IJCRT), vol. 6, no. 2, pp. 2320–2882, 2018. [Online]. Available: http://www.ijcrt.org/papers/IJCRT1812118.pdf
[12] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection for discrete sequences: A survey," IEEE Trans. Knowl. Data Eng., vol. 24, no. 5, pp. 823–839, 2012, doi: 10.1109/TKDE.2010.235.
[13] A. I. Rana, G. Estrada, M. Sole, and V. Muntes, "Anomaly detection guidelines for data streams in big data," in Proc. 3rd Int. Conf. Soft Comput. Mach. Intell. (ISCMI), Nov. 2016, pp. 94–98.
[14] M. Hubert and E. Vandervieren, "An adjusted boxplot for skewed distributions," Comput. Statist. Data Anal., vol. 52, no. 12, pp. 5186–5201, 2008. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167947307004434


[15] S. Agrawal and J. Agrawal, "Survey on anomaly detection using data mining techniques," Proc. Comput. Sci., vol. 60, pp. 708–713, Jan. 2015. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1877050915023479
[16] S. Ferreiro, B. Sierra, I. Irigoien, and E. Gorritxategi, "A Bayesian network for burr detection in the drilling process," J. Intell. Manuf., vol. 23, no. 5, pp. 1463–1475, Oct. 2012, doi: 10.1007/s10845-011-0502-z.
[17] B. Sierra, E. Lazkano, E. Jauregi, and I. Irigoien, "Histogram distance-based Bayesian network structure learning: A supervised classification specific approach," Decis. Support Syst., vol. 48, no. 1, pp. 180–190, Dec. 2009.
[18] Y. Yuan, S. Li, X. Zhang, and J. Sun, "A comparative analysis of SVM, naive Bayes and GBDT for data faults detection in WSNs," in Proc. IEEE Int. Conf. Softw. Qual., Rel. Secur. Companion (QRS-C), Jul. 2018, pp. 394–399.
[19] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, Jul. 2009, doi: 10.1145/1541880.1541882.
[20] M. M. Breunig, H.-P. Kriegel, R. T. Ng, and J. Sander, "LOF: Identifying density-based local outliers," ACM SIGMOD Rec., vol. 29, no. 2, pp. 93–104, Jun. 2000.
[21] P.-Y. Chen, S. Yang, and J. A. McCann, "Distributed real-time anomaly detection in networked industrial sensing systems," IEEE Trans. Ind. Electron., vol. 62, no. 6, pp. 3832–3842, Jun. 2015.
[22] S. Lee, G. Kim, and S. Kim, "Self-adaptive and dynamic clustering for online anomaly detection," Expert Syst. Appl., vol. 38, no. 12, pp. 14891–14898, Nov. 2011.
[23] E. H. M. Pena, S. Barbon, J. J. P. C. Rodrigues, and M. L. Proenca, "Anomaly detection using digital signature of network segment with adaptive ARIMA model and paraconsistent logic," in Proc. IEEE Symp. Comput. Commun. (ISCC), Jun. 2014, pp. 1–6.
[24] S. C. Tan, K. M. Ting, and T. F. Liu, "Fast anomaly detection for streaming data," in Proc. IJCAI Int. Joint Conf. Artif. Intell., 2011, pp. 1511–1516.
[25] N. Ding, H. Ma, H. Gao, Y. Ma, and G. Tan, "Real-time anomaly detection based on long short-term memory and Gaussian mixture model," Comput. Electr. Eng., vol. 79, Oct. 2019, Art. no. 106458.
[26] F. T. Liu, K. M. Ting, and Z. H. Zhou, "Isolation-based anomaly detection," ACM Trans. Knowl. Discovery Data, vol. 6, no. 1, pp. 1–39, 2012.
[27] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia, "A survey on concept drift adaptation," ACM Comput. Surv., vol. 46, no. 4, Mar. 2014, Art. no. 44, doi: 10.1145/2523813.
[28] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proc. 7th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining (KDD), 2001, pp. 97–106.
[29] N. Laptev, S. Amizadeh, and I. Flint, "Generic and scalable framework for automated time-series anomaly detection," in Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Aug. 2015, pp. 1939–1947.
[30] D. Velásquez, A. Sánchez, S. Sarmiento, M. Toro, M. Maiza, and B. Sierra, "A method for detecting coffee leaf rust through wireless sensor networks, remote sensing, and deep learning: Case study of the caturra variety in Colombia," Appl. Sci., vol. 10, no. 2, p. 697, Jan. 2020.
[31] J. Mendes-Moreira, C. Soares, A. M. Jorge, and J. F. D. Sousa, "Ensemble approaches for regression: A survey," ACM Comput. Surv., vol. 45, no. 1, p. 10, Nov. 2012.
[32] N. Garcia-Pedrajas, C. Hervas-Martinez, and D. Ortiz-Boyer, "Cooperative coevolution of artificial neural network ensembles for pattern classification," IEEE Trans. Evol. Comput., vol. 9, no. 3, pp. 271–302, Jun. 2005.
[33] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 3, pp. 226–239, Mar. 1998.
[34] H. Judi, R. Jenal, and D. Genasan, Quality Control Implementation in Manufacturing Companies: Motivating Factors and Challenges. London, U.K.: IntechOpen, Apr. 2011, ch. 25.
[35] A. Jovic, K. Brkic, and N. Bogunovic, "A review of feature selection methods with applications," in Proc. 38th Int. Conv. Inf. Commun. Technol., Electron. Microelectron. (MIPRO), May 2015, pp. 1200–1205.
[36] E. Schubert, J. Sander, M. Ester, H. Kriegel, and X. Xu, "DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN," ACM Trans. Database Syst., vol. 42, no. 3, pp. 1–21, 2017.
[37] B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Comput., vol. 13, no. 7, pp. 1443–1471, Jul. 2001.
[38] D. M. Tax and R. P. Duin, "Uniform object generation for optimizing one-class classifiers," J. Mach. Learn. Res., vol. 2, pp. 155–173, Dec. 2002.
[39] T. Amarbayasgalan, B. Jargalsaikhan, and K. Ryu, "Unsupervised novelty detection using deep autoencoders with density based clustering," Appl. Sci., vol. 8, no. 9, p. 1468, Aug. 2018.
[40] A. Zheng and A. Casari, Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, 1st ed. Sebastopol, CA, USA: O'Reilly Media, 2018.
[41] D. Freedman, R. Pisani, and R. Purves, Statistics, 4th international student ed. New York, NY, USA: W. W. Norton & Company, 2007. [Online]. Available: https://books.google.es/books?id=mviJQgAACAAJ

DAVID VELÁSQUEZ received the B.S. degree in mechatronics engineering from the University Escuela de Ingeniería de Antioquia (EIA), in 2011, and the master's degree in engineering from Universidad EAFIT, with emphasis on technical systems integrated design, in 2014. He is currently pursuing the Ph.D. degree in informatics with the University of the Basque Country, Spain, in collaboration with research projects from the VICOMTECH Research Center. He is also working as an Assistant Professor with the Department of Systems and Informatics Engineering and as a Researcher with the TICs Development and Innovation Research Group (GIDITIC) and the Design Engineering Research Group (GRID), Universidad EAFIT. His research interests include adaptive systems control design, mechatronics design, industry 4.0, machine learning, computer vision, electronics optimization, embedded systems, the Internet of Things implementation, and biomedical signal processing applications.

ENRIQUE PÉREZ received the graduate degree in information technology engineering from the Universidad Nacional de Educación a Distancia (UNED), in 2019. He is currently pursuing the master's degree in data science with the Universitat Oberta de Catalunya (UOC), carrying out the external end-of-master's project in the field of artificial intelligence, developing a proposal for intelligent services for industrial blowers with the VICOMTECH Research Center, Data Intelligence for Energy and Industrial Processes Department, through an educational cooperation agreement between the university and the company. He carried out his end-of-degree project (PFG) in the field of machine learning (ML), associated with predictive maintenance in industry 4.0 environments. His research interests include machine learning and deep learning (DL), the creation of predictive models through advanced analytics in Industrial Internet of Things (IIoT) systems, data visualization, and their practical application in industry 4.0.

XABIER OREGUI received the Ph.D. degree in telecommunications engineering from the Centro de Estudios e Investigaciones Técnicas (CEIT), University of Navarra, more precisely in the area of electronics and communications, where he researched multi-source virtual machine management and automatic scaling. After a short break from research, during which he applied his knowledge to developing educational games and "serious games" for the company Ikasplay, he returned to research in 2016, joining Vicomtech in the area of data intelligence and industrial processes. Since rejoining Vicomtech, he has worked on multiple projects focused on data management in industrial environments using different kinds of protocols, as well as on projects oriented to the management, analysis, and visualization of big data from those same environments.


ARKAITZ ARTETXE received the degree in com- MAURICIO TORO received the B.S. degree in
puter engineering and the M.Sc. degree in compu- computer science and engineering from Pontificia
tational engineering and intelligent systems from Universidad Javeriana, Colombia, in 2009, and the
the University of the Basque Country (UPV/EHU), Ph.D. degree in computer science from the Univer-
San Sebastian, in 2011 and 2014, respectively, and sité de Bordeux, France, with emphasis on artifcial
the Ph.D. degree in computer science with empha- intelligence, in 2012. He has been a Postdoctoral
sis on the application of knowledge engineering Fellow with the Computer-Science Departament,
and machine learning to the medical domain from University of Cyprus, since 2013. Since 2014,
the University of the Basque Country, in 2017. he has been working as an Assistant Professor
Since 2011, he has been working as a Researcher with the Department of Systems and Informatics
in the field of biomedical applications with the Technological Centre Engineering and as a Researcher with the TICs Development and Innova-
Vicomtech. Since 2018, he has been working as a Researcher with the Data tion Research Group (GIDITIC), Universidad EAFIT. His research inter-
Intelligence for Energy and Industrial Processes Department, Vicomtech. His ests include artificial intelligence, industry 4.0, machine learning, computer
research interests include machine learning, imbalanced classification, and vision, and agricultural applications.
data fusion techniques in the context of industry 4.0.
JORGE MANTECA is currently a Senior Industrial Engineer (specialty mechanics, machines) with ETSIIG, University of Oviedo. He is also the Technical Director of MAPNER, a company dedicated to the manufacture of machinery in the field of compressors and vacuum pumps. As main functions, he is in charge of technical support in commercial tasks, management and implementation of resources for the organization of the different departments of the company, and management and coordination of the technical office, research, optimization of equipment performance, and development of new products. His previous experience includes his participation in a research project on the behavior of cryogenic fluids at CERN (European Center for Particle Physics Research), Geneva, Switzerland. He has published several scientific papers, including the publication at the international conference on cryogenics in Anchorage, Alaska, in 2003: ''Conclusion of the He Spill Simulations in the LHC Tunnel.''

MIKEL MAIZA received the degree in automatic engineering and industrial electronics from the University of Mondragon, in 2000, and the Ph.D. degree from the University of York, U.K., in 2003, with emphasis on parallel computing for real-time systems. He was an External Professor with the University of Mondragon, from 2002 to 2004, an Associate Professor with the School of Engineering, University of Navarra, from 2009 to 2013, and an Associate Professor with the Department of Applied Mathematics, University of the Basque Country, from 2015 to 2017. He has been collaborating as an External Professor with the Ecole Supérieure des Technologies Industrielles Avancées (ESTIA), since 2017. Since 2016, he has been working as a Senior Researcher with the Technological Centre Vicomtech, Data Intelligence for Energy and Industrial Processes Department. His research interests include parallel processing systems, heuristic algorithms and stochastic techniques of mathematical optimization and their integration with artificial intelligence techniques, for building stochastic models aimed at the simulation and optimization of processes and systems, and applications, such as data mining, pattern recognition, and automatic learning or early fault detection.
JORDI ESCAYOLA MANSILLA received the bachelor's and master's degrees in statistics and operations research, the master's (Executive) degree in business administration, and the Ph.D. degree in economics and business. He has been working in data science and predictive analytics for the last 11 years in different industries (insurance, banking, pharma, and public sector) and consulting, which includes relevant multinational experience helping organizations and governments. Since 2017, he has been a fellow of the prestigious organization Beta Gamma Sigma, which represents one of the highest honor institutions in business worldwide. He is currently the Consultancy Manager and the Practice Leader in data science, an Associate Professor in probability and statistics with the Universitat Politècnica de Catalunya (UPC), and an Associate Professor with the Universitat Oberta de Catalunya (UOC).

BASILIO SIERRA is currently a Full Professor with the Computer Sciences and Artificial Intelligence Department, University of the Basque Country (UPV/EHU). He is also the Co-Director of the Robotics and Autonomous Systems Group, RSAIT. He is also a Researcher in the fields of robotics and machine learning, where he is working on the use of different paradigms to improve robots' behaviors. He works as well in multidisciplinary applications of machine learning paradigms, in agriculture, natural language processing, and medicine. He has published more than 50 journal articles, and several book chapters and conference papers.
72036 VOLUME 10, 2022