
Available online at www.sciencedirect.com

ScienceDirect
Procedia Manufacturing 42 (2020) 132–139

International Conference on Industry 4.0 and Smart Manufacturing (ISM 2019)

Bypassing Data Issues of a Supply Chain Simulation Model in a Big Data Context
António A. C. Vieiraᵃ,*, Luís Diasᵃ, Maribel Y. Santosᵃ, Guilherme A. B. Pereiraᵃ, José Oliveiraᵃ
ᵃ ALGORITMI Research Centre, University of Minho, 4804-533, Portugal

* Corresponding author. E-mail address: [email protected]

Abstract

Supply Chains (SCs) are complex and dynamic networks, where certain events may cause severe problems. To avoid them, simulation can be used, allowing the uncertainty of these systems to be considered. Furthermore, the data generated at increasingly high volumes, velocities and varieties by relevant data sources allows the simulation model to capture all the relevant elements. Because the solution relies on simulation, several data issues were identified while developing it and had to be bypassed, so that the incorporated elements comprise a coherent SC simulation model. Thus, the purpose of this paper is to present the main issues that were faced, and to discuss how they were bypassed, while working on a SC simulation model in a Big Data context using real industrial data from an automotive electronics SC. This paper highlights the role of simulation in this task, since it worked as a semantic validator of the data. Moreover, this paper also presents the results that can be obtained from the developed model.

2351-9789 © 2020 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the International Conference on Industry 4.0 and Smart Manufacturing.
10.1016/j.promfg.2020.02.033

Keywords: Simulation; Supply Chain; Big Data; Data issues; Industry 4.0

1. Introduction

Supply Chains (SCs) are complex and dynamic systems whose performance is hard to assess and quantify properly. Simulation can be used as a decision-making tool for SC systems, allowing alternative scenarios to be tested, performance measures to be determined, or simply the logistics flows to be animated, enhancing knowledge discovery from raw data. At the same time, SC processes generate huge amounts of data, nowadays referred to as Big Data. Thus, such decision-making tools benefit from Big Data structures, which provide quality and integrated data for SC simulation models [1], [2].

Aligned with the above, such an artifact is currently being developed at an organization of the automotive electronics industry sector. The solution integrates a Big Data Warehouse (BDW) [3], which supports the SC simulation model by extracting raw data from selected data sources, transforming it into quality data and providing it to the simulation model. The research conducted to define the data requirements of the BDW was published in [4]. In turn, a prototype of the simulation model, which was used to validate the set of variables selected for the project, was published in [5]. Having validated this data model, the next step in the project was to complement the simulation model so that it can use the data provided by the BDW.

Notwithstanding, several data issues were faced when providing the real data stored in the BDW to the simulation model. These issues arose in the organization hosting this research despite its technological conditions, such as advanced Information Systems (IS) and Enterprise Resource Planning (ERP), and despite it being a flagship in its industry sector with reference business processes.

In fact, facing data issues while working on simulation projects with real industrial data is not new, as Bokrantz et al. [6] corroborated. The authors presented a multiple case study within the automotive industry to provide
empirical descriptions of data quality problems in simulation projects. As the authors postulated, simulation requires high-quality data and, often, extensive transformations to allow its use in simulation models; i.e., data issues must be bypassed in order to produce a coherent simulation model.

In light of the above, the purpose of this paper is to address the data issues that were faced while developing a SC simulation model in a Big Data context, since Big Data concepts and technologies were used in this project. These issues are identified and the approaches conducted to bypass them are presented. With this work, the authors believe that researchers focusing on similar problems and facing similar difficulties will find the shared approaches and conclusions of this research helpful.

This paper is structured as follows. Section 2 summarizes the related work. Section 3 provides a brief description of the SC system considered in this paper, as well as the main development stages conducted in this project. Section 4 details the main data issues that were faced and needed to be bypassed in order to produce a coherent SC simulation model. In turn, Section 5 illustrates examples of results that can be obtained from such a simulation model. Finally, the main conclusions and future work are discussed in Section 6.

2. Related Work

The need to improve industrial processes is, in fact, one of the main goals of Industry 4.0, as emphasized by Kagermann et al. [7]. Such improvement may involve several methods, with the authors stressing the use of simulation to analyze the behavior of complex systems like SCs, including potential crisis scenarios. The authors also noted the importance of using Big Data in conjunction with such solutions, as it allows data from several data sources to be considered in the model.

Vieira et al. [8] reviewed simulation studies closely related to the concept of Industry 4.0, in order to identify the most active research directions for simulation in this industrial revolution. According to the authors, such studies include the use of Big Data technologies applied to SC problems, combining the process detail that Big Data can capture with the ability of simulation to evaluate alternative scenarios.

Zhong et al. [2] outlined current trends in the application of Big Data to Supply Chain Management (SCM). According to the authors, the increasing volume of data across SC sectors is a challenge that requires tools capable of making full use of the data, with Big Data emerging as a discipline able to provide solutions for analysis, knowledge extraction and advanced decision-making.

According to Tiwari et al. [1], analytics in SCs, including simulation, is not new. However, the advent of Big Data presents an opportunity to use it together with such analytics methods (e.g., simulation). The authors stressed the importance of this combination in predictive and prescriptive analytics, with simulation being used in the former to predict future events and in the latter to enhance the decision-making process.

As the cited works suggest, and to the best of the authors' knowledge, a gap can be identified in the literature concerning Big Data structures that store and integrate data
from several sources, with the end goal of providing such data to a SC simulation model. As such a solution is currently being developed by the authors, this paper builds on the identified gap by contributing the approaches conducted to bypass the data issues found while developing it.

3. Materials and Methods

This section starts with a description of the SC under analysis, to convey the complexity of the problem. The second subsection then describes the development stages of the project, to present the approaches used to tackle the problem.

3.1. System Characterization

This project is being developed at a plant of the automotive industry sector that produces electronic components. This subsection describes the SC at hand, to give a perspective of the scale and complexity of the network under analysis.

The plant handles roughly 7 000 different types of materials, which are actively supplied by around 500 different suppliers located in more than 30 countries around the world. The analysis of the obtained data showed that, during the time frame it covers, most suppliers were from Europe and Asia: Germany and the Netherlands accounted for the most suppliers and shipments in Europe, and Malaysia, Taiwan, China, Hong Kong and Singapore for the most shipments from Asia. Altogether, the suppliers shipped more than 200 000 deliveries.

3.2. Methods

It is widely accepted that SCs generate huge amounts of data, driving the need for Big Data technologies [1], [2], [8], which were used in this project. The organization's existing Big Data cluster is also being used.

SC activities may differ between companies, geographic locations, businesses and industry sectors. Thus, this work started by studying the processes associated with the SC at hand. For this purpose, interviews and training sessions with process specialists provided valuable insights, and internal documents of the organization were also analyzed. Although data models are not usually a main concern in a Big Data context in terms of providing an overall and integrated view of the data, developing a BDW still requires starting with an analysis of its data requirements, namely their elicitation. This was done by applying user-, goal- and data-driven approaches in conjunction, in order to cover all the relevant perspectives. The elicitation was carried out continuously throughout the development phase in successive iterations, as described in detail in [4]. This step brought the following major benefits, as corroborated by previous studies [9], [10]: a better understanding of the data, the organizational processes and the relevant variables to include in the BDW and in the simulation model; assurance that no important data was excluded; and support in defining the BDW model, namely the Hive tables to use.
After selecting the relevant variables, data profiling is conducted to assess their quality and determine the necessary transformations. Such data profiling techniques make it possible to verify, for instance, the existence of null values, the distribution of values, and the quality of categorical values. Thereafter, ETL (Extract, Transform, Load) jobs are developed, which extract data from the data sources, apply the transformations identified when assessing data quality, and send the transformed data to HDFS (Hadoop Distributed File System) in the Hadoop ecosystem. The next step consists of defining the schema of the BDW and loading the necessary data, so that the BDW can provide the required data to the simulation model. See [4] and [11] for more details regarding the Big Data concepts and tools that were used.

The last step of the project consists of using the data in the simulation model, which was developed in Simio [12]. However, despite the data profiling phase, the authors experienced several data issues that needed to be handled to maintain the coherence of the simulation model. These data issues are addressed in the next section.

Around 3 GB of data were considered, corresponding to one year of data and roughly 8 000 000 rows. This volume covers only the data integrated in the BDW, excluding data that could not be included after its quality was analyzed.

4. Data Issues: Simulation Model Coherence

Traditional data profiling techniques aim to verify the quality of data. However, this is done at a syntactic level, by checking aspects such as null values or errors. In a simulation project, such data profiling techniques are required, albeit limited, as a simulation model needs not only quality data but also coherent data, so that the result is an equally coherent model. Thus, in some cases, data needs to be estimated; on other occasions, data sources of certain relevant business processes simply do not exist but must still be incorporated in the simulation model in some way. In such situations, it is important to involve process experts and to query the available data, in order to identify reasonable approaches to bypass the identified data issues.

The purpose of this section is to present the data issues that were faced, explaining why they had to be handled and emphasizing the approaches adopted to bypass them.

4.1. Customers' orders

To assess the impact of customers' order variability, the quantity of each finished good ordered by the end customers, as well as the respective delivered quantity, are crucial. This information could not be provided by the Logistics Department hosting this research, as it is considered sensitive purchasing information and the Department is solely responsible for supplying the raw materials needed by Production. To bypass this lack of data, these material needs can be considered instead. However, this approach is limited in some ways. First, as field observations and the interviewed managers suggested, it is common for raw materials to be required by Production, only to remain stored in supermarkets, waiting hours or even days to be consumed. Second, the consequences of customers' order variability cannot be efficiently measured, as changing the order quantity of a raw material is different from changing the order quantity of a finished good, which comprises both critical and non-critical raw materials. Third, a shortage of a material that is used in many finished goods should have a greater impact than a shortage of a material used in fewer finished goods. This difference is hardly felt if the scope only considers the raw materials. Despite these drawbacks, demand data is crucial, since the end goal of a SC is to fulfill its customers' needs. Thus, to bypass this lack of data, Production's orders were used as the demand that drives the simulated SC.

4.2. Data of orders to suppliers

In the data profiling phase, it was not possible to obtain the order date of roughly 27,8% of the orders to suppliers, and roughly 0,5% of the order dates had to be altered, since they had an arrival date prior to the order date, which would cause several problems in the simulation model. After discussing this issue with process experts, it was solved by subtracting a constant value from the arrival date, corresponding to the estimated lead time of suppliers operating in similar circumstances.

4.3. Suppliers' locations

Some values of the suppliers' city could not be used, due to data problems identified in the data profiling phase, even though these values are provided by SAP. Thus, whenever possible, the city coordinates were used. However, the simulation requires a location for every supplier, including those without a city in SAP; in these cases, the country's geographic location was used. Moreover, the geographic coordinates of cities and countries had to be generated, since they are not contained in any of the organization's Information Systems.

Finally, so that all orders sent to the same location can be visualized, a small deviation was applied to their coordinates. This ensures that all entities at their respective locations remain visible while the model runs. These location changes and the estimation discussed above do not affect the simulation results, as the respective distance and travel duration of the associated entities between the source and destination locations are already considered when calculating the lead time.

4.4. Travel mode

The travel mode of some orders could not be obtained, because the supplier does not exist in the data source. Thus, with the help of process experts, some rules were implemented. Nevertheless, these rules only set the symbol of the entity, affecting neither the transportation durations nor the results.
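The order-date correction described in Section 4.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `SupplierOrder` record, the function names and the 14-day constant in the usage example are hypothetical, and the sketch assumes that the same rule (arrival date minus a constant estimated lead time) covers both missing order dates and order dates later than the arrival date, which the text does not state explicitly. The profiling step is the kind of syntactic check mentioned in Section 3.2.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SupplierOrder:  # hypothetical record layout
    order_id: str
    order_date: Optional[date]  # None when the source system has no order date
    arrival_date: date


def profile_orders(orders: List[SupplierOrder]) -> Dict[str, float]:
    """Return the percentage of missing order dates and of arrivals before the order."""
    n = len(orders)
    missing = sum(1 for o in orders if o.order_date is None)
    inverted = sum(
        1 for o in orders
        if o.order_date is not None and o.arrival_date < o.order_date
    )
    return {"missing_pct": 100.0 * missing / n, "inverted_pct": 100.0 * inverted / n}


def repair_order_dates(orders: List[SupplierOrder],
                       estimated_lead_time_days: int) -> List[SupplierOrder]:
    """Set order_date = arrival_date - constant lead time for missing or incoherent dates."""
    # Assumption: the same rule is applied to missing and to inverted order dates.
    lead_time = timedelta(days=estimated_lead_time_days)
    repaired = []
    for o in orders:
        if o.order_date is None or o.arrival_date < o.order_date:
            o = replace(o, order_date=o.arrival_date - lead_time)
        repaired.append(o)
    return repaired


if __name__ == "__main__":
    orders = [
        SupplierOrder("A", date(2019, 1, 10), date(2019, 1, 20)),  # coherent
        SupplierOrder("B", None, date(2019, 2, 1)),                # missing order date
        SupplierOrder("C", date(2019, 3, 5), date(2019, 3, 1)),    # arrives before ordered
        SupplierOrder("D", date(2019, 4, 1), date(2019, 4, 8)),    # coherent
    ]
    print(profile_orders(orders))  # {'missing_pct': 25.0, 'inverted_pct': 25.0}
    for o in repair_order_dates(orders, estimated_lead_time_days=14):
        print(o.order_id, o.order_date, o.arrival_date)
```

Profiling before repairing keeps the two kinds of defect (missing versus inverted dates) measurable separately, which matches how Section 4.2 reports them.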
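The location rules of Section 4.3 (city coordinates when the city is usable, country coordinates otherwise, plus a small offset for visualization) can be sketched as below. The coordinate tables are hypothetical, approximate values, and the function names and the 0.05-degree maximum offset are illustrative assumptions. As the text notes, the offset only serves visualization; in this sketch nothing else reads the shifted coordinates.

```python
import random
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def supplier_location(city: Optional[str], country: str,
                      city_coords: Dict[str, Coord],
                      country_coords: Dict[str, Coord]) -> Coord:
    """City coordinates when the city is usable, otherwise the country's coordinates."""
    if city is not None and city in city_coords:
        return city_coords[city]
    return country_coords[country]  # raises KeyError if the country is unknown too


def jitter(coord: Coord, rng: random.Random, max_offset_deg: float = 0.05) -> Coord:
    """Shift a location by a small random offset so overlapping entities stay visible.

    The 0.05-degree default is an illustrative assumption, not a value from the paper.
    """
    lat, lon = coord
    return (lat + rng.uniform(-max_offset_deg, max_offset_deg),
            lon + rng.uniform(-max_offset_deg, max_offset_deg))


if __name__ == "__main__":
    # Hypothetical, approximate coordinate tables (not taken from the paper).
    city_coords = {"Braga": (41.55, -8.43)}
    country_coords = {"Portugal": (39.4, -8.2), "Germany": (51.2, 10.4)}
    rng = random.Random(7)
    print(supplier_location("Braga", "Portugal", city_coords, country_coords))
    base = supplier_location(None, "Germany", city_coords, country_coords)
    print([jitter(base, rng) for _ in range(3)])
```

Seeding the random generator keeps the offsets reproducible between simulation runs, so the same entity is drawn at the same spot each time.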

4.5. Transit and lead times

Some transit times preclude the arrival of orders on the date specified in the data. In these situations, both transit and lead times were estimated. These problems were handled together, since transit time can be considered part of the total lead time. First, it is verified whether a transit time is specified for a given entity. If not, it is estimated based on the transit times of other suppliers from the same country. Afterwards, it is verified whether the transit time allows the associated entity to arrive at the plant on the date in the data. If not, the lead time and transit time values are adjusted so that the entity arrives at the plant on the arrival date specified in the data. This approach does not influence the results, as the total lead time remains the same.

4.6. Internal material movements

One of the benefits of including all material movements is the ability to model the storage strategy followed at the plant and, hence, measure the stock level. In the considered plant, this strategy implies that materials are stored in an empty storage location, which leads to the following two premises:
• When a material is moved to a bin, the bin is empty before this movement occurs;
• When a material is moved out of a bin, the material must have been previously stored in the same bin.

The organization measures its stock level by assessing the percentage of occupied bins. However, while running the simulation model, several cases in which these premises did not hold were observed. Since simulation allowed these problems in the data to be discovered, it was also used to understand their scale: the simulation model recorded the percentage of movements that do not follow the storage strategy of the plant, with the results shown in Fig. 1. These percentages were registered after the first move to each bin.

As the figure shows, the percentage of inconsistent movements out of the warehouse stays at the same level throughout the year, except for one day at the end of the year. Conversely, the percentage of failed movements into the warehouse is, on average, higher throughout the year.

After analyzing this problem with process experts, two main explanations arose. The first is that not all movements are registered. The second is that movements are registered with a wrong date. For instance, a material may be consumed, but its consumption record is not immediately created (or is created with a wrong date), so movements appear in the wrong order. This problem demanded a change in the approach used to model the warehouse. While the ideal approach would be a data structure with a position for each bin of the warehouse, this was not possible due to this data issue. Thus, the solution was to measure the variation of the total quantity of material in the plant.
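A minimal sketch of the transit- and lead-time procedure of Section 4.5 follows. It assumes that the lead time recorded in the data is the number of days between order and arrival, that a missing transit time is replaced by the plain average of the known transit times from the same country, and that a transit time longer than the recorded lead time is shortened to fit. The `Shipment` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class Shipment:  # hypothetical record layout
    supplier: str
    country: str
    order_date: date
    arrival_date: date
    transit_days: Optional[float] = None  # None when no transit time is specified


def country_transit_means(shipments: List[Shipment]) -> Dict[str, float]:
    """Average of the known transit times per supplier country (assumed estimator)."""
    by_country: Dict[str, List[float]] = {}
    for s in shipments:
        if s.transit_days is not None:
            by_country.setdefault(s.country, []).append(s.transit_days)
    return {country: mean(values) for country, values in by_country.items()}


def split_lead_time(s: Shipment, country_means: Dict[str, float]) -> Tuple[float, float]:
    """Return (days before shipping, transit days); their sum equals the lead time in the data."""
    total = (s.arrival_date - s.order_date).days
    transit = s.transit_days if s.transit_days is not None else country_means[s.country]
    if transit > total:
        transit = total  # assumption: shorten transit so the entity arrives on the recorded date
    return total - transit, transit


if __name__ == "__main__":
    data = [
        Shipment("S1", "Germany", date(2019, 1, 1), date(2019, 1, 15), 5.0),
        Shipment("S2", "Germany", date(2019, 1, 1), date(2019, 1, 20), 7.0),
        Shipment("S3", "Germany", date(2019, 2, 1), date(2019, 2, 11)),       # no transit time
        Shipment("S4", "Taiwan", date(2019, 3, 1), date(2019, 3, 4), 10.0),  # transit too long
    ]
    means = country_transit_means(data)
    for s in data:
        print(s.supplier, split_lead_time(s, means))
```

Because only the split between waiting time and transit time changes, the total lead time of every entity equals the one recorded in the data, which is why the text states that the adjustment does not influence the results.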

Fig. 1. Percentage of movements not consistent with the storage strategy followed at the plant.
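The consistency check behind Fig. 1 can be sketched as a replay of the movement log that tracks the content of each bin and counts violations of the two premises of Section 4.6. The movement tuple layout and the function name are assumptions for illustration. The text registers percentages "after the first move to each bin"; this sketch reads that as: the first movement seen for a bin only initializes its state and is not scored.

```python
from typing import Dict, Iterable, Optional, Tuple

# Assumed layout: (time, bin_id, material, kind) with kind == "in" or "out".
Movement = Tuple[int, str, str, str]


def storage_violation_rates(movements: Iterable[Movement]) -> Dict[str, float]:
    """Percentage of "in" and "out" movements that break the two storage premises.

    The first movement seen for each bin only initializes its state and is not scored.
    """
    contents: Dict[str, Optional[str]] = {}  # bin_id -> stored material, None if empty
    checked = {"in": 0, "out": 0}
    failed = {"in": 0, "out": 0}
    for _, bin_id, material, kind in sorted(movements):  # replay in recorded time order
        if bin_id in contents:
            checked[kind] += 1
            if kind == "in" and contents[bin_id] is not None:
                failed["in"] += 1   # premise 1: the bin must be empty before a move in
            elif kind == "out" and contents[bin_id] != material:
                failed["out"] += 1  # premise 2: the material must be stored in this bin
        contents[bin_id] = material if kind == "in" else None
    return {k: (100.0 * failed[k] / checked[k] if checked[k] else 0.0) for k in checked}


if __name__ == "__main__":
    log = [
        (1, "B1", "M1", "in"),   # first move to B1: initializes state only
        (2, "B1", "M1", "out"),  # consistent
        (3, "B1", "M2", "in"),   # consistent, the bin is empty
        (4, "B1", "M3", "in"),   # violates premise 1, B1 still holds M2
        (5, "B1", "M2", "out"),  # violates premise 2, B1 now holds M3
    ]
    print(storage_violation_rates(log))
```

A missing or late-dated consumption record shows up exactly as in the example: a move in that finds the bin occupied, followed by a move out of a material the bin no longer holds.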

4.7. Initial stock

The stock level at the beginning of the simulation should correspond to the one verified on the day corresponding to the start of the simulation. However, it was not possible to obtain historical stock data, as the ERP only displays the current level.

To bypass this lack of data, the simulation model was first run without any initial stock method, and the average, standard deviation and other aggregate values were stored. In a new run, several expressions and approaches that use these previously calculated aggregate values were considered, in order to obtain the quantity of each material at the beginning of the simulation. In this way, the simulation "learns" the stock level to use. The following expressions were used to calculate the initial stock:
$$Q_{\text{consumed}} \times TB_{\text{suppliers}} \tag{1}$$

$$Q_{\text{consumed}} \times LT \tag{2}$$

$$Q_{\text{consumed}} \times SS_{\text{time}} \tag{3}$$

$$SF(SL) \times \sqrt{LT \times SD(Q_{\text{consumed}})^{2} + Q_{\text{consumed}}^{2} \times SD(LT)^{2}} \tag{4}$$

with $Q_{\text{consumed}}$, average consumed quantity; $TB_{\text{suppliers}}$, average time between orders to suppliers; $LT$, average lead time; $SS_{\text{time}}$, safety stock in time; $SF(SL)$, safety factor based on the service level, which in this case was considered to be 99,9%, thus obtaining the value 3,9 (assuming a normally distributed demand); $SD(Q_{\text{consumed}})$, standard deviation of the consumed quantity; $SD(LT)$, standard deviation of the lead time.

Expression 4 was obtained from the literature and, as suggested by Ruiz-Torres and Mahmoodi [13], it is one of the most commonly used methods for safety stock calculation. Its use assumes that the safety stock of each material can be used as the stock at the start of the simulation. The same method was also analyzed by Schmidt et al. [14] in their review of safety stock calculation methods. This problem remains one of the most active and complex research topics in the field [13], [14]. Besides the above approaches, the following were also considered:
• A: Sum of all consumptions;
• B: Quantity difference between all consumptions and all arrivals;
• C: Sum of all consumptions until the first arrival of each material;
• No initial stock.

Fig. 2 shows the evolution of the stock for each implemented method. The graph shows the stock for expressions 1 to 4 with dashed or dotted lines, and for the remaining four approaches with continuous lines.

Fig. 2. Evolution of the stock level using different safety stock approaches.
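Expressions 1 to 4 and approaches A to C of Section 4.7 can be sketched per material as below, using aggregates of a first run without initial stock. Assumptions that are not from the text: consumption and arrivals are daily series with the same indexing, the average consumed quantity is the mean daily consumption, standard deviations are sample standard deviations, approach B is clipped at zero, and approach C counts consumption strictly before the first arrival day. The safety factor is passed in explicitly; for reference, Python's `statistics.NormalDist().inv_cdf(0.999)` evaluates to about 3.09. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Dict, List


def initial_stock_candidates(daily_consumption: List[float],
                             daily_arrivals: List[float],
                             lead_times: List[float],
                             avg_time_between_orders: float,
                             ss_time_days: float,
                             z: float) -> Dict[str, float]:
    """Candidate initial stock for one material, from a first run without initial stock.

    Assumptions (illustrative, not from the paper): consumption and arrivals are
    quantities per simulated day with the same indexing, times are in days, and
    stdev() needs at least two values in each series.
    """
    q, q_sd = mean(daily_consumption), stdev(daily_consumption)  # average and SD of consumption
    lt, lt_sd = mean(lead_times), stdev(lead_times)              # average and SD of lead time
    # Index of the first day with an arrival (end of series if there is none).
    first_arrival = next((d for d, a in enumerate(daily_arrivals) if a > 0),
                         len(daily_arrivals))
    total_consumed = sum(daily_consumption)
    return {
        "expr1": q * avg_time_between_orders,
        "expr2": q * lt,
        "expr3": q * ss_time_days,
        "expr4": z * sqrt(lt * q_sd ** 2 + q ** 2 * lt_sd ** 2),
        "A": total_consumed,
        # Assumption: clipped at zero when arrivals exceed consumption.
        "B": max(0.0, total_consumed - sum(daily_arrivals)),
        # Assumption: consumption strictly before the first arrival day.
        "C": sum(daily_consumption[:first_arrival]),
        "none": 0.0,
    }


if __name__ == "__main__":
    z = NormalDist().inv_cdf(0.999)  # one-sided 99.9% quantile of the standard normal
    print(initial_stock_candidates([10, 12, 8, 10], [0, 0, 25, 0], [4, 6],
                                   avg_time_between_orders=7, ss_time_days=2, z=z))
```

Keeping the safety factor as a parameter lets each candidate be compared in a second simulation run without changing the code, mirroring the two-run procedure described in the text.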

Regarding the latter set of approaches (A, B, C and no initial stock), approach A results in a high stock level, which stems from the nature of the approach: the simulation starts with all the quantity of materials that will be consumed throughout the year already in stock. Conversely, approach B is the result of the difference between all consumptions and all arrivals. As the graph shows, the stock indeed decreased, albeit at the cost of some unfilled orders (2%), which can be explained by some materials arriving later than expected (volatile demand or lead time). Approach C shows that it is not enough to consider the quantity consumed until the first arrival, as the percentage of unfilled orders increases considerably in comparison with the previous approaches. Lastly, the graph also includes a scenario without initial stock, which resulted in 59% of unfilled orders.

In turn, expressions 1 to 3 returned considerably lower initial stock, albeit with high percentages of unfilled orders (44%, 44% and 45%, respectively). Interestingly, all approaches except expression 4 and approach A tend towards the same stock level, although all of them except approach B obtained the highest percentages of unfilled orders.

In sum, obtaining a method or expression to calculate the optimum safety stock is a very complex task, as corroborated by Ruiz-Torres and Mahmoodi [13] and Schmidt et al. [14]. Thus, with all the pros and cons discussed above, the choice is certainly arguable; however, approaches B and C and expression 4 can be emphasized. Approach B resulted in the second lowest percentage of unfilled orders; approach A, although it yields the lowest, cannot be selected for disruption scenarios, since it would never result in unfilled orders, as it starts with the exact stock required during the simulation. In turn, expression 4 is one of the most adopted calculation methods in the literature [13], [14] and resulted in a lower percentage of unfilled orders than the other calculation methods. Hence, as the analysis suggests, no solution is unarguably better than the others.

4.8. Production time, capacity and utilization

The simulation model must consider the production capacity of the plant, although this metric is hard to obtain. Hence, simulation was used to estimate it. The plant's production is divided into two Departments dedicated to different production phases: automatic insertion and final assembly. Thus, the number of capacity units of these production units was set to infinity, and the results were plotted in Fig. 3. Besides recording the number of units in production, it was also necessary to establish a production time, which was also not available. Thus, with the help of process experts and some field observations to measure average production times, a generic normal distribution was applied to all materials. Note that the customers' orders were replaced by Production orders (as previously discussed in this section), which reduced the scope of the SC system. Thus, this approximation is not expected to have a considerable impact on the performance of the system.
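The capacity estimate of Section 4.8 (unlimited capacity units, recording how many are busy at the same time) can be sketched with a simple event sweep. The job representation as (release hour, duration in hours), the truncation of normal samples at a small positive value, the example release pattern and the function names are illustrative assumptions. The mean and standard deviation of the production time would come from the field observations mentioned in the text.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

HOURS_PER_WEEK = 24 * 7


def sample_durations(n: int, mean_h: float, sd_h: float, rng: random.Random) -> List[float]:
    """Normally distributed production times (hours), truncated at a small positive value."""
    return [max(0.01, rng.gauss(mean_h, sd_h)) for _ in range(n)]


def peak_units_per_week(jobs: List[Tuple[float, float]]) -> Tuple[Dict[int, int], int]:
    """Unlimited capacity: every job starts at its release time (hours).

    Returns the peak number of simultaneously busy capacity units in each week that has
    at least one start or end event, and the overall peak (the capacity to provision).
    """
    events = []
    for start, duration in jobs:
        events.append((start, +1))
        events.append((start + duration, -1))
    events.sort()  # at equal times, -1 sorts before +1, so a finishing job frees its unit first
    busy, overall = 0, 0
    weekly: Dict[int, int] = defaultdict(int)
    for t, delta in events:
        busy += delta
        week = int(t // HOURS_PER_WEEK)
        weekly[week] = max(weekly[week], busy)
        overall = max(overall, busy)
    return dict(weekly), overall


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical release times over four weeks; not data from the paper.
    releases = [rng.uniform(0, 4 * HOURS_PER_WEEK) for _ in range(500)]
    durations = sample_durations(len(releases), mean_h=6.0, sd_h=1.5, rng=rng)
    weekly, overall = peak_units_per_week(list(zip(releases, durations)))
    print(weekly, overall)
```

The weekly peaks correspond to the kind of series plotted in Fig. 3, and the overall peak plays the role of the required number of capacity units.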

Fig. 3. Utilized capacity units of both production Departments per week.

As the figure shows, the maximum capacity of both production Departments can be determined. This is the number of capacity units required to fulfill all the orders registered in the data. The figure also shows that the overall production requires 240 capacity units.

The data issues discussed in this section showed that, despite the Big Data that organizations already have, it is arguable whether their data models are complete and consistent. In fact, this section showed that this is not the case in the plant considered in this case study. Hence, the solution was to apply the approaches described in this section to bypass such issues, which was done iteratively. The next section shows the types of results that can be obtained from the simulation model after bypassing the discussed data issues.

5. Results

In this section, the main results that can be retrieved from the developed and coherent SC simulation model are addressed. In this regard, Fig. 4 shows a picture of the model during a simulation run.

Fig. 4. Orders being sent to the plant.

The model runs in a 3D world map view. The figure also shows some circles placed in northern Europe. The location of these entities represents the location of the supplier. The number presented under each yellow entity is the number of days remaining until the order is shipped to the plant. This number decreases as the simulation clock advances. When it is time to ship the order, the symbol of the order changes to the respective transport type, with the figure showing some of these entities highlighted. The date and time values associated with each entity represent the instant when those deliveries were shipped to the plant. Apart from graphical results, it is also possible to retrieve analytical results from the tool, with Fig. 5 showing the total quantity of materials ordered, consumed and arrived at the plant during the years of data stored in the BDW.

Fig. 5. Total quantity of materials ordered, received and consumed per week.

The adopted approach produced a simulation model that is coherent and consistent with the system being modelled, in the sense that the main elements stored in the BDW are reflected in the simulations. Hence, managers from the plant can use this tool to support the analysis of uncertain and alternative scenarios. Moreover, the achieved results also show that simulation can be used as a data validation technique, extending traditional data profiling techniques. In fact, simulation allowed data issues to be identified by evaluating the semantics of the data, and also allowed certain missing data to be estimated.

6. Conclusions

SC systems generate huge amounts of data, due to the several data sources that are used to manage the associated business processes. Furthermore, SCs are complex systems, which makes it useful to combine Big Data and simulation to model SC problems: simulation allows uncertainty scenarios to be tested, while Big Data provides the required level of detail. In this paper, an industrial project using real data from the SC of a plant of the automotive electronics industry sector was presented. In such highly dynamic environments, data issues are common. Thus, this paper aimed to present the most relevant data issues that were faced while developing the SC simulation model in a Big Data context, while also discussing their impact on the solution and the measures taken to bypass them.

Indeed, some data issues can be handled by traditional data profiling techniques. However, such techniques only assess data quality at a syntactic level (e.g., null value verification), which is not enough for simulation, where this verification must meet a more demanding standard. In simulation, data must be integrated in such a way that it results in a coherent simulation model (to accurately mimic a process, all its elements must be present and coherent). In this work, the authors argue that simulation can be used as a semantic validator of the data model, advancing traditional data profiling techniques, in the sense that it allowed additional data issues to be identified and missing data to be estimated.

The identified issues and the approaches implemented to bypass them provided a better understanding of both the data sources and the associated business processes, hence helping in the development of the simulation model. In fact, the obtained results (both graphical and numerical) were achieved by bypassing the identified issues while maintaining the overall coherence of the model.

Despite the large amount of available data (around 3 GB), this work showed that the data model of organizations is still incomplete, in the sense that it does not yet allow their SC systems to be completely reproduced. This suggests that, despite using many software packages, spreadsheets, IS and other tools, organizations still lack relevant data for creating accurate simulations of their SCs. Such issues included data sources that could not be obtained and data that did not reflect a given business strategy followed at the plant, indicating that the data was incomplete, or not registered in the correct order or with the correct date. Some of these issues may be related to the top management view, which often disregards the existence of low-level data (e.g., material movements) that is necessary to produce a coherent simulation model. Notwithstanding, this barrier should be overcome once the Industry 4.0 revolution is fully materialized, allowing some of this data to be automatically generated, stored and integrated (without the errors that manual interactions may introduce), so that analytical methods (e.g., simulation) can be employed.

In terms of future work, the following directions are highlighted. Concerning the issue of missing historical data, the BDW can be used to maintain it; however, this data will only be accessible in the mid to long term. The remaining missing data sources have to be covered with solutions aligned

[3] Costa E, Costa C, Santos M. Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems. Journal of Big Data 2019;6(1):34.
[4] Vieira AC, Pedro L, Santos MY, Fernandes JM, Dias LS. Data Requirements Elicitation in Big Data Warehousing. In: European, Mediterranean, and Middle Eastern Conference on Information Systems (EMCIS), Lecture Notes in Business Information Processing; 2019. p. 106–113.
[5] Vieira AC, Dias LS, Santos MY, Pereira GB, Oliveira JA. Simulation of an Automotive Supply Chain in Simio: Data Model Validation. In: 30th European Modeling and Simulation Symposium (EMSS); 2018. p. 294–301.
[6] Bokrantz J, Skoogh A, Lämkull D, Hanna A, Perera T. Data quality problems in discrete event simulation of manufacturing operations. Simulation 2018;94(11):1009–1025.
with the organization. Furthermore, despite the identified data [7] Kagermann H, Helbig J, Hellinger A, Wahlster. Recommendations for
issues, this paper also showed that it is possible to retrieve Implementing the Strategic Initiative INDUSTRIE 4.0: Securing the
Future of German Manufacturing Industry ; Final Report of the Industrie
results from a coherent simulation model, hence allowing 4.0 Working Group. Forschungsunion. 2013.
several types of SC risks to be analyzed. Thus, future work [8] Vieira AC, Dias LS, Santos MY, Pereira GB, Oliveira JA. Setting an
should also concern in performing such risks analysis. industry 4.0 research and development agenda for simulation – A
literature review. International Journal of Simulation Modelling; 2018.
Acknowledgements 17, 3, 377–390.
[9] Costa E, Costa C, Santos M. Efficient big data modelling and
organization for hadoop hive-based data warehouses. European,
This work has been supported by national funds through Mediterranean, and Middle Eastern Conference on Information Systems,
FCT – Fundação para a Ciência e Tecnologia within the EMCIS, Lecture Notes in Business Information Processing; 2017. 3–16.
Project Scope: UID/CEC/00319/2019 and by the Doctoral [10] Santos MY, Costa C. Data Models in NoSQL Databases for Big Data
Contexts. International Conference of Data Mining and Big Data,
scholarship PDE/BDE/114566/2016 funded by FCT, the
Lecture Notes in Computer Science (including subseries Lecture Notes
Portuguese Ministry of Science, Technology and Higher in Artificial Intelligence and Lecture Notes in Bioinformatics); 2016.
Education, through national funds, and co-financed by the 475–485.
European Social Fund (ESF) through the Operational [11] Vieira AC, Dias LS, Santos MY, Pereira GB, Oliveira JA. Supply chain
Programme for Human Capital (POCH). hybrid simulation: From Big Data to distributions and approaches
comparison. Simulation Modelling Practice and Theory; 2019. 97, (Dec.
2019), 101956.
References [12] Vieira AC, Dias LS, Pereira GB, Oliveira J, Carvalho MC, Martins P.
Automatic simulation models generation of warehouses with milk runs
[1] Tiwari S, Wee HM, Daryanto Y. Big data analytics in supply chain and pickers. 28th European Modeling and Simulation Symposium; 2016.
management between 2010 and 2016: Insights to industries. Computers 231–241.
& Industrial Engineering; 2018. 115, (Jan. 2018), 319–330. [13] Ruiz-Torres AJ, Mahmoodi F. Safety stock determination based on
[2] Zhong RY, Newman ST, Huang GQ, Lan S. Big Data for supply chain parametric lead time and demand information. International Journal of
management in the service and manufacturing sectors: Challenges, Production Research; 2010. 48, 10, 2841–2857.
opportunities, and future perspectives. Computers and Industrial [14] Schmidt M, Hartmann W, Nyhuis P. Simulation based comparison of
Engineering; 2016. 101, 572–591. safety-stock calculation methods. CIRP Annals - Manufacturing
Technology; 2012. 61, 1, 403–406.

You might also like