Thesis Analytics Procurement
Gabriella Farkas, BA
in partial fulfilment of the requirements for the degree of
Master of Science (MSc)
Abstract
In my thesis I have researched a highly topical area: my goal was to study the academic literature on data analytics and big data within supply chain management and to summarise its results.
In the first step, I clarify the concepts related to data analytics and big data, as well as the relationships between these concepts, their sub-fields, methods, and algorithms. This was a particularly challenging task, because no generally accepted terminology and taxonomy for these elements of big data analytics exists yet.
As a next step, I categorise the research papers according to the elements of the supply chain planning matrix and demonstrate the tools, methods, and algorithms that have been used to optimise processes and enhance efficiency in a specific part of the chain or across the chain as a whole.
Finally, I give a brief general overview of the benefits, challenges, and hurdles of using these analytics.
Contents
List of Figures ........................................................................................................................... III
List of Tables ............................................................................................................................IV
1. Introduction ........................................................................................................................ 1
2. Data analytics and big data ................................................................................................. 2
2.1. Taxonomy of terms ..................................................................................................... 2
2.2. Classification of terms ................................................................................................ 4
2.2.1. Data Science ....................................................................................................... 5
2.2.2. Business intelligence & advanced analytics ....................................................... 6
2.2.3. Business analytics ............................................................................................... 9
2.2.4. Artificial intelligence .......................................................................................... 9
2.2.5. Machine learning .............................................................................................. 10
2.2.6. Data mining ...................................................................................................... 10
2.2.7. Big data ............................................................................................................. 16
2.2.8. Big data analytics.............................................................................................. 22
2.3. Technologies in analytics ......................................................................................... 22
3. Supply chain management ................................................................................................ 24
3.1. Definitions of supply chain management ................................................................. 24
3.2. Structure of supply chain management .................................................................... 25
3.3. Challenges in supply chain management.................................................................. 28
4. Data analytics in supply chain management..................................................................... 32
4.1. Definitions ................................................................................................................ 32
4.2. 3Vs in supply chain and logistics management ........................................................ 32
4.3. Data types used in supply chain and logistics management ..................................... 33
4.4. Implementation of data analytics in supply chain management ............................... 36
4.4.1. Framework for implementing data analytics .................................................... 36
4.5. Application of data analytics along the supply chain planning matrix ..................... 37
4.5.1. Strategic network design .................................................................................. 37
4.5.2. Product design and development ...................................................................... 38
4.5.3. Purchasing ........................................................................................................ 40
4.5.4. Production planning .......................................................................................... 41
4.5.5. Distribution planning ........................................................................................ 44
4.5.6. Inventory management ..................................................................................... 45
4.5.7. Transport planning ............................................................................................ 47
4.5.8. Demand Planning ............................................................................................. 48
4.6. Advantages and hurdles of big data and analytics .................................................... 53
4.6.1. Benefits and opportunities ................................................................................ 53
4.6.2. Challenges and barriers .................................................................................... 54
5. Conclusions ...................................................................................................................... 57
Bibliography ............................................................................................................................. 59
Appendix .................................................................................................................................. 66
List of Figures
Figure 1: Elements of Data Science ........................................................................................... 3
Figure 2: Map of terms relating to Data Science and AI ............................................................ 4
Figure 3: Evolution of BI&A (Chen, et al., 2012) ...................................................................... 7
Figure 4: Main steps of KDD process (Tan, et al. 2006).......................................................... 11
Figure 5: Nine steps of KDD process (Maimon & Rokach, 2005) .......................................... 11
Figure 6: Data Mining with its tasks and methods (Maimon & Rokach, 2005)....................... 14
Figure 7: Big Data transformation to valuable information (Min, 2016) ................................. 20
Figure 9: Supply chain planning matrix (Meyr, et al., 2002) ................................................... 28
Figure 10: Framework for Big Data Analytics implementation (Sanders, 2016) .................... 36
Figure 11: Deduction graph for finding the best competence sets (Tan, et al., 2015) .............. 39
List of Tables
Table 1: Differences between SQL and NoSQL (Shetty & Chidimar, 2016) .......................... 23
Table 2: Summary table of data analytics, methods and techniques in supply chain............... 52
1. Introduction
This master thesis gives an overview of a highly discussed contemporary topic: data analytics and big data in the field of supply chain management. My aim was to summarise and categorise the research papers and books written on this topic during the last decade, and to show what kinds of benefits, challenges, and barriers the implementation of big data analytics can bring with it.
Because this is a relatively young and popular topic, many research papers have been published and new ones continuously appear. It is therefore impossible to summarise all of the available literature, so I had to select the most relevant works. In doing so I considered factors such as:
- how well known and acknowledged the authors are,
- how many citations the paper has,
- in which journal the paper appeared,
- how old the paper is, and
- whether the work contains new, value-added information.
In the following lines I give a quick insight into the structure of my thesis, which has three main chapters dealing with data analytics and big data, supply chain and logistics management, and the combination of these two areas. First, I describe all the terms related to data analytics and big data and try to visualise how they relate to each other. This was a very challenging and time-consuming job, as there are still no generally accepted definitions for these terms, nor a clearly defined structure and hierarchy relating the different approaches, methods, and algorithms to each other. Because of this issue, I mostly relied on literature from recognised scientists and professors who have already set milestones in this field, and created a taxonomy by consolidating their works.
I found it important to introduce the term supply chain management and to thoroughly describe the elements of the supply chain planning matrix. These topics form the third chapter of my thesis, complemented by operational problems along the chain that could be solved efficiently by data analytics.
Last, I summarise research papers focusing on the application of data analytics and big data in supply chain management. After explaining the relevant definitions and characteristics of this type of analytics, I categorise the papers according to the sub-parts of the supply chain planning matrix and then divide the remaining academic works into two groups: benefits and opportunities versus challenges and barriers of applying big data analytics.
1 Terabyte: equal to ca. one trillion bytes or one thousand gigabytes. To illustrate, one terabyte of space could hold 300 hours of good-quality video or 1,000 copies of the Encyclopedia Britannica (Quintero, et al., 2015).
2 Petabyte: approximately one thousand terabytes or one million gigabytes. For example, it can hold 500 billion pages of standard printed text (Quintero, et al., 2015).
Business analytics - which is a sub-part of both business intelligence and advanced analytics - deals with data and problems that typically arise in business life. If it involves reporting on historical data and solving less complex problems, we are talking about business analytics within business intelligence; otherwise it belongs to advanced analytics (Evans, 2017).
The previously mentioned areas can be further divided into sub-groups such as descriptive, predictive, and prescriptive analytics. Descriptive analytics uses data from the past and less complex, less sophisticated methods and algorithms to identify trends in the data set and to predict future "numbers". In this sense, descriptive analytics constitutes a large part of business intelligence and deals with popular tasks such as SQL analytics, dashboards, metrics reporting, and OLAP queries (Rozados & Tjahjono, 2014).
When we are talking about AI, we need to differentiate between "narrow" artificial intelligence, which is applied today by science and carries out only specific tasks through machine learning, data mining, or natural language processing, and so-called general artificial intelligence, where the aim is to create a machine with a more advanced intelligence able to perform human interactions (Goertzel & Pennachin, 2007).
Figure 2: Map of terms relating to Data Science and AI
Under machine learning we understand systems that have the ability to learn from patterns previously discovered in the data set and to apply this "knowledge" to similar problems later on. AI and machine learning are similar; however, AI focuses not only on learning from examples, definitions, or behaviour, but also on reasoning and problem solving (Kersting, 2018).
Data mining uses different algorithms to produce proper data for later analysis, and it provides the basis for machine learning, which needs appropriate information for the learning process. Data mining and machine learning use the same algorithms; however, machine learning systems can learn on their own, while in data mining the process is led by humans (Brooks & Dahlke, 2017).
In the last few lines of this section I explain how big data and big data analytics relate to all the terms and concepts just described. Clearly, the techniques and tools of data science and AI, and the algorithms within them, work with data. They have the capability to store, process, transform, and analyse any type of data, from small to big, and to gain information from it. If the data is considered to be big data according to specific characteristics, then all the advanced analytics working with such datasets are named big data analytics (Bhagat, 2015).
After this brief overview of the relationships among these new phenomena of data science and AI, I provide a more detailed summary of all their sub-elements, tasks, techniques, and algorithms on the following pages.
In some cases, I also go into detail regarding their tools and techniques, when I consider it necessary to classify these methods for later chapters of my thesis.
2.2.1. Data Science
Data science applies quantitative and qualitative methods to solve relevant problems and make predictions. It is a group of methods comprising the recording, storing, analysing, and effective extraction of information from structured and unstructured data (Waller & Fawcett, 2013). It is part of computer science; however, instead of including only programming and modelling algorithms, it focuses on data analysis that may not be performed only by computers. It is close to statistics in the sense that its main tasks incorporate the collection, organisation, analysis, and presentation of data (Hernán, et al., 2018). However, professionals from diverse fields of science view the relation of statistics to data science differently. Some argue that data science is equal to statistics, because it is about analysing data, which statisticians have been doing for decades. Others are convinced that statistical analysis deals with rather small data samples, in contrast to data science, whose aim is to use scientific methods to create meaning from large-scale raw data (Donoho, 2017). Further criticisms of the view that statistics relates significantly to data science are:
- Statistics has not changed enough to respond quickly and efficiently to computer-age problems, which data science normally does.
- Statistics has rather helped people test hypotheses on pre-computer data over the centuries, while data science focuses on retrieving, analysing, and manipulating information in large data sets (Carmichael & Marron, 2018).
Provost and Fawcett also emphasise that this science is not just about data mining: to be successful in it, one needs to examine problems from a business point of view too (Provost & Fawcett, 2013).
Data science includes old and new methods from the fields of machine learning, data mining, and business intelligence and analytics - see the definitions in later chapters - and its fundamental goal is to continuously discover new techniques for data analysis. Depending on business goals, managers can choose data science for more sophisticated analyses or apply business intelligence for simple reporting or visualisation of the data (EMC Education Services, 2015).
The tasks of data science form three groups (a small sketch follows the list):
- Description: a quantitative summary of certain features of the world, where the techniques used range from basic calculations (e.g. the mean) to more sophisticated methods (e.g. cluster analysis).
- Prediction: using data to map inputs to outputs. It starts with easier tasks and simple analyses (e.g. the correlation coefficient) and then becomes more complex (e.g. predicting the joint distribution of multiple variables).
- Causal inference: using data to model how certain features of the world would look if conditions had been different. Calculations here usually connect to randomised experiments or generalised methods (Hernán, et al., 2018).
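To make these task groups more tangible, the following minimal Python sketch illustrates the description and prediction tasks on invented toy numbers (they are not data from any cited study):

# A minimal sketch of the "description" and "prediction" tasks of data
# science. All numbers are invented toy data for illustration only.
import numpy as np

advertising = np.array([10, 15, 20, 25, 30])   # e.g. monthly ad spend
sales = np.array([40, 52, 61, 73, 80])         # e.g. monthly sales

# Description: a quantitative summary of the data (here: the mean).
print("mean sales:", sales.mean())

# Prediction: mapping inputs to outputs, starting from a simple measure
# of association (the correlation coefficient) ...
print("correlation:", np.corrcoef(advertising, sales)[0, 1])

# ... up to a simple fitted model that predicts sales from ad spend.
slope, intercept = np.polyfit(advertising, sales, 1)
print("predicted sales at spend 35:", slope * 35 + intercept)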
The first phase of BI&A developed from data management and warehousing, and its methods are based on analytical and statistical techniques from the 1970s and 1980s. To mention more concrete examples, ETL3, OLAP4, database query, and reporting tools play important roles; with their help companies can extract enterprise-specific knowledge from datasets. The statistical analysis and data mining techniques (e.g. association analysis, regression analysis, classification, etc.) focus on supporting different business fields. Numerous large IT companies - like SAP, IBM, and Microsoft - created BI&A platforms in order to help businesses with data processing and knowledge extraction (Chen, et al., 2012).
3 Extract, Transform, Load: the process of getting data out of source systems, where the company's operational data has already been processed and stored in a normalised database; the data are then placed into the business warehouse (Gronwald, 2017).
4 Abbreviation for Online Analytical Processing, which accesses multidimensional or relational data from the business warehouse for analysis and data mining (Gronwald, 2017).
At the beginning of the 2000s, several new opportunities became available for BI&A due to the emerging trend of the internet and the web. It provided online platforms for observing customers' preferences and for constantly communicating with buyers - the online stores of Amazon and eBay are the best examples. Businesses realised that they could define the needs and preferences of their clients more accurately by analysing their interactions on the web. Thus BI&A 2.0 developed, characterised by web intelligence and by web and text mining centred around the analysis of unstructured web content. In contrast to BI&A 1.0, these techniques had not yet been integrated into enterprise IT systems, which created the future challenge of inventing mature and scalable technologies for text mining, web mining, and social network analysis that could be incorporated into a business's systems (Chen, et al., 2012).
BI&A 3.0 is part of today's life, where the usage of mobile and other technological devices is rising, and these tools can reach numerous applications connecting computers and humans through the internet and the web. Popular technologies of this new era are the sensor-based, internet-enabled devices (like RFID5), through which the amount of arriving data increases even more. This sets challenges for companies, such as how to cope efficiently with processing and exploiting the continuous inflow of sensor information, or how to create a properly integrated commercial system for business intelligence and analytics (Chen, et al., 2012).
5 Abbreviation for Radio Frequency Identification; also called a transponder, it is attached to objects in order to count or identify them and has an antenna and a microchip for communication and for storing information. Active tags are able to communicate with each other and with the RFID reader, which acts as a transmitter in this system (Jia, et al., 2012).
As I have already noted in the definition, we must differentiate between two major concepts, business intelligence and analytics, where "analytics" has come to refer to advanced analytics in recent years. In the following I compare them briefly:
A definition of business intelligence from a managerial view says that it collects the right information for the right people at the right time in order to enhance company performance and support decision making. Business intelligence has also existed for many years, during which companies collected vast amounts of data that exceeded their storage capacities. As a result, they organised databases into data warehouses, which are a core element of BI programs nowadays. Originally, BI was equated with OLAP and reporting tools, but after a while enterprises realised that if they wanted to use the gathered information effectively, it would not be enough to report historical data only; they had to move towards the more adaptive advanced analytics (Bose, 2008). Examining the time horizon, BI can produce reports and dashboards referring to past and present events, and most of the time it answers questions that help examine past cases, such as quarterly targets or yearly sales (EMC Education Services, 2015).
In contrast to BI, advanced analytics uses more sophisticated modelling techniques and focuses rather on predicting future events and exploring patterns (Bose, 2008). BI and analytics can also be divided into further sub-parts:
- Descriptive analytics tries to explore what has happened in the past and makes trends and expectations visible (Rozados & Tjahjono, 2014). The main techniques in this group are standard reporting and dashboards, ad hoc reporting, OLAP, and visualisation (Zeng, et al., 2011).
- Predictive analytics focuses on the present and analyses real-time information, from which it tries to predict future events (Rozados & Tjahjono, 2014). Its tools are algorithm-based, and some of the technologies used here have the capability to learn from data (Siegel, 2013). Typical examples are data mining and machine learning; predictive analytics also includes advanced forecasting and time series methods (very popular in supply chain management and marketing), clustering, plus supervised learning with regression and statistical algorithms (Rozados & Tjahjono, 2014).
- Prescriptive analytics uses optimisation and simulation techniques based on data to forecast different outcomes of future scenarios. In contrast to the other two types of analytics, here the future and the effects of future decisions are at the centre, for which prescriptive analytics tries to provide advice as well as recommendations. It embeds uncertainty and variability, as its techniques derive from what-if analysis and game theory (Rozados & Tjahjono, 2014).
Data mining is the core step of the knowledge discovery in databases (KDD) process, where the identification of patterns and hidden relations in a dataset takes place (Bose, 2008). The structure of KDD includes three main steps: data pre-processing, data mining, and data post-processing (Tan, et al., 2006).
The process begins with the collection of input data from different sources, cleaning the data, and removing noise. Then comes data mining, where the previously mentioned actions take place. Finally, the integration of the data mining results into the system closes the second step. Under post-processing we understand the validation of these results, to make sure that only useful and correct results are put into the decision support system (Tan, et al., 2006). The three main steps of the iterative and interactive KDD process can also be divided into nine steps, starting with defining the KDD goals and ending with the integration of the discovered knowledge. When the process closes, its effects are measured in new data mining repositories. As a last step, these results are evaluated and then the KDD process runs again. The nine steps can be described briefly as follows; a small pipeline sketch of the three main phases follows the list:
4. Through transformation we obtain data that are better suited for data mining. The most commonly applied methods here are dimension reduction (e.g. feature selection) and attribute transformation (Maimon & Rokach, 2005).
The following steps belong to data mining and the algorithms used during it:
5. The previously defined KDD goals and the actions taken in the earlier steps enable the user to choose a proper data mining task for the knowledge discovery, such as clustering, classification, or regression. The data mining strategy is developed in this phase, considering the available data and to what extent these data can be used for an inductive learning model.
6. After defining the goals and task of data mining, the user chooses a concrete algorithm for searching patterns in the data set while weighing several indicators (e.g. accuracy vs. understandability).
7. As a next step, the algorithm is launched and run several times until the user reaches an acceptable result (Maimon & Rokach, 2005).
The last two steps fall under the data post-processing phase:
8. The result is evaluated and interpreted in accordance with the goals set in the first step. Here the attention is on the usefulness and comprehensibility of the model and on documenting the discovered knowledge for later work.
9. As a final step, the knowledge is implemented and used in another system for further actions. In this last step the user can measure the real efficiency and success of the KDD process. The integration of the discovered knowledge is a big challenge, as the previous conditions can change quickly (e.g. data structures, a change from static to dynamic data, and so on) and the system needs to adapt to the new environment as rapidly as it can (Maimon & Rokach, 2005).
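To illustrate how the three main phases fit together, the following minimal Python sketch arranges pre-processing, data mining, and post-processing into a small pipeline. The functions and toy records are placeholders of my own and not a real KDD implementation:

# A minimal sketch of the three main KDD phases (Tan, et al., 2006).
# The functions are illustrative placeholders, not a real KDD system.

def preprocess(records):
    # Pre-processing: remove noise, here by dropping incomplete records.
    return [r for r in records if None not in r.values()]

def mine(records):
    # Data mining: discover a pattern, here a trivial frequency count.
    counts = {}
    for r in records:
        counts[r["product"]] = counts.get(r["product"], 0) + 1
    return counts

def postprocess(patterns, min_count=2):
    # Post-processing: validate the results, keeping only patterns
    # frequent enough to be useful for a decision support system.
    return {k: v for k, v in patterns.items() if v >= min_count}

raw = [{"product": "milk"}, {"product": "milk"},
       {"product": None}, {"product": "bread"}]
print(postprocess(mine(preprocess(raw))))   # -> {'milk': 2}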
We can also distinguish among several storage types where the mining usually takes place. The best known are relational databases, transactional databases, data warehouses, and advanced database systems (Han & Kamber, 2000).
- A relational database can be accessed by queries using a relational query language (like SQL). It enables the user to carry out relational operations (like join and select) or use aggregate functions (like sum and average). Finally, it can also be successfully applied for searching trends and data patterns, such as predicting risk in different business transactions or detecting deviations (Han & Kamber, 2000).
- A transactional database stores records that relate to transactions. It is common to enlarge the database with additional tables that also connect to the same transactions (e.g. customer ID, salesperson ID, etc.). Market basket analysis is also worth mentioning here, through which it is possible to find out which products sell well together (e.g. diapers and milk, computers and printers) and thus increase sales (Han & Kamber, 2000).
- A data warehouse is a place where data derived from multiple sources are stored under a unified scheme in a single location. Data are organised here according to subject (like customer, supplier, or item) and have a historical perspective (e.g. giving insight into the last 5 years). A data warehouse is also a place where data cleansing, transformation, integration, loading, and periodic data refreshing can take place. The warehouse is usually visualised as a cube with multiple dimensions, each related to an attribute or set of attributes, and the cells of the cube represent the value of some aggregate measure. This physical structure of the data warehouse has different levels depending on the complexity of the stored information. The base cuboid is a cube containing primitive information, while non-base cuboids deal with higher-level multidimensional structures. Together these cuboids form a data cube (Han & Kamber, 2000); a small cube sketch follows this list.
- Advanced database systems were developed over the years to face and solve new challenges for data mining. These are able to handle spatial data, engineering design data, multimedia data, and also time-related data with efficient data structures and scalable methods. Object-oriented and object-relational database systems are usually exploited here to gain useful knowledge from the data set (Pandey, et al., 2015).
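To illustrate the data cube idea from the data warehouse item above, the following minimal Python sketch uses a pandas pivot table as a small two-dimensional cube; the data is invented for illustration:

# A minimal sketch of a two-dimensional "data cube": each cell holds an
# aggregate measure (total sales) for one combination of dimension
# values. All data is invented toy data.
import pandas as pd

sales = pd.DataFrame({
    "customer": ["A", "A", "B", "B"],
    "item":     ["milk", "bread", "milk", "bread"],
    "amount":   [10, 5, 7, 3],
})

cube = sales.pivot_table(index="customer", columns="item",
                         values="amount", aggfunc="sum")
print(cube)              # cells = aggregate measure per (customer, item)
print(cube.sum().sum())  # collapsing all dimensions (the cube's apex)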
To understand the relation among data mining tasks and methods, we should create a good taxonomy. First of all, we must separate two types of data mining: verification and discovery. Through verification the evaluation of hypotheses takes place, commonly using methods like traditional statistics, the t-test of a hypothesis, or variance analysis. However, it is only weakly connected to data mining, as its methods hardly deal with discovering new knowledge; they rather work on testing. In contrast to verification, the discovery type of data mining deals with finding new patterns and knowledge from the beginning. Within this we can distinguish between predictive and descriptive tasks. The techniques of discovery focus on inductive learning, in which case the model is able to learn from training examples and uses an inductive approach to solve future examples (Maimon & Rokach, 2005).
Figure 6: Data Mining with its tasks and methods (Maimon & Rokach, 2005)
- Regression is a general analytical method which discovers the relationship between the dependent variable and several independent variables. With the help of the regression function and its equation, the model attempts to describe how strongly the independent variables can explain the dependent one, and which of the independent variables have the most intense influence on the outcome. Two commonly used types of regression analysis are linear and logistic regression. The first usually observes the connection between several input variables and a continuous dependent variable, if the model is considered to be linear. The second is better able to predict the probability of an outcome with the help of the input variables; the outcome variable can have multiple values, but in most cases it is binary (EMC Education Services, 2015). A small sketch of both regression types follows this list.
- Classification is a process in which a specialist classifies unstructured data into structured datasets. The first phase of the method is the learning process, when the analysis of the training data set takes place and rules and patterns are created. In the second step there is an evaluation of the dataset and the classification accuracy is recorded (Koturwar, et al., 2015). Classification models exist in several forms, like decision trees, neural networks, or Bayesian networks. One of the most popular classification models is the decision tree, where each node represents a test on an attribute value, the branches show the outcomes of these tests, and the leaves of the tree indicate classes or class distributions (Han & Kamber, 2000) - see the sketch after this list.
- Time series analysis is a task from statistics which deals with time series data and trend analysis. Time series data are a chronological ordering of data points measured over a defined time period in the past, and the purpose of the analysis is to forecast future values. It is mostly applied in economics, retail, manufacturing, and finance; specific examples include retail sales forecasting, spare parts planning, and pair trading on the stock exchange (EMC Education Services, 2015).
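The following minimal Python sketch (scikit-learn, invented toy data) illustrates the regression and classification tasks described in this list; it is only an illustration, not a reproduction of any cited study:

# Minimal sketches of two predictive tasks on invented toy data:
# regression (linear for a continuous outcome, logistic for a binary
# one) and classification with a decision tree.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.array([[1], [2], [3], [4], [5]])   # one independent variable

# Linear regression: how strongly the input explains a continuous outcome.
lin = LinearRegression().fit(X, np.array([2.1, 3.9, 6.2, 8.1, 9.8]))
print("slope:", lin.coef_[0], "prediction at x=6:", lin.predict([[6]])[0])

# Logistic regression: the probability of a binary outcome.
log = LogisticRegression().fit(X, np.array([0, 0, 0, 1, 1]))
print("P(y=1 | x=4.5):", log.predict_proba([[4.5]])[0, 1])

# Decision tree: nodes test attribute values, leaves indicate classes.
pts = [[25, 20], [32, 45], [47, 60], [51, 24], [62, 80]]  # [age, income]
lbl = ["no", "no", "yes", "no", "yes"]
tree = DecisionTreeClassifier(max_depth=2).fit(pts, lbl)
print(export_text(tree, feature_names=["age", "income"]))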
The other main task type is descriptive, where the aim is to derive patterns which explain the relationships in the data. These tasks are mainly exploratory and therefore require post-processing to examine the validity of the results or to explain them. We can differentiate three main groups of techniques here (Tan, et al., 2006):
- Association analysis: Agrawal and Srikant (1994) described association rules mathematically. They defined a set of items and a set of transactions, each with a subset of items. In the rule, X denotes the antecedent and Y the consequence. Frequency is a key element of the method: it is measured by the appearance of each element in the dataset and compared with a minimum support threshold (Fernandez-Basso, et al., 2016). With the help of these rules one can clearly see whether, when X was observed, Y was also observed (EMC Education Services, 2015). A small numeric sketch follows this list.
- Anomaly detection is a good way to identify elements in the data set whose characteristics differ considerably from the other data. These "members" of the set are called anomalies or outliers in practice; the goal is to identify as many of them as possible while avoiding labelling data with normal features as outliers. Fraud and network intrusion detection, as well as signalling ecosystem disturbances, play an especially important role here (Tan, et al., 2006).
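As a small numeric illustration of the two descriptive tasks above, the following Python sketch computes the support and confidence of a toy association rule and flags an outlier using z-scores; all numbers are invented:

# Minimal sketches of two descriptive tasks on invented toy data:
# an association rule X -> Y (support and confidence) and anomaly
# detection via z-scores.
import numpy as np

transactions = [{"diapers", "milk"}, {"diapers", "milk", "bread"},
                {"bread"}, {"diapers", "milk"}, {"milk"}]
X, Y = {"diapers"}, {"milk"}
n = len(transactions)
support = sum(1 for t in transactions if X | Y <= t) / n
support_x = sum(1 for t in transactions if X <= t) / n
print(f"support={support:.2f}, confidence={support / support_x:.2f}")
# -> support=0.60, confidence=1.00: whenever diapers were bought,
#    milk was bought as well.

values = np.array([98, 101, 99, 103, 100, 250, 97])  # 250 looks odd
z = (values - values.mean()) / values.std()
print("outliers:", values[np.abs(z) > 2])            # -> [250]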
With the help of analysing traditional and logical connections among events derived from data, decision-making has been transformed from a static into a dynamic process (Hammer, et al., 2017).
The quick evolution of big data has brought several definitions with it. In the following, I present the most popular and widely accepted definitions developed by researchers over the years.
The term "big data" first appeared in 1998 in a presentation held by John Mashey with the title "Big Data and the next wave of InfraStress". The first book that mentioned the term originally dealt with data mining and was issued in 1998. In the case of academic papers, we had to wait two more years, until Diebold wrote about it in his research in 2000 (Bhagat, 2015).
Big data is also described as a situation in which data sets grow so rapidly and become so large that conventional technologies can no longer handle them. Acquisition, storage, searching, sharing, analytics, and visualisation are especially difficult to manage for such vast amounts of data. It is also worth mentioning an interesting fact: the term "big data" has become a synonym for business intelligence, business analytics, or data mining in much of the literature (Ohlhorst, 2012).
The most frequently used definition was created by the analyst company Gartner:
"Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." (Gartner, cited in Gandomi & Haider, 2014, p. 138)
This definition is also supported by the Data Protection Working Party of the European Union, which uses big data for the exponentially growing amount of data sets - any huge database used by governments, national administrative bodies, and companies, or analysed extensively by computer algorithms in order to identify general trends and connections (European Union, 2013).
Andrea De Mauro, Greco, and Grimaldi (2016) examined a large number of definitions in academic papers and arrived at the following conclusions:
The definition of big data also includes technological and analytical methods; it clearly defines what kinds of tools and techniques we need to use to collect data whose main characteristics are volume, velocity, and variety. They also emphasise that big data has value, because the analyses that can be gained from such data sets produce value-added information.
This definition already reflects the three main characteristics of big data, popularly named the 3Vs. Later, other researchers added two more Vs to the concept, value and veracity (Demchenko, et al., 2013):
Volume
It simply means the amount of data generated each day by business participants. The amount of data is so large that it is impossible to save and analyse it using general data processing methods (Demchenko, et al., 2013). There is no accepted threshold for magnitude, but companies and professionals mainly consider data in volumes of terabytes or petabytes to be big data (McCarthy, et al., 2019).
Velocity
It refers to the speed of data generation, processing, and analysis. Big data is also "live" in the sense that data are continuously created and flow through the system. In parallel, the speed of processing and interpreting them is also increasing (Demchenko, et al., 2013).
Variety
It refers to the number of types of data. Nowadays more than 50% of the data in IT systems is unstructured, which means it is not organised into tables and has no formal structure. As a consequence, it is hard to interpret it or identify any relations at first (Ohlhorst, 2012).
Value
It means the value added to a company, as many corporations invest huge amounts of money to create their own big data platforms in order to generate value for their businesses (Demchenko, et al., 2013).
Veracity/Validity
This characteristic aims to capture the quality of big data, in contrast to the volume feature, which lacks a quality aspect (Demchenko, et al., 2013).
Having found several definitions and characteristics of big data, one question remains open: what makes big data "big data"? To find an answer, I summarise the research paper of Kitchin and McArdle (2016), in which the two researchers deal with exactly this problem. The authors examine the 3V characteristics and other attributes of different types of data sets, trying to define specific characteristics, or specific types of data, that clearly fall under the umbrella of big data. In addition to volume, velocity, and variety, they also consider exhaustivity, extensionality, and scalability. Kitchin had already written about the difference between big data and general, "small" data: he found that the production of small data happens in a traditional, controlled way, using sampling for processing; in contrast to big data, its generation and administration are considered highly inflexible and not scalable (Kitchin, cited in Kitchin & McArdle, 2016). To find a more exact answer to the question of when we are talking about big data, the researchers examined a sample of 26 data types which previous literature and research considered to belong to the group of big data. These data types come from seven different domains, such as social media data, websites, mobile communication data, data from sensors and cameras, transaction-process-generated data, and administrative data. During the research they were examined on each of the previously mentioned characteristics:
- Regarding volume, big data is usually considered so huge that the storage capacity of systems must cope with amounts of data in terabytes or petabytes; this can only be realised in clouds covering several systems and locations. However, the research showed that for some data sets a large storage capacity is not necessary: the volume of generated data is huge, but each datum is only bytes or kilobytes in size. Kitchin and his colleague concluded that for this characteristic it is impossible to define a threshold for big data.
- In the case of velocity, big data are normally produced and collected continuously, in real time, not as samples with temporal gaps. Here, the frequency of generation, recording, and handling of data plays an important role and is considered a key characteristic.
- The weakest characteristic of all, after examining the 26 data sets, is variety. Both small and big data are heterogeneous, which means either can be structured, semi-structured, or unstructured.
- Exhaustivity also characterised all of the 26 examined datasets; it includes features like fine-grained resolution, indexicality, and relationality. In every case, data go hand in hand with metadata, through which it can be identified whether data were produced by similar tools at different times or places, and whether there is any relationship among datasets. However, for the normal data in the 26-dataset sample this relationality is often not so obvious.
- For extensionality, the researchers observed whether the generated big data is variable and requires a highly flexible system capable of adding or removing fields with respect to the collected data. The research made clear that in some cases big data sets hold the features of continuity and robustness without the need for an adaptable system.
- The last attribute is scalability, where the main question was whether big data requires a system able to handle a varying inflow of data. The research showed that in most cases the inflow of data is connected to specific intervals or is simply continuous; therefore scalability cannot unambiguously characterise big data (Kitchin & McArdle, 2016).
Considering all the findings of Kitchin and McArdle (2016), we can say that big data comes in several forms, which cannot all be characterised by the same attributes. However, the authors argue that velocity and exhaustivity are the most important features: if data hold these two attributes, we can consider them big data. Otherwise, it can be misleading and unclear whether we are dealing with big data or not.
The transformation of big data into value-added information was first explained thoroughly in 2016 and has been complemented in parts by the findings of other researchers. The main steps, described in more detail below, are the following (Min, 2016): finding the right data, data sampling, data preparation, data segmentation, data filtering, data warehousing, and fitting the analytical model to the data.
A huge amount of data does not automatically mean that a company will find the right data. The datasets should be revised and transformed in order to obtain valuable information. To reveal all the information within them, one could examine the whole population of the gathered data, which would clearly be very costly and time-consuming. Data sampling is used to bridge this issue: one extracts a smaller group from the whole amount of data, considered to be representative, and draws conclusions from it. It is a recurring issue how analysts choose the sampling method that best fits the business entity's problem. At this step, two questions need to be answered: how often and how much data should be collected?
So far I have written about raw data, which has to be transformed into a format from which analysts can gain relevant information. The authors call this process data preparation, and it includes the following tasks: exploring, cleansing, changing, shaping, screening, profiling, integrating, and publishing. After the transformation, experts can carry out efficient analyses and will more easily recognise patterns within the set.
The fourth step is segmentation, which sounds very familiar from marketing; its important task is to classify customers into smaller groups according to certain aspects. Here, enterprises gain insights into customer behaviour, reactions in different situations, or buyers' opinions about products. Data segmentation simply means that a dataset is divided into smaller categories by some characteristic (e.g. demographic data), by privacy (e.g. sensitive data), by structure (e.g. codified), or by format (e.g. SPSS6). The aim is customisation as well as personalisation; a small clustering sketch follows below.
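As a minimal illustration of this segmentation step, the following Python sketch groups toy customers into two segments with k-means clustering (scikit-learn). The attributes, e.g. age and yearly spend, are invented; this only sketches the idea and is not the procedure of Min (2016):

# A minimal sketch of data segmentation with k-means clustering.
# Each customer record is [age, yearly spend]; all values are invented.
from sklearn.cluster import KMeans

customers = [[22, 300], [25, 340], [24, 310],
             [58, 920], [61, 980], [55, 900]]

segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(segments.labels_)            # segment membership per customer
print(segments.cluster_centers_)   # the "profile" of each segment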
In addition, data filtering refines big data analysis and cuts down the amount of data with the help of specific tools, displaying only those records that interest the analysts and meet the criteria they have previously set up.
While a company continuously collects this vast amount of information, the question arises of how and where it can be stored correctly. This induces the need for new storage and architecture that is more state of the art compared with traditional data storage such as databases, data marts, and data warehouses (Bakhsi, 2012).
A data warehouse is a single space for storing big data. It supports decision makers in easily finding useful information in a system where the data have already been transformed into an applicable format and documented properly. This relational database includes subject-oriented, time-variant, integrated data, which cannot be removed or altered after being stored in structured form. The raw data can derive from three sources: external data, operational data, and independent data marts. Their different formats are integrated into a single, unified one, and then the data flow into the data warehouse system, by which time they are ready for query, analysis, and reporting. The Extract-Transform-Load process takes place after the extraction of the data; it prepares the data so as to avoid inconsistency, shapes it into a relational and multidimensional format, and finally loads it into the warehouse.
6 The acronym SPSS stands for Statistical Package for the Social Sciences, a tool broadly used for statistical analysis in academia (Burns & Burns, 2008).
As a last step, business decision makers must choose the analytical methods that best fit the collected data in order to gain better insights into business operations and practices. After the right method has been selected, the extracted information has to be presented in a user-friendly format, as it still needs to be understandable for those who are not technical experts but want to gain knowledge from it (Min, 2016).
SQL: records with the same attributes; structured data; data stored in several logical tables in order to avoid redundancy and duplication.
NoSQL: records with different attributes; semi-structured and unstructured data; data stored in the form of flat collections where data is duplicated (key-value pairs, JSON data).
Table 1: Differences between SQL and NoSQL (Shetty & Chidimar, 2016)
It is also necessary to mention one of the most famous innovations connected to the big data phenomenon, called MapReduce. It is a programming model for handling large data sets with a map function that processes key/value pairs. More than 10,000 programs have used MapReduce with different algorithms for text processing, machine learning, or graph processing (Dean & Ghemawat, 2010).
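To illustrate the programming model, the following minimal Python sketch mimics the map, shuffle/sort, and reduce phases for a word count. It imitates the MapReduce idea only; it is not Hadoop or the original Google implementation:

# A minimal sketch of the MapReduce idea: map emits key/value pairs,
# the pairs are grouped by key (shuffle/sort), reduce aggregates them.
from itertools import groupby

def map_phase(line):
    return [(word, 1) for word in line.split()]

def reduce_phase(key, values):
    return key, sum(values)

lines = ["big data big analytics", "big data"]
pairs = sorted(p for line in lines for p in map_phase(line))  # shuffle/sort
counts = [reduce_phase(key, [v for _, v in group])
          for key, group in groupby(pairs, key=lambda p: p[0])]
print(counts)   # -> [('analytics', 1), ('big', 3), ('data', 2)]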
Hadoop is an open-source implementation of MapReduce, which helps analysts cope with problems related to this vast amount of data, especially when a data set contains both unstructured and structured data. Furthermore, it provides support in situations where the analytics are deep and computationally intensive (e.g. clustering), and it solves the problems of storing and accessing huge amounts of data for decision makers. Further benefits of Hadoop are that it can work across several machines that do not share any memory, and that it eases the management of excessive, unclear information by breaking the data into smaller, more manageable pieces (Ohlhorst, 2012).
7 ACID refers to four characteristics of database transactions, namely atomicity, consistency, isolation, and durability (Yadava, 2007).
- Christopher and Towill (2001) define it as a management system and a network of equipment, tools, and distribution alternatives through which sourcing, the shaping of materials into final products, and their delivery to consumers can be realised.
In today's rapidly changing world we talk about competition between supply chains rather than between businesses. This chain consists of all the members who help to fulfil customers' needs. The participants include suppliers, manufacturers, transporters, retailers, and the customers who close the chain. They cover different fields of business processes in the system, like production, marketing, finance, or sales. We also need to mention the decisions and decision-making processes which are found in each part of the chain, but with different decision horizons regarding time (Chopra & Meindl, 2013).
Supply chain strategy and design cover a longer time horizon of several years and build on long-term decision making. It is important to mention that the right decision can only be made if the company considers the uncertainty generated by different unforeseeable market actions. This phase includes questions such as: what kind of supply chain configuration should a company create, how can the company allocate resources correctly between its sub-parts, and would outsourcing a function or performing it in-house be the better decision? The middle phase is called supply chain planning, where the time horizon extends from a quarter to one year. One of the main parts of planning is to make predictions about demand, cost, and price for the following quarter (or year), as well as to decide on the inventory policy and delivery schedule - which location serves which customers. This phase also includes uncertainty, as the business has no certain information about how market competition and other external effects will influence demand, exchange rates, etc. On the other hand, we are talking about a much shorter time horizon here in contrast to the first phase, so companies are able to react more flexibly to unexpected turns in their environment. The literature refers to the last part as supply chain operations, which works with weekly or daily time horizons, focusing on individual customer requests and their most efficient handling. This phase includes matching inventory and production parameters to each customer order, as well as scheduling delivery times and managing replenishment orders. Uncertainty plays a vanishing role thanks to the short time period, which enables decision makers to react more flexibly to unexpected events or customer requests and to reach better results in performance optimisation (Chopra & Meindl, 2013).
The three phases mentioned above also match the three levels of a firm's activities: strategic, tactical, and operational, beginning with decisions that have long-lasting effects, through the quarterly or yearly scope, to day-to-day decision making. Within a supply chain, Stadler (2004) distinguishes sub-parts with different issues, and these elements can also be connected to one of the three levels:
- Strategic and distribution network configuration concerns the network of all production plants, warehouses, and customers in a specific geographical region. Decisions made here belong to the strategic level, since the creation, redesign, or reorganisation of a distribution network has an effect for many years ahead. The most important question is how to create a network that reduces production, inventory, and transportation costs.
- Product design and development is also part of strategic planning, and the decisions made here can be costly. Creating a product design or redesigning a product involves large expenses - especially considering manufacturing - and compared to other designs it may also increase inventory holding or transportation costs. The main issues to be solved are the timing of a redesign in order to reduce logistics costs, and the changes to the supply chain system needed to benefit from the new product design.
- Production planning is considered a tactical action in the life of a company and belongs to the mid-term planning and decision-making phase, usually taking a quarter or a year into account. Here, decision makers need to observe shifts, machine groups, flow lines, and other operations at this level in order to identify bottlenecks in advance. In the end, the leaders of this department should set up a correct work schedule and sequence of jobs.
- Distribution planning comes after goods are produced and need to be shared among production and distribution sites. The manager's job is to schedule the delivery to customers considering the transport route - it can go through warehouses, via cross-docking, or directly to customers. Another issue is matching supply with demand in every period; a supply chain expert should be able to coordinate the flow of goods so that companies suffer as little as possible from shortages of supply.
- Demand planning is part of mid-term decision making, where the job is to forecast demand with the help of well-known univariate, multivariate, or life cycle methods (e.g. Winter's exponential smoothing model). Here professionals also add influences to the model which have a high probability of happening in the future (e.g. the introduction of a new product) and try to examine their effect on sales.
- Inventory control belongs to the operational level, as the retailer has to monitor and predict customer demand and the change of the inventory level on a daily or weekly basis. The most important questions for the retailer are: when is the right time to reorder, and how many products should be ordered to keep inventory ordering and holding costs low? Uncertainty in demand also needs to be considered, as well as the impact of forecasting tools.
- Transport planning is usually a short-term task, as products need to be delivered within days or weeks in most cases. The challenge here is to pay attention to specific customer requests and labour regulations, for example time windows for delivery and working hours for drivers. Because of these constraints, companies face vehicle routing problems on a daily basis.
- Demand fulfilment is the last step in the matrix, ending in order execution. It belongs to the short-term level, where due-date setting and shortage calculations are the weekly or daily jobs of a manager.
Inventory management is an area of the supply chain where one faces many optimisation problems. Several methods and algorithms have been developed over the years to handle inventory in a cost-effective way while satisfying customers' needs at the required level. This has driven technological developments in which suppliers and retailers follow inventory-related data through a common system that enables better cooperation within the supply chain (e.g. EDI8 systems, POS9 data management systems). A small numeric sketch of the classic reorder-point and order-quantity calculations follows below.
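As a minimal numeric illustration of the classic questions of when and how much to order, the following Python sketch computes a reorder point with safety stock and the textbook economic order quantity (EOQ). All numbers are invented, and the formulas are standard textbook ones rather than models from the cited papers:

# A minimal sketch of two classic inventory answers: "when to order"
# (reorder point with safety stock) and "how much to order" (EOQ).
# All parameter values are invented for illustration.
import math

daily_demand = 40        # average units sold per day
lead_time_days = 5       # days between ordering and delivery
safety_stock = 60        # buffer against demand uncertainty

reorder_point = daily_demand * lead_time_days + safety_stock
print("reorder when inventory falls to", reorder_point, "units")  # 260

annual_demand = 14600    # units per year
order_cost = 50.0        # fixed cost per order
holding_cost = 2.0       # cost of holding one unit for a year

eoq = math.sqrt(2 * annual_demand * order_cost / holding_cost)
print("order quantity:", round(eoq), "units")   # ~854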
In the case of inventory management, I need to mention the vendor managed inventory concept, which is one of the most popular research areas in this field. This concept gives greater power to the supplier, as it has greater control over inventory management than the retailer or buyer. The supplier mainly pays attention to the optimal order quantity or service level requirement, with the goal of achieving beneficial transportation and inventory holding costs through which it can optimise its profit. Throughout the years, researchers have invented models which are closer to real life and capable of handling real-life instances; these algorithms can already work with multiple suppliers and multiple retailers in a dynamic model (Bichescu & Fry, 2007).
Transport optimisation constitutes the part of supply chain management where the focus is on proper routing, by which products can arrive at the customer in time. Here researchers mainly deal with the vehicle routing problem and its models. These always contain a fleet of vehicles which travel from the depot to customers and back. The more sophisticated, real-life problems contain in-transit points or warehouses where different amounts of inventory are available. Furthermore, these models typically must handle real-life constraints, such as customers setting a time window for delivery, or drivers not being allowed to work more than 8 hours per day. It is also typical that the model must handle the fact that some customers' ordered products must be delivered by more than one vehicle. Besides this, real-time information about weather, traffic, and inventory stock helps companies to plan capacity or re-schedule deliveries more easily (Cordeau, et al., 2007). A small routing sketch follows below.
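To give a feeling for routing heuristics, the following minimal Python sketch builds a single-vehicle tour with the nearest-neighbour rule on invented coordinates. Real vehicle routing models add the capacities, time windows, and driver-hour constraints mentioned above:

# A minimal nearest-neighbour sketch for a single-vehicle tour that
# starts and ends at the depot. Coordinates are invented toy data.
import math

depot = (0, 0)
customers = {"c1": (2, 3), "c2": (5, 1), "c3": (1, 7), "c4": (6, 5)}

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

route, pos, remaining = [], depot, dict(customers)
while remaining:
    nxt = min(remaining, key=lambda c: dist(pos, remaining[c]))
    route.append(nxt)                 # always visit the closest customer
    pos = remaining.pop(nxt)

length = (dist(depot, customers[route[0]])
          + sum(dist(customers[a], customers[b])
                for a, b in zip(route, route[1:]))
          + dist(customers[route[-1]], depot))
print(route, "tour length:", round(length, 2))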
Demand forecasting is probably one of the most important parts, because it functions as a driver for decisions and processes at every stage of the supply chain. Before action is taken in any part of the chain, the company has to predict its buyers' needs and future demand as accurately as possible. Forecasts are usually considered inaccurate, but there are several more
8 Abbreviation for Electronic Data Interchange, which means the exchange of business documents (e.g. purchase invoices, orders, ship notices) among computers, mainly those of suppliers, retailers, banks, and clients (Cohen, 2013).
9 POS means Point of Sale, a technology that carries information about customer payments. Today it is mainly used at check-out counters in retail and grocery stores to observe purchase trends and customer needs (Whitteker, 2014).
Supply chain management 30
advanced method with which this accuracy could be improved and parallel the forecast error
could be reduced. Without being exhaustive, the most popular forecasting method used by both
researchers and businesses are the following: Static time-series forecasting sticks to historical
data and use same historical values for forecasting, so this is a very simple and less accurate
method. In contrast to this there are several adaptive forecasting methods where the data about
trend, seasonality and level are updated in case of every new observation about demand like in
the Moving avarage model, in the Holt’s model or in the Winter’s model (Chopra & Meindl,
2013).
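A minimal sketch of Holt's linear-trend method, one of the adaptive models just mentioned, may clarify how level and trend are updated with every new demand observation; the demand series and smoothing constants below are hypothetical.

```python
# Holt's linear-trend method: level and trend are re-estimated each period.
def holt_forecast(demand, alpha=0.3, beta=0.1, horizon=1):
    level, trend = demand[0], demand[1] - demand[0]  # simple initialisation
    for y in demand[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend

weekly_demand = [120, 128, 131, 140, 152, 149, 160]
print(f"Next-week forecast: {holt_forecast(weekly_demand):.1f}")
```

Winter's model adds a seasonal component updated in the same adaptive fashion.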
Here I would like to shortly mention an interesting phenomenon in demand forecasting which
can strongly affect the members of the supply chain at different stages. It is very common in
practice that the information flow among stages is irregular, incomplete and distorted, and that
some functions have conflicting objectives, as their managers want to reach the highest profit
and the most optimised processes only in their own areas. Because of these issues the bullwhip
effect is formed, which means that the fluctuation of orders becomes greater and greater
between the stages. Supply chain participants receive distorted demand numbers, starting from
retailers to wholesalers, then moving towards manufacturers and suppliers (Chopra & Meindl,
2013).
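The amplification can be reproduced with a deliberately small simulation: each stage below forecasts with a short moving average and orders up to a target inventory level, and the standard deviation of orders grows from the retailer towards the manufacturer. The demand series and policy parameters are hypothetical.

```python
# Toy bullwhip demonstration: order variance grows at every upstream stage.
import random, statistics

random.seed(1)
end_demand = [100 + random.gauss(0, 5) for _ in range(200)]

def stage_orders(incoming, window=4, cover=2):
    """Order-up-to policy: target = `cover` periods of moving-average forecast."""
    orders, inventory = [], cover * incoming[0]
    for t, d in enumerate(incoming):
        forecast = statistics.mean(incoming[max(0, t - window + 1): t + 1])
        order = max(0.0, d + (cover * forecast - inventory))
        inventory += order - d  # replenishment arrives, demand leaves
        orders.append(order)
    return orders

series = end_demand
for stage in ["retailer", "wholesaler", "manufacturer"]:
    series = stage_orders(series)  # each stage sees the previous stage's orders
    print(f"{stage:>12}: order std dev = {statistics.stdev(series):5.1f}")
```

Even with perfectly stable end-customer demand plus noise, the moving target inventory makes each echelon's orders noisier than the orders it receives.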
The causes of the bullwhip effect are intensively discussed in the literature, and the main
obstacles can be summarized as follows (Whang & Chuu, cited in Hofmann, 2017):
- Demand signal forecasting works with aggregated data derived from the past.
Companies consider this data important for predicting a future level of inventory and
achieving a good service level; however, historical data does not always reflect the
current demand of buyers. As a result, supply chain professionals work with incorrect
numbers, which propagate through the chain, and the forecasting results correspond less
and less to real demand numbers (Lee, et al., cited in Hofmann, 2017).
- Another obstacle which can contribute to the bullwhip effect is forecasting based on
exaggerated orders. In this case the current demand on the market is higher than the
goods available after production, therefore customers are not able to purchase enough
to satisfy their needs. If a buyer company is able to predict the produced quantity
beforehand, it will order more than it needs in reality, in the hope of receiving a higher
portion of the available products. Again, the selling company faces misleading
information, resulting in poor forecasts and inappropriate planning and scheduling
(Whang & Chuu, cited in Hofmann, 2017).
- Order batching is a beneficial method for firms to save costs occurring during
transportation or ordering transactions. To this end, buyers find it financially more
advantageous to order goods in greater packages rather than individually, where they
can save money through volume or quantity discounts. Unfortunately, this again does
not show the correct demand, since customers only order at specific, unforeseeable
times, which makes inventory planning hardly possible (Hofmann, 2017).
Rapid and real-time analysis of large data sets is essential for any company to keep its
competitiveness in a dynamic market. Exploiting the 3V characteristics of big data supports
companies in creating an adaptive supply chain that possesses agility in action, responsiveness
and resilience. Adaptability feeds on large data sets collected through information and
communication devices and finds room for improvement in areas such as the restructuring of
business actions, the rearrangement of resources or the coordination of actions
(Rynarzewski & Szymczak, 2016).
I would like to start with the first characteristic, volume, which usually poses challenges in the
case of a multi-stage supply chain: the flow of data and information happens across several
stages, which makes it problematic to avoid the loss of data.
Velocity receives even greater attention among the 3Vs, as it can be credited with successfully
reducing the bullwhip effect along the chain. Data analytical tools and techniques make it
possible to collect, process and analyse datasets with increased speed, in real time. As a
consequence, velocity makes the decision-making process more effective (Hofmann, 2017).
Richey and his colleagues conducted research focusing on the 3Vs of big data in the supply
chain and found that increased velocity is considered both an opportunity and an obstacle by
supply chain experts (Richey, et al., 2016).
It is also necessary to mention the last V, variety, which means that data arrives at the company
in different forms and structures. RFID readers, sensors and GPS devices gather data at different
levels of structure, which means that the traditional way of data storage – able to store only one
type of data – has been replaced by more up-to-date storage systems adapted to big data
features.
Other researchers have also dealt with the 3Vs of big data in their research papers and brought
some general examples from a supply chain perspective. Volume can be seen in the vast amount
of data generated by sensors, bar codes and ERP systems. The collected big data clearly holds
variety thanks to the diverse sensors at manufacturing sites, retailer shops and facilities.
Utilising velocity, as the speed of data collection, can help to explore useful knowledge faster
or come to a decision more quickly (Benabdellah, et al., 2016).
Three important data sources in the supply chain are Radio Frequency Identification (shortly:
RFID), GPS and Point of Sale (shortly: POS) sensors. Beginning with RFID, it can be described
as a type of data that plays an important role in inventory management at both manufacturing
sites and retailer stores. GPS data supports a company’s tracking system and was first used in
transportation control; however, several studies show that it is worth exploiting this kind of data
in production as well, especially in production scheduling. Last but not least, we arrive at POS
data, used at the beginning of the supply chain in order to directly monitor buyers’ purchase
behaviour as well as to improve demand forecasts (Myerson, 2007). These three data types are
described more thoroughly in the following lines:
- If a member operates far from the point of sale in the supply chain, then POS data
might not be as relevant for it as for the seller. However, some researchers proved that
if supply chain members contribute in some way to creating or selling the final product,
then POS data will never be negligible for them (Zhu, 2013). Furthermore, POS data is
considered more accurate in sensing actual end-customer activities, so it can reduce
out-of-stock situations as well as better cope with the bullwhip effect that usually occurs
when a company uses a purchase order-driven approach (Keifer, 2010).
- The third type of data comes from GPS devices, which are mainly used in logistics
transportation systems to ease the flow of information and tackle the challenges of
on-time delivery, just-in-time supply of material or the control of production processes,
which can also be independent of transportation. Analytics based on GPS data can
provide companies with real-time information about transport interruptions, estimated
times of arrival or travelled routes, so production scheduling can be optimised with
respect to this information. As can be seen, GPS data is not only useful in transportation,
as in cargo tracking, but also carries valuable information for manufacturing, which
supports firms in re-scheduling their production processes more flexibly. A probable
future step is the integration of the global system for mobile communication, GPS and
RFID technologies within one tracking system, which could revolutionise logistics and
production, as such an integrated system would contain transport planning, warehouse
management and production control based on smart materials at the same time
(Klumpp & Kandel, 2011).
The last stage of the framework is to measure performance with KPIs designed for the segment
attributes. The right metrics are key to a company’s success, as they correctly reflect which
parts need to be optimised. In most cases, strategically aligned metrics agreed by all process
members are best for this purpose, as they not only control segment characteristics but also give
information about the degree of alignment, integration and cross-enterprise cooperation. The
framework can be seen as a circle thanks to the continuous improvement cycle, as enterprises
go repeatedly through the three stages in order to learn and improve their operation
progressively. Company metrics and the formed segments with their competitive priorities are
connected by a feedback loop that helps to shape segment processes. The author also believes
that big data analytics is clearly able to support this step-by-step improvement during the cycle,
as it makes it more efficient to continuously monitor KPIs and to detect or mend deviations in
real time (Sanders, 2016).
4.5. Application of data analytics along the supply chain planning matrix
It is possible to classify the literature according to the elements of the supply chain planning
matrix. I have found and read plenty of research papers on big data analytics in which the
researchers focused on only one element of the matrix or model and wrote about big data
applications in that specific field.
As we have seen from the supply chain planning matrix in the previous chapter, it can be
divided into long-, mid- and short-term decision-making and planning phases, which can be
identified as the strategic, tactical and operational levels.
One study, for example, applied big data to design a responsive and efficient strategic
distribution network for the long run, in the hope of better outcomes in disaster-affected or
underdeveloped zones (Prasad, et al., 2016).
Wang, Gunasekaran and Ngai (2016) also dealt with the distribution network within the supply
chain and the application of big data. They identified that traditional network configurations are
not able to keep up with quickly changing supply chain operations and that there is a need to
redesign the network structure. Distribution operations basically work with a vast amount of
data, which can be exploited to identify the necessary number of distribution centres and the
right assignment of clients to them, plus to reduce operational costs at the centres. They argued
that there is a lack of literature mentioning an exact application of analytics in this field, so their
goal was to create a powerful model – a mixed-integer nonlinear program – able to solve supply
chain and distribution network problems using big data. The source of data is twofold: first,
historical databases; second, behavioural data gathered from social media, web clicks,
comments and so on. All of this was essential for identifying customer needs and proper
locations for distribution centres in a supply chain network with more than 2000 stores. The
objective of the model is to minimize the total fixed, transportation and handling costs of the
distribution centres, as well as the penalty cost that occurs in the case of unfulfilled customer
orders. With the help of simulation and sensitivity analysis it is possible to select better
locations for the centres, in which case the model uses randomly generated big data sets for
customer demand, warehouse operation and transportation. All in all, big data enables firms to
find additional information and set up a more complex distribution network.
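A minimal linear facility-location sketch in the same spirit, deliberately simpler than the authors' mixed-integer nonlinear formulation, may illustrate the type of decision being optimised. It uses the open-source PuLP package, and all cost figures are hypothetical.

```python
# Choose which distribution centres to open and assign stores to them so
# that fixed opening costs plus transport costs are minimised.
import pulp

centres = {"DC1": 500, "DC2": 400}                      # fixed opening cost
stores = ["S1", "S2", "S3"]
transport = {("DC1", "S1"): 4, ("DC1", "S2"): 6, ("DC1", "S3"): 9,
             ("DC2", "S1"): 7, ("DC2", "S2"): 3, ("DC2", "S3"): 4}

prob = pulp.LpProblem("distribution_network", pulp.LpMinimize)
open_dc = pulp.LpVariable.dicts("open", centres, cat="Binary")
assign = pulp.LpVariable.dicts("assign", transport, cat="Binary")

prob += (pulp.lpSum(centres[c] * open_dc[c] for c in centres)
         + pulp.lpSum(transport[k] * assign[k] for k in transport))
for s in stores:                       # every store is served exactly once
    prob += pulp.lpSum(assign[(c, s)] for c in centres) == 1
for c, s in transport:                 # only opened centres may serve stores
    prob += assign[(c, s)] <= open_dc[c]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([c for c in centres if open_dc[c].value() == 1])
```

The cited model additionally feeds behavioural big data into the demand parameters and adds penalty terms for unfulfilled orders, but the open/assign decision structure is the same.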
In one product development study, for example, the researchers first conducted a survey among
customers. Then they tried to gain useful, quality data from the survey and transform it into a
format applicable in the model. Through the previously mentioned data mining techniques they
extracted the necessary information for product development and, as a last step, tested the
reliability and validity of the integrated rules. With the help of the decision tree the researchers
came to several results regarding customer opinion of different product types. Furthermore,
gaining more sophisticated information by data mining not only provides insight into
customers’ needs but also supports manufacturing innovation. The scientists also concluded
that the results of the research need to be generalized through more experiments with other
products, as well as by exploiting greater amounts of data and other mining approaches (e.g.
genetic algorithms), in order to identify future product development patterns.
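The decision-tree step could look roughly like the following sketch, with scikit-learn standing in for whatever tool the authors used; the encoded survey columns, labels and the new respondent are entirely hypothetical.

```python
# Fit a shallow decision tree on numerically encoded survey answers and
# inspect the learned rules.
from sklearn.tree import DecisionTreeClassifier, export_text

# columns: [age_group, price_sensitivity, prefers_compact_design]
X = [[1, 2, 1], [2, 1, 0], [3, 1, 0], [1, 3, 1], [2, 2, 1], [3, 3, 0]]
y = ["type_A", "type_B", "type_B", "type_A", "type_A", "type_B"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age_group", "price_sens", "compact"]))
print(tree.predict([[2, 3, 1]]))  # classify a new survey respondent
```

The printed rules are what makes the method attractive for product developers: each branch is a readable statement about which customer attributes drive preference for a product type.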
There is also another research paper whose aim is to identify an appropriate competence set
using big data in order to create a competitive supply chain as well as enhance the product
development of a company. Tan and his colleagues (2015) set up an analytics infrastructure that
supports managers in generating new product development ideas by identifying the
competences necessary for the production of new goods. The idea is based on a deduction graph
model, which helps firms or departments to combine their competence set with those of other
companies or production units. The whole method can be described as a continuously evolving
process in which the conjunction of competences happens for the sake of optimisation, and it
is also able to handle more than one decision maker at the same time (Li, 1999). The model can
be presented as follows:
It has a set of problems that need to be solved, a set of needed competences, a set of acquired
competences, and it also contains intermediate skills whose job is to connect the needed and
acquired competences during the process. The model then builds up a learning network (graph),
starting from the already acquired set of skills and reaching the needed competences through
the intermediate skills (Tan, et al., 2015).
For better understanding, the authors provided figures; see below:
Figure 10: Deduction graph for finding the best competence sets (Tan, et al., 2015)
The idea was tested on a real case of a glass manufacturing company, where the authors tried
to identify product development ideas, optimise the manufacturing process in a cost-effective
way and understand how different enterprises can work together effectively. The scientists
created the previously mentioned mathematical model to discover the best learning sequence,
i.e. the one generating the greatest profit for the company. After they had collected the concrete
data for the different sets, they were able to build up the graph, which showed which
competences can be learned from existing skills and which need to be “purchased”. As a final
conclusion they argued that using big data alone is not enough for proper decision-making; it
has to relate to some supply chain specific problem for better utilization. As in this case, these
two elements need to be joined for better use (Tan, et al., 2015).
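The core of the deduction-graph idea, finding which needed competences are reachable from already acquired ones through intermediate skills, can be illustrated with a simple graph search. The skill names and edges below are hypothetical, and the sketch omits the cost optimisation of the real model.

```python
# Breadth-first reachability over a "having X lets you learn Y" graph.
from collections import deque

edges = {
    "glass_cutting": ["tempering"],
    "tempering": ["safety_glass"],
    "coating": ["low_e_glass"],
}
acquired = {"glass_cutting", "coating"}
needed = {"safety_glass", "smart_glass"}

reachable, queue = set(acquired), deque(acquired)
while queue:
    skill = queue.popleft()
    for nxt in edges.get(skill, []):
        if nxt not in reachable:
            reachable.add(nxt)
            queue.append(nxt)

print("learnable:", needed & reachable)          # reachable via learning paths
print("must be purchased:", needed - reachable)  # no path from acquired skills
```

In the full model each edge also carries a learning cost, so the best sequence is the cheapest path rather than merely an existing one.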
4.5.3. Purchasing
The first element of mid- and short-term planning is purchasing and procurement, which is
highly dependent on the suppliers of a business and has a direct effect on inventory
management.
The authors of another scientific paper, Tan and Lee (2015), suggest improvements to the
procurement process after examining purchasing and sourcing processes with advanced data
mining techniques. They believed that decision makers can gain better insight into procurement
processes through analytics, which helps them come up with a more efficient sourcing strategy.
They used a text mining model based on clustering, for which a special program called
RapidMiner served as the platform. This software is suitable for machine learning, data or text
mining and other predictive analytics. The database of a real company from the information and
communication technology sector was used in the research in order to analyse purchase data
between the years 2011 and 2014. The company had divided its purchases into three
procurement approach types, and the related data were stored in different formats and places in
each case. As each transaction contains a lot of unnecessary data, the first step was to clear it
out and keep only the information about the description of the purchased items, the period of
the transaction, the amount spent on the item, and supplier and buyer data. Then comes the
transformation of the description text into a format that can be handled by the cluster modelling
technique. Clustering happens with the K-Means method, which identified the occurrence
frequency of words in the records’ descriptions. If a word came up many times, that indicated
more transactions, meaning more purchases of a product or service. However, text mining has
one big disadvantage: it cannot clearly show how much cost belongs to the different transaction
occurrences. It can happen that goods or services with few purchases have high costs, while
products with high transaction occurrence have only low values. In the methodology the
authors therefore also focused on the supplier side and identified purchase amounts and
volumes per supplier. The results of the two analyses can then complement each other and give
verified, complete information about the procurement process that can be used for further
improvement. After the experiment, the authors identified those common goods and services
among the purchases that should be aggregated across the company, and those which should
rather be purchased by individual project teams instead of the whole company. Besides this,
they argued that the conducted analyses could enhance the visibility of purchased goods and
services. If the different project teams have access to the cluster analysis with their identified
groups of purchases, then they are able to plan more accurately and coordinate the long-term
procurement plan with higher efficiency (Tan & Lee, 2015).
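A rough sketch of the text-clustering step, with scikit-learn standing in for RapidMiner, might look as follows; the purchase descriptions and the number of clusters are hypothetical.

```python
# Vectorise free-text purchase descriptions and group them with K-Means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

descriptions = [
    "laptop docking station", "laptop battery replacement",
    "network switch 24 port", "network router rack mount",
    "office chair ergonomic", "office desk adjustable",
]
X = TfidfVectorizer().fit_transform(descriptions)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for text, label in zip(descriptions, labels):
    print(label, text)
```

Joining these cluster labels with spend per transaction is what closes the gap the authors point out: occurrence frequency alone says nothing about cost.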
Lee, Kao and Yang (2014) were among the first scientists to propose a machine system that
could handle industrial big data as well as smart production issues. They argued that
companies’ current manufacturing systems are not able to cope with massive amounts of data,
so it is necessary to switch from regular to self-aware, self-learning machines with capabilities
of self-prediction, self-reconfiguration or self-maintenance. The first issue is that productivity
and production quality depend heavily on scheduling and task design, but current machines
work passively and execute tasks in any case, even when the assigned tasks are not optimal for
the machines’ condition. In contrast, a smart machine system could advise better task
arrangements and would be able to modify operational parameters. Another issue is the lack of
adaptive learning and of exploitation of big data information. The main cause for this is the
absence of a prognostics and health management system, therefore health monitoring
algorithms do not work perfectly. Furthermore, condition monitoring data are usually processed
by algorithms that are incapable of learning or developing themselves, or of applying real-time,
optimised big data for the analytics. The authors believe that a cyber physical information
system10 would be appropriate for reaching a fleet-wide information system and setting up
self-aware and self-maintained machines which are able to estimate their own health and
degradation.
10 A cyber physical system combines computation with physical processes and is an intersection of the
physical and the cyber. It is characterised by a high degree of automation and real-time, securely delivered
tasks; it is networked at multiple scales; and it integrates learning, adaptation and higher performance
(Sanislav & Miclea, 2012).
They examined a cyber physical system that uses a knowledge base and related algorithms
instead of the common simulation- or control-oriented ones. Here, the knowledge base is built
through clustering with the help of adaptive learning and data mining, where the learning
algorithm creates groups according to similarities of the machines (e.g. machine performance
or working conditions). While the algorithm runs through the clusters in search of a good health
assessment, it can end with two different results: it either finds an already existing cluster and
updates it, or creates a new one for the newly found “behaviour”. Machines in the same group
will have very similar health conditions and will require similar interventions; a minimal sketch
of this logic follows below. Some of the key benefits of this proposed framework are:
- Coping with unprecedented events becomes easier for prognostics and health
management algorithms.
- Companies can optimize production and create energy-saving maintenance scheduling
with this new and improved way of health prediction. It may also cut down on machine
downtime as well as decrease labour costs.
- Finally, this new smart manufacturing system enables industrial management to become
more transparent and to achieve a more effective information flow among production lines
in supply chain management (Lee, et al., 2014).
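The promised sketch of the "find an existing cluster or create a new one" logic follows, written as a simple incremental clustering routine. The condition features, distance threshold and readings are hypothetical and stand in for the authors' far richer health-assessment algorithms.

```python
# Each new machine-condition reading either updates the nearest existing
# cluster or founds a new cluster for previously unseen behaviour.
import math

THRESHOLD = 2.0
clusters = []  # each cluster: {"centre": [...], "n": member count}

def assess(reading):
    best, best_d = None, math.inf
    for c in clusters:
        d = math.dist(reading, c["centre"])
        if d < best_d:
            best, best_d = c, d
    if best is not None and best_d <= THRESHOLD:
        best["n"] += 1  # update the matched centre with a running mean
        best["centre"] = [m + (x - m) / best["n"]
                          for m, x in zip(best["centre"], reading)]
        return clusters.index(best)
    clusters.append({"centre": list(reading), "n": 1})
    return len(clusters) - 1

for r in [(1.0, 0.2), (1.1, 0.3), (5.0, 4.0)]:  # (vibration, temp. drift)
    print("cluster", assess(r))
```

Machines landing in the same cluster are assumed to share a health state, which is what allows the framework to recommend similar maintenance interventions for them.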
Another interesting topic discussed in a research paper is RFID technology used in production
and at manufacturing sites. Zhong and his colleagues (2015a) propose a method to create a
smart, data-driven, RFID-enabled shop floor using a visualisation approach. They named their
model the RFID-Cuboid; it is able to restructure raw data deriving from RFID sensors while
considering specific production logic and time series. The authors’ invention belongs to the new
Cloud Manufacturing paradigm, which enables companies to convert traditional manufacturing
resources into smart manufacturing objects. After this transformation the new system contains
objects that are able to sense, react, interact and cooperate using wireless communication
standards. The data collected this way are ordered into an RFID cuboid along three dimensions:
the RFID data dimension (x-axis), the behaviour dimension (y-axis) and key attributes (z-axis).
Each cuboid represents a logistic event, and the RFID cuboids are connected together into a
chain from which it can clearly be seen who did an operation, and what, where and when. This
chain thus helps to visualise an entire logistic life cycle. As a next step a logistic trajectory can
be pictured, and we arrive at a graphical visualisation of logistics operators’ performance as
well as production progress. In the last phase, the cloud-manufacturing data has already been
cleaned of useless information, so statistics, machine-learning procedures and prediction
models can obtain relevant knowledge for decision-making. More exactly, decision makers are
able to successfully exploit this information to create an intelligent future environment for
logistics planning and scheduling, inventory control, logistics task assignment and promotion
strategy. In the last step of their research, they examined the feasibility and potential
improvement areas of their model in a real-life case from an automotive firm. Concrete findings
were that the visualisation approach helped the daily operation of several users. As an example,
the mined average logistics time is helpful for decision-makers in material resource planning
and production decisions. Furthermore, the model also showed the performance of different
logistics operators and identified that junior employees should rather work on internal logistics
operations, while senior workers deliver tasks in external issues. Last but not least, it also
revealed which workers should improve their efficiency, because of high inventory levels at
specific departments with noticeably low logistics efficiency. As a future research area the
authors suggest enlarging this model to a global or multi-echelon supply chain stage and trying
to cope with an even greater amount of data (Zhong, et al., 2015a).
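As a data-structure sketch, an RFID cuboid with the three described dimensions, chained into a trajectory, might be modelled as follows; all field and event names are hypothetical.

```python
# One cuboid per logistic event; a chain of cuboids answers who/what/where/when.
from dataclasses import dataclass

@dataclass
class RFIDCuboid:
    rfid_data: dict    # x-axis: raw reader records (tag id, reader id, timestamp)
    behaviour: str     # y-axis: interpreted logistic event, e.g. "pick", "move"
    attributes: dict   # z-axis: key attributes, e.g. operator, workstation

chain = [
    RFIDCuboid({"tag": "P-17", "reader": "R1", "ts": "08:02"}, "pick", {"operator": "W4"}),
    RFIDCuboid({"tag": "P-17", "reader": "R5", "ts": "08:15"}, "move", {"operator": "W4"}),
]
for c in chain:  # replay one logistic life cycle
    print(c.attributes["operator"], c.behaviour, c.rfid_data["reader"], c.rfid_data["ts"])
```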
In another research paper, scientists use the physical internet11 to create an intelligent
manufacturing shop floor. Examined more thoroughly, this new concept means an adaptation
of networking technology, wireless communication and cloud manufacturing in order to create
smart manufacturing objects (shortly: SMOs), again from production resources like
manufacturers, workers or materials. The authors propose a physical-internet-based intelligent
manufacturing shop floor and use RFID technology and a wireless communication network to
set up their model.
11 The Physical Internet is an open, global, interconnected and sustainable logistics system, founded on
physical, digital and operational interconnectivity through interfaces and protocols. In this system, modular
containers of different sizes are moved through multimodal transportation networks and aggregated at transit
sites from different origins to optimise loading on the next segments (Shenle, et al., 2017).
As a first step, RFID readers need to be placed in all areas from raw material production to
finished-product receiving sites, with each raw material carrying an RFID tag matched to a
pallet. Next, the materials arrive at the manufacturing shop floor, where every machine has a
stationary RFID reader and the internal and external operators carry mobile reader devices.
Through these arrangements, pallets can be monitored all along the production stages. Another
specialty is the transformation of resources into smart manufacturing objects – in the way
explained in the previous research by Zhong and his colleagues (2015a) – which are able to
interact and behave with each other in accordance with a previously defined logistics logic. As
a following step, the big data generated by the SMOs are collected in a data warehouse with the
help of a wireless communication network. Then big data analytics starts to work on the data
sets step by step, from cleansing through pattern interpretation to knowledge representation. In
the final phase the gained knowledge supports different applications in logistics management,
such as real-time decision making, knowledge-based prediction and a logistics knowledge
repository. The researchers examined their proposed big data analytics for physical-internet-
based logistics data within the framework of a case study. They successfully identified several
behaviours and KPIs that accurately reflect and evaluate the performance of the operators and
the operations. Such indicators are, for example, the total number of delivered or shipped smart
pallets (depending on whether we consider internal or external logistics employees) as well as
the total time spent on logistics from raw material to finished product. Considering future
research, they suggest developing a mathematical model for physical-internet-driven logistics
in order to make their proposed model more transparent, accurate and credible (Zhong, et al.,
2015b).
4.5.5. Distribution planning
Good delivery scheduling brings higher customer satisfaction and fewer delivery attempts, and
companies can ensure that delivered products arrive at a time when customers are able to
receive them. A distribution concept was built up which can support the above mentioned
advantages of a delivery. It takes the sales data of online stores, which is a more special case
than examining a physical store, as online stores often struggle more to reach good customer
satisfaction. The authors considered data-intensive tools like global positioning and sensor
network technology. The system contains three elements: the computational centre carries out
the analytics, using the real-time data from the customers and delivery vehicles to optimize the
distribution; customers share their location data through an application that sends it to the
centre; and in the delivery vehicles every product carries an RFID tag, while other sensors send
continuously updated location information. Besides these, the system captures and calculates
with other real-time data – such as weather conditions, traffic information or road construction –
because these can also strongly influence distribution and route planning. In more detail: after
the sensors have sent the location information of the vehicles and customers, the computational
centre provides the three most favourable options for each customer regarding the time and
location of the delivery. The clients can choose among the options in the mobile application,
and these decisions are then automatically sent to the centre, which recalculates the distribution
plan of the vehicles and shows the most optimal routes for the drivers. All things considered,
big data analytics can provide effective delivery scheduling with high customer satisfaction
(Engel, et al., 2014).
Leveling and his colleagues (2014) write about the advantages of using big data and analytics
in supply chain and logistics, based on the survey of a consulting firm as well as on examples
from logistics and commercial companies. One area that benefits from big data and predictive
analytics is shipping, where companies get the opportunity to better utilise distribution capacity
and increase delivery speed after real-time information is captured from the dataset. A special
use case of the analytics is found in freight transport, where parcel volume analytics can
forecast more accurately the expected number of parcels and cargo within the supply chain
network. The technology is based on historical data derived from the internal company
environment on the one hand, and from Google searches, weather forecasts or the shopping
behaviour of online customers on the other. Another potential benefit listed by the authors is
connected to goods shipment. The main idea is that applying data analytics enables logistics
firms to start a delivery to customers prior to their orders. Companies can match groups of
goods to geographical regions where there is a very high probability that customers from those
areas will look for products of the assigned group. The aim of the companies is that the
customer order arrives when a product is already in transit, so they can significantly decrease
delivery time and, in parallel, increase customer satisfaction, plus the number of sales
(Leveling, et al., 2014).
Data analytics can also be advantageous in resource planning at both strategic and operational
levels. At the strategic level this means the configuration of the distribution network, while the
operational level includes capacity planning on a daily or monthly basis. What big data
techniques can achieve in both cases is improving the reliability of the planning and the optimal
matching of available resources and demand by logistics experts. The authors bring advanced
regression and scenario modelling techniques as good examples. These can utilize the much
higher volume and variety of big data; consequently a business can achieve longer forecast
periods or cut down on the risk of long-term infrastructure investments and contracted external
capacities. In the past, operational tasks were planned according to historical data and personal
experience, which has changed a lot in the big data era (Mikavica, et al., 2015).
4.5.6. Inventory management
Ittman’s research paper (2015) concentrates on the benefits of big data analytics in retailer and
vendor collaborations and for e-commerce vendors. The more exact benefits are: better
visibility of future orders and low in-stock items; reducing the impact of late shipments;
predicting how profitable special quantity deals are; or providing the ability for retailers to see
pricing and allocation strategies where no historical data are available (Ittman, 2015, pp. 6).
Most companies already have a platform on the internet too, which helps them to easily monitor
customers’ shopping behaviour by analysing their clicking and browsing on websites.
Furthermore, internet-based analytic tools can also generate useful information from websites’
data for managing inventories at different distribution centres and optimising fulfilment
strategies. All things considered, predictive analytics in logistics, e-commerce and predictive
shipping uses recent and historical data at the same time to create “personalised customer
engagement” (Ittman, 2015).
The research paper of Zhou and his colleagues (2017) discusses the lack of intelligent inventory
systems that could properly handle the intense inflow of data as well as effectively carry out
forecasting, anomaly detection and inventory aging evaluation. According to previous studies,
companies mainly use statistical analysis on historical inventory data, which brings less
accurate planning. Considering these problems, the authors came up with a state-of-the-art
intelligent inventory system called iMiner, which is capable of using data mining approaches
on enormous data sets and of efficiently helping out in inventory planning and control tasks.
Before this program was created and implemented, the researchers defined, based on studies of
retail companies, all the challenges that the program must solve. The four key issues were
business big data management, inventory forecasting, inventory anomaly detection and
inventory aging analysis. The data mining technologies in the iMiner program that were
supposed to solve these issues were regression analysis and classification-based learning, plus
different visualisation tools for easier interpretation of results. The models developed for the
different inventory management areas are the following:
- The goal of good inventory forecasting is to predict future demand as accurately as
possible and reduce inventory loading. Here the authors developed a dynamic prediction
model based on machine learning techniques and time series analysis. The algorithm
finds the hidden patterns in the stock-in/stock-out time series by using a regression
model based on historical data. In order to make the forecasting more accurate, the
algorithm also considers other factors such as long-term trend, seasonality or event
factors (e.g. sales promotions or holidays). The authors emphasise that the stock-in and
stock-out amounts are highly dependent on each other, so this interdependency needs to
be handled by the model. (Stock-in amounts have to pay attention to stock-out in the
short term in order to avoid excess demand, and vice versa to avoid out-of-stock issues.)
For this reason the model was completed with a multiple time series prediction, which
is able to capture the dynamic relationship among different time series and make proper
forecasts considering the connection between stock-in and stock-out.
- Classification-based anomaly detection is applied for better inventory management; its
goal is to look for abnormally high fluctuations in stock or sales data. The algorithm was
designed to find these anomalous elements in the data set and mark them with labels,
thereby converting the task into a classification problem. It is iMiner’s job to train this
classification model on a training set, memorize the patterns and effectively apply what
it has learnt to new data arriving on a daily basis (Zhou, et al., 2017).
- Inventory management also has to cope with the issue of aging inventory, which simply
means avoiding overstocking items as well as cutting down on the number of
overstocked products. Here, a feature selection technique was used, built in such a way
that it searches for the attributes which cause overstocking. As a first step, a filter model
separates the redundant and unnecessary features from the attributes and tries to define
a candidate set of relevant attributes. After that, the random forest, as an ensemble
learning algorithm, can select the essential attributes, assign proper significance to them
and return measures of attribute importance (a minimal sketch follows this list). This
algorithm is basically built up of several decision trees, where each tree represents a
random subset of features and has access to a random set of data points (Breiman, 2001).
As a result, the program can detect the correlation between the attributes and
overstocking, as well as show businesses which items should be monitored more
intensively in order to successfully tackle inventory aging (Zhou, et al., 2017).
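The promised sketch of the random-forest feature-importance step follows, with scikit-learn standing in for iMiner's own implementation; the attribute names, data and overstocking labels are hypothetical.

```python
# Rank the attributes most associated with items ending up overstocked.
from sklearn.ensemble import RandomForestClassifier

features = ["days_on_shelf", "supplier_lead_time", "discount_rate", "season_code"]
X = [[120, 14, 0.0, 1], [30, 5, 0.2, 2], [200, 21, 0.0, 1],
     [15, 4, 0.3, 3], [180, 18, 0.1, 1], [25, 6, 0.25, 2]]
y = [1, 0, 1, 0, 1, 0]  # 1 = item ended up overstocked

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, score in sorted(zip(features, forest.feature_importances_),
                          key=lambda p: -p[1]):
    print(f"{name:>20}: {score:.2f}")
```

The importance scores are exactly the "measures of attribute importance" the description refers to: attributes with high scores point at the items worth monitoring against aging.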
The authors applied iMiner in a Chinese company with good results. The main advantages this
intelligent inventory application brought were the realisation of large-scale and automatic
inventory data analysis, the provision of an intelligent and efficient decision support system for
management, and the transformation of the inventory from a demand-driven to a data-driven
system (Zhou, et al., 2017).
A very recent paper from 2018 proposed an intelligent transportation system based on
vehicle-driving path planning optimization. The long-term, general transport planning models –
which have already been used by researchers for finding optimal routings – provide good and
reasonable results; however, for short-term planning the problem becomes more complex, with
high nonlinearity, time variation and uncertainty. It is a special challenge for researchers to
design an effective prediction model able to handle this complexity as well as to use big data
in an optimal manner. One of the experiment’s aims was to set up a good traffic network, for
which the scientists used high-volume GPS data collected from 12,000 taxis in Beijing. There
are a few popular methods applied in short-term planning that give a good estimation of travel
time and vehicle traffic density in many scenarios but still struggle to handle real-time data: the
historical database model, the time series model, the regression model and the machine learning
model. The scientists wanted to take a further step and come up with a model that could solve
a short-term path planning and traffic flow problem based on the above-mentioned GPS data.
At first, the authors used a clique-based clustering method to get a more accurate result,
dividing the roughly 50 GB of GPS data geographically and by time. As a next step, they
predicted traffic flow with the help of an artificial neural network (shortly: ANN); then an
improved Dijkstra algorithm found an optimal path using the traffic speed information derived
from the ANN-based model. An accuracy test was carried out to examine the difference
between the predicted results of the different prediction models and the actual data. The authors
concluded that their proposed model gives the most accurate traffic picture of all, so it can be
used to find an optimal path more effectively (Zhu, et al., 2018).
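The path-finding step can be illustrated with a plain Dijkstra implementation over a toy road graph, where the edge weights stand in for the ANN-predicted travel times; the graph and minutes are hypothetical, and the authors' improved variant is not reproduced here.

```python
# Dijkstra shortest path with predicted travel minutes as edge weights.
import heapq

graph = {  # node -> [(neighbour, predicted travel minutes), ...]
    "A": [("B", 4), ("C", 2)],
    "B": [("D", 5)],
    "C": [("B", 1), ("D", 8)],
    "D": [],
}

def dijkstra(source, target):
    best = {source: 0}
    heap, prev = [(0, source)], {}
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph[node]:
            if cost + w < best.get(nxt, float("inf")):
                best[nxt] = cost + w
                prev[nxt] = node
                heapq.heappush(heap, (cost + w, nxt))
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return best[target], path[::-1]

print(dijkstra("A", "D"))  # (8, ['A', 'C', 'B', 'D'])
```

In the cited system the weights are refreshed from the ANN's short-term predictions, so the cheapest path adapts as traffic conditions change.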
In a related study on electric vehicle charging demand, clustering revealed patterns in traffic
data, and relational analysis helped to find the factors affecting these traffic patterns. As a final
step, a decision tree was ideal for discovering the relationship between the created clusters and
the influencing factors, as well as for forecasting responses to the data. The authors concluded
that demand forecasting based on big data could help in the future planning of operation profiles
in power systems, as analysts will be able to forecast vehicles’ charging demand at commercial
and residential sites. Furthermore, the proposed demand-forecasting model also contributes to
investment and planning decisions for future electric vehicle charging infrastructure.
Two years later, Hofmann and Rutschmann (2018) issued a paper in which they looked for an
answer to the question of how big data analytics can improve demand forecasting. The authors
emphasise that finding a good answer to this question is a challenge, as other researchers have
not yet found a clear relationship between big data analytic techniques and demand forecasting.
Moreover, there has been a lack of detailed descriptions of methods and applications in this
field, and it was unclear whether big data analytics is a suitable substitute for existing
forecasting techniques. In order to find its opportunities and potential, the authors chose the
retail industry and examined the value of big data there. They believe that different types of
analytics match different types of forecasting better (thinking of different time horizons, for
example), and that the input, scope and method of certain analytic types need to be aligned for
more accurate outcomes. Hofmann and Rutschmann (2018) use their own classification of
analytic techniques, which results in five sub-groups:
- Data Exploration is a self-service analytics technique used directly by business
participants to gain insight into business operations. This insight is always discussed
with other employees before the final validation of results.
- Advanced Analytics is able to give answers in more complex business situations using
data mining, statistical methods or machine learning. Here, computer models
immediately process unstructured data and combine data sets.
- Interactive Analysis and Planning has a connection to business intelligence, which is no
longer an IT-led consolidation technique but an interactive function available to several
users at the same time. Here we speak mainly about structured data that is loaded into an
enterprise data warehouse12 (shortly: EDW). An EDW combined with advanced analytic
techniques can contribute to loading more extracted data and to better data content.
12 An Enterprise Data Warehouse is a repository that gives analytical information about business processes
and core operations. With the help of IT, enterprise-wide business requirements can be easily fulfilled, and
it can also provide a weapon against competitors (Tupper, 2011).
One of the main findings of the research shows which techniques are used in short-, mid- and
long-term forecasting. Companies can improve their forecasting results if they choose
embedded and stream analytics for the short-term horizon, as these methods provide them with
data that can influence demand and give information about products in store. Medium-term
forecasting should mainly rely on advanced analytics (of the descriptive and predictive type)
and in some cases exploit the benefits of interactive analysis and planning as well as data
exploration, in order to increase customer insight and trend awareness. Finally, long-term
forecasting is better served by data exploration and descriptive advanced analytics; stream
analytics with its real-time responses and embedded analytics with its focus on operational
decision-making would never be an optimal choice here. When it comes to strategic, long-term
forecasting, top management needs easy access to data previously prepared by advanced
analytics techniques (Hofmann & Rutschmann, 2018).
A very recent research paper deals with product-in-use big data in demand planning. As the
authors focused on the automotive industry, this type of collected information means any data
related to the vehicle, from manufacturing to vehicle service and workshop data (e.g.
maintenance); they also coined the more concrete name “vehicle-in-use data” for it. After
finishing their research, they concluded that product-in-use data can be exploited to a greater
extent in spare parts demand planning and has good performance effects. Moreover, it became
clear that demand planning in this field is a challenging and complex task due to
low-frequency, intermittent demand as well as large variations in price, criticality and
specificity. In addition, the supply chain structure is also a critical point in demand variation
and can contribute to the bullwhip effect. Traditional spare part forecasting in the automotive
industry works with historical data and time-series forecasting for non-intermittent demand,
but these methods are not appropriate for handling the previously mentioned challenges. The
authors argued that causal-based methods are more capable in an aftermarket context if
high-quality data are available, together with good knowledge for identifying explanatory
variables after data analysis. All in all, the two researchers’ aim was to find answers as to how
product-in-use data can be effectively used in the automotive industry and how it can support
performance positively. In their research they identified three causal-based forecasting methods
intensively discussed in the literature. The first is the regression-based method, which can
easily handle the phase-in/phase-out context; the second is the reliability-based method, which
suits the early life cycle phase of a product without any historical data; finally, the third,
condition-based techniques, can be useful for low-frequency items. After this categorisation,
the researchers divided the methods into eight sub-groups describing different interventions
during the demand planning of spare parts. These interventions have three main effects on
demand forecasting, which are the following:
- Some of the interventions are able to generate item forecasts using demand history,
through which forecast accuracy is improved.
- Other interventions affect the demand planning process, as they turn forecast demand
into planned, distributed demand, through which uncertainty can be decreased.
- Manual intervention, as a sub-group, contributes to alert generation, which helps supply
chain experts to keep an eye on abnormal vehicle increases in a specific region or to
identify outliers in the demand history of different item categories (Andersson & Jonsson,
2018).
13 CRM is the abbreviation for Customer Relationship Management, a part of business strategy with a
customer focus, aiming to optimise profitability, revenue and customer satisfaction. It measures costs on the
marketing, sales and service sides, while applying knowledge about customer needs and behaviour to
improve performance (Vogt, 2009).
The conclusion of Andersson and Jonsson (2018) is that product-in-use data leads to positive
demand planning performance outcomes within a supply chain, as there is a clear improvement
in forecast accuracy with causal-based methods. The authors also provide ideas for future
research, where scientists could test each intervention in a more exact, single case with
quantitative data, and examine the implementation of the proposed methods on a real-life
example.
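A causal, regression-based spare-parts forecast of the kind discussed can be sketched in a few lines: demand is explained by product-in-use variables rather than by demand history alone. The explanatory variables and figures below are hypothetical.

```python
# Explain spare-part demand with product-in-use variables.
from sklearn.linear_model import LinearRegression

# [vehicles_in_use, average_vehicle_age_years] per region and month
X = [[1000, 2.0], [1500, 2.5], [2000, 3.0], [2500, 3.5], [3000, 4.0]]
y = [14, 24, 36, 50, 66]  # brake-pad sets demanded

model = LinearRegression().fit(X, y)
print(model.predict([[2800, 4.2]]))  # forecast for a new region profile
```

The design choice matters for exactly the reason the authors give: with intermittent, low-frequency demand there may be too little history for time-series models, while fleet size and age are always observable.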
Table 2: Summary table of data analytics, methods and techniques in supply chain
Furthermore, in the third sub-group we can read about how analytics can help with personalized
service and improve service quality. It basically enables companies to run analytics on social
media, mobile and web data, from which they can see what products customers have bought or
are likely to buy. The authors also added that analysing customer interactions across all (media)
channels can be beneficial for creating higher customer engagement (Wang & Alexander,
2015).
Benefits of big data analytics can also be found in the area of retailer and vendor collaborations
and e-commerce. As mentioned earlier, most companies already have an internet platform that
helps them easily monitor customers’ shopping behaviour by analysing their clicking and
browsing on websites, and internet-based analytic tools can generate useful information from
website data for managing inventories at different distribution centres and optimising fulfilment
strategies. The more exact benefits are: better visibility of future orders and low in-stock items;
reducing the impact of late shipments; predicting how profitable special quantity deals are; or
providing the ability for retailers to see pricing and allocation strategies where no historical data
are available (Ittman, 2015, pp. 6). Data analytics in logistics uses recent and historical data at
the same time to create “personalised customer engagement” (Ittman, 2015).
Last but not least, big data analytics enables companies to enhance collaboration with supply
chain partners. Data analytical techniques can connect suppliers and customers to a company’s
big-data-driven system and clearly enhance efficiency throughout the chain. It is also
emphasised that, with the help of big data systems, companies are able to better monitor
procurement trends, pull the procurement and then strengthen their bargaining and negotiating
power. The just-in-time inventory system can be named as a good example of the benefits of
big data: after appropriate implementation it produces cost savings, reduces stock-outs and the
associated opportunity costs, plus achieves optimum inventory levels (Richey, et al., 2016).
4.6.2. Challenges and barriers
One research effort created four groups of hurdles for different companies, namely Needle in a
Haystack, Islands of Excellence, Measurement Minutiae and Analysis Paralysis. The aim was
to find out which factors make it most challenging for companies to take advantage of the big
data revolution.
The first group deals with companies that are not mature enough to fully leverage the
advantages of big data analytics but run after the latest trends and try to follow the hype. The
problem with this approach is that employees use analytics before they really understand how
to apply it successfully. Although they can find relationships and causation in the dataset, in
most cases they arrive at false conclusions and just waste time or money.
The second group, labelled Islands of Excellence, refers to users who choose applications to
optimize one specific process. This means the company faces an issue where employees make
only a specific process excellent; it is not connected across the supply chain and therefore has
no advantageous effect on other parts of the chain.
Measurement Minutiae are basically firms that deal with a huge number of metrics and try to
measure everything, internal and external. The problem is that they get lost in the metrics, and
sometimes it is hard for them to choose the correct ones to get relevant and informative results.
It is a big challenge for these companies to identify which metrics need to be eliminated and to
create fewer, customized metrics that are easier to manage.
The last group, Analysis Paralysis, covers those ventures which complain about having too
much data and not being able to “digest” it. It is clear to them that they have to do something
with the vast amount of data gathered from point of sale terminals, websites or social media;
however, they are unable to exploit the analytics and technologies available to them. Overall,
they are in a state of paralysis, not even knowing where to start with big data analytics
(Sanders, 2016).
Data quality is another issue to which supply chain and logistics managers must pay attention.
Here I would like to shortly summarise the dimensions of data quality, which are accuracy,
timeliness, consistency and completeness. The usefulness of data, which is clearly connected
to its quality, can affect decision-making and company costs to a great extent: in an
organisation, the cost of poor data quality can be 8-12% of revenue, while in the service industry
it is more severe, at 40-60%. The biggest challenge here is that big data does not come in a
common format, which makes data transfer among different systems problematic; after a while,
these “complications” slowly make data science, predictive analytics and big data unattractive
for firms (Hazen, et al., 2014).
Security obstacles also come to light with the inflow of large amounts of information, especially
concerning data ownership, data storage/accessibility and data privilege (who can access the
data) (Richey, et al., 2016).
Regarding data storage, the big question is whether a firm possesses a private, internal system
with appropriate security for all the information. The authors also identified that in emerging
and developed countries strict legal regulations have the highest influence on decision-making
concerning the supply chain’s big data security. On the other hand, countries have different
degrees of concern about data sharing and data ownership, which can be a real obstacle for a
global, multinational supply chain system. Firms operating in emerging economies are more
conscious about protecting customers’ privacy and information; what is more, their
governments also put pressure on firms to do so. Even though managers from different nations
have different attitudes to data security and data sharing, most of them think that data mining
and the knowledge it discovers could considerably decrease “risk associated, ill-informed
decision-making” (Richey, et al., 2016, pp. 726).
It is clear that an increased volume of data is advantageous for gathering more information
about customers and supply chain participants; however, firms did not expect to be drowned by
it so rapidly. The first problem is that they do not yet possess adequate hardware to store all of
the inflowing data, and they are also incapable of selecting the information that will be valuable
later on and getting rid of the useless portion. Furthermore, it is quite challenging to correctly
discard the data that will never be useful and keep the proper data in storage, since its long-term
value is still undiscovered. Nevertheless, companies must also pay attention to security and data
protection issues when they decide to delete data from their systems (Richey, et al., 2016).
5. Conclusions
It takes only a few minutes to generate a vast amount of data, which is continuously happening
in today’s era of the internet of things. The rapid development of technology makes it possible
for companies to process these huge amounts of data and carry out real-time analyses on them.
As a consequence, data-driven decision-making has gained popularity and become an essential
asset for taking the lead on the market as well as for managing businesses successfully. This
means that by using the advantages of big data and data analytics, companies can better reach
service level requirements, cut down on financial expenses and time costs in several business
processes, or deliver optimisation in areas that were unimaginable before.
My aim was to collect and read as many research papers as possible on this topic and then find
a good categorisation of them; besides that, I also showed the challenges, opportunities and
drawbacks of this new phenomenon.
It was seen that of the three sub-categories of business intelligence and advanced analytics –
descriptive, predictive and prescriptive analytics – most researchers used techniques related to
predictive analytics when they set up their own models and algorithms for a specific supply
chain problem. Data mining and machine learning algorithms were also in focus, as scientists
mainly used cluster, regression and relational analysis in order to make their models work on
big data sets.
Furthermore, smart manufacturing and the intelligent shop floor were two new inventions
where the scientists’ goal was to create machines with artificial intelligence in such a way that
they could learn from past actions and become self-aware, self-maintaining tools in production.
There were a few cases where the authors came up with less popular methods used in data
mining to make predictive analytics work. Good examples were the intelligent inventory system
with its classification-based anomaly detection; the dynamic forecasting model with machine
learning techniques, which can change a demand-driven inventory system into a data-driven
one; and the artificial neural network model applied to find the most optimal path in
transportation planning.
From a managerial perspective there are also several advantages of using data analytics and
exploiting big data along the whole supply chain, from customer demand to purchasing from
suppliers. First of all, decision-makers can gain a deeper insight into customer needs; then,
analytics can help in better managing stock-in and stock-out interdependencies, in better
predicting the health and maintenance of machines, and finally in enhancing the visibility of
purchased goods. Looking at the two ends of the supply chain, it can be said that big data
analytics enables businesses to carry out real-time vendor management, through which they can
strengthen their negotiating power. At the same time, they can also provide on-time service and
just-in-time feedback to customer complaints at the other end of the chain.
Nevertheless, the researchers also identified some challenges and barriers in the implementation and application of big data analytics. One of the main hurdles is the lack of common systems and standards among the different supply chain participants, together with inadequate implementation of data protection and security. Given the huge inflow of personal, private or commercially confidential data, companies must tightly control data sharing and accessibility, which requires a considerably more skilled IT workforce. Moreover, immature businesses are often incapable of choosing the proper methods, unable to handle the enormous inflow of data, and struggle to use only high-quality data for decision-making. Last but not least, managers’ mentality is also a barrier in cases where the management does not want to accept or recognise the benefits of a data-driven environment. Consequently, such companies could easily lose market share and lag behind their competitors after a while.
Beyond all these findings, it has also become clear that there is a lack of literature and of good algorithms in some parts of the supply chain (e.g. strategic and distribution network planning). Therefore, the researchers who have already conducted experiments in these areas suggest further research questions and encourage their scientific colleagues to try out other data mining approaches on similar optimisation problems.
All in all, there is no doubt that big data analytics is forging ahead with great intensity, and a large percentage of companies will allocate resources to invest in it in the future. In parallel, the data scientist’s job will become more popular, and other workers will be substituted by self-working and self-learning algorithms and machines in many business areas, since these will be able to perform tasks with significantly fewer mistakes. Today’s supply chain leaders need to identify all these opportunities and challenges in order to know which capabilities of tomorrow they must require from their people, so that their organisations are ready for this new emerging trend.
Bibliography
Agrawal, R. & Srikant, R., 1994. Fast Algorithms for Mining Association Rules in Large
Databases. San Francisco, Morgan Kaufmann Publishers Inc., pp. 487-499.
Andersson, J. & Jonsson, P., 2018. Big data in spare parts supply chains: The potential of using
product-in-use data in aftermarket demand planning. International Journal of Physical
Distribution & Logistics Management, 16 February, 48(5), pp. 524-544.
Arias, M. B. & Bae, S., 2016. Electric vehicle charging demand forecasting model based on big
data technologies. Applied Energy, 1 December, Volume 183, pp. 327-339.
Bae, J. K. & Kim, J., 2011. Product development with data mining techniques: A case on design
of digital camera. Expert Systems with Applications, 1 August, 38(8), pp. 9274-9280.
Bakshi, K., 2012. Considerations for Big Data: Architecture and Approaches, s.l.: Proceedings
of the IEEE Aerospace Conference.
Benabdellah, A. C., Benghabrit, A., Bouhaddou, I. & Zemmouri, E. M., 2016. Big Data for
Supply Chain Management: Opportunities and Challenges. International Journal of Scientific
& Engineering Research, November, 7(11), pp. 20-26.
Bhagat, A., 2015. Understanding Big Data: Framework and Tools for Massive Data Storage
and Mining. 3(6), pp. 305-308.
Bichescu, B. C. & Fry, M. J., 2007. Vendor-managed inventory and the effect of channel power.
OR Spectrum, 29 September, 31(1), pp. 195-228.
Bose, R., 2008. Advanced analytics: opportunities and challenges. 18 September, 109(2), pp.
155-172.
Breiman, L., 2001. Random Forests. Machine Learning, 45(1), pp. 5-32.
Brooks, R. & Dahlke, K., 2017. Artificial Intelligence vs. Machine Learning vs. Data Mining
101 – What’s the Big Difference? [Online]
Available at: https://guavus.com/artificial-intelligence-vs-machine-learning-vs-data-mining-
101-whats-big-difference/
[Accessed 20 November 2018].
Burns, R. B. & Burns, R. A., 2008. Business Research Methods and Statistics Using SPSS.
USA: SAGE Publications Ltd.
Carmichael, I. & Marron, J., 2018. Data science vs. statistics: two cultures? Japanese Journal
of Statistics and Data Science, 14 May, 1(1), pp. 117-138.
Chen, H., Chiang, R. H. & Storey, V. C., 2012. Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly, December, 36(4), pp. 1165-1188.
Chopra, S. & Meindl, P., 2013. Supply Chain Management: Strategy, Planning and Operation.
5.ed. s.l.: Pearson Education.
Christopher, M. & Towill, D., 2001. An Integrated Model for the Design of Agile Supply Chains. International Journal of Physical Distribution & Logistics Management, 31(4), pp. 235-246.
Cohen, R. P., 2013. EDI Basics How Successful Businesses Connect, Communicate, and
Collaborate Around the World. Gaithersburg: GXS.
Cordeau, J.-F., Laporte, G., Savelsbergh, M. W. P. & Vigo, D., 2007. Vehicle
Routing. Handbooks in Operations Research and Management Science, Volume 14, pp. 367-
428.
Dangeti, P., 2017. Statistics for Machine Learning. Birmingham: Packt Publishing Ltd.
De Mauro, A., Greco, M. & Grimaldi, M., 2016. A formal definition of Big Data based on its
essential features. Library Review, 65(3), pp. 122-135.
Dean, J. & Ghemawat, S., 2010. MapReduce: A Flexible Data Processing Tool.
Communications of the ACM, January, 53(1), pp. 72-77.
Demchenko, Y., Grosso, P. & de Laat, C., 2013. Addressing big data issues in Scientific Data
Infrastructure, San Diego: 2013 International Conference on Collaboration Technologies and
Systems (CTS).
Derakhshan, R., Orlowska, M. E. & Li, X., 2007. RFID Data Management: Challenges and
Opportunities. USA, IEEE Xplore.
Donoho, D., 2017. 50 Years of Data Science. Journal of Computational and Graphical Statistics, 19 December, 26(4), pp. 745-766.
Elgendy, N. & Elragal, A., 2014. Big Data Analytics: A Literature Review Paper. St.
Petersburg, 14th Industrial Conference, pp. 214-227.
EMC Education Services, 2015. Data Science & Big Data Analytics: Discovering, Analyzing,
Visualizing, and Presenting Data. 1.ed. Indianapolis: John Wiley & Sons Inc.
Engel, T. et al., 2014. A Conceptual Approach for Optimizing Distribution Logistics using Big
Data. USA, s.n.
Ertel, W., 2017. Introduction to Artificial Intelligence. 2.ed. Switzerland: Springer International
Publishing AG.
European Union, 2013. Opinion 03/2013 on purpose limitation published by Article 29 Data
Protection Working Party. [Online]
Available at: https://www.lexology.com/library/detail.aspx?g=ddf0de93-3ced-4887-bebd-
af3ed8f62aa2
[Accessed 22 June 2018].
Evans, J. R., 2017. Business Analytics. 2.ed. United Kingdom: Pearson Education Limited.
Gandomi, A. & Haider, M., 2014. Beyond the hype: Big Data concepts, methods, and analytics.
International Journal of Information Management, 35(2015), pp. 137-144.
Goertzel, B. & Pennachin, C., 2007. Artificial General Intelligence (Cognitive Technologies).
Germany: Springer-Verlag Berlin Heidelberg.
Gronwald, K.-D., 2017. Integrated Business Information Systems: A Holistic View of the Linked
Business Process Chain ERP-SCM-CRM-BI-Big Data. Berlin: Springer-Verlag GmbH.
Gunasekaran, A., Mcgaughey, R. & Patel, C., 2004. A Framework for Supply Chain
Performance Measurement. International Journal of Production Economics, February, 87(3),
pp. 333-347.
Hammer, C. L., Kostroch, D. C. & Quirós, G., 2017. Big Data: Potential, Challenges, and
Statistical Implications. Staff Discussion Notes, September, 2017(6).
Han, J. & Kamber, M., 2000. Data Mining: Concepts and Techniques. USA: Morgan
Kaufmann Publishers.
Harrington, P., 2012. Machine Learning in Action. 1. ed. New York: Manning Publications Co.
Hazen, B. T., Boone, C. A., Ezell, J. D. & Jones-Farmer, L. A., 2014. Data quality for data
science, predictive analytics, and big data in supply chain management: An introduction to the
problem and suggestions for research and applications. International Journal of Production
Economics, Volume 154, pp. 72-80.
Hernán, M. A., Hsu, J. & Healy, B., 2018. Data science is science’s second chance to get causal
inference right. A classification of data science tasks. CoRR, abs/1804.10846.
Hofmann, E., 2017. Big data and supply chain decisions: the impact of volume, variety and
velocity properties on the bullwhip effect. 55(17), pp. 5108-5126.
Hofmann, E. & Rutschmann, E., 2018. Big data analytics and demand forecasting in supply
chains: a conceptual analysis. The International Journal of Logistics Management, 29(2), pp.
739-766.
Huang, T.-M., Kecman, V. & Kopriva, I., 2006. Kernel Based Algorithms for Mining Huge
Data Sets: Supervised, Semi-supervised, and Unsupervised Learning. Netherlands: Springer-Verlag Berlin Heidelberg.
Hu, H., Wen, Y., Chua, T.-S. & Li, X., 2014. Toward Scalable Systems for Big Data Analytics:
A Technology Tutorial. Issue 2, pp. 652-687.
Ittman, H. W., 2015. The impact of big data and business analytics on supply chain
management. Journal of Transport and Supply Chain Management, 9(1).
Jia, X., Feng , Q., Fan, T. & Lei, Q., 2012. RFID technology and its applications in Internet of
Things (IoT). 2012 2nd International Conference on Consumer Electronics, Communications
and Networks (CECNet), pp. 1282-1285.
Keifer, S., 2010. Beyond Point of Sale Data - Looking Forward, Not Backwards for Demand
Forecasting. [Online]
Available at: http://www.gxs.fr/wp-content/uploads/wp_beyond_point_of_sale_data.pdf
[Accessed 11 November 2018].
Kersting, K., 2018. Machine Learning and Artificial Intelligence: Two Fellow Travelers on the
Quest for Intelligent Behavior in Machines, USA: Front. Big Data.
Kitchin, R. & McArdle, G., 2016. What makes Big Data, Big Data? Exploring the ontological
characteristics of 26 datasets. Big Data & Society, 17 February, pp. 1-10.
Klumpp, M. & Kandel, C., 2011. GPS-based Real-time Transport Control for Production Network Scheduling Simulation. Portugal, The 2011 European Simulation and Modelling Conference.
Koturwar, P., Girase, S. & Mukhopadhyay, D., 2015. A Survey of Classification Techniques in
the Area of Big Data, India: arXiv.
Kurasova, O. et al., 2014. Strategies for Big Data Clustering. Cyprus, 2014 IEEE 26th
International Conference on Tools with Artificial Intelligence.
Lee, J., Kao, H.-A. & Yang, S., 2014. Service innovation and smart analytics for Industry 4.0
and big data environment. s.l., Elsevier B.V.
Leveling, J., Edelbrock, M. & Otto, B., 2014. Big Data Analytics for Supply Chain
Management. Malaysia, IEMM.
Li, H.-L., 1999. Incorporating Competence Sets of Decision Makers by Deduction Graphs.
Operations Research, 1 April, 47(2), pp. 175-344.
Lu, D., 2011. Fundamentals of Supply Chain Management. 1. ed. Denmark: Ventus Publishing
Aps.
Lummus, R. R. & Vokurka, R. J., 1999. Defining supply chain management: a historical
perspective and practical guidelines. Industrial Management & Data Systems, 99(1), pp. 11-17.
Maimon, O., Khmelnitsky, E. & Kogan, K., 1998. Optimal Flow Control in Manufacturing
Systems - Production Planning and Scheduling. Applied Optimization.
Maimon, O. & Rokach, L., 2005. Introduction to Knowledge Discovery in Databases. In: O. Maimon & L. Rokach, eds. Data Mining and Knowledge Discovery Handbook. Boston: Springer, pp. 1-17.
McCarthy, R. V., McCarthy, M. M., Ceccucci, W. & Halawi, L., 2019. Applying Predictive
Analytics: Finding Value in Data. 1.ed. Basel: Springer International Publishing.
Meyr, H., Wagner, M. & Rohde, J., 2002. Structure of Advanced Planning Systems. In: Supply
Chain Management and Advanced Planning. Germany: Springer, Berlin, Heidelberg, pp. 99-
104.
Mikavica, B., Kostić-Ljubisavljević, A. & Radonjić Đogatović, V., 2015. Big Data: Challenges and Opportunities in Logistics Systems. Belgrade, s.n.
Min, H., 2016. Global Business Analytics Models. New Jersey: Pearson FT Press.
Myerson, J. M., 2007. RFID in the Supply Chain: A Guide to Selection and Implementation.
1.ed. New York: Auerbach Publications.
Ohlhorst, F., 2012. Big Data Analytics: Turning Big Data into Big Money. USA: John Wiley
& Sons Inc.
Pandey, K. K., Yadu, R. K., Dwivedi, A. & Shukla, P. K., 2015. A Analysis of Different Type
of Advance database System For Data Mining Based on Basic Factor. International Journal on
Recent and Innovation Trends in Computing and Communication, 3(2), pp. 456-460.
Prasad, S., Zakaria, R. & Altay, N., 2016. Big data in humanitarian supply chain networks: a
resource dependence perspective. Annals of Operations Research; S.I.: Big Data Analytics in
Operations & Supply Chain Management., 4 August.
Provost, F. & Fawcett, T., 2013. Data Science for Business: What You Need to Know about
Data Mining and Data-Analytic Thinking. 1.ed. California: O'Reilly Media Inc.
Quintero, D. et al., 2015. IBM Software Defined Infrastructure for Big Data Analytics
Workloads. 1.ed. USA: IBM Redbooks.
Richey, R. G., Hall, K. L. & Adams, F. G., 2016. A global exploration of Big Data in the supply
chain. International Journal of Physical Distribution & Logistics Management, September,
46(8), pp. 710-739.
Russom, P., 2011. Big Data Analytics. TDWI Best Practices Report.
Rynarzewski, T. & Szymczak, M., 2016. Changes and Challenges in the Modern World Economy. Poznan: PUEB Press.
Sanders, N. R., 2016. How to Use Big Data to Drive Your Supply Chain. California
Management Review, 58(3), pp. 26-48.
Sanislav, T. & Miclea, L., 2012. Cyber-physical systems - Concept, challenges and
research areas. Control Engineering and Applied Informatics, 14(2), pp. 28-33.
Sathi, A. D., 2012. Big Data Analytics. Boise: MC Press Online, LLC.
Schoenherr, T. & Speier-Pero, C., 2015. Data Science, Predictive Analytics, and Big Data in
Supply Chain Management: Current State and Future Potential. Journal of Business Logistics,
36(1), pp. 120-132.
Pan, S., Ballot, E., Montreuil, B. & Huang, G. Q., 2017. Physical Internet and Interconnected
Logistics Services: Research and Applications. International Journal of Production Research,
55(9), pp. 2603–2609.
Shetty, D. V. & Chidimar, S. J., 2016. Comparative Study of SQL and NoSQL Databases to
evaluate their suitability for Big Data Application. pp. 314-318.
Siegel, E., 2013. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die.
1.ed. s.l.: Wiley Publishing.
Simchi-Levi, D., Kaminsky, P. & Simchi-Levi, E., 1999. Designing & Managing the Supply
Chain: Concepts, Strategies & Case Studies. 2.ed. s.l.:McGraw-Hill Higher Education.
Stadtler, H., 2004. Supply chain management and advanced planning – basics, overview and challenges. European Journal of Operational Research, 6 May, pp. 575-588.
Tan, K. H. et al., 2015. Harvesting big data to enhance supply chain innovation capabilities:
An analytic infrastructure based on deduction graph. International Journal of Production Economics, 5 January, Volume 165, pp. 223-233.
Tan, M. H. & Lee, W. L., 2015. Evaluation and Improvement of Procurement Process with
Data Analytics. International Journal of Advanced Computer Science and Applications, 6(8),
pp. 70-80.
Tan, P.-N., Steinbach, M. & Kumar, V., 2006. Introduction to Data Mining. 1.ed. Boston:
Pearson Education Inc.
Trkman, P., McCormack, K., Valadares De Oliveira, M. P. & Bronzo, M., 2010. The impact of
business analytics on supply chain performance. Decision Support Systems, 30 June.
Tupper, C. D., 2011. 20 - The Enterprise Data Warehouse. In: C. D. Tupper, 2011. Data
Architecture: From Zen to Reality. 1.ed. USA: Morgan Kaufmann, pp. 357-368.
Vogt, H., 2009. Open Source Customer Relationship Management Solutions: Potential for an
Impact of Open Source CRM Solutions on Small- and Medium Sized Enterprises. Hamburg:
Diplomica Verlag GmbH.
Waller, M. A. & Fawcett, S. E., 2013. Data Science, Predictive Analytics, and Big Data: A
Revolution That Will Transform Supply Chain Design and Management. Journal of Business
Logistics, 11 June.
Wang, G., Gunasekaran, A. & Ngai, E. W., 2016. Distribution network design with big data:
model and analysis. Annals of Operations Research; S.I.: Big Data Analytics in Operations &
Supply Chain Management., 30 June.
Wang, L. & Alexander, C. A., 2015. Big Data Driven Supply Chain Management and Business
Administration. American Journal of Economics and Business Administration.
Whitteker, W., 2014. Point of Sale (POS) Systems and Security, USA: SANS Institute.
Wisner, J. D., Tan, K.-C. & Leon, G. K., 2015. Principles of Supply Chain Management: A
Balanced Approach. 4. ed. s.l.:Cengage Learning.
Yadava, H., 2007. The Berkeley DB Book. New York: Springer Verlag.
Zeng, X., Lin, D. & Xu, Q., 2011. Query Performance Tuning in Supply Chain Analytics,
China: 4th International Joint Conference on Computational Sciences and Optimization.
Zhong, R. Y., Xu, C., Chen, C. & Huang, G. Q., 2015. Big Data Analytics for Physical Internet-
based intelligent manufacturing shop floors. International Journal of Production Research,
55(9), pp. 2610-2621.
Zhou, Q. et al., 2017. An Advanced Inventory Data Mining System for Business Intelligence.
USA, s.n.
Zhu, D., Du, H., Sun, Y. & Cio, N., 2018. Research on Path Planning Model Based on Short-
Term Traffic Flow Prediction in Intelligent Transportation System. Sensors, 5
December, 18(12).
Zhu, J., 2013. POS Data and Your Demand Forecast. Procedia Computer Science, 17(2013),
pp. 8-13.
Appendix
Zusammenfassung (Summary)
In my thesis I investigate a highly popular field of the present day; my goal is to study the academic literature and to summarise its results on the topic of data analytics and big data in supply chain management.
As a first step, I attempt to clarify the concepts relating to the fields of data analytics and big data, as well as the relationships between their concepts, methods and algorithms. This was a particularly challenging task, since no generally accepted terminology and taxonomy for these elements of big data analytics exists yet.
As a next step, I categorise the research papers according to the elements of the supply chain planning matrix and demonstrate the tools, methods and algorithms that have been applied to optimise processes and increase efficiency in a specific part, or across the whole, of the chain.
Finally, I give a brief general overview of the benefits, challenges and difficulties of using these analytics.