Hora Paper 351 Visual Analytics Methodology

The paper reviews the methodologies of Visual Analytics within the context of big data, emphasizing the need for predictive analytics to support business decision-making. It identifies a gap in coherent methodologies that integrate descriptive, diagnostic, and predictive analytics, highlighting the importance of aligning human, computer, and reality aspects in the analytics process. The study proposes a comprehensive methodology that encompasses various perspectives to improve the implementation of Visual Analytics in organizational contexts.

Uploaded by

Asraf Ahmed II

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/361594614

The Review for Visual Analytics Methodology

Conference Paper · June 2022


DOI: 10.1109/HORA55278.2022.9800100

Citations: 0
Reads: 288

4 authors, including:

Suraya Yaacob, Universiti Teknologi Malaysia (46 publications, 175 citations)
Roslina Ibrahim, Universiti Teknologi Malaysia (86 publications, 1,044 citations)


All content following this page was uploaded by Suraya Yaacob on 04 July 2022.



The Review for Visual Analytics Methodology
Zaifulasraf Ahmad, Advanced Informatics Department, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
Suraya Yaacob, Advanced Informatics Department, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
Roslina Ibrahim, Advanced Informatics Department, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
Wan Farahwani Wan Fakhruddin, Social Science and Humanity Faculty, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia

Abstract— Big data usage has evolved from examining the descriptive and diagnostic capacity of big data to feeding the current demand for predictive big data analytics. The need arises because organizations crave predictive analytics capabilities to reduce risk, make intelligent decisions, and generate different customer experiences. Similarly, visual analytics plays an essential role in understanding and fitting analytics predictions into business decisions. Hence, the combination of descriptive, diagnostic and predictive analytics within Visual Analytics emerges as a balanced field that provides understandable predictive insight. Due to organizational demand and the multi-disciplinary nature of the area, the approach to developing visual analytics in the Big Data Project Lifecycle is still uncertain from a methodological perspective. While there are a few potential methodological approaches that could be used for visual analytics, they are scattered across numerous academic studies and industrial practices. To date, there is no coherent review and analysis of the work that has been explored specifically for Visual Analytics methodology. To address this gap, this paper reports on a review of previous literature concerning how Visual Analytics has been executed in the big data lifecycle. The review is organized from three perspectives: i) general ICT-related methodologies (e.g. SDLC, Agile, DevOps), ii) Data Science-related methodologies (e.g. CRISP-DM, SEMMA, KDD) and iii) Visual Analytics-related methodologies, in which each method is benchmarked against the major parts of Visual Analytics (reality, computer and human) in terms of width, depth, and flow. Based on the review, this study found insufficient, non-specific and vague handling of Visual Analytics in current methodological approaches. The paper also highlights the Visual Analytics-related methodological review, which can shed some light on the approaches and ways of implementing analytics in the big data lifecycle and can benefit future studies in proposing a more comprehensive methodology for Visual Analytics in the big data lifecycle.

Keywords—process; methodology; visual analytics; big data analytics.

I. INTRODUCTION

Big data refers to massive, complex structured and unstructured data sets that are rapidly generated and transmitted from a wide variety of sources. Big data analytics extracts value from the data and produces insights that lead to better decisions and strategic business moves. Big data usage has evolved from descriptive and diagnostic capabilities to, more recently, predictive capabilities. As a result, Predictive Visual Analytics is currently in high demand among businesses and organizations [1]. This is because organizations require predictive capabilities to reduce risk, make intelligent decisions, and generate different customer experiences, which attracts many industrial players to implement predictive analytics in their business [2]. In parallel, visual analytics plays an essential role in understanding and fitting analytics predictions into business decisions. Hence, there is a need to embed prediction in visual analytics in a balanced way that provides understandable predictive insights. When carefully executed, it can provide practical insights and predictions by analyzing current and historical data.

Identifying a clear and practical methodology is critical to moving the demand for prediction into valuable business practice. To a larger extent, it strengthens and clarifies the usage and implementation of prediction in the big data lifecycle. Due to organizational demand and the multi-disciplinary nature of the area, the development of Visual Analytics in the Big Data Project Lifecycle is still lacking from a methodological perspective. The methodology should encompass a multi-perspective approach that incorporates business understanding, data integration, statistics, assumptions, modelling, visualization and analytical reasoning within the big data lifecycle. A few potential methodological approaches could be used. However, previous findings are scattered across numerous conferences and industrial practices. Consequently, there seems to be a lack of coherent review and analysis of work that specifically explores Visual Analytics methodology. Thus, this paper reports on a review of previous literature concerning how Visual Analytics has been executed in the big data lifecycle.

II. VISUAL ANALYTICS

Visual Analytics combines automated analysis techniques with interactive visualization and is concerned with the science of analytical reasoning from raw data, which is often presented in dynamic and interactive visual interfaces (e.g. dashboard, graph or map) [5]. In current demand, visual analytics is regarded as a visual analytics platform that incorporates or enhances diagnostic and predictive analytics, providing a predictive event pattern with interactive visual representation [3]. From the automated analysis perspective, it focuses on the analysis of historical data to diagnose the situation or predict future events. Hence, it is concerned with identifying the root causes of the situation as a basis for predicting future probabilities and trends based on observed events, encompassing a multi-perspective approach that includes integrated reasoning, pattern recognition, and predictive modelling associated with the domain knowledge [4].

Visual analytics focuses more on the computational part and goes in-depth for its technicality and statistics. It involves extracting information from large data sets to identify the patterns and trends used to generate models and predict future outcomes and behaviors of interest [6]. It aims to anticipate unknown future actions through data mining, statistics, modeling, deep

IEEE 978-1-6654-6835-0/22/$31.00 ©2022 IEEE


learning, artificial intelligence, and machine learning [7]. On the other hand, Visual Analytics also focuses on the human element and consists of a combination of visualization techniques and the process of extracting usable data from a more extensive set of raw data to further enhance decision-making efficiency and competence [8]. Another study [9] describes visual analytics as an interactive knowledge discovery system that purposely guides the user to critical insights via data exploration. However, the coverage of Visual Analytics (VA) is broader than the computational part alone. Understanding the analytics context and the human element is essential for a complete ecosystem of analytics development.

Based on [10] (Sedig and Parsons 2013) and [12], there are generally three major parts of visual analytics. These parts are then aligned with the big data lifecycle in an abstract manner. The lifecycle has four stages: data collection, data storage, data analytics, and knowledge creation (Alshboul, 2015). More recently, Taleb (2018) posited that the lifecycle consists of five stages: data generation, data acquisition, data storage, data processing, and analytics and visualization.

Part 1 is concerned with the reality of the analytics context, which Sedig [11] termed the ‘information space’. This part concerns the natural and authentic organizational settings for analytics preparation, which involve business activities. It is critical to understand the business process and stakeholders, as this will clarify their requirements and the data availability in the organization. Part 2 is the computer part, which corresponds to the automated analysis techniques and mainly involves the analytical process and data. It is an analytical engine that involves data cleansing, integration, the use of machine learning algorithms and statistical analysis techniques, and the translation of their output into representation and interaction for use. Part 3 is the human part, the final stage of the analytics process, which concerns the users' practical knowledge and how it becomes valuable for the business and organization. When the analytics outcomes are presented to and evaluated with the users, the visualization and interaction elements that engage their cognitive and mental schema ensure that the predictive outcomes are easier to comprehend and facilitate the business decision. Detailed descriptions of each part are presented in the subsequent sections.

A. Reality

Reality is the context of use for Visual Analytics. Sedig [11] used the term ‘information space’, while Andrienko [10] used the term reality. Reality is a crucial part of analytics preparation that involves understanding the environment, activities, business process and analytics requirements, and facilitating business decisions. As data are interconnected, the relationships are quite complex to implement in an organization. Generally, an organization has a large number of entities (e.g. departments, stakeholders, clients) whose data have non-linear relations and evolve over time. Hence, the elicitation process for identifying the analytics requirements needs to include an element of sensitivity to the initial condition of the organizational context and to validate correlation diagnostics and analytics prediction. Therefore, it is imperative to study the organization and business context, as doing so gives a better understanding of the organizational needs and requirements.

For Visual Analytics to become valuable in use, it must involve more than analytics modeling and precision. A report from Gartner (2017) [13] mentioned that “60% of Big Data projects will fail to go beyond piloting and experimentation and will be abandoned”. One of the reasons is that the implementation of big data projects puts less emphasis on the context and business requirements of advanced analytics in the organization. It must come with the ability to presuppose the availability of knowledge and put it to use. Since analytics often involves business decisions and direction, it tends to answer the questions of “how” and “why”, which requires adapting the analytics results to the organizational context. The most important part is to check data availability. Data checking helps ensure that the outcomes from Visual Analytics can fulfil the organization’s requirements and context.

B. Computer

A computer is an analytical engine that focuses on the technical and scientific elements. It involves data understanding, data cleansing, integration, and the use of machine learning algorithms and statistical analysis techniques. The analytics engine must accommodate the latest analytics engine types and must be able to handle the consequences and relationships of data across descriptive, diagnostic and predictive analytics. The sequence is important because descriptive analytics helps the users understand the current situation, and diagnostic analytics helps the developer identify the root cause of the problem or situation. Hence, it is vital to iterate between data understanding, data integration, modelling and data visualization to get different analytical outputs. Furthermore, due to the uncertainties of a complex domain, there will be many possibilities for predictions, trends, and patterns. Thus, the process in a complex domain needs to be iterative. The sub-analytics spiral process of data understanding, preparation, modeling and testing will help the developer and decision-maker decide on the most relevant prediction outcomes based on the organizational context.

C. Human

The human part is concerned with the mental space and schema involved in understanding and perceiving the analytics outcomes from Visual Analytics. The value of the outcomes is not only about how accurate the prediction results are; it also involves assessing the visualizations, the interactions, and how these outputs can become knowledge that helps the users improve business decisions. Since the evaluation of Visual Analytics in the real environment is a critical aspect of moving research into practice [14], a practical evaluation and acceptance of the Visual Analytics process can help reduce failures and increase value in its context of use. One of the crucial challenges is that analytics-related activities usually involve complex cognitive activities (CCA) such as decision making, strategy planning, sense-making and analytical reasoning [15]. The challenge increases when more than one person does the analysis. Usually, the experts and decision-makers collaboratively communicate, use, apply, and manipulate Visual Analytics to support and justify business decisions. They all rely heavily on the analytical use of information, combining their human flexibility, creativity, and background knowledge with the enormous storage and processing capacities of today's computers to gain insight into



complex problems and challenges. In this regard, [5] emphasized the importance of analytics moving well beyond human-user interface concerns to consider the situational context. Visual Analytics must communicate with many people, each with varying information needs and possibly requiring varying interactions. A thorough understanding and characterization of the human part is therefore needed, not only to assess the analytics but also to work out the complexities of the information, users, and activities that Visual Analytics is meant to communicate.

D. Alignment between Reality-Computer-Human

There are different inputs and outputs for Parts 1, 2 and 3. Demenchenko (2012) and Casado (2014) proposed that each part and its level must be observed, and that the lifecycle between these parts is not strictly sequential: certain levels can return to a previous step. Furthermore, since the users are inside the reality part, it is important to align how the users benefit from the computer's analytics with how that analytics turns into knowledge that benefits the business reality. The research proposed a methodology covering four parts, based on the literature available on related methodologies such as processes, approaches, techniques, modes, and model building. There is a lack of standards and coverage across the vertical range of these parts (reality, computer and human). The processes should have broader and deeper coverage of the reality, computer and human parts. More importantly, the alignment, flow and inter-relationship between these parts must also be considered.

In the following sections of the paper, the adequacy of current methodologies is analyzed against the required methodology coverage. The review benchmarks current approaches in terms of width (overall coverage of the human, computer and reality parts), depth (the detailed guidelines for each part) and alignment between these parts. Moreover, the literature is described from three perspectives, ordered by their closeness to the Visual Analytics field. First, the paper looks into Information Communication Technology (ICT)-related methodologies, which general ICT projects follow. Since Visual Analytics is one of the fields under the ICT umbrella, we can see how ICT-related methodologies might fit it. Secondly, the paper narrows down to Data Science-related methodologies. Among them, data mining-related methodology is the nearest and most stable field. Thirdly, the paper zooms into Visual Analytics-related methodologies, which are new and varied in their processes and flow.

II. ICT RELATED METHODOLOGIES

Generally, Visual Analytics in the Big Data lifecycle is categorized under the Information Communication Technology (ICT) umbrella. Hence, ICT-related methodologies must be covered among the analytics approaches. For further analysis, this research selects three prominent ICT methodologies: the Software Development Life Cycle (SDLC), DevOps and Agile.

SDLC is the methodology software developers use to produce high-quality software by establishing the whole development process [16]. The SDLC also describes problem-solving approaches, adopting general concepts and philosophies to overcome problems in the software development process [17]. SDLC comprises five distinct phases: requirement analysis and specification, design, implementation and unit testing, integration and system testing, and operation and maintenance [18]. Selecting the appropriate SDLC is essential when the system interface is built through a well-defined software development process, because it ensures a usable configuration and scalability [19] and guides the developer in writing a program [20]. However, the current executions and tools of SDLC are mainly for implementation purposes and do not cover the business requirement part [21], which is one of this study's primary motivations.

As defined by [22], Development & Operations (DevOps) is “a software development methodology which looks to integrate all the software development functions, from development to operations, within the same cycle”. DevOps is often viewed as an extension of the agile delivery of applications that covers both operations and maintenance [23]. DevOps extends certain agile practices with a mix of patterns intended to improve collaboration between development and operation teams [24]. The DevOps methodology seeks to provide continuous integration, continuous testing, early analysis, etc., with short design/development cycles [25]. Many firms have accelerated this process by adopting software delivery concepts, such as DevOps, to coordinate and align disparate communications and opposing development and operations goals [26]. DevOps enables quick delivery of stable software through continuous delivery, continuous deployment, and resolving disagreements between developers and operations through shared responsibility for software delivery processes [27].

Meanwhile, the methodology of agile development emphasizes people rather than processes [28]. Kotaiah [29] defines agile as the convergence of processes, guidelines and multiple tasks from different teams to develop software. The uniqueness of agile approaches does not lie in their application or process, but in understanding people as crucial drivers of project progress, combined with an intense focus on productivity and flexibility [30]. The use of personal interactions encourages the exchange of information and allows the process to change quickly when needed. The agile methodology aims to involve customers and eliminate communication gaps between customers and developers [31]. Moreover, agile methodology is designed for efficiency and provides low error rates, high quality and high customer loyalty by maintaining effective collaboration between the customer and the project developer. [30] underlines the double vision of agile: the working code's constant integrity (what exactly the team wants to achieve) and the goodwill of the workforce's efficiency.

The research found some strengths and weaknesses of current ICT-related methodologies in handling Visual Analytics. In summary, Figure 1 compares each ICT-related methodology against the expected vertical and horizontal coverage and the alignment between the processes. As well-established methodologies, most ICT-related methodologies have full



coverage from the horizontal perspective, covering the reality, computer and human parts. The methodologies show a similar process that starts with reality in the organization and moves towards the design and development of the application based on the users’ requirements. Following this, the application is tested for human usage. Hence, this indicates the importance of these parts for a more comprehensive and complete ecosystem for ICT-related applications and systems. For integrating business analytics with the recognition of business requirements, the planning stage of DevOps, the requirement stage of Agile, and the feasibility study of SDLC can be mapped for that purpose. However, the analysis also found that DevOps and Agile have an iterative cycle between processes but little regard for the alignment between the reality and human parts. They emphasize the importance of an iterative lifecycle for handling dynamic changes and maintaining the system's operation. One of the interesting findings is how Agile places requirements at the root of the iterative lifecycle. By giving priority to the requirement part, the ICT project development can change dynamically.

[Figure 1. The overall ICT-Related Methodologies. Column labels: Human, Computer, Reality (annotated "Has wide coverage for PVA phases"). Rows: SDLC (Feasibility, Analysis, Design, Development, Implementation); DevOps (Plan, Code, Develop, Deploy, Operate); Agile (Requirement, Design, Implement, Testing). Annotations: "Has iterative cycle for dynamic changes and maintenance"; "Shallow and no specific coverage for technical-analytics process development, e.g. analytics insights, data quality, modelling and data visualization."]

The biggest shortcoming of ICT-related methodologies in handling Visual Analytics is that the processes are too shallow for Visual Analytics applications and need more focus on analytics specification. From a reality context, managing business requirements must focus on strategic business processes instead of operational business processes, since Visual Analytics is meant to support decision making rather than the operational business workload. More specific technical, modelling, and statistical elements need to be covered in depth within the computer part processes. Finally, for the human part, the ICT-related methodologies focus more on evaluation based on the User Acceptance Test of the application functionalities. This is a mismatch, since Visual Analytics focuses more on how the analytics outcomes should be understood by the users and become knowledge that can be applied to facilitate the business in its context of use.

III. BDA RELATED METHODOLOGIES

BDA-related methodologies are the nearest fields to Visual Analytics. Most Visual Analytics projects use these methodologies to facilitate project development. Based on real usage and development, the Cross-Industry Standard Process for Data Mining (CRISP-DM) remains the most popular methodological process for analytics, data mining, and data science projects [32][33]. More than 40% of developers used CRISP-DM between 2007 (42%) and 2014 (43%). Other than CRISP-DM, 27.5% of the developers use their own methodologies. Meanwhile, 8.5% of the developers use SEMMA, 7.5% use the KDD Process, and 2% use a domain-specific methodology [34]. Hence, this research selects five top methodological processes from the ranking for further analysis: CRISP-DM, KDD, ASUM-DM, SEMMA, and TDSP.

CRISP-DM refers to the Cross-Industry Standard Process for Data Mining, a technique established in 1996 to develop data mining projects. It consists of six main phases for developing a data mining project and can include cycle repetitions based on the developers' current requirements. These phases set up the business problem (Business Understanding), review the available data (Data Understanding), develop analytic models (Data Preparation and Modeling), evaluate results against business needs (Evaluation) and deploy the model (Deployment). The whole cycle is designed to be iterative, repeating as necessary to keep models current and effective.

CRISP-DM is still the prevailing standard for analytics, data mining, and data science projects [36] and for conducting data analytics in industrial applications [37]. Despite its popularity, CRISP-DM has a few shortcomings in handling current BDA projects and their environment. Although CRISP-DM has not been amended since its creation [38] and its six high-level phases remain a good description of the analytics process, the details need to be updated to be compatible with current demand. The first shortcoming of CRISP-DM is that it does not have an analytics approach. According to Foroughi [39], the analytic approach must recognize appropriate statistical or machine learning techniques before entering the data-gathering steps.

Furthermore, Saltz & Shamsuhurin [40] mentioned that CRISP-DM might not be suitable for a big data project because of the 5Vs characteristics of Big Data. Another shortcoming of CRISP-DM is the lack of Business and Data Understanding guidelines. The business understanding in CRISP-DM does not indicate a data acquisition phase [37], the process of converting data from the real world to be displayed, analyzed, and stored in the digital domain. Next, CRISP-DM does not address project management tasks. It is not a suitable method for project management since it presumes that its user is a single person or a small-scale project [41], so the team coordination, communication and prioritization needed for larger projects are ignored [42]. Some project management practices, such as quality management or change management, are also not included in CRISP-DM [43]. Ponsard [44] observed that CRISP-DM struggles to deliver a good management viewpoint on communication, knowledge and project aspects. As a result, CRISP-DM fails to underline the more important steps and milestones that can be enhanced progressively. Lastly, CRISP-DM suffers from the absence of recommended techniques (processes or procedures to follow) and tools (devices or applications) [45], obstructing an effective process and



progress without spending more on work activities, revenue and workforce.

Knowledge Discovery in Databases (KDD) is a research field that studies advanced technologies and methodologies for extracting previously unknown and possibly useful information from data. KDD breaks the analytics process into five high-level phases: selection, pre-processing, transformation, data mining, and interpretation. KDD is a repetitive process in which the assessment and mining steps can be improved, and new data can be consolidated and modified to get different and more accurate results. Napoli [46] describes the KDD process as interactive and iterative, whereby the analyst has complete control in guiding and validating the extraction process. However, previous research [45] pointed out a few shortcomings of KDD, such as the lack of vital feedback loops, the absence of a data understanding process, and no clear definition of each method's function or technique. Moreover, Sone [47] notes that KDD does not cover the deployment stage. Similar to CRISP-DM, KDD also does not clearly define the techniques and tools used to perform each process in the methodology.

The SEMMA acronym refers to the process of conducting a data mining project: Sample, Explore, Modify, Model, and Assess. This framework's main features are data extraction for random sampling and the exploration of data trends [48]. Based on the literature survey, the study found that SEMMA has advantages, such as offering an easy understanding of the process that allows the development of projects [49] and offering solutions to business problems [50]. At the same time, SEMMA focuses more on data mining's technical parts and ignores business understanding processes. SEMMA starts with a statistically representative sample of data drawn using sampling strategies. Sampling is a method used to pick a subset of a group to draw statistical conclusions and approximate the entire population's characteristics [51]. Therefore, client information and understanding must be handled externally [48], [49], [36]. Meanwhile, the disadvantages of SEMMA include a disregard for business understanding [45] and the exclusion of deployment aspects [48] and the evaluation pattern [45].

The Analytics Solutions Unified Method for Data Mining (ASUM-DM) is an extended and refined version of CRISP-DM for implementing data mining and predictive analytics projects, created by IBM in 2015. ASUM-DM breaks the analytics process into five high-level phases: analyze, design, configure,

solutions and intelligent applications efficiently.” (Microsoft, 2020). TDSP breaks the analytics process into five high-level phases: business understanding, data acquisition, modelling, deployment, and customer acceptance. TDSP is well designed for predictive analytics using machine learning and artificial intelligence models [56]. Hence, it improves on CRISP-DM and KDD by linking everything together in the TDSP stages. As its primary focus lies on the computer part and data science technicalities, TDSP gives less attention to business understanding and evaluation. Thus, there is a lack of guidance on communication, knowledge and project management [40].

To obtain a more detailed analysis of current BDA methodologies, a thorough look at each process's limitations and strengths is needed. All BDA methodologies discussed consist of the three major parts of reality, computer and human, except for the SEMMA methodology, as shown in Figure 2. Compared with the others, SEMMA is even more limited in scope, focusing on data mining's technical steps. SEMMA leaves out all processes that involve understanding the current business, and all business-related processes [45] under the reality part, and proceeds directly to data sampling. All analytics methodologies are designed to improve the business in some way. Consequently, it is critical to understand the current business process before developing the new system.

This initial business understanding process focuses on a business method for identifying project goals and criteria and then translating that information into a description of a data mining challenge. Apart from SEMMA, all BDA methodologies implement this process. However, not all of them give it much emphasis. For example, the KDD phases and sampling deal with business issues [57] but do not define a detailed process for business requirements. Even CRISP-DM, which involves understanding the project objectives and requirements from a business perspective, still lacks business guidance [58], such as from the business analytics view. Based on the analysis, the study also found that ASUM-DM (which has the same steps as CRISP-DM) and TDSP lack the business analytics point of view.

As Muller [59] mentioned, the emergence of big data has boosted the need for business analytics in organizations. The research stream has centered on strategically tailored analytics to build long-term business value and to understand BDA better [60]. Implementing the right business analytics strategy creates a basis for improvement and agility in the current business environment [61]. In the technical and scientific part, the study
deploy, and operate. However, few limitations have been observed that most BDA methodologies had shown solid
recognized from previous studies, including its tendency to be guidelines for the most advanced data practises, but there are
very comprehensive, which inadvertently allows developers to still challenges for them. For example, CRISP-DM and ASUM-
bypass it to something uncomplicated [52]. Next, ASUM-DM DM are involved with present-day technologies such as machine
is an improvement in CRISP-DM deployment, but the learning [52] and adapted new challenges in data mining [58].
weakness in development still exists [54]. Furthermore, Michalczyk [45] discovered that CRISP-DM and KDD did not
Schafer, [55] recognized that quality management's viewpoint provide techniques and tools for each method and process used
is not included to assure the methodology's quality. by the teams, which might need to know how to do the process
correctly. SEMMA also showed the same trend as there is little
Team Data Science Process - TDSP is “an agile, iterative guidance for technique selection presented in this methodology
data science methodology to deliver predictive analytics [62]. In contrast, ASUM-DM follows the same steps as CRISP-
DM but emphasizes operational, deployment, and project

IEEE 978-1-6654-6835-0/22/$31.00 ©2022 IEEE


management phases [63]. Hence, Saltz [42] noticed that TDSP neglects how teams can iterate over a sequence of analyses to achieve a deeper interpretation of the data and give insights and recommendations to their customers. In the meantime, many data scientists struggle with TDSP's forced fixed-length planning sprints, which are complex and exhausting [42].

Figure 2. The overall BDA-Related Methodologies

Apart from that, most of the current methodologies also struggle with their feedback loops. The analysis of KDD and SEMMA shows that neither methodology clearly defines a feedback loop. CRISP-DM also lacks iterative feedback; for example, after the deployment step, there is no feedback loop to the previous steps if anything goes wrong in that process. However, TDSP seems to provide a complete feedback loop by showing the interaction in each of its processes. A successful feedback loop should delegate the right task to the right person to complete a specific outcome. Integrating feedback loops into the methodology's project workflow is important to ensure that the developer can collect fast and frequent feedback from the users, and it increases the chances of adapting to emerging changes. Feedback loops are a way to increase productivity by identifying areas for improvement and encouraging collaborative work among teams. Due to this, feedback loops should be an integral component of every highly effective and beneficial method. The most crucial aspect is how all the methodologies meet the current demand of the 5 V's of Big Data: volume, velocity, variety, veracity and value. Shoikova [64] mentioned that the current methodologies cannot satisfy those 5 V's demands due to a lack of standardization in the implementation of Big Data initiatives. The other methodologies also do not sufficiently emphasize the current demand for Big Data, even though it has become a prominent part of business and technology. Moreover, the absence of a deployment step with actual data validation results executed [48] is another significant limitation in SEMMA associated with the human part. This missing part is evident not only in SEMMA but also in the KDD methodology [65]. Both methodologies are linked to evaluation and assessment stages, but in the end, they still lack another essential part, which is the deployment stage [65].

The deployment of a BDA application is important in order to enable the use of the model to carry out future activities, such as sustaining competitive advantage and strategy guidelines [66] by making forecasts and business decisions, incorporating data mining functions into the model [67] and providing reports that help users to forecast, view patterns or compare models [68]. In conclusion, the enhancement of the current methodology should focus on the business understanding phase, recommend the appropriate techniques and tools for each phase, provide effective loops between phases, and update the details and specific needs so that they are compatible with the current demand of the 5 V's characteristics.

IV. VISUAL ANALYTICS RELATED METHODOLOGIES

There are five top processes for Visual Analytics (VA). VA Process 1, according to Alomar [69], comprises a few methods such as information gathering, data pre-processing, data analysis, data visualization, interaction and decision making. VA Process 2 follows Asli [70], whose research proposed a visual analytics design that is expected to support exploratory analytics. The process starts with the raw data, followed by data abstraction, data model, output and insights. Those processes are supported by the proposed visual analytics design. For VA Process 3, Andrienko [10] proposed a representation of the visual analytics process as a goal-oriented workflow, where the primary goal is to create a "behavioral model of a subject". The proposed visual analytics process was divided into three main categories: (i) reality, which refers to the real world; (ii) computer, which represents the computer activity; and (iii) human, which relies on human capability.

On the other hand, VA Process 4 is based on the latest research of Cui [71]. The research proposed a visual-analytics process as a sense-making loop. There are six steps in this visual-analytics process: pre-processing, algorithmic analysis, visualization, insightful knowledge, interactions, and lastly, regeneration of an updated visualization. Finally, VA Process 5 is based on Nguyen [72], who proposed a visual analytics framework for complex genomics data to achieve an efficient knowledge discovery process. The proposed framework consists of multiple components that reflect a complete analytic cycle starting from the goal or question, in which the knowledge is gained through the components/process of (i) pre-observation,



(ii) automated analysis and (iii) visual analytics. All these have been summarized as in Figure 3.

Figure 3. The overall Visual Analytics-Related Methodologies

Based on the study of the top 5 current Visual Analytics processes, the study observes that all of them focus more on the human part, which is visualization. The visualization process in visual analytics involves data query and machine learning methods and works through human-computer interaction to enrich the user with insights as an outcome. This human part, which relies on human capabilities such as interaction, decision-making, knowledge, and validation, will improve the analysis process. However, in the reality part, only half of them engage with business understanding, such as Asli through problem characterization, Andrienko et al. by understanding the reality, and Nguyen et al. by starting with a question process. Compared with other processes, the VA process also does not emphasize the computer part. Most of them only involve data pre-processing or observation and data analysis. There is no specific process to understand data assets better through data understanding, or to clean and transform raw data for analysis through data preparation. Most of the VA processes rely on feedback loops to maintain or adopt emerging changes in each process. However, there are still gaps between those processes.

V. THE FINDINGS

The analysis and shortcomings for each section of ICT, BDA and Visual Analytics related methodologies have been highlighted and explained in their respective sections. Therefore, the findings section focuses on providing explanations on the cross-analysis between these related methodologies.

The ICT, BDA and Visual Analytics related methodologies are similar in that each is a lifecycle that produces an ICT product as the output (e.g. software, application, analytics-diagnostics, analytics-prediction). The lifecycle generally starts with analyzing the context, followed by design, development, deployment and testing of the product. When focusing on analytics-prediction as a product, the approaches in each section narrow down: ICT-related methodology is the most general one, followed by BDA-related methodology and then Visual Analytics-related methodology. As shown in Figures 2 to 5, most available processes focus on scientific or straightforward domains. Most of the approaches focus on the computer part, which involves data, machine learning algorithms, and statistical analysis. Less concern has been given to the reality and human parts, such as organizational context, business activities and analytics acceptance among the users and decision-makers.

Thus far, the analysis found a lack of methodology to comprehensively guide Visual Analytics in the big data lifecycle. The reviews have shown shortcomings from three perspectives: i) the in-width coverage of the overall processes, ii) the in-depth detailed guidelines for each process, and iii) the alignment and flow among the processes.

From the overall perspective, it is essential to relate Visual Analytics to the business context since it is in high demand by businesses and organizations [1], [81]. The proposed methodology should then identify the requirements from the business requirements and ensure the analytics output meets the business needs. However, the review showed that the current approaches are less focused on, and do not handle, the reality of the business context. The focus is still more on the computer part and only partly on the human part. Furthermore, due to the early inter-relation between visualization and the diagnostics and predictive components, the current approaches from Visual Analytics focus on visualization in the human part while others are weighted on the modelling in the computational part. Furthermore, none of the methodologies and processes mention the analytical reasoning component. In the analytics field, it is critical to emphasise analytical reasoning elements as the main component of Visual Analytics outcomes. Thus, it is still inadequate for big data projects to rely on the current approaches.

Since the in-width coverage of the field's overall approach is still incomplete, most Visual Analytics implementations rely on Data Science related methodologies such as CRISP-DM, SEMMA and TDSP. However, these approaches lack depth and detailed guidelines, which need to be updated to become compatible with current demand. There is a lack of techniques (processes, procedures or phases), tools (devices or applications) and analytics approaches, such as recognizing appropriate statistical methods, identifying machine learning techniques and data quality management [39]. Furthermore, the deficiency of details in the phases of business understanding and data understanding critically needs to be improved, such as the absence of the acquisition of predictive requirements, vagueness about the business problem, and unfitness for handling the complexities of data evolution [37]. Moreover, these methodologies cannot carry out the project management tasks and lack guidelines for communication, roles, knowledge, and quality management [45].

The alignment and flow among the processes need to be reconstructed for a more effective methodology. In general, the research found that most of the approaches, except for a few, have not given much attention to the iterative process. Other than that, the processes' flow is more about a single linear path from project kick-off to deployment. Since no standard can be identified from those studies, an ongoing study is needed to



identify the specific sub-spiral for critical processes. CRISP-DM can be a proper reference since it offers loops/iterations for a particular process, if necessary. Furthermore, the whole cycle of the current methodological approaches is too broad, with no checkpoint to handle progress mistakes. When the end product is only shown to the stakeholder at the end of the cycle, the Visual Analytics work must be redone if there are any mistakes. The research found that combining CRISP-DM with agile project management is likely more effective in coping with these shortcomings. However, assessing the proposed methodologies in an actual business situation has to be prioritized. Finally, as in most technology projects, poor coordination and unclear objectives were described as key problems because the level of alignment between technology and business is always uncertain (Marr, 2016; Philbin, 2015). Hence, the proposed methodology should have clarity on the processes, their flow and deliverables. It can promote coherence and transparency of expected outcomes and changes due to the actual condition of the BDA project’s progress. All these findings should be considered when proposing a Visual Analytics methodology in the big data lifecycle.

VI. CONCLUSIONS

To date, there is no comprehensive methodology to guide Visual Analytics in the big data project lifecycle. Most developers and organizations rely on general ICT-related methodology (e.g. SDLC, Agile, DevOps), Data Science-related methodology (e.g. CRISP-DM, SEMMA, KDD) or Visual Analytics processes that are as yet insufficient to handle it. In some instances, the BDA project developer creates its own ad-hoc methodology to manage the work dynamics of the analytics situation. For practical reasons, the proposed methodology needs to concentrate not just on the systematic analysis of data or analytics but also on people, process and technologies [40].

Therefore, this research considers that a proposed methodology must fulfil the following conditions: (i) in-width coverage of the overall parts of reality, computer and human. Extra consideration should be put on the elicitation of business requirements from the actual context of use, enabling users to interact with the analytics outcomes, supporting both prediction and visual explanation, and using the predictive analysis method to guide visualization as the Visual Analytics output. (ii) In-depth detail for each process, especially for the analytics engine and statistics in making predictions. The processes should be updated in terms of techniques, tools, and analytics approaches compatible with current analytics demand. Finally, (iii) the alignment and flow among the processes, which suggest that specific sub-spirals are needed in the critical processes as checkpoints to handle progress mistakes. Hence, a precise alignment between business and technology can be improved when the methodology emphasizes coordination as deliverables flow between the three parts of reality, computer and human. The methodology should consider the project management tasks such as quality, communication, roles, and knowledge for more efficient coordination to a more significant extent.

It is hoped that this methodological review can shed some light on the approaches and ways of implementing analytics in the big data lifecycle. There are insufficiencies and vague conditions in handling Visual Analytics when using current methodological approaches. Hence, proposing a clear and practical methodology is critical to handle the demand for Visual Analytics as hands-on usage of big data in business and organizations.

ACKNOWLEDGMENT

The authors gratefully acknowledge the Ministry of Education (MOE) and Universiti Teknologi Malaysia (UTM) for the financial support given to carry out this study. This work is funded by UTM (RUG: Q.K130000.2656.17J23).

REFERENCES

[1] Y. Lu, R. Garcia, B. Hansen, M. Gleicher, and R. Maciejewski, “The State-of-the-Art in Predictive Visual Analytics,” Comput. Graph. Forum, vol. 36, no. 3, pp. 539–562, 2017.
[2] M. Attaran and S. Attaran, “Opportunities and challenges of implementing predictive analytics for competitive advantage,” Int. J. Bus. Intell. Res., vol. 9, no. 2, pp. 1–26, 2018.
[3] A. Malik, R. Maciejewski, S. Towers, S. McCullough, and D. S. Ebert, “Proactive spatiotemporal resource allocation and predictive visual analytics for community policing and law enforcement,” IEEE Trans. Vis. Comput. Graph., vol. 20, no. 12, pp. 1863–1872, 2014.
[4] J. Yue, A. Raja, D. Liu, X. Wang, and W. Ribarsky, “A blackboard-based approach towards predictive analytics,” AAAI Spring Symp. - Tech. Rep., vol. SS-09-09, pp. 154–161, 2009.
[5] J. J. Caban and D. Gotz, “Visual analytics in healthcare - opportunities and research challenges,” J. Am. Med. Informatics Assoc., vol. 22, no. 2, pp. 260–262, 2015.
[6] J. Lu et al., “Recent progress and trends in predictive visual analytics,” Front. Comput. Sci., vol. 11, no. 2, pp. 192–207, 2017.
[7] S. Mukherjee, “Predictive Analytics and Predictive Modeling in Healthcare,” SSRN Electron. J., no. June, 2019.
[8] D. Keim et al., “Visual Analytics: Definition, Process, and Challenges,” 2008.
[9] S. Amri, H. Ltifi, and M. Ben Ayed, “A Predictive Visual Analytics Evaluation Approach Based on Adaptive Neuro-Fuzzy Inference System,” Comput. J., vol. 62, no. 7, pp. 977–1000, 2019.
[10] N. Andrienko et al., “Viewing Visual Analytics as Model Building,” Comput. Graph. Forum, vol. 37, no. 6, pp. 275–299, 2018.
[11] K. Sedig and P. Parsons, “Interaction Design for Complex Cognitive Activities with Visual Representations: A Pattern-Based Approach,” AIS Trans. Human-Computer Interact., vol. 5, no. 2, pp. 84–133, 2013.
[12] D. Keim and J. Thomas, “Scope and Challenges of Visual Analytics,” IEEE Vis. Conf. 2007, vol. 4404, no. 4404, pp. 1–58, 2008.
[13] Gartner, “Gartner Data & Analytics Summit (November 2017) Frankfurt, Germany,” 2017. [Online]. Available: https://gartner/eu/datade.
[14] J. Scholtz, C. Plaisant, M. Whiting, and G. Grinstein, “Evaluation of visual analytics environments: The road to the Visual Analytics Science and Technology challenge evaluation methodology,” Inf. Vis., vol. 13, no. 4, pp. 326–335, 2014.
[15] S. Yaacob, N. M. Ali, H. N. Liang, N. Z. A. Rahim, N. Maarop, and R. Ali, “Giving the boss the big picture: Demonstrating convergence visualization design principles using business intelligence and analytical tools,” J. Fundam. Appl. Sci., pp. 1–8, 2018.
[16] A. O. Elfaki and Z. Bassfar, “Construction of a Software Development Model for Managing Final Year Projects in Information Technology Programmes,” Int. J. Emerg. Technol. Learn., vol. 15, no. 21, pp. 4–23, 2020.
[17] Z. Ibrahim, M. D. G. M. D. Johar, and N. R. A. Rahman, “An efficiency and effectively of methodology in software development workflow based on Malaysia,” Int. J. Eng. Technol., vol. 7, no. 4, pp. 526–536, 2018.
[18] S. Malik, “Software Testing: Essential Phase of SDLC and a Comparative Study of Software Testing Techniques,” Int. J. Syst. Softw. Eng., vol. 5, no. 2, pp. 38–45, 2017.
[19] F. C. Aguboshim and G. S. Miles, “Well-Defined Interface Development Process: An Important Interface Design Strategy to Create Easy-to-use Banking ATM System Interfaces in Nigeria,” Int. J. Eng. Sci. Comput., vol. 8, no. 15, pp. 19514–19525, 2018.
[20] D. K. Ahmad, M. F. Ahmad, M. N. Ahmad, and A. S. Ahmad, “An Experiment of Animation Development in Hypertext Preprocessor (PHP) and Hypertext Markup Language (HTML),” Int. J. Sci. Res. Comput. Sci. Eng., vol. 8, no. 2, pp. 45–51, 2020.
[21] H. Khalajzadeh, M. Abdelrazek, J. Grundy, J. Hosking, and Q. He, “A Survey of Current End-User Data Analytics Tool Support,” Proc. - 2018 IEEE Int. Congr. Big Data, BigData Congr. 2018 - Part 2018 IEEE World Congr. Serv., pp. 41–48, 2018.
[22] L. Banica, M. Radulescu, D. Rosca, and A. Hagiu, “Is DevOps another Project Management Methodology?,” Inform. Econ., vol. 21, no. 3/2017, pp. 39–51, 2017.
[23] A. Hemon, B. Lyonnet, F. Rowe, and B. Fitzgerald, “Conceptualizing the transition from agile to DevOps: A maturity model for a smarter is function,” in IFIP Advances in Information and Communication Technology, 2019.
[24] P. Perera, R. Silva, and I. Perera, “Improve software quality through practicing DevOps,” 17th Int. Conf. Adv. ICT Emerg. Reg. ICTer 2017 - Proc., vol. 2018-Janua, no. September 2017, pp. 13–18, 2017.
[25] B. Meyers, K. Gadeyne, B. Oakes, M. Bernaerts, H. Vangheluwe, and J. Denil, “A model-driven engineering framework to support the functional safety process,” Proc. - 2019 ACM/IEEE 22nd Int. Conf. Model Driven Eng. Lang. Syst. Companion, Model. 2019, pp. 619–623, 2019.
[26] O. Krancher, P. Luther, and M. Jost, “Key Affordances of Platform-as-a-Service: Self-Organization and Continuous Feedback,” J. Manag. Inf. Syst., 2018.
[27] G. Kim, J. Humble, P. Debois, and J. Willis, “The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations,” The DevOps handbook. 2016.
[28] A. Mateen, M. Tabassum, and A. Rehan, “Combining Agile with Traditional V Model for Enhancement of Maturity in Software Development,” vol. 7, no. 2, pp. 280–296, 2017.
[29] B. Kotaiah and M. A. Khalil, “Approaches for development of Software Projects: Agile methodology,” Int. J. Adv. Res. Comput. Sci., vol. 8, no. 1, p. 6, 2017.
[30] J. Highsmith, C. Consortium, and A. Cockburn, “Development : The Business of Innovation.”
[31] M. M. Jha, R. M. F. Vilardell, and J. Narayan, “Scaling agile scrum software development: Providing agility and quality to platform development by reducing time to market,” Proc. - 11th IEEE Int. Conf. Glob. Softw. Eng. ICGSE 2016, pp. 84–88, 2016.
[32] J. Segovia, “DEFINITION AND INSTANTIATION OF AN INTEGRATED DATA MINING PROCESS 1 Project objectives,” Jornadas Seguim. Proy., 2007.
[33] S. Huber, H. Wiemer, D. Schneider, and S. Ihlenfeldt, “DMME: Data mining methodology for engineering applications - A holistic extension to the CRISP-DM model,” Procedia CIRP, vol. 79, pp. 403–408, 2019.
[34] G. Piatetsky, “CRISP-DM, still the top methodology for analytics, data mining, or data science projects,” KDnuggets.com, 2015.
[35] M. F. Roldan and J. Debnath, “A Methodology Based on Business Intelligence for the Development of Predictive Applications in Self-Adapting Environments,” no. Iccsee, pp. 155–159, 2013.
[36] F. Martinez-Plumed et al., “CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories,” IEEE Trans. Knowl. Data Eng., vol. 4347, no. c, pp. 1–1, 2019.
[37] C. Model, H. Wiemer, and L. Drowatzky, “Applied Sciences Data Mining Methodology for Engineering Applications (DMME) - A Holistic Extension,” Appl. Sci., 2019.
[38] G. Piatetsky, “CRISP-DM, still the top methodology for analytics, data mining, or data science projects,” KDD News, 2014.
[39] B. Carlo, B. Daniele, C. Federico, and G. Simone, “A Data Quality Methodology for,” J. Database Manag., vol. 3, no. 1, pp. 27–43, 2011.
[40] J. S. Saltz and I. Shamshurin, “Big data team process methodologies: A literature review and the identification of key factors for a project’s success,” Proc. - 2016 IEEE Int. Conf. Big Data, Big Data 2016, pp. 2872–2879, 2016.
[41] J. Saltz, I. Shamshurin, and C. Connors, “Predicting data science sociotechnical execution challenges by categorizing data science projects,” J. Assoc. Inf. Sci. Technol., 2017.
[42] J. Saltz and A. Suthrland, “SKI: An Agile Framework for Data Science,” in Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019, 2019.
[43] Ó. Marbán, G. Mariscal, E. Menasalvas, and J. Segovia, “An engineering approach to data mining projects,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 4881 LNCS, pp. 578–588, 2007.
[44] C. Ponsard, A. Majchrowski, S. Mouton, and M. Touzani, “Process guidance for the successful deployment of a big data project: Lessons learned from industrial cases,” IoTBDS 2017 - Proc. 2nd Int. Conf. Internet Things, Big Data Secur., no. IoTBDS, pp. 350–355, 2017.
[45] S. Michalczyk and S. Scheu, “Designing an Analytical Information Systems Engineering Method,” ECIS 2020 Res. Pap., no. June, 2020.
[46] A. Napoli, “A Smooth Introduction to Symbolic Methods for Knowledge Discovery,” in Handbook of Categorization in Cognitive Science, 2005.
[47] M. M. Shinde and R. R. Issac, “International Research Journal of Modernization in Engineering Technology and Science,” no. 05, pp. 179–181, 2020.
[48] H. Nagashima and Y. Kato, “APREP-DM: A Framework for Automating the Pre-Processing of a Sensor Data Analysis based on CRISP-DM,” 2019 IEEE Int. Conf. Pervasive Comput. Commun. Work. PerCom Work. 2019, pp. 555–560, 2019.
[49] H. J. G. Palacios, R. A. J. Toledo, G. A. H. Pantoja, and Á. A. M. Navarro, “A comparative between CRISP-DM and SEMMA through the construction of a MODIS repository for studies of land use and cover change,” Adv. Sci. Technol. Eng. Syst., vol. 2, no. 3, pp. 598–604, 2017.
[50] A. Azevedo and M. F. Santos, “KDD, semma and CRISP-DM: A parallel overview,” MCCSIS’08 - IADIS Multi Conf. Comput. Sci. Inf. Syst. Proc. Informatics 2008 Data Min. 2008, no. June, pp. 182–185, 2008.
[51] M. K. Obenshain, “Application of Data Mining Techniques to Healthcare Data,” Infect. Control Hosp. Epidemiol., vol. 25, no. 8, pp. 690–695, 2004.
[52] B. Ahmed, T. Dannhauser, and N. Philip, “A Lean Design Thinking Methodology (LDTM) for Machine Learning and Modern Data Projects,” 2018 10th Comput. Sci. Electron. Eng. Conf. CEEC 2018 - Proc., pp. 11–14, 2019.
[53] S. Angée, S. I. Lozano-Argel, E. N. Montoya-Munera, J. D. Ospina-Arango, and M. S. Tabares-Betancur, “Towards an improved ASUM-DM process methodology for cross-disciplinary multi-organization big data & analytics projects,” Commun. Comput. Inf. Sci., vol. 877, no. July, pp. 613–624, 2018.
[54] E. Kristoffersen, O. O. Aremu, F. Blomsma, P. Mikalef, and J. Li, “Exploring the Relationship Between Data Science and Circular Economy: An Enhanced CRISP-DM Process Model,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 11701 LNCS, pp. 177–189, 2019.
[55] F. Schafer, C. Zeiselmair, J. Becker, and H. Otten, “Synthesizing CRISP-DM and Quality Management: A Data Mining Approach for Production Processes,” 2018 IEEE Int. Conf. Technol. Manag. Oper. Decis. ICTMOD 2018, pp. 190–195, 2018.
[56] F. Foroughi and P. Luksch, “Data science methodology for cybersecurity projects,” arXiv, no. March, 2018.
[57] E. Olarte, M. Panizzi, and R. Bertone, “Market Segmentation Using Data Mining Techniques in Social Networks,” Commun. Comput. Inf. Sci., vol. 995, no. April 2020, pp. 221–231, 2019.



[58] F. Martínez-Plumed et al., “Context aware standard process for data mining,” arXiv, 2017.
[59] O. Müller, M. Fay, and J. vom Brocke, “The Effect of Big Data and Analytics on Firm Performance: An Econometric Analysis Considering Industry Characteristics,” J. Manag. Inf. Syst., vol. 35, no. 2, pp. 488–509, 2018.
[60] S. Akter and S. F. Wamba, “Big data analytics in E-commerce: a systematic review and agenda for future research,” Electron. Mark., vol. 26, no. 2, pp. 173–194, 2016.
[61] A. Popovic, R. Hackney, and M. Castelli, “Information Systems Frontiers – (accepted Oct, 2016) The Impact of Big Data Analytics on Firms High Value Business Performance,” no. October, 2016.
[62] D. Schuff, K. Corral, R. D. St. Louis, and G. Schymik, “Enabling self-service BI: A methodology and a case study for a model management warehouse,” Inf. Syst. Front., 2018.
[63] S. Tripathi, D. Muhr, B. Manuel, F. Emmert-Streib, H. Jodlbauer, and M. Dehmer, “Ensuring the robustness and reliability of data-driven knowledge discovery models in production and manufacturing,” arXiv. 2020.
[64] E. Shoikova, R. Nikolov, E. Kovatcheva, B. Jekov, and L. Gotsev, “Big Data Framework overview,” Electrotech. Electron., vol. 55, pp. 22–34, 2020.
[65] A. Rotondo and F. Quilligan, “Evolution Paths for Knowledge Discovery and Data Mining Process Models,” SN Comput. Sci., vol. 1, no. 2, pp. 1–19, 2020.
[66] M. J. Niland, “Towards the Influence of the Organisation on Big data Analytics,” GIBS Res. Proj., no. November, p. 173, 2017.
[67] I. Haider, M. A. Haider, and A. Saeed, “Big Data in Internet of Things: Architecture and Open Research Challenges,” 2021.
[68] E. Hofmann and E. Rutschmann, “Big data analytics and demand forecasting in supply chains: a conceptual analysis,” Int. J. Logist. Manag., vol. 29, no. 2, pp. 739–766, 2018.
[69] A. Alomar, N. Alrashed, I. Alturaiki, and H. Altwaijry, “How visual analytics unlock insights into traffic incidents in urban areas,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017.
[70] M. F. Asli, M. Hamzah, A. A. A. Ibrahim, and A. J. Embug, “Visual analytics: Design study for exploratory analytics on peer profiles, activity and learning performance for MOOC forum activity assessment,” Int. J. Adv. Sci. Eng. Inf. Technol., vol. 9, no. 1, pp. 66–72, 2019.
[71] W. Cui, “Visual Analytics: A Comprehensive Overview,” IEEE Access, vol. 7, pp. 81555–81573, 2019.
[72] Q. V. Nguyen et al., “Visual analytics of complex genomics data to guide effective treatment decisions,” J. Imaging, vol. 2, no. 4, 2016.
[73] R. V. McCarthy, M. M. McCarthy, W. Ceccucci, and L. Halawi, “Applying Predictive Analytics,” Appl. Predict. Anal., pp. 1–25, 2019.
[74] O. AlFarraj, A. AlZubi, and A. Tolba, “Optimized feature selection algorithm based on fireflies with gravitational ant colony algorithm for big data predictive analytics,” Neural Comput. Appl., vol. 31, no. 5, pp. 1391–1403, 2019.
[75] R. Kumar, “Predictive Analytics,” in Machine Learning and Cognition in Enterprises: Business Intelligence Transformed, Berkeley, CA: Apress, 2017, pp. 75–97.
[76] S. M. Idrees, M. A. Alam, P. Agarwal, and L. Ansari, “Effective Predictive Analytics and Modeling Based on Historical Data,” in Advances in Computing and Data Sciences, 2019, pp. 552–564.
[77] F. Rabhi, M. Bandara, A. Namvar, and O. Demirors, “Big Data Analytics Has Little to Do with Analytics,” Lect. Notes Bus. Inf. Process., vol. 234, pp. 3–17, 2018.
[78] P. Nair, J. Krishna, and D. K. Srivastava, “Visual Analytics Toward Prediction of Employee Erosion Through Data Science Tools,” in Advances in Intelligent Systems and Computing, 2020.
[79] F. Stoffel, H. Post, M. Stewen, and D. A. Keim, “polimaps: Supporting Predictive Policing with Visual Analytics,” EuroVisWorkshop Vis. Anal., pp. 1–5, 2018.
[80] L. Sara, O. Younes, R. Amine, and A. Mohamed, “Using visualization and predictive analysis to predict train delays,” Period. Eng. Nat. Sci., vol. 6, no. 2, pp. 389–393, 2018.

